Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
1
votes
1
answers
33
views
What are recycled-commitlogs?
Cassandra/ScyllaDB
I have a node with 51000+ recycled commitlogs and want to know if I can just blitz them.
I am a Linux admin, not a Scylla/Cassandra admin, but these files are filling up a data disk. Normal commitlogs appear to be created and then used/removed, so I am assuming the recycled ones are left over from some failed effort somewhere and may just be copies of existing segments or of commits that have already been applied. If they correspond to commits which failed and need to be redone, would a restart of the service kick off a purge?
I cannot find anything relating to recycled commitlogs and am hoping someone out there can shed some light.
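For context, this is roughly how I have been sizing the problem before touching anything (the path is the Scylla default commitlog directory on this box; adjust it if your scylla.yaml points elsewhere):
```
# Count commitlog segments and see how much space they occupy.
# (/var/lib/scylla/commitlog is the Scylla default; check commitlog_directory
#  in scylla.yaml if your layout differs.)
ls /var/lib/scylla/commitlog | wc -l
du -sh /var/lib/scylla/commitlog

# List just the recycled segments, oldest first, to see how far back they go.
ls -ltr /var/lib/scylla/commitlog | grep -i recycled | head
```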
Hpoonis
(11 rep)
Jun 24, 2025, 12:45 PM
• Last activity: Jun 30, 2025, 02:35 PM
0
votes
1
answers
32
views
Unexpected Behavior with TimeWindowCompactionStrategy in ScyllaDB 6.1 Open Source
I’m using ScyllaDB 6.1 Open Source and have a table configured to store 30 days of data with the following compaction strategy:
```
compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': '3',
    'compaction_window_unit': 'DAYS',
    'max_threshold': '32',
    'min_threshold': '4'
}
```
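For anyone trying to reproduce this, the settings above can be read back from the schema tables to confirm what the server actually has applied (the keyspace and table names below are placeholders):
```
-- Read back the compaction options the server has stored for this table.
-- ('my_keyspace' and 'my_table' are placeholders for the real names.)
SELECT compaction
FROM system_schema.tables
WHERE keyspace_name = 'my_keyspace' AND table_name = 'my_table';
```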
## Observations
Pre-7 Jan Behavior: Until January 7, the table had SSTables grouped in 3-day windows, such as (Dec 23, Dec 25, Dec 28, Dec 31, Jan 3). This aligned with the expected behavior of the configured compaction strategy.
On 7 Jan: After an autocompaction was triggered, a new SSTable was created for January 7 only, deviating from the 3-day window grouping behavior.
## Additional Issue
Upon further investigation, I noticed that within the same 3-day window, there are multiple small SSTables instead of one large SSTable. These smaller SSTables are not being compacted into a single SSTable, even though the compaction strategy specifies min_threshold = 4 and max_threshold = 32.
## Questions
- Why did the compaction on January 7 result in a new SSTable for just that day instead of following the 3-day grouping?
- Why are the smaller SSTables within the same 3-day window not being compacted into a single large SSTable as expected?
- Are there specific conditions under which TimeWindowCompactionStrategy skips compaction or behaves differently for insert-only workloads?
- Could this behavior be linked to autocompaction triggering mechanisms or internal thresholds not accounted for in the current configuration?
I’d appreciate any insights or suggestions for troubleshooting and resolving this issue.
Thank you in advance!
I configured TimeWindowCompactionStrategy with a 3-day window, expecting SSTables to be compacted into larger ones within each window. However, after autocompaction on January 7, a new SSTable was created for just that day, and multiple small SSTables remained instead of being compacted into a single larger one.
Naman kaushik
(11 rep)
Jan 15, 2025, 06:03 AM
• Last activity: Jan 31, 2025, 02:15 AM
1
votes
0
answers
41
views
After nodetool refresh number of records in the source is much larger than in the target instance
**Restore snapshots to another cluster (migration)**
**source**:
Scylla v. 4.4.1 on Kubernetes,
4 nodes e.g. a1,a2,a3,a4
**target**:
Scylla v. 5.4.6 - no Kubernetes installation
4 nodes e.g. b1,b2,b3,b4
**Steps**:
1. we built a cluster with the same number of nodes
2. the number of tokens is the same on the source and on the new installation
3. we took a snapshot of the table in nodetool
4. create the table manually (on new Scylla)
5. we copied the snapshot to the new Scylla to the upload directory:
snapshot from a1->b1, a2->b2, a3->b3, a4->b4
6. then on each node:
nodetool refresh keyspace_name table_name -las
7. nodetool cleanup keyspace_name
and... **the number of records in the source is much larger than in the target instance**.
Where is the mistake?
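In case it helps, the kind of check we can run on each side looks like this (the keyspace and table names are placeholders, and the partition counts nodetool reports are only estimates, but the gap is far larger than estimation error):
```
# On each source and target node, compare table-level statistics.
# ('my_keyspace' and 'my_table' are placeholders.)
nodetool tablestats my_keyspace.my_table | grep -i -E 'number of partitions|space used'

# Confirm the refreshed SSTables actually landed in the table directory.
ls /var/lib/scylla/data/my_keyspace/my_table-*/ | head
```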
ania wieczorek
(11 rep)
Jun 19, 2024, 10:34 PM
1
votes
1
answers
75
views
Is there a full copy of data on each node if RF=3 and DC has 3 nodes?
When backing up a Scylla cluster (or cassandra for that matter) I understand that its best practice to take a snapshot of every node.
However, if I have a 6-node distributed cluster spanning 2 datacenters, with RF 3 in each datacenter for the keyspace I want to back up, would I logically be able to back up the keyspace from just one of those nodes, since there is a full copy of the data on each one?
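To be concrete, the keyspace I have in mind is defined roughly like this (the keyspace and datacenter names are placeholders for the real ones):
```
-- Each of the two datacenters has 3 nodes, and the keyspace keeps 3 replicas
-- per DC, so every node ends up holding a full copy of the data.
-- ('my_keyspace', 'dc1' and 'dc2' are placeholder names.)
CREATE KEYSPACE my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3,
    'dc2': 3
  };
```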
imbrian21
(11 rep)
Jun 12, 2024, 04:54 PM
• Last activity: Jun 18, 2024, 03:14 AM
0
votes
1
answers
127
views
Is a multi-table batch within the same node atomic and isolated?
I'm trying to understand this.
Cassandra batches are always atomic, and if a batch only modifies a single partition of a single table, then that batch is also isolated.
But what about multi-table batches partitioned by the same key?
Assume these tables:
```
orders (
    order_id pk,
    created_at,
    user_id
)

order_items (
    order_id,
    product_id,
    quantity,
    primary key (order_id, product_id)
)
```
Both tables are partitioned by the same key.
If I want to atomically create an order, let's say order_id = 123, like this:
```
begin batch
    insert into orders ... (123)
    insert into order_items ... where order_id = 123
    insert into order_items ... where order_id = 123
apply batch
```
Is this batch atomic and isolated, provided that the orders partition for 123 and the order_items partition for 123 reside on the same node?
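Spelled out with concrete columns and values (all of them made up for illustration), the batch I mean looks like this:
```
-- A multi-table batch keyed by the same partition key: both tables use
-- order_id as the partition key, so all three writes target the same token.
-- (Column types and the literal values are made up for illustration.)
BEGIN BATCH
  INSERT INTO orders (order_id, created_at, user_id)
    VALUES (123, toTimestamp(now()), 42);
  INSERT INTO order_items (order_id, product_id, quantity)
    VALUES (123, 1001, 2);
  INSERT INTO order_items (order_id, product_id, quantity)
    VALUES (123, 1002, 1);
APPLY BATCH;
```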
InglouriousBastard
(3 rep)
May 10, 2024, 01:24 AM
• Last activity: May 10, 2024, 01:47 PM
0
votes
1
answers
117
views
How can I make (game_id, user_id) unique, yet (game_id, score) indexed/clustered, in ScyllaDB?
See this in ScyllaDB:
```
CREATE TABLE scores_leaderboards (
    game_id int,
    score int,
    user_id bigint,
    PRIMARY KEY (game_id, score, user_id)
) WITH CLUSTERING ORDER BY (score DESC);
```
The idea is that we can get the user IDs with the top scores for a game.
This means that (game_id, score) needs to be indexed, and that's why I put it like that in the primary key. However, I had to include user_id, so that 2 users can have the exact same score.
The problem is that, like this, (game_id, user_id) isn't unique. I want to make sure the table never contains 2+ rows with the same (game_id, user_id).
My questions:
1) **What do you suggest I do, so that (game_id, user_id) is unique, yet (game_id, score) is indexed?**
2) Ideally, (game_id, user_id) would be the primary key, and then I'd create a compound index on (game_id, score).
However, if I try to create a compound index,
CREATE INDEX scores_leaderboards_idx ON scores_leaderboards (game_id, score);
I get the following:
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Only CUSTOM indexes support multiple columns"
But I'm not finding how I can create a CUSTOM index... is this an extension I need to install?
Is there any recommendation against using custom indexes?
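For what it's worth, one pattern I've seen suggested, though I'm not sure it fits here, is to keep a base table keyed by (game_id, user_id) for uniqueness and derive the score ordering from a materialized view, roughly like this (the table and view names are illustrative):
```
-- Base table: one row per (game_id, user_id), so upserts keep the pair unique.
CREATE TABLE scores_by_user (
    game_id int,
    user_id bigint,
    score int,
    PRIMARY KEY (game_id, user_id)
);

-- Materialized view ordered by score for the leaderboard reads.
-- All base primary-key columns must appear in the view's primary key.
CREATE MATERIALIZED VIEW scores_leaderboards AS
    SELECT game_id, score, user_id
    FROM scores_by_user
    WHERE game_id IS NOT NULL AND score IS NOT NULL AND user_id IS NOT NULL
    PRIMARY KEY (game_id, score, user_id)
    WITH CLUSTERING ORDER BY (score DESC);
```
The appeal would be that an upsert on the base table keeps (game_id, user_id) unique while the view maintains the score-ordered copy, but I don't know what the write-amplification cost of that is in practice.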
Nuno
(829 rep)
Jul 8, 2023, 08:05 PM
• Last activity: Jul 18, 2023, 08:08 AM
1
votes
1
answers
474
views
How do I manage Cassandra/Scylla snapshots?
I am new to Scylla, and I'm looking to set up a proper backup & restore solution.
I've just tested running nodetool snapshot -t my_backup, and see that what it does is create a snapshot folder called my_backup inside each keyspace & table folder.
This causes a few limitations in my view:
1) I can't easily save the backups on another server, in case this specific server dies
2) I can't easily restore a backup on another server (e.g. as a daily snapshot for production support purposes/tests)
**How do DBAs normally store backups on another server and restore a whole database onto another server?**
--
Another issue I noticed is that nodetool listsnapshots doesn't seem to mention the snapshot creation date in its output.
So I don't seem to be able to find a way to purge old snapshots.
**How do I "delete snapshots older than 10 days" or "keep last 3 backups", for example?**
Nuno
(829 rep)
Jun 25, 2023, 12:10 AM
• Last activity: Jun 26, 2023, 03:33 PM
2
votes
2
answers
579
views
Can you restore a ScyllaDB from only a backup of /var/lib/scylla/data folder?
I had a single-node ScyllaDB which was lost, but only a backup of the /var/lib/scylla/data folder was made.
My question is: can a single-node Scylla be restored from only this backup? I already checked, and the files don't contain any snapshots.
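In case it helps frame an answer, the restore path I have in mind is roughly the following; the keyspace/table names are placeholders, and it assumes the schema is recreated first on the replacement node:
```
# Recreate the keyspace and table schema on the new node first, then copy the
# backed-up SSTable files into the table's upload directory and load them.
# ('my_keyspace' / 'my_table' are placeholders; the UUID suffixes are whatever
#  each node generated for the table.)
cp /path/to/backup/my_keyspace/my_table-<old-uuid>/* \
   /var/lib/scylla/data/my_keyspace/my_table-<new-uuid>/upload/
nodetool refresh my_keyspace my_table
```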
perrohunter
(121 rep)
Feb 8, 2021, 03:52 AM
• Last activity: Feb 9, 2021, 06:49 AM
3
votes
2
answers
367
views
Implementating user notifications list using Aerospike
I need to choose the right DB for a notifications system that needs to handle billions of notifications. The record structure is as follows:
[user_id, item_type, item_id, created_at, other_data]
The inserts are going to come in bulk, up to hundreds of thousands during spikes.
And it needs to support thousands of selects per minute of this kind:
```
select * from user_notifications where user_id=12345 order by created_at limit 10
select * from user_notifications where user_id=12345 and item_type='comment' order by created_at limit 10

-- and for pagination, the next page:
select * from user_notifications where user_id=12345 and item_type='comment' and created_at>'2020-11-01 10:50' order by created_at limit 10
```
It should also allow quick updates and deletes and ideally have TTL on each record.
Right now it's implemented using MySQL, we only have 400M rows and it's already slow as hell. And bulk cleanup is just impossible.
Initially, I thought ScyllaDB/Cassandra would be ideal for that, if I set the primary key to [user_id, item_type, item_id] (user_id being the partition key) for inserts/updates/deletes, with [user_id, item_type, created_at] as a secondary index. CQL seems straightforward in this case and it should work fast (correct me if I'm wrong). The problem is that we are Ruby-on-Rails based and there's no good Ruby client library for that. The one listed in the ScyllaDB clients list (https://github.com/datastax/ruby-driver) is in maintenance mode and I'm not sure it will be updated for new Ruby versions, etc.
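For reference, one possible CQL shape for this (the column types and the 30-day TTL are guesses on my part, and it folds created_at into the clustering key instead of using a separate index):
```
-- One partition per user; within a partition rows are clustered by item_type,
-- then newest-first by created_at, and old notifications expire via TTL.
-- (Column types and the 30-day TTL are illustrative guesses.)
CREATE TABLE user_notifications (
    user_id    bigint,
    item_type  text,
    created_at timestamp,
    item_id    bigint,
    other_data text,
    PRIMARY KEY ((user_id), item_type, created_at, item_id)
) WITH CLUSTERING ORDER BY (item_type ASC, created_at DESC, item_id ASC)
  AND default_time_to_live = 2592000;
```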
Recently, I heard about Aerospike and their benchmarks look really cool, but I couldn't figure out how to implement the above requirement using Aerospike's architecture. Especially as their secondary index seems to be always in memory which makes it impossible to index billions of rows.
This notifications schema seems to me like something very common, but still, I couldn't find a good article describing all the ideal ways to implement it.
Any suggestions are welcome.
Thanks
Kaplan Ilya
(173 rep)
Dec 2, 2020, 10:15 PM
• Last activity: Dec 3, 2020, 06:44 PM
2
votes
2
answers
1669
views
Correct way to perform backup of Cassandra/Scylladb
What's the correct (advisable) method to back up a Cassandra or ScyllaDB database so that we can restore it in a development environment with ease?
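To make the question concrete, the kind of flow I'm hoping for is something like this, run per node (the keyspace name, table name and paths are placeholders):
```
# On each production node: snapshot the keyspace and archive the snapshot directories.
# ('my_keyspace', 'my_table' and the backup path are placeholders.)
nodetool snapshot -t dev_copy my_keyspace
tar czf /backup/dev_copy-$(hostname).tar.gz \
  $(find /var/lib/scylla/data/my_keyspace -type d -path '*/snapshots/dev_copy')

# On the development node: recreate the schema, unpack the SSTables into each
# table's upload directory, then load them with:
nodetool refresh my_keyspace my_table
```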
Kokizzu
(1403 rep)
Mar 2, 2017, 06:39 AM
• Last activity: Feb 21, 2019, 05:28 PM
3
votes
2
answers
322
views
Allocate space in Cassandra
We run a Cassandra cluster with 2 DCs, 3 nodes each (RF 2). The state of the cluster is quite bad (never repaired), and the disks were getting full, so we added additional nodes, which joined the cluster successfully after the bootstrap procedure.
According to the Cassandra documentation, you are supposed to run cleanup after adding a new node, but in our case:
since we never ran repair (and it is now too late to run one, as disk usage on the old nodes is > 90%), is it safe for the data to run:
nodetool cleanup
on each node in order to free space?
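For reference, the way we were thinking of staging it is roughly this, one keyspace at a time and with compaction throughput throttled (the keyspace name, the throughput value and the data path are placeholders):
```
# Throttle compaction so cleanup does not starve normal traffic, then run
# cleanup keyspace by keyspace and watch disk usage as it progresses.
# ('my_keyspace' is a placeholder; 16 MB/s is an arbitrary example value;
#  /var/lib/cassandra is the default package data path.)
nodetool setcompactionthroughput 16
nodetool cleanup my_keyspace
df -h /var/lib/cassandra
```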
Cheers,
Jbrl
Jibrilat
(131 rep)
Dec 12, 2018, 02:29 PM
• Last activity: Feb 18, 2019, 04:39 AM