Optimizing Cassandra Table Design for Queries Involving Two Partition Keys

0 votes
1 answer
268 views
                          I’m designing a Cassandra table and need advice on the best schema design to handle my query patterns efficiently. The primary queries are as follows:

1.	90% of the queries involve both account_id and actor_id.
2.	10% of the queries involve only account_id.

Given these requirements, I initially thought of using both account_id and actor_id as partition keys. Here are my specific questions and concerns:

Current Considerations and Schema

**Option 1**: Composite Partition Key

    CREATE TABLE search_table (
        account_id UUID,
        actor_id UUID,
        other_columns ...,
        PRIMARY KEY ((account_id, actor_id))
    );

•	Pros: Efficient for queries with both account_id and actor_id.

•	Cons: Inefficient for queries with only account_id since it requires scanning multiple partitions.

**Option 2**: Single Partition Key with Clustering Column

    CREATE TABLE search_table (
        account_id UUID,
        actor_id UUID,
        other_columns ...,
        PRIMARY KEY (account_id, actor_id)
    );



 - Pros: Efficient for both types of queries without requiring ALLOW
   FILTERING
 - Cons: Risk of very large partitions due to billions of records.

**Considering Bucketing**

To manage partition sizes effectively, I considered using bucketing:j

    CREATE TABLE search_table (
        bucket_id INT,
        account_id UUID,
        actor_id UUID,
        other_columns ...,
        PRIMARY KEY ((bucket_id, account_id), actor_id)
    );

•	Pros: Balanced partition sizes, efficient for both query types.

•	Cons: Requires calculating bucket_id in application logic and might add complexity.

**Using IN Clause for Efficient Queries**

If both account_id and actor_id are partition keys:

    SELECT * FROM search_table 
    WHERE account_id = ? AND actor_id IN (?, ?, ...);

•	Pros: Simplifies querying multiple actor_id values within the same account_id.

•	Cons: Performance overhead if the list of actor_id values is large.

Questions

1.	What is the best table design to handle my query patterns efficiently?
2.	Is using both account_id and actor_id as partition keys recommended, or should I rely on a different approach like bucketing?
3.	How do IN queries perform in Cassandra when used with partition keys, and are there practical limits on their usage?

Any insights, recommendations, or alternative strategies to manage partition sizes while maintaining efficient queries would be greatly appreciated!
                        
Asked by Chillax (131 rep)
May 22, 2024, 06:55 PM
Last activity: Jun 6, 2025, 04:04 AM
Optimizing Cassandra Table Design for Queries Involving Two Partition Keys

Related Questions