More efficient accumulator in SQL?

3 votes

2 answers

334 views

I'm writing a ledger system where every transaction can have multiple classifications. For example, if someone purchases a widget for $50, I can categorize that transaction as having an account of "Revenue" and an SKU as "SKU1". Users can then select the dimensions they wish to report on, and I can generate aggregates. When my database has 10M+ transactions, the following query is prohibitively slow. After about 10s I receive a Memory limit exceeded error on my 8GB laptop. Thus the question: I don't actually care about the individual rows, I only care about the accumulation of these values. In my test, I only expect about 10 rows returned after aggregation. Here is a fiddle: http://sqlfiddle.com/#!17/4a7d8/10/0

select
   year,
   sum(amount),
   t1.value as account,
   t2.value as sku
from 
    transactions 
left join
    tags t1 on transactions.id = t1.transaction_id and t1.name ='account'
left join
    tags t2 on transactions.id = t2.transaction_id and t2.name = 'sku'
group by
    year,
	t1.value,
    t2.value;

Here is the query plan:

Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Join (JOIN)
        Expression ((Before JOIN + (Projection + Before ORDER BY)))
          Join (JOIN)
            Expression (Before JOIN)
              ReadFromMergeTree (default.transactions)
            Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY))))
              ReadFromMergeTree (default.tags)
        Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY))))
          ReadFromMergeTree (default.tags)

And, finally, here is the schema:

CREATE TABLE default.transactions
(
    id Int32,
    date Date,
    amount Float32
)
ENGINE = MergeTree
PRIMARY KEY id
ORDER BY id
SETTINGS index_granularity = 8192

CREATE TABLE default.tags
(
    transaction_id Int32,
    name String,
    value String,
    INDEX idx_tag_value value TYPE set(0) GRANULARITY 4,
    INDEX idx_tag_name name TYPE set(0) GRANULARITY 4
)
ENGINE = MergeTree
PRIMARY KEY (transaction_id, name)
ORDER BY (transaction_id, name)
SETTINGS index_granularity = 8192

My questions are: - Is there a different schema, or different set of Clickhouse features I might use? - Should I instead pre-compute aggregates? - Is there a different DB which can perform this kind of calculation more efficiently?

Asked by poundifdef (141 rep)

Jun 10, 2022, 01:15 PM
Last activity: Mar 11, 2025, 05:02 AM

More efficient accumulator in SQL?

Related Questions