Sample Header Ad - 728x90

Replicating PostgreSQL data into Citus/Greenplum?

1 vote
2 answers
1047 views
I need to integrate data from 3 different PostgreSQL databases (OLTP application backends) in a data warehouse. For the data warehouse itself I consider using Citus or Greenplum. There is a requirement that the data from applications has to be synced with the data warehouse as close to real time as possible (everything above 3-5 minutes delay is unacceptable, real time replication would be the best). In this regard I have the following questions: 1. Will Postgres logical replication work with Citus? Citus is a Postgres extension, can you treat a Citus cluster as an ordinary Postgres database? If yes, then logical replication should theoretically work, but how does it deal with distributed tables? 2. Greenplum is a Postgres fork, so will Postgres logical replication work with it at all? I have also read that Greenplum is not optimized for OLTP workloads, does that mean it will break when I try to ingest OLTP data into it? 3. If logical replication does not work with Citus/Greenplum, then how to stream data from Postgres? Do I need to stream logical-level WAL into Kafka and then write custom logic for translating it into SQL statements on the target database? Are there any tools for that? Bonus question: does anyone have experience with both Citus and Greenplum, especially with their SQL limitations? I know that Citus does not fully support correlated subqueries and recursive CTEs, does Greenplum have any similar limitations? I would appreciate any help with these questions, I tried googling but there is little or no info on the subject, could you please give at least some direction?
Asked by Denis Arharov (101 rep)
Feb 4, 2021, 12:44 PM
Last activity: Jan 13, 2022, 10:15 AM