Replicating PostgreSQL data into Citus/Greenplum?
1 vote · 2 answers · 1047 views
I need to integrate data from 3 different PostgreSQL databases (OLTP application backends) into a data warehouse. For the data warehouse itself I am considering Citus or Greenplum. The application data has to be synced with the data warehouse as close to real time as possible: any delay above 3-5 minutes is unacceptable, and real-time replication would be best. In this regard I have the following questions:
1. Will Postgres logical replication work with Citus? Citus is a Postgres extension, so can a Citus cluster be treated as an ordinary Postgres database? If yes, then logical replication should theoretically work, but how does it deal with distributed tables?
2. Greenplum is a Postgres fork, so will Postgres logical replication work with it at all? I have also read that Greenplum is not optimized for OLTP workloads. Does that mean it will break when I try to ingest OLTP data into it?
3. If logical replication does not work with Citus/Greenplum, how can I stream data from Postgres? Do I need to stream the logical WAL changes into Kafka and then write custom logic that translates them into SQL statements for the target database? Are there any tools for that?
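To make question 3 concrete, here is a minimal Python sketch of the kind of "custom logic" I have in mind. It is only an illustration under assumptions: the change-event shape (`kind`, `schema`, `table`, `columns`, `key`) is hypothetical and is not the output format of any particular logical decoding plugin or Kafka connector, and the `%s` placeholders assume a psycopg-style driver on the target side.

```python
from typing import Any, Dict, List, Tuple


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def change_to_sql(change: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Translate one change event into a parameterized SQL statement.

    The event layout is an assumption made for this sketch:
      kind    -- "insert", "update" or "delete"
      schema  -- schema name of the changed table
      table   -- table name
      columns -- new column values (insert/update)
      key     -- primary-key column values identifying the row (update/delete)
    """
    table = f"{quote_ident(change['schema'])}.{quote_ident(change['table'])}"
    kind = change["kind"]
    columns = change.get("columns", {})
    key = change.get("key", {})

    if kind == "insert":
        names = ", ".join(quote_ident(c) for c in columns)
        marks = ", ".join(["%s"] * len(columns))
        return (f"INSERT INTO {table} ({names}) VALUES ({marks})",
                list(columns.values()))

    if kind not in ("update", "delete"):
        raise ValueError(f"unsupported change kind: {kind!r}")
    if not key:
        # Without key columns the target row cannot be identified safely.
        raise ValueError(f"{kind} event has no key columns")

    where = " AND ".join(f"{quote_ident(k)} = %s" for k in key)
    if kind == "update":
        assignments = ", ".join(f"{quote_ident(c)} = %s" for c in columns)
        return (f"UPDATE {table} SET {assignments} WHERE {where}",
                list(columns.values()) + list(key.values()))
    return f"DELETE FROM {table} WHERE {where}", list(key.values())
```

Each returned pair could then be passed to the target driver's `execute(sql, params)`. A real pipeline would also have to handle ordering, transaction boundaries, retries and schema changes, which this sketch ignores.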
Bonus question: does anyone have experience with both Citus and Greenplum, especially with their SQL limitations? I know that Citus does not fully support correlated subqueries and recursive CTEs. Does Greenplum have any similar limitations?
I would appreciate any help with these questions. I tried googling, but there is little or no information on the subject. Could you please give at least some direction?
Asked by Denis Arharov
(101 rep)
Feb 4, 2021, 12:44 PM
Last activity: Jan 13, 2022, 10:15 AM