
Postgres JSON transformation

0 votes
1 answer
111 views
I would very much appreciate your help with an important question about an architecture change. We use an ETL process to fetch data from external services (for example, GitHub):

- Extract the data (e.g. issues and PRs) and create raw_data objects.
- Transform the data into our own domain objects; let's call them assets.
- Load the assets as JSON (a jsonb column) into a Postgres DB (one general assets table).

We have a few problems with this approach and are now considering changing the pipeline.

**Bidirectional asset connections**: our assets have direct relationships with each other; for example, every user has groups and every group has users. Currently we manually fill in the data on both sides again and again before loading it into Postgres, and we manage this in memory, which does not scale.

**Very slow analysis**: we need the ability to present the whole JSON, but we also need to run analysis across all of the assets. With many assets (many JSONs inside the assets table) the analysis is very slow. For example, for "find all issues that are related to this pull request" we have to iterate over all of the stored JSONs and search a specific field with a regex.

What would you recommend in this case? The options we are considering (sketches below):

- **Manage it in Postgres**: use functions to convert the JSONs into tables, and create triggers to fill in the bidirectional relation.
- **Data warehouse**: I lack knowledge on the subject, but in general I am not sure it would be ideal, first of all because of OLAP vs. OLTP: we need to run analysis but also be able to present whole rows. And how would we fill in the bidirectional connection?
- **GraphDB**: sounds ideal for the bidirectional assets, but I am not sure about scaling problems.

What do you think? What would you do in this case?
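For context, here is a minimal sketch of the kind of table we have, plus the indexed-containment approach we are evaluating instead of the regex scan. Table, column, and key names are illustrative, not our real schema:

```sql
-- Illustrative version of our current assets table
CREATE TABLE assets (
    id      bigserial PRIMARY KEY,
    kind    text  NOT NULL,   -- e.g. 'issue', 'pull_request', 'user', 'group'
    payload jsonb NOT NULL    -- the transformed asset as JSON
);

-- A GIN index lets Postgres answer containment queries on the jsonb
-- column from the index instead of regex-scanning every row
CREATE INDEX assets_payload_gin ON assets USING gin (payload);

-- "find all issues related to pull request 42", assuming each issue
-- payload carries a related_prs array of PR numbers
SELECT id, payload
FROM   assets
WHERE  kind = 'issue'
  AND  payload @> '{"related_prs": [42]}';
```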
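For the "convert JSONs into tables" option, one thing we have in mind is a view that exposes the jsonb fields we analyze most often as ordinary columns, so analysis reads like plain SQL. The extracted key names here are assumptions:

```sql
-- Flatten the issue payloads into relational columns
CREATE VIEW issue_view AS
SELECT id,
       payload ->> 'title'         AS title,
       (payload ->> 'number')::int AS issue_number,
       payload ->  'related_prs'   AS related_prs
FROM   assets
WHERE  kind = 'issue';
```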
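And for the bidirectional relation, a sketch of a plain join table kept in sync by a trigger, so neither side has to be filled in memory. It assumes group payloads carry a member_ids array of user asset ids:

```sql
-- Many-to-many join table for the user/group relation.
-- No FK on user_id, since user assets may be loaded after their groups.
CREATE TABLE user_groups (
    user_id  bigint NOT NULL,
    group_id bigint NOT NULL REFERENCES assets(id),
    PRIMARY KEY (user_id, group_id)
);

CREATE OR REPLACE FUNCTION sync_user_groups() RETURNS trigger AS $$
BEGIN
    -- Re-derive the group's membership rows from its latest payload
    DELETE FROM user_groups WHERE group_id = NEW.id;
    INSERT INTO user_groups (user_id, group_id)
    SELECT member_id::bigint, NEW.id
    FROM   jsonb_array_elements_text(NEW.payload -> 'member_ids')
           AS m(member_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER group_membership_sync
AFTER INSERT OR UPDATE ON assets
FOR EACH ROW
WHEN (NEW.kind = 'group')
EXECUTE FUNCTION sync_user_groups();
```

With this in place, "users of a group" and "groups of a user" are both single joins against user_groups, and the relation is written once instead of being duplicated inside both payloads.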
Asked by Ewen Field (1 rep)
Feb 28, 2023, 11:15 AM
Last activity: Mar 1, 2023, 10:17 AM