Are big data solutions a good option for 500 million temporal records?
1 vote · 0 answers · 107 views
So I have about 250k items × 2000 days (2010 to 2019) ≈ 500 million records.
For each item I have a variable number of fields. To begin with, we have 50 fields defined for each item, but in the future we want the ability to add more fields.
With that said, we thought we could pack all of the fields of each item into a JSON blob, since Postgres allows us to query JSON blobs. We did some prototyping, and the QC queries took around 3 seconds on about 1 year of data with a smaller subset of items.
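For reference, a minimal sketch of the kind of layout we prototyped (table and column names here are illustrative, not our actual schema):

```sql
-- Hypothetical layout: one row per item per day, all fields packed into JSONB.
CREATE TABLE item_daily (
    item_id  integer NOT NULL,
    obs_date date    NOT NULL,
    fields   jsonb   NOT NULL,  -- ~50 fields today, more added later
    PRIMARY KEY (item_id, obs_date)
);

-- GIN index so lookups on keys inside the blob avoid full scans.
CREATE INDEX item_daily_fields_gin ON item_daily USING gin (fields);
```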
Since we are in the prototyping stage, I was thinking of building a prototype using HBase. Since writing a Spark job to bulk-load data into it would take me at least a couple of days, I wanted to check online whether HBase is a good option for this problem.
Also, are there any other RDBMS solutions we should consider for this problem? MongoDB is a no-go for bureaucratic reasons.
PS: Our QC queries are mostly (see the SQL sketch after this list):
- given a date, get me all the items;
- given an item, get me all the data available for that item across all dates;
- occasionally, get me all the data for a particular field of a particular item.
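Against the hypothetical `item_daily` table sketched above, those three access patterns might look like this (again illustrative; `'field_17'` is a made-up field name, not one of our real fields):

```sql
-- 1. Given a date, get all items observed that day.
SELECT item_id, fields
FROM item_daily
WHERE obs_date = DATE '2018-06-01';

-- 2. Given an item, get all of its data across all dates.
SELECT obs_date, fields
FROM item_daily
WHERE item_id = 42
ORDER BY obs_date;

-- 3. Get one field's values for a particular item.
SELECT obs_date, fields ->> 'field_17' AS field_17
FROM item_daily
WHERE item_id = 42
ORDER BY obs_date;
```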
Asked by Aditya
(123 rep)
Feb 13, 2019, 09:11 PM
Last activity: Feb 13, 2019, 09:29 PM