Fast DB to replace flat files that are hard to work with
0 votes · 1 answer · 42 views
Right now I have a Python script that caches data to the file system for use in reinforcement learning. This works fine at small scale, but as everything gears up it is becoming a hassle (a rough sketch of the current setup is below the list).
A bit about the data structure:
- There are currently about 6M records, expected to grow to about 20M.
- The files are standalone: they don't reference anything and are unrelated to each other.
- Each file is roughly 6 KB.
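For context, the current cache is essentially one JSON file per key, something like this minimal sketch (`CACHE_DIR` and the key-to-filename mapping are simplified placeholders, not my real layout):

```python
import json
import os

CACHE_DIR = "cache"  # placeholder; the real directory layout is more involved
os.makedirs(CACHE_DIR, exist_ok=True)

def write_record(key: str, value) -> None:
    """Serialize one standalone record to its own JSON file."""
    path = os.path.join(CACHE_DIR, f"{key}.json")
    with open(path, "w") as f:
        json.dump(value, f)

def read_record(key: str):
    """Load one record by key; this is the hot path during training."""
    path = os.path.join(CACHE_DIR, f"{key}.json")
    with open(path) as f:
        return json.load(f)
```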
My needs:
- A key/value store. Each data piece can stand on its own.
- Read speed is critical: the faster these records can be accessed, the faster my ML code can train.
- Portability. As I move from prototype to more decentralized compute, moving, syncing and accessing these files will become a giant pain.
What I've tried:
- PostgreSQL with a single JSONB column: reads were pretty slow.
- Cassandra: faster on reads than PostgreSQL.
- Redis: the fastest of what I have tried, but still pretty slow relative to the flat files (the reads were plain single-key GETs, roughly as sketched below).
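For reference, the Redis read path I benchmarked was essentially just GET plus deserialization, something like the following (redis-py, a local instance, and one JSON blob per key are assumptions about my setup):

```python
import json
import redis

# Assumed setup: redis-py client against a local instance, one JSON blob per key.
r = redis.Redis(host="localhost", port=6379, db=0)

def read_record(key: str):
    """Fetch one record by key and deserialize it; mirrors the flat-file hot path."""
    raw = r.get(key)
    return json.loads(raw) if raw is not None else None
```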
Any suggestions for a portable key/value store with very fast reads?
There are roughly 10,000,000 files, and they are not of uniform structure: some are arrays, others are hashes. There is no real way to normalize the data because the structure is inconsistent.
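To illustrate the inconsistency (these two payloads are made up, not real records), one key might map to a plain array while another maps to a nested hash, so no single table schema fits:

```python
# Hypothetical examples of the two shapes a record can take:
record_a = [0.12, 0.87, 0.33, 0.91]                               # plain array
record_b = {"state": {"pos": [1, 2], "vel": 0.4}, "reward": 1.0}  # nested hash
```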
Asked by Romuloux
(101 rep)
Jan 14, 2025, 03:53 PM
Last activity: Jan 16, 2025, 11:46 AM