I am not a database administrator, just a scientist who would appreciate your help with storing histogram data in an SQLite database.
I have several histograms to store and later analyse with pandas (Python).
Each histogram is made of two arrays:
1. one for the bins or buckets that are regularly spaced, let's say from min to max with a given step.
2. one for the values.
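With NumPy, the regular binning described above is fully determined by three numbers. A minimal sketch, assuming hypothetical names `hist_min`, `hist_max` and `step` and a range that is an exact multiple of the step; note that `np.histogram` returns one more edge than it returns counts:

```python
import numpy as np

# Hypothetical binning parameters; (hist_max - hist_min) / step should be an integer.
hist_min, hist_max, step = 0.0, 10.0, 0.5
n_bins = int(round((hist_max - hist_min) / step))

# Explicit edges: n_bins + 1 values from hist_min to hist_max inclusive.
edges = np.linspace(hist_min, hist_max, n_bins + 1)

rng = np.random.default_rng(0)
data = rng.uniform(hist_min, hist_max, size=1_000)  # placeholder data
counts, returned_edges = np.histogram(data, bins=edges)

# The edges never need to be stored: they can be regenerated from three numbers.
regenerated = hist_min + step * np.arange(n_bins + 1)
print(len(counts), len(returned_edges))  # 20 21
print(np.allclose(regenerated, returned_edges))  # True
```

Building the edges with `np.linspace` rather than `np.arange(hist_min, hist_max + step, step)` avoids the off-by-one edge count that a floating-point step can cause with `arange`.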
First question: how would you store the two arrays? They are rather long, up to 65k elements. I don't need to store the bin values, since I can in principle recalculate them from the min, max and step. The value array may contain many zeros, so it may be convenient to store it sparsely.
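One possible layout for the sparse idea, sketched under assumptions: the binning is kept as three numbers in a parent table, and only non-zero counts are stored as `(BinIndex, Value)` rows. All table and column names (`Histogram`, `HistogramValue`, `BinMin`, `BinStep`, `NBins`, `BinIndex`, `Value`) and the helper `store_histogram` are hypothetical, and `Value` is declared `INTEGER` on the assumption that the counts are plain integers (a weighted or density histogram would need `REAL`).

```python
import sqlite3

import numpy as np

con = sqlite3.connect(":memory:")  # use a file path such as "mydb.db" in practice
con.executescript("""
CREATE TABLE Histogram (
    HistogramID INTEGER PRIMARY KEY,
    BinMin      REAL    NOT NULL,
    BinStep     REAL    NOT NULL,
    NBins       INTEGER NOT NULL
);
CREATE TABLE HistogramValue (
    HistogramID INTEGER NOT NULL REFERENCES Histogram(HistogramID),
    BinIndex    INTEGER NOT NULL,
    Value       INTEGER NOT NULL,
    PRIMARY KEY (HistogramID, BinIndex)
) WITHOUT ROWID;
""")


def store_histogram(con, counts, bin_min, bin_step):
    """Hypothetical helper: store one histogram, writing only its non-zero bins."""
    cur = con.execute(
        "INSERT INTO Histogram (BinMin, BinStep, NBins) VALUES (?, ?, ?)",
        (float(bin_min), float(bin_step), len(counts)),
    )
    hist_id = cur.lastrowid
    nonzero = np.flatnonzero(counts)  # indices of bins with a non-zero count
    con.executemany(
        "INSERT INTO HistogramValue (HistogramID, BinIndex, Value) VALUES (?, ?, ?)",
        ((hist_id, int(i), int(counts[i])) for i in nonzero),
    )
    con.commit()
    return hist_id


counts = np.zeros(65_536, dtype=np.int64)
counts[[10, 11, 500]] = [3, 7, 1]  # a mostly empty example histogram
hid = store_histogram(con, counts, bin_min=0.0, bin_step=1.0)
print(con.execute(
    "SELECT COUNT(*) FROM HistogramValue WHERE HistogramID = ?", (hid,)
).fetchone())  # (3,)
```

Each histogram then costs one row per non-zero bin, so this pays off only when most bins are empty; for dense histograms a single BLOB per histogram is likely smaller. The `int(...)` conversions are there because the `sqlite3` module does not bind NumPy integer scalars out of the box.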
Second question: I would like to retrieve them with a `SELECT` returning something like:

```
bin1, value1
bin2, value2
...
binN, valueN
```
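Assuming a sparse layout with hypothetical tables `Histogram(BinMin, BinStep, NBins)` and `HistogramValue(BinIndex, Value)`, a recursive common table expression can generate every bin index, compute each bin's lower edge, and fill the missing bins with zeros, so the query returns the `bin, value` rows directly; `pandas.read_sql_query` then loads them into a DataFrame. A minimal, self-contained sketch with toy data:

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Histogram (HistogramID INTEGER PRIMARY KEY, BinMin REAL, BinStep REAL, NBins INTEGER);
CREATE TABLE HistogramValue (HistogramID INTEGER, BinIndex INTEGER, Value INTEGER,
                             PRIMARY KEY (HistogramID, BinIndex));
INSERT INTO Histogram VALUES (1, 0.0, 0.5, 6);
INSERT INTO HistogramValue VALUES (1, 1, 4), (1, 4, 2);  -- only non-zero bins are stored
""")

# Generate every bin index 0..NBins-1, compute the bin's lower edge,
# and fill missing (zero) bins with 0 via LEFT JOIN + COALESCE.
query = """
WITH RECURSIVE idx(i) AS (
    SELECT 0
    UNION ALL
    SELECT i + 1 FROM idx
    WHERE i + 1 < (SELECT NBins FROM Histogram WHERE HistogramID = :hid)
)
SELECT h.BinMin + idx.i * h.BinStep AS bin,
       COALESCE(v.Value, 0)         AS value
FROM idx
JOIN Histogram h ON h.HistogramID = :hid
LEFT JOIN HistogramValue v
       ON v.HistogramID = h.HistogramID AND v.BinIndex = idx.i
ORDER BY idx.i;
"""
df = pd.read_sql_query(query, con, params={"hid": 1})
print(df)
```

The `bin` column here is the lower edge of each bin; `h.BinMin + (idx.i + 0.5) * h.BinStep` would give the bin centre instead. If only the non-zero rows are needed, the CTE can be dropped and `HistogramValue` queried directly.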
Sorry if my question looks stupid to you, but I have been scratching my head over this problem for a long time without finding a way out.
Thanks in advance for your help!
## Update
As a preliminary, not very disk-space-efficient solution, I have implemented something like @Whitel Owl's suggestion: instead of storing the two arrays as text, I'm storing them as binary BLOBs.
Here is my code:
```sql
CREATE TABLE HistogramTable (
    HistogramID INTEGER PRIMARY KEY,
    ImageID     INTEGER,
    Bins        BLOB,
    Histo       BLOB,
    FOREIGN KEY (ImageID) REFERENCES ImageTable(ImageID)
);
```
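A quick way to check the corrected schema, sketched with a hypothetical minimal `ImageTable` that has only an `ImageID` column. One caveat worth knowing: SQLite accepts `FOREIGN KEY` clauses but enforces them only after `PRAGMA foreign_keys = ON` has been issued on the connection, which the sketch does before inserting:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is enabled on each connection.
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE ImageTable (
    ImageID INTEGER PRIMARY KEY  -- hypothetical minimal parent table
);
CREATE TABLE HistogramTable (
    HistogramID INTEGER PRIMARY KEY,
    ImageID     INTEGER,
    Bins        BLOB,
    Histo       BLOB,
    FOREIGN KEY (ImageID) REFERENCES ImageTable(ImageID)
);
""")
con.execute("INSERT INTO ImageTable (ImageID) VALUES (1)")
# Accepted: ImageID 1 exists in the parent table.
con.execute("INSERT INTO HistogramTable (ImageID, Bins, Histo) VALUES (1, x'00', x'00')")

rejected = False
try:
    # Rejected: there is no image with ImageID 99.
    con.execute("INSERT INTO HistogramTable (ImageID, Bins, Histo) VALUES (99, x'00', x'00')")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)  # rejected: FOREIGN KEY constraint failed
```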
To produce the two BLOBs, I'm using pickle:
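```python
import pickle
import sqlite3

import numpy as np

db = sqlite3.connect('mydb.db')  # connect() is a module function that returns the connection

data = np.random.default_rng().normal(size=10_000)  # placeholder for the real image data
histo, bins = np.histogram(data)
histo_blob = sqlite3.Binary(pickle.dumps(histo))
bins_blob = sqlite3.Binary(pickle.dumps(bins))
```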
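Regarding disk space, one alternative (a swapped-in technique, not what the code above does) is to drop pickle, store the binning as two numbers, and save the counts as their raw NumPy bytes compressed with `zlib`; long runs of zeros compress well. Reading back uses `np.frombuffer` with the stored dtype string. The table `HistogramBlob` and its columns are hypothetical, and the random data is a placeholder:

```python
import sqlite3
import zlib

import numpy as np
import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE HistogramBlob (
    HistogramID INTEGER PRIMARY KEY,
    BinMin      REAL NOT NULL,
    BinStep     REAL NOT NULL,
    DType       TEXT NOT NULL,  -- NumPy dtype string of the counts, e.g. '<i8'
    Counts      BLOB NOT NULL   -- zlib-compressed raw bytes of the counts array
)
""")

rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=5.0, size=10_000)  # placeholder data
bin_min, bin_step, n_bins = 0.0, 1.0, 65_536
counts, _ = np.histogram(data, bins=n_bins, range=(bin_min, bin_min + n_bins * bin_step))

con.execute(
    "INSERT INTO HistogramBlob (BinMin, BinStep, DType, Counts) VALUES (?, ?, ?, ?)",
    (bin_min, bin_step, counts.dtype.str, zlib.compress(counts.tobytes())),
)
con.commit()

# Read back: decompress, reinterpret the bytes with the stored dtype, rebuild the bin axis.
bmin, bstep, dtype, blob = con.execute(
    "SELECT BinMin, BinStep, DType, Counts FROM HistogramBlob WHERE HistogramID = 1"
).fetchone()
values = np.frombuffer(zlib.decompress(blob), dtype=dtype)
df = pd.DataFrame({"bin": bmin + bstep * np.arange(values.size), "value": values})
print(len(blob), "compressed bytes vs", counts.nbytes, "raw bytes")
```

Besides being more compact, this avoids `pickle.loads`, which can execute arbitrary code when given untrusted data. Note also that `sqlite3.Binary` is an alias for `memoryview`, so passing `bytes` directly, as here, works the same way.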
Asked by user41796 (111 rep), Jun 28, 2023, 08:35 PM. Last activity: Jul 5, 2025, 07:06 AM.