Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
72
views
Database Design for a Furniture Store E-commerce Site
I'm building my first web project: a modular furniture e-commerce platform using Next.js and PostgreSQL.
I’ve designed a normalized relational schema to handle:
- Categories and subcategories
- Furniture sets and individual furniture items
- Color variants for both items and sets
- Dynamic properties (e.g., brand, dimensions, material)
- Multiple images per furniture and set
- Set-item relationships with quantity and order
Here’s a summary of the main tables:
- furnitures, furniture_sets, furniture_sets__and_furniture
- furniture_colors, furniture_set_colors, colors
- furniture_images, furniture_set_images, images
- furniture_properties, furniture_set_properties, properties
- categories (with hierarchy support via parent_id and category_path)
My concern is that this structure may be **too complex or over-normalized** for an e-commerce project. Since this is my first full-stack application, I want to ensure the schema won’t cause unnecessary complexity or performance issues down the line.
### My questions:
1. Does this level of normalization make sense for an e-commerce project, or should I simplify it?
2. Would storing some dynamic fields (like furniture_properties) as JSONB be better? (See the sketch below.)
3. Are there common best practices or anti-patterns I should be aware of for this type of use case?
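As a sketch of the JSONB option in question 2, assuming PostgreSQL; the table and column names here are made up for the example rather than taken from the schema above:

    CREATE TABLE furniture (
        id         bigserial PRIMARY KEY,
        name       text  NOT NULL,
        properties jsonb NOT NULL DEFAULT '{}'   -- e.g. {"brand": "X", "material": "oak"}
    );
    CREATE INDEX ON furniture USING gin (properties);

    -- Filter on a dynamic property without joining a separate properties table:
    SELECT id, name FROM furniture WHERE properties @> '{"material": "oak"}';

The usual trade-off is that JSONB keys are not constrained by the schema, so anything that must be validated or joined on (price, category, set membership) tends to stay in regular columns.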
ASINCAP
(9 rep)
Jul 5, 2025, 01:46 PM
• Last activity: Jul 7, 2025, 07:22 AM
2
votes
1
answers
43
views
Choosing the right database for filtering purposes
Assuming we want to build a shop like Amazon with millions of products in dozens of different categories, where each category has its own specific properties but also shares some common properties, I am wondering: is a SQL-based database like MySQL or PostgreSQL a better choice, or should I go with a NoSQL solution like Cassandra combined with Elasticsearch?
I’m not a professional web developer but I’m working on an MVP app for a startup and would like to understand which option would be better.
I have more experience with Cassandra and its denormalization concept, and I also know a bit about Elasticsearch, but I’m wondering if I might be going in the wrong direction if using PostgreSQL or MySQL would be a better option.
**My main concern is to have a home screen like Amazon’s, where users can filter products by one, two, or even ten different properties.**
For example, they should be able to choose to see only products from categories A and C, or only products from category B that are within a specific price range, have a red color, and will be shipped tomorrow, and so on.
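Purely to illustrate the relational route, a minimal PostgreSQL sketch of the "common columns plus a jsonb bag of category-specific properties" pattern; every name below is hypothetical:

    CREATE TABLE product (
        id          bigserial PRIMARY KEY,
        category_id int    NOT NULL,
        price_cents bigint NOT NULL,
        attrs       jsonb  NOT NULL        -- category-specific properties
    );
    CREATE INDEX ON product USING gin (attrs jsonb_path_ops);
    CREATE INDEX ON product (category_id, price_cents);

    -- "Category B, within a price range, red, shipping tomorrow":
    SELECT id FROM product
    WHERE  category_id = 2
    AND    price_cents BETWEEN 1000 AND 5000
    AND    attrs @> '{"color": "red", "ships_tomorrow": true}';

Whether this holds up at Amazon-like scale is exactly the open question; Elasticsearch is frequently layered on top for the faceted search itself, with the relational or NoSQL store kept as the system of record.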
user3486308
(151 rep)
Jun 12, 2025, 01:12 AM
• Last activity: Jun 16, 2025, 12:45 PM
1
votes
2
answers
333
views
How do I resolve having a M:N relationship in my Crow's Foot Model? Am I on the right track?
I am very much a beginner and taking a class on databases. My assignment involves using Crow's Foot notation to represent the following tables:
This is my design for it so far. I am nowhere near done but am I at least on the right track? I would have to use a composite entity to resolve the M:N relationships shown with these tables, right?
(Those M:N relationships being Employee to Plan and Benefit to Plan)
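For what it's worth, a composite (bridge) entity for each M:N pair would look roughly like this in SQL; the key names are hypothetical, based only on the entities named above:

    CREATE TABLE employee (emp_num      INT PRIMARY KEY /* , name, ... */);
    CREATE TABLE plan     (plan_code    INT PRIMARY KEY /* , description, ... */);
    CREATE TABLE benefit  (benefit_code INT PRIMARY KEY /* , description, ... */);

    CREATE TABLE employee_plan (
        emp_num   INT NOT NULL REFERENCES employee (emp_num),
        plan_code INT NOT NULL REFERENCES plan (plan_code),
        PRIMARY KEY (emp_num, plan_code)          -- resolves EMPLOYEE M:N PLAN
    );

    CREATE TABLE plan_benefit (
        plan_code    INT NOT NULL REFERENCES plan (plan_code),
        benefit_code INT NOT NULL REFERENCES benefit (benefit_code),
        PRIMARY KEY (plan_code, benefit_code)     -- resolves BENEFIT M:N PLAN
    );

Each bridge table turns one M:N relationship into two 1:M relationships, which is exactly what the composite entity does in Crow's Foot notation.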
Please be forgiving of my work haha.
I really appreciate all of the help and insight!
Hope you all have a good one wherever you are!


RagingCalm
(11 rep)
Jun 19, 2021, 08:48 AM
• Last activity: May 1, 2025, 10:02 PM
0
votes
3
answers
1186
views
Overwriting MySQL database to only store 1 month of data
We are logging data on hardware that is short on disk storage: only 4 GB.
We only require the data to be stored for 1 month and then overwritten, oldest data first; the storage is too small to keep recording indefinitely.
We are using a MySQL database; the hardware it runs on is in a vehicle and is not always powered on. The data will be viewed in a graph to show historical data over time.
A few options I have thought of, but I'm not sure how to execute them:
Let's assume I will record 1 million rows of data in a month.
First option: when the table (table1) reaches 1 million rows, set it aside and start a new table (table2). When table2 reaches 1 million rows, delete table1, set table2 aside, and create table3, and so on.
This way there will always be at least 1 month of entries.
Second option (not sure if possible): when the table reaches 1 million rows, it starts overwriting from row 1 again.
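As a sketch of the first option, MySQL range partitioning gives the same effect without juggling table names, and dropping the oldest partition frees disk space immediately, unlike DELETE. All names and dates below are illustrative:

    CREATE TABLE log_data (
        id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        recorded_at DATETIME NOT NULL,
        payload     VARCHAR(255),
        PRIMARY KEY (id, recorded_at)     -- the partition key must be part of every unique key
    )
    PARTITION BY RANGE (TO_DAYS(recorded_at)) (
        PARTITION p20200101 VALUES LESS THAN (TO_DAYS('2020-01-02')),
        PARTITION p20200102 VALUES LESS THAN (TO_DAYS('2020-01-03')),
        PARTITION pmax      VALUES LESS THAN MAXVALUE
    );

    -- Housekeeping (e.g. run by the application at startup, since the vehicle is not
    -- always powered on): drop partitions older than one month.
    ALTER TABLE log_data DROP PARTITION p20200101;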
Phil
(1 rep)
Jan 29, 2020, 09:09 PM
• Last activity: Apr 14, 2025, 03:00 AM
1
votes
1
answers
3390
views
Sql Server - Best Practices for Growing Large Database Files
I know this kind of question has been asked previously but I don't get the answer that I'm looking for.
I'm reconsidering the Autogrowth settings for the database's primary and secondary data files (which hold a partitioned Log table) and for the log files. Going through some articles, what I conclude is:
Microsoft suggests ***Autogrowth of 1/8th of the file size, i.e. 12.5%***, while Brent Ozar and others suggest ***256 MB for data files and 128 MB for log files***.
Now I'm confused, as my main data file is approximately 545 GB and the log file is around 330 GB, so either choice comes out either too big or too small.
Currently, Autogrowth is set to 500 MB for the main data file, 10% for the log file, and 5 MB or 64 MB for the other NDF files.
What would you recommend and how can I calculate it?
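For reference, the settings are changed per file; a minimal T-SQL sketch with made-up database and file names:

    -- Inspect the current growth settings (size and growth are in 8 KB pages):
    SELECT name,
           size / 128 AS size_mb,
           CASE WHEN is_percent_growth = 1 THEN CONCAT(growth, ' %')
                ELSE CONCAT(growth / 128, ' MB') END AS growth_setting
    FROM sys.database_files;

    -- Switch to fixed-size growth increments:
    ALTER DATABASE [MyDb] MODIFY FILE (NAME = N'MyDb_Data', FILEGROWTH = 256MB);
    ALTER DATABASE [MyDb] MODIFY FILE (NAME = N'MyDb_Log',  FILEGROWTH = 128MB);

With files this size the absolute increment matters more than any percentage rule; if instant file initialization is enabled, larger data-file growths are cheap, while log-file growths always have to be zero-initialized.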
Hakoo Desai
(111 rep)
Jul 10, 2023, 02:33 AM
• Last activity: Apr 13, 2025, 12:02 AM
1
votes
2
answers
551
views
Hierarchical data with versioning
I have a domain where I deal with data which has a parent/child relationship, of arbitrary depth.
Also, full time traversal needs to be enabled, to show the state of the data at a specific point in time.
Currently I am looking at two different types of databases, and was curious as to which would be more appropriate, or how I would overcome some issues in either of the solutions.
Any links to articles regarding this information would also be appreciated.
Solution 1: RDBMS
+ good fit for time versioning with a solution like temporal tables in SQL Server (although the DB needs to be open source and free, so I would have to add this to something like PostgreSQL)
- not that great a fit for arbitrary hierarchical data; need to implement an adjacency list/nested set
Solution 2: graph DB
+ natural fit for hierarchical data
- I don't know of a performant way to model time versioning.
So I am looking on some feedback on the advantages/disadvantages of either database type, and how to overcome some of the shortcomings of either.
I personally was leaning towards a graph DB solution, where I add a start_time and end_time attribute to all nodes and relationships, but I am not sure about the performance and whether there are better ways to get time versioning.
The two main considerations are of course performance and simplicity of the query.
I realize this is a rather open question; I am merely looking to see if I am overthinking this or perhaps failing to take some other positives/negatives into account.
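To make the relational side concrete, a minimal PostgreSQL sketch of an adjacency list with validity ranges (all names are hypothetical); the point-in-time tree is then a recursive CTE:

    CREATE TABLE node_version (
        node_id     bigint      NOT NULL,
        parent_id   bigint,                               -- NULL for roots
        payload     text,
        valid_from  timestamptz NOT NULL,
        valid_to    timestamptz NOT NULL DEFAULT 'infinity',
        PRIMARY KEY (node_id, valid_from)
    );

    -- The tree as it looked at a given point in time:
    WITH RECURSIVE tree AS (
        SELECT node_id, parent_id, payload
        FROM node_version
        WHERE parent_id IS NULL
          AND timestamptz '2020-04-01' >= valid_from
          AND timestamptz '2020-04-01' <  valid_to
        UNION ALL
        SELECT c.node_id, c.parent_id, c.payload
        FROM node_version c
        JOIN tree p ON c.parent_id = p.node_id
        WHERE timestamptz '2020-04-01' >= c.valid_from
          AND timestamptz '2020-04-01' <  c.valid_to
    )
    SELECT * FROM tree;

An index on (parent_id, valid_from, valid_to) would be the usual companion for the recursive step.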
moi2877
(19 rep)
Apr 17, 2020, 07:06 AM
• Last activity: Mar 7, 2025, 04:06 AM
0
votes
1
answers
41
views
Fast DB for replacement of flat files that are hard to work with
Right now I have a python script that caches data to the file system for use in reinforcement learning. This works fine for small scale but as everything gears up, it is becoming a hassle.
A bit about the data structure:
- There are currently about 6M records, expected to grow to about 20M records.
- The files are standalone, they don't reference anything and they are unrelated to each other.
- Each file is roughly 6kb.
My needs:
- A key/value store. Each data piece can stand on its own.
- Read speed is very important as the faster these can be accessed the faster my ML code can train.
- Portability. As I move from prototype to more decentralized compute, moving, syncing and accessing these files will become a giant pain.
What I've tried:
- Postgresql with single JSONB columns, this was pretty slow on the read speeds.
- Cassandra, faster on reads than PG.
- Redis, the fastest of what I have tried, but still pretty slow relative to the flat files.
Any suggestions on what can be a very fast read for a key/value store that is portable?
There are roughly 10,000,000 files. They are not of uniform structure. Some are arrays, others are hashes. No real way to normalize the data as the structure is inconsistent.
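For comparison with the JSONB attempt, a minimal PostgreSQL variant that skips JSON parsing entirely and treats each record as an opaque blob; whether this closes the gap to flat files would need measuring, and every name here is made up:

    CREATE TABLE ml_cache (
        cache_key text  PRIMARY KEY,
        payload   bytea NOT NULL      -- the ~6 kB record, serialized by the application
    );

    -- Point lookups go straight through the primary-key index:
    SELECT payload FROM ml_cache WHERE cache_key = 'episode-000042';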
Romuloux
(101 rep)
Jan 14, 2025, 03:53 PM
• Last activity: Jan 16, 2025, 11:46 AM
0
votes
1
answers
61
views
Most resilient RDBMS to unexpected power losses?
If we need to choose an RDBMS to run, not in a server environment, but somewhere (not even a building) where power can go off at any moment a few times a day, which options would be best? Also, if that system ran on a CoW FS like ZFS or BTRFS, would that change anything?
The hardware will be running unattended, so my biggest priority is returning to a safe working state after a sudden power loss without admin intervention. On the other hand, losing a few seconds or even minutes of data from before the power loss would be no problem in this application.
Of course the best solution would be to use a UPS, but that's not always possible. The reasons for UPS being a problem can be space, weight, costs, regulations...
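Assuming PostgreSQL ends up among the candidates, the knobs that match this trade-off (consistent recovery required, a little recent data loss acceptable) look roughly like this; treat it as a sketch, not a tuning recommendation:

    ALTER SYSTEM SET fsync = on;                -- required for crash safety; never disable
    ALTER SYSTEM SET full_page_writes = on;     -- protects against torn pages after power loss
    ALTER SYSTEM SET synchronous_commit = off;  -- may lose the last few transactions,
                                                -- but recovery still yields a consistent state
    SELECT pg_reload_conf();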
Héctor
(307 rep)
Dec 23, 2024, 02:29 PM
• Last activity: Dec 24, 2024, 08:17 PM
3
votes
2
answers
2960
views
Choosing the right database for stock price history
The model is
[(stock_id, period, ts), open, high, low, close, last, volume]
We write new prices for all stocks (120,000) each minute and delete old ones when they fall outside the retention window. It doesn't matter whether retention cleanup happens automatically or we run a daily cleanup process.
Periods are 1 minute, 10 minutes, 1 day, 7 days and about 1,000 to 10,000 last points data retention for each period.
Currently there are about 200,000,000 rows (40GB) in postgres table and the performance is sometimes bad, especially if autovacuum is triggered.
When we query we usually pull the whole period to show chart, but sometimes we access specific dates.
We are trying to understand whether there are optimizations that can be done in PostgreSQL itself, or whether some other database would work better.
NoSQL databases that I consider for testing are MongoDB and Aerospike with the following document model. The document key would be stock_id in this case.
{
    1111: {
        "symbol": "aapl",
        "hist_1m": {
            12334: {
                "last": 123.1,
                "close": 123.2,
                ...
            }, ...
        },
        "hist_10m": {...},
        "hist_1d": {...},
        "hist_7d": {...}
    },
    2222: {...}
}
But I'm really not sure about the performance of such a model, where each sub-hash will hold 1,000 to 10,000 entries, or maybe even more in the future. In Aerospike there's a per-map maximum of write-block-size (1 MB by default); in Mongo the limit per document is 16 MB, but neither limit says anything about performance.
How fast or efficient are individual additions to large collections in MongoDB or Aerospike? Do they happen in place, or do they require loading the whole collection into memory and rewriting it back to disk, as would happen with a PostgreSQL jsonb column?
In Postgres we just do thousands of inserts and it's very fast. The performance issue comes from the never-ending insert/delete cycle - gaps in the tablespace and autovacuum. It also takes quite a long time to do global backfills.
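One PostgreSQL-side option worth naming explicitly is time-based declarative partitioning, so retention becomes dropping a partition instead of DELETE plus autovacuum; a minimal sketch following the model above (partition bounds are examples):

    CREATE TABLE price_history (
        stock_id int         NOT NULL,
        period   text        NOT NULL,
        ts       timestamptz NOT NULL,
        open     numeric, high numeric, low numeric,
        close    numeric, last numeric, volume bigint,
        PRIMARY KEY (stock_id, period, ts)
    ) PARTITION BY RANGE (ts);

    CREATE TABLE price_history_2022_05_16 PARTITION OF price_history
        FOR VALUES FROM ('2022-05-16') TO ('2022-05-17');
    CREATE TABLE price_history_2022_06_16 PARTITION OF price_history
        FOR VALUES FROM ('2022-06-16') TO ('2022-06-17');

    -- Retention: detach and drop the oldest day instead of deleting rows.
    ALTER TABLE price_history DETACH PARTITION price_history_2022_05_16;
    DROP TABLE price_history_2022_05_16;

Inserts keep landing in the newest partitions, and dropping an old partition returns its space without leaving dead tuples behind.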
I even thought about timeseries DBs like Prometheus or InfluxDB but really don't think they're designed for realtime high-load querying.
Please suggest which database/model you think is ideal for this purpose.
I searched for existing question with the same requirement (as I think thousands of systems store similar historical data in some way).
Thanks.
Kaplan Ilya
(173 rep)
Jun 16, 2022, 01:22 PM
• Last activity: Dec 21, 2024, 09:03 AM
0
votes
0
answers
37
views
Data management pattern for temporarily storing/retrieving excessive quantities of data
I have been tasked with creating an application to ingest raw binary data from a software-defined radio (SDR) and display it in a web frontend. One of the purposes of this app is to allow a user to take a "capture" from a radio, allowing them to download a short historical period of raw data. I am struggling with the best way to store and process this data.
The goal of this app is to be able to demonstrate frequencies well into the megahertz, requiring sample rates that could approach the gigahertz range. Given that a single complex sample requires eight bytes, the rate of ingest could easily surpass 1 GB/s. This will stretch the memory of my program, and I believe it is far too great a data rate to simply write to disk.
I need some advice on database systems to manage this high-volume data pattern. The important part is that no data analysis is necessary (all data is strictly binary), and some delay in data recall can be tolerated.
**Edit:** I realized that I should clarify better how I am thinking of storing the data. Here's the preconditions and my thoughts:
* Binary data is received incrementally from the radio.
* Data needs to be able to be queried in some way related to time. However, this does not *necessarily* mean that timestamps need to be stored. If there is little to no data loss, then the size of a required return payload can be deduced based on the number of samples requested (since each sample is known to be eight bytes).
* Rather than exact timing, the important part is that data is stored in a continuous, non-lossy manner, to produce accurate waveforms. For example, supposing that data is stored in 1GB incremental records, the system can approximate how many records are needed for one request, and then trim the data server-side.
* The data can be regularly flushed once a certain capacity is reached, if this is beneficial to performance. Captures need not be more than 10GB each.
* Data will be associated relationally with a radio, whether this be through a table-based structure or simply an ID. (this is the easy part)
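As a sketch only, the "incremental records" idea from the list above could be expressed as fixed-size chunks keyed by a sequence number, so a time request reduces to a chunk range; all names are hypothetical:

    CREATE TABLE capture_chunk (
        radio_id   int         NOT NULL,
        capture_id int         NOT NULL,
        chunk_no   bigint      NOT NULL,      -- monotonically increasing per capture
        started_at timestamptz NOT NULL,      -- coarse timestamp of the chunk's first sample
        samples    bytea       NOT NULL,      -- fixed-size block of raw 8-byte complex samples
        PRIMARY KEY (radio_id, capture_id, chunk_no)
    );

    -- "N samples starting near time T" becomes a chunk_no range computed from the
    -- sample rate, with the first and last blobs trimmed by the application:
    SELECT samples FROM capture_chunk
    WHERE radio_id = 1 AND capture_id = 7 AND chunk_no BETWEEN 120 AND 135
    ORDER BY chunk_no;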
moltarze
(101 rep)
May 21, 2024, 06:12 PM
• Last activity: May 22, 2024, 03:56 PM
1
votes
2
answers
981
views
What database to store ~20 billion rows?
I plan on archiving messages, and I'm not sure which I should use.
It should be able to hold a maximum of 20 billion rows (that's the figure I expect the total to tend towards).
Each row will have three columns: user_id, message, date.
The user_id is a string of 30 characters. The message is anywhere between 1 and 20,000 characters; on average I expect it to be around 140 characters (UTF-8, as it should allow emojis, different alphabets, etc.).
I only want an index for the user_id, not for the message/date.
I only plan on doing INSERT queries, and very simple SELECT * WHERE user_id = XXXXX
There will be very little SELECTs, I expect at its peak 10 / minute.
The SELECTs don't need to be fast, anything between 1-20 seconds is fine.
But there will be a lot of INSERTs, probably about 5,000-10,000 per second.
My server will have:
CPU: AMD Ryzen™ 9 7950X3D
RAM: 128 GB DDR5 ECC
Drives: 1x 7.68 TB NVMe SSD Datacenter
(From Hetzner)
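Purely as an illustration of how this could look in PostgreSQL, hash partitioning keeps each user_id index a manageable size (every name below is made up):

    CREATE TABLE message_archive (
        user_id text        NOT NULL,
        message text        NOT NULL,
        sent_at timestamptz NOT NULL
    ) PARTITION BY HASH (user_id);

    CREATE TABLE message_archive_p0 PARTITION OF message_archive
        FOR VALUES WITH (MODULUS 64, REMAINDER 0);
    -- ...repeat for remainders 1 through 63...

    CREATE INDEX ON message_archive (user_id);   -- the only index, as required

    -- Reads stay simple and hit exactly one partition:
    SELECT * FROM message_archive WHERE user_id = 'abcdefghijklmnopqrstuvwxyz0123';

At 5,000-10,000 inserts per second, batching the writes (multi-row INSERT or COPY) usually matters as much as the exact engine choice.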
lyeaf
(307 rep)
Nov 27, 2023, 06:59 PM
• Last activity: Nov 28, 2023, 01:51 PM
260
votes
6
answers
481294
views
How large should be mysql innodb_buffer_pool_size?
I have a busy database with solely InnoDB tables, which is about 5 GB in size. The database runs on a Debian server using SSD disks, and I've set max connections = 800, which sometimes saturates and grinds the server to a halt. The average query rate is about 2.5K queries per second. So I need to optimize memory usage to make room for the maximum possible number of connections.
I've seen suggestions that innodb_buffer_pool_size should be up to 80% of total memory. On the other hand, I get this warning from the tuning-primer script:
Max Memory Ever Allocated : 91.97 G
Configured Max Per-thread Buffers : 72.02 G
Configured Max Global Buffers : 19.86 G
Configured Max Memory Limit : 91.88 G
Physical Memory : 94.58 G
Here are my current innodb variables:
| innodb_adaptive_flushing | ON |
| innodb_adaptive_hash_index | ON |
| innodb_additional_mem_pool_size | 20971520 |
| innodb_autoextend_increment | 8 |
| innodb_autoinc_lock_mode | 1 |
| innodb_buffer_pool_instances | 1 |
| innodb_buffer_pool_size | 20971520000 |
| innodb_change_buffering | all |
| innodb_checksums | ON |
| innodb_commit_concurrency | 0 |
| innodb_concurrency_tickets | 500 |
| innodb_data_file_path | ibdata1:10M:autoextend |
| innodb_data_home_dir | |
| innodb_doublewrite | ON |
| innodb_fast_shutdown | 1 |
| innodb_file_format | Antelope |
| innodb_file_format_check | ON |
| innodb_file_format_max | Antelope |
| innodb_file_per_table | ON |
| innodb_flush_log_at_trx_commit | 2 |
| innodb_flush_method | O_DIRECT |
| innodb_force_load_corrupted | OFF |
| innodb_force_recovery | 0 |
| innodb_io_capacity | 200 |
| innodb_large_prefix | OFF |
| innodb_lock_wait_timeout | 50 |
| innodb_locks_unsafe_for_binlog | OFF |
| innodb_log_buffer_size | 4194304 |
| innodb_log_file_size | 524288000 |
| innodb_log_files_in_group | 2 |
| innodb_log_group_home_dir | ./ |
| innodb_max_dirty_pages_pct | 75 |
| innodb_max_purge_lag | 0 |
| innodb_mirrored_log_groups | 1 |
| innodb_old_blocks_pct | 37 |
| innodb_old_blocks_time | 0 |
| innodb_open_files | 300 |
| innodb_purge_batch_size | 20 |
| innodb_purge_threads | 0 |
| innodb_random_read_ahead | OFF |
| innodb_read_ahead_threshold | 56 |
| innodb_read_io_threads | 4 |
| innodb_replication_delay | 0 |
| innodb_rollback_on_timeout | OFF |
| innodb_rollback_segments | 128 |
| innodb_spin_wait_delay | 6 |
| innodb_stats_method | nulls_equal |
| innodb_stats_on_metadata | ON |
| innodb_stats_sample_pages | 8 |
| innodb_strict_mode | OFF |
| innodb_support_xa | ON |
| innodb_sync_spin_loops | 30 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 4 |
| innodb_thread_sleep_delay | 10000 |
| innodb_use_native_aio | ON |
| innodb_use_sys_malloc | ON |
| innodb_version | 1.1.8 |
| innodb_write_io_threads | 4 |
A side note that might be relevant: when I try to insert a large post (say over 10 KB) from Drupal (which sits on a separate web server) into the database, it takes forever and the page does not return correctly.
Given all this, I'm wondering what my innodb_buffer_pool_size should be for optimal performance. I'd appreciate your suggestions for setting this and the other parameters optimally for this scenario.
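For reference, a rough way to sanity-check the figures tuning-primer reports (global buffers plus max_connections times the per-thread buffers should stay well under physical RAM); the exact buffers counted here may differ slightly from the script's:

    SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb,
           @@max_connections AS max_conn,
           (@@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size
            + @@join_buffer_size + @@binlog_cache_size + @@thread_stack)
            * @@max_connections / 1024 / 1024 / 1024 AS worst_case_per_thread_gb;

    -- On this version (InnoDB 1.1.8, i.e. MySQL 5.5) the buffer pool cannot be
    -- resized online, so a change goes into my.cnf and needs a restart, e.g.
    -- (illustrative values only, and only viable if max_connections or the
    -- per-thread buffers are reduced accordingly):
    -- [mysqld]
    -- innodb_buffer_pool_size      = 60G
    -- innodb_buffer_pool_instances = 8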
alfish
(3004 rep)
Oct 21, 2012, 03:03 PM
• Last activity: Jun 21, 2023, 11:54 PM
-2
votes
1
answers
108
views
Is there any database that uses LZMA?
According to [this article](https://www.percona.com/blog/evaluating-database-compression-methods/), LZMA achieves a great compression ratio. I think an LZMA-compressed database would be a good choice if I don't care about compression speed, so is there any database that uses it?
destination
(1 rep)
May 25, 2023, 02:06 AM
• Last activity: May 25, 2023, 12:51 PM
1
votes
0
answers
187
views
Are there any databases that properly support sum types/tagged unions?
I want to model various different events. Every event happens at a specific time, but other than that, each event needs to have different associated data, depending on the type of event. In Rust, I can easily model this using a sum type (or tagged union if you will):
// Pretend these aren't empty...
struct Datetime;
struct Place;

struct Event {
    datetime: Datetime,
    kind: EventKind,
}

enum EventKind {
    Birth {
        place: Place,
        // Could have additional data, for example to signify if it was a C-section.
    },
    NameChange {
        new_name: String,
    },
    Death {
        place: Place,
        // Could have additional data, for example to signify cause of death.
    },
    // There might be more variants...
}
These are just some examples. There could be a lot more events with other kinds of associated data. I would want the system to be able to be expanded in the future with more kinds of events, with arbitrary associated data.
I'm looking for a database that can handle this kind of data in a proper way. Specifically, it should ensure that a certain kind of event always has the required associated data. It should also make it easy to query all events simultaneously, to iterate over all the various events without having to do multiple queries - basically so that whatever format the database stores the data in can be deserialized into the Event struct in Rust.
Are there any production-grade databases that can do this? I've thought about trying to model this with SQL, but [you need to jump through all kinds of hacks](https://stackoverflow.com/questions/1730665/how-to-emulate-tagged-union-in-a-database) to make it work and even then it's not easy to, for instance, iterate over all events in a single query.
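For context, one of the common emulations those hacks boil down to is a single table with a discriminator column and CHECK constraints; a sketch in PostgreSQL-flavoured SQL, using the variants above:

    CREATE TABLE event (
        id        bigserial   PRIMARY KEY,
        datetime  timestamptz NOT NULL,
        kind      text        NOT NULL CHECK (kind IN ('birth', 'name_change', 'death')),
        place     text,        -- used by birth / death
        new_name  text,        -- used by name_change
        CHECK (
            (kind = 'birth'       AND place IS NOT NULL AND new_name IS NULL) OR
            (kind = 'name_change' AND new_name IS NOT NULL AND place IS NULL) OR
            (kind = 'death'       AND place IS NOT NULL AND new_name IS NULL)
        )
    );

    -- All events come back in one query, in time order:
    SELECT * FROM event ORDER BY datetime;

This keeps "iterate over all events" a single query, but the schema has to grow a nullable column (or a jsonb payload) for every new variant, which is exactly the awkwardness the question is about.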
PostgreSQL has an inheritance feature which at first sounds a little promising, but inheritance is not the same as a sumtype and [there are caveats with using PostgreSQL inheritance](https://www.postgresql.org/docs/current/ddl-inherit.html#DDL-INHERIT-CAVEATS) .
I've thought about NoSQL JSON-based databases. I don't have a lot of experience with these, so I can't comment too much, but my fear is that it would be hard to statically guarantee that an event always has the associated data (and nothing more) since the data is inherently unstructured in such databases (as far as I understand).
Are there any databases or methods that support this kind of data "natively"?
Victor Nordam Suadicani
(11 rep)
May 13, 2023, 05:16 PM
4
votes
3
answers
2273
views
Choosing relational DB for a small virtual server with 1Gb RAM
I am trying to find a decent relational database that would run on a small server, would be easy to administer and have much community love.
I need a lightweight relational database for my personal Ubuntu server with only 1Gb of RAM. It would be used for occasional reads and writes, so high performance is not required.
I considered popular options like MySQL and PostgreSQL, but those consume too many resources; even MariaDB may be too heavy, since other projects have to run on my server too.
I considered Firebird, but it turned out to be not very intuitive for me and troubleshooting is difficult without much community-generated guides.
I'm wondering if it's worth learning a niche database like Firebird for small pet projects. Are there any alternatives that require minimal RAM, have intuitive administration, drivers and ORM, and community support?
Artem Sevos
(59 rep)
Apr 7, 2023, 10:34 AM
• Last activity: Apr 8, 2023, 09:03 AM
0
votes
1
answers
231
views
Best database solution for highly interconnected entities?
I'm building a service (or rather a set of microservices) to serve as the backend for a social-network-like website. In short that means the following for my data:
- millions of entities
- with dozens of attributes
- with potentially thousands of connections between the entities that change over time (like on Facebook, someone might have thousands of "friends").
- (there is more than one type of connection, each potentially numbering in the thousands)
- connection examples:
- entity A knows entity B
- entity A has blocked entity B
- etc.
- conceptually each entity maintains long lists of identifiers of other entities
- and I need to be able to do searches where
- I can search all entities based on a set of attributes to match
- while filtering out already existing connections of the initiating entity
I'm trying to figure out what would be the best database solution to store this data. I'm not well-versed in database tech, so I need some suggestions to consider.
I know SQL / relational databases can easily scale for the first 2 criteria (number of entities and number of attributes), but I'm not sure how well suited they are for managing the connections.
I need a suitable database tech, which can also be set up in a distributed fashion - and ideally available in a cloud environment. If that is a SQL database, how would I store and manage the connections?
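To make the relational variant concrete, a minimal sketch of how the connection lists and the "search excluding my existing connections" query could look; the entity attributes are invented for the example:

    CREATE TABLE entity (
        id      bigint PRIMARY KEY,
        country text,
        age     int
        -- ...dozens of attributes in reality...
    );

    CREATE TABLE connection (
        from_entity bigint   NOT NULL,
        to_entity   bigint   NOT NULL,
        kind        smallint NOT NULL,     -- e.g. 1 = knows, 2 = blocked
        PRIMARY KEY (from_entity, kind, to_entity)
    );

    -- Entities matching some attributes, minus the ones entity 42 already knows:
    SELECT e.id
    FROM   entity e
    WHERE  e.country = 'DE' AND e.age BETWEEN 20 AND 30
    AND NOT EXISTS (
        SELECT 1 FROM connection c
        WHERE  c.from_entity = 42 AND c.to_entity = e.id
    );

Thousands of rows per entity is well within what a composite-key index handles; the harder part, as the question notes, is running this in a distributed setup.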
Babszem
(3 rep)
Mar 26, 2023, 05:23 PM
• Last activity: Mar 28, 2023, 07:21 AM
1
votes
2
answers
212
views
What database product would support 8000 columns, unlimited row size and no character limit on text fields?
Imagine a spreadsheet that's as wide in column count as it is long in row count - say 10,000 rows x 10,000 columns. That's what I'm trying to find an equivalent DB product to achieve. The column headings also stretch out to about 500 characters, but the corresponding cell values in all but 2 of the columns (the first two) are either 1 or 0 (translating a binary data file). So, columns 1 and 2 in the structure have conventional string and floating-point values and short column names, and then everything else is a 1 or a 0, with the bloated column heading text taking up more space than the corresponding row values.
I have tried SQLite, Postgres, MariaDB and MySQL but I keep running into limits either on the number of columns, on the row size or on the column width (mainly because of the headings).
Is there any database product flexible enough to support a single table with all of these features?
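Not an answer to the single-table requirement, but for comparison, the usual workaround is to store the binary matrix in "long" form, keeping each heading once and only the cells that are 1; a sketch with invented names:

    CREATE TABLE heading (
        heading_id int          PRIMARY KEY,
        title      varchar(500) NOT NULL     -- the long column-heading text, stored once
    );

    CREATE TABLE sheet_row (
        row_id  bigint PRIMARY KEY,
        label   text,                         -- conventional string column
        measure double precision              -- conventional floating-point column
    );

    CREATE TABLE cell_flag (
        row_id     bigint NOT NULL REFERENCES sheet_row (row_id),
        heading_id int    NOT NULL REFERENCES heading (heading_id),
        PRIMARY KEY (row_id, heading_id)      -- presence means 1, absence means 0
    );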
sixestwentyeight
(11 rep)
Mar 17, 2023, 02:15 AM
• Last activity: Mar 27, 2023, 11:14 AM
-1
votes
1
answers
34
views
Database recommendation for storing a growing set of recordings
Suppose I have a large volume of recordings coming in from various sources. Encoded as JSON-lines, they might look like this:
{ "metric": "temperatureRange", "values": [ "new-york", "2023-02-06T11:42:09", "18.1", "24.6" ] }
{ "metric": "windSpeed", "values": [ "tokyo", "2023-01-06T09:21:30", "7.1" ] }
{ "metric": "humidity", "values": [ "london", "2023-01-06T09:21:30", "0.65" ] }
{ "metric": "windSpeed", "values": [ "london", "2023-02-04T12:54:55", "5.6" ] }
The data is non-relational (in the sense of no foreign keys) and I only ever want to append data. Recordings are never modified after writing. Notice that the values list is not of a fixed size; that said, I think I could choose a reasonable upper bound of, say, 16 values.
In terms of querying, I want to be able to do look-ups like:
* Find all rows where the metric is windSpeed
* Find all rows where the metric is humidity and the first value is "new-york"
* Find all rows where the metric is temperatureRange and the second value is "24.6"
I can model this in a traditional RDBMS; however, I don't think that is optimized for this use case, because I anticipate a high volume of writes but only very simple queries.
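For a concrete baseline, a minimal PostgreSQL sketch that keeps the variable-length values as a jsonb array and indexes the lookups listed above (names are illustrative):

    CREATE TABLE recording (
        id     bigserial PRIMARY KEY,
        metric text  NOT NULL,
        vals   jsonb NOT NULL    -- e.g. ["new-york", "2023-02-06T11:42:09", "18.1", "24.6"]
    );
    CREATE INDEX ON recording (metric);
    CREATE INDEX ON recording (metric, (vals ->> 0));   -- metric plus first value
    CREATE INDEX ON recording (metric, (vals ->> 1));   -- metric plus second value

    SELECT * FROM recording WHERE metric = 'humidity' AND vals ->> 0 = 'new-york';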
**What is a good database system for this type of work load?**
sdgfsdh
(99 rep)
Feb 6, 2023, 11:52 AM
• Last activity: Feb 6, 2023, 01:45 PM
3
votes
3
answers
1714
views
Recommendation for storage of series of time series
Just a few words describing the data:
In my application, there are acceleration measurements (for example at 25 kHz) for an exemplary duration of one second. These measurements are repeated at not necessarily equidistant time steps for that measurement point (maybe every five or ten minutes). This is a kind of interrupted permanent monitoring, somewhat two-periodic:
- the short period: each measurement is sampled at 25,000 Hz (the resolution of the measurements)
- the long period (not strictly periodic, it might differ): a new measurement roughly every 5 minutes
There are 20 or more of those points.
Since I'm dealing with time series, the first idea might be to use a time series DB. On the other hand, it seems to me that the main purpose of a time series DB is the storage of scalar values. Of course, my measurements are scalar values, but I'm not sure it would be a good idea to store every scalar value as a (time/value/measpos_id) triple - that would lead to an enormous number of entries, and I think individual entries would never be evaluated on their own.
Another idea could be to store the measurement vector (all values from that second) together with the starting time and the measpos_id. But how to do that? Store all values as a blob? Not every system is capable of dealing with vectors - and they may differ in length. Are there concepts in time series DBs for such problems that I'm not aware of?
Further, for evaluation (extraction), I think extraction of the complete vector would be the most common case.
Please feel free to ask, if my description is incomplete or some more details could help in finding a good solution.
What are your recommendations? NoSQL or relational SQL? Further ideas? Every hint is welcome. Thanks in advance.
additions:
- A rough idea of the volume: steady growth of about 1 TB a year
- Giving a sample is not that easy - I'll try to describe:
think of one column with 25,000 float values for each measurement (roughly each minute and for each measurement position), each of these columns timestamped (at the beginning).
- Usage for big data evaluation (meaning testing many kinds of algorithms): windowing data, FFT (spectral analysis), comparison, aggregations like energetic sum, value of max amplitude, position (frequency) of max amplitude, and many more
- purpose (focus) of evaluation: wear detection for condition monitoring of for example rolling devices (gears, generator sets, turbines, shafts, bearings)
- evaluation would (from today's view) focus on each separate column and maybe compare it to others - but not combine (stack) columns together.
- data size example: 25,000 float values in each column for 20 measured engines every 5 minutes (12 per hour) results in 6e6 floats per hour, or 5.25e10 floats per year.
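To make the "vector per row" idea from above concrete, a minimal relational sketch using PostgreSQL array syntax (names invented); whole-vector extraction then stays a single-row read:

    CREATE TABLE burst (
        measpos_id  int         NOT NULL,
        started_at  timestamptz NOT NULL,
        sample_rate int         NOT NULL DEFAULT 25000,   -- Hz
        samples     real[]      NOT NULL,                  -- the ~25,000 float values of one second
        PRIMARY KEY (measpos_id, started_at)
    );

    -- Whole-vector extraction, described above as the most common access pattern:
    SELECT samples
    FROM   burst
    WHERE  measpos_id = 7
    AND    started_at >= '2017-07-03' AND started_at < '2017-07-04';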
Bastian Ebeling
(175 rep)
Jul 3, 2017, 06:45 AM
• Last activity: Jan 29, 2023, 10:17 AM
-3
votes
2
answers
273
views
Best database for array like collections
I'm going to build a Go application that will check whether given id(s) exist in a list of ids. The database structure will be something like this:
Company A
--
List 1 -> 100.000 ids
List 2 -> 250.000 ids
Company B
--
List 1 -> 100.000 ids
List 2 -> 250.000 ids
etc..
These ids will be strings, and I need to efficiently query multiple ids and find out which of them exist in the collection. The write operation happens only once for each list, so write performance isn't really important, but reads should be fast. What would be the best database solution for this?
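For reference, the relational version of this is a single membership table probed with an IN list; a sketch with invented names:

    CREATE TABLE list_member (
        company_id int  NOT NULL,
        list_id    int  NOT NULL,
        member_id  text NOT NULL,
        PRIMARY KEY (company_id, list_id, member_id)
    );

    -- "Which of these ids exist in list 2 of company 1?"
    SELECT member_id
    FROM   list_member
    WHERE  company_id = 1 AND list_id = 2
    AND    member_id IN ('id-001', 'id-002', 'id-003');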
Svnd
(1 rep)
Jan 19, 2023, 02:00 PM
• Last activity: Jan 20, 2023, 09:25 AM
Showing page 1 of 20 total questions