Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
0
votes
1
answers
219
views
Does combining multiple tables into one table with more than 100 columns improve performance and overall design?
I have 10 tables without any type of foreign key relations between them. All the tables have the same primary key; in this case, "stock_id" is the primary key. All table columns describe a metric of the stock_id. For example, product_fees is a column that describes the total fees of a stock. Like this, we have various metrics across all the tables. I query all 10 tables through a Django API for the frontend website to show all visualizations (20+ charts). I don't think this design is a good one.
**My solution:**
As all attributes have stock_id I'm thinking of combining the tables into one table which will contain about 110 columns. There is one extra table that will have the prediction data with dates (1000+ rows for each stock) for all stocks. This prediction data will have a foreign key (1:M) to the main table with 110 columns. To improve readability, I am thinking of using 20 views to debug the data in the future instead of scanning through all the column names.
The most important thing is that all the stock values in these tables change every day, and I usually delete and re-insert the rows in each table instead of running updates, especially for the stock prediction table with dates.
Is it a good idea to have more than 100 columns and use views for aggregating smaller sections of the table for debugging purposes?
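For reference, a hedged sketch of the combined design (stock_id and product_fees come from the question; every other name is invented), with one narrow view standing in for the ~20 debugging views:

    -- one wide row per stock
    CREATE TABLE stock_metrics (
        stock_id     INT PRIMARY KEY,
        product_fees DECIMAL(12,2),
        -- ... roughly 100 further metric columns, grouped by topic ...
        last_updated DATE
    );

    -- one of the ~20 debugging views: each view exposes only one metric group
    CREATE VIEW stock_fee_metrics AS
    SELECT stock_id,
           product_fees        -- plus the other fee-related columns
    FROM stock_metrics;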
Kaushik Ganesan
(1 rep)
Jan 14, 2022, 08:20 PM
• Last activity: Jun 15, 2025, 11:05 AM
0
votes
1
answers
579
views
Table with only one column because of many-to-many relationship?
Right now I have three tables: `person`, `alias`, and `person_alias`. As suggested by the names of the tables, one person can have multiple aliases and one alias can be used by many people, hence I added the joining table `person_alias` during normalization.
However, the table `alias` essentially contains only one column (if I do not choose to use a numeric surrogate key). In addition, let's say the table `person` also contains a bunch of other useful information that is irrelevant to the question here.
I am now a bit stuck justifying setting up the entity-relationship map this way. What would be the problem if I deleted the table `alias` altogether? Or do I misunderstand many-to-many relationships/normalization?
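For reference, a hedged sketch of the two variants being weighed (table names follow the question; column names and key choices are assumptions):

    -- variant A: keep the alias table, even though it has a single column
    CREATE TABLE alias (
        alias_name VARCHAR(100) PRIMARY KEY
    );
    CREATE TABLE person_alias (
        person_id  INT          NOT NULL REFERENCES person (person_id),
        alias_name VARCHAR(100) NOT NULL REFERENCES alias (alias_name),
        PRIMARY KEY (person_id, alias_name)
    );

    -- variant B (alternative, not meant to coexist with A): drop alias and
    -- store the alias text directly in the joining table
    CREATE TABLE person_alias (
        person_id  INT          NOT NULL REFERENCES person (person_id),
        alias_name VARCHAR(100) NOT NULL,
        PRIMARY KEY (person_id, alias_name)
    );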
ark
(85 rep)
Jan 7, 2019, 01:01 PM
• Last activity: Feb 4, 2025, 05:02 AM
1
votes
0
answers
58
views
How to Preserve Past Order Details Amidst Customer Information Changes?
I'm facing a scenario in my e-commerce setup where, after an order is marked as `Done`, any subsequent changes made by the customer to their information (like shipping address) are reflected incorrectly in past orders. I stumbled upon a blog post that discusses three solutions in PostgreSQL:
1. denormalizing the Order table
2. using Temporal Tables
3. storing snapshots as JSON blobs.
I am keen to understand which of these methods, or any other recommendations you have, effectively tackle this issue while being scalable for a small business setup. I value insights from individuals who have witnessed or implemented these solutions in various company sizes and am interested in learning about what worked or didn't work in your experience.
Source [Link](https://dev.to/egorpavlikhin/temporal-tables-or-json-storage-3gbg)
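For illustration, a hedged PostgreSQL-style sketch of option 1 (table and column names are assumptions): the shipping details are copied onto the order row when it is placed, so later edits to the customer record never touch past orders.

    CREATE TABLE orders (
        order_id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        customer_id     bigint NOT NULL REFERENCES customers (customer_id),
        status          text   NOT NULL,           -- e.g. 'Done'
        ship_to_name    text   NOT NULL,           -- copied from the customer at checkout
        ship_to_address text   NOT NULL,           -- copied from the customer at checkout
        placed_at       timestamptz NOT NULL DEFAULT now()
    );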
Ashwin Jacob
(111 rep)
Oct 26, 2023, 02:38 PM
-4
votes
1
answers
43
views
Is it normal to have 300 queries to display a single page?
The total number of queries is Q1 * Q2 * Q3; it might be 25 * 3 * 4 = 300, or maybe even more.
All of this is for a single user displaying a single page!
Should I duplicate the data and store it directly in the first queried table? This would reduce the number of queries dramatically, from 300+ to a single query.
Or should I leave the database normalized with no duplication?
---
for example:
SELECT * FROM items
JOIN descriptions on descriptions.id = items.description_id
JOIN properties on properties.id = descriptions.property_id
JOIN images on images.id = properties.image_id;
So I assumed the first table has 25 items, each item has 3 related rows, and so on;
the number of queries and joins adds up quickly.
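For contrast, a hedged sketch of the two access patterns (the ids are made up): the per-row lookups that multiply into hundreds of statements, versus fetching each related table once for the whole page (or using the single JOIN above):

    -- per-row pattern: one statement per item, repeated 25+ times per page
    SELECT * FROM descriptions WHERE id = 17;

    -- batched pattern: one statement per related table for the whole page,
    -- using the 25 item ids collected from the first query
    SELECT * FROM descriptions WHERE id IN (17, 18, 19 /* ...all 25 ids... */);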
Alsaraha
(1 rep)
Nov 10, 2022, 04:21 AM
• Last activity: Nov 10, 2022, 10:55 AM
1
votes
1
answers
241
views
A clarification about data normalization
I've been a software developer for 8+ years now and I think I have a pretty good grasp of the concepts but as an entirely self-taught developer, I have some unfortunate gaps in proper terminology. This has not been a problem so far (understanding concepts while not knowing their names), but I'm writing some tutorials now and would very much like to get everything right.
What I'd like to verify with you, my peers, is the use of the term „data normalization“.
I have a very simple example in XML on which I try to demonstrate normalized vs. not normalized data:
[XML example: the address split into separate elements (city "Horní Dolní", postal code "123 00", street "U slepých", house number "53") versus a single combined string "U slepých 53, Horní Dolní, 123 00"]
I have always thought that splitting larger chunks of data into smaller, more specific pieces of data is a perfect example of data normalization. (Or, in the context of DBs, instead of having one column „address“, having several more specific columns, like in the XML above.)
But the more I google the topic, the stronger the feeling I have that perhaps I have no clue what normalization actually is :D
So, do you concur with my data normalization example? Or am I off?
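For what it's worth, the same contrast can be sketched as two hypothetical tables (all names invented): one keeps the address as a single free-form value, the other splits it into single-purpose columns. That splitting is the "atomic values" idea usually associated with 1NF; the higher normal forms are about removing redundancy between columns rather than about splitting values.

    CREATE TABLE customer_unsplit (
        customer_id INT PRIMARY KEY,
        address     VARCHAR(200)        -- 'U slepých 53, Horní Dolní, 123 00'
    );

    CREATE TABLE customer_split (
        customer_id  INT PRIMARY KEY,
        street       VARCHAR(100),      -- 'U slepých'
        house_number VARCHAR(10),       -- '53'
        city         VARCHAR(100),      -- 'Horní Dolní'
        postal_code  VARCHAR(10)        -- '123 00'
    );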
Patrik Šimunič
(111 rep)
Sep 15, 2022, 08:42 AM
• Last activity: Sep 15, 2022, 12:52 PM
1
votes
1
answers
346
views
Junction table or denormalize?
I'm using MariaDB.
I'm designing a simple application that can send notifications to subscribers for when certain websites they've subscribed to have been updated.
A user can be subscribed to several different sites and multiple users can subscribe to same site.
So both tables are independent in a vacuum.
Simple schema for both, just a
1. sites table containing ids and urls
2. subscribers table containing ids and phone#s
My tentative flow for efficiency and scalability is something like the following
1. Cron job every hour that gets all Sites from sites table
2. Makes parallel web requests for all sites
3. Detect sites that have been changed (this will be compared against another column in the sites table)
4. Alert users of changed sites which sites got updated (prob also send alerts in parallel)
Is introducing a junction table even necessary for this? I feel like it'd introduce complexity because I'd have to join on it every time, and update it every time a change is made to either table.
Or.. should I just denormalize..
Seems like sites is kind of the "common denominator" here (i.e. I'll always check if EVERY site has changed, but I don't necessarily need all subscribers)
So given ^, I was thinking of maybe just adding a Sites.subscriber_ids field..
Or..should I just go for a NoSQL approach?
Any thoughts and recommendations ?
Thanks!
CREATE TABLE provsub (
provider_id int(11) NOT NULL,
subscriber_id int(11) NOT NULL,
created timestamp NOT NULL DEFAULT current_timestamp(),
updated timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
PRIMARY KEY (provider_id,subscriber_id),
KEY subscriber_id (subscriber_id),
CONSTRAINT provsub_ibfk_1 FOREIGN KEY (provider_id) REFERENCES providers (id),
CONSTRAINT provsub_ibfk_2 FOREIGN KEY (subscriber_id) REFERENCES subscribers (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
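For what it's worth, the read path with the junction table stays a single query; a hedged sketch (column names such as phone and url are assumptions, and providers here plays the role of the sites table):

    -- find everyone to notify for the sites detected as changed in this cron run
    SELECT s.phone, p.url
    FROM provsub ps
    JOIN subscribers s ON s.id = ps.subscriber_id
    JOIN providers   p ON p.id = ps.provider_id
    WHERE ps.provider_id IN (101, 102);   -- ids of the changed sites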
user2402616
(211 rep)
May 14, 2022, 04:45 PM
• Last activity: May 28, 2022, 12:42 AM
2
votes
1
answers
960
views
Should a counter column with frequent update be stored in a separate table?
I have a MySQL/MariaDB database where posts are stored. Each post has some statistical counters such as the number of times the post has been viewed for the current day, the total number of views, number of likes and dislikes.
For now, I plan to have all of the counter columns updated in real time every time an action happens - a post gets a view, a like, or a dislike. That means that the `post_stats` table will get updated all the time, while the `posts` table will rarely be updated and will mostly only be read.
The table schema is as follows:
posts(**post_id**, author_id, title, slug, content, created_at, updated_at)
post_stats(**post_id**, total_views, total_views_today, total_likes, total_dislikes)
The two tables are connected by a `post_id` foreign key. Currently, both tables use InnoDB. The data from both tables will always be queried together to show a post with its counters, so an `INNER JOIN` will be used all the time. The stats are updated right after being read (on every page view).
My questions are:
1. For best performance when the tables grow, should I combine the two tables into one, since the columns in `post_stats` are directly related to the `posts` entries, or should I keep the counter/summary table separate from the main `posts` table?
2. For best performance when the tables grow, should I use MyISAM for the `posts` table, as I can imagine that MyISAM can be more efficient at reads while InnoDB is more efficient at inserts?
This problem is general for this database and also applies to other tables in the same database, such as `users` (counters such as *the total number of views of their posts*, *the total number of comments written by them*, *the total number of posts written by them*, etc.) and `categories` (*the number of posts in that category*, etc.).
P.S. The views per day counters are reset once daily at midnight with a cron job. One reason for having posts and post_stats as two tables is concerns about caching.
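For reference, a hedged sketch of the two hot statements implied by the description (the slug lookup and literal ids are assumptions): the read joins the two tables once, while the per-view write touches only post_stats:

    SELECT p.*, s.total_views, s.total_views_today, s.total_likes, s.total_dislikes
    FROM posts p
    INNER JOIN post_stats s ON s.post_id = p.post_id
    WHERE p.slug = 'some-post';

    UPDATE post_stats
    SET total_views = total_views + 1,
        total_views_today = total_views_today + 1
    WHERE post_id = 123;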
kataba
(33 rep)
Mar 3, 2022, 08:45 AM
• Last activity: Mar 3, 2022, 01:40 PM
2
votes
2
answers
2589
views
Data normalization (from 0NF to 1NF)
I am designing a relational database in MySQL - a library management system. I want to show examples of the data in 1NF, 2NF, 3NF, BCNF, etc. My example of 1NF looks something like this (screenshot: https://i.sstatic.net/lRFuI.png):
"A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain."
I am unsure of if the data is in 1NF. For example is the attribute 'ISBN' atomic?
ISBN is a unique numeric commercial book identifier. An ISBN consists of 4 parts (if it is a 10-digit ISBN) or 5 parts (for a 13-digit ISBN). The different parts are the prefix element, registration group, registrant, publication, and check digit.
Would you divide the identification number into smaller units, to guarantee an atomic value?
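For comparison, a hedged sketch (hypothetical table, example ISBN): the common pragmatic reading is that if queries never need the ISBN's parts separately, the whole identifier is treated as a single atomic value.

    CREATE TABLE book (
        book_id INT PRIMARY KEY,
        title   VARCHAR(200) NOT NULL,
        isbn    CHAR(13)     NOT NULL UNIQUE   -- e.g. '9780306406157', stored whole
    );
    -- splitting the ISBN into prefix / group / registrant / publication / check
    -- digit columns is usually only worthwhile if queries filter on those parts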

Jamgreen
(131 rep)
Apr 15, 2015, 07:10 PM
• Last activity: Jun 24, 2021, 10:41 AM
0
votes
2
answers
132
views
Is it possible to store historical configuration settings for each row of data without cramming all the configuration settings into each row of data?
For background: I was recently hired as a database engineer for a water treatment company. We deploy water treatment machines to sites across the country, and the machines treat water and send continuous data back to us regarding the state of incoming water (flow rate, temperature, concentration of X in incoming water, etc.), and regarding the treatments the machine applied to that water at that point in time. Over time, sites (and their various components) change a lot: a machine might break down and need to be replaced, a different concentration of chemical may be used to fill the machine's tanks, its flow meters and other sensors might be recalibrated or set to scale differently, its chemical pumps might be replaced, and on and on. These affect the interpretation of the data: for example, if 5 mL of chlorine was added to the incoming water at 01/01/2021 12:00:05, that means two completely different things if the chlorine was 5% concentrated or 40% concentrated.
Water treatment datapoints are identified by a composite key consisting of the ID of the site, and a timestamp. It would be easy to store configurations if the only data that mattered was current data, as I could keep configuration settings in the Site table and pull them up for datapoints as needed. But we need to be able to correctly interpret older data. So, I thought about storing configurations in their own table, which would track all the settings for each site over each time period, but it's not possible to create a foreign key between the continuous timestamps of the datapoints and the start/end dates of the configurations - the closest thing would be some kind of range check, like "Datapoint.TimeStamp BETWEEN Configuration.Start AND Configuration.End". So the only other option I see is to store every configuration setting for every datapoint alongside each datapoint, but that seems like a terrible solution given how many configuration settings there are and how many datapoints are generated, especially since most of the settings don't even change often.
So, is there a way to store historical configurations for each row of data in a way that is at all normalized, or is the only possible solution to cram all the settings into each datapoint?
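One commonly used shape for this is a validity-range (history) table per site, which is essentially the range-check idea made first-class; a hedged sketch where the example setting and all column names are assumptions (the question's Start/End are renamed because END is a reserved word in several dialects):

    -- one row per site per configuration validity period
    CREATE TABLE SiteConfiguration (
        SiteId        INT          NOT NULL,
        EffectiveFrom DATETIME     NOT NULL,
        EffectiveTo   DATETIME     NULL,              -- NULL = still current
        ChlorinePct   DECIMAL(5,2) NOT NULL,          -- example setting
        PRIMARY KEY (SiteId, EffectiveFrom)
    );

    -- interpret a datapoint using the configuration active at its timestamp;
    -- the range check becomes a join condition rather than a foreign key
    SELECT d.*, c.ChlorinePct
    FROM Datapoint d
    JOIN SiteConfiguration c
      ON  c.SiteId = d.SiteId
      AND d.TimeStamp >= c.EffectiveFrom
      AND (c.EffectiveTo IS NULL OR d.TimeStamp < c.EffectiveTo);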
Nick Muise
(105 rep)
Feb 5, 2021, 07:01 PM
• Last activity: Feb 5, 2021, 09:13 PM
1
votes
1
answers
122
views
Can I use Azure Data Lake in place of Data Warehouse to separate Compute and Storage of RDBMS analytics
I am from a telemetry data engineering background and am looking for some help on a use-case I am not familiar with as to whether my proposed architecture is practically feasible in Azure. The use-case is that I am trying to build some intelligence from a relational database that is updated regularly.
Currently, a data warehouse is in place which uses ADF to copy database tables in 3NF to some staging tables. The staging tables are then used to upsert to their respective tables in the 'Data Warehouse'. No denormalisation has been done. This has been pushed to the end-user to do in PowerBI. I inherited this.
The solution is expensive and places high demand on the end-user analysts. I wanted to use a Kimball-style approach and have a fact table with some dimension tables. In order to reduce costs, I want to do this in Azure Data Lake and then use Databricks to carry out the denormalisation and store the resulting outputs in a separate zone in Data Lake. I would then take the daily snapshots of the database tables and update the fact and dimension tables where data has changed or been added. (I still need to work out how to detect the daily changes without a watermark in the source tables.)
I do not see any reference architectures where this is done. It seems the data is always put into Synapse or SQL DB, even if it does make a stop in Data Lake. In order to reduce costs, would my approach be OK if I use Data Lake and Databricks for everything (staging, denormalisation, and serving the fact/dim tables)? The analysts would then have a much simpler model to connect to PowerBI, and could use Dataflows to carry out their aggregations.
Any advice would be greatly appreciated.
Snek
(11 rep)
Nov 24, 2020, 12:13 PM
• Last activity: Nov 26, 2020, 12:52 PM
11
votes
3
answers
2512
views
Integrity constraints in a relational database - should we overlook them?
I’m in a permanent discussion with the developers of the company where I work because they say it is better to get rid of relationship enforcement (via FOREIGN KEY constraint definitions) in a relational database in order to speed up large queries and to gain better performance.
The platform under consideration is MySQL 5.x, and no FOREIGN KEY constraints have been set up; even some PRIMARY KEY constraints on the relevant tables are missing, which, at least to me, is not reasonable. Maybe they’re right and I’m wrong, but I don’t have enough arguments to discuss the situation.
This has been the preferred approach for three years now. I’m new at this company (only one month) but, as the product “works”, there is hesitation to enhance the database; nevertheless, the first thing I noticed is one page taking 1 minute to load (yes, 60 seconds!).
One of the claims behind the current state of affairs is that a “denormalized” database is faster than a normalized one, but I don’t believe that’s true.
Most of the relevant queries include JOIN operations, which makes them run very, very, very slow with large amounts of data (the database contains millions of rows).
Commonly, the handling of “CRUD” operations is implemented at the application program code level; for example, in order to DELETE some data FROM, let’s say, TableA:
- it is necessary to first check **on the fly** if there is some relationship between the rows of TableA and TableB,
- in case said relationship is “detected”, then the app program code won’t allow the pertinent row(s) to be DELETEd, **but**
- if for some reason the app program code fails, then the DELETE operation will “succeed”, no matter whether there is any relationship involving the affected rows and tables (see the sketch after this list).
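For the debate, it may help to show what the database-enforced version of that same rule looks like; a hedged sketch with hypothetical column names (MySQL 5.x with InnoDB does support foreign keys):

    ALTER TABLE TableB
      ADD CONSTRAINT fk_tableb_tablea
      FOREIGN KEY (table_a_id) REFERENCES TableA (id);

    -- with the constraint in place, this fails with an error instead of
    -- silently orphaning TableB rows whenever the application-level check
    -- is bypassed or buggy
    DELETE FROM TableA WHERE id = 42;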
## Question ##
Could you help me to elaborate a good, accurate and solid answer to enrich the debate?
**Note**: Maybe something like this has been asked (and answered) before, but I couldn’t find anything by means of Google.
ReynierPM
(1888 rep)
Oct 17, 2016, 01:27 PM
• Last activity: Nov 20, 2020, 07:46 PM
0
votes
3
answers
135
views
How do I ensure I have split my tables without loss of data?
This is w.r.t. database normalisation: when I have a redundant table that causes anomalies, I would want to consider splitting it into smaller tables as per the NF rules.
BUT
Say I split a big table into two, e.g. a Student table and a Marks table. My question is:
**How do I know I've done the right split?**
Or to put it in another way,
**How can I cross check if I haven't lost any information after the split i.e. I can still do a join on both tables and get my un-normalized table back?**
This is easy to inspect visually when you have a small number of attributes, but what if the table is really huge.
**How can I ensure that I've split the table in a right manner?**
*When I was in college my professor spoke of some equation which must always be true whenever you split tables, that ensures that the split is done without loss of any information. And by doing the right joins we can get the big dirty redundant table back.*
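The equation from the lecture is most likely the lossless-join property: a split of R into R1 and R2 loses nothing if the shared attributes (R1 ∩ R2) are a key of at least one of the two tables. A hedged way to cross-check it on actual data (table and column names are hypothetical; EXCEPT may be MINUS or a NOT EXISTS in your dialect) is to rejoin the pieces and diff against the original:

    -- rows in the original that the rejoin no longer produces
    SELECT student_id, student_name, subject, score FROM big_table
    EXCEPT
    SELECT s.student_id, s.student_name, m.subject, m.score
    FROM Student s
    JOIN Marks m ON m.student_id = s.student_id;

    -- spurious rows produced by the rejoin that were never in the original
    SELECT s.student_id, s.student_name, m.subject, m.score
    FROM Student s
    JOIN Marks m ON m.student_id = s.student_id
    EXCEPT
    SELECT student_id, student_name, subject, score FROM big_table;

    -- both queries returning zero rows means the decomposition was lossless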
Oliver
(133 rep)
Sep 11, 2020, 08:57 PM
• Last activity: Sep 12, 2020, 02:49 PM
1
votes
1
answers
1588
views
database design - table with multiple nullable foreign keys
Context:
- I have 3 unrelated entities (tables) Entity1, Entity2 and Entity3
- I have a table called Operation, which can have multiple targets.
- A target for an operation can be Entity1, Entity2 or Entity3.
To model this structure, I first modeled it like this (Approach 1):
**Table OperationTarget**
- FK to Operation (Not Nullable)
- FK to Entity 1 (Nullable)
- FK to Entity 2 (Nullable)
- FK to Entity 3 (Nullable)
- A constraint that enforces only one of them will be set, and others will be Null
Then, I didn't like the idea of creating a table where at each row, 2 of 3 columns will always be Null, so I created a different approach (Approach 2):
**Table OperationTarget**
- FK to Operation (Not Nullable)
**Table OperationTargetEntity1**
- FK to OperationTarget(Not Nullable)
- FK to Entity 1 (Not Nullable)
**Table OperationTargetEntity2**
- FK to OperationTarget(Not Nullable)
- FK to Entity 2 (Not Nullable)
**Table OperationTargetEntity3**
- FK to OperationTarget(Not Nullable)
- FK to Entity 3 (Not Nullable)
This approach feels better in terms of normalization, but in practice it will make my codebase more complex, and also my queries more complex, as I will need joins. I need the parent table there for other application requirements.
I am currently going with approach 2. Is approach 1 something I should be avoiding at all costs, or is it OK to go with it when justified by simplicity and performance?
I've been reading on the topic for hours. Some people don't like using nullable columns at all; some say it's OK when the application requires them (when it is possible you don't know the value of a column for a specific row, for instance where a user can optionally provide their birth date), but I'm not sure if approach 1 is justifiable here, or should be avoided.
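If Approach 1 is kept, the "only one of them will be set" rule can usually be declared in the table itself rather than left to application discipline. A hedged sketch (column names and key names are assumptions; note that some engines, e.g. MySQL before 8.0.16, parse but ignore CHECK constraints):

    CREATE TABLE OperationTarget (
        operation_id BIGINT NOT NULL REFERENCES Operation (id),
        entity1_id   BIGINT NULL     REFERENCES Entity1 (id),
        entity2_id   BIGINT NULL     REFERENCES Entity2 (id),
        entity3_id   BIGINT NULL     REFERENCES Entity3 (id),
        -- exactly one of the three targets must be set
        CHECK (
            (CASE WHEN entity1_id IS NOT NULL THEN 1 ELSE 0 END) +
            (CASE WHEN entity2_id IS NOT NULL THEN 1 ELSE 0 END) +
            (CASE WHEN entity3_id IS NOT NULL THEN 1 ELSE 0 END) = 1
        )
    );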
Ozgur Akcali
(119 rep)
Aug 10, 2020, 11:39 AM
• Last activity: Aug 11, 2020, 01:07 PM
8
votes
4
answers
1027
views
Store millions of rows of denomalized data or some SQL magic?
My DBA experience doesn't go much further than simple storage + retrieval of CMS style data - so this may be a silly question, I don't know!
I have a problem whereby I need to lookup or calculate holiday prices for a certain group size and a certain number of days within a certain time period. E.g.:
***How much is a hotel room for 2 people for 4 nights anytime in January?***
I have pricing and availability data for, say, 5000 hotels stored like so:
Hotel ID | Date | Spaces | Price PP
-----------------------------------
123 | Jan1 | 5 | 100
123 | Jan2 | 7 | 100
123 | Jan3 | 5 | 100
123 | Jan4 | 3 | 100
123 | Jan5 | 5 | 100
123 | Jan6 | 7 | 110
456 | Jan1 | 5 | 120
456 | Jan2 | 1 | 120
456 | Jan3 | 4 | 130
456 | Jan4 | 3 | 110
456 | Jan5 | 5 | 100
456 | Jan6 | 7 | 90
With this table, I can do a query like so:
SELECT hotel_id, sum(price_pp)
FROM hotel_data
WHERE
date >= Jan1 and date <= Jan4
and spaces >= 2
GROUP BY hotel_id
HAVING count(*) = 4;
results
hotel_id | sum
----------------
123 | 400
The HAVING clause here makes sure that there is an entry, with the spaces available, for every single day between my desired dates. I.e. hotel 456 had only 1 space available on Jan2, so the count would be 3 and we don't get a result for hotel 456.
So far so good.
However, is there a way to find out all the 4 night periods in January where there is space available? We could repeat the query 27 times - incrementing the dates each time, which does seem a little bit awkward. Or another way around could be to store all possible combinations in a lookup table like so:
Hotel ID | total price pp | num_people | num_nights | start_date
----------------------------------------------------------------
123 | 400 | 2 | 4 | Jan1
123 | 400 | 2 | 4 | Jan2
123 | 400 | 2 | 4 | Jan3
123 | 400 | 3 | 4 | Jan1
123 | 400 | 3 | 4 | Jan2
123 | 400 | 3 | 4 | Jan3
And so on. We'd have to limit max number of nights, and the max number of people we would search for - e.g. max nights = 28, max people = 10 (limited to the number of spaces available for that set period starting on that date).
For one hotel, this could give us 28*10*365=102000 outcomes per year. 5000 hotels = 500m outcomes!
But we'd have a very simple query to find the cheapest 4 night stay in Jan for 2 people:
SELECT
hotel_id, start_date, price
from hotel_lookup
where num_people=2
and num_nights=4
and start_date >= Jan1
and start_date <= Jan27
order by price
limit 1;
Is there a way to perform this query on the initial table without having to generate the 500m row lookup table!? e.g. generate the 27 possible outcomes in a temporary table or some other such inner query magic?
At the moment all data is held in a Postgres DB - if needs be for this purpose we can move the data out to something else more suitable? Not sure if this type of query fits the map/reduce patterns for NoSQL style DBs ...
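Since the data is already in Postgres, the 27 candidate start dates can be generated on the fly instead of being materialized; a hedged sketch (the concrete year and treating Jan1 etc. as real DATE values are assumptions):

    -- enumerate candidate start dates with generate_series, then apply the
    -- same HAVING trick to each 4-night window
    SELECT h.hotel_id, s.start_date::date, sum(h.price_pp) AS total_price
    FROM generate_series(date '2014-01-01', date '2014-01-27', interval '1 day') AS s(start_date)
    JOIN hotel_data h
      ON  h.date >= s.start_date
      AND h.date <  s.start_date + interval '4 days'
      AND h.spaces >= 2
    GROUP BY h.hotel_id, s.start_date
    HAVING count(*) = 4          -- all 4 nights exist with enough spaces
    ORDER BY total_price
    LIMIT 1;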
Guy Bowden
(223 rep)
Jun 17, 2014, 12:37 PM
• Last activity: May 3, 2020, 12:33 AM
0
votes
1
answers
35
views
Is it correct to have status or summary columns in a parent table, or should I choose another procedure?
My parent table is the "Jobs" table, and there is a child table named "JobActions" that records a job's actions, as below:
Jobs (**JobId**, JobType, CurrentStage, AssignedTo)
JobActions (**JobActionId**, *JobId*, CreateDate, ActionType, FromUserId, ToUserId, Message, Detail)
In my grid on the web page I list the grid columns as below:
JobId, CreateDate, CurrentStage, AssignedTo, LastActionType, LastActionDate,...
In the grid, to calculate the columns below I use joins to the child table (very slow when there are many users and a lot of data):
CreateDate ==> First(or min) JobActions' CreateDate,
LastActionType ==> Last JobActions' ActionType,
LastActionDate ==> Last(or Max) JobActions' CreateDate
I have about five columns like this that need many joins to JobActions or other tables, so I decided to change the Jobs table as below:
Jobs (**JobId**, JobType, CurrentStage, AssignedTo, CreateDate, LastActionType, LastActionDate)
The last three columns are always updated whenever any job action is added to JobActions.
- Question 1: Is this a correct approach (have I denormalized my tables)?
- Question 2: Should I use a trigger for this kind of job, or should I update from inside my code?
- Question 3: Is this best practice? Is there any better solution for this type of problem?
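For question 2, this is roughly what the trigger variant could look like; a MySQL/MariaDB-style sketch only (the engine is not stated in the question, so adapt the syntax), using the column names from the question:

    CREATE TRIGGER trg_jobactions_after_insert
    AFTER INSERT ON JobActions
    FOR EACH ROW
      UPDATE Jobs
      SET CreateDate     = COALESCE(CreateDate, NEW.CreateDate),  -- first action only
          LastActionType = NEW.ActionType,
          LastActionDate = NEW.CreateDate
      WHERE JobId = NEW.JobId;

The alternative is to run the same UPDATE from application code in the same transaction as the INSERT into JobActions.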
mesut
(111 rep)
Apr 5, 2020, 10:07 AM
• Last activity: Apr 5, 2020, 12:34 PM
0
votes
1
answers
117
views
How to professionally normalize my database - screenshots included
Hello database administrators.
I got an Excel spreadsheet with data on "my customers" and I'd like to create a professionally designed database out of it.
Can anyone tell me how this data should be normalized, what tables should be created, and how they should be related to each other, so that my database would meet corporate industry standards?
I know I'm just a beginner in database creation, but I've already learned about the concept of database normalization. I'm eager to learn and I'll be grateful for any help and directions.
Here is the first chunk of my data imported into an Access table. As you can see, I've already changed the job names into numbers, and I have a separate table where these numbers are mapped to the job names.
Here it is in jobs table.
Jobs table with jobs names.
Here is the rest of the data - please help me organize it.
How about these *BESTMovie columns: do they need normalization? Some movies repeat 10 times and others don't repeat at all.
And one more question about the "*Profession" and "*BESTMovie" columns.
If I had some person, let's say Clint Eastwood or Robert De Niro, that I'd like to assign not 3 but 5 movies, how do I do this without adding another "*BESTMovie" column, where most of the fields would be blank for other people who don't have more than 3 movies?
I've heard about some rank column or rank table, but I have no idea how it should look.
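For the rank-table idea, a hedged sketch with invented names: one row per person/movie pair, so a person can have 3, 5, or 50 best movies without any extra columns.

    CREATE TABLE movie (
        movie_id INT PRIMARY KEY,
        title    VARCHAR(200) NOT NULL
    );

    CREATE TABLE person_best_movie (
        person_id INT NOT NULL,          -- FK to the people table
        movie_id  INT NOT NULL,          -- FK to movie
        rank_no   INT NOT NULL,          -- 1 = best, 2 = second best, ...
        PRIMARY KEY (person_id, rank_no),
        UNIQUE (person_id, movie_id)
    );
    -- Clint Eastwood with 5 movies is simply 5 rows; everyone else keeps 3 (or 0)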
The rest of my data looks like this:
Here is an "EyesColor" column and an "Adjective" I use to describe a person. Can these columns be normalized, and should they be? The eye color repeats many times of course, and adjectives repeat less frequently, but they do repeat.
I hope the question is well put, and I'm looking forward to answers from you guys.




michal roesler
(125 rep)
Nov 25, 2019, 09:30 PM
• Last activity: Nov 25, 2019, 10:33 PM
2
votes
3
answers
620
views
Are updates the only disadvantage of denormalization?
I am trying to understand what the disadvantages of denormalization are. Say that I have the following database (`School_has_Student` is a denormalized table):
[database diagram]
I have read that the problems you may face when using denormalization are related to the `INSERT`, `UPDATE`, and `DELETE` operations.
Now I understand why we have the `UPDATE` problem, because if I updated one piece of data, then I would have to update the rest of the related pieces of data (for example: if I updated the student name "Paul" in the `Student` table, then I would have to also update the student name "Paul" that exists two times in the `School_has_Student` table), but I don't understand why `INSERT` and `DELETE` are also problems.
Note: I know that the increase of storage space is also a disadvantage, but I do not think it is a big disadvantage with current storage devices which have very large capacity.
Tom
(129 rep)
Oct 19, 2018, 10:24 AM
• Last activity: Nov 25, 2019, 08:47 PM
-2
votes
1
answers
43
views
Access database normalization: how to deal with book volumes like 1-2 or 3-4
My question is about 1NF: what is the best practice for storing book volumes in a table?
For example, mostly it is like bookname = abc, volume = 1, etc., but sometimes it is abc, volume 1-2.
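A hedged sketch of one common way to keep the column atomic (all names invented): store one row per physical volume instead of a "1-2" string.

    CREATE TABLE book_volume (
        book_id   INT NOT NULL,          -- FK to the book table
        volume_no INT NOT NULL,
        PRIMARY KEY (book_id, volume_no)
    );
    -- "abc volume 1-2" then becomes two rows: (abc_id, 1) and (abc_id, 2)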
Taqi
(3 rep)
Sep 6, 2019, 08:36 AM
• Last activity: Sep 6, 2019, 11:49 AM
16
votes
1
answers
6927
views
Designing a database for a video game business domain with multiple many-to-many relationships
I am relatively new to database design, and I decided to make my own hypothetical database for practice. However, I am having trouble modeling and normalizing it, as I believe there are numerous many-to-many (M:N) relationships.
# General scenario description #
The database is meant to retain data about various *People* that have worked on the Zelda series. I want to keep track of the *Console(s)* that a *Game* can be played on, *Employees* that have had a part in the *Games* development, the *Jobs* the *Employee* had (many *Employees* worked on different *Jobs* across multiple *Games*), etc.
# Business rules #
- Multiple *Employees* can work on multiple *Games*.
- Multiple *Games* can be on the same *Console*.
- Multiple *Consoles* can be a platform for the same *Game*.
- Multiple *Employees* can have the same *Job*.
- An *Employee* can have multiple *Jobs*.
- A *Game* can have multiple *Employees*.
- A *Game* can have multiple types of *Jobs* in its development.
- Multiple *Games* can have the same type of *Job* attached.
- A *Console* can have multiple *People* working on it.
- A *Person* can work on multiple *Consoles*.
# Attribute names and sample values #
- *Employee Name*, which can be split into *First* and *Last* (for example “John” and “Doe” )
- *Game Title* (for example “Ocarina of Time”)
- *Job Title* (for example “Level Design”, “Director”, “Composer”, “Level Designer”, “Programmer”, “Localization”, etc.).
- *Console Name* (for example “Game Boy Advance”)
# The issue #
So far, it seems no matter what I design there are data redundancies and M:N relationships between the entity types of interest everywhere. However I feel that database designers must run into this kind of problem all the time, so there must be a solution.
*Note*: I am well able to find the data to *fill* the table, the problem is organizing it into a database with tables in a normalized form.
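For illustration, one possible normalized decomposition, sketched with invented names: plain entity tables, a junction table for the Game/Console M:N, and a single three-way junction for "which person did which job on which game".

    -- entity tables
    CREATE TABLE game    (game_id    INT PRIMARY KEY, title      VARCHAR(100) NOT NULL);
    CREATE TABLE console (console_id INT PRIMARY KEY, name       VARCHAR(100) NOT NULL);
    CREATE TABLE person  (person_id  INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50));
    CREATE TABLE job     (job_id     INT PRIMARY KEY, title      VARCHAR(100) NOT NULL);

    -- Game <-> Console is a plain M:N junction
    CREATE TABLE game_console (
        game_id    INT NOT NULL REFERENCES game (game_id),
        console_id INT NOT NULL REFERENCES console (console_id),
        PRIMARY KEY (game_id, console_id)
    );

    -- "which person did which job on which game" as one three-way junction;
    -- "who worked on which console" then follows via game_console
    CREATE TABLE game_person_job (
        game_id   INT NOT NULL REFERENCES game (game_id),
        person_id INT NOT NULL REFERENCES person (person_id),
        job_id    INT NOT NULL REFERENCES job (job_id),
        PRIMARY KEY (game_id, person_id, job_id)
    );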
dpolaristar
(193 rep)
Jun 6, 2016, 04:20 PM
• Last activity: Jun 28, 2019, 03:48 PM
0
votes
1
answers
94
views
Does any MariaDB engine have the option to declaratively or configuratively abstract highly-redundant data?
I have an application which is architected in a "NoSQL" style around one fully denormalized "main table" which currently just holds a primary key and one JSON-valued column. For reasons which are outside the scope of this question, I want to retain this architecture: I do NOT want to go full relational and create a proper normalized data model with entities, foreign keys, etc. I would be willing to parse out the JSON fields into their own columns, if that helps with the question below.
The data has a very high level of redundancy, e.g. some long-text column values such as names, addresses, and descriptions may be repeated 20 or 30+ times across different rows.
**Is there a MariaDB engine which can declaratively or configuratively deal with this type of redundancy by internally de-duplicating the values across rows?**
I imagine different ways this might be implemented in MariaDB would involve an extension to the BLOB or TEXT data types to leverage a content-based hash along with reference-counting or garbage collection.
What is meant by declaratively: it can be accomplished via a single "ALTER TABLE" statement. What is meant by configuratively: it can be accomplished by modifying one or more system variables.
I have read about ColumnStore, TokuDB, Mroonga, Parquet, and MyRocks but I'm not quite sure this is what they do, as the documentation is very sparse.
This question is specifically about a storage engine to help internally optimize the storage of redundant data. Please refrain from telling me to rearchitect, redesign, or refactor the application.
Alex R
(207 rep)
Mar 2, 2019, 08:24 PM
• Last activity: Mar 9, 2019, 03:46 AM