
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

-1 votes
1 answers
167 views
What data "MUST" be stored inside a "relational database"?
I am working on a mobile/web app like Instagram, and I want to know which parts of the app are better served by a relational database and which by a non-relational one. I have chosen MySQL and Cassandra, and this is my research result so far:

- Relational databases:
  - For services that need as much consistency as possible, such as payment, ordering, and authentication services.
- Non-relational databases:
  - For services where we need to summarize a lot of different data in a small table instead of many columns with null values, such as user services with a lot of personal settings.
  - For when we need horizontal scalability and want a more distributed system across different datacenters/clouds.
  - For faster read/write-heavy systems.

But I am still wondering:

1. Which database is more appropriate for uploaded files (images, videos, documents)?
2. Which database is better suited to posts/comments/likes, lists of friends, and other user-related things that are also related to other users? (***I mean a post or a comment belongs to one user, but all the other users can also see it, and it may affect them.***)
best_of_man (117 rep)
Dec 25, 2022, 06:00 PM • Last activity: Jul 12, 2025, 11:05 AM
-3 votes
2 answers
503 views
What do developers need to know about Read Committed Snapshot Isolation?
My developers have only ever worked with Read Committed, but my new server has Read Committed Snapshot Isolation (RCSI) enabled. What do they need to do differently when writing application code?
J. Mini (1225 rep)
Feb 29, 2024, 07:04 PM • Last activity: Oct 2, 2024, 06:53 AM
-3 votes
1 answers
69 views
Are there mnemonics in the database design space that you find helpful when communicating database design concerns?
A popular and useful mnemonic in the software development space is SOLID:

- **S**ingle responsibility
- **O**pen–closed
- **L**iskov substitution
- **I**nterface segregation
- **D**ependency inversion

Are there **mnemonics** in the database design space that you find helpful when communicating database design concerns with software developers?

The normal forms are the first to come to mind:

- UNF: Unnormalized form
- 1NF: First normal form
- 2NF: Second normal form
- 3NF: Third normal form
- BCNF: Boyce–Codd normal form
- Don't think I have ever made it past BCNF :-)

However, the forms are often perceived as 'academic' by a development team in the public sector/banking/insurance space. Often teams will 'discover' an EAV solution (**E**ntity **A**ttribute **V**alue schema) and think they have solved all persistence problems, and then they code a *small* persistence framework around their EAV "innovation".

I find EAV schemas very problematic and I am looking for something like SOLID, or anything more approachable than "Boyce Codd Third Normal Form", to help communicate concerns and ideas about what 'good' database design looks like.
Brian (135 rep)
Sep 2, 2024, 04:03 PM • Last activity: Sep 3, 2024, 09:57 AM
1 votes
2 answers
307 views
Efficient way to store and retrieve big amount of data
I need to store a large amount of data (about 2.5 billion new rows per month), and I also need a very fast way to retrieve the latest value per group as of a specific point in time. The data looks very simple:

| ParameterId | Value | DateTime            |
|-------------|-------|---------------------|
| 1           | 12.5  | 2023-04-21 14:35:03 |
| 2           | 56.81 | 2024-03-01 16:21:17 |
| 1           | 12.5  | 2024-05-22 14:35:03 |
| 1           | 71.4  | 2024-05-31 18:27:03 |

For example, we need the latest value of each parameter as of 2024-04-31 17:40. The result will be as follows:

| ParameterId | Value | DateTime            |
|-------------|-------|---------------------|
| 1           | 12.5  | 2023-04-21 14:35:03 |
| 2           | 56.81 | 2024-03-01 16:21:17 |

This looks like it can be solved with simple database storage, but I have some restrictions:

1. Disk storage is limited. That's why indexes cannot be used, as they almost double the data space.
2. The maximum query time for any request is 5 seconds.
3. We have only 1 server.

I've already tried TimescaleDB (table partitioning), but because I only have an upper bound on the datetime (<= dt), it is very inefficient: the search has to walk through old chunks starting from the first one.

Technically it's possible, because we have old software developed by a third-party company 10 years ago that still works, but nobody knows how...
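To make the "latest value per parameter as of a time point" requirement concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table name `readings` and the column names are assumptions made for the illustration, and the sketch only pins down the query semantics; it does not address the storage or no-index constraints above.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory table holding the four sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table/column names; timestamps are ISO-style strings,
    # so plain string comparison orders them chronologically.
    conn.execute(
        "CREATE TABLE readings (parameter_id INTEGER, value REAL, recorded_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [
            (1, 12.5, "2023-04-21 14:35:03"),
            (2, 56.81, "2024-03-01 16:21:17"),
            (1, 12.5, "2024-05-22 14:35:03"),
            (1, 71.4, "2024-05-31 18:27:03"),
        ],
    )
    return conn


def latest_values_as_of(conn, as_of):
    """Return the newest row per parameter whose timestamp is <= as_of."""
    query = """
        SELECT r.parameter_id, r.value, r.recorded_at
        FROM readings AS r
        JOIN (
            SELECT parameter_id, MAX(recorded_at) AS recorded_at
            FROM readings
            WHERE recorded_at <= ?
            GROUP BY parameter_id
        ) AS latest
          ON latest.parameter_id = r.parameter_id
         AND latest.recorded_at = r.recorded_at
        ORDER BY r.parameter_id
    """
    return conn.execute(query, (as_of,)).fetchall()
```

Calling `latest_values_as_of(build_sample_db(), "2024-04-30 17:40:00")` picks the 2023-04-21 row for parameter 1 and the 2024-03-01 row for parameter 2, matching the expected result table.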
the_it_guy (11 rep)
Jun 21, 2024, 04:56 PM • Last activity: Aug 21, 2024, 04:22 PM
0 votes
1 answers
206 views
Does application need to do anything to handle database failover?
I'm trying to understand database failover (mainly in SQL Server) and what handling is required on the application side to get through a DB failover with zero downtime (no failures).

Assume the DBAs have done the best possible set-up for the SQL Servers (a primary and some secondary nodes with replication). Now, if one of the nodes fails (or any node is brought down for scheduled maintenance):

1. What happens to the ongoing queries in SQL Server? Do they fail, or are they transferred to another node?
2. Does the SQL Server JDBC driver connecting to the database see any failures? Will it retry automatically in such cases?

Any insights on these basic questions?
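Whatever the driver does on its own, application code often wraps database calls in its own retry logic. The following is a minimal, driver-agnostic sketch of that pattern; `TransientDatabaseError` is a hypothetical stand-in for whatever connection-lost error a real driver raises, and the attempt count and delays are arbitrary assumptions.

```python
import time


class TransientDatabaseError(Exception):
    """Hypothetical placeholder for a driver's 'connection lost' style error."""


def run_with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a transient error, wait and try again.

    The delay doubles after each failed attempt (0.5s, 1s, 2s, ...).
    The last failure is re-raised so the caller can see it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A retry like this is only safe when `operation` can be repeated as a whole, for example a complete transaction that either committed or did not; retrying half of a unit of work could apply changes twice.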
Hitesh (3 rep)
Feb 19, 2024, 10:52 AM • Last activity: Feb 19, 2024, 03:35 PM
-1 votes
2 answers
155 views
Need help on database design for 5 entities complex relations
### What it is

I have many stories (with many lines in each). Each story is translated into many languages, and each language has many accents. I want to let the user choose languages (any number, but only from the languages that cover all stories) and then choose exactly one accent from the accents of each chosen language.

### Data example

Suppose we have 2 stories, each translated into 3 languages, and each language has 5 accents (audio vocals).

### Data access requirement

Now suppose the user has chosen **Language 2 with accent 1** and **Language 3 with accent 4**. The audio will then be played as follows:

1. Story 1 > Line 1 from Language 2 of Accent 1 **then** Story 1 > Line 1 from Language 3 of Accent 4
2. Story 1 > Line 2 from Language 2 of Accent 1 **then** Story 1 > Line 2 from Language 3 of Accent 4
3. // continues till the end of the story, then starts the next story and plays it the same way

### Possible entities/tables

1. Stories
2. Lines
3. Languages
4. Accents
5. Audios

Obviously I made these up; I need help here.

### Where the help is needed

To fulfill my intended use of the data, what should the tables look like, and what would the relations between them be (like stories hasMany languages, things like that)?

And how can I query the data to offer the user the first selection described in the "What it is" section at the top?

About me: I'm experienced in software development but not that much in complex schema design. I (think I) can handle fairly complex DB stuff, but obviously not this.

Info: I'm doing it in MySQL (MariaDB).
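One possible reading of the five entities can be sketched with Python's built-in `sqlite3` (the question targets MySQL/MariaDB, so this is illustrative only). All table and column names, the rule that a language "covers" a story when any of its accents has audio for that story, and the sample file names are assumptions, not a confirmed design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE stories   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE lines     (id INTEGER PRIMARY KEY,
                        story_id INTEGER NOT NULL REFERENCES stories(id),
                        line_no INTEGER NOT NULL,
                        UNIQUE (story_id, line_no));
CREATE TABLE languages (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE accents   (id INTEGER PRIMARY KEY,
                        language_id INTEGER NOT NULL REFERENCES languages(id),
                        name TEXT NOT NULL);
CREATE TABLE audios    (line_id INTEGER NOT NULL REFERENCES lines(id),
                        accent_id INTEGER NOT NULL REFERENCES accents(id),
                        file_path TEXT NOT NULL,
                        PRIMARY KEY (line_id, accent_id));
"""


def build_sample_db():
    """Two stories of two lines each; language 1 covers both, language 2 only story 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO stories VALUES (?, ?)", [(1, "Story 1"), (2, "Story 2")])
    conn.executemany("INSERT INTO lines VALUES (?, ?, ?)",
                     [(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2)])
    conn.executemany("INSERT INTO languages VALUES (?, ?)",
                     [(1, "Language 1"), (2, "Language 2")])
    conn.executemany("INSERT INTO accents VALUES (?, ?, ?)",
                     [(1, 1, "Accent 1"), (2, 2, "Accent 2")])
    audio_rows = [(line_id, 1, f"line{line_id}_accent1.mp3") for line_id in (1, 2, 3, 4)]
    audio_rows += [(line_id, 2, f"line{line_id}_accent2.mp3") for line_id in (1, 2)]
    conn.executemany("INSERT INTO audios VALUES (?, ?, ?)", audio_rows)
    return conn


def languages_covering_all_stories(conn):
    """Languages with audio (in any accent) for every story."""
    return conn.execute("""
        SELECT l.id, l.name
        FROM languages AS l
        JOIN accents AS a  ON a.language_id = l.id
        JOIN audios  AS au ON au.accent_id = a.id
        JOIN lines   AS ln ON ln.id = au.line_id
        GROUP BY l.id, l.name
        HAVING COUNT(DISTINCT ln.story_id) = (SELECT COUNT(*) FROM stories)
        ORDER BY l.id
    """).fetchall()


def playback_order(conn, accent_ids):
    """Audio rows ordered by story, then line, then the user's accent order."""
    placeholders = ", ".join("?" * len(accent_ids))
    rows = conn.execute(f"""
        SELECT ln.story_id, ln.line_no, au.accent_id, au.file_path
        FROM lines AS ln
        JOIN audios AS au ON au.line_id = ln.id
        WHERE au.accent_id IN ({placeholders})
    """, list(accent_ids)).fetchall()
    rows.sort(key=lambda r: (r[0], r[1], accent_ids.index(r[2])))
    return rows
```

Here `audios` is keyed by `(line_id, accent_id)`, so each line has at most one recording per accent, and the accent's language follows from `accents.language_id` rather than being stored twice.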
Md. A. Apu (123 rep)
Feb 12, 2024, 08:28 PM • Last activity: Feb 17, 2024, 09:49 AM
2 votes
2 answers
811 views
When should I reference the User record or User Profile record
I know it is a common paradigm to separate tables when building a User profile. For example, having a `user` table, and another table called `user_profile` with a foreign key to the `user` table. My understanding is that the `user` table is better for sensitive user account data or authentication data, such as email, password, user type etc. The profile table could have additional data of that user like first name, last name, date of birth and more.

But what about other data that can be related to the user and can also be modeled with a table? Some examples could be Payments and Transactions. My first guess is to link those to the profile table, so that I don't have to make any joins just to have Transactions and the name of the User together (which the `user` table does not have). Linking to the profile, I have the user info and the Transactions.

But then, when is it useful to link to the `user` table? What are the common paradigms? Thanks in advance!
chris (121 rep)
May 15, 2020, 09:21 PM • Last activity: Dec 4, 2023, 09:51 AM
-3 votes
1 answers
41 views
What database model to use?
I am about to start a SaaS application for companies (clients of the app) to manage their employees and customers. The application would have different roles in it, such as admin, manager, etc. Do you use a separate database for each client or a single database for all clients in such a project? What database would be suitable for that?
jon (1 rep)
Sep 6, 2023, 10:49 AM • Last activity: Sep 7, 2023, 02:57 PM
0 votes
2 answers
126 views
Database design for tracking student exam performance at topic level with possibly 10 billion rows
I am developing a test/exam solution where we have around 100K questions (MCQ / objective type). Each question belongs to a topic and each topic belongs to a subject. Students practice these questions as part of weekly/monthly tests, but students can also attempt questions just as a practice workout, without any test.

## Tables

**subject**
- id
- name

**Topic**
- id
- name
- subject_id

**Question**
- id
- text
- options columns
- correct_option
- difficulty

**topic_questions**
- topic_id
- question_id

**question_attempts**
- user_id
- question_id
- was_correct // if the user got this question right
- attempt_date

## Data size

- Need to support around 100K questions
- Could have 100K users
- question_attempts could grow as big as 100K*100K rows or more.

Users subscribe and use the app over a period of 2 to 3 years, during which they take tests, practice questions as workouts, etc. All of the students are competing for a single exam.

## Use cases I need to support

- Calculate a question difficulty score based on successful/failed attempts
- Get a list of unattempted questions for users to practice
- Each time the user clicks next, get an unattempted question for a given topic or subject
- Calculate a score for the user's strength/weakness per subject and topic based on the user's successful/failed question attempts
- Get a list of questions which the user attempted last week but failed, and let the user practice them
- Determine the user's overall rank among other users based on their attempts

The major problems I see are:

- Finding unattempted questions, as it would need a join between question and question_attempts, which is huge
- Querying last week's failed questions for a user

For calculating topic/subject performance, I could run jobs at midnight and could get away with lower performance.

Currently we use MySQL.
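The two per-user lookups flagged as problematic can be written so that each one touches only a single user's attempts. Below is a minimal sketch with Python's built-in `sqlite3` (the question uses MySQL, so treat the SQL as illustrative); the composite index on `(user_id, question_id)` and the sample data are assumptions.

```python
import sqlite3


def build_sample_db():
    """Small in-memory copy of topic_questions and question_attempts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE topic_questions (topic_id INTEGER, question_id INTEGER);
        CREATE TABLE question_attempts (
            user_id INTEGER, question_id INTEGER,
            was_correct INTEGER, attempt_date TEXT);
        -- Assumed index: lets the NOT EXISTS probe below be a lookup per question.
        CREATE INDEX idx_attempts_user_question
            ON question_attempts (user_id, question_id);
    """)
    conn.executemany("INSERT INTO topic_questions VALUES (?, ?)",
                     [(10, 1), (10, 2), (10, 3), (20, 4)])
    conn.executemany("INSERT INTO question_attempts VALUES (?, ?, ?, ?)", [
        (7, 1, 1, "2023-05-20"),
        (7, 2, 0, "2023-05-30"),
        (8, 3, 0, "2023-05-31"),
    ])
    return conn


def next_unattempted(conn, user_id, topic_id, limit=1):
    """Question ids in a topic that this user has never attempted."""
    rows = conn.execute("""
        SELECT tq.question_id
        FROM topic_questions AS tq
        WHERE tq.topic_id = ?
          AND NOT EXISTS (
              SELECT 1 FROM question_attempts AS qa
              WHERE qa.user_id = ? AND qa.question_id = tq.question_id)
        ORDER BY tq.question_id
        LIMIT ?
    """, (topic_id, user_id, limit)).fetchall()
    return [r[0] for r in rows]


def failed_since(conn, user_id, since_date):
    """Distinct question ids the user got wrong on or after since_date."""
    rows = conn.execute("""
        SELECT DISTINCT question_id
        FROM question_attempts
        WHERE user_id = ? AND was_correct = 0 AND attempt_date >= ?
        ORDER BY question_id
    """, (user_id, since_date)).fetchall()
    return [r[0] for r in rows]
```

Note that `failed_since` also returns a question the user failed and later answered correctly within the window; whether that counts as "failed" is a product decision the question leaves open.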
Sudhir N (101 rep)
Jun 1, 2023, 07:05 AM • Last activity: Jun 3, 2023, 05:59 PM
0 votes
1 answers
58 views
Changing data type of an existing column of SQL database
One of the fields in an existing SQL table is Decimal (5,5) with no foreign key relationships. The developers want to change it to Decimal (7,5). What would be the downside of making such a change to a column that has existing data? Is there any chance of the application not working after making such a change?
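For context, in the usual DECIMAL(precision, scale) notation the precision is the total number of digits and the scale is the number of digits after the decimal point. A small sketch of the value ranges involved, using Python's `decimal` module (the helper names are made up for the illustration):

```python
from decimal import Decimal


def max_abs_value(precision, scale):
    """Largest magnitude representable in DECIMAL(precision, scale)."""
    return Decimal(10) ** (precision - scale) - Decimal(10) ** (-scale)


def fits(value, precision, scale):
    """True if the magnitude of value is within the column's range."""
    return abs(value) <= max_abs_value(precision, scale)
```

Under that reading, Decimal (5,5) has no digits before the decimal point and tops out at 0.99999, while Decimal (7,5) allows two integer digits and tops out at 99.99999, so every value that fits the old type also fits the new one.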
SQL_NoExpert (1117 rep)
Feb 2, 2023, 02:07 AM • Last activity: Feb 7, 2023, 08:50 AM
0 votes
0 answers
157 views
Read+Write intense queries - should I split into a read-write and a readonly db - and then replicate data?
## Background and problem - Providers read/write and block, end-users read

I run a system (up-to-date availability of resources) that constantly takes in a fair amount of data points from many different providers (think sources of data like scraped sites, consumed APIs, etc.), resulting in a lot of reads/inserts/deletes. At any given time, ~2-3 providers are reading/writing in the table. There are about 2 million rows in the table.

### Providers are OK

The providers reading/writing works just fine - it does not matter if that process is sometimes a bit slow. No performance issues of any concern here.

### Users not so much, sadness ensues

At the same time, users are querying the same database/table, and sometimes this seems to mean that blocking results in very long query times. A normal query time is 100ms, but every so often, 20-30 sec queries happen - which is not great.

## Potential solution

Other than throwing more virtual metal at the problem, I have considered a different design:

### Two databases, replicated

- One database for all the providers to mess with, read, write
- Another database in which this table is read-only, so (hopefully) nothing will block user queries
- Replication from one to the other every 5-10 minutes (which I hope is intelligent enough to not cause blocking)

How does that sound? Would I be back at square one, because the replication causes the same issues?
Kjensen (189 rep)
Jan 3, 2023, 05:13 PM
4 votes
3 answers
459 views
For what type of data it's better to use relational, and for what type of data, non-relational databases?
I am trying to write my first big backend project. This is a mobile/web application like Instagram but for different purposes. Searching the internet, I found that Instagram uses PostgreSQL and Cassandra as its main databases, but I don't know which database it uses for which purpose, type, or part of its data. Does anyone know more about the databases Instagram uses? Or, in general, how do I decide for which services or types of data/application it's better to use SQL or NoSQL databases?
user20551429 (69 rep)
Dec 1, 2022, 05:16 PM • Last activity: Dec 5, 2022, 06:59 PM
2 votes
1 answers
110 views
Can we use Cassandra in place of Hadoop with Spark?
Suppose we have a backend written in NodeJS that uses MySQL and Cassandra as its databases. If we want to add Spark to the system to do some data analysis, such as recommendations, can we do it with Cassandra (I mean Spark + Cassandra) and reach the same result as we could with Hadoop (Spark + Hadoop)? I want to know what Hadoop can do that Cassandra cannot, or what would make it essential to use Hadoop alongside Spark.
user20551429 (69 rep)
Nov 29, 2022, 04:41 AM • Last activity: Nov 29, 2022, 05:09 AM
1 votes
1 answers
790 views
Adding soft delete to a database after having used hard delete
**Introduction**

My app collects data from a centralized source where many different users can submit data about their organisation and their staff. Previously we just hard deleted a user's data when they were no longer present in the source of truth, because it used to be reliable. But a change to some software the clients use messes with everything. They now DELETE all their data multiple times per month when they submit data. This is by mistake and due to a terrible design. It means they lose the data for the users in our system and have to re-enter parts of it.

The vendor of the software they use is stubborn and won't change the behaviour. We have tried educating the users about how to use it, but they don't learn. So now the last option is to soft delete the data for a certain time period.

Having looked at multiple Stack Overflow posts and blogs around the web, I don't really fancy any of the options, i.e. adding a column to the tables that need to be soft deleted. I started looking because that was my first instinct as well, but I don't really like it and its implications. I was wondering if you could give me some feedback on a different idea. I have no experience with maintaining soft deletion and I don't know if my thought is terrible.

**Diagram and relations**

Simple diagram to show some of the relations

There is a user, whose unique identifier is the same across multiple orgs. Per affiliation with an org, the user has some user information like name, title, etc. In our system they have one status row, because it is the same in our app no matter which org they choose to connect as.

So if I follow the conventional way of adding columns for soft deleting, I would have to add one to each of the unique tables that contain user data, because a user's affiliation to a certain org might be deleted while the user still lives on in our system through another org. But it seems like a hassle, and a lot of change in the nitty gritty of my code, to rework things to account for all these extra columns.

**Idea**

In my mind it would be simpler if I added a separate table containing the following:

- UniqueUserIdentifier
- UniqueOrgIdentifier
- SoftDeleteDate

Then, whenever my app asks for data, the API checks the new table: "is this person soft deleted from this org?" If true, it just blocks the request until they are restored if needed, or they remain deleted until they are hard deleted within x hours of the soft deletion happening. This would avoid having to change many queries and logic all over the place.

**Additional information**

The API uses EFCore as an ORM to connect to the database, in case that would help with any other smart fixes regarding its feature set. I have thought about creating custom SaveChanges logic, but couldn't come up with a good idea other than, again, adding a column to all the tables.

Please let me know if you need any more information.

**Update**

J.D. told me about row-level security, which made me look around. It seems very useful, and it gave me some more insight into what I could search for. So I came across global query filters for EFCore, which seem promising. They allow the context to filter all queries, and when you actually need to ignore the global filter, you can simply do it on a query-by-query basis. They also allow for dependency injection if you need to use something for the global filter that is based on the user that is connected. I created an answer based on this new information.

It also turns out that what I really wanted was to deactivate the row until eventual activation or hard delete, instead of soft delete. I didn't know the correct way to express myself.
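The separate-table idea can be sketched in a few functions. This is a minimal illustration in Python with the built-in `sqlite3`, not EFCore code; the table name `soft_deletes`, the snake_case column names, and the function names are all hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
import sqlite3
from datetime import datetime, timedelta


def create_soft_delete_table(conn):
    """One row per (user, org) affiliation that is currently soft deleted."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS soft_deletes (
            user_id TEXT NOT NULL,
            org_id TEXT NOT NULL,
            soft_delete_date TEXT NOT NULL,
            PRIMARY KEY (user_id, org_id))
    """)


def soft_delete(conn, user_id, org_id, when):
    """Mark the affiliation as soft deleted at the given datetime."""
    conn.execute("INSERT OR REPLACE INTO soft_deletes VALUES (?, ?, ?)",
                 (user_id, org_id, when.isoformat(timespec="seconds")))


def restore(conn, user_id, org_id):
    """Undo a soft delete, e.g. when the org re-submits the user."""
    conn.execute("DELETE FROM soft_deletes WHERE user_id = ? AND org_id = ?",
                 (user_id, org_id))


def is_soft_deleted(conn, user_id, org_id):
    """The check the API would run before serving data for this affiliation."""
    row = conn.execute(
        "SELECT 1 FROM soft_deletes WHERE user_id = ? AND org_id = ?",
        (user_id, org_id)).fetchone()
    return row is not None


def expired_affiliations(conn, now, retention_hours):
    """Affiliations soft deleted at least retention_hours ago (due for hard delete)."""
    cutoff = (now - timedelta(hours=retention_hours)).isoformat(timespec="seconds")
    return conn.execute("""
        SELECT user_id, org_id FROM soft_deletes
        WHERE soft_delete_date <= ?
        ORDER BY user_id, org_id
    """, (cutoff,)).fetchall()
```

A cleanup job would hard delete the data for each pair returned by `expired_affiliations` and then remove the corresponding rows from `soft_deletes`.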
Mikkel (21 rep)
Feb 3, 2022, 01:49 PM • Last activity: Nov 24, 2022, 04:53 AM
0 votes
1 answers
30 views
I need advice for which database system to use
I have to create a database from scratch for the first time and I am not sure which system I should try. I will have specific IP addresses, IP ranges, domain names, and host names, which will be the keys for a given customer. I want my database to be filterable by the customer's key. Each key will have a variety of different forms of data associated with it. For example, a key may have a list of open ports, any identified services, etc. Sometimes a key will be a website, and the data associated with it may be cookie names, URLs, and parameters. If I later find out that one customer's parameter is vulnerable, I want to be able to search through all of the customers for that vulnerability. What database system should I try using? What would be the most flexible and efficient? I don't want to try a database that isn't suited to my scheme.
thanley (11 rep)
Nov 22, 2022, 09:52 AM • Last activity: Nov 22, 2022, 01:23 PM
0 votes
0 answers
39 views
Reorder table data with issues to analysis
I am using SQL Server 2008.

[image]

1. The table of products above was created from data from another table. PRODUCT_SORT is the result of sorting PRODUCT and EVENT_ORDER together.
2. Non-product records were inserted (lines in yellow) from other sources.
3. The table is sorted by SERIALNO and DATE.

[image]

**Problem:**

- In the table above, the products PROD_03 and PROD_04 are OK (green) and PROD_01 and PROD_02 are NOT OK (red).
- The two records of each product must be together and sorted as in the first table at the top.

**The goal is:** Due to registration errors, I found many mistakes, so I have to make some corrections to put the data in order before doing the analysis.

**Brainstorming my ideas:**

- I've been thinking about how to do some kind of sorting.
- Maybe have a product table and a non-product table, then create a loop and insert records from one and the other.

**I am here just to ask for some approaches** on how to resolve these problems. That's it, **no code needed!**

**Result table**

- The two records of each product must be together and sorted as in the first table at the top.
- Non-product records must be between two different products, ordered by date.

[image]

Regards, Elio Fernandes
Elio Fernandes (169 rep)
Aug 26, 2022, 05:12 PM • Last activity: Aug 28, 2022, 06:58 PM
0 votes
0 answers
51 views
Postgres execute requests isolatedly
Users submit a schema and a query. I need to execute them and return the result. But if multiple requests mention the same table in their schema, I get conflicts (which is expected). I can delete the schema and data immediately after execution; there is no need to keep them. I am still unable to make this work for concurrent requests. Is there a way to achieve this?
Forece85 (101 rep)
May 27, 2022, 05:14 AM
0 votes
0 answers
109 views
Is this database correctly structured?
I am in the process of converting a bus timetable to a database structure. I am building a free mobile app for a specific itinerary in my country that will:

- let users know when their next bus is due to arrive,
- tell them how long it will take to arrive at their destination,
- alert them x minutes before their planned trip, etc...

The app will be implemented in Python with Kivy and a SQLite database. It will be able to run offline.

This is the original timetable:

[image]

And this is what I have come up with...

[image]

Code:

    Table stop as S {
      stop_id int [pk, increment] // auto-increment
      name varchar
      coordinates_id int [ref: - C.coordinates_id]
    }

    Table coordinates as C {
      coordinates_id int
      latitude float
      longitude float
    }

    Table user as U {
      user_id int [pk, increment]
      name varchar
      tax_number int
      buspass_id int [ref: - B.buspass_id]
    }

    Table buspass as B {
      buspass_id int
      start_stop int [ref: - S.stop_id]
      end_stop int [ref: - S.stop_id]
      validity datetime
    }

    Table route as R {
      route_id int [pk, increment]
      disabled boolean
      route_frequency enum
      route_direction enum
      route_type enum
    }

    Table timetable as TT {
      timetable_id int [pk, increment]
      route_id int [ref: > R.route_id]
      stop_id int [ref: > S.stop_id]
      time timestamp
    }

    //----------------------------------------------//

    Enum route_direction {
      coimbra_serpins
      serpins_coimbra
    }

    Enum route_frequency {
      daily
      not_saturday
      wednesday
      not_sunday_or_holidays
      not_weekend_or_holidays
      saturday
      school [note: "not weekends, holidays or July and August"]
      not_july_or_august
    }

    Enum route_type {
      direct_miranda
      direct_lousa
      semidirect_miranda_lousa
      semidirect_lousa_serpins
    }

I will need to do a bunch of queries such as:

- Find the next departure based on the current time, the departure location, and the arrival location, along with its arrival time.
- List all stops of the ideal route, with arrival times for each.
- Know if a bus pass is almost out of date, i.e. when it was purchased.

I would like to know if this structure seems appropriate for my end goal.
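As a quick check of the first query against the `timetable` table, here is a minimal sketch using Python's built-in `sqlite3`, the stack named in the question. It assumes times are stored as zero-padded `HH:MM` strings and that each `route_id` is a single run of the bus, and it ignores `route_frequency`, `route_direction`, and holidays entirely; the sample rows are invented.

```python
import sqlite3


def build_sample_db():
    """Two runs: route 1 stops everywhere, route 2 goes directly from stop 1 to stop 3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE timetable (
            timetable_id INTEGER PRIMARY KEY AUTOINCREMENT,
            route_id INTEGER NOT NULL,
            stop_id INTEGER NOT NULL,
            time TEXT NOT NULL)
    """)
    conn.executemany(
        "INSERT INTO timetable (route_id, stop_id, time) VALUES (?, ?, ?)",
        [
            (1, 1, "08:00"), (1, 2, "08:20"), (1, 3, "08:45"),
            (2, 1, "09:00"), (2, 3, "09:40"),
        ],
    )
    return conn


def next_departure(conn, origin_stop, destination_stop, now):
    """Earliest (route_id, departure, arrival) leaving origin at or after now.

    The self-join keeps only runs that reach the destination after
    leaving the origin, which also rules out the opposite direction.
    """
    return conn.execute("""
        SELECT dep.route_id, dep.time, arr.time
        FROM timetable AS dep
        JOIN timetable AS arr
          ON arr.route_id = dep.route_id
         AND arr.stop_id = :destination
         AND arr.time > dep.time
        WHERE dep.stop_id = :origin
          AND dep.time >= :now
        ORDER BY dep.time
        LIMIT 1
    """, {"origin": origin_stop, "destination": destination_stop, "now": now}).fetchone()
```

The function returns `None` when no later run connects the two stops, which is the case the app would need to handle at the end of the day.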
Steffan (101 rep)
Nov 28, 2021, 02:41 AM • Last activity: Nov 29, 2021, 03:32 AM
1 votes
1 answers
250 views
Permission for business logic layer & application
I'm very new to db development and currently working on my first production app. I learned that I would need a business logic layer (BLL) to authenticate and authorize users; for example, John can only query the database while Andrew can insert new records. The following questions need clarification:

- Does it mean the BLL would have to connect to the database with the greatest privilege necessary, instead of the least privilege needed for each user?
- Will the BLL need INSERT permission to provide service to Andrew, which is more than John needs?
- Can we solve this potential flaw (other than by securing the BLL better, which I would of course do)? For example, by implementing authorization in the database layer (as described here)?
Ryan (313 rep)
Jan 30, 2017, 10:00 AM • Last activity: Aug 31, 2021, 09:04 AM
0 votes
1 answers
237 views
B2B application: users has its own users
The B2B scenario I am working on is as follows. These roles exist in the system:

1. Super Admin
2. Super Admin Staff
3. Owner (client)
4. Admin (the owner, i.e. the client, creates admins to define sales counters)
5. Sales Counters

That means there are users who will create their own sub-users, e.g.

**[FLAG-A]** The role "SUPER ADMIN" creates the role "OWNER", and "OWNER" creates its own sub-user roles, i.e. "ADMIN" and "SALES COUNTER". Each role obviously has its own login information for the system.

I tried to design the database as follows:

[image]

In this diagram I am assuming that a main user creates sub-users, which in my view is a many-to-many relationship, so I have to add a table between them called "USER_PERSON". The problem is recording "WHO CREATED USER" in the system, i.e. which user created a sub-user as described in **[FLAG-A]**.

Please check this diagram:

[image]
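One common way to record "who created this user" is a self-referencing column on the user table rather than a separate link table. The following is a minimal sketch of that alternative with Python's built-in `sqlite3`; the `users` table, the `created_by` column, and the sample rows are assumptions for illustration and are not taken from the diagrams.

```python
import sqlite3


def build_sample_db():
    """Super admin (1) creates owner (2); the owner creates an admin (3) and a sales counter (4)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            username TEXT NOT NULL,
            role TEXT NOT NULL,
            created_by INTEGER REFERENCES users(id))  -- NULL for the root user
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
        (1, "root", "SUPER ADMIN", None),
        (2, "acme", "OWNER", 1),
        (3, "acme_admin", "ADMIN", 2),
        (4, "acme_till_1", "SALES COUNTER", 2),
    ])
    return conn


def direct_sub_users(conn, user_id):
    """Ids of users created directly by user_id."""
    rows = conn.execute(
        "SELECT id FROM users WHERE created_by = ? ORDER BY id", (user_id,)
    ).fetchall()
    return [r[0] for r in rows]


def all_descendants(conn, user_id):
    """Ids of every user created by user_id, directly or through sub-users."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM users WHERE created_by = ?
            UNION ALL
            SELECT u.id FROM users AS u JOIN subtree AS s ON u.created_by = s.id
        )
        SELECT id FROM subtree ORDER BY id
    """, (user_id,)).fetchall()
    return [r[0] for r in rows]
```

This models creation as one-to-many (each user has at most one creator), which fits the [FLAG-A] description; a link table would only be needed if a user could have several creators.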
Murteza (5 rep)
Feb 23, 2021, 09:22 PM • Last activity: Feb 24, 2021, 09:23 AM
Showing page 1 of 20 total questions