Sample Header Ad - 728x90

Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

15 votes
4 answers
6701 views
Foreign keys - link using surrogate or natural key?
Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is [Jack Douglas' answer in this question][1], and his reasoning seems sound to me. I'm aware of the discussion be...
Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is Jack Douglas' answer in this question , and his reasoning seems sound to me. I'm aware of the discussion beyond that that rules change, but this would be something that would need to be considered in any situation. The main reason for asking is that I have a legacy application that makes uses of FKs with natural keys, but there is a strong push from devlopers to move to an OR/M (NHibernate in our case), and a fork has already produced some breaking changes, so I'm looking to either push them back on track using the natural key, or move the legacy app to use surrogate keys for the FK. My gut says to restore the original FK, but I'm honestly not sure if this is really the right path to follow. The majority of our tables already have both a surrogate and natural key already defined (though unique constraint and PK) so having to add extra columns is a non-issue for us in this insance. We're using SQL Server 2008, but I'd hope this is generic enough for any DB.
Callie J (492 rep)
Apr 2, 2013, 12:17 PM • Last activity: Sep 6, 2023, 07:07 AM
2 votes
1 answers
1049 views
How to get the "best of both worlds" with natural and surrogate keys? And could DBMSs be better?
I am designing my first database, and I find myself frustrated by the choice between storing an integer or a string for each instance of a categorical variable. My understanding is that if I have a table containing cities that I want to make a child of a table of countries, the most performant way t...
I am designing my first database, and I find myself frustrated by the choice between storing an integer or a string for each instance of a categorical variable. My understanding is that if I have a table containing cities that I want to make a child of a table of countries, the most performant way to do that is to have the PK of the countries table as a FK in in the table of cities. However for ease of use and debugging, it's nice to always have the string name associated with the country PK. Every solution I have considered either is not recommended or seems overly complex. I'd like opinions the merits of these approaches (or hear about new ones) and also to understand if it has to be this way or if databases simply are this way because of tradition. Possible approaches: 1. Use a string as a PK for countries. Then I will have a human-readable FK for it in any child tables. Obviously less performant than using integers, but I suspect it may be the least worst way to have the convenience I desire. 2. Create a view using application logic that join each the string name of the country to the states table. - I don't love this because if the application logic breaks, the tables become less readable. Also I would expect large join operations to have an even worse performance penalty than string PK/FKs. 3. Create a separate table to connect numeric IDs with the appropriate string ID. I'm not sure if it would be better to have a table coding each type of relation, or one big table with one big pool of IDs that cover all integer key-string value relations. I could then use application logic to look up the appropriate strings and fill the appropriate PK into the child table when it's string name is given by a user. - I feel like this might be pretty resource intensive too, as there would have to be a lookup every time a new row was added to the child. It also means that I would still have to create the views I want. 4. Use enum data type. Instinctively, this would be my go-to approach, as it seems the ideal balance between natural and synthetic keys: Use integer IDs and give the IDs a string label so that the string itself need not be repeated. - Unfortunately my research has found that this is not recommended. One reason for that is that categories cannot be deleted easily. I'm not sure if that is dealbreaker for me, but I also wonder why DBMSs are designed this way. Aren't categorical variables commonly used enough to add convenience features for them?
Stonecraft (125 rep)
Jul 24, 2022, 06:43 PM • Last activity: Jul 25, 2022, 12:29 PM
1 votes
2 answers
6092 views
PostgreSQL surrogate keys: use sequence/bigint or OID?
I'm looking at a need for surrogate keys in tables I'm designing on a PostgreSQL-derived DBMS ([Citus][1]). Would OIDs suffice? Is there a downside to using them instead of creating a `bigint` field and a sequence? [1]: https://www.citusdata.com/
I'm looking at a need for surrogate keys in tables I'm designing on a PostgreSQL-derived DBMS (Citus ). Would OIDs suffice? Is there a downside to using them instead of creating a bigint field and a sequence?
Andrew Wolfe (584 rep)
Feb 1, 2016, 05:43 PM • Last activity: Sep 27, 2021, 12:04 AM
2 votes
1 answers
2838 views
Identity versus Composite values for Primary Keys in Many-To-Many Tables
I am in the middle of an interesting "Data Architecture" discussion at a place I am contracting & need some input. As a habit, I always create a **Surrogate Key** as the **Primary Key** in my tables - be they Guid (UUID) or Identity values. IMHO the *synthetic-nature* of a Surrogate Key has the dist...
I am in the middle of an interesting "Data Architecture" discussion at a place I am contracting & need some input. As a habit, I always create a **Surrogate Key** as the **Primary Key** in my tables - be they Guid (UUID) or Identity values. IMHO the *synthetic-nature* of a Surrogate Key has the distinct value of identifying a row without intruding on "real" or "natural" keys-or-values that may otherwise be useful **Alternate Keys**. To me...it just seems natural - but maybe I am wrong. Where I happen to be at the moment...they argue this *good* for "core" **Entity** tables, but is *completely incorrect* for **Many-to-Many** (relationship) tables. While I will happily oblige the customer...I disagree. Thoughts behind my approach are: - A primary key value must be unique - The primary key should be as compact as possible - Primary key value should be stable - Primary Key identifies a tuple...not the data INSIDE the tuple **MY QUESTION:** Which is either correct or better? And Why?
- Please provide reasoning etc. **NOTE:**
Please Ignore the ID column naming in the samples below...that is how they do it here. enter image description here
Prisoner ZERO (129 rep)
Jul 6, 2017, 12:24 PM • Last activity: Aug 16, 2021, 11:01 AM
1 votes
0 answers
49 views
How not to "naturalize" surrogate keys?
> Don't naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advant...
> Don't naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advantages of surrogate keys. > > http://www.agiledata.org/essays/keys.html > Don’t “naturalize” surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow users to work with the value (perhaps via search), you have effectively given the key business meaning. The exposed key in our database could then be considered a natural key in someone else’s. > > https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html And yet, every web framework/ORM I am familiar with (e.g. Ruby on Rails) exposes the surrogate key in the URL which is natural key that users share, send, save, bookmark, etc. In light this of this recommendation, what are the best practices for handling web URLs?
Paul Draper (800 rep)
Aug 5, 2020, 07:18 PM
0 votes
1 answers
560 views
Insert results from query, update old existing row if query doesn't still return same value
I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field. Each day we run a query (a few joins, but quite slow) to get some data to insert into this table: * if the row exists already, and is present in the query, we do nothing * if t...
I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field. Each day we run a query (a few joins, but quite slow) to get some data to insert into this table: * if the row exists already, and is present in the query, we do nothing * if the row exists but the result isn't in todays query, we set 'active' to _false_ * if the row doesn't exist but is in the query, we insert it (with active defaulting to true) It is possible for an entity to be active, made inactive (same row), then a few days later become active again (would be a new row). The table is using a surrogate key, as the natural key can be repeated as noted above, though there should only ever be a single entry that is active on the natural key. I'm curious if there is an action or 'pattern' I can take that will enable such functionality - I realise I'm asking a lot :) I am using a java library (jOOQ) to generate the sql, so I am only looking for sql as the solution, no PL/pgSQL and no stored procs. I doubt it is possible without 2 passes (1. perform the update where not in query result, 2. perform the insert from query result) but I'm intrigued if there are possibilities. Edit: Example table history, where ** is an update to same row pk_surrogate | natural_id | active| date_made_active 1 | 1 | true | 2020-01-01 -- row for entity 1 added 1 ** | 1 | false | 2020-01-01 -- row for entity 1 no longer present on 2020-01-02, so marked inactive 2 | 1 | true | 2020-01-03 -- new row for entity 1 added when it was present on 2020-01-03 3 | 2 | true | 2020-01-03 --new value for 2020-01-03
Mitch Kent (205 rep)
Jul 28, 2020, 08:07 PM • Last activity: Jul 29, 2020, 07:21 AM
-2 votes
1 answers
53 views
If all associated attributes of an intended primary key are identical, is it still a true primary key?
I'll start with an example. If I have a `person` table with intended surrogate primary key `Id`: +----+------+------------+-------------+ | Id | name | DoB | SSN | +----+------+------------+-------------+ | 1 | John | 1901-01-01 | 111-11-1111 | | 2 | Jane | 1902-02-02 | 222-22-2222 | | 3 | John | 19...
I'll start with an example. If I have a person table with intended surrogate primary key Id:
+----+------+------------+-------------+
| Id | name |    DoB     |     SSN     |
+----+------+------------+-------------+
|  1 | John | 1901-01-01 | 111-11-1111 |
|  2 | Jane | 1902-02-02 | 222-22-2222 |
|  3 | John | 1901-01-01 | 111-11-1111 |
+----+------+------------+-------------+
Note Ids 1 & 3 have the same attributes; They both represent the same person. Now from what we know about the theory behind what constitutes a primary key, which I think is well summarized here : > - The primary key must uniquely identify each record. > - A record’s primary-key value can’t be null. > - The primary key-value must exist when the record is created. > - The primary key must remain stable—you can’t change the primary-key field(s). > - The primary key must be compact and contain the fewest possible attributes. Consider the first bullet, "*The primary key must uniquely identify each record.*" In my example, I suppose whether or not each Id does represent uniqueness depends on what's really supposed to be considered unique. A different database record? Yes. A different person (what the records are supposed to represent)? No. So multiple Ids represent what is functionally the same subject generating the data, present in 2 records. A "two to one Id" of sorts. I've not read anything that directly address the scenario my example illustrates, as is relates to what is or is not a PK. 1. Does this example violate the theory behind what constitutes a primary key? 2. If not, does this example illustrate a violation of any larger principles of database architecture, or can this concept be reduced to something as simple as "duplication of data - clean it up"? Many thanks.
simply_sideways (3 rep)
May 5, 2020, 04:33 PM • Last activity: May 5, 2020, 10:02 PM
2 votes
1 answers
412 views
Should a inherited table have a new surrogate key, or just use parent table's key?
I am modeling some things with an "is a" relationship (aka inheritance), and there were two ways to go about it in regards to the inherited table's primary key and foreign key pointing back to the parent table: - Strategy #1 was to always create a new surrogate key on each inherited table, and then...
I am modeling some things with an "is a" relationship (aka inheritance), and there were two ways to go about it in regards to the inherited table's primary key and foreign key pointing back to the parent table: - Strategy #1 was to always create a new surrogate key on each inherited table, and then store the reference to the parent table's primary key in a separate column (and have unique constraint on that column to ensure a 1 to 1 mapping). - Strategy #2 was to make the primary key value of the inherited table have the same value as the parent table's primary key. I think there is a clear winner here (Strategy #2), but I want to make sure there isn't some potential issue I am missing. Here's the code (targeting SQL Server, but maybe generic) elaborating each strategy: ------Strategy #1: always create a surrogate key for a child table (separate column holds parent table's primary key)---------- drop table if exists TruckAsset drop table if exists VehicleAsset drop table if exists Asset create table Asset ( AssetId int not null, primary key (AssetId) ) create table VehicleAsset ( VehicleAssetId int not null, /*surrogate key*/ AssetId int not null, VehicleRegistrationExpirationDate date not null, /*specialization of Asset*/ primary key (VehicleAssetId), constraint UQ_AssetId unique (AssetId), foreign key (AssetId) references Asset(AssetId) ) create table TruckAsset ( TruckAssetId int not null, /*surrogate key*/ VehicleAssetId int not null, CargoCapacityCubicMeters float not null, /*specialization of VehicleAsset*/ primary key (TruckAssetId), constraint UQ_VehicleAssetId unique (VehicleAssetId), foreign key (VehicleAssetId) references VehicleAsset(VehicleAssetId) ) INSERT INTO Asset(AssetId)VALUES(1234) INSERT INTO VehicleAsset(VehicleAssetId, AssetId, VehicleRegistrationExpirationDate) VALUES (5678, 1234, '2020-01-01') INSERT INTO TruckAsset(TruckAssetId, VehicleAssetId, CargoCapacityCubicMeters) VALUES (9999, 5678, 8.80977) GO versus this ------Strategy #2: primary key references parent table's primary key directly---------- drop table if exists TruckAsset drop table if exists VehicleAsset drop table if exists Asset create table Asset ( AssetId int not null, primary key (AssetId) ) create table VehicleAsset ( VehicleAssetId int not null, VehicleRegistrationExpirationDate date not null, /*specialization of Asset*/ primary key (VehicleAssetId), foreign key (VehicleAssetId) references Asset(AssetId) ) create table TruckAsset ( TruckAssetId int not null, CargoCapacityCubicMeters float not null, /*specialization of VehicleAsset*/ primary key (TruckAssetId), foreign key (TruckAssetId) references VehicleAsset(VehicleAssetId) ) INSERT INTO Asset(AssetId)VALUES(1234) INSERT INTO VehicleAsset(VehicleAssetId, VehicleRegistrationExpirationDate) VALUES (1234, '2020-01-01') INSERT INTO TruckAsset(TruckAssetId, CargoCapacityCubicMeters) VALUES (1234, 8.80977) ----------------------------------------
Anssssss (248 rep)
Dec 5, 2019, 10:01 PM • Last activity: Dec 6, 2019, 06:00 PM
2 votes
1 answers
583 views
Natural Keys vs Surrogate Keys part 2
A while back, I asked if [surrogate keys provide better performance than natural keys in SQL Server](https://dba.stackexchange.com/questions/50708/do-natural-keys-provide-higher-or-lower-performance-in-sql-server-than-surrogate). [@sqlvogel](https://dba.stackexchange.com/users/1357/sqlvogel) provide...
A while back, I asked if [surrogate keys provide better performance than natural keys in SQL Server](https://dba.stackexchange.com/questions/50708/do-natural-keys-provide-higher-or-lower-performance-in-sql-server-than-surrogate) . [@sqlvogel](https://dba.stackexchange.com/users/1357/sqlvogel) provided an answer to that question yesterday that caused me to revisit it. This question is an attempt to "upgrade" the prior question, and hopefully provide the opportunity for thoughtful answers that help the community. Consider a system for storing details about computers. Each computer has an architecture, and an Operating System. In SQL Server, we could create these tables using natural keys like this: CREATE TABLE dbo.Architecture ( ArchitectureName varchar(10) NOT NULL , ArchitectureVersion decimal(5,2) NOT NULL , ReleaseDate date NOT NULL , CONSTRAINT PK_Architecture PRIMARY KEY CLUSTERED (ArchitectureName, ArchitectureVersion) ); CREATE TABLE dbo.Manufacturer ( ManufacturerName varchar(10) NOT NULL CONSTRAINT PK_Manufacturer PRIMARY KEY CLUSTERED ); CREATE TABLE dbo.OS ( OSName varchar(30) NOT NULL , ManufacturerName varchar(10) NOT NULL CONSTRAINT FK_OS_Manufacturer FOREIGN KEY (ManufacturerName) REFERENCES dbo.Manufacturer(ManufacturerName) , ArchitectureName varchar(10) NOT NULL , ArchitectureVersion decimal(5,2) NOT NULL , CONSTRAINT FK_OS_Architecture FOREIGN KEY (ArchitectureName, ArchitectureVersion) REFERENCES dbo.Architecture(ArchitectureName, ArchitectureVersion) , CONSTRAINT PK_OS PRIMARY KEY CLUSTERED (OSName) ); CREATE TABLE dbo.Computers ( ComputerID varchar(10) NOT NULL CONSTRAINT PK_Computers PRIMARY KEY CLUSTERED , OSName varchar(30) NOT NULL CONSTRAINT FK_Computers_OSName FOREIGN KEY REFERENCES dbo.OS(OSName) , ComputerManufacturerName varchar(10) NOT NULL CONSTRAINT FK_Computers_Manufacturer FOREIGN KEY REFERENCES dbo.Manufacturer(ManufacturerName) , EffectiveDate datetime NOT NULL CONSTRAINT DF_Computers_EffectiveDate DEFAULT (GETDATE()) , ExpiryDate datetime NULL ); To query the dbo.Computers table, with 2 rows in dbo.Computers, showing various details, we could do this: SELECT Computers.ComputerID , Computers.ComputerManufacturerName , OSManufacturer = OS.ManufacturerName , Computers.OSName , OS.ArchitectureName , OS.ArchitectureVersion FROM dbo.Computers INNER JOIN dbo.OS ON Computers.OSName = OS.OSName WHERE Computers.EffectiveDate = GETDATE() OR Computers.ExpiryDate IS NULL) ORDER BY Computers.ComputerID; The query output is:
╔════════════╦══════════════════════════╦════════════════╦════════════╦══════════════════╦═════════════════════╗
║ ComputerID ║ ComputerManufacturerName ║ OSManufacturer ║ OSName     ║ ArchitectureName ║ ArchitectureVersion ║
╠════════════╬══════════════════════════╬════════════════╬════════════╬══════════════════╬═════════════════════╣
║ CM700-01   ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
║ CM700-02   ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
╚════════════╩══════════════════════════╩════════════════╩════════════╩══════════════════╩═════════════════════╝
The query plan for this is quite simple: enter image description here Or, if we choose to use surrogate keys, like this: CREATE TABLE dbo.Architecture ( ArchitectureID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Architecture PRIMARY KEY CLUSTERED , ArchitectureName varchar(10) NOT NULL , ArchitectureVersion decimal(5,2) NOT NULL , ReleaseDate date NOT NULL , CONSTRAINT UQ_Architecture_Name UNIQUE (ArchitectureName, ArchitectureVersion) ); CREATE TABLE dbo.Manufacturer ( ManufacturerID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Manufacturer PRIMARY KEY CLUSTERED , ManufacturerName varchar(10) NOT NULL ); CREATE TABLE dbo.OS ( OS_ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_OS PRIMARY KEY CLUSTERED , OSName varchar(30) NOT NULL CONSTRAINT UQ_OS_Name UNIQUE , ManufacturerID int NOT NULL CONSTRAINT FK_OS_Manufacturer FOREIGN KEY REFERENCES dbo.Manufacturer(ManufacturerID) , ArchitectureID int NOT NULL CONSTRAINT FK_OS_Architecture FOREIGN KEY REFERENCES dbo.Architecture(ArchitectureID) ); CREATE TABLE dbo.Computers ( ComputerID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Computers PRIMARY KEY CLUSTERED , ComputerName varchar(10) NOT NULL CONSTRAINT UQ_Computers_Name UNIQUE , OS_ID int NOT NULL CONSTRAINT FK_Computers_OS FOREIGN KEY REFERENCES dbo.OS(OS_ID) , ComputerManufacturerID int NOT NULL CONSTRAINT FK_Computers_Manufacturer FOREIGN KEY REFERENCES dbo.Manufacturer(ManufacturerID) , EffectiveDate datetime NOT NULL CONSTRAINT DF_Computers_EffectiveDate DEFAULT (GETDATE()) , ExpiryDate datetime NULL ); In the design above, you may notice we have to include several new unique constraints to ensure our data model is consistent across both approaches. Querying this surrogate-key approach with 2 rows in dbo.Computers looks like: SELECT Computers.ComputerName , ComputerManufacturerName = cm.ManufacturerName , OSManufacturer = om.ManufacturerName , OS.OSName , Architecture.ArchitectureName , Architecture.ArchitectureVersion FROM dbo.Computers INNER JOIN dbo.OS ON Computers.OS_ID = OS.OS_ID INNER JOIN dbo.Manufacturer cm ON Computers.ComputerManufacturerID = cm.ManufacturerID INNER JOIN dbo.Architecture ON OS.ArchitectureID = Architecture.ArchitectureID INNER JOIN dbo.Manufacturer om ON OS.ManufacturerID = om.ManufacturerID WHERE Computers.EffectiveDate = GETDATE() OR Computers.ExpiryDate IS NULL) ORDER BY Computers.ComputerID; The results:
╔══════════════╦══════════════════════════╦════════════════╦════════════╦══════════════════╦═════════════════════╗
║ ComputerName ║ ComputerManufacturerName ║ OSManufacturer ║ OSName     ║ ArchitectureName ║ ArchitectureVersion ║
╠══════════════╬══════════════════════════╬════════════════╬════════════╬══════════════════╬═════════════════════╣
║ CM700-01     ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
║ CM700-02     ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
╚══════════════╩══════════════════════════╩════════════════╩════════════╩══════════════════╩═════════════════════╝
enter image description here The I/O statistics are even more telling. For the natural keys, we have:
Table 'OS'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
Table 'Computers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
For the surrogate key setup, we get:
Table 'Manufacturer'. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Architecture'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'OS'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Computers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Quite clearly, in the above admittedly very simple setup, the surrogate key is lagging in both ease-of-use, and performance. Having said that, what happens if we need to change the name of one of the manufacturers? Here's the T-SQL for the natural key version: UPDATE dbo.Manufacturer SET ManufacturerName = 'Microsoft­™' WHERE ManufacturerName = 'Microsoft'; And the plan: enter image description here The T-SQL for the surrogate key version: UPDATE dbo.Manufacturer SET ManufacturerName = 'Microsoft­™' WHERE ManufacturerID = 1; And that plan: enter image description here The natural key version has an estimated subtree cost that is nearly three times greater than the surrogate key version. Am I correct in saying that both natural keys and surrogate keys offer benefits; deciding which methodology to use should be carefully considered? Are there common situations where the comparisons I made above don't work? What other considerations should be made when choosing natural or surrogate keys?
Hannah Vernon (70988 rep)
Aug 30, 2017, 08:04 PM • Last activity: Oct 22, 2019, 06:02 AM
1 votes
2 answers
2823 views
What are the disadvantages of using surrogate keys?
I am using MS SQL Server but in general database design I want to know what problems can arise when every row in a database has its auto generated surrogate key value. I know some advantages, for example that there is no need to identify unique columns without NULLs for primary keys, and there is no...
I am using MS SQL Server but in general database design I want to know what problems can arise when every row in a database has its auto generated surrogate key value. I know some advantages, for example that there is no need to identify unique columns without NULLs for primary keys, and there is no need to manage composite primary keys, normal forms are easier to manage, and uniqueness is guaranteed. I want to know, is there any good reason regarding to performance, or index structures, etc. that should make us use real world factful keys instead of surrogate ones? Thanks.
igelr (2162 rep)
Mar 19, 2019, 07:15 PM • Last activity: Mar 25, 2019, 08:21 PM
1 votes
1 answers
6712 views
What problems are there in not using a primary key column at all?
What purpose would a primary key column serve on a table such as `post_versions` here? Nothing refers to it, and there are no queries where I will ever want to select a row by `post_versions.id`. It'll be joined to `posts` in most queries. CREATE TABLE posts ( id SERIAL PRIMARY KEY, created TIMESTAM...
What purpose would a primary key column serve on a table such as post_versions here? Nothing refers to it, and there are no queries where I will ever want to select a row by post_versions.id. It'll be joined to posts in most queries. CREATE TABLE posts ( id SERIAL PRIMARY KEY, created TIMESTAMPTZ NOT NULL DEFAULT now(), user_id INTEGER NOT NULL REFERENCES users ON DELETE RESTRICT ); CREATE TABLE post_versions ( id SERIAL PRIMARY KEY, -- serves no purpose, could remove I think post_id INTEGER NOT NULL REFERENCES posts ON DELETE CASCADE, updated TIMESTAMPTZ NOT NULL DEFAULT now(), body TEXT NOT NULL ); Are there any problems if I just remove it?
davidtgq (759 rep)
Mar 4, 2019, 02:06 PM • Last activity: Mar 4, 2019, 04:36 PM
2 votes
2 answers
742 views
Composite primary key plus a separate (surrogate) id column for foreign key references
I am trying to build a database about *locations* like so: - Country → Region → Town Countries will be populated with a list of pre-defined worldwide countries, regions & towns are input by users. Region may not apply to every location and so a blank string will be allowed (hence the many to many li...
I am trying to build a database about *locations* like so: - Country → Region → Town Countries will be populated with a list of pre-defined worldwide countries, regions & towns are input by users. Region may not apply to every location and so a blank string will be allowed (hence the many to many link table). This is the best design I have come up with: ER Diagram The town_id column is set to AUTO_INCREMENT & UNIQUE. It is then used as a foreign key reference in other tables. The town column on its own is not UNIQUE as you may have the same town name in multiple places. Is the towns table correct? The reason I ask is because it just seems odd to me to have the town_id as the reference, however this cannot be included or used as the primary key (because if it did, it would cause duplicates)? I would welcome feedback if this approach is correct.
the_peacock (123 rep)
Jun 3, 2014, 07:00 PM • Last activity: Feb 19, 2019, 10:46 PM
3 votes
1 answers
449 views
Does adding a surrogate key get rid of an identifying relationship between two entities?
In the following example, we have a `Comments` table with a composite primary key (PK) CREATE TABLE Comments ( video_id INT NOT NULL, user_id INT NOT NULL, comment_dt DATETIME NOT NULL, PRIMARY KEY (video_id, user_id, comment_dt), FOREIGN KEY (video_id) REFERENCES Videos(video_id), FOREIGN KEY (user...
In the following example, we have a Comments table with a composite primary key (PK) CREATE TABLE Comments ( video_id INT NOT NULL, user_id INT NOT NULL, comment_dt DATETIME NOT NULL, PRIMARY KEY (video_id, user_id, comment_dt), FOREIGN KEY (video_id) REFERENCES Videos(video_id), FOREIGN KEY (user_id) REFERENCES Users(user_id) ); There appears to be an identifying relationship, as the primary key of Comments includes its foreign keys. However, watch what happens when you give this Comments table a surrogate primary key of its own: CREATE TABLE Comments ( comment_id SERIAL PRIMARY KEY, video_id INT NOT NULL, user_id INT NOT NULL, comment_dt DATETIME NOT NULL, FOREIGN KEY (video_id) REFERENCES Videos(video_id), FOREIGN KEY (user_id) REFERENCES Users(user_id) ); Does that get rid of the identifying relationship? Logically, it still seems to be there, so does this really come down to the *physical* model and whether or not you want to include a composite key? Are identifying relationships even necessary?
brennvo (91 rep)
Feb 1, 2019, 08:06 PM • Last activity: Feb 2, 2019, 08:22 PM
0 votes
1 answers
67 views
M:M Table Design - Use of PK
I have a many-to-many table that contains id's from 2 related tables. For creating a primary key on this table, is it better to create another integer column and have that as a surrogate PK? Or should I create the PK on both id columns? What are the benefits of either method? Here is the table desig...
I have a many-to-many table that contains id's from 2 related tables. For creating a primary key on this table, is it better to create another integer column and have that as a surrogate PK? Or should I create the PK on both id columns? What are the benefits of either method? Here is the table design with the surrogate key (CodeGroupMappingID) CREATE TABLE [dbo].[CodeGroupMappings]( [CodeGroupMappingID] [int] IDENTITY(1,1) NOT NULL, [CodeID] [int] NOT NULL, [GroupID] [int] NOT NULL, CONSTRAINT [PK_CodeGroupMappings_CodeGroupMappingID] PRIMARY KEY CLUSTERED ([CodeGroupMappingID] ASC) )
Kevin (533 rep)
May 22, 2018, 09:41 AM • Last activity: May 22, 2018, 08:51 PM
29 votes
3 answers
6548 views
Do natural keys provide higher or lower performance in SQL Server than surrogate integer keys?
I'm a fan of surrogate keys. There is a risk my findings are confirmation biased. Many questions I've seen both here and at http://stackoverflow.com use natural keys instead of surrogate keys based on `IDENTITY()` values. My background in computer systems tells me performing any comparative operatio...
I'm a fan of surrogate keys. There is a risk my findings are confirmation biased. Many questions I've seen both here and at http://stackoverflow.com use natural keys instead of surrogate keys based on IDENTITY() values. My background in computer systems tells me performing any comparative operation on an integer will be faster than comparing strings. This comment made me question my beliefs, so I thought I would create a system to investigate my thesis that integers are faster than strings for use as keys in SQL Server. Since there is likely to be very little discernible difference in small datasets, I immediately thought of a two table setup where the primary table has 1,000,000 rows and the secondary table has 10 rows for each row in the primary table for a total of 10,000,000 rows in the secondary table. The premise of my test is to create two sets of tables like this, one using natural keys and one using integer keys, and run timing tests on a simple query like: SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Key = Table2.Key; The following is the code I created as a test bed: USE Master; IF (SELECT COUNT(database_id) FROM sys.databases d WHERE d.name = 'NaturalKeyTest') = 1 BEGIN ALTER DATABASE NaturalKeyTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE NaturalKeyTest; END GO CREATE DATABASE NaturalKeyTest ON (NAME = 'NaturalKeyTest', FILENAME = 'C:\SQLServer\Data\NaturalKeyTest.mdf', SIZE=8GB, FILEGROWTH=1GB) LOG ON (NAME='NaturalKeyTestLog', FILENAME = 'C:\SQLServer\Logs\NaturalKeyTest.mdf', SIZE=256MB, FILEGROWTH=128MB); GO ALTER DATABASE NaturalKeyTest SET RECOVERY SIMPLE; GO USE NaturalKeyTest; GO CREATE VIEW GetRand AS SELECT RAND() AS RandomNumber; GO CREATE FUNCTION RandomString ( @StringLength INT ) RETURNS NVARCHAR(max) AS BEGIN DECLARE @cnt INT = 0 DECLARE @str NVARCHAR(MAX) = ''; DECLARE @RandomNum FLOAT = 0; WHILE @cnt < @StringLength BEGIN SELECT @RandomNum = RandomNumber FROM GetRand; SET @str = @str + CAST(CHAR((@RandomNum * 64.) + 32) AS NVARCHAR(MAX)); SET @cnt = @cnt + 1; END RETURN @str; END; GO CREATE TABLE NaturalTable1 ( NaturalTable1Key NVARCHAR(255) NOT NULL CONSTRAINT PK_NaturalTable1 PRIMARY KEY CLUSTERED , Table1TestData NVARCHAR(255) NOT NULL ); CREATE TABLE NaturalTable2 ( NaturalTable2Key NVARCHAR(255) NOT NULL CONSTRAINT PK_NaturalTable2 PRIMARY KEY CLUSTERED , NaturalTable1Key NVARCHAR(255) NOT NULL CONSTRAINT FK_NaturalTable2_NaturalTable1Key FOREIGN KEY REFERENCES dbo.NaturalTable1 (NaturalTable1Key) ON DELETE CASCADE ON UPDATE CASCADE , Table2TestData NVARCHAR(255) NOT NULL ); GO /* insert 1,000,000 rows into NaturalTable1 */ INSERT INTO NaturalTable1 (NaturalTable1Key, Table1TestData) VALUES (dbo.RandomString(25), dbo.RandomString(100)); GO 1000000 /* insert 10,000,000 rows into NaturalTable2 */ INSERT INTO NaturalTable2 (NaturalTable2Key, NaturalTable1Key, Table2TestData) SELECT dbo.RandomString(25), T1.NaturalTable1Key, dbo.RandomString(100) FROM NaturalTable1 T1 GO 10 CREATE TABLE IDTable1 ( IDTable1Key INT NOT NULL CONSTRAINT PK_IDTable1 PRIMARY KEY CLUSTERED IDENTITY(1,1) , Table1TestData NVARCHAR(255) NOT NULL CONSTRAINT DF_IDTable1_TestData DEFAULT dbo.RandomString(100) ); CREATE TABLE IDTable2 ( IDTable2Key INT NOT NULL CONSTRAINT PK_IDTable2 PRIMARY KEY CLUSTERED IDENTITY(1,1) , IDTable1Key INT NOT NULL CONSTRAINT FK_IDTable2_IDTable1Key FOREIGN KEY REFERENCES dbo.IDTable1 (IDTable1Key) ON DELETE CASCADE ON UPDATE CASCADE , Table2TestData NVARCHAR(255) NOT NULL CONSTRAINT DF_IDTable2_TestData DEFAULT dbo.RandomString(100) ); GO INSERT INTO IDTable1 DEFAULT VALUES; GO 1000000 INSERT INTO IDTable2 (IDTable1Key) SELECT T1.IDTable1Key FROM IDTable1 T1 GO 10 The code above creates a database and 4 tables, and fills the tables with data, ready to test. The test code I ran is: USE NaturalKeyTest; GO DECLARE @loops INT = 0; DECLARE @MaxLoops INT = 10; DECLARE @Results TABLE ( FinishedAt DATETIME DEFAULT (GETDATE()) , KeyType NVARCHAR(255) , ElapsedTime FLOAT ); WHILE @loops < @MaxLoops BEGIN DBCC FREEPROCCACHE; DBCC FREESESSIONCACHE; DBCC FREESYSTEMCACHE ('ALL'); DBCC DROPCLEANBUFFERS; WAITFOR DELAY '00:00:05'; DECLARE @start DATETIME = GETDATE(); DECLARE @end DATETIME; DECLARE @count INT; SELECT @count = COUNT(*) FROM dbo.NaturalTable1 T1 INNER JOIN dbo.NaturalTable2 T2 ON T1.NaturalTable1Key = T2.NaturalTable1Key; SET @end = GETDATE(); INSERT INTO @Results (KeyType, ElapsedTime) SELECT 'Natural PK' AS KeyType, CAST((@end - @start) AS FLOAT) AS ElapsedTime; DBCC FREEPROCCACHE; DBCC FREESESSIONCACHE; DBCC FREESYSTEMCACHE ('ALL'); DBCC DROPCLEANBUFFERS; WAITFOR DELAY '00:00:05'; SET @start = GETDATE(); SELECT @count = COUNT(*) FROM dbo.IDTable1 T1 INNER JOIN dbo.IDTable2 T2 ON T1.IDTable1Key = T2.IDTable1Key; SET @end = GETDATE(); INSERT INTO @Results (KeyType, ElapsedTime) SELECT 'IDENTITY() PK' AS KeyType, CAST((@end - @start) AS FLOAT) AS ElapsedTime; SET @loops = @loops + 1; END SELECT KeyType, FORMAT(CAST(AVG(ElapsedTime) AS DATETIME), 'HH:mm:ss.fff') AS AvgTime FROM @Results GROUP BY KeyType; These are the results: enter image description here Am I doing something wrong here, or are INT keys 3 times faster than 25 character natural keys? Note, I've written a follow-up question [here](https://dba.stackexchange.com/questions/184756/natural-keys-vs-surrogate-keys-part-2-the-showdown) .
Hannah Vernon (70988 rep)
Sep 29, 2013, 01:56 AM • Last activity: Mar 18, 2018, 03:22 PM
40 votes
3 answers
7197 views
Should every table have a single-field surrogate/artificial primary key?
I understand one benefit of surrogate/artificial keys in general - they do not change and that can be very convenient. This is true whether they are single or multiple field - as long as they are 'artificial'. However, it sometimes seems to be a matter of policy to have an auto-incrementing integer...
I understand one benefit of surrogate/artificial keys in general - they do not change and that can be very convenient. This is true whether they are single or multiple field - as long as they are 'artificial'. However, it sometimes seems to be a matter of policy to have an auto-incrementing integer field as the primary key of each table. Is this always the best idea to have such a **single-field** key and why (or why not)? To be clear, this question is not about artificial vs natural - but about whether all artificial keys should be **single-field**
Jack Douglas (40537 rep)
Sep 23, 2011, 09:25 PM • Last activity: Mar 18, 2018, 12:50 PM
8 votes
5 answers
5853 views
What normal form does a surrogate key violate?
I have following question: "What normal form does a surrogate key violate?" My thought was the 3rd normal form, but I'm not quite sure it's just an assumption i am making. Could someone explain that to me?
I have following question: "What normal form does a surrogate key violate?" My thought was the 3rd normal form, but I'm not quite sure it's just an assumption i am making. Could someone explain that to me?
Marco (91 rep)
Apr 16, 2012, 08:35 AM • Last activity: Mar 18, 2018, 12:48 PM
5 votes
2 answers
2189 views
Surrogate key vs Natural key
I have a table called devices. Most of the devices that will get stored in this table can be uniquely identified by their serial number and part number. But there are some device types that do not have serial number and part number assigned to them. Instead they can be uniquely identified by another...
I have a table called devices. Most of the devices that will get stored in this table can be uniquely identified by their serial number and part number. But there are some device types that do not have serial number and part number assigned to them. Instead they can be uniquely identified by another field (internal id). Should I create a surrogate key for this table or should I create a composite primary key (serial number, part number, internal id) and insert default values to the serial number and part number columns when they are not supplied? The device types that do not have part number and serial number now, will have the numbers assigned to them in the future releases (may be 5 years from now). Should I create a surrogate key or a composite key in this scenario? Or using the three unique attributes, should I create a hash in the program and use that as a surrogate key for the tables?
DBK (378 rep)
Aug 26, 2017, 04:07 AM • Last activity: Mar 18, 2018, 12:43 PM
0 votes
2 answers
155 views
Common term for tenant-based keys
Assume the following schema: ID&#160;&#160;| TENANT | TENANT_CUSTOM_ID 346 | 101&#160;&#160;&#160;&#160;| 1 347 | 102&#160;&#160;&#160;&#160;| 1 348 | 101&#160;&#160;&#160;&#160;| 2 349 | 101&#160;&#160;&#160;&#160;| 3 350 | 101&#160;&#160;&#160;&#160;| 4 351 | 102&#160;&#160;&#160;&#160;| 2 352 | 1...
Assume the following schema: ID  | TENANT | TENANT_CUSTOM_ID 346 | 101    | 1 347 | 102    | 1 348 | 101    | 2 349 | 101    | 3 350 | 101    | 4 351 | 102    | 2 352 | 103    | 1 **What is another name (or the proper classification) for TENANT_CUSTOM_ID?** Perhaps something along the lines of a surrogate key? The purpose of this key is to make each tenant have their own numbering system instead of knowing that their records are mixed with all other tenants on the same tables.
1234567 (159 rep)
Jun 2, 2017, 12:00 AM • Last activity: Jun 2, 2017, 09:01 PM
1 votes
1 answers
1003 views
What are the consequences, positive or negative, to having a surrogate primary key for a table which already has a guaranteed unique column?
I recently made a table that looks like the one below CREATE TABLE example( id SERIAL PRIMARY KEY, subindusty_id INT UNIQUE NOT NULL, name TEXT NOT NULL -- other fields... ) I realized that in this table I could just remove the column called `id` and make `subindustry_id` the primary key. Thinking b...
I recently made a table that looks like the one below CREATE TABLE example( id SERIAL PRIMARY KEY, subindusty_id INT UNIQUE NOT NULL, name TEXT NOT NULL -- other fields... ) I realized that in this table I could just remove the column called id and make subindustry_id the primary key. Thinking back, the only reason I can think of for why I didn't make subindustry_id the primary key is for the sake of consistency, considering that I have a lot of other tables with a column called id. Except for things like consistency*, are there any tangible benefits/downsides to having this additional id column as the primary key when there's already another column that'd be a good primary key? \*Consistency as in "that's the way the rest of the tables look". The kind where you're enforcing a certain type of style. I guess the point of this question is to question the practice of giving every single table an id column regardless of the fact that they might already have a more suitable candidate for a primary key.
m0meni (147 rep)
Jan 19, 2017, 06:26 PM • Last activity: Jan 20, 2017, 05:53 AM
Showing page 1 of 20 total questions