Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

15 votes

4 answers

6701 views

Foreign keys - link using surrogate or natural key?

database-design foreign-key surrogate-key natural-key

Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is [Jack Douglas' answer in this question][1], and his reasoning seems sound to me. I'm aware of the discussion be...

                                  Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is Jack Douglas' answer in this question , and his reasoning seems sound to me. I'm aware of the discussion beyond that that rules change, but this would be something that would need to be considered in any situation.

The main reason for asking is that I have a legacy application that makes uses of FKs with natural keys, but there is a strong push from devlopers to move to an OR/M (NHibernate in our case), and a fork has already produced some breaking changes, so I'm looking to either push them back on track using the natural key, or move the legacy app to use surrogate keys for the FK. My gut says to restore the original FK, but I'm honestly not sure if this is really the right path to follow.

The majority of our tables already have both a surrogate and natural key already defined (though unique constraint and PK) so having to add extra columns is a non-issue for us in this insance. We're using SQL Server 2008, but I'd hope this is generic enough for any DB.

Callie J (492 rep)

Apr 2, 2013, 12:17 PM • Last activity: Sep 6, 2023, 07:07 AM

2 votes

1 answers

1049 views

How to get the "best of both worlds" with natural and surrogate keys? And could DBMSs be better?

foreign-key primary-key enum surrogate-key natural-key

I am designing my first database, and I find myself frustrated by the choice between storing an integer or a string for each instance of a categorical variable. My understanding is that if I have a table containing cities that I want to make a child of a table of countries, the most performant way t...

                                  I am designing my first database, and I find myself frustrated by the choice between storing an integer or a string for each instance of a categorical variable. 

My understanding is that if I have a table containing cities that I want to make a child of a table of countries, the most performant way to do that is to have the PK of the countries table as a FK in in the table of cities. However for ease of use and debugging, it's nice to always have the string name associated with the country PK. Every solution I have considered either is not recommended or seems overly complex.

I'd like opinions the merits of these approaches (or hear about new ones) and also to understand if it has to be this way or if databases simply are this way because of tradition.

Possible approaches:

1. Use a string as a PK for countries. Then I will have a human-readable FK for it in any child tables. Obviously less performant than using integers, but I suspect it may be the least worst way to have the convenience I desire.

2. Create a view using application logic that join each the string name of the country to the states table. 
- I don't love this because if the application logic breaks, the tables become less readable. Also I would expect large join operations to have an even worse performance penalty than string PK/FKs.

3. Create a separate table to connect numeric IDs with the appropriate string ID. I'm not sure if it would be better to have a table coding each type of relation, or one big table with one big pool of IDs that cover all integer key-string value relations. I could then use application logic to look up the appropriate strings and fill the appropriate PK into the child table when it's string name is given by a user. 
- I feel like this might be pretty resource intensive too, as there would have to be a lookup every time a new row was added to the child. It also means that I would still have to create the views I want.

4. Use enum data type. Instinctively, this would be my go-to approach, as it seems the ideal balance between natural and synthetic keys: Use integer IDs and give the IDs a string label so that the string itself need not be repeated. 
- Unfortunately my research has found that this is not recommended. One reason for that is that categories cannot be deleted easily. I'm not sure if that is  dealbreaker for me, but I also wonder why DBMSs are designed this way. Aren't categorical variables commonly used enough to add convenience features for them?

Stonecraft (125 rep)

Jul 24, 2022, 06:43 PM • Last activity: Jul 25, 2022, 12:29 PM

1 votes

2 answers

6092 views

PostgreSQL surrogate keys: use sequence/bigint or OID?

postgresql database-design sequence surrogate-key citus

I'm looking at a need for surrogate keys in tables I'm designing on a PostgreSQL-derived DBMS ([Citus][1]). Would OIDs suffice? Is there a downside to using them instead of creating a `bigint` field and a sequence? [1]: https://www.citusdata.com/

                                  I'm looking at a need for surrogate keys in tables I'm designing on a PostgreSQL-derived DBMS (Citus ). Would OIDs suffice? Is there a downside to using them instead of creating a bigint field and a sequence?
                                

Andrew Wolfe (584 rep)

Feb 1, 2016, 05:43 PM • Last activity: Sep 27, 2021, 12:04 AM

2 votes

1 answers

2838 views

Identity versus Composite values for Primary Keys in Many-To-Many Tables

sql-server database-design primary-key surrogate-key

I am in the middle of an interesting "Data Architecture" discussion at a place I am contracting & need some input. As a habit, I always create a **Surrogate Key** as the **Primary Key** in my tables - be they Guid (UUID) or Identity values. IMHO the *synthetic-nature* of a Surrogate Key has the dist...

                                  I am in the middle of an interesting "Data Architecture" discussion at a place I am contracting & need some input.

As a habit, I always create a **Surrogate Key** as the **Primary Key** in my tables - be they Guid (UUID) or Identity values.  IMHO the *synthetic-nature* of a Surrogate Key has the distinct value of identifying a row without intruding on "real" or "natural" keys-or-values that may otherwise be useful **Alternate Keys**.

To me...it just seems natural - but maybe I am wrong.

Where I happen to be at the moment...they argue this *good* for "core" **Entity** tables, but is *completely incorrect* for **Many-to-Many** (relationship) tables.  While I will happily oblige the customer...I disagree.

Thoughts behind my approach are:

 - A primary key value must be unique
 - The primary key should be as compact as possible
 - Primary key value should be stable
 - Primary Key identifies a tuple...not the data INSIDE the tuple

**MY QUESTION:** Which is either correct or better?  And Why?

- Please provide reasoning etc.

**NOTE:**

Please Ignore the ID column naming in the samples below...that is how they do it here.

Prisoner ZERO (129 rep)

Jul 6, 2017, 12:24 PM • Last activity: Aug 16, 2021, 11:01 AM

1 votes

0 answers

49 views

How not to "naturalize" surrogate keys?

primary-key surrogate-key web-service

> Don't naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advant...

                                  > Don't naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advantages of surrogate keys.
>
> http://www.agiledata.org/essays/keys.html 

>  Don’t “naturalize” surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow users to work with the value (perhaps via search), you have effectively given the key business meaning. The exposed key in our database could then be considered a natural key in someone else’s.
>
> https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html 

And yet, every web framework/ORM I am familiar with (e.g. Ruby on Rails) exposes the surrogate key in the URL which is natural key that users share, send, save, bookmark, etc.

In light this of this recommendation, what are the best practices for handling web URLs?
                                

Paul Draper (800 rep)

Aug 5, 2020, 07:18 PM

0 votes

1 answers

560 views

Insert results from query, update old existing row if query doesn't still return same value

postgresql upsert surrogate-key

I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field. Each day we run a query (a few joins, but quite slow) to get some data to insert into this table: * if the row exists already, and is present in the query, we do nothing * if t...

                                  I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field. 

Each day we run a query (a few joins, but quite slow) to get some data to insert into this table:

* if the row exists already, and is present in the query, we do nothing
* if the row exists but the result isn't in todays query, we set 'active' to _false_
* if the row doesn't exist but is in the query, we insert it (with active defaulting to true)

It is possible for an entity to be active, made inactive (same row), then a few days later become active again (would be a new row).

The table is using a surrogate key, as the natural key can be repeated as noted above, though there should only ever be a single entry that is active on the natural key.

I'm curious if there is an action or 'pattern' I can take that will enable such functionality - I realise I'm asking a lot :) 

I am using a java library (jOOQ) to generate the sql, so I am only looking for sql as the solution, no PL/pgSQL and no stored procs. I doubt it is possible without 2 passes (1. perform the update where not in query result, 2. perform the insert from query result) but I'm intrigued if there are possibilities.

Edit: Example table history, where ** is an update to same row

    pk_surrogate | natural_id | active| date_made_active
    1            | 1          | true  | 2020-01-01 -- row for entity 1 added
    1 **         | 1          | false | 2020-01-01 -- row for entity 1 no longer present on 2020-01-02, so marked inactive
    2            | 1          | true  | 2020-01-03 -- new row for entity 1 added when it was present on 2020-01-03
    3            | 2          | true  | 2020-01-03 --new value for 2020-01-03

Mitch Kent (205 rep)

Jul 28, 2020, 08:07 PM • Last activity: Jul 29, 2020, 07:21 AM

-2 votes

1 answers

53 views

If all associated attributes of an intended primary key are identical, is it still a true primary key?

primary-key duplication architecture uniqueidentifier surrogate-key

I'll start with an example. If I have a `person` table with intended surrogate primary key `Id`: +----+------+------------+-------------+ | Id | name | DoB | SSN | +----+------+------------+-------------+ | 1 | John | 1901-01-01 | 111-11-1111 | | 2 | Jane | 1902-02-02 | 222-22-2222 | | 3 | John | 19...

I'll start with an example. If I have a person table with intended surrogate primary key Id:

+----+------+------------+-------------+
| Id | name |    DoB     |     SSN     |
+----+------+------------+-------------+
|  1 | John | 1901-01-01 | 111-11-1111 |
|  2 | Jane | 1902-02-02 | 222-22-2222 |
|  3 | John | 1901-01-01 | 111-11-1111 |
+----+------+------------+-------------+

Note Ids 1 & 3 have the same attributes; They both represent the same person. Now from what we know about the theory behind what constitutes a primary key, which I think is well summarized here : > - The primary key must uniquely identify each record. > - A record’s primary-key value can’t be null. > - The primary key-value must exist when the record is created. > - The primary key must remain stable—you can’t change the primary-key field(s). > - The primary key must be compact and contain the fewest possible attributes. Consider the first bullet, "*The primary key must uniquely identify each record.*" In my example, I suppose whether or not each Id does represent uniqueness depends on what's really supposed to be considered unique. A different database record? Yes. A different person (what the records are supposed to represent)? No. So multiple Ids represent what is functionally the same subject generating the data, present in 2 records. A "two to one Id" of sorts. I've not read anything that directly address the scenario my example illustrates, as is relates to what is or is not a PK. 1. Does this example violate the theory behind what constitutes a primary key? 2. If not, does this example illustrate a violation of any larger principles of database architecture, or can this concept be reduced to something as simple as "duplication of data - clean it up"? Many thanks.

simply_sideways (3 rep)

May 5, 2020, 04:33 PM • Last activity: May 5, 2020, 10:02 PM

2 votes

1 answers

412 views

Should a inherited table have a new surrogate key, or just use parent table's key?

database-design surrogate-key

I am modeling some things with an "is a" relationship (aka inheritance), and there were two ways to go about it in regards to the inherited table's primary key and foreign key pointing back to the parent table: - Strategy #1 was to always create a new surrogate key on each inherited table, and then...

                                  I am modeling some things with an "is a" relationship (aka inheritance), and there were two ways to go about it in regards to the inherited table's primary key and foreign key pointing back to the parent table:

 - Strategy #1 was to always create a new surrogate key on each inherited table, and then store the reference to the parent table's primary key in a separate column (and have unique constraint on that column to ensure a 1 to 1 mapping).

 - Strategy #2 was to make the primary key value of the inherited table have the same value as the parent table's primary key. 

I think there is a clear winner here (Strategy #2), but I want to make sure there isn't some potential issue I am missing. 

Here's the code (targeting SQL Server, but maybe generic) elaborating each strategy:

    ------Strategy #1: always create a surrogate key for a child table (separate column holds parent table's primary key)----------
    drop table if exists TruckAsset
    drop table if exists VehicleAsset
    drop table if exists Asset
    
    create table Asset
    (
    	AssetId int not null,
    	primary key (AssetId)
    )
    
    create table VehicleAsset
    (
    	VehicleAssetId int not null, /*surrogate key*/
    	AssetId int not null,	
    	VehicleRegistrationExpirationDate date not null, /*specialization of Asset*/
    
    	primary key (VehicleAssetId),
    	constraint UQ_AssetId unique (AssetId),
    	foreign key (AssetId) references Asset(AssetId)
    )
    
    create table TruckAsset
    (
    	TruckAssetId int not null, /*surrogate key*/
    	VehicleAssetId int not null,
    	CargoCapacityCubicMeters float not null, /*specialization of VehicleAsset*/
    
    	primary key (TruckAssetId),
    	constraint UQ_VehicleAssetId unique (VehicleAssetId),
    	foreign key (VehicleAssetId) references VehicleAsset(VehicleAssetId)
    )
    
    INSERT INTO Asset(AssetId)VALUES(1234)
    INSERT INTO VehicleAsset(VehicleAssetId, AssetId, VehicleRegistrationExpirationDate) VALUES (5678, 1234, '2020-01-01')
    INSERT INTO TruckAsset(TruckAssetId, VehicleAssetId, CargoCapacityCubicMeters) VALUES (9999, 5678, 8.80977)
    
    GO

versus this
    
    ------Strategy #2: primary key references parent table's primary key directly----------
    drop table if exists TruckAsset
    drop table if exists VehicleAsset
    drop table if exists Asset
    
    create table Asset
    (
    	AssetId int not null,
    	primary key (AssetId)
    )
    
    create table VehicleAsset
    (
    	VehicleAssetId int not null,
    	VehicleRegistrationExpirationDate date not null, /*specialization of Asset*/
    
    	primary key (VehicleAssetId),
    	foreign key (VehicleAssetId) references Asset(AssetId)
    )
    
    create table TruckAsset
    (
    	TruckAssetId int not null,
    	CargoCapacityCubicMeters float not null, /*specialization of VehicleAsset*/
    
    	primary key (TruckAssetId),
    	foreign key (TruckAssetId) references VehicleAsset(VehicleAssetId)
    )
    
    INSERT INTO Asset(AssetId)VALUES(1234)
    INSERT INTO VehicleAsset(VehicleAssetId, VehicleRegistrationExpirationDate) VALUES (1234, '2020-01-01')
    INSERT INTO TruckAsset(TruckAssetId, CargoCapacityCubicMeters) VALUES (1234, 8.80977)
    ----------------------------------------

                                

Anssssss (248 rep)

Dec 5, 2019, 10:01 PM • Last activity: Dec 6, 2019, 06:00 PM

2 votes

1 answers

583 views

Natural Keys vs Surrogate Keys part 2

sql-server surrogate-key natural-key

A while back, I asked if [surrogate keys provide better performance than natural keys in SQL Server](https://dba.stackexchange.com/questions/50708/do-natural-keys-provide-higher-or-lower-performance-in-sql-server-than-surrogate) . [@sqlvogel](https://dba.stackexchange.com/users/1357/sqlvogel) provided an answer to that question yesterday that caused me to revisit it. This question is an attempt to "upgrade" the prior question, and hopefully provide the opportunity for thoughtful answers that help the community. Consider a system for storing details about computers. Each computer has an architecture, and an Operating System. In SQL Server, we could create these tables using natural keys like this: CREATE TABLE dbo.Architecture ( ArchitectureName varchar(10) NOT NULL , ArchitectureVersion decimal(5,2) NOT NULL , ReleaseDate date NOT NULL , CONSTRAINT PK_Architecture PRIMARY KEY CLUSTERED (ArchitectureName, ArchitectureVersion) ); CREATE TABLE dbo.Manufacturer ( ManufacturerName varchar(10) NOT NULL CONSTRAINT PK_Manufacturer PRIMARY KEY CLUSTERED ); CREATE TABLE dbo.OS ( OSName varchar(30) NOT NULL , ManufacturerName varchar(10) NOT NULL CONSTRAINT FK_OS_Manufacturer FOREIGN KEY (ManufacturerName) REFERENCES dbo.Manufacturer(ManufacturerName) , ArchitectureName varchar(10) NOT NULL , ArchitectureVersion decimal(5,2) NOT NULL , CONSTRAINT FK_OS_Architecture FOREIGN KEY (ArchitectureName, ArchitectureVersion) REFERENCES dbo.Architecture(ArchitectureName, ArchitectureVersion) , CONSTRAINT PK_OS PRIMARY KEY CLUSTERED (OSName) ); CREATE TABLE dbo.Computers ( ComputerID varchar(10) NOT NULL CONSTRAINT PK_Computers PRIMARY KEY CLUSTERED , OSName varchar(30) NOT NULL CONSTRAINT FK_Computers_OSName FOREIGN KEY REFERENCES dbo.OS(OSName) , ComputerManufacturerName varchar(10) NOT NULL CONSTRAINT FK_Computers_Manufacturer FOREIGN KEY REFERENCES dbo.Manufacturer(ManufacturerName) , EffectiveDate datetime NOT NULL CONSTRAINT DF_Computers_EffectiveDate DEFAULT (GETDATE()) , ExpiryDate datetime NULL ); To query the dbo.Computers table, with 2 rows in dbo.Computers, showing various details, we could do this: SELECT Computers.ComputerID , Computers.ComputerManufacturerName , OSManufacturer = OS.ManufacturerName , Computers.OSName , OS.ArchitectureName , OS.ArchitectureVersion FROM dbo.Computers INNER JOIN dbo.OS ON Computers.OSName = OS.OSName WHERE Computers.EffectiveDate = GETDATE() OR Computers.ExpiryDate IS NULL) ORDER BY Computers.ComputerID; The query output is:

╔════════════╦══════════════════════════╦════════════════╦════════════╦══════════════════╦═════════════════════╗
║ ComputerID ║ ComputerManufacturerName ║ OSManufacturer ║ OSName     ║ ArchitectureName ║ ArchitectureVersion ║
╠════════════╬══════════════════════════╬════════════════╬════════════╬══════════════════╬═════════════════════╣
║ CM700-01   ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
║ CM700-02   ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
╚════════════╩══════════════════════════╩════════════════╩════════════╩══════════════════╩═════════════════════╝

The query plan for this is quite simple:

Or, if we choose to use surrogate keys, like this: CREATE TABLE dbo.Architecture ( ArchitectureID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Architecture PRIMARY KEY CLUSTERED , ArchitectureName varchar(10) NOT NULL , ArchitectureVersion decimal(5,2) NOT NULL , ReleaseDate date NOT NULL , CONSTRAINT UQ_Architecture_Name UNIQUE (ArchitectureName, ArchitectureVersion) ); CREATE TABLE dbo.Manufacturer ( ManufacturerID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Manufacturer PRIMARY KEY CLUSTERED , ManufacturerName varchar(10) NOT NULL ); CREATE TABLE dbo.OS ( OS_ID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_OS PRIMARY KEY CLUSTERED , OSName varchar(30) NOT NULL CONSTRAINT UQ_OS_Name UNIQUE , ManufacturerID int NOT NULL CONSTRAINT FK_OS_Manufacturer FOREIGN KEY REFERENCES dbo.Manufacturer(ManufacturerID) , ArchitectureID int NOT NULL CONSTRAINT FK_OS_Architecture FOREIGN KEY REFERENCES dbo.Architecture(ArchitectureID) ); CREATE TABLE dbo.Computers ( ComputerID int NOT NULL IDENTITY(1,1) CONSTRAINT PK_Computers PRIMARY KEY CLUSTERED , ComputerName varchar(10) NOT NULL CONSTRAINT UQ_Computers_Name UNIQUE , OS_ID int NOT NULL CONSTRAINT FK_Computers_OS FOREIGN KEY REFERENCES dbo.OS(OS_ID) , ComputerManufacturerID int NOT NULL CONSTRAINT FK_Computers_Manufacturer FOREIGN KEY REFERENCES dbo.Manufacturer(ManufacturerID) , EffectiveDate datetime NOT NULL CONSTRAINT DF_Computers_EffectiveDate DEFAULT (GETDATE()) , ExpiryDate datetime NULL ); In the design above, you may notice we have to include several new unique constraints to ensure our data model is consistent across both approaches. Querying this surrogate-key approach with 2 rows in dbo.Computers looks like: SELECT Computers.ComputerName , ComputerManufacturerName = cm.ManufacturerName , OSManufacturer = om.ManufacturerName , OS.OSName , Architecture.ArchitectureName , Architecture.ArchitectureVersion FROM dbo.Computers INNER JOIN dbo.OS ON Computers.OS_ID = OS.OS_ID INNER JOIN dbo.Manufacturer cm ON Computers.ComputerManufacturerID = cm.ManufacturerID INNER JOIN dbo.Architecture ON OS.ArchitectureID = Architecture.ArchitectureID INNER JOIN dbo.Manufacturer om ON OS.ManufacturerID = om.ManufacturerID WHERE Computers.EffectiveDate = GETDATE() OR Computers.ExpiryDate IS NULL) ORDER BY Computers.ComputerID; The results:

╔══════════════╦══════════════════════════╦════════════════╦════════════╦══════════════════╦═════════════════════╗
║ ComputerName ║ ComputerManufacturerName ║ OSManufacturer ║ OSName     ║ ArchitectureName ║ ArchitectureVersion ║
╠══════════════╬══════════════════════════╬════════════════╬════════════╬══════════════════╬═════════════════════╣
║ CM700-01     ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
║ CM700-02     ║ HP                       ║ Microsoft      ║ Windows 10 ║ x64              ║ 1.00                ║
╚══════════════╩══════════════════════════╩════════════════╩════════════╩══════════════════╩═════════════════════╝

The I/O statistics are even more telling. For the natural keys, we have:

Table 'OS'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  
Table 'Computers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

For the surrogate key setup, we get:

Table 'Manufacturer'. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Architecture'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'OS'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Computers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Quite clearly, in the above admittedly very simple setup, the surrogate key is lagging in both ease-of-use, and performance. Having said that, what happens if we need to change the name of one of the manufacturers? Here's the T-SQL for the natural key version: UPDATE dbo.Manufacturer SET ManufacturerName = 'Microsoft™' WHERE ManufacturerName = 'Microsoft'; And the plan:

The T-SQL for the surrogate key version: UPDATE dbo.Manufacturer SET ManufacturerName = 'Microsoft™' WHERE ManufacturerID = 1; And that plan:

The natural key version has an estimated subtree cost that is nearly three times greater than the surrogate key version. Am I correct in saying that both natural keys and surrogate keys offer benefits; deciding which methodology to use should be carefully considered? Are there common situations where the comparisons I made above don't work? What other considerations should be made when choosing natural or surrogate keys?

Hannah Vernon (70988 rep)

Aug 30, 2017, 08:04 PM • Last activity: Oct 22, 2019, 06:02 AM

1 votes

2 answers

2823 views

What are the disadvantages of using surrogate keys?

database-design primary-key surrogate-key candidate-key

I am using MS SQL Server but in general database design I want to know what problems can arise when every row in a database has its auto generated surrogate key value. I know some advantages, for example that there is no need to identify unique columns without NULLs for primary keys, and there is no...

                                  I am using MS SQL Server but in general database design I want to know what problems can arise when every row in a database has its auto generated surrogate key value.

I know some advantages, for example that there is no need to identify unique columns without NULLs for primary keys, and there is no need to manage composite primary keys, normal forms are easier to manage, and uniqueness is guaranteed. 

I want to know, is there any good reason regarding to performance, or index structures, etc. that should make us use real world factful keys instead of surrogate ones?

Thanks.

igelr (2162 rep)

Mar 19, 2019, 07:15 PM • Last activity: Mar 25, 2019, 08:21 PM

1 votes

1 answers

6712 views

What problems are there in not using a primary key column at all?

postgresql primary-key surrogate-key

What purpose would a primary key column serve on a table such as `post_versions` here? Nothing refers to it, and there are no queries where I will ever want to select a row by `post_versions.id`. It'll be joined to `posts` in most queries. CREATE TABLE posts ( id SERIAL PRIMARY KEY, created TIMESTAM...

                                  What purpose would a primary key column serve on a table such as post_versions here? Nothing refers to it, and there are no queries where I will ever want to select a row by post_versions.id. It'll be joined to posts in most queries.
    
    CREATE TABLE posts (
    	id       SERIAL PRIMARY KEY,
        created  TIMESTAMPTZ NOT NULL DEFAULT now(),
        user_id  INTEGER NOT NULL REFERENCES users ON DELETE RESTRICT
    );
    
    CREATE TABLE post_versions (
    	id       SERIAL PRIMARY KEY, -- serves no purpose, could remove I think
    	post_id  INTEGER NOT NULL REFERENCES posts ON DELETE CASCADE,
        updated  TIMESTAMPTZ NOT NULL DEFAULT now(),
    	body     TEXT NOT NULL
    );

Are there any problems if I just remove it?
                                

davidtgq (759 rep)

Mar 4, 2019, 02:06 PM • Last activity: Mar 4, 2019, 04:36 PM

2 votes

2 answers

742 views

Composite primary key plus a separate (surrogate) id column for foreign key references

database-design primary-key erd surrogate-key candidate-key

I am trying to build a database about *locations* like so: - Country → Region → Town Countries will be populated with a list of pre-defined worldwide countries, regions & towns are input by users. Region may not apply to every location and so a blank string will be allowed (hence the many to many li...

                                  I am trying to build a database about *locations* like so:

- Country → Region → Town

Countries will be populated with a list of pre-defined worldwide countries, regions & towns are input by users.

Region may not apply to every location and so a blank string will be allowed (hence the many to many link table).

This is the best design I have come up with: 

The town_id column is set to AUTO_INCREMENT & UNIQUE. It is then used as a foreign key reference in other tables. 

The town column on its own is not UNIQUE as you may have the same town name in multiple places.

Is the towns table correct? 

The reason I ask is because it just seems odd to me to have the town_id as the reference, however this cannot be included or used as the primary key (because if it did, it would cause duplicates)? 

I would welcome feedback if this approach is correct.

the_peacock (123 rep)

Jun 3, 2014, 07:00 PM • Last activity: Feb 19, 2019, 10:46 PM

3 votes

1 answers

449 views

Does adding a surrogate key get rid of an identifying relationship between two entities?

database-design erd surrogate-key

In the following example, we have a `Comments` table with a composite primary key (PK) CREATE TABLE Comments ( video_id INT NOT NULL, user_id INT NOT NULL, comment_dt DATETIME NOT NULL, PRIMARY KEY (video_id, user_id, comment_dt), FOREIGN KEY (video_id) REFERENCES Videos(video_id), FOREIGN KEY (user...

                                  In the following example, we have a Comments table with a composite primary key (PK)
    
    CREATE TABLE Comments (
      video_id INT NOT NULL,
      user_id INT NOT NULL,
      comment_dt DATETIME NOT NULL,
      PRIMARY KEY (video_id, user_id, comment_dt),
      FOREIGN KEY (video_id) REFERENCES Videos(video_id),
      FOREIGN KEY (user_id) REFERENCES Users(user_id)
    );

There appears to be an identifying relationship, as the primary key of Comments includes its foreign keys.

However, watch what happens when you give this Comments table a surrogate primary key of its own:

    CREATE TABLE Comments (
    comment_id SERIAL PRIMARY KEY,
    video_id INT NOT NULL,
    user_id INT NOT NULL,
    comment_dt DATETIME NOT NULL,
    FOREIGN KEY (video_id) REFERENCES Videos(video_id),
    FOREIGN KEY (user_id) REFERENCES Users(user_id)
    );

Does that get rid of the identifying relationship? Logically, it still seems to be there, so does this really come down to the *physical* model and whether or not you want to include a composite key? Are identifying relationships even necessary?
                                

brennvo (91 rep)

Feb 1, 2019, 08:06 PM • Last activity: Feb 2, 2019, 08:22 PM

0 votes

1 answers

67 views

M:M Table Design - Use of PK

sql-server primary-key many-to-many surrogate-key

I have a many-to-many table that contains id's from 2 related tables. For creating a primary key on this table, is it better to create another integer column and have that as a surrogate PK? Or should I create the PK on both id columns? What are the benefits of either method? Here is the table desig...

                                  I have a many-to-many table that contains id's from 2 related tables.

For creating a primary key on this table, is it better to create another integer column and have that as a surrogate PK?

Or should I create the PK on both id columns?

What are the benefits of either method?

Here is the table design with the surrogate key (CodeGroupMappingID)

    CREATE TABLE [dbo].[CodeGroupMappings](
    	[CodeGroupMappingID] [int] IDENTITY(1,1) NOT NULL,
    	[CodeID] [int] NOT NULL,
    	[GroupID] [int] NOT NULL,
         CONSTRAINT [PK_CodeGroupMappings_CodeGroupMappingID] PRIMARY KEY CLUSTERED ([CodeGroupMappingID] ASC)
    )
                                

Kevin (533 rep)

May 22, 2018, 09:41 AM • Last activity: May 22, 2018, 08:51 PM

29 votes

3 answers

6548 views

Do natural keys provide higher or lower performance in SQL Server than surrogate integer keys?

sql-server sql-server-2012 performance surrogate-key natural-key performance-testing

I'm a fan of surrogate keys. There is a risk my findings are confirmation biased. Many questions I've seen both here and at http://stackoverflow.com use natural keys instead of surrogate keys based on `IDENTITY()` values. My background in computer systems tells me performing any comparative operatio...

                                  I'm a fan of surrogate keys.  There is a risk my findings are confirmation biased.  

Many questions I've seen both here and at http://stackoverflow.com  use natural keys instead of surrogate keys based on IDENTITY() values.

My background in computer systems tells me performing any comparative operation on an integer will be faster than comparing strings.

This  comment made me question my beliefs, so I thought I would create a system to investigate my thesis that integers are faster than strings for use as keys in SQL Server.

Since there is likely to be very little discernible difference in small datasets, I immediately thought of a two table setup where the primary table has 1,000,000 rows and the secondary table has 10 rows for each row in the primary table for a total of 10,000,000 rows in the secondary table.  The premise of my test is to create two sets of tables like this, one using natural keys and one using integer keys, and run timing tests on a simple query like:

    SELECT *
    FROM Table1
        INNER JOIN Table2 ON Table1.Key = Table2.Key;

The following is the code I created as a test bed:

	USE Master;
	IF (SELECT COUNT(database_id) FROM sys.databases d WHERE d.name = 'NaturalKeyTest') = 1
	BEGIN
		ALTER DATABASE NaturalKeyTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
		DROP DATABASE NaturalKeyTest;
	END
	GO
	CREATE DATABASE NaturalKeyTest 
    	ON (NAME = 'NaturalKeyTest', FILENAME = 
            'C:\SQLServer\Data\NaturalKeyTest.mdf', SIZE=8GB, FILEGROWTH=1GB) 
    	LOG ON (NAME='NaturalKeyTestLog', FILENAME = 
            'C:\SQLServer\Logs\NaturalKeyTest.mdf', SIZE=256MB, FILEGROWTH=128MB);
	GO
	ALTER DATABASE NaturalKeyTest SET RECOVERY SIMPLE;
	GO
	USE NaturalKeyTest;
	GO
	CREATE VIEW GetRand
	AS 
		SELECT RAND() AS RandomNumber;
	GO
	CREATE FUNCTION RandomString
	(
		@StringLength INT
	)
	RETURNS NVARCHAR(max)
	AS
	BEGIN
		DECLARE @cnt INT = 0
		DECLARE @str NVARCHAR(MAX) = '';
		DECLARE @RandomNum FLOAT = 0;
		WHILE @cnt < @StringLength
		BEGIN
			SELECT @RandomNum = RandomNumber
			FROM GetRand;
			SET @str = @str + CAST(CHAR((@RandomNum * 64.) + 32) AS NVARCHAR(MAX)); 
			SET @cnt = @cnt + 1;
		END
		RETURN @str;
	END;
	GO
	CREATE TABLE NaturalTable1
	(
		NaturalTable1Key NVARCHAR(255) NOT NULL 
		    CONSTRAINT PK_NaturalTable1 PRIMARY KEY CLUSTERED 
		, Table1TestData NVARCHAR(255) NOT NULL 
	);
	CREATE TABLE NaturalTable2
	(
		NaturalTable2Key NVARCHAR(255) NOT NULL 
		    CONSTRAINT PK_NaturalTable2 PRIMARY KEY CLUSTERED 
		, NaturalTable1Key NVARCHAR(255) NOT NULL 
		    CONSTRAINT FK_NaturalTable2_NaturalTable1Key 
		    FOREIGN KEY REFERENCES dbo.NaturalTable1 (NaturalTable1Key) 
		    ON DELETE CASCADE ON UPDATE CASCADE
		, Table2TestData NVARCHAR(255) NOT NULL  
	);
	GO

	/* insert 1,000,000 rows into NaturalTable1 */
	INSERT INTO NaturalTable1 (NaturalTable1Key, Table1TestData) 
		VALUES (dbo.RandomString(25), dbo.RandomString(100));
	GO 1000000 

	/* insert 10,000,000 rows into NaturalTable2 */
	INSERT INTO NaturalTable2 (NaturalTable2Key, NaturalTable1Key, Table2TestData)
	SELECT dbo.RandomString(25), T1.NaturalTable1Key, dbo.RandomString(100)
	FROM NaturalTable1 T1
	GO 10 

	CREATE TABLE IDTable1
	(
		IDTable1Key INT NOT NULL CONSTRAINT PK_IDTable1 
		PRIMARY KEY CLUSTERED IDENTITY(1,1)
		, Table1TestData NVARCHAR(255) NOT NULL 
		CONSTRAINT DF_IDTable1_TestData DEFAULT dbo.RandomString(100)
	);
	CREATE TABLE IDTable2
	(
		IDTable2Key INT NOT NULL CONSTRAINT PK_IDTable2 
		    PRIMARY KEY CLUSTERED IDENTITY(1,1)
		, IDTable1Key INT NOT NULL 
		    CONSTRAINT FK_IDTable2_IDTable1Key FOREIGN KEY 
		    REFERENCES dbo.IDTable1 (IDTable1Key) 
		    ON DELETE CASCADE ON UPDATE CASCADE
		, Table2TestData NVARCHAR(255) NOT NULL 
		    CONSTRAINT DF_IDTable2_TestData DEFAULT dbo.RandomString(100)
	);
	GO
	INSERT INTO IDTable1 DEFAULT VALUES;
	GO 1000000
	INSERT INTO IDTable2 (IDTable1Key)
	SELECT T1.IDTable1Key
	FROM IDTable1 T1
	GO 10

The code above creates a database and 4 tables, and fills the tables with data, ready to test.  The test code I ran is:

	USE NaturalKeyTest;
	GO
	DECLARE @loops INT = 0;
	DECLARE @MaxLoops INT = 10;
	DECLARE @Results TABLE (
		FinishedAt DATETIME DEFAULT (GETDATE())
		, KeyType NVARCHAR(255)
		, ElapsedTime FLOAT
	);
	WHILE @loops < @MaxLoops
	BEGIN
		DBCC FREEPROCCACHE;
		DBCC FREESESSIONCACHE;
		DBCC FREESYSTEMCACHE ('ALL');
		DBCC DROPCLEANBUFFERS;
		WAITFOR DELAY '00:00:05';
		DECLARE @start DATETIME = GETDATE();
		DECLARE @end DATETIME;
		DECLARE @count INT;
		SELECT @count = COUNT(*) 
		FROM dbo.NaturalTable1 T1
			INNER JOIN dbo.NaturalTable2 T2 ON T1.NaturalTable1Key = T2.NaturalTable1Key;
		SET @end = GETDATE();
		INSERT INTO @Results (KeyType, ElapsedTime)
		SELECT 'Natural PK' AS KeyType, CAST((@end - @start) AS FLOAT) AS ElapsedTime;

		DBCC FREEPROCCACHE;
		DBCC FREESESSIONCACHE;
		DBCC FREESYSTEMCACHE ('ALL');
		DBCC DROPCLEANBUFFERS;
		WAITFOR DELAY '00:00:05';
		SET @start = GETDATE();
		SELECT @count = COUNT(*) 
		FROM dbo.IDTable1 T1
			INNER JOIN dbo.IDTable2 T2 ON T1.IDTable1Key = T2.IDTable1Key;
		SET @end = GETDATE();
		INSERT INTO @Results (KeyType, ElapsedTime)
		SELECT 'IDENTITY() PK' AS KeyType, CAST((@end - @start) AS FLOAT) AS ElapsedTime;

		SET @loops = @loops + 1;
	END
	SELECT KeyType, FORMAT(CAST(AVG(ElapsedTime) AS DATETIME), 'HH:mm:ss.fff') AS AvgTime 
	FROM @Results
	GROUP BY KeyType;

These are the results:



Am I doing something wrong here, or are INT keys 3 times faster than 25 character natural keys?

Note, I've written a follow-up question [here](https://dba.stackexchange.com/questions/184756/natural-keys-vs-surrogate-keys-part-2-the-showdown) .

                                

Hannah Vernon (70988 rep)

Sep 29, 2013, 01:56 AM • Last activity: Mar 18, 2018, 03:22 PM

40 votes

3 answers

7197 views

Should every table have a single-field surrogate/artificial primary key?

database-design primary-key surrogate-key

I understand one benefit of surrogate/artificial keys in general - they do not change and that can be very convenient. This is true whether they are single or multiple field - as long as they are 'artificial'. However, it sometimes seems to be a matter of policy to have an auto-incrementing integer...

                                  I understand one benefit of surrogate/artificial keys in general - they do not change and that can be very convenient. This is true whether they are single or multiple field - as long as they are 'artificial'.

However, it sometimes seems to be a matter of policy to have an auto-incrementing integer field as the primary key of each table. Is this always the best idea to have such a **single-field** key and why (or why not)?

To be clear, this question is not about artificial vs natural - but about whether all artificial keys should be **single-field**

Jack Douglas (40537 rep)

Sep 23, 2011, 09:25 PM • Last activity: Mar 18, 2018, 12:50 PM

8 votes

5 answers

5853 views

What normal form does a surrogate key violate?

normalization relational-theory surrogate-key

I have following question: "What normal form does a surrogate key violate?" My thought was the 3rd normal form, but I'm not quite sure it's just an assumption i am making. Could someone explain that to me?

                                  I have following question:

"What normal form does a surrogate key violate?"

My thought was the 3rd normal form, but I'm not quite sure it's just an assumption i am making. Could someone explain that to me?

Marco (91 rep)

Apr 16, 2012, 08:35 AM • Last activity: Mar 18, 2018, 12:48 PM

5 votes

2 answers

2189 views

Surrogate key vs Natural key

database-design best-practices surrogate-key natural-key

I have a table called devices. Most of the devices that will get stored in this table can be uniquely identified by their serial number and part number. But there are some device types that do not have serial number and part number assigned to them. Instead they can be uniquely identified by another...

                                  I have a table called devices. Most of the devices that will get stored in this table can be uniquely identified by their serial number and part number. But there are some device types that do not have serial number and part number assigned to them. Instead they can be uniquely identified by another field (internal id).

Should I create a surrogate key for this table or should I create a composite primary key (serial number, part number, internal id) and insert default values to the serial number and part number columns when they are not supplied? The device types that do not have part number and serial number now, will have the numbers assigned to them in the future releases (may be 5 years from now). Should I create a surrogate key or a composite key in this scenario? Or using the three unique attributes, should I create a hash in the program and use that as a surrogate key for the tables?

DBK (378 rep)

Aug 26, 2017, 04:07 AM • Last activity: Mar 18, 2018, 12:43 PM

0 votes

2 answers

155 views

Common term for tenant-based keys

terminology multi-tenant surrogate-key

Assume the following schema: ID  | TENANT | TENANT_CUSTOM_ID 346 | 101    | 1 347 | 102    | 1 348 | 101    | 2 349 | 101    | 3 350 | 101    | 4 351 | 102    | 2 352 | 1...

                                  Assume the following schema:



    ID  | TENANT | TENANT_CUSTOM_ID
| 101    | 1  
| 102    | 1  
| 101    | 2  
| 101    | 3  
| 101    | 4  
| 102    | 2  
| 103    | 1

**What is another name (or the proper classification) for TENANT_CUSTOM_ID?** Perhaps something along the lines of a surrogate key? The purpose of this key is to make each tenant have their own numbering system instead of knowing that their records are mixed with all other tenants on the same tables. 

                                

1234567 (159 rep)

Jun 2, 2017, 12:00 AM • Last activity: Jun 2, 2017, 09:01 PM

1 votes

1 answers

1003 views

What are the consequences, positive or negative, to having a surrogate primary key for a table which already has a guaranteed unique column?

postgresql database-design primary-key surrogate-key

I recently made a table that looks like the one below CREATE TABLE example( id SERIAL PRIMARY KEY, subindusty_id INT UNIQUE NOT NULL, name TEXT NOT NULL -- other fields... ) I realized that in this table I could just remove the column called `id` and make `subindustry_id` the primary key. Thinking b...

                                  I recently made a table that looks like the one below

    CREATE TABLE example(
      id SERIAL PRIMARY KEY,
      subindusty_id INT UNIQUE NOT NULL,
      name TEXT NOT NULL
      -- other fields...
    )

I realized that in this table I could just remove the column called id and make subindustry_id the primary key.

Thinking back, the only reason I can think of for why I didn't make subindustry_id the primary key is for the sake of consistency, considering that I have a lot of other tables with a column called id.

Except for things like consistency*, are there any tangible benefits/downsides to having this additional id column as the primary key when there's already another column that'd be a good primary key?

\*Consistency as in "that's the way the rest of the tables look". The kind where you're enforcing a certain type of style. I guess the point of this question is to question the practice of giving every single table an id column regardless of the fact that they might already have a more suitable candidate for a primary key.
                                

m0meni (147 rep)

Jan 19, 2017, 06:26 PM • Last activity: Jan 20, 2017, 05:53 AM

Showing page 1 of 20 total questions