Database Administrators
Q&A for database professionals who wish to improve their database skills
Latest Questions
15
votes
4
answers
6701
views
Foreign keys - link using surrogate or natural key?
Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is [Jack Douglas' answer in this question][1], and his reasoning seems sound to me. I'm aware of the discussion be...
Is there a best practice for whether a foreign key between tables should link to a natural key or a surrogate key? The only discussion I've really found (unless my google-fu is lacking) is Jack Douglas' answer in this question , and his reasoning seems sound to me. I'm aware of the discussion beyond that that rules change, but this would be something that would need to be considered in any situation.
The main reason for asking is that I have a legacy application that makes uses of FKs with natural keys, but there is a strong push from devlopers to move to an OR/M (NHibernate in our case), and a fork has already produced some breaking changes, so I'm looking to either push them back on track using the natural key, or move the legacy app to use surrogate keys for the FK. My gut says to restore the original FK, but I'm honestly not sure if this is really the right path to follow.
The majority of our tables already have both a surrogate and natural key already defined (though unique constraint and PK) so having to add extra columns is a non-issue for us in this insance. We're using SQL Server 2008, but I'd hope this is generic enough for any DB.
Callie J
(492 rep)
Apr 2, 2013, 12:17 PM
• Last activity: Sep 6, 2023, 07:07 AM
2
votes
1
answers
1049
views
How to get the "best of both worlds" with natural and surrogate keys? And could DBMSs be better?
I am designing my first database, and I find myself frustrated by the choice between storing an integer or a string for each instance of a categorical variable. My understanding is that if I have a table containing cities that I want to make a child of a table of countries, the most performant way t...
I am designing my first database, and I find myself frustrated by the choice between storing an integer or a string for each instance of a categorical variable.
My understanding is that if I have a table containing cities that I want to make a child of a table of countries, the most performant way to do that is to have the PK of the countries table as a FK in in the table of cities. However for ease of use and debugging, it's nice to always have the string name associated with the country PK. Every solution I have considered either is not recommended or seems overly complex.
I'd like opinions the merits of these approaches (or hear about new ones) and also to understand if it has to be this way or if databases simply are this way because of tradition.
Possible approaches:
1. Use a string as a PK for countries. Then I will have a human-readable FK for it in any child tables. Obviously less performant than using integers, but I suspect it may be the least worst way to have the convenience I desire.
2. Create a view using application logic that join each the string name of the country to the states table.
- I don't love this because if the application logic breaks, the tables become less readable. Also I would expect large join operations to have an even worse performance penalty than string PK/FKs.
3. Create a separate table to connect numeric IDs with the appropriate string ID. I'm not sure if it would be better to have a table coding each type of relation, or one big table with one big pool of IDs that cover all integer key-string value relations. I could then use application logic to look up the appropriate strings and fill the appropriate PK into the child table when it's string name is given by a user.
- I feel like this might be pretty resource intensive too, as there would have to be a lookup every time a new row was added to the child. It also means that I would still have to create the views I want.
4. Use
enum
data type. Instinctively, this would be my go-to approach, as it seems the ideal balance between natural and synthetic keys: Use integer IDs and give the IDs a string label so that the string itself need not be repeated.
- Unfortunately my research has found that this is not recommended. One reason for that is that categories cannot be deleted easily. I'm not sure if that is dealbreaker for me, but I also wonder why DBMSs are designed this way. Aren't categorical variables commonly used enough to add convenience features for them?
Stonecraft
(125 rep)
Jul 24, 2022, 06:43 PM
• Last activity: Jul 25, 2022, 12:29 PM
1
votes
2
answers
6092
views
PostgreSQL surrogate keys: use sequence/bigint or OID?
I'm looking at a need for surrogate keys in tables I'm designing on a PostgreSQL-derived DBMS ([Citus][1]). Would OIDs suffice? Is there a downside to using them instead of creating a `bigint` field and a sequence? [1]: https://www.citusdata.com/
I'm looking at a need for surrogate keys in tables I'm designing on a PostgreSQL-derived DBMS (Citus ). Would OIDs suffice? Is there a downside to using them instead of creating a
bigint
field and a sequence?
Andrew Wolfe
(584 rep)
Feb 1, 2016, 05:43 PM
• Last activity: Sep 27, 2021, 12:04 AM
2
votes
1
answers
2838
views
Identity versus Composite values for Primary Keys in Many-To-Many Tables
I am in the middle of an interesting "Data Architecture" discussion at a place I am contracting & need some input. As a habit, I always create a **Surrogate Key** as the **Primary Key** in my tables - be they Guid (UUID) or Identity values. IMHO the *synthetic-nature* of a Surrogate Key has the dist...
I am in the middle of an interesting "Data Architecture" discussion at a place I am contracting & need some input.
As a habit, I always create a **Surrogate Key** as the **Primary Key** in my tables - be they Guid (UUID) or Identity values. IMHO the *synthetic-nature* of a Surrogate Key has the distinct value of identifying a row without intruding on "real" or "natural" keys-or-values that may otherwise be useful **Alternate Keys**.
To me...it just seems natural - but maybe I am wrong.
Where I happen to be at the moment...they argue this *good* for "core" **Entity** tables, but is *completely incorrect* for **Many-to-Many** (relationship) tables. While I will happily oblige the customer...I disagree.
Thoughts behind my approach are:
- A primary key value must be unique
- The primary key should be as compact as possible
- Primary key value should be stable
- Primary Key identifies a tuple...not the data INSIDE the tuple
**MY QUESTION:** Which is either correct or better? And Why?
- Please provide reasoning etc. **NOTE:**
Please Ignore the ID column naming in the samples below...that is how they do it here.
- Please provide reasoning etc. **NOTE:**
Please Ignore the ID column naming in the samples below...that is how they do it here.

Prisoner ZERO
(129 rep)
Jul 6, 2017, 12:24 PM
• Last activity: Aug 16, 2021, 11:01 AM
1
votes
0
answers
49
views
How not to "naturalize" surrogate keys?
> Don't naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advant...
> Don't naturalize surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow them to work with the value (perhaps to search), you have effectively given the key business meaning. This in effect naturalizes the key and thereby negates some of the advantages of surrogate keys.
>
> http://www.agiledata.org/essays/keys.html
> Don’t “naturalize” surrogate keys. As soon as you display the value of a surrogate key to your end users, or worse yet allow users to work with the value (perhaps via search), you have effectively given the key business meaning. The exposed key in our database could then be considered a natural key in someone else’s.
>
> https://begriffs.com/posts/2018-01-01-sql-keys-in-depth.html
And yet, every web framework/ORM I am familiar with (e.g. Ruby on Rails) exposes the surrogate key in the URL which is natural key that users share, send, save, bookmark, etc.
In light this of this recommendation, what are the best practices for handling web URLs?
Paul Draper
(800 rep)
Aug 5, 2020, 07:18 PM
0
votes
1
answers
560
views
Insert results from query, update old existing row if query doesn't still return same value
I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field. Each day we run a query (a few joins, but quite slow) to get some data to insert into this table: * if the row exists already, and is present in the query, we do nothing * if t...
I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field.
Each day we run a query (a few joins, but quite slow) to get some data to insert into this table:
* if the row exists already, and is present in the query, we do nothing
* if the row exists but the result isn't in todays query, we set 'active' to _false_
* if the row doesn't exist but is in the query, we insert it (with active defaulting to true)
It is possible for an entity to be active, made inactive (same row), then a few days later become active again (would be a new row).
The table is using a surrogate key, as the natural key can be repeated as noted above, though there should only ever be a single entry that is active on the natural key.
I'm curious if there is an action or 'pattern' I can take that will enable such functionality - I realise I'm asking a lot :)
I am using a java library (jOOQ) to generate the sql, so I am only looking for sql as the solution, no PL/pgSQL and no stored procs. I doubt it is possible without 2 passes (1. perform the update where not in query result, 2. perform the insert from query result) but I'm intrigued if there are possibilities.
Edit: Example table history, where ** is an update to same row
pk_surrogate | natural_id | active| date_made_active
1 | 1 | true | 2020-01-01 -- row for entity 1 added
1 ** | 1 | false | 2020-01-01 -- row for entity 1 no longer present on 2020-01-02, so marked inactive
2 | 1 | true | 2020-01-03 -- new row for entity 1 added when it was present on 2020-01-03
3 | 2 | true | 2020-01-03 --new value for 2020-01-03
Mitch Kent
(205 rep)
Jul 28, 2020, 08:07 PM
• Last activity: Jul 29, 2020, 07:21 AM
-2
votes
1
answers
53
views
If all associated attributes of an intended primary key are identical, is it still a true primary key?
I'll start with an example. If I have a `person` table with intended surrogate primary key `Id`: +----+------+------------+-------------+ | Id | name | DoB | SSN | +----+------+------------+-------------+ | 1 | John | 1901-01-01 | 111-11-1111 | | 2 | Jane | 1902-02-02 | 222-22-2222 | | 3 | John | 19...
I'll start with an example. If I have a
person
table with intended surrogate primary key Id
:
+----+------+------------+-------------+ | Id | name | DoB | SSN | +----+------+------------+-------------+ | 1 | John | 1901-01-01 | 111-11-1111 | | 2 | Jane | 1902-02-02 | 222-22-2222 | | 3 | John | 1901-01-01 | 111-11-1111 | +----+------+------------+-------------+Note
Id
s 1 & 3 have the same attributes; They both represent the same person.
Now from what we know about the theory behind what constitutes a primary key, which I think is well summarized here :
> - The primary key must uniquely identify each record.
> - A record’s primary-key value can’t be null.
> - The primary key-value must exist when the record is created.
> - The primary key must remain stable—you can’t change the primary-key field(s).
> - The primary key must be compact and contain the fewest possible attributes.
Consider the first bullet, "*The primary key must uniquely identify each record.*" In my example, I suppose whether or not each Id
does represent uniqueness depends on what's really supposed to be considered unique. A different database record? Yes. A different person (what the records are supposed to represent)? No.
So multiple Id
s represent what is functionally the same subject generating the data, present in 2 records. A "two to one Id" of sorts. I've not read anything that directly address the scenario my example illustrates, as is relates to what is or is not a PK.
1. Does this example violate the theory behind what constitutes a primary key?
2. If not, does this example illustrate a violation of any larger principles of database architecture, or can this concept be reduced to something as simple as "duplication of data - clean it up"?
Many thanks.
simply_sideways
(3 rep)
May 5, 2020, 04:33 PM
• Last activity: May 5, 2020, 10:02 PM
2
votes
1
answers
412
views
Should a inherited table have a new surrogate key, or just use parent table's key?
I am modeling some things with an "is a" relationship (aka inheritance), and there were two ways to go about it in regards to the inherited table's primary key and foreign key pointing back to the parent table: - Strategy #1 was to always create a new surrogate key on each inherited table, and then...
I am modeling some things with an "is a" relationship (aka inheritance), and there were two ways to go about it in regards to the inherited table's primary key and foreign key pointing back to the parent table:
- Strategy #1 was to always create a new surrogate key on each inherited table, and then store the reference to the parent table's primary key in a separate column (and have unique constraint on that column to ensure a 1 to 1 mapping).
- Strategy #2 was to make the primary key value of the inherited table have the same value as the parent table's primary key.
I think there is a clear winner here (Strategy #2), but I want to make sure there isn't some potential issue I am missing.
Here's the code (targeting SQL Server, but maybe generic) elaborating each strategy:
------Strategy #1: always create a surrogate key for a child table (separate column holds parent table's primary key)----------
drop table if exists TruckAsset
drop table if exists VehicleAsset
drop table if exists Asset
create table Asset
(
AssetId int not null,
primary key (AssetId)
)
create table VehicleAsset
(
VehicleAssetId int not null, /*surrogate key*/
AssetId int not null,
VehicleRegistrationExpirationDate date not null, /*specialization of Asset*/
primary key (VehicleAssetId),
constraint UQ_AssetId unique (AssetId),
foreign key (AssetId) references Asset(AssetId)
)
create table TruckAsset
(
TruckAssetId int not null, /*surrogate key*/
VehicleAssetId int not null,
CargoCapacityCubicMeters float not null, /*specialization of VehicleAsset*/
primary key (TruckAssetId),
constraint UQ_VehicleAssetId unique (VehicleAssetId),
foreign key (VehicleAssetId) references VehicleAsset(VehicleAssetId)
)
INSERT INTO Asset(AssetId)VALUES(1234)
INSERT INTO VehicleAsset(VehicleAssetId, AssetId, VehicleRegistrationExpirationDate) VALUES (5678, 1234, '2020-01-01')
INSERT INTO TruckAsset(TruckAssetId, VehicleAssetId, CargoCapacityCubicMeters) VALUES (9999, 5678, 8.80977)
GO
versus this
------Strategy #2: primary key references parent table's primary key directly----------
drop table if exists TruckAsset
drop table if exists VehicleAsset
drop table if exists Asset
create table Asset
(
AssetId int not null,
primary key (AssetId)
)
create table VehicleAsset
(
VehicleAssetId int not null,
VehicleRegistrationExpirationDate date not null, /*specialization of Asset*/
primary key (VehicleAssetId),
foreign key (VehicleAssetId) references Asset(AssetId)
)
create table TruckAsset
(
TruckAssetId int not null,
CargoCapacityCubicMeters float not null, /*specialization of VehicleAsset*/
primary key (TruckAssetId),
foreign key (TruckAssetId) references VehicleAsset(VehicleAssetId)
)
INSERT INTO Asset(AssetId)VALUES(1234)
INSERT INTO VehicleAsset(VehicleAssetId, VehicleRegistrationExpirationDate) VALUES (1234, '2020-01-01')
INSERT INTO TruckAsset(TruckAssetId, CargoCapacityCubicMeters) VALUES (1234, 8.80977)
----------------------------------------
Anssssss
(248 rep)
Dec 5, 2019, 10:01 PM
• Last activity: Dec 6, 2019, 06:00 PM
2
votes
1
answers
583
views
Natural Keys vs Surrogate Keys part 2
A while back, I asked if [surrogate keys provide better performance than natural keys in SQL Server](https://dba.stackexchange.com/questions/50708/do-natural-keys-provide-higher-or-lower-performance-in-sql-server-than-surrogate). [@sqlvogel](https://dba.stackexchange.com/users/1357/sqlvogel) provide...
A while back, I asked if [surrogate keys provide better performance than natural keys in SQL Server](https://dba.stackexchange.com/questions/50708/do-natural-keys-provide-higher-or-lower-performance-in-sql-server-than-surrogate) . [@sqlvogel](https://dba.stackexchange.com/users/1357/sqlvogel) provided an answer to that question yesterday that caused me to revisit it.
This question is an attempt to "upgrade" the prior question, and hopefully provide the opportunity for thoughtful answers that help the community.
Consider a system for storing details about computers. Each computer has an architecture, and an Operating System. In SQL Server, we could create these tables using natural keys like this:
CREATE TABLE dbo.Architecture
(
ArchitectureName varchar(10) NOT NULL
, ArchitectureVersion decimal(5,2) NOT NULL
, ReleaseDate date NOT NULL
, CONSTRAINT PK_Architecture
PRIMARY KEY CLUSTERED
(ArchitectureName, ArchitectureVersion)
);
CREATE TABLE dbo.Manufacturer
(
ManufacturerName varchar(10) NOT NULL
CONSTRAINT PK_Manufacturer
PRIMARY KEY CLUSTERED
);
CREATE TABLE dbo.OS
(
OSName varchar(30) NOT NULL
, ManufacturerName varchar(10) NOT NULL
CONSTRAINT FK_OS_Manufacturer
FOREIGN KEY
(ManufacturerName)
REFERENCES dbo.Manufacturer(ManufacturerName)
, ArchitectureName varchar(10) NOT NULL
, ArchitectureVersion decimal(5,2) NOT NULL
, CONSTRAINT FK_OS_Architecture
FOREIGN KEY
(ArchitectureName, ArchitectureVersion)
REFERENCES dbo.Architecture(ArchitectureName, ArchitectureVersion)
, CONSTRAINT PK_OS
PRIMARY KEY CLUSTERED
(OSName)
);
CREATE TABLE dbo.Computers
(
ComputerID varchar(10) NOT NULL
CONSTRAINT PK_Computers
PRIMARY KEY CLUSTERED
, OSName varchar(30) NOT NULL
CONSTRAINT FK_Computers_OSName
FOREIGN KEY
REFERENCES dbo.OS(OSName)
, ComputerManufacturerName varchar(10) NOT NULL
CONSTRAINT FK_Computers_Manufacturer
FOREIGN KEY
REFERENCES dbo.Manufacturer(ManufacturerName)
, EffectiveDate datetime NOT NULL
CONSTRAINT DF_Computers_EffectiveDate
DEFAULT (GETDATE())
, ExpiryDate datetime NULL
);
To query the
Or, if we choose to use surrogate keys, like this:
CREATE TABLE dbo.Architecture
(
ArchitectureID int NOT NULL IDENTITY(1,1)
CONSTRAINT PK_Architecture
PRIMARY KEY CLUSTERED
, ArchitectureName varchar(10) NOT NULL
, ArchitectureVersion decimal(5,2) NOT NULL
, ReleaseDate date NOT NULL
, CONSTRAINT UQ_Architecture_Name
UNIQUE
(ArchitectureName, ArchitectureVersion)
);
CREATE TABLE dbo.Manufacturer
(
ManufacturerID int NOT NULL IDENTITY(1,1)
CONSTRAINT PK_Manufacturer
PRIMARY KEY CLUSTERED
, ManufacturerName varchar(10) NOT NULL
);
CREATE TABLE dbo.OS
(
OS_ID int NOT NULL IDENTITY(1,1)
CONSTRAINT PK_OS
PRIMARY KEY CLUSTERED
, OSName varchar(30) NOT NULL
CONSTRAINT UQ_OS_Name
UNIQUE
, ManufacturerID int NOT NULL
CONSTRAINT FK_OS_Manufacturer
FOREIGN KEY
REFERENCES dbo.Manufacturer(ManufacturerID)
, ArchitectureID int NOT NULL
CONSTRAINT FK_OS_Architecture
FOREIGN KEY
REFERENCES dbo.Architecture(ArchitectureID)
);
CREATE TABLE dbo.Computers
(
ComputerID int NOT NULL IDENTITY(1,1)
CONSTRAINT PK_Computers
PRIMARY KEY CLUSTERED
, ComputerName varchar(10) NOT NULL
CONSTRAINT UQ_Computers_Name
UNIQUE
, OS_ID int NOT NULL
CONSTRAINT FK_Computers_OS
FOREIGN KEY
REFERENCES dbo.OS(OS_ID)
, ComputerManufacturerID int NOT NULL
CONSTRAINT FK_Computers_Manufacturer
FOREIGN KEY
REFERENCES dbo.Manufacturer(ManufacturerID)
, EffectiveDate datetime NOT NULL
CONSTRAINT DF_Computers_EffectiveDate
DEFAULT (GETDATE())
, ExpiryDate datetime NULL
);
In the design above, you may notice we have to include several new unique constraints to ensure our data model is consistent across both approaches.
Querying this surrogate-key approach with 2 rows in
The I/O statistics are even more telling. For the natural keys, we have:
The T-SQL for the surrogate key version:
UPDATE dbo.Manufacturer
SET ManufacturerName = 'Microsoft™'
WHERE ManufacturerID = 1;
And that plan:
The natural key version has an estimated subtree cost that is nearly three times greater than the surrogate key version.
Am I correct in saying that both natural keys and surrogate keys offer benefits; deciding which methodology to use should be carefully considered?
Are there common situations where the comparisons I made above don't work? What other considerations should be made when choosing natural or surrogate keys?
dbo.Computers
table, with 2 rows in dbo.Computers
, showing various details, we could do this:
SELECT Computers.ComputerID
, Computers.ComputerManufacturerName
, OSManufacturer = OS.ManufacturerName
, Computers.OSName
, OS.ArchitectureName
, OS.ArchitectureVersion
FROM dbo.Computers
INNER JOIN dbo.OS ON Computers.OSName = OS.OSName
WHERE Computers.EffectiveDate = GETDATE() OR Computers.ExpiryDate IS NULL)
ORDER BY Computers.ComputerID;
The query output is:
╔════════════╦══════════════════════════╦════════════════╦════════════╦══════════════════╦═════════════════════╗ ║ ComputerID ║ ComputerManufacturerName ║ OSManufacturer ║ OSName ║ ArchitectureName ║ ArchitectureVersion ║ ╠════════════╬══════════════════════════╬════════════════╬════════════╬══════════════════╬═════════════════════╣ ║ CM700-01 ║ HP ║ Microsoft ║ Windows 10 ║ x64 ║ 1.00 ║ ║ CM700-02 ║ HP ║ Microsoft ║ Windows 10 ║ x64 ║ 1.00 ║ ╚════════════╩══════════════════════════╩════════════════╩════════════╩══════════════════╩═════════════════════╝The query plan for this is quite simple:

dbo.Computers
looks like:
SELECT Computers.ComputerName
, ComputerManufacturerName = cm.ManufacturerName
, OSManufacturer = om.ManufacturerName
, OS.OSName
, Architecture.ArchitectureName
, Architecture.ArchitectureVersion
FROM dbo.Computers
INNER JOIN dbo.OS ON Computers.OS_ID = OS.OS_ID
INNER JOIN dbo.Manufacturer cm ON Computers.ComputerManufacturerID = cm.ManufacturerID
INNER JOIN dbo.Architecture ON OS.ArchitectureID = Architecture.ArchitectureID
INNER JOIN dbo.Manufacturer om ON OS.ManufacturerID = om.ManufacturerID
WHERE Computers.EffectiveDate = GETDATE() OR Computers.ExpiryDate IS NULL)
ORDER BY Computers.ComputerID;
The results:
╔══════════════╦══════════════════════════╦════════════════╦════════════╦══════════════════╦═════════════════════╗ ║ ComputerName ║ ComputerManufacturerName ║ OSManufacturer ║ OSName ║ ArchitectureName ║ ArchitectureVersion ║ ╠══════════════╬══════════════════════════╬════════════════╬════════════╬══════════════════╬═════════════════════╣ ║ CM700-01 ║ HP ║ Microsoft ║ Windows 10 ║ x64 ║ 1.00 ║ ║ CM700-02 ║ HP ║ Microsoft ║ Windows 10 ║ x64 ║ 1.00 ║ ╚══════════════╩══════════════════════════╩════════════════╩════════════╩══════════════════╩═════════════════════╝

Table 'OS'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Computers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.For the surrogate key setup, we get:
Table 'Manufacturer'. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Architecture'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'OS'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Computers'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Quite clearly, in the above admittedly very simple setup, the surrogate key is lagging in both ease-of-use, and performance. Having said that, what happens if we need to change the name of one of the manufacturers? Here's the T-SQL for the natural key version: UPDATE dbo.Manufacturer SET ManufacturerName = 'Microsoft™' WHERE ManufacturerName = 'Microsoft'; And the plan:


Hannah Vernon
(70988 rep)
Aug 30, 2017, 08:04 PM
• Last activity: Oct 22, 2019, 06:02 AM
1
votes
2
answers
2823
views
What are the disadvantages of using surrogate keys?
I am using MS SQL Server but in general database design I want to know what problems can arise when every row in a database has its auto generated surrogate key value. I know some advantages, for example that there is no need to identify unique columns without NULLs for primary keys, and there is no...
I am using MS SQL Server but in general database design I want to know what problems can arise when every row in a database has its auto generated surrogate key value.
I know some advantages, for example that there is no need to identify unique columns without NULLs for primary keys, and there is no need to manage composite primary keys, normal forms are easier to manage, and uniqueness is guaranteed.
I want to know, is there any good reason regarding to performance, or index structures, etc. that should make us use real world factful keys instead of surrogate ones?
Thanks.
igelr
(2162 rep)
Mar 19, 2019, 07:15 PM
• Last activity: Mar 25, 2019, 08:21 PM
1
votes
1
answers
6712
views
What problems are there in not using a primary key column at all?
What purpose would a primary key column serve on a table such as `post_versions` here? Nothing refers to it, and there are no queries where I will ever want to select a row by `post_versions.id`. It'll be joined to `posts` in most queries. CREATE TABLE posts ( id SERIAL PRIMARY KEY, created TIMESTAM...
What purpose would a primary key column serve on a table such as
post_versions
here? Nothing refers to it, and there are no queries where I will ever want to select a row by post_versions.id
. It'll be joined to posts
in most queries.
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
created TIMESTAMPTZ NOT NULL DEFAULT now(),
user_id INTEGER NOT NULL REFERENCES users ON DELETE RESTRICT
);
CREATE TABLE post_versions (
id SERIAL PRIMARY KEY, -- serves no purpose, could remove I think
post_id INTEGER NOT NULL REFERENCES posts ON DELETE CASCADE,
updated TIMESTAMPTZ NOT NULL DEFAULT now(),
body TEXT NOT NULL
);
Are there any problems if I just remove it?
davidtgq
(759 rep)
Mar 4, 2019, 02:06 PM
• Last activity: Mar 4, 2019, 04:36 PM
2
votes
2
answers
742
views
Composite primary key plus a separate (surrogate) id column for foreign key references
I am trying to build a database about *locations* like so: - Country → Region → Town Countries will be populated with a list of pre-defined worldwide countries, regions & towns are input by users. Region may not apply to every location and so a blank string will be allowed (hence the many to many li...
I am trying to build a database about *locations* like so:
- Country → Region → Town
Countries will be populated with a list of pre-defined worldwide countries, regions & towns are input by users.
Region may not apply to every location and so a blank string will be allowed (hence the many to many link table).
This is the best design I have come up with:
The

town_id
column is set to AUTO_INCREMENT & UNIQUE. It is then used as a foreign key reference in other tables.
The town
column on its own is not UNIQUE as you may have the same town name in multiple places.
Is the towns
table correct?
The reason I ask is because it just seems odd to me to have the town_id
as the reference, however this cannot be included or used as the primary key (because if it did, it would cause duplicates)?
I would welcome feedback if this approach is correct.
the_peacock
(123 rep)
Jun 3, 2014, 07:00 PM
• Last activity: Feb 19, 2019, 10:46 PM
3
votes
1
answers
449
views
Does adding a surrogate key get rid of an identifying relationship between two entities?
In the following example, we have a `Comments` table with a composite primary key (PK) CREATE TABLE Comments ( video_id INT NOT NULL, user_id INT NOT NULL, comment_dt DATETIME NOT NULL, PRIMARY KEY (video_id, user_id, comment_dt), FOREIGN KEY (video_id) REFERENCES Videos(video_id), FOREIGN KEY (user...
In the following example, we have a
Comments
table with a composite primary key (PK)
CREATE TABLE Comments (
video_id INT NOT NULL,
user_id INT NOT NULL,
comment_dt DATETIME NOT NULL,
PRIMARY KEY (video_id, user_id, comment_dt),
FOREIGN KEY (video_id) REFERENCES Videos(video_id),
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
There appears to be an identifying relationship, as the primary key of Comments
includes its foreign keys.
However, watch what happens when you give this Comments
table a surrogate primary key of its own:
CREATE TABLE Comments (
comment_id SERIAL PRIMARY KEY,
video_id INT NOT NULL,
user_id INT NOT NULL,
comment_dt DATETIME NOT NULL,
FOREIGN KEY (video_id) REFERENCES Videos(video_id),
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
Does that get rid of the identifying relationship? Logically, it still seems to be there, so does this really come down to the *physical* model and whether or not you want to include a composite key? Are identifying relationships even necessary?
brennvo
(91 rep)
Feb 1, 2019, 08:06 PM
• Last activity: Feb 2, 2019, 08:22 PM
0
votes
1
answers
67
views
M:M Table Design - Use of PK
I have a many-to-many table that contains id's from 2 related tables. For creating a primary key on this table, is it better to create another integer column and have that as a surrogate PK? Or should I create the PK on both id columns? What are the benefits of either method? Here is the table desig...
I have a many-to-many table that contains id's from 2 related tables.
For creating a primary key on this table, is it better to create another integer column and have that as a surrogate PK?
Or should I create the PK on both id columns?
What are the benefits of either method?
Here is the table design with the surrogate key (CodeGroupMappingID)
CREATE TABLE [dbo].[CodeGroupMappings](
[CodeGroupMappingID] [int] IDENTITY(1,1) NOT NULL,
[CodeID] [int] NOT NULL,
[GroupID] [int] NOT NULL,
CONSTRAINT [PK_CodeGroupMappings_CodeGroupMappingID] PRIMARY KEY CLUSTERED ([CodeGroupMappingID] ASC)
)
Kevin
(533 rep)
May 22, 2018, 09:41 AM
• Last activity: May 22, 2018, 08:51 PM
29
votes
3
answers
6548
views
Do natural keys provide higher or lower performance in SQL Server than surrogate integer keys?
I'm a fan of surrogate keys. There is a risk my findings are confirmation biased. Many questions I've seen both here and at http://stackoverflow.com use natural keys instead of surrogate keys based on `IDENTITY()` values. My background in computer systems tells me performing any comparative operatio...
I'm a fan of surrogate keys. There is a risk my findings are confirmation biased.
Many questions I've seen both here and at http://stackoverflow.com use natural keys instead of surrogate keys based on
Am I doing something wrong here, or are INT keys 3 times faster than 25 character natural keys?
Note, I've written a follow-up question [here](https://dba.stackexchange.com/questions/184756/natural-keys-vs-surrogate-keys-part-2-the-showdown) .
IDENTITY()
values.
My background in computer systems tells me performing any comparative operation on an integer will be faster than comparing strings.
This comment made me question my beliefs, so I thought I would create a system to investigate my thesis that integers are faster than strings for use as keys in SQL Server.
Since there is likely to be very little discernible difference in small datasets, I immediately thought of a two table setup where the primary table has 1,000,000 rows and the secondary table has 10 rows for each row in the primary table for a total of 10,000,000 rows in the secondary table. The premise of my test is to create two sets of tables like this, one using natural keys and one using integer keys, and run timing tests on a simple query like:
SELECT *
FROM Table1
INNER JOIN Table2 ON Table1.Key = Table2.Key;
The following is the code I created as a test bed:
USE Master;
IF (SELECT COUNT(database_id) FROM sys.databases d WHERE d.name = 'NaturalKeyTest') = 1
BEGIN
ALTER DATABASE NaturalKeyTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE NaturalKeyTest;
END
GO
CREATE DATABASE NaturalKeyTest
ON (NAME = 'NaturalKeyTest', FILENAME =
'C:\SQLServer\Data\NaturalKeyTest.mdf', SIZE=8GB, FILEGROWTH=1GB)
LOG ON (NAME='NaturalKeyTestLog', FILENAME =
'C:\SQLServer\Logs\NaturalKeyTest.mdf', SIZE=256MB, FILEGROWTH=128MB);
GO
ALTER DATABASE NaturalKeyTest SET RECOVERY SIMPLE;
GO
USE NaturalKeyTest;
GO
CREATE VIEW GetRand
AS
SELECT RAND() AS RandomNumber;
GO
CREATE FUNCTION RandomString
(
@StringLength INT
)
RETURNS NVARCHAR(max)
AS
BEGIN
DECLARE @cnt INT = 0
DECLARE @str NVARCHAR(MAX) = '';
DECLARE @RandomNum FLOAT = 0;
WHILE @cnt < @StringLength
BEGIN
SELECT @RandomNum = RandomNumber
FROM GetRand;
SET @str = @str + CAST(CHAR((@RandomNum * 64.) + 32) AS NVARCHAR(MAX));
SET @cnt = @cnt + 1;
END
RETURN @str;
END;
GO
CREATE TABLE NaturalTable1
(
NaturalTable1Key NVARCHAR(255) NOT NULL
CONSTRAINT PK_NaturalTable1 PRIMARY KEY CLUSTERED
, Table1TestData NVARCHAR(255) NOT NULL
);
CREATE TABLE NaturalTable2
(
NaturalTable2Key NVARCHAR(255) NOT NULL
CONSTRAINT PK_NaturalTable2 PRIMARY KEY CLUSTERED
, NaturalTable1Key NVARCHAR(255) NOT NULL
CONSTRAINT FK_NaturalTable2_NaturalTable1Key
FOREIGN KEY REFERENCES dbo.NaturalTable1 (NaturalTable1Key)
ON DELETE CASCADE ON UPDATE CASCADE
, Table2TestData NVARCHAR(255) NOT NULL
);
GO
/* insert 1,000,000 rows into NaturalTable1 */
INSERT INTO NaturalTable1 (NaturalTable1Key, Table1TestData)
VALUES (dbo.RandomString(25), dbo.RandomString(100));
GO 1000000
/* insert 10,000,000 rows into NaturalTable2 */
INSERT INTO NaturalTable2 (NaturalTable2Key, NaturalTable1Key, Table2TestData)
SELECT dbo.RandomString(25), T1.NaturalTable1Key, dbo.RandomString(100)
FROM NaturalTable1 T1
GO 10
CREATE TABLE IDTable1
(
IDTable1Key INT NOT NULL CONSTRAINT PK_IDTable1
PRIMARY KEY CLUSTERED IDENTITY(1,1)
, Table1TestData NVARCHAR(255) NOT NULL
CONSTRAINT DF_IDTable1_TestData DEFAULT dbo.RandomString(100)
);
CREATE TABLE IDTable2
(
IDTable2Key INT NOT NULL CONSTRAINT PK_IDTable2
PRIMARY KEY CLUSTERED IDENTITY(1,1)
, IDTable1Key INT NOT NULL
CONSTRAINT FK_IDTable2_IDTable1Key FOREIGN KEY
REFERENCES dbo.IDTable1 (IDTable1Key)
ON DELETE CASCADE ON UPDATE CASCADE
, Table2TestData NVARCHAR(255) NOT NULL
CONSTRAINT DF_IDTable2_TestData DEFAULT dbo.RandomString(100)
);
GO
INSERT INTO IDTable1 DEFAULT VALUES;
GO 1000000
INSERT INTO IDTable2 (IDTable1Key)
SELECT T1.IDTable1Key
FROM IDTable1 T1
GO 10
The code above creates a database and 4 tables, and fills the tables with data, ready to test. The test code I ran is:
USE NaturalKeyTest;
GO
DECLARE @loops INT = 0;
DECLARE @MaxLoops INT = 10;
DECLARE @Results TABLE (
FinishedAt DATETIME DEFAULT (GETDATE())
, KeyType NVARCHAR(255)
, ElapsedTime FLOAT
);
WHILE @loops < @MaxLoops
BEGIN
DBCC FREEPROCCACHE;
DBCC FREESESSIONCACHE;
DBCC FREESYSTEMCACHE ('ALL');
DBCC DROPCLEANBUFFERS;
WAITFOR DELAY '00:00:05';
DECLARE @start DATETIME = GETDATE();
DECLARE @end DATETIME;
DECLARE @count INT;
SELECT @count = COUNT(*)
FROM dbo.NaturalTable1 T1
INNER JOIN dbo.NaturalTable2 T2 ON T1.NaturalTable1Key = T2.NaturalTable1Key;
SET @end = GETDATE();
INSERT INTO @Results (KeyType, ElapsedTime)
SELECT 'Natural PK' AS KeyType, CAST((@end - @start) AS FLOAT) AS ElapsedTime;
DBCC FREEPROCCACHE;
DBCC FREESESSIONCACHE;
DBCC FREESYSTEMCACHE ('ALL');
DBCC DROPCLEANBUFFERS;
WAITFOR DELAY '00:00:05';
SET @start = GETDATE();
SELECT @count = COUNT(*)
FROM dbo.IDTable1 T1
INNER JOIN dbo.IDTable2 T2 ON T1.IDTable1Key = T2.IDTable1Key;
SET @end = GETDATE();
INSERT INTO @Results (KeyType, ElapsedTime)
SELECT 'IDENTITY() PK' AS KeyType, CAST((@end - @start) AS FLOAT) AS ElapsedTime;
SET @loops = @loops + 1;
END
SELECT KeyType, FORMAT(CAST(AVG(ElapsedTime) AS DATETIME), 'HH:mm:ss.fff') AS AvgTime
FROM @Results
GROUP BY KeyType;
These are the results:

Hannah Vernon
(70988 rep)
Sep 29, 2013, 01:56 AM
• Last activity: Mar 18, 2018, 03:22 PM
40
votes
3
answers
7197
views
Should every table have a single-field surrogate/artificial primary key?
I understand one benefit of surrogate/artificial keys in general - they do not change and that can be very convenient. This is true whether they are single or multiple field - as long as they are 'artificial'. However, it sometimes seems to be a matter of policy to have an auto-incrementing integer...
I understand one benefit of surrogate/artificial keys in general - they do not change and that can be very convenient. This is true whether they are single or multiple field - as long as they are 'artificial'.
However, it sometimes seems to be a matter of policy to have an auto-incrementing integer field as the primary key of each table. Is this always the best idea to have such a **single-field** key and why (or why not)?
To be clear, this question is not about artificial vs natural - but about whether all artificial keys should be **single-field**
Jack Douglas
(40537 rep)
Sep 23, 2011, 09:25 PM
• Last activity: Mar 18, 2018, 12:50 PM
8
votes
5
answers
5853
views
What normal form does a surrogate key violate?
I have following question: "What normal form does a surrogate key violate?" My thought was the 3rd normal form, but I'm not quite sure it's just an assumption i am making. Could someone explain that to me?
I have following question:
"What normal form does a surrogate key violate?"
My thought was the 3rd normal form, but I'm not quite sure it's just an assumption i am making. Could someone explain that to me?
Marco
(91 rep)
Apr 16, 2012, 08:35 AM
• Last activity: Mar 18, 2018, 12:48 PM
5
votes
2
answers
2189
views
Surrogate key vs Natural key
I have a table called devices. Most of the devices that will get stored in this table can be uniquely identified by their serial number and part number. But there are some device types that do not have serial number and part number assigned to them. Instead they can be uniquely identified by another...
I have a table called devices. Most of the devices that will get stored in this table can be uniquely identified by their serial number and part number. But there are some device types that do not have serial number and part number assigned to them. Instead they can be uniquely identified by another field (internal id).
Should I create a surrogate key for this table or should I create a composite primary key (serial number, part number, internal id) and insert default values to the serial number and part number columns when they are not supplied? The device types that do not have part number and serial number now, will have the numbers assigned to them in the future releases (may be 5 years from now). Should I create a surrogate key or a composite key in this scenario? Or using the three unique attributes, should I create a hash in the program and use that as a surrogate key for the tables?
DBK
(378 rep)
Aug 26, 2017, 04:07 AM
• Last activity: Mar 18, 2018, 12:43 PM
0
votes
2
answers
155
views
Common term for tenant-based keys
Assume the following schema: ID  | TENANT | TENANT_CUSTOM_ID 346 | 101    | 1 347 | 102    | 1 348 | 101    | 2 349 | 101    | 3 350 | 101    | 4 351 | 102    | 2 352 | 1...
Assume the following schema:
ID | TENANT | TENANT_CUSTOM_ID
346 | 101 | 1
347 | 102 | 1
348 | 101 | 2
349 | 101 | 3
350 | 101 | 4
351 | 102 | 2
352 | 103 | 1
**What is another name (or the proper classification) for
TENANT_CUSTOM_ID
?** Perhaps something along the lines of a surrogate key? The purpose of this key is to make each tenant have their own numbering system instead of knowing that their records are mixed with all other tenants on the same tables.
1234567
(159 rep)
Jun 2, 2017, 12:00 AM
• Last activity: Jun 2, 2017, 09:01 PM
1
votes
1
answers
1003
views
What are the consequences, positive or negative, to having a surrogate primary key for a table which already has a guaranteed unique column?
I recently made a table that looks like the one below CREATE TABLE example( id SERIAL PRIMARY KEY, subindusty_id INT UNIQUE NOT NULL, name TEXT NOT NULL -- other fields... ) I realized that in this table I could just remove the column called `id` and make `subindustry_id` the primary key. Thinking b...
I recently made a table that looks like the one below
CREATE TABLE example(
id SERIAL PRIMARY KEY,
subindusty_id INT UNIQUE NOT NULL,
name TEXT NOT NULL
-- other fields...
)
I realized that in this table I could just remove the column called
id
and make subindustry_id
the primary key.
Thinking back, the only reason I can think of for why I didn't make subindustry_id
the primary key is for the sake of consistency, considering that I have a lot of other tables with a column called id
.
Except for things like consistency*, are there any tangible benefits/downsides to having this additional id column as the primary key when there's already another column that'd be a good primary key?
\*Consistency as in "that's the way the rest of the tables look". The kind where you're enforcing a certain type of style. I guess the point of this question is to question the practice of giving every single table an id
column regardless of the fact that they might already have a more suitable candidate for a primary key.
m0meni
(147 rep)
Jan 19, 2017, 06:26 PM
• Last activity: Jan 20, 2017, 05:53 AM
Showing page 1 of 20 total questions