
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
703 views
How to update a UID throughout the database?
Our database makes use of `uniqueidentifier` and we have the need to update that value everywhere it's found in our system. Sample minimal structure:

```sql
CREATE TABLE Users (
    id uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    [name] nvarchar(100) NOT NULL
)

CREATE TABLE Tasks (
    id uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    [name] nvarchar(max) NOT NULL,
    userId uniqueidentifier NOT NULL FOREIGN KEY REFERENCES Users(id)
)

DECLARE @users TABLE (id uniqueidentifier)
DECLARE @uid uniqueidentifier

INSERT INTO Users(name) OUTPUT INSERTED.id INTO @users VALUES ('Jim Bob')
SET @uid = (SELECT TOP 1 id FROM @users)

INSERT INTO Tasks(name, userId) VALUES ('Some task', @uid)
```

P.S. Obviously I have many more tables. I know how to search all database tables for a given value (I've used queries similar to what's written here before). I'm unaware of a simpler solution for UIDs, if any exist. With the above, is there a simpler way than looping over tables, then over columns of datatype `uniqueidentifier`, then locating the created UID in order to update its value to, say, 0x0?
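A minimal sketch of the loop the question describes, assuming SQL Server: the `uniqueidentifier` columns are discovered from the catalog and one `UPDATE` is generated per column (`@old`/`@new` are hypothetical parameters; the sketch ignores foreign-key ordering, so in practice the constraints would need to cascade or be disabled while the updates run):

```sql
DECLARE @old uniqueidentifier = 'REPLACE-WITH-OLD-GUID',
        @new uniqueidentifier = CAST(0x0 AS uniqueidentifier);
DECLARE @sql nvarchar(max) = N'';

-- build one UPDATE per uniqueidentifier column in the database
SELECT @sql += N'UPDATE ' + QUOTENAME(TABLE_SCHEMA) + N'.' + QUOTENAME(TABLE_NAME)
             + N' SET '   + QUOTENAME(COLUMN_NAME) + N' = @new'
             + N' WHERE ' + QUOTENAME(COLUMN_NAME) + N' = @old;'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'uniqueidentifier';

EXEC sp_executesql @sql,
     N'@old uniqueidentifier, @new uniqueidentifier',
     @old, @new;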
ctwheels (173 rep)
Jul 28, 2020, 10:29 PM • Last activity: May 8, 2025, 09:06 AM
57 votes
3 answers
55604 views
What is the optimal data type for an MD5 field?
We are designing a system that is known to be read-heavy (on the order of tens of thousands of reads per minute).

* There is a table `names` that serves as a sort of central registry. Each row has a `text` field `representation` and a unique `key` that is an MD5 hash of that `representation`.¹ This table currently has tens of millions of records and is expected to grow into the billions over the lifetime of the application.
* There are dozens of other tables (of highly varying schemas and record counts) that make reference to the `names` table. Any given record in one of these tables is guaranteed to have a `name_key`, which is functionally a foreign key to the `names` table.

¹ Incidentally, as you might expect, records in this table are immutable once written.

For any given table other than the `names` table, the most common query will follow this pattern:

```sql
SELECT list, of, fields FROM table WHERE name_key IN (md5a, md5b, md5c...);
```

I would like to optimize for read performance. I suspect that my first stop should be to minimize the size of the indices (though I wouldn't mind being proven wrong there). **The Question:** What is/are the optimal data types for the `key` and `name_key` columns? Is there a reason to use `hex(32)` over `bit(128)`? BTREE or GIN?
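One way to compare footprints, sketched for PostgreSQL: an MD5 digest is exactly 128 bits, so it fits a 16-byte `uuid` column rather than the 32+ bytes of its hex text form (`md5()` returns 32 hex characters, which cast cleanly to `uuid`; table name taken from the question):

```sql
-- 16 bytes per key instead of 32+ for hex text
CREATE TABLE names (
    key            uuid PRIMARY KEY,  -- MD5 digest stored as 16 bytes
    representation text NOT NULL
);

INSERT INTO names
VALUES (md5('some representation')::uuid, 'some representation');

SELECT representation
FROM names
WHERE key IN (md5('some representation')::uuid);
```

A plain BTREE index (what `PRIMARY KEY` builds) supports these equality/`IN` lookups; GIN targets composite values such as arrays and documents, which does not apply here.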
bobocopy (765 rep)
Sep 16, 2015, 05:00 PM • Last activity: Jul 11, 2024, 11:35 PM
0 votes
2 answers
50 views
id increases even if the line has not been created
I was doing a simple CRUD app on NestJS using Postgres, and I noticed something strange. I made an entity with a unique string column:

```typescript
@Entity()
export class Model {
    @PrimaryGeneratedColumn()
    id: number;

    @Column({ unique: true })
    name: string;
}
```

But every time I try to add a row, even if the row doesn't get added because of the unique restriction, it increases the current id of the database anyway, and then the next row jumps some ids. For example: if I add a Model with name 'model 1', it will get id 1, since it's the first. But if I then try to add 'model 1' 15 times, and it isn't added because of the restriction I made, the next row I insert, like 'model 2', will jump a lot of ids: it won't be 2, it will be 16. Is this how it should work? It looks like I am wasting ids for no reason.
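The jump comes from the sequence behind the id column: `nextval` is consumed before the uniqueness check runs, and failed inserts never give the value back. A minimal reproduction in plain SQL (table name hypothetical):

```sql
CREATE TABLE model (
    id   serial PRIMARY KEY,
    name text UNIQUE
);

INSERT INTO model (name) VALUES ('model 1');  -- gets id 1

-- a rejected insert still consumes a sequence value
INSERT INTO model (name) VALUES ('model 1') ON CONFLICT (name) DO NOTHING;

INSERT INTO model (name) VALUES ('model 2');  -- gets id 3, not 2
```

Gaps like this are expected behavior: sequences guarantee uniqueness, not contiguity.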
Kaique Bahmad (1 rep)
Jun 24, 2024, 01:05 AM • Last activity: Jun 24, 2024, 05:04 AM
177 votes
6 answers
186115 views
Guid vs INT - Which is better as a primary key?
I've been reading around reasons to use or not use `Guid` and `int`. `int` is smaller, faster, easy to remember, and keeps a chronological sequence. As for `Guid`, the only advantage I found is that it is unique. In which case is a `Guid` better than an `int`, and why? From what I've seen, `int` has no flaws except the number limit, which in many cases is irrelevant. Why exactly was `Guid` created? I actually think it has a purpose other than serving as the primary key of a simple table. (Any example of a real application using `Guid` for something?) (`Guid` = the `uniqueidentifier` type on SQL Server)
BrunoLM (3533 rep)
Jan 5, 2011, 07:46 AM • Last activity: May 7, 2024, 01:21 PM
0 votes
0 answers
43 views
Is this an abuse of UniqueIdentifiers?
I'm designing a new SQL Server database. Every major table uses a UniqueIdentifier as a primary key. I'm adding a new data object called "Notebook", which is a universal object that can be attached to rows from any of several major tables. It will be used to hold user notes, user-created reminders, and TO DO items in a universal format so that all objects will get the full, consistent functionality of a "Notebook" across our entire application. Each "NotebookEntry" will be tied to one or more items from the major tables, and often tied to objects in multiple major tables. (E.g. an entry that reminds Employee B to check on Policy C for Customer A would be tied to "Customer A", "Employee B" and "Policy C".) Since UniqueIdentifiers are (theoretically) globally unique, I am considering a many-to-many join table with two columns, "NotebookId" and "NotebookEntryId", where the "NotebookId" would be the UniqueIdentifier of its origin row *even though the ids that that column refers to would be scattered across multiple tables.* I'm torn between believing this is either a great idea or a horrible idea. I'd love to hear opinions on what problems might be caused by a column that is "foreign keyed" to multiple tables. The obvious:

- I won't be able to use a standard SQL foreign key, although *I think* performance shouldn't suffer much as long as the queries always start from the major table and not from the join table (IOW, checking what notebooks are related to "Employee" objects rather than trying to write queries for every notebook regardless of what it is tied to).
- I'm not sure whether UniqueIdentifiers could possibly repeat across tables, which would break things, but since the chance of two randomly generated UniqueIdentifiers being the same is staggeringly small, I don't think we have to worry about this too much. Plus, if two items did happen to share a UniqueIdentifier and the entries for one showed up on the other, there would be no serious consequences.
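The join table described above can be sketched like this (`NotebookLink` and the `NotebookEntry(Id)` target are assumed names; the point is that `NotebookId` cannot carry a declared foreign key, because its values come from several different tables):

```sql
-- "NotebookId" holds the UniqueIdentifier of whichever major-table row
-- the entry is attached to; no FOREIGN KEY can be declared for it.
CREATE TABLE NotebookLink (
    NotebookId      uniqueidentifier NOT NULL,  -- PK value from Customers/Employees/Policies/...
    NotebookEntryId uniqueidentifier NOT NULL
        FOREIGN KEY REFERENCES NotebookEntry (Id),
    PRIMARY KEY (NotebookId, NotebookEntryId)
);

-- queries start from a major table, as the question proposes:
-- SELECT ne.*
-- FROM Employees e
-- JOIN NotebookLink  nl ON nl.NotebookId      = e.Id
-- JOIN NotebookEntry ne ON ne.Id              = nl.NotebookEntryId;
```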
SvdSinner (101 rep)
Jun 26, 2023, 07:28 PM
0 votes
0 answers
395 views
How to create a list of increasing unique identifiers?
I'm creating a list of Connection objects (names of machines with IP address and IP port), but I'm having an issue with the `Id` field. Normally this is the way I would work:

```sql
INSERT INTO [CONNECTIONS] ([Id], [Name], [Address]) VALUES (1, 'Connection_1', '1.1.1.1')
INSERT INTO [CONNECTIONS] ([Id], [Name], [Address]) VALUES (2, 'Connection_2', '2.2.2.2')
...
```

... but as my `Id` field is not a simple `int` but a `uniqueidentifier`, this doesn't work. So I tried with `NEWID()`:

```sql
INSERT INTO [CONNECTIONS] ([Id], [Name], [Address]) VALUES (NEWID(), 'Connection_1', '1.1.1.1')
INSERT INTO [CONNECTIONS] ([Id], [Name], [Address]) VALUES (NEWID(), 'Connection_2', '2.2.2.2')
...
```

That worked, but the newly generated IDs were created randomly, so when querying the result I did not get "Connection_1" up to "Connection_10", but something like "Connection_3", then "Connection_8", only then "Connection_1", ... quite messy. So I tried again, this time using `NEWSEQUENTIALID()`, so I had the following `INSERT INTO` queries:

```sql
INSERT INTO [CONNECTIONS] ([Id], [Name], [Address]) VALUES (NEWSEQUENTIALID(), 'Connection_1', '1.1.1.1')
INSERT INTO [CONNECTIONS] ([Id], [Name], [Address]) VALUES (NEWSEQUENTIALID(), 'Connection_2', '2.2.2.2')
...
```

As a result I got "Commands completed successfully", but my table turns out empty :-( I don't need my IDs to be sequential, I just want them to be ordered from small to large (as can be expected from an ID, at least that's what I thought). How can I achieve that? (I'm working with SQL Server, using SQL Server Management Studio v18.12.)
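For context: `NEWSEQUENTIALID()` is only valid in a `DEFAULT` constraint, not in a `VALUES` list, which is why the last attempt above doesn't land rows. A sketch of that arrangement (column types are assumptions, since the question doesn't show the table definition):

```sql
CREATE TABLE [CONNECTIONS] (
    [Id]      uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    [Name]    varchar(50) NOT NULL,
    [Address] varchar(50) NOT NULL
);

-- omit Id and let the default fill in increasing values
INSERT INTO [CONNECTIONS] ([Name], [Address]) VALUES ('Connection_1', '1.1.1.1');
INSERT INTO [CONNECTIONS] ([Name], [Address]) VALUES ('Connection_2', '2.2.2.2');
```

Note that the generated values only increase within one run of the server; after a reboot the sequence can restart from a lower range.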
Dominique (609 rep)
Oct 6, 2022, 08:25 AM
0 votes
0 answers
96 views
Generate unique ids in BigQuery data import
I am using Zapier to import WooCommerce orders into BigQuery and build a data warehouse with paid orders from multiple sites. However, each site generates its own per-site ID, not an ID that is globally unique across all of them, so importing the data from all the WooCommerce orders into the same table will not give unique ids. I would like BigQuery to automatically assign a unique id to every new order that enters the table. Is this possible?
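BigQuery has no auto-increment column, but its standard SQL can stamp rows at load or transform time with `GENERATE_UUID()`. A sketch, assuming a `project.dataset.orders` table and column names that are hypothetical:

```sql
-- assign a globally unique id per imported order
SELECT
  GENERATE_UUID() AS order_uid,  -- random v4 uuid, one per row
  site,
  order_id,
  total
FROM `project.dataset.orders`;
```

A deterministic key such as `CONCAT(site, '-', CAST(order_id AS STRING))` is often preferable here, since it also deduplicates re-imports of the same order, which a random uuid cannot.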
tokyographer (1 rep)
Aug 22, 2022, 07:26 PM • Last activity: Aug 22, 2022, 09:09 PM
17 votes
3 answers
8857 views
Securely generate a UNIQUEIDENTIFIER in SQL Server
I intend to be using a `UNIQUEIDENTIFIER` as an access key that users can use to access certain data. The key will act as a password in that sense. I need to generate multiple such identifiers as part of an `INSERT...SELECT` statement. For architectural reasons I want to generate the identifiers server-side in this case. How can I generate a securely random `UNIQUEIDENTIFIER`? Note that `NEWID` would not be random enough, as it does not promise any security properties at all. I'm looking for the SQL Server equivalent of `System.Security.Cryptography.RandomNumberGenerator`, because I need unguessable IDs. Anything based on `CHECKSUM`, `RAND` or `GETUTCDATE` would also not qualify.
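One candidate is `CRYPT_GEN_RANDOM` (SQL Server 2008+), which draws from the OS cryptographic RNG. A sketch with a hypothetical target table:

```sql
-- 16 cryptographically random bytes, reinterpreted as a uniqueidentifier
INSERT INTO dbo.AccessKeys (UserId, AccessKey)
SELECT u.Id,
       CAST(CRYPT_GEN_RANDOM(16) AS uniqueidentifier)
FROM dbo.Users AS u;
```

Unlike `RAND()`, which is evaluated once per query, `CRYPT_GEN_RANDOM` is nondeterministic per row, so each selected row gets its own value. The result is 128 fully random bits rather than a standards-shaped v4 UUID, which is fine for an unguessable token.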
usr (7390 rep)
Apr 4, 2013, 02:09 PM • Last activity: Jul 15, 2022, 03:18 PM
28 votes
3 answers
44598 views
Should I use UUID as well as ID
I've been using UUIDs in my systems for a while now, for a variety of reasons ranging from logging to delayed correlation. The formats I used changed as I became less naive, from:

1. `VARCHAR(255)`
2. `VARCHAR(36)`
3. `CHAR(36)`
4. `BINARY(16)`

It was when I reached the final one, `BINARY(16)`, that I started to compare performance with a basic auto-increment integer. The test and results are shown below, but if you just want the summary, it indicates that `INT AUTOINCREMENT` and `BINARY(16) RANDOM` have identical performance on data ranges up to 200,000 (the database was pre-populated prior to tests). I was initially sceptical towards using UUIDs as primary keys, and indeed I still am; however, I see potential here to create a flexible database that can use both. Whereas many people stress the advantages of either, what are the disadvantages cancelled out by using both data types?

- PRIMARY INT
- UNIQUE BINARY(16)

The use case for this type of set-up would be the traditional primary key for inter-table relationships, with the unique identifier used for inter-system relationships. What I am essentially trying to discover is the difference in efficiency between the two approaches. Besides the quadruple disk space used, which may be largely negligible after additional data is added, they appear to me to be the same.
Schema:

```sql
-- phpMyAdmin SQL Dump
-- version 4.0.10deb1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Sep 22, 2015 at 10:54 AM
-- Server version: 5.5.44-0ubuntu0.14.04.1
-- PHP Version: 5.5.29-1+deb.sury.org~trusty+3

SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;

--
-- Database: test
--

--
-- Table structure for table with_2id
--

CREATE TABLE with_2id (
  guidl bigint(20) NOT NULL,
  guidr bigint(20) NOT NULL,
  data varchar(255) NOT NULL,
  PRIMARY KEY (guidl, guidr)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--
-- Table structure for table with_guid
--

CREATE TABLE with_guid (
  guid binary(16) NOT NULL,
  data varchar(255) NOT NULL,
  PRIMARY KEY (guid)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--
-- Table structure for table with_id
--

CREATE TABLE with_id (
  id int(11) NOT NULL AUTO_INCREMENT,
  data varchar(255) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=197687;

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
```

Insert benchmark:

```php
function benchmark_insert(PDO $pdo, $runs)
{
    $data = 'Sample Data';

    $insert1 = $pdo->prepare("INSERT INTO with_id (data) VALUES (:data)");
    $insert1->bindParam(':data', $data);

    $insert2 = $pdo->prepare("INSERT INTO with_guid (guid, data) VALUES (:guid, :data)");
    $insert2->bindParam(':guid', $guid);
    $insert2->bindParam(':data', $data);

    $insert3 = $pdo->prepare("INSERT INTO with_2id (guidl, guidr, data) VALUES (:guidl, :guidr, :data)");
    $insert3->bindParam(':guidl', $guidl);
    $insert3->bindParam(':guidr', $guidr);
    $insert3->bindParam(':data', $data);

    $benchmark = array();

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        $insert1->execute();
    }
    $benchmark[] = 'INC ID: ' . (time() - $time);

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        // ... generation of $guid elided in the source ...
        $insert2->execute();
    }
    $benchmark[] = 'GUID: ' . (time() - $time);

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        // ... generation of $guidl / $guidr elided in the source ...
        $insert3->execute();
    }
    $benchmark[] = 'SPLIT GUID: ' . (time() - $time);

    echo 'INSERTION' . PHP_EOL;
    echo '=============================' . PHP_EOL;
    echo $benchmark[0] . PHP_EOL;
    echo $benchmark[1] . PHP_EOL;
    echo $benchmark[2] . PHP_EOL . PHP_EOL;
}
```

Select benchmark:

```php
function benchmark_select(PDO $pdo, $runs)
{
    $select1 = $pdo->prepare("SELECT * FROM with_id WHERE id = :id");
    $select1->bindParam(':id', $id);

    $select2 = $pdo->prepare("SELECT * FROM with_guid WHERE guid = :guid");
    $select2->bindParam(':guid', $guid);

    $select3 = $pdo->prepare("SELECT * FROM with_2id WHERE guidl = :guidl AND guidr = :guidr");
    $select3->bindParam(':guidl', $guidl);
    $select3->bindParam(':guidr', $guidr);

    $keys = array();
    for ($i = 0; $i < $runs; $i++) {
        // ... selection of $kguid, $kguidl, $kguidr, $kid elided in the source ...
        $keys[] = array(
            'guid'  => $kguid,
            'guidl' => $kguidl,
            'guidr' => $kguidr,
            'id'    => $kid
        );
    }

    $benchmark = array();

    $time = time();
    foreach ($keys as $key) {
        $id = $key['id'];
        $select1->execute();
        $row = $select1->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[] = 'INC ID: ' . (time() - $time);

    $time = time();
    foreach ($keys as $key) {
        $guid = $key['guid'];
        $select2->execute();
        $row = $select2->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[] = 'GUID: ' . (time() - $time);

    $time = time();
    foreach ($keys as $key) {
        $guidl = $key['guidl'];
        $guidr = $key['guidr'];
        $select3->execute();
        $row = $select3->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[] = 'SPLIT GUID: ' . (time() - $time);

    echo 'SELECTION' . PHP_EOL;
    echo '=============================' . PHP_EOL;
    echo $benchmark[0] . PHP_EOL;
    echo $benchmark[1] . PHP_EOL;
    echo $benchmark[2] . PHP_EOL . PHP_EOL;
}
```

Tests:

```php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'root', '');

benchmark_insert($pdo, 1000);
benchmark_select($pdo, 100000);
```

Results:

```
INSERTION
=============================
INC ID: 3
GUID: 2
SPLIT GUID: 3

SELECTION
=============================
INC ID: 5
GUID: 5
SPLIT GUID: 6
```
Flosculus (391 rep)
Sep 22, 2015, 01:39 PM • Last activity: Jun 19, 2022, 09:25 AM
0 votes
1 answer
89 views
Prevent creation of duplicate rows in database dynamically
I am new to databases, so please forgive my mistakes. Here's the scenario: I use the web as my data source. Sometimes my crawler crawls the same keyword repeatedly (not the crawler's fault). I don't want to include that particular keyword again in my database. I know of primary keys and also know that no two primary keys can be the same. So should I make my keyword the primary key to avoid duplicate rows, or is there some other method to deal with it? The DB is dynamic because there is very little manual interference in the crawl. Note: I am using Apache Cassandra.
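In Cassandra specifically, writes are upserts: inserting the same primary key again overwrites the row rather than creating a duplicate. A sketch (keyspace, table and column names are hypothetical):

```sql
-- CQL: the keyword is the primary key, so re-crawling the same keyword
-- overwrites the existing row instead of duplicating it
CREATE TABLE crawler.keywords (
    keyword      text PRIMARY KEY,
    last_crawled timestamp
);

INSERT INTO crawler.keywords (keyword, last_crawled)
VALUES ('databases', toTimestamp(now()));

-- to refuse the write entirely when the keyword already exists
-- (a lightweight transaction, which costs an extra round trip):
INSERT INTO crawler.keywords (keyword, last_crawled)
VALUES ('databases', toTimestamp(now())) IF NOT EXISTS;
```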
Aman Rawat (1 rep)
May 26, 2022, 01:51 PM • Last activity: May 27, 2022, 03:22 PM
26 votes
2 answers
26183 views
What is the data type of the ‘ctid’ system column in Postgres?
The Postgres system columns are documented in [*Chapter 5. Data Definition > 5.4. System Columns*](https://www.postgresql.org/docs/current/static/ddl-system-columns.html). That page mentions that `oid` values “are 32-bit quantities”. And that page says the same about transaction identifiers. So I will assume that means `oid`, `tableoid`, `xmin`, `cmin`, `xmax`, and `cmax` are all 32-bit integers. But that leaves the `ctid` system column.

> The physical location of the row version within its table. Note that although the `ctid` can be used to locate the row version very quickly, a row's `ctid` will change if it is updated or moved by `VACUUM FULL`. Therefore `ctid` is useless as a long-term row identifier. The OID, or even better a user-defined serial number, should be used to identify logical rows.

➡ What is the data type of the `ctid` column? Specifically I am interested in version Postgres 10.3, but if it has changed over past versions, that would be good to know.
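The server will report the column's type directly; a quick probe (`some_table` is a placeholder name):

```sql
-- ctid is of the special type "tid": (block number, tuple index within block)
SELECT pg_typeof(ctid), ctid
FROM some_table
LIMIT 1;
-- pg_typeof returns tid; a ctid value prints like (0,1)
```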
Basil Bourque (11188 rep)
Apr 15, 2018, 04:26 AM • Last activity: Jan 8, 2022, 02:59 AM
0 votes
1 answer
101 views
Do I need to implement id column to table if i know that every added row will be unique?
I want to create a database which allows me to store IT job offers from various websites. While creating the ER diagram I started wondering if I really need an id column in one particular table. Look at this situation:

1. Every job offer has a title and a source website. It happens quite often that job offers from the same website have the same title, so in order to distinguish them somehow I created a column job_id which makes each offer (row in the table) unique.
2. Job offers sometimes contain information about required additional Tech_skill. For these I made a separate table. As with job offers, their names are often repeated. And this is where the main thread comes in: unlike in the case of job offers, I think I don't need to store every Tech_skill name, because if I did they would repeat. Instead, I think I can add only unique ones (those that haven't appeared yet) to my Tech_skill table so that there wouldn't be repetition. And if I needed to associate a job offer with one or several skills, then I'd use a linking table (many-to-many relationship).

So if I were to insert non-repeating skill names into a Tech_skill table, would I need to create an id column?
beginsql (65 rep)
Sep 23, 2021, 04:35 PM • Last activity: Sep 23, 2021, 04:58 PM
0 votes
1 answer
113 views
Must tuples of a relation and their referents be in a one-to-one relationship?
Figures 3.a and 3.b, section 1.4, page 381 of Edgar Codd’s 1970 landmark paper [‘A Relational Model of Data for Large Shared Data Banks’](https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf) present a set of relation schemas (unnormalized in figure 3.a and in 1NF in figure 3.b). The normalized relation schema children′ has a key dependency KEY(man#, childname), so it allows instances like the following relation:

| man# | childname | birthdate |
| ---- | --------- | ---------- |
| 1 | "Thomas" | 2016-03-12 |
| 2 | "Paul" | 2011-08-02 |
| 2 | "Julia" | 2012-01-25 |
| 3 | "Thomas" | 2016-03-12 |

The tuples (1, "Thomas", 2016-03-12) and (3, "Thomas", 2016-03-12) can refer to two different children named ’Thomas’ in the real world, *or* to the same child named ’Thomas’ in the real world (implying that his two parents are employed by the same company). So the relationship between tuples and children in the real world is *many-to-one*, i.e. a tuple can only refer to a single child, but a child can be the referent of multiple tuples.

Does the relational model mandate that the relationship between tuples of a relation and their referents be one-to-one (thereby making Codd’s example incorrect since it is many-to-one)? In other words: is aliasing forbidden in the relational model? Must we always use globally unique references in our relations, i.e. UUID keys?
Géry Ogam (71 rep)
Jun 2, 2021, 06:09 PM • Last activity: Jun 2, 2021, 06:56 PM
0 votes
2 answers
669 views
Person unique identification
I am working on a sports database. I have tables for persons (players), teams, leagues, etc. The problem is: how do I identify a person uniquely? I was thinking that a new person could be added by either a league organizer, a team, or the players themselves, by registering to the website with this database. One player can play in multiple leagues, so when they are already in the database and a new league wants to add them again, there will be a duplicate. The only solution I can think of is using the person's national ID, as there could potentially be two persons with the same name, birthplace, and birth date. Another solution I can think of is alerting the user that there is already a person with the same or similar data; then they can check whether it is really a new person or just a duplicate. However, this doesn't work well, as the user could still add duplicates. What are your opinions on this problem? Can you give me some advice?
Rikib1999 (111 rep)
Dec 16, 2020, 11:04 PM • Last activity: Dec 17, 2020, 05:19 PM
4 votes
3 answers
7292 views
Should I use data type SERIAL for table id columns?
I found out that there is a MySQL data type called SERIAL; it appears to be designed for use as an id column. However, some of my peers don't think that the actual data type BIGINT implied by SERIAL is appropriate for tables that may never get big. Is there any reason to avoid using the SERIAL type for a table id column? Are there any obvious caveats?
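For reference, in MySQL `SERIAL` is documented shorthand for a specific column definition, so the two tables below are equivalent:

```sql
CREATE TABLE t1 (
    id   SERIAL,
    data VARCHAR(255)
);

-- SERIAL expands to:
CREATE TABLE t2 (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE,
    data VARCHAR(255)
);
```

If 8 bytes per key matters for genuinely small tables, the usual alternative is an explicit `INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY`.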
ThorSummoner (143 rep)
Nov 13, 2014, 06:26 PM • Last activity: Oct 21, 2020, 09:25 AM
1 vote
2 answers
58 views
Large Identifier Uniqueness Across Table
We use `NEWSEQUENTIALID` to generate `uniqueidentifier` values as table IDs. An issue arises after a server restart, as the generator's seed changes and **sequentiality** breaks. Is there another suggested solution to achieve large-identifier uniqueness for a table? Thank you
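One possible alternative (a sketch under assumed names, not the only option): a `bigint` `SEQUENCE` keeps producing increasing values across restarts, while still giving a single large, unique identifier per row:

```sql
CREATE SEQUENCE dbo.ItemIdSeq AS bigint START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.Items (
    Id   bigint NOT NULL
         CONSTRAINT DF_Items_Id DEFAULT (NEXT VALUE FOR dbo.ItemIdSeq)
         CONSTRAINT PK_Items PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);
```

A sequence can still leave gaps after a crash (cached values are lost), but the ordering never resets to a lower range the way `NEWSEQUENTIALID` can after a reboot.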
Roi Shabtai (113 rep)
Oct 7, 2020, 09:42 AM • Last activity: Oct 13, 2020, 12:09 PM
14 votes
3 answers
14887 views
Indexing a PK GUID in SQL Server 2012
My developers have set up their application to use GUIDs as PKs for pretty much all of their tables, and by default SQL Server has set up the clustered index on these PKs. The system is relatively young and our biggest tables are just over a million rows, but we're taking a look at our indexing and want to be able to scale quickly, as may be needed in the near future. So my first inclination was to move the clustered index to the created field, which is a bigint representation of a DateTime. However, the only way I can make the CX unique would be to include the GUID column in this CX, but order by created first. Would this make the clustering key too wide, and would it boost performance for writes? Reads are important too, but writes are probably a bigger concern at this point.
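The arrangement described can be sketched like this (table and column names hypothetical): the PK stays on the GUID as nonclustered, and the clustered index leads on `created` with the GUID appended only to make it unique:

```sql
CREATE TABLE dbo.Orders (
    Id      uniqueidentifier NOT NULL
            CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED,
    created bigint NOT NULL,   -- bigint representation of a DateTime
    Payload nvarchar(200) NULL
);

-- clustered on (created, Id): new rows append at the end of the table;
-- the clustering key is 8 + 16 = 24 bytes wide
CREATE UNIQUE CLUSTERED INDEX CX_Orders_created
    ON dbo.Orders (created, Id);
```

The 24-byte key is carried into every nonclustered index, which is the width trade-off the question is weighing against the sequential insert pattern.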
njkroes (655 rep)
Oct 31, 2013, 04:27 PM • Last activity: Aug 21, 2020, 03:20 AM
3 votes
2 answers
1178 views
Hash IDs (such as base64 YouTube video Ids): store as int/bigint, or as string, and why?
I want to generate "unguessable" IDs, rather than /record/1, /record/2, /record/3, etc. YouTube's style of using base 64 allows 16.7 million IDs to be stored in just 4 characters (64 × 64 × 64 × 64). This would be more than enough for my needs. My only question is whether I should store them as ints in the database and convert them every time a request comes in (or when I need to generate a URL), or store them as the string that the user will see. And why? A couple of follow-up questions:

1) Rather than generating random integers and converting them to the base64 string, should I use auto-increment numbers with a salt to randomise the alphabet, so I can get apparently "random" identifiers from IDs of 1, 2, 3, 4, 5, 6, 7, etc.? And why?

2) For table relationships, should I use whatever I've generated as my PK/FK, or should that be a separate column and just a simple auto-increment int, for some reason?
Codemonkey (265 rep)
Jun 2, 2020, 10:27 PM • Last activity: Jun 6, 2020, 10:39 PM
-2 votes
1 answer
53 views
If all associated attributes of an intended primary key are identical, is it still a true primary key?
I'll start with an example. If I have a person table with intended surrogate primary key Id:
+----+------+------------+-------------+
| Id | name |    DoB     |     SSN     |
+----+------+------------+-------------+
|  1 | John | 1901-01-01 | 111-11-1111 |
|  2 | Jane | 1902-02-02 | 222-22-2222 |
|  3 | John | 1901-01-01 | 111-11-1111 |
+----+------+------------+-------------+
Note Ids 1 & 3 have the same attributes; they both represent the same person. Now, from what we know about the theory behind what constitutes a primary key, which I think is well summarized here:

> - The primary key must uniquely identify each record.
> - A record’s primary-key value can’t be null.
> - The primary key-value must exist when the record is created.
> - The primary key must remain stable—you can’t change the primary-key field(s).
> - The primary key must be compact and contain the fewest possible attributes.

Consider the first bullet, "*The primary key must uniquely identify each record.*" In my example, I suppose whether or not each Id does represent uniqueness depends on what's really supposed to be considered unique. A different database record? Yes. A different person (what the records are supposed to represent)? No. So multiple Ids represent what is functionally the same subject generating the data, present in 2 records: a "two-to-one Id" of sorts. I've not read anything that directly addresses the scenario my example illustrates, as it relates to what is or is not a PK.

1. Does this example violate the theory behind what constitutes a primary key?
2. If not, does this example illustrate a violation of any larger principles of database architecture, or can this concept be reduced to something as simple as "duplication of data - clean it up"?

Many thanks.
simply_sideways (3 rep)
May 5, 2020, 04:33 PM • Last activity: May 5, 2020, 10:02 PM
0 votes
1 answer
219 views
If PK is Non-Clustered index, how to identify clustered index candidate?
I'm designing a database for an insurance company. I want to make the primary key a GUID (uniqueidentifier), and I want to create indexes for each table in the database for performance optimization. I did research and came to know that a GUID primary key should always be non-clustered. I agree that a clustered index on a GUID will create fragmentation. I have sample tables:

```sql
CREATE TABLE [dbo].[ProductState](
    [ProductStateId] [uniqueidentifier] NOT NULL,
    [ProductId] [uniqueidentifier] NOT NULL,  -- Foreign key
    [StateId] [uniqueidentifier] NOT NULL,    -- Foreign key
    [CreatedDate] [datetime] NOT NULL,
    CONSTRAINT [PK_ProductState] PRIMARY KEY NONCLUSTERED
    (
        [ProductStateId] ASC
    )
)
```

This is a mapping table between two master tables, in which only foreign key columns are available, and those are again GUIDs. Another sample table:

```sql
CREATE TABLE [dbo].[ProductCompare](
    [ProductCompareId] [uniqueidentifier] NOT NULL,
    [ProductRateId] [uniqueidentifier] NOT NULL,  -- Foreign key
    [ProductId] [uniqueidentifier] NOT NULL,      -- Foreign key
    [Program] [int] NOT NULL,
    [MarkUpValue] [decimal](19, 10) NULL,
    [IsSelected] [bit] NOT NULL,
    [ComparisonValue] [varchar](max) NULL,
    [ComparisonOrder] [int] NOT NULL,
    [CreatedDate] [datetime] NOT NULL,
    CONSTRAINT [PK_ProductCompare] PRIMARY KEY NONCLUSTERED
    (
        [ProductCompareId] ASC
    )
)
```

This is a child table which has references to master tables along with other columns, but I don't see any column here which could work as a clustered index key. In my entire database there are many scenarios of this kind where I don't have any column which can work as a clustered index key. I need suggestions on how I should manage indexing for this insurance company web application.
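A clustered index does not have to be unique or be the primary key. One common pattern (a sketch, not a recommendation for every table) clusters on the ever-increasing `CreatedDate` already present in both sample tables:

```sql
-- the PK stays nonclustered on the GUID; the clustered index keeps
-- new rows appending at the end of the table
CREATE CLUSTERED INDEX CX_ProductState_CreatedDate
    ON [dbo].[ProductState] ([CreatedDate]);
```

SQL Server adds a hidden 4-byte uniquifier to duplicate clustered-key values, which is still far narrower than clustering on a 16-byte random GUID.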
Heta Desai (21 rep)
Apr 3, 2020, 12:15 PM • Last activity: Apr 4, 2020, 09:22 PM