Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes

1 answers

191 views

Using PostgreSQL to hold an array of data

I am trying to use a postgreSQL data base column to store values from an array (or point to an enum table which does this). I will get an array of data like `['one','two','five']`. The array can contain any amount of items from 0 to 10. I have an enum table with these values: ``` id value 1 one 2 tw...

I am trying to use a postgreSQL data base column to store values from an array (or point to an enum table which does this). I will get an array of data like ['one','two','five']. The array can contain any amount of items from 0 to 10. I have an enum table with these values:

id    value
1     one
2     two
3     three
4     four
5     five

I want my database cell to point to all of the values that are contained in the array. Is this even possible or is there a common way to structure database tables to fix this issue. So my desired solution when passed the array ['one','two','five'] would be to have a single cell in my database hold the enum pointers 1, 2, 5.

tdammon (115 rep)

Oct 6, 2020, 04:04 PM • Last activity: Jun 24, 2025, 12:06 AM

0 votes

1 answers

218 views

Using ENUM as partition key when RANGE Partitioning PostgreSQL

postgresql partitioning postgresql-14 enum

I am working with PostgreSQL 14. I have a table called `Order` with a status column: ```sql CREATE TABLE "Order" ( "orderId" BIG SERIAL PRIMARY KEY NOT NULL, "orderDescription" TEXT NOT NULL, "statusId" SMALLINT NOT NULL ) PARTITION BY RANGE ("statusId"); ``` As shown, this table is partitioned by `...

I am working with PostgreSQL 14. I have a table called Order with a status column:

CREATE TABLE "Order" (
    "orderId" BIG SERIAL PRIMARY KEY NOT NULL,
    "orderDescription" TEXT NOT NULL,
    "statusId" SMALLINT NOT NULL
) PARTITION BY RANGE ("statusId");

As shown, this table is partitioned by RANGE based on the statusId. We need to place orders that have moved to a closed status in a separate partition. To achieve this, I implemented a simple trick. I defined a table called OrderStatus and assigned open statuses in the range of 10 to 20 and closed statuses in the range of 20 to 30:

CREATE TABLE "OrderStatus" (
    "statusId" SMALLINT PRIMARY KEY NOT NULL,
    "statusName" VARCHAR UNIQUE NOT NULL
);

INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (11, 'WAITING');   -- open
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (12, 'OPEN');      -- open
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (13, 'CANCELING'); -- open
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (14, 'SENDING');   -- open
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (21, 'FINISHED');  -- close
INSERT INTO "OrderStatus" ("statusId", "statusName") VALUES (22, 'CANCELED');  -- close

Based on this, the partitions are defined as follows:

CREATE TABLE "Order_Open" PARTITION OF "Order" FOR VALUES FROM (10) TO (20);
CREATE TABLE "Order_Close" PARTITION OF "Order" FOR VALUES FROM (20) TO (30);

So far, everything works as expected. Now, we are undergoing a system refactor, and one of the proposed changes is to convert the order statuses to an enum type. However, if we do this, since the numeric values of enums (enumsortorder column in pg_enum table) are defined by PostgreSQL, it seems that the trick we used for categorizing open and closed statuses will no longer be applicable. Here are the potential solutions I have considered, each with its own explanation, but I'm not sure if they are the best approaches: 1. Add a new boolean column isOpen to indicate whether the order is open or closed, and partition the table based on this column. I feel that having an additional column for this information is not ideal. 2. Manipulate the enumsortorder values of the enum in the pg_enum table to retain our categorization trick (I haven't tried this yet to see if it's possible). This approach seems risky and not quite right to me. 3. Change the partitioning type to LIST. While this does not pose significant issues, if we need to add a new value to the enum in the future, we would have to change the table definitions, which is not appealing. 4. Abandon the idea of using enum altogether and steel use statusId of OrderStatus table. I feel that for cases where we need to partition a table based on a column, using enum might not be suitable, especially for RANGE partitioning. Maybe there's a best practice for this situation that I'm not aware of. If anyone can provide insights or advice on this, I would greatly appreciate it.

Mofarah (35 rep)

Jun 22, 2024, 01:06 PM • Last activity: Jun 20, 2025, 10:03 AM

0 votes

1 answers

59 views

MariaDB: change enum

mariadb enum

I have a MariaDB database. mariadb Ver 15.1 Distrib 10.11.6-MariaDB One of my tabes uses an enum field (generated by Java JPA, not manually). +-----------+-----------------------------------------------------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default...

                                  I have a MariaDB database.

mariadb  Ver 15.1 Distrib 10.11.6-MariaDB

One of my tabes uses an enum field (generated by Java JPA, not manually).

    +-----------+-----------------------------------------------------------------+------+-----+---------+----------------+
    | Field     | Type                                                            | Null | Key | Default | Extra          |
    +-----------+-----------------------------------------------------------------+------+-----+---------+----------------+
    | id        | bigint(20)                                                      | NO   | PRI | NULL    | auto_increment |         
    | name      | varchar(255)                                                    | YES  | UNI | NULL    |                |
    | type      | enum('A','B','G','H','Z')                                       | YES  |     | NULL    |                |
    | meta      | longtext                                                        | YES  |     | NULL    |                |
    +-----------+-----------------------------------------------------------------+------+-----+---------+----------------+

This field is used in records of the table.

    1000 | sam | A | abc
    1004 | max | Z | 000
    1019 | tim | H | a89

Now I need to add another field to the enum - lets say it is of value "C".

My question is about the changing of the enum and how it is used internally.
I need to keep the existing enum-records unharmed and working..

Can I simple change the enum to a new value of

'A', 'B', 'C', 'G', 'H', 'Z'

or do I need to keep the order (maybe it uses numbers internally and a reorder would crash old records?)?

'A', 'B', 'G', 'H', 'Z', 'C'

I would alter with

    ALTER TABLE
        table_name
    MODIFY COLUMN
        column_name enum(
            'enum1',
            'enum2',
            ... )
                                

chris01 (101 rep)

May 19, 2025, 09:38 AM • Last activity: May 19, 2025, 11:29 AM

4 votes

1 answers

5650 views

PostgreSQL Enum : Search on ENUM type

postgresql performance enum postgresql-performance

Defined an ENUM CREATE TYPE currency AS ENUM('GBP', 'EUR', 'USD'); Using this as a type in a table CREATE TABLE if not exists transaction( id BIGSERIAL NOT NULL PRIMARY KEY , amount NUMERIC(35,4) DEFAULT 0.0, transaction_currency currency NOT NULL ); Created an index on transaction_currency CREATE I...

                                  Defined an ENUM

    CREATE TYPE currency AS ENUM('GBP', 'EUR', 'USD');

Using this as a type in a table

    CREATE TABLE if not exists transaction(
      id BIGSERIAL NOT NULL PRIMARY KEY ,
      amount NUMERIC(35,4) DEFAULT 0.0,
      transaction_currency currency NOT NULL
    );

Created an index on transaction_currency

    CREATE INDEX transaction_currency ON transaction(transaction_currency);

Inserted records

    INSERT INTO transaction(transaction_currency) VALUES ('EUR'), ('USD'), ('USD');

Tried searching in different ways, one by typecasting and other plain string

    SELECT * 
    FROM transaction WHERE transaction_currency= 'USD'

This did a sequential scan on the table instead of index

    QUERY PLAN
    Seq Scan on transaction  (cost=0.00..1.61 rows=24 width=15) (actual time=0.721..0.916 rows=24 loops=1)   
      Filter: (transaction_currency = 'USD')     
      Rows Removed by Filter: 25 
    Planning time: 2.634 ms 
    Execution time: 1.164 ms



    SELECT * 
    FROM transaction WHERE transaction_currency= 'USD'::currency

This did an index scan

    QUERY PLAN
    Index Scan using transaction_currency on transaction  (cost=0.14..8.56 rows=24 width=15) (actual time=0.062..0.296 rows=25 loops=1)   
      Index Cond: (transaction_currency = 'USD'::currency)
    Planning time: 2.581 ms 
    Execution time: 0.614 ms

**Question**:

Do we need to explicitly typecast enumerated columns in order to benefit from the index? 

If yes, can someone help me understanding how this query work?
Can we instruct postgres to automatically typecast the filter to enumerated type? 

If not what alternatives we have to this selective performance?

                                

Nimit (41 rep)

Nov 30, 2018, 11:04 AM • Last activity: May 5, 2025, 04:04 AM

2 votes

1 answers

217 views

Storing enum as smallint or string, performance impact

postgresql enum

I have a bunch of properties (columns in db) that should have a string value from a list of predefined values, for example `active`, `inactive`, `pending`. Outside the database, which is javascript (deno+react in front-end) I really want to have these as string enums, and not a integers, because it...

                                  I have a bunch of properties (columns in db) that should have a string value from a list of predefined values, for example active, inactive, pending.

Outside the database, which is javascript (deno+react in front-end) I really want to have these as string enums, and not a integers, because it adds some clarity to the code and actually reduces the amount of code you need to write.

In the database I have two options, either I store them as they are (strings), or as smallint then convert them to/from strings during the database select/insert.

Problem with the second option is that it requires additional transformation in the code, and when using typescript, this transformation is not so straightforward as doing a simple mapping, because ts doesn't let you re-assign different types to existing props, so you need to copy and do more stuff...

Anyway, I just wanted to ask, how much is the postgres performance and/or size overhead if I store them directly as strings in the db too, to eliminate transformation step? I imagine that each record would grow by around 20 bytes if I change from smallint to text and that would have some kind effect on performance? Also these columns have to be indexed, and that adds another 20 bytes for each record

Alex (181 rep)

Apr 6, 2025, 08:45 AM • Last activity: Apr 6, 2025, 09:19 AM

4 votes

2 answers

465 views

How is enum stored in MariaDB

mariadb enum

I have created a database where one of the fields of a table is defined as `enum('M', 'B')`. The tool that I am using, converts it to a **variant** type. I would like to override it and treat it like a character. So the question is, is it stored as a character in the database?

                                  I have created a database where one of the fields of a table is defined as enum('M', 'B').  The tool that I am using, converts it to a **variant** type.  I would like to override it and treat it like a character.  So the question is, is it stored as a character in the database?
                                

Rohit Gupta (2126 rep)

Jan 21, 2025, 01:22 PM • Last activity: Jan 23, 2025, 01:17 PM

2 votes

3 answers

8043 views

Add ENum Attribute to existing table using Alter

oracle alter-table enum

I am using oracle XE and trying to add a attribute that restricts to only three available value 'Small','Medium','large' to an existing table using Alter Table and Enum. Tried doing, ALTER TABLE TRUCK ADD TTYPE ENUM('SMALL','MEDIUM','LARGE'); but gets invalid option ALTER TABLE TRUCK ADD TTYPE ENUM(...

                                  I am using oracle XE and trying to add a attribute that restricts to only three  available value 'Small','Medium','large' to an existing table using Alter Table and Enum.

Tried doing,

    ALTER TABLE TRUCK ADD TTYPE ENUM('SMALL','MEDIUM','LARGE');

but gets invalid option 

    ALTER TABLE TRUCK ADD TTYPE ENUM('SMALL','MEDIUM','LARGE');
                                     *
where the error highlights after ENUM.

I think I am having syntax error. Please help to resolve.

nubbear (23 rep)

Nov 3, 2014, 06:24 PM • Last activity: Sep 2, 2024, 04:30 PM

0 votes

1 answers

48 views

Lookup tables: check by type id or type name?

trigger constraint psql enum

TLDR: Should I check an entity's type by lookup_table.id or lookup_table.type_name? I've learned that lookup tables are almost always preferrable to Enums due to their flexibility. However, I have some doubts about how to handle constraints on the "type" of an entity. let's say I have the following relations: - Employee - Employee_type - Project Every Project has a leader field referring to Employee's id column, and the following constraint: > Each project leader has to be an Employee of type "manager" Which can only be enforced using a trigger. If Employee_type was an enum field I could join Project and Employee on the leader's id, then check that type = 'manager'. But I'm using a lookup table, something like:

id | type_name
-----------------------
1  | regular employee
2  | Manager
3  | Technician

so, when checking for the type I can either check by id: Employee_type.id = 2 --2 stands for 'Manager'! - (-)rather obscure unless you specify the type's name in a comment - (-)relies on fact that id never changes - (+)allows to edit the type's display name without breaking code The alternative is checking by type_name: Employee_type.type_name = 'Manager' - (+)more human readable - (-)can't edit the type's name without breaking code This same problem also pops up when making queries, compare: SELECT name FROM Employee WHERE type = 2 --selects managers with

--Join can be hidden with a view
SELECT * FROM Employee JOIN Employee_type WHERE type_name = 'Manager'

My take from all of this is that the flexibility granted by lookup tables comes with the risk of the lookup's structure changing, invalidating code relying on it all the way up to application level. Is there a way to prevent this? Notes: 1. for the sake of simplicity, let's assume that the type of an employee always stays the same 2. I am aware that employee type could be modeled using a Manager table with a reference to the corresponding Employee row. The problem with this approach is that it doesn't scale well when there is a high number of subclasses, each with different relationships and little to no unique attributes

Afelium (1 rep)

Apr 28, 2024, 11:57 AM • Last activity: Apr 29, 2024, 06:24 AM

0 votes

1 answers

670 views

Convert PostgreSQL array type

postgresql alter-table array enum

Given an enum type (e.g. `letter`). What is the syntax to convert an existing array to the enum type? The existing values are valid for the enum. For example, how would I convert `letter_data` in: create table sample ( letter_data text[]; ); To the `letter` enum?

                                  Given an enum type (e.g. letter).

What is the syntax to convert an existing array to the enum type? The existing values are valid for the enum. For example, how would I convert letter_data in:

    create table sample (
     letter_data text[];
    );

To the letter enum?
                                

gerardw (115 rep)

Aug 29, 2023, 06:16 PM • Last activity: Aug 30, 2023, 12:06 AM

0 votes

1 answers

1145 views

Making use of ENUM in CONSTRAINT

postgresql constraint postgis enum

I am creating a PostGIS database in which a lot of schemas should have the same constraints. Meaning in the future, similar tables should have the same limitations of values entered. My constraint would be this ``` ALTER TABLE "Test project"."rainbow table" -- the table I want to change ADD CONSTRAI...

ALTER TABLE "Test project"."rainbow table"   -- the table I want to change
ADD CONSTRAINT values_constraint [the name of the constraint]
CHECK  (colors IN ('red', 'green', 'blue'))  -- limiting 3 values in colors

So I thought I will do this making use of enums, which I can define and alter (by re-creating after dropping if something changes) for the whole database. HOWEVER, I am struggling to call the enum values in the constraint defintion, which should be a check I believe

ALTER TABLE "Test project"."rainbow table"  -- the table I want to change
ADD CONSTRAINT values_constraint [the name of the constraint]
   CHECK  (colors IN (/*HOW TO REFERENCE THE ENUM ARRAY HERE?*/))

KoGIS (3 rep)

Jul 13, 2023, 12:53 PM • Last activity: Jul 24, 2023, 11:00 AM

2 votes

1 answers

3157 views

PostgreSQL Enum type. Pros and cons

postgresql enum

PostgreSQL has an [Enum data type][1]. What are the pros and cons of using Enums? I found the [advantages for MySQL][2] but don't know which apply for PostgreSQL and to what extend: > Readable queries and output. [1]: https://www.postgresql.org/docs/current/datatype-enum.html [2]: https://dev.mysql....

                                  PostgreSQL has an Enum data type . What are the pros and cons of using Enums? 
I found the advantages for MySQL  but don't know which apply for PostgreSQL and to what extend:

> Readable queries and output.

PythonForEver (133 rep)

Apr 4, 2023, 08:03 AM • Last activity: Apr 4, 2023, 02:25 PM

10 votes

2 answers

11538 views

ORDER BY gives ERROR: function array_position(text[], character varying) does not exist

postgresql order-by array enum

I have a pretty basic categorical column in a Postgres database that is currently stored as VARCHAR. I can select a count of each with: I though adding an `ORDER BY array_position()` would do it: SELECT color, count(*) FROM research GROUP BY color ORDER BY array_position(ARRAY['Red','Orange','Yellow...

                                  I have a pretty basic categorical column in a Postgres database that is currently stored as VARCHAR. I can select a count of each with:

I though adding an ORDER BY array_position() would do it:

    SELECT color, count(*)
    FROM research
    GROUP BY color 
    ORDER BY array_position(ARRAY['Red','Orange','Yellow','Green','Blue'], color);

But I'm seeing a type error:

    ERROR: function array_position(text[], character varying) does not exist
    SQL state: 42883
    Hint: No function matches the given name and argument types. You might need to add explicit type casts.
    Character: 98

What do I need to cast color as so I can use the array_position function to order it? 
                                

Amanda (305 rep)

Mar 21, 2018, 01:13 AM • Last activity: Mar 9, 2023, 02:22 PM

1 votes

3 answers

1431 views

Unique constraint on 3 different columns but a specific value in 3rd column allows duplicate row entries

mysql index users unique-constraint enum

I have a 'Users' table with columns `user_email`, `user_company_id` and `user_status`. The `user_status` column is an enum with values '1' or '0' which represents the users being either *active* or *inactive*. Is there a way to apply a unique constraint to these 3 columns such that it only allows **...

                                  I have a 'Users' table with columns user_email, user_company_id and user_status. The user_status column is an enum with values '1' or '0' which represents the users being either *active* or *inactive*. Is there a way to apply a unique constraint to these 3 columns such that it only allows **one** *unique, active* user email for a specific company but any number of duplicate entires for *inactive* emails?

E.g.: Consider a 'Users' table with the following entries

    CREATE TABLE users(
      user_id BIGINT(10) PRIMARY KEY AUTO_INCREMENT, 
      user_email VARCHAR(255) NOT NULL, 
      user_companyid BIGINT(10) NOT NULL, 
      user_status enum('1', '0'))

    INSERT INTO users(user_id, user_email, user_companyid, user_status) 
      VALUES (1,'test1@gmail.com','555','1');
    INSERT INTO users(user_id, user_email, user_companyid, user_status) 
      VALUES (2,'test2@gmail.com','555','1');
    INSERT INTO users(user_id, user_email, user_companyid, user_status) 
      VALUES (3,'test1@gmail.com','777','1');

    SELECT * FROM users;
 

    user_id | user_email      | user_companyid | user_status
    ------: | :-------------- | -------------: | :----------
          1 | test1@gmail.com |            555 | 1          
          2 | test2@gmail.com |            555 | 1          
          3 | test1@gmail.com |            777 | 1           

I shouldn't be able to add an existing, active email for a specfic company twice; the following should fail:

    INSERT INTO users(user_id, user_email, user_companyid, user_status) 
     VALUES (4,'test1@gmail.com','555','1'); 


If I update the status of one of the active users to '0' (inactive), I should be able to insert the same email again since the previous email status is inactive. The following should succeed:

    UPDATE users SET user_status = '0' WHERE user_id = 1;

    INSERT INTO users(user_id, user_email, user_companyid, user_status) 
      VALUES (4,'test1@gmail.com','555','1');

    user_id | user_email      | user_companyid | user_status
    ------: | :-------------- | -------------: | :----------
          1 | test1@gmail.com |            555 | 0          
          2 | test2@gmail.com |            555 | 1          
          3 | test1@gmail.com |            777 | 1          
          4 | test1@gmail.com |            555 | 1          


Also, the constraint should allow duplicate entries for inactive user emails. This should also succeed:

    UPDATE users SET user_status = '0' WHERE user_id = 4;
    
    SELECT * FROM users;
    
    user_id | user_email      | user_companyid | user_status
    ------: | :-------------- | -------------: | :----------
          1 | test1@gmail.com |            555 | 0          
          2 | test2@gmail.com |            555 | 1          
          3 | test1@gmail.com |            777 | 1          
          4 | test1@gmail.com |            555 | 0

                                

Jay (13 rep)

Apr 11, 2020, 04:05 PM • Last activity: Jan 12, 2023, 01:13 PM

0 votes

1 answers

1373 views

Why can't postgres index a text casted enum

postgresql index enum

I'm trying to add a default value to an enum column so a unique constraint will only allow one null like so: ``` CREATE UNIQUE INDEX "some_index" on some_table (coalesce(some_enum_column::text, 'some_default_value')); ``` Basically, I'm trying to replace Postgres-15's new `NULLS NOT DISTINCT` featur...

I'm trying to add a default value to an enum column so a unique constraint will only allow one null like so:

CREATE UNIQUE INDEX "some_index" on some_table (coalesce(some_enum_column::text, 'some_default_value'));

Basically, I'm trying to replace Postgres-15's new NULLS NOT DISTINCT feature in Postgres-14.5 And I'm getting this error:

-none
functions in index expression must be marked IMMUTABLE

From reading online and from this [answer](https://dba.stackexchange.com/a/276512) , I understand that this cast is problematic because the enum values can change, making this cast mutable. But if this is the case, how does Postgres index an enum column normally? What is the difference between the enum and its string representation in this case? Does it have some trigger to reindex on enum change? I could use a partial index in this example but in reality, my index includes multiple columns, some nullable and some not. I'm writing an extension to an ORM, so I need to be able to support an arbitrary amount of nullable columns.

CY-OD (109 rep)

Nov 15, 2022, 08:38 AM • Last activity: Nov 16, 2022, 09:29 AM

0 votes

1 answers

236 views

PostgreSQL: Return ENUM values like data_type modifiers

postgresql datatypes array enum

When querying the datatype using format_type(tttypid, atttypmod) from pg_attribute, I would like to return the ENUM values like atttypmod values. attname | type --------+------------------------------- a | numeric(12,3) b | timestamp(0) without time zone enum | color('Red', 'Green', 'Blue') Currentl...

                                  When querying the datatype using format_type(tttypid, atttypmod) from pg_attribute, I would like to return the ENUM values like atttypmod values. 

    attname | type
    --------+-------------------------------
    a       | numeric(12,3)
    b       | timestamp(0) without time zone
    enum    | color('Red', 'Green', 'Blue')

Currently, the format_type() call only returns the enum name without values.  I believe I need to CASE my select statement against the pg_type and pg_enum tables, but am having trouble with the proper syntax. My attempt is below.  Thank you!

    SELECT 
    	attname,
    	CASE
    		WHEN format_type(atttypid, atttypmod)::text = ANY(select ARRAY(select t.typname::text from pg_type t join pg_enum e on t.oid = e.enumtypid group by t.typname))
    			THEN format_type(atttypid, atttypmod) || '(' || array_to_string(enum_range(NULL::format_type(atttypid, atttypmod)), ',') || ')'
    		ELSE format_type(atttypid, atttypmod)
    		END AS type
    FROM   pg_attribute
    WHERE  attrelid = 'myTable'::regclass
    AND    attnum > 0
    AND    NOT attisdropped
    ORDER  BY attnum;
                                

PythonScrub (3 rep)

Nov 11, 2022, 04:55 PM • Last activity: Nov 12, 2022, 12:30 PM

0 votes

3 answers

1262 views

Mysql: Any way to properly index 3 ENUM columns with the same options ? (A OR B OR C)

mysql index enum

I have 3 enum() columns with the same option values inside. I first tried to use the "set" datatype which originally was meant to do that (hold multiple values from a set) but it seems that datatype isn't managed for 15+ years and isn't even supporting an index. Is there a nice way to index those 3...

                                  I have 3 enum() columns with the same option values inside.  
I first tried to use the "set" datatype which originally was meant to do that (hold multiple values from a set) but it seems that datatype isn't managed for 15+ years and isn't even supporting an index.  
Is there a nice way to index those 3 columns so I can use them in searched without destroying query performance ?  

SELECT * FROM TABLE WHERE a='x' OR b='x' OR c='x'  

I thought about creating a virtual field which uses a boolean logic (&) on the 3 enum field-numbers and combines them into a large number but that's quite a hack and not nice to maintain.  

Has someone solved this sort of task elegantly ?
(I do not want to use a support table and JOIN it, I want to stay with a single table)
                                

John (402 rep)

Dec 13, 2020, 12:28 AM • Last activity: Sep 4, 2022, 04:03 AM

2 votes

1 answers

1049 views

How to get the "best of both worlds" with natural and surrogate keys? And could DBMSs be better?

foreign-key primary-key enum surrogate-key natural-key

I am designing my first database, and I find myself frustrated by the choice between storing an integer or a string for each instance of a categorical variable. My understanding is that if I have a table containing cities that I want to make a child of a table of countries, the most performant way t...

                                  I am designing my first database, and I find myself frustrated by the choice between storing an integer or a string for each instance of a categorical variable. 

My understanding is that if I have a table containing cities that I want to make a child of a table of countries, the most performant way to do that is to have the PK of the countries table as a FK in in the table of cities. However for ease of use and debugging, it's nice to always have the string name associated with the country PK. Every solution I have considered either is not recommended or seems overly complex.

I'd like opinions the merits of these approaches (or hear about new ones) and also to understand if it has to be this way or if databases simply are this way because of tradition.

Possible approaches:

1. Use a string as a PK for countries. Then I will have a human-readable FK for it in any child tables. Obviously less performant than using integers, but I suspect it may be the least worst way to have the convenience I desire.

2. Create a view using application logic that join each the string name of the country to the states table. 
- I don't love this because if the application logic breaks, the tables become less readable. Also I would expect large join operations to have an even worse performance penalty than string PK/FKs.

3. Create a separate table to connect numeric IDs with the appropriate string ID. I'm not sure if it would be better to have a table coding each type of relation, or one big table with one big pool of IDs that cover all integer key-string value relations. I could then use application logic to look up the appropriate strings and fill the appropriate PK into the child table when it's string name is given by a user. 
- I feel like this might be pretty resource intensive too, as there would have to be a lookup every time a new row was added to the child. It also means that I would still have to create the views I want.

4. Use enum data type. Instinctively, this would be my go-to approach, as it seems the ideal balance between natural and synthetic keys: Use integer IDs and give the IDs a string label so that the string itself need not be repeated. 
- Unfortunately my research has found that this is not recommended. One reason for that is that categories cannot be deleted easily. I'm not sure if that is  dealbreaker for me, but I also wonder why DBMSs are designed this way. Aren't categorical variables commonly used enough to add convenience features for them?

Stonecraft (125 rep)

Jul 24, 2022, 06:43 PM • Last activity: Jul 25, 2022, 12:29 PM

12 votes

2 answers

7298 views

Are enums still evil (in MySQL 8.0)?

mysql mysql-8.0 enum

So I have a disagreement with a coworker about using enums vs lookup tables. [This very old article from 2011 and MyQL5.5](https://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/) still gets referenced again and again by those not liking enums. But we are using MySQL8.0, on databases t...

                                  So I have a disagreement with a coworker about using enums vs lookup tables.

[This very old article from 2011 and MyQL5.5](https://komlenic.com/244/8-reasons-why-mysqls-enum-data-type-is-evil/)  still gets referenced again and again by those not liking enums.

But we are using MySQL8.0, on databases that are not that big, like 50 to 60 tables, with 30ish enums columns.  
A few tables have up to a couple millions rows, most are way below 500K rows.  
Almost all enums have fewer than 10 values, and only very few have changed (once) in their life, always to add a new value at the end of the possible values.  
And we never "suffered" from any disadvantages referenced by the article like storing metadata alongside it, referencing the same values from other tables, etc...

Despite what the article says at point 2., today, adding a values at the end of the list doesn't update the whole table, but removing a value always update (or at least read) all rows.

I do understand there are two schools of though and I understand the advantages and disadvantages of both.

But is there, today on MySQL8, any actual/factual/technological reasons so **switch away from enums **for the existing columns** and justify all the work in both the app and db required to do so ?

Thanks for reading and your answers.

Florent Poujol (223 rep)

May 17, 2022, 09:16 PM • Last activity: May 18, 2022, 03:35 PM

1 votes

1 answers

1065 views

iterate through Postgres enum values?

postgresql enum

I have a Postgres enumerated type used to describe the node class (order sensitive) of a source/destination relationship table. Question is, how can I dynamically fetch the next value (if any) of an enumeration instance? ```sql CREATE TYPE MyEnum AS ENUM ('level1', 'level2', 'level3'); SELECT dst_ty...

CREATE TYPE MyEnum AS ENUM ('level1', 'level2', 'level3');

SELECT dst_type, dst_id
FROM my_relationships
WHERE src_type::MyEnum = next(dst_type::MyEnum)      -- does not work
-- WHERE src_type::MyEnum = dst_type::MyEnum + 1     -- does not work
AND src_id = 'd4fc1d5d-b054-d37c-4e80-2c04a6b3dbf8';

eliangius (155 rep)

Feb 11, 2022, 02:26 AM • Last activity: Feb 11, 2022, 06:44 AM

0 votes

1 answers

302 views

Does the collation of ENUM columns impact their size?

mysql mariadb mysql-8.0 enum mariadb-10.5

What does it mean when we say that an `ENUM` has a charset of `utf8mb4`, versus `latin1`? My understanding is that the `ENUM` holds a numeric index (1, 2, 3, 4...) that identifies which of the `ENUM` values (abc, def, ghi, jkl...). If I use `utf8mb4` for an `ENUM` column, is the column using more st...

                                  What does it mean when we say that an ENUM has a charset of utf8mb4, versus latin1?

My understanding is that the ENUM holds a numeric index (1, 2, 3, 4...) that identifies which of the ENUM values (abc, def, ghi, jkl...).  
If I use utf8mb4 for an ENUM column, is the column using more storage than it would if I used latin1?

Nuno (829 rep)

Feb 13, 2021, 01:06 AM • Last activity: Feb 13, 2021, 02:59 AM

Showing page 1 of 20 total questions