I have the IMDb database in PostgreSQL 16, and I am trying to generate random strings for performance benchmark. The relevant table is
title_basics
with the primary key column tconst
, VARCHAR(9). tconst
has values in the form of 'tt0000000', where the zeroes represent numbers, e.g., 'tt0000001' and 'tt9999999'.
I want to write an .sql file which can be run with pgbench, which generates random, padded numbers appended after the string 'tt', e.g.:
SELECT * FROM title_basics WHERE tconst = 'tt0000001';
With simple SQL, this would be trivial, but since pgbench supports only a small number of scalar functions, it seems difficult. I have also explored the use of SQL in the pgbench script, but constructs such as this do not seem to be supported:
\set random_number (SELECT random());
or:
\set random_number :randint(0, 9999999)
\set padding_length 7 - length(:'random_number')
\set primary_key 'tt' || repeat('0', :'padding_length') || :'random_number'
I also considered generating the numbers with SQL, but this effectively prevents PostgreSQL from using the index on tconst. I do not want to create an additional index to satisfy the query, as that would be problematic for other reasons.
SELECT *
FROM title_basics
WHERE tconst = 'tt' || LPAD((FLOOR(RANDOM()*(9916880 - 1 + 1)) + 1)::text, 7, '0');
Edit:
I found a somewhat functional solution of generating the random value in a CTE. This does not prevent PostgreSQL from using the index on tconst. However, I would be happy to see a more pgbench-oriented approach.
WITH a AS (
SELECT 'tt' || LPAD((FLOOR(RANDOM()*(9916880 - 1 + 1)) + 1)::text, 7, '0') AS tconst_random
)
SELECT *
FROM title_basics, a
WHERE tconst = a.tconst_random;
Asked by Fuiba
(5 rep)
May 16, 2023, 09:13 AM
Last activity: May 17, 2023, 07:20 AM
Last activity: May 17, 2023, 07:20 AM