
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

1 vote
2 answers
163 views
Explain plan sorts the result after the join even though the column is included in an index
I am using SQL Server 2022 Developer. I am trying to get all AccessLog rows that are classified as type 1.
SELECT [t].[Time], [u].[UserName], [t].[Type], [t].[Message]
FROM [AccessLog] AS [t]
         LEFT JOIN [AppUser] AS [u] ON [t].[UserId] = [u].[Id]
WHERE EXISTS (SELECT 1
              FROM [LogCatalog] AS [c]
              WHERE [c].[Type] = 1
                AND [c].[Name] = [t].[Type])
ORDER BY [t].[Time] DESC
For 1M records, it takes ~90s to execute on my computer, and most of the cost is in the Sort operator. I already have an index on AccessLog.Time DESC, but the plan still sorts again after the join. https://www.brentozar.com/pastetheplan/?id=HyXzc9UUp

I have these indexes on AccessLog:

1. PK [Id]
2. IX [Time] DESC
3. IX [Time] DESC, [Type] ASC
4. IX [Type] ASC, [Time] DESC
5. IX [Type] ASC
6. IX [UserId] ASC
7. IX [Time] DESC, [UserId] ASC, [Type] ASC

The query filters by [Type] and orders by [Time]; why can't the plan use the [Time],[Type] index instead of sorting again?
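A hedged editor's note: none of the listed indexes covers [Message], so the optimizer tends to read the clustered index (or a narrower index plus lookups) and re-sort rather than walk [Time] in order. A minimal sketch of a covering index that could let the plan scan [Time] DESC directly and skip the Sort; the index name and INCLUDE list are assumptions based on the SELECT list:

CREATE INDEX IX_AccessLog_Time_Covering
    ON [AccessLog] ([Time] DESC)
    INCLUDE ([Type], [UserId], [Message]);

Whether the order is actually preserved still depends on how the EXISTS semi-join against LogCatalog is implemented: a hash join on [Type] destroys the order, while a nested-loops probe per row does not.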
Uni (11 rep)
Dec 12, 2023, 07:16 AM • Last activity: Jul 10, 2025, 11:06 PM
0 votes
1 answer
172 views
recreate row IDs in sorted table
I built a word2vec dictionary as a MySQL table. There are millions of rows like this:

ID      CREATED              UPDATED              KEYWORD           COUNT  NUMPYBLOB  NUMPYBLOB
121201  2018-03-08 04:01:23  2018-03-08 21:54:34  iulius            32     ...        ...
121202  2018-03-08 04:01:23  0000-00-00 00:00:00  admonitivo        1      ...        ...
121203  2018-03-08 04:01:23  0000-00-00 00:00:00  lulius            1      ...        ...
121204  2018-03-08 04:01:23  2018-03-08 16:56:14  lugh              20     ...        ...
121205  2018-03-08 04:01:23  2018-03-08 22:34:31  contingencias     66     ...        ...
121206  2018-03-08 04:01:23  2018-03-08 19:50:34  liberándola       12     ...        ...
121207  2018-03-08 04:01:23  0000-00-00 00:00:00  agostillo         1      ...        ...
121208  2018-03-08 04:01:23  2018-03-08 22:47:39  autonomista       212    ...        ...
121209  2018-03-08 04:01:23  2018-03-08 05:54:12  astrológicamente  4      ...        ...
121210  2018-03-08 04:01:23  2018-03-08 18:54:43  redondearse       5      ...        ...

Now I want to sort this table by the word occurrence counter (COUNT) and, once it is sorted, rewrite the auto-increment ID column from zero to X so that the IDs follow descending word occurrence. I can't use the COUNT column as the primary key, because it varies continuously and is not unique. How can I do it?
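A hedged sketch of one approach (word2vec, VEC1/VEC2, and the column list are assumed names): rebuild the table in COUNT order and let AUTO_INCREMENT hand out the new IDs. Note MySQL starts the counter at 1, not 0.

CREATE TABLE word2vec_sorted LIKE word2vec;

INSERT INTO word2vec_sorted (CREATED, UPDATED, KEYWORD, `COUNT`, VEC1, VEC2)
SELECT CREATED, UPDATED, KEYWORD, `COUNT`, VEC1, VEC2   -- ID omitted on purpose
FROM word2vec
ORDER BY `COUNT` DESC;          -- new ids are assigned 1..N in this order

RENAME TABLE word2vec TO word2vec_old, word2vec_sorted TO word2vec;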
Dmitry (1 rep)
Mar 9, 2018, 02:43 AM • Last activity: Jul 4, 2025, 03:03 AM
0 votes
1 answer
268 views
Keep one row per ID while keeping previous order (from ORDER BY)
I'm currently working on a big query and stuck at the very end. I'm working on MySQL 5.7.

ID // SubID // Criteria 1 // Criteria 2 // ... // Criteria n
1  // 2     // x          // y          // ... // z
1  // 3     // x          // y          // ... // z
1  // 1     // x          // y          // ... // z
2  // 1     // x          // y          // ... // z
3  // 1     // x          // y          // ... // z

I need, for each ID, the best (ID, SubID) pair, based on the existing order, which comes from sorting on criteria 1 to n. I had a look at several solutions, but:

- GROUP BY picks any (ID, SubID) pair arbitrarily and doesn't take the existing order into account
- GROUP BY + ORDER BY is not sufficient, as I need to sort on complex criteria
- I can't recover the SubID info if I use DISTINCT

I have the feeling that the solution is fairly simple, but I'm really stuck and I can't find similar problems/solutions on the Internet (maybe I just need some sleep ;) ). Just to make it clear: I need to keep one row per ID, in the same order as before, and to return at least ID // SubID (though it would be perfect if it could return all the criteria as well). Thank you in advance for your help, guys!

**EDIT** OK, to add a little bit of context: my criteria are boolean or float values that I computed in a subquery. It looks like:

SELECT *
FROM (SELECT *,
             CASE WHEN (formula) THEN 1 ELSE 0 END AS criteria_1,
             CASE WHEN (formula) THEN 1 ELSE 0 END AS criteria_2,
             ABS(formula) AS criteria_3,
             SQRT(formula) AS criteria_4
      FROM subquery
      WHERE criteria_1 = 0
      ORDER BY criteria_2, criteria_3 DESC, criteria_4, value_z)
"Remove (ID, SubID) duplicates while keeping the previous order"

I want to replace the last sentence (the one between quotes) with something, but GROUP BY ID, SubID just picks any (ID, SubID) value arbitrarily.
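A hedged sketch of the standard MySQL 5.7 idiom (no window functions yet): number the rows of the ordered subquery with user variables and keep rank 1 per ID. It relies on left-to-right variable evaluation, which the manual flags as unsafe; on MySQL 8+ use ROW_NUMBER() OVER (PARTITION BY id ORDER BY ...) instead. Table and column names are placeholders:

SELECT id, subid, criteria_1, criteria_2
FROM (
    SELECT o.*,
           @rn := IF(@prev_id = o.id, @rn + 1, 1) AS rn,   -- row number within each id
           @prev_id := o.id
    FROM (SELECT @rn := 0, @prev_id := NULL) init,
         (   SELECT *                        -- the ordered subquery from the question
             FROM some_table
             ORDER BY criteria_2, criteria_3 DESC, criteria_4
         ) o
) ranked
WHERE rn = 1;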
SaintMark (9 rep)
Jun 20, 2016, 01:26 AM • Last activity: May 19, 2025, 11:04 PM
0 votes
1 answer
269 views
MySQL: select sorted by group key, with the first row in each group being the one that has that key in another column
I'm trying to write a select that outputs everything but is sorted by a group column, with a "header row" whose unit matches that group key on another column. I have no idea how this type of query should be written. For example, I want the data sorted like this:

id, unit, group, col2, col3, col4, col5
---------------------------------------------
99, ZZX, null, test1, test7, test5, test
6, AAA-B, ZZZ, test1, test7, test5, test8
77, AAA-C, ZZX, test1, test7, test5, test9
101, ZZZ, null, test1, test7, test5, test
122, AAB-A, ZZZ, test1, test7, test5, test11
176, AAB-B, ZZZ, test1, test7, test5, test12

So basically, the units should be sorted per group key, with each group beginning with the row whose unit matches the group key. The rows with null for group are just header rows. You can even just tell me the high-level approach and I can write the query; I'm just not sure of the best way to approach it.
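A hedged sketch, assuming header rows have `group` NULL and their `unit` equals the group key of their members:

SELECT *
FROM   t
ORDER  BY COALESCE(`group`, unit),   -- cluster each group together with its header
          (`group` IS NOT NULL),     -- header row (group NULL sorts as 0) first
          unit;                      -- then members alphabetically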
J. M. Becker (129 rep)
Feb 25, 2020, 06:00 PM • Last activity: May 19, 2025, 09:07 PM
11 votes
2 answers
11153 views
ORDER BY and comparison of mixed strings of letters and numbers
We need to do some reporting on values that are usually mixed strings of numbers and letters that need to be sorted 'naturally'. Things like, e.g. "P7B18" or "P12B3". The strings will mostly be alternating sequences of letters then numbers. The number of these segments and the length of each could vary, though. We'd like the numeric portions of these to be sorted in numeric order. Obviously, if I just handle those string values directly with ORDER BY, then "P12B3" is going to come before "P7B18", since "P1" is earlier than "P7", but I'd like the reverse, as "P7" naturally precedes "P12". I'd also like to be able to do range comparisons, e.g. @bin < 'P13S6' or some such.

I don't have to handle floating point or negative numbers; these will strictly be non-negative integers that we're dealing with. String lengths and the number of segments could potentially be arbitrary, with no fixed upper bounds. In our case, string casing isn't important, though if there's a way to do this in a collation-aware fashion, others might find that useful. The ugliest part of all this is that I'd like to be able to do both ordering and range filtering in the WHERE clause.

If I were doing this in C#, it would be a pretty simple task: do some parsing to separate the alpha from the numeric, implement IComparable, and you're basically done. SQL Server, of course, doesn't appear to offer any similar functionality, at least as far as I'm aware. Anybody know any good tricks to make this work? Is there some little-publicized ability to create custom CLR types that implement IComparable and have this behave as expected? I'm also not opposed to Stupid XML Tricks (see also: list concatenation), and I've got CLR regex matching/extracting/replacement wrapper functions available on the server as well.

**EDIT:** As a slightly more detailed example, I'd want the data to behave something like this:

SELECT bin FROM bins ORDER BY bin

bin
--------------------
M7R16L
P8RF6JJ
P16B5
PR7S19
PR7S19L
S2F3
S12F0

i.e. break the strings into tokens of all letters or all numbers, and sort them either alphabetically or numerically respectively, with the leftmost tokens being the most significant sorting term. Like I mentioned, it's a piece of cake in .NET if you implement IComparable, but I don't know how (or if) you can do that sort of thing in SQL Server. It's certainly not something I've ever come across in 10 or so years of working with it.
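A hedged sketch of one common T-SQL trick: normalize each value into a sort key where every digit run is zero-padded to a fixed width, so plain string comparison matches the natural order. The function name and the 20-digit cap are assumptions:

CREATE FUNCTION dbo.NaturalSortKey (@s varchar(100))
RETURNS varchar(400)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @out varchar(400) = '', @i int = 1, @num varchar(20);
    WHILE @i <= LEN(@s)
    BEGIN
        IF SUBSTRING(@s, @i, 1) LIKE '[0-9]'
        BEGIN
            -- collect the whole digit run, then left-pad it to 20 characters
            SET @num = '';
            WHILE @i <= LEN(@s) AND SUBSTRING(@s, @i, 1) LIKE '[0-9]'
            BEGIN
                SET @num = @num + SUBSTRING(@s, @i, 1);
                SET @i += 1;
            END;
            SET @out = @out + RIGHT(REPLICATE('0', 20) + @num, 20);
        END
        ELSE
        BEGIN
            SET @out = @out + UPPER(SUBSTRING(@s, @i, 1));  -- fold case
            SET @i += 1;
        END;
    END;
    RETURN @out;
END;

ORDER BY dbo.NaturalSortKey(bin) then sorts naturally, and because the function is schemabound and deterministic, a persisted computed column (ALTER TABLE bins ADD bin_sort AS dbo.NaturalSortKey(bin) PERSISTED) can be indexed and compared against dbo.NaturalSortKey('P13S6') for range filters.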
db2 (9708 rep)
Jan 20, 2016, 02:28 PM • Last activity: May 17, 2025, 04:01 AM
0 votes
1 answer
335 views
MongoDB huge amount of temp files
I have a MongoDB replica set. The primary creates a huge number of temp files named **extsort-sort-executor** every second, generating more than 80 GB of temp files in under 10 minutes... If I set **allowDiskUseByDefault** to **false**, I get the following error: *Plan executor error during find command*
"stats": {
    "works": 88450,
    "restoreState": 88,
    "saveState": 88,
    "advanced": 0,
    "spilledDataStorageSize": 0,
    "nReturned": 0,
    "inputStage": {
      "works": 88450,
      "restoreState": 88,
      "stage": "COLLSCAN",
      "saveState": 88,
      "advanced": 88450,
      "nReturned": 88450,
      "needYield": 0,
      "docsExamined": 88450,
      "needTime": 0,
      "isEOF": 0,
      "direction": "forward"
    },
    "failed": true,
    "usedDisk": false,
    "type": "simple",
    "isEOF": 0,
    "memLimit": 104857600,
    "stage": "SORT",
    "sortPattern": {
      "-$natural": 1
    },
    "needYield": 0,
    "needTime": 88449,
    "totalDataSizeSorted": 0,
    "spills": 0
  },
  "cmd": {
    "filter": {},
    "lsid": {
      "id": {
        "$uuid": "2617b402-0c61-4035-bc39-d616e412fbb3"
      }
    },
    "$readPreference": {
      "mode": "primary"
    },
    "$db": "local",
    "$clusterTime": {
      "clusterTime": {
        "$timestamp": {
          "t": 1704123123,
          "i": 1
        }
      },
      "signature": {
        "keyId": 0,
        "hash": {
          "$binary": {
            "base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=",
            "subType": "0"
          }
        }
      }
    },
    "find": "oplog.rs",
    "sort": {
      "-$natural": 1
    }
  },
  "error": {
    "code": 292,
    "codeName": "QueryExceededMemoryLimitNoDiskUseAllowed",
    "errmsg": "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting."
  }
}
Can someone point me in the right direction to find the source of this issue?
rs.status()

    {
      set: 'hz0',
      date: 2024-01-01T20:30:05.973Z,
      myState: 1,
      term: Long('12'),
      syncSourceHost: '',
      syncSourceId: -1,
      heartbeatIntervalMillis: Long('2000'),
      majorityVoteCount: 3,
      writeMajorityCount: 3,
      votingMembersCount: 4,
      writableVotingMembersCount: 4,
      optimes: {
        lastCommittedOpTime: { ts: Timestamp({ t: 1704141003, i: 1 }), t: Long('12') },
        lastCommittedWallTime: 2024-01-01T20:30:03.989Z,
        readConcernMajorityOpTime: { ts: Timestamp({ t: 1704141003, i: 1 }), t: Long('12') },
        appliedOpTime: { ts: Timestamp({ t: 1704141003, i: 1 }), t: Long('12') },
        durableOpTime: { ts: Timestamp({ t: 1704141003, i: 1 }), t: Long('12') },
        lastAppliedWallTime: 2024-01-01T20:30:03.989Z,
        lastDurableWallTime: 2024-01-01T20:30:03.989Z
      },
      lastStableRecoveryTimestamp: Timestamp({ t: 1704140953, i: 1 }),
      electionCandidateMetrics: {
        lastElectionReason: 'priorityTakeover',
        lastElectionDate: 2024-01-01T12:39:12.915Z,
        electionTerm: Long('12'),
        lastCommittedOpTimeAtElection: { ts: Timestamp({ t: 1704112747, i: 1 }), t: Long('11') },
        lastSeenOpTimeAtElection: { ts: Timestamp({ t: 1704112747, i: 1 }), t: Long('11') },
        numVotesNeeded: 3,
        priorityAtElection: 15,
        electionTimeoutMillis: Long('10000'),
        priorPrimaryMemberId: 1,
        numCatchUpOps: Long('0'),
        newTermStartDate: 2024-01-01T12:39:12.926Z,
        wMajorityWriteAvailabilityDate: 2024-01-01T12:39:12.944Z
      },
      members: [
        {
          _id: 0,
          name: 'thor-live.d1:27017',
          health: 1,
          state: 1,
          stateStr: 'PRIMARY',
          uptime: 28268,
          optime: [Object],
          optimeDate: 2024-01-01T20:30:03.000Z,
          lastAppliedWallTime: 2024-01-01T20:30:03.989Z,
          lastDurableWallTime: 2024-01-01T20:30:03.989Z,
          syncSourceHost: '',
          syncSourceId: -1,
          infoMessage: '',
          electionTime: Timestamp({ t: 1704112752, i: 1 }),
          electionDate: 2024-01-01T12:39:12.000Z,
          configVersion: 15,
          configTerm: 12,
          self: true,
          lastHeartbeatMessage: ''
        },
        {
          _id: 1,
          name: 'thor-live.d2:27017',
          health: 1,
          state: 2,
          stateStr: 'SECONDARY',
          uptime: 28263,
          optime: [Object],
          optimeDurable: [Object],
          optimeDate: 2024-01-01T20:30:03.000Z,
          optimeDurableDate: 2024-01-01T20:30:03.000Z,
          lastAppliedWallTime: 2024-01-01T20:30:03.989Z,
          lastDurableWallTime: 2024-01-01T20:30:03.989Z,
          lastHeartbeat: 2024-01-01T20:30:04.769Z,
          lastHeartbeatRecv: 2024-01-01T20:30:04.202Z,
          pingMs: Long('0'),
          lastHeartbeatMessage: '',
          syncSourceHost: 'thor-live.d1:27017',
          syncSourceId: 0,
          infoMessage: '',
          configVersion: 15,
          configTerm: 12
        },
        {
          _id: 2,
          name: 'thor-live.d3:27017',
          health: 1,
          state: 2,
          stateStr: 'SECONDARY',
          uptime: 28263,
          optime: [Object],
          optimeDurable: [Object],
          optimeDate: 2024-01-01T20:30:03.000Z,
          optimeDurableDate: 2024-01-01T20:30:03.000Z,
          lastAppliedWallTime: 2024-01-01T20:30:03.989Z,
          lastDurableWallTime: 2024-01-01T20:30:03.989Z,
          lastHeartbeat: 2024-01-01T20:30:04.769Z,
          lastHeartbeatRecv: 2024-01-01T20:30:04.762Z,
          pingMs: Long('0'),
          lastHeartbeatMessage: '',
          syncSourceHost: 'thor-live.d2:27017',
          syncSourceId: 1,
          infoMessage: '',
          configVersion: 15,
          configTerm: 12
        },
        {
          _id: 3,
          name: 'thor-live.d4:27017',
          health: 1,
          state: 2,
          stateStr: 'SECONDARY',
          uptime: 28263,
          optime: [Object],
          optimeDurable: [Object],
          optimeDate: 2024-01-01T20:30:03.000Z,
          optimeDurableDate: 2024-01-01T20:30:03.000Z,
          lastAppliedWallTime: 2024-01-01T20:30:03.989Z,
          lastDurableWallTime: 2024-01-01T20:30:03.989Z,
          lastHeartbeat: 2024-01-01T20:30:04.769Z,
          lastHeartbeatRecv: 2024-01-01T20:30:04.760Z,
          pingMs: Long('0'),
          lastHeartbeatMessage: '',
          syncSourceHost: 'thor-live.d2:27017',
          syncSourceId: 1,
          infoMessage: '',
          configVersion: 15,
          configTerm: 12
        }
      ],
      ok: 1,
      '$clusterTime': {
        clusterTime: Timestamp({ t: 1704141003, i: 1 }),
        signature: {
          hash: Binary.createFromBase64('AAAAAAAAAAAAAAAAAAAAAAAAAAA=', 0),
          keyId: Long('0')
        }
      },
      operationTime: Timestamp({ t: 1704141003, i: 1 })
    }
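A hedged note for readers: the sort pattern {"-$natural": 1} in the stats above is telling. It is not a valid natural-order sort (that would be .sort({$natural: -1})), so the server cannot serve it from the oplog's insertion order and instead runs a COLLSCAN plus a SORT stage over local.oplog.rs, which is what fills the disk with extsort temp files. A sketch for catching the client issuing it (mongosh, MongoDB 3.6+; run on the primary while the temp files are being produced):

db.getSiblingDB("admin").aggregate([
  { $currentOp: { allUsers: true, idleConnections: false } },
  { $match: { "command.find": "oplog.rs" } },                    // finds against the oplog
  { $project: { client: 1, appName: 1, desc: 1, "command.sort": 1 } }
])

The client and appName fields should point at the offending tool or driver.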
SnackK0515 (1 rep)
Jan 1, 2024, 03:53 PM • Last activity: Apr 29, 2025, 12:02 PM
1 vote
1 answer
497 views
How can I first sort by color & then in alphabetical order in SSRS?
I'm still new to SSRS. In my query the color is determined by location, and in my query I did ORDER BY [Location], Underwriter. This works perfectly fine in the query, but when I run it in SSRS it doesn't sort like in the query (screenshot of the report omitted). What I would like to do is first group the colors together and set the underwriters in alphabetical order, i.e.:

Balluka, Bill (light blue)
Matanane, Brian (green)
Walsh, Daniel (green)
Davidson, Christine (pink)
Hamilton, Marianne (pink)
Karger, Tierney (pink)
Marroquin, Melanie (pink)
Sylvester, Todd (pink)
Hopp, Robert (white)
Lohr, Jeff (white)
Raskovic, Hanna (white)

What I tried in SSRS: in the design view I right-clicked Underwriter > Interactive Sorting and tried sorting by underwriter and location, but nothing happens when I run it. Please help.
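A hedged pointer rather than a tested fix: interactive sorting only adds clickable sort buttons at run time; a fixed default order usually comes from the tablix sort expressions (Tablix Properties > Sorting), set in the same sequence as the query's ORDER BY. Assuming the dataset fields are named Location and Underwriter, the two expressions would be:

=Fields!Location.Value
=Fields!Underwriter.Value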
user203633 (11 rep)
Mar 19, 2020, 05:07 PM • Last activity: Apr 19, 2025, 12:03 PM
2 votes
2 answers
51 views
Collate a query result column with a simple list?
I have many queries that return some statistical data on a subject from various tables. The result includes a "classification" that is one of {"C","P","M"}. I want the query to return the rows in the order {"M","P","C"}, i.e. it is not alphabetical, nor any built-in collation. The data is irrelevant, but say I want to query something like a jockey's strike rate by those classifications and order the result by those codes in that order. I hope that makes sense. So, is there a way to create a PostgreSQL collation that can help me, or is there another way? Note there are many queries, so a complex CASE clause is not my preferred solution, as I'd have to put it in every query (and in every program that may contain its own query as a literal).
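A hedged sketch that centralizes the ordering so each query only calls one function (the names are assumptions). An alternative is an ENUM type, since CREATE TYPE classification AS ENUM ('M','P','C') sorts in declaration order:

CREATE FUNCTION class_rank(cls text) RETURNS int
IMMUTABLE LANGUAGE sql
AS $$ SELECT CASE cls WHEN 'M' THEN 1 WHEN 'P' THEN 2 WHEN 'C' THEN 3 END $$;

-- usage in any query:
-- ORDER BY class_rank(classification);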
Adam Ant (23 rep)
Mar 29, 2025, 07:54 AM • Last activity: Mar 31, 2025, 01:10 PM
2 votes
1 answer
488 views
How does the use of derived tables speed up sorting?
In the article 45 Ways To Speed Up Firebird Database, it states:

> 30. Use derived tables to optimize SELECT with ORDER BY/GROUP BY
>
> Another way to optimize a SQL query with sorting is to use derived tables to avoid unnecessary sort operations. Instead of
>
>     SELECT FIELD_KEY, FIELD1, FIELD2, ... FIELD_N
>     FROM T
>     ORDER BY FIELD2
>
> use the following modification:
>
>     SELECT T.FIELD_KEY, T.FIELD1, T.FIELD2, ... T.FIELD_N FROM
>     (SELECT FIELD_KEY FROM T ORDER BY FIELD2) T2
>     JOIN T ON T.FIELD_KEY = T2.FIELD_KEY

I assume T.FIELD_KEY means a primary key column. In other words, it shunts the sorting into a subquery, and then 'self-joins' to get the rest of the necessary columns. As I understand it, this lets Firebird do the sorting on a narrower result set, which would use less memory and be more cache-friendly, but surely it shouldn't give that much of a boost? Especially when it needs to fetch the other columns afterwards, potentially in a random I/O pattern too. On the other hand, this reeks of undefined behaviour, in that there's no guarantee (in standard SQL) that the engine would retain the order after the join; it could just join the other way, couldn't it? Are my guesses correct? Am I missing something?
oals (121 rep)
May 24, 2016, 01:15 PM • Last activity: Jan 7, 2025, 12:35 PM
4 votes
2 answers
990 views
Why is sorting a table (loaded with random data) faster than actually sorting random data?
I want to run a benchmark of sorting random records using the external merge sort algorithm on PostgreSQL. So I tried the following 2 ways (one right after the other, keeping all parameters/configuration the same). Attempt 1:
CREATE TABLE test(id BIGINT, name varchar(200));
INSERT INTO test (id,name) SELECT (random() * 1000000), concat(CONCAT(md5(random()::text), md5(random()::text))) FROM generate_series(1, 1000000) as t;

explain analyze select * from test order by id, name;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=41486.43..63526.06 rows=188898 width=426) (actual time=76.477..207.253 rows=1000000 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=40486.40..40722.52 rows=94449 width=426) (actual time=73.418..101.593 rows=333333 loops=3)
         Sort Key: id, name
         Sort Method: external merge  Disk: 29744kB
         Worker 0:  Sort Method: external merge  Disk: 26008kB
         Worker 1:  Sort Method: external merge  Disk: 25512kB
         ->  Parallel Seq Scan on test  (cost=0.00..14278.49 rows=94449 width=426) (actual time=0.011..20.945 rows=333333 loops=3)
 Planning Time: 2.820 ms
 Execution Time: 227.090 ms
(11 rows)
Attempt 2:
explain analyze SELECT (random() * 1000000) as id, concat(CONCAT(md5(random()::text), md5(random()::text))) as name
FROM generate_series(1, 1000000) as t order by id, name;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=194348.85..196848.85 rows=1000000 width=40) (actual time=1707.086..1768.986 rows=1000000 loops=1)
   Sort Key: ((random() * '1000000'::double precision)), (concat(concat(md5((random())::text), md5((random())::text))))
   Sort Method: external merge  Disk: 81256kB
   ->  Function Scan on generate_series t  (cost=0.00..40000.00 rows=1000000 width=40) (actual time=55.734..1388.681 rows=1000000 loops=1)
 Planning Time: 0.191 ms
 JIT:
   Functions: 3
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.338 ms (Deform 0.000 ms), Inlining 0.000 ms, Optimization 0.497 ms, Emission 11.837 ms, Total 12.672 ms
 Execution Time: 1841.843 ms
(10 rows)
Can someone explain why sorting randomly generated data is slower than sorting similar random data read from disk? I re-ran both queries with max_parallel_workers_per_gather = 0; the first query then took 360ms, while, as expected, the second query's time didn't budge.
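A hedged note for readers: random() is volatile, so attempt 2's Function Scan cannot be parallelized, and the expressions themselves (3 million random() calls plus 2 million md5() calls) are paid inside the same plan as the sort. A sketch that separates generation cost from sort cost (AS MATERIALIZED needs PostgreSQL 12+; on older versions a plain CTE is already materialized):

EXPLAIN ANALYZE
WITH data AS MATERIALIZED (
    SELECT (random() * 1000000) AS id,
           md5(random()::text) || md5(random()::text) AS name
    FROM generate_series(1, 1000000)
)
SELECT * FROM data ORDER BY id, name;

Comparing this plan's sort time with attempt 2's makes it clearer how much of the 1.8s is row generation rather than sorting.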
Rohan Dvivedi (43 rep)
Dec 27, 2024, 08:40 AM • Last activity: Dec 27, 2024, 03:55 PM
1 vote
2 answers
126 views
Execution Plan Estimates vs Actuals with Inequality Filters
I have the following SQL query:
declare @p1 INT = 20240703;
declare @p2 INT = 20240703;
declare @p3 NVARCHAR(50) = N'USA';

SELECT R.taxareaid, R.filtertypes
FROM region R
JOIN country C ON R.countryid = C.countryid 
where C.name = @p3
and  R.effdate = @p2
ORDER BY R.taxareaid 
OPTION (RECOMPILE);
https://www.brentozar.com/pastetheplan/?id=B1FTNkXHkx
CREATE TABLE [dbo].[Region](
	[regionId] [numeric](18, 0) NOT NULL,
	[taxAreaId] [numeric](18, 0) NOT NULL,
	[effDate] [numeric](8, 0) NOT NULL,
	[expDate] [numeric](8, 0) NOT NULL,
	[countryId] [numeric](18, 0) NOT NULL,
	[mainDivisionId] [numeric](18, 0) NOT NULL,
	[subDivisionId] [numeric](18, 0) NOT NULL,
	[cityId] [numeric](18, 0) NOT NULL,
	[postalCodeId] [numeric](18, 0) NOT NULL,
	[cityCompressedId] [numeric](18, 0) NOT NULL,
	[subDivCompressedId] [numeric](18, 0) NOT NULL,
	[filterTypes] [numeric](32, 0) NOT NULL,
	[updateId] [numeric](18, 0) NOT NULL,
 CONSTRAINT pk_region PRIMARY KEY CLUSTERED 
(
	[regionId] ASC
)
);
### Problem:
- **Execution Plan Estimates**: When I look at the execution plan, I notice that the estimates are much smaller than the actual rows processed. Although I'm using OPTION (RECOMPILE) to prevent parameter sniffing, I'm still not getting accurate estimates. I've also updated statistics on the region table using a full scan, but the estimates are still incorrect.
- **TempDB Spill**: The query is leading to a spill to TempDB during sorting.

### What I've Tried:
1. **Updated Statistics**: I performed a full scan to update statistics on the region table.
2. **Indexes**: I created an index on region(taxareaid) and a composite index on (countryid, effdate, expdate, taxareaid), but I am still seeing sorting in the execution plan.

### My Questions:
1. **How can I get more accurate execution plan estimates** to avoid the TempDB spill during sorting?
2. **How can I avoid the sorting operation entirely?** Are there other strategies I can try, given that indexing doesn't seem to resolve the issue?
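A hedged sketch, not a tested fix: both predicates on Region are equalities (countryid from the join, effdate from @p2), so an index keyed on (countryId, effDate, taxAreaId) can return rows already ordered by taxAreaId whenever the country name maps to a single countryId, which would let the optimizer drop the Sort and its tempdb spill. The index name is an assumption:

CREATE INDEX IX_Region_CountryId_EffDate_TaxAreaId
    ON dbo.Region (countryId, effDate, taxAreaId)
    INCLUDE (filterTypes);

Declaring @p2 as numeric(8,0), matching effDate's declared type, is also worth trying to rule out implicit-conversion effects on the estimates.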
sebeid (1415 rep)
Dec 19, 2024, 11:32 PM • Last activity: Dec 21, 2024, 06:05 AM
2 votes
4 answers
6996 views
Is it possible to sort by the sequence of input IDs?
I want to find a user's favorite music list. First, I query all ids from the user's favorite table. Next, I query the music list by id from the songs table. But now I want to sort the rows by the timestamp column favorite.added_at descending. I do not want to join the tables, so I want to figure out whether it's possible to sort by the order of the input ids in the IN query. For example:

-- the 3087 is the first record
select * from songs s where id in (3087,5122)

-- the 5122 is the first record
select * from songs s where id in (5122,3087)

I tried this in PostgreSQL 13, and the row with id = 3087 was always first. It seems PostgreSQL does not return rows in the order of the values in the IN list?
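A hedged sketch that still avoids joining the favorite table: pass the ids as an array and join on unnest(...) WITH ORDINALITY (PostgreSQL 9.4+), then sort by the list position:

SELECT s.*
FROM unnest(ARRAY[5122, 3087]) WITH ORDINALITY AS u(id, ord)
JOIN songs s ON s.id = u.id
ORDER BY u.ord;       -- rows come back in the array's order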
Dolphin (939 rep)
Sep 26, 2021, 04:53 AM • Last activity: Dec 9, 2024, 05:32 PM
1 vote
1 answer
47 views
MySQL order by with conditional clauses
I have these 2 tables:

CREATE TABLE users (
  id bigint NOT NULL AUTO_INCREMENT,
  status varchar(255) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=330031656 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE user_meta (
  id int NOT NULL AUTO_INCREMENT,
  user_id bigint NOT NULL,
  meta_id bigint NOT NULL,
  value bigint NOT NULL,
  PRIMARY KEY (id),
  KEY usermeta_user_id_meta_type_meta_value (user_id, meta_id, value),
  CONSTRAINT user_meta_ibfk_1 FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB AUTO_INCREMENT=16728 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

The customer requirement is that they want to sort based on value for a specific meta_id. For example, let's say there are 10 meta_ids and the customer selects meta_id=1111. In this case, all the users who have meta_id=1111 associated should be sorted by their value first, and the other users who don't have meta_id=1111 can come in any order at the bottom. There have been similar questions asked and I've tried to form queries based on their answers, but they don't seem to work for me. This [answer](https://stackoverflow.com/a/3550974) and [this one too](https://dba.stackexchange.com/questions/201719/how-to-use-a-conditional-order-by-in-mysql) talk about using if-else and case-when-then statements, but when I try any of these
select u.id, um.meta_id, um.value from users u
inner join user_meta um on um.user_id = u.id
order by if(um.meta_id=1111, value, 1);

select u.id, um.meta_id, um.value from users u
inner join user_meta um on um.user_id = u.id
order by case um.meta_id
when 1111 then value else 1 end;

select u.id, um.meta_id, um.value from users u
inner join user_meta um on um.user_id = u.id
order by case 
when um.meta_id = 1111 then value else u.id end;
I get this
+-----------+---------+------------+
| id        | meta_id | value      |
+-----------+---------+------------+
| 326480529 |     200 | 1730358000 |
| 326850494 |    1111 | 1730185200 |
| 326785127 |    1111 | 1730271600 |
| 326833934 |    1111 | 1730358000 |
| 326467136 |    1111 | 1730358000 |
| 328079379 |    1111 | 1730793600 |
+-----------+---------+------------+
I want all the users with meta_id=1111 to come at the top, but neither do they come at the top, nor are they sorted within themselves. Similarly, for desc order, the ones with meta_id=1111 should come at the top, sorted in descending order, and all the others can come at the bottom, such as this
+-----------+---------+------------+
| id        | meta_id | value      |
+-----------+---------+------------+
| 328079379 |    1111 | 1730793600 |
| 326833934 |    1111 | 1730358000 |
| 326467136 |    1111 | 1730358000 |
| 326785127 |    1111 | 1730271600 |
| 326850494 |    1111 | 1730185200 |
| 326480529 |     200 | 1730358000 |
+-----------+---------+------------+
I'd really appreciate any help or direction in getting this right. Thank you so much! I'm also posting some INSERT statements for both the tables to make it easier to replicate on your local machines
INSERT INTO users (id,status) VALUES (328079379,'active');
INSERT INTO users (id,status) VALUES (326833934,'active');
INSERT INTO users (id,status) VALUES (326467136,'deleted');
INSERT INTO users (id,status) VALUES (326785127,'inactive');
INSERT INTO users (id,status) VALUES (326850494,'removed');
INSERT INTO users (id,status) VALUES (326480529,'active');


INSERT INTO user_meta (id,user_id,meta_id,value) VALUES (13155,328079379,1111,1730793600);
INSERT INTO user_meta (id,user_id,meta_id,value) VALUES (13045,326833934,1111,1730358000);
INSERT INTO user_meta (id,user_id,meta_id,value) VALUES (13009,326467136,1111,1730358000);
INSERT INTO user_meta (id,user_id,meta_id,value) VALUES (13010,326785127,1111,1730271600);
INSERT INTO user_meta (id,user_id,meta_id,value) VALUES (13051,326850494,1111,1730185200);
INSERT INTO user_meta (id,user_id,meta_id,value) VALUES (13008,326480529,200,1730358000);
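A hedged sketch of the usual fix: the CASE/IF expressions above mix value (a large epoch number) with the constant 1 in a single sort key, so matching and non-matching rows interleave. Sorting on a boolean first separates the groups; a second key then orders only the matching rows:

SELECT u.id, um.meta_id, um.value
FROM users u
INNER JOIN user_meta um ON um.user_id = u.id
ORDER BY (um.meta_id = 1111) DESC,               -- matching rows float to the top
         IF(um.meta_id = 1111, um.value, NULL);  -- ASC within them; use DESC for the reverse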
Sidharth Samant (203 rep)
Nov 15, 2024, 01:47 PM • Last activity: Nov 15, 2024, 02:05 PM
1 vote
1 answer
52 views
Why does ORDER BY significantly slow down my query with a computed score of trigram similarity and null field?
I'm working on optimizing a query in PostgreSQL, and I've encountered a performance issue when using the ORDER BY clause. The query is intended to search profiles based on a similarity match to a name (for example: 'john') and then order the results by a computed score. The score is a combination of word similarity and whether the profile has an avatar. Here's the query:

SELECT uuid, type, byline, display_name, username, avatar,
       ( word_similarity('john', search_text)
         + CASE WHEN avatar != '' THEN 1 ELSE 0 END ) AS combined_score
FROM test_mv_all_profiles
WHERE 'john' <% search_text
ORDER BY combined_score DESC
LIMIT 100;

With the ORDER BY in place, the plan sorts every match:

Sort  (cost=35130.07..35158.41 rows=11335 width=52) (actual time=8092.502..8092.565 rows=100 loops=1)
  Sort Key: ((word_similarity('john'::text, search_text) + (CASE WHEN ((avatar)::text <> ''::text) THEN 1 ELSE 0 END)::double precision)) DESC
  Sort Method: top-N heapsort  Memory: 51kB
  Buffers: shared hit=66811
  ->  Bitmap Heap Scan on test_mv_all_profiles  (cost=187.84..34696.86 rows=11335 width=52) (actual time=69.060..8052.737 rows=90765 loops=1)
        Recheck Cond: ('john'::text <% search_text)
        ->  Bitmap Index Scan on test_idx_mv_social_profile_search_text_trigram_idx_gin  (cost=0.00..185.01 rows=11335 width=0) (actual time=58.323..58.323 rows=91483 loops=1)
              Index Cond: ('john'::text <% search_text)

Without the ORDER BY, the same search is fast:

Index Scan using test_idx_mv_social_profile_search_text_trigram_idx on test_mv_all_profiles  (cost=0.42..44444.13 rows=11335 width=52) (actual time=0.506..4.417 rows=100 loops=1)
  Index Cond: ('john'::text <% search_text)
  Rows Removed by Index Recheck: 1
  Buffers: shared hit=311
Planning time: 0.118 ms
Execution time: 4.482 ms

My questions:
- Why does the ORDER BY clause slow down the query so much?
- Is there a way to optimize this query while keeping the ORDER BY clause? Would adding an index on the computed score help, and if so, how should I approach that?

Additional information: The table test_mv_all_profiles is a materialized view with around 11M rows. We are using a rather old version of Postgres (9.6), so some newer features are not available to us in the meantime. The search_text field is a concatenation of multiple columns (like username, first name, and last_name). I already have a trigram index on search_text for the similarity search.

I'm looking for advice on how to maintain performance while still being able to sort by the combined score. Any insights or recommendations would be greatly appreciated!
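A hedged idea for readers on 9.6: the combined score depends on the search term, so it cannot be indexed directly, but pg_trgm 1.3 (shipped with 9.6) has KNN distance operators that a GiST index (not the existing GIN one) can use to fetch the best candidates first. A sketch with an assumed index name and an arbitrary candidate-pool size; note it is approximate, since a profile just outside the pool could still win on the avatar bonus:

CREATE INDEX idx_profiles_search_text_trgm_gist
    ON test_mv_all_profiles USING gist (search_text gist_trgm_ops);

SELECT *
FROM (
    SELECT uuid, type, byline, display_name, username, avatar,
           word_similarity('john', search_text)
             + CASE WHEN avatar <> '' THEN 1 ELSE 0 END AS combined_score
    FROM test_mv_all_profiles
    WHERE 'john' <% search_text
    ORDER BY 'john' <<-> search_text   -- KNN order via GiST; avoids sorting all 90k matches
    LIMIT 1000                         -- candidate pool; tune as needed
) c
ORDER BY combined_score DESC
LIMIT 100;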
Sheila Loekito (11 rep)
Aug 28, 2024, 10:11 PM • Last activity: Aug 29, 2024, 10:29 AM
3 votes
2 answers
3936 views
PostgreSQL random() function returns duplicate results when ORDER BY random is used in a query
I'm using PostgreSQL 11.1-1 on Windows 10. On my system, if I run a query like the following, I get a different number for each random column:

postgres=# SELECT random() as sortOrder, random() as col2;
     sortorder     |       col2
-------------------+-------------------
 0.607938482426107 | 0.121234225574881
(1 row)

But if I add an ORDER BY clause, as shown below, then random() returns the same value for both columns in every row:

postgres=# SELECT random() as sortOrder, random() as col2 FROM generate_series(0,1) ORDER BY sortOrder;
     sortorder     |       col2
-------------------+-------------------
 0.100375576410443 | 0.100375576410443
 0.170669795479625 | 0.170669795479625
(2 rows)

I'm assuming there is a technical reason for this, but is it possible in this last query to get a different random value for each column?

**EDIT:** I may have oversimplified in my original question. The actual query I was attempting was more like the following:

SELECT column1, random(), column2, random()
FROM table1
ORDER BY random();

So, having an answer that can be combined with a table query is also important.
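A hedged sketch of a commonly suggested workaround: compute the random values in a fenced subquery and sort outside it, so the ORDER BY references an output column rather than re-using the select-list expression:

SELECT sortorder, col2
FROM (
    SELECT random() AS sortorder, random() AS col2
    FROM generate_series(0, 1)
    OFFSET 0          -- classic planner fence; keeps the two expressions separate
) t
ORDER BY sortorder;

The same pattern should apply to the table query in the edit: generate the columns inside the subquery, then ORDER BY random() (or by one of the generated columns) outside.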
TMorgan (143 rep)
Mar 1, 2019, 10:44 PM • Last activity: Jun 30, 2024, 09:25 PM
0 votes
2 answers
432 views
Sort results by number of matching words
I have a table (called sentences) in which I have some sentences. I want people to be able to search it with multiple words, which would make the SQL query look like this:

SELECT content
FROM sentences
WHERE content LIKE "%word1%"
   OR content LIKE "%word2%"
   OR content LIKE "%word3%"
   etc...;

How can I sort the results by the number of different words that match, so that the sentence matching the most words comes first, and so on?
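A hedged sketch (MySQL syntax assumed from the double-quoted literals; word1..word3 stand in for the search terms): boolean comparisons evaluate to 0/1, so summing them counts how many of the words matched each sentence.

SELECT content,
       (content LIKE '%word1%')
     + (content LIKE '%word2%')
     + (content LIKE '%word3%') AS matched_words
FROM sentences
WHERE content LIKE '%word1%'
   OR content LIKE '%word2%'
   OR content LIKE '%word3%'
ORDER BY matched_words DESC;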
loicEzt (1 rep)
Oct 20, 2020, 08:27 PM • Last activity: Jun 25, 2024, 02:33 PM
2 votes
2 answers
21732 views
Clever way to order a json item by key in postgresql (two examples)
I need to sort a json item in a record, so here are two examples: # First case
create table jsontable (
    jsonitem json
);
Next I insert a json item:
insert into jsontable(jsonitem) values ('{ "3" : "foo", "2" : "bar", "1" : "qux" }');
Then I query the values:
select t.jsonitem from jsontable t;
                 jsonitem                  
-------------------------------------------
 { "3" : "foo", "2" : "bar", "1" : "qux" }
(1 row)
Let's suppose the key is unique and is an integer > 0. **Question 1:** How can I sort the json item values by key and get the following?
{ "1" : "qux", "2" : "bar", "3" : "foo" }
# Second case

This case uses named keys and named values.
create table jsontable_arr(
   jsonitem json
   );
insert into jsontable_arr(jsonitem) values ('[  { "key" : "3" , "value": "foo"}, { "key" : "2" , "value": "bar"}, { "key" : "1" , "value": "qux"} ]');
select t.* from jsontable_arr t;
                                                jsonitem
--------------------------------------------------------------------------------------------------------
 [  { "key" : "3" , "value": "foo"}, { "key" : "2" , "value": "bar"}, { "key" : "1" , "value": "qux"} ]
(1 row)
**Question 2:** How can I sort the json array items by the embedded key value and get the following?
[  { "key" : "1" , "value": "qux"}, { "key" : "2" , "value": "bar"}, {"key" : "3" , "value": "foo"} ]
Thanks for your suggestions.
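Hedged sketches of one approach, using json_each / json_array_elements plus ordered aggregates (PostgreSQL 9.4+; add a GROUP BY or row identifier if the tables hold more than one row). Note this relies on the json type, which, unlike jsonb, preserves key order:

-- Question 1: rebuild the object with keys sorted numerically
SELECT json_object_agg(key, value ORDER BY key::int)
FROM jsontable, json_each(jsonitem);

-- Question 2: rebuild the array ordered by the embedded "key" field
SELECT json_agg(elem ORDER BY (elem ->> 'key')::int)
FROM jsontable_arr, json_array_elements(jsonitem) AS elem;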
nesiax (43 rep)
Dec 26, 2020, 05:48 PM • Last activity: Jun 12, 2024, 02:08 PM
26 votes
1 answer
27620 views
How to preserve the original order of elements in an unnested array?
Given the string:

> 'I think that PostgreSQL is nifty'

I would like to operate on the individual words found within that string. Essentially, I have a separate dictionary table from which I can get word details, and I would like to join an unnested array of that string against this dictionary. So far I have:

select w.word, d.meaning, d.partofspeech
from unnest(string_to_array('I think that PostgreSQL is nifty', ' ')) as w(word)
join dictionary d on w.word = d.wordname;

This accomplishes the fundamentals of what I was hoping to do, but it does not preserve the original word order. Related question: PostgreSQL unnest() with element number
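For PostgreSQL 9.4 and later, WITH ORDINALITY numbers the unnested elements directly; a minimal sketch against the question's tables:

SELECT w.word, d.meaning, d.partofspeech
FROM unnest(string_to_array('I think that PostgreSQL is nifty', ' '))
     WITH ORDINALITY AS w(word, ord)
JOIN dictionary d ON d.wordname = w.word
ORDER BY w.ord;      -- original word positions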
swasheck (10755 rep)
Oct 19, 2012, 08:00 PM • Last activity: May 15, 2024, 08:26 PM
0 votes
2 answers
1461 views
Sort by multiple columns, using nulls first on one of them
Given a members table, having the following columns:

accepted_at datetime ,
updated_at  datetime NOT NULL

Given this query:

select * from members
order by accepted_at DESC NULLS FIRST, updated_at DESC

We always want rows with accepted_at IS NULL at the beginning of the list, followed by the most recently updated members. The above query prioritizes the updated_at column, so I end up with accepted members at the beginning of the list.
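A hedged sketch: since accepted_at DESC dominates updated_at among the non-NULL rows, sorting on the NULL-ness flag alone and then on updated_at matches the stated requirement (PostgreSQL syntax assumed; (accepted_at IS NULL) also works as a 0/1 sort key in MySQL):

SELECT *
FROM members
ORDER BY (accepted_at IS NULL) DESC,   -- pending rows first
         updated_at DESC;              -- then most recently updated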
Ben (179 rep)
Feb 8, 2023, 10:49 AM • Last activity: May 13, 2024, 02:58 PM
11 votes
6 answers
3207 views
Is it bad practice to have a "most_recent" boolean column in addition to a "create_at" timestamp column to track the latest version of a record?
The table looks like this, it's SCD type 2:
+-----------+------------------+------------------------+
| id (text) | version (serial) | created_at (timestamp) |
+-----------+------------------+------------------------+
For 99% of queries we will be searching the entire table and filtering by additional columns and join tables. For these queries we're only interested in the most recent version of a record per unique ID. We will also be sorting by created_at and other columns. To make it easy to find the most current records, I was considering adding a most_recent (boolean) column as described in the answer here: https://stackoverflow.com/questions/34495479/add-constraint-to-make-column-unique-per-group-of-rows/34495621#34495621 However, I realized we already have the created_at column, which tells us this information: we could use a DISTINCT clause in our search queries and order by created_date as described by @Svet's answer here: https://stackoverflow.com/questions/17327043/how-can-i-select-rows-with-most-recent-timestamp-for-each-key-value However, we'd then have to re-order the results by the column that we actually want to use to show the data. It seems simpler in the long run to add the extra 'current' field, and it seems like it would be more performant, but is it also bad practice?
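A hedged sketch of the flag-free alternative (PostgreSQL assumed from the serial column; versions is a hypothetical table name): fetch the latest row per id with DISTINCT ON, then re-sort the outer query however the report needs. If the most_recent flag is added instead, a partial unique index keeps it consistent:

SELECT *
FROM (
    SELECT DISTINCT ON (id) *
    FROM versions
    ORDER BY id, created_at DESC     -- latest version per id
) latest
ORDER BY created_at DESC;            -- the order the query actually wants

-- if using the flag instead:
CREATE UNIQUE INDEX one_current_per_id ON versions (id) WHERE most_recent;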
Henry (213 rep)
Jan 11, 2024, 07:19 PM • Last activity: Jan 14, 2024, 12:43 AM