Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

2 votes
3 answers
1459 views
Automatically updating a record using count() of another collection
I have been looking at the docs and searching for videos on my problem, but I find nothing. I have a db called `HPDASH` where I have 2 collections, `scripts` and `serverList`. My problem is that I want to create a field for `serverList` that has the `count()` of the collection `scripts`. I have tried this:
db.scripts.insert([{name: 'local_script'}, {name: 'qidsloader'}])
db.serverList.insert({script_count: db.scripts.count() })
The problem is that when I insert another record into `scripts`, the data in `script_count` stays at 2. Is it possible to achieve this in MongoDB?
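One way to sidestep the stale field entirely is to derive the count when reading instead of storing it. A minimal sketch (mongosh, MongoDB 3.6+), using only the two collections from the question:

```javascript
// A minimal sketch: compute script_count at read time so it can never
// go stale. $lookup with a pipeline (an uncorrelated sub-query, 3.6+)
// counts the scripts collection on the fly.
db.serverList.aggregate([
  { $lookup: { from: "scripts", pipeline: [{ $count: "n" }], as: "s" } },
  { $addFields: { script_count: { $ifNull: [{ $arrayElemAt: ["$s.n", 0] }, 0] } } },
  { $project: { s: 0 } }
])
```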
sampjr (21 rep)
Jul 28, 2017, 09:13 PM • Last activity: Jul 3, 2025, 04:09 PM
0 votes
2 answers
451 views
postgresql cumulative counts in date range
I'm trying to get the cumulative count of rows (by group_id) between two dates that represent a time period where the row was active. I have a table like this:
group_id |   id   | type | start_date |  end_date
----------+--------+------+------------+------------
       33 | 119435 | AAA  | 2013-05-21 | 2014-05-19
       33 |  15144 | AAA  | 2013-05-21 | 2015-05-18
       33 |  29393 | AAA  | 2013-05-21 | 2016-05-23
       33 | 119437 | AAA  | 2013-05-21 | 2017-05-15
       33 |  62380 | AAA  | 2013-05-21 | 2099-12-31
       33 | 119436 | AAA  | 2013-05-21 | 2099-12-31
       33 |  27346 | AAA  | 2013-05-21 | 2099-12-31
       33 |  28529 | AAA  | 2014-05-20 | 2099-12-31
       33 | 221576 | AAA  | 2015-05-19 | 2099-12-31
       33 | 253893 | AAA  | 2016-05-24 | 2099-12-31
       33 | 251589 | AAA  | 2017-05-16 | 2099-12-31
       33 | 285245 | AAA  | 2019-01-24 | 2099-12-31
       34 | 253893 | AAA  | 2016-05-24 | 2099-12-31
       34 | 251589 | AAA  | 2017-05-16 | 2099-12-31
       34 | 285245 | AAA  | 2019-01-24 | 2099-12-31
       34 | 285246 | AAA  | 2019-05-31 | 2099-12-31
... and I need to get active counts for each of those date ranges like this:
group_id | start_date |  end_date  | active
----------+------------+------------+--------
       33 | 2013-05-21 | 2014-05-19 |      7
       33 | 2013-05-21 | 2015-05-18 |      8
       33 | 2013-05-21 | 2016-05-23 |      9
       33 | 2013-05-21 | 2017-05-15 |     10
       33 | 2013-05-21 | 2099-12-31 |     12
       33 | 2013-05-21 | 2099-12-31 |     12
       33 | 2013-05-21 | 2099-12-31 |     12
       33 | 2014-05-20 | 2099-12-31 |     11
       33 | 2015-05-19 | 2099-12-31 |     10
       33 | 2016-05-24 | 2099-12-31 |      9
       33 | 2017-05-16 | 2099-12-31 |      8
       33 | 2019-01-24 | 2099-12-31 |      8
       34 | 2016-05-24 | 2099-12-31 |      1
       34 | 2017-05-16 | 2099-12-31 |      2
       34 | 2019-01-24 | 2099-12-31 |      3
       34 | 2019-05-31 | 2099-12-31 |      4
I've tried various combinations of LAG and LEAD, with and without CTEs, but cannot come up with a solution. Is there a way to do this in a single query? If not a single query, perhaps a combination of queries in a UDF?

**UPDATE**: Per @jjanes' comment below, I believe my source table is set up incorrectly. I think I should create the source table like this instead:
group_id |   id   | type | start_date |  end_date
----------+--------+------+------------+------------
  ... (skipped group 33) ...
       34 | 253893 | AAA  | 2016-05-24 | 2017-05-15
       34 | 253893 | AAA  | 2017-05-16 | 2019-01-23
       34 | 253893 | AAA  | 2019-01-24 | 2019-05-30
       34 | 253893 | AAA  | 2019-05-31 | 2099-12-31
       34 | 251589 | AAA  | 2017-05-16 | 2019-01-23
       34 | 251589 | AAA  | 2019-01-24 | 2019-05-30
       34 | 251589 | AAA  | 2019-05-31 | 2099-12-31
       34 | 285245 | AAA  | 2019-01-24 | 2019-05-30
       34 | 285245 | AAA  | 2019-05-31 | 2099-12-31
       34 | 285246 | AAA  | 2019-05-31 | 2099-12-31
With that change in the source data, the active counts (showing only group 34 here) would look like this:
group_id | start_date |  end_date  | active
----------+------------+------------+--------
       34 | 2016-05-24 | 2017-05-15 |      1
       34 | 2017-05-16 | 2019-01-23 |      2
       34 | 2019-01-24 | 2019-05-30 |      3
       34 | 2019-05-31 | 2099-12-31 |      4
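With the reshaped source, each row already carries its exact active period, so the expected output falls out of a plain GROUP BY. A minimal sketch, assuming the source table is named `memberships` (the question never names it):

```sql
-- A minimal sketch over the reshaped data: every row stores the period
-- during which it was active, so counting rows per distinct period
-- reproduces the expected "active" column for group 34 above.
SELECT group_id, start_date, end_date, count(*) AS active
FROM memberships
GROUP BY group_id, start_date, end_date
ORDER BY group_id, start_date, end_date;
```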
jacaetevha (1 rep)
Mar 21, 2020, 08:20 PM • Last activity: Jun 9, 2025, 09:01 AM
1 vote
2 answers
240 views
How to order by COUNT(*) of unique entries in GROUP BY?
I search through a tag_map as

SELECT ArticleID, COUNT(*) AS C
FROM WordMap
WHERE WordID IN (
    SELECT WordID FROM Words WHERE Word IN ('word1', 'word2')
)
GROUP BY ArticleID
ORDER BY C DESC

I want to get article ids in which all words (here word1 and word2) appear, first. The above query orders by the raw count (e.g., an article where word1 alone appears 100 times comes first). NOTE: the WordMap table is huge, and this is the reason I went for a sub-query rather than a JOIN.
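A minimal sketch of one way to rank full matches first: order by the number of distinct matched words before the raw count:

```sql
-- A minimal sketch: COUNT(DISTINCT WordID) is the number of different
-- search words matched, so articles containing every word sort first;
-- total occurrences break ties.
SELECT ArticleID,
       COUNT(DISTINCT WordID) AS matched_words,
       COUNT(*) AS total_hits
FROM WordMap
WHERE WordID IN (SELECT WordID FROM Words WHERE Word IN ('word1', 'word2'))
GROUP BY ArticleID
ORDER BY matched_words DESC, total_hits DESC;
```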
Googlebot (4551 rep)
Oct 9, 2020, 07:28 AM • Last activity: Jun 8, 2025, 03:09 PM
1 vote
1 answer
853 views
Triggers: counting rows in a bridge table after INSERT
I'm using a trigger in order to build a statistics table. I have a many-to-many relationship where an `Order` can have multiple `Product`s and vice versa. So after an INSERT on `Order`, using JPA, new rows are automatically inserted in the bridge table `Order_Product`. When in the trigger I use
SET @numOfOPs = (
        SELECT COUNT(IDOrder) 
        FROM Order_Product 
        WHERE IDOrder = NEW.ID)
on new entries, the count on Order_Product seems to return 0 (or the value before the insert on Order). Here is the signature of the trigger:
CREATE TRIGGER Order_AFTER_INSERT 
    AFTER INSERT ON Order 
    FOR EACH ROW
What could be the problem? I will add additional information if needed.
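A minimal sketch of one workaround, assuming a statistics table named OrderStats (hypothetical): count the bridge rows when *they* are inserted, since an AFTER INSERT trigger on Order fires before JPA has written the Order_Product rows:

```sql
-- A minimal sketch: maintain the per-order product count from a trigger
-- on the bridge table itself. OrderStats (IDOrder PK, NumProducts) is a
-- hypothetical statistics table, not from the question.
DELIMITER $$
CREATE TRIGGER Order_Product_AFTER_INSERT
    AFTER INSERT ON Order_Product
    FOR EACH ROW
BEGIN
    INSERT INTO OrderStats (IDOrder, NumProducts)
    VALUES (NEW.IDOrder, 1)
    ON DUPLICATE KEY UPDATE NumProducts = NumProducts + 1;
END$$
DELIMITER ;
```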
Filippo Scaramuzza (11 rep)
Feb 12, 2022, 05:04 PM • Last activity: May 31, 2025, 08:03 AM
0 votes
1 answer
252 views
MySQL Count(Table1.ColId) Where IN (any of Table2.Col1 ,Table2.Col2)
I have 2 tables:

Table 1 has Id, Name
Table 2 has Id, Col1_IdVal, Col2_IdVal, Col3_IdVal, Col4_IdVal

I would like a simple solution for:

Count(T1.Id) WHERE T1.Id IN (T2.Col1_IdVal, T2.Col2_IdVal, T2.Col3_IdVal, T2.Col4_IdVal)

Something I have tried does not return the correct result set (too few rows in the result):

SELECT COUNT(T1.Id)
FROM Table1 AS T1
LEFT JOIN Table2 AS T2_1 ON T2_1.T2Column_1 = T1.Id
LEFT JOIN Table2 AS T2_2 ON T2_2.T2Column_2 = T1.Id
LEFT JOIN Table2 AS T2_3 ON T2_3.T2Column_3 = T1.Id
LEFT JOIN Table2 AS T2_4 ON T2_4.T2Column_4 = T1.Id
LEFT JOIN Table2 AS T2_5 ON T2_5.T2Column_5 = T1.Id;

Table 1 has unique values - it is the primary key. Table 2 can have any number of values - MANY. Table2's columns are not foreign keys of Table1 - they are values that must exist in Table1, but other than that, they are not FKeys. So basically: Count(Id) where the Id value is in T2, Cols 1..n.
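A minimal sketch of one way to get the intended count, using EXISTS so each Table1 row is counted at most once:

```sql
-- A minimal sketch: count Table1 rows whose Id appears in any of the
-- four Table2 columns; EXISTS avoids the row multiplication that the
-- five LEFT JOINs cause.
SELECT COUNT(*)
FROM Table1 AS T1
WHERE EXISTS (
    SELECT 1
    FROM Table2 AS T2
    WHERE T1.Id IN (T2.Col1_IdVal, T2.Col2_IdVal, T2.Col3_IdVal, T2.Col4_IdVal)
);
```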
StixO (177 rep)
Jul 15, 2019, 05:04 PM • Last activity: May 19, 2025, 10:06 PM
0 votes
1 answer
357 views
MySQL How can I create a dynamic result of numbers grouped by an "upto" value with a count
I'm looking to group a set of numbers by an "upto" value and show a count of how many results are up to this value, omitting any zeros. E.g. the data could be:

38600, 46526, 35813, 26629, 36127, 38196, 21006, 21375, 14867, 32187, 26498, 13506, 62730, 31187, 23876, 45946, 54845, 48000, 50000, 36783

And the result would be
+-----+-----+
|upto |count|
+-----+-----+
|15000|2    |
|25000|5    |
|30000|7    |
|35000|9    |
|40000|14   |
|50000|18   |
|60000|19   |
|70000|20   |
+-----+-----+
I've got as far as
SELECT
       (CASE
        WHEN value < 50000 THEN CEIL(value / 5000) * 5000
        WHEN value < 100000 THEN CEIL(value / 10000) * 10000
        WHEN value < 500000 THEN (CEIL(value / 100000) * 100000)
        END) as upto,
COUNT(*) as count
FROM table
GROUP BY upto;
Which gives the following
+-----+-----+
|upto |count|
+-----+-----+
|15000|2    |
|25000|3    |
|30000|2    |
|35000|2    |
|40000|5    |
|50000|4    |
|60000|1    |
|70000|1    |
+-----+-----+
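Assuming the bucketing CASE above is right, the missing piece is turning the per-bucket counts into running totals. A minimal sketch (MySQL 8.0+, which has window functions; the table name `t` is a placeholder):

```sql
-- A minimal sketch: keep the bucketing, then accumulate the per-bucket
-- counts with a window SUM ordered by bucket.
SELECT upto,
       SUM(cnt) OVER (ORDER BY upto) AS `count`
FROM (
    SELECT (CASE
            WHEN value < 50000  THEN CEIL(value / 5000)   * 5000
            WHEN value < 100000 THEN CEIL(value / 10000)  * 10000
            WHEN value < 500000 THEN CEIL(value / 100000) * 100000
            END) AS upto,
           COUNT(*) AS cnt
    FROM t
    GROUP BY upto
) AS per_bucket
ORDER BY upto;
```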
Rob Bates (1 rep)
Nov 2, 2021, 09:01 AM • Last activity: Apr 23, 2025, 11:05 PM
0 votes
1 answer
55 views
Sqlite count while also grouping?
select * from tblA a left join tblB b on b.id = a.bId group by a.bId

If I group based on a column, I get only the first record of the records that have the same bId value. Is there a way to also get the total number of records that have the same bId, for each row, without doing a separate query?

id | bId
---------
 1 | 1
 2 | 1
 3 | 1
 4 | 2
 5 | 3

so in the above query I would expect

id | bId | counts
------------------
 1 | 1   | 3   <- 3 records with bId = 1
 4 | 2   | 1
 5 | 3   | 1
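A minimal sketch (SQLite), leaving out the tblB join for brevity since the count concerns tblA's rows:

```sql
-- A minimal sketch: COUNT(*) is evaluated per group, so it can be
-- selected alongside the grouped column; MIN(a.id) pins the first id
-- deterministically instead of relying on SQLite's arbitrary pick.
SELECT MIN(a.id) AS id, a.bId, COUNT(*) AS counts
FROM tblA a
GROUP BY a.bId;
```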
Alex (181 rep)
Mar 24, 2025, 01:29 PM • Last activity: Mar 24, 2025, 03:08 PM
1 vote
2 answers
1902 views
How to count number of occurrences of each key in a MongoDB collection efficiently (one pass?)?
I'm a MongoDB noob and I have a MongoDB collection with millions of documents, each with tens of keys (average ~ 60, large variance) from a set of around 100. I'd like to get a count of each key across the entire collection. For example, if the documents in the collection were:
{"_id":0, "foo": 0, "bar":1, "baz":2}
{"_id":1, "foo": 0, "baz":7, "qux":11, "quux":13}
{"_id":2, "foo": 1, "bar":1, "quux":3}
then the desired output would be:
{"_id":3, "foo":3, "bar":2, "baz":2, "qux":1, "quux": 2}
I can "explode" the collection with $objectToArray, $unwind, $group, and then $count, but it's slow. Is there something that could do this efficiently in one pass through the collection? Something like,
[notional pseudocode]
output={}
foreach document:
  foreach key:
    if output.key exists:
      output.key+=1
    else:
      output.key=1

=> output: {key1: key1_count, ...}
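For reference, a minimal mongosh transcription of that pseudocode (the collection name `coll` is a placeholder), one client-side pass:

```javascript
// A minimal sketch: one pass over the collection, tallying key names
// in a plain object on the client.
const output = {};
db.coll.find().forEach(doc => {
  for (const key of Object.keys(doc)) {
    if (key === "_id") continue;          // don't count the id field
    output[key] = (output[key] || 0) + 1; // increment, initializing to 1
  }
});
printjson(output);
```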
rickhg12hs (111 rep)
Jan 21, 2022, 06:01 PM • Last activity: Feb 7, 2025, 10:02 AM
2 votes
1 answer
569 views
Slow ranking values in MySQL
I have a table of around 30 million retail products, with various INT columns that describe sales counters and pricing at different retailers. I'm supposed to display well over 70 different rankings on every individual product's page. Example: #23 best seller in China, #732 best seller in USA, #13 cheapest price in Norway, etc.

I've thought of various solutions to this problem:

1. Create ranking fields - I guessed it wouldn't work, since just a single drop in price would mean updating millions of rows to reduce their ranking field by 1, for example.
2. Use COUNT - COUNT the number of products cheaper than X and display that as the ranking. I'm still considering this; however, a single query takes 200 ms on average, so running 70 of them looks like it would take ages.

Really looking forward to advice from experienced database admins on this topic, as my knowledge feels very basic.

Additional information: Hardware is dual Xeon CPUs, 15k RPM hard drives, 32 GB DDR3, LEMP stack, running latest Percona; all tables use the XtraDB engine.

**Edit**: Posting CREATE statement as requested

CREATE TABLE products (
    id varchar(64) COLLATE utf8_bin NOT NULL,
    retailer_id int(10) unsigned NOT NULL,
    sales_usa int(10) unsigned NOT NULL,
    sales_eu int(10) unsigned NOT NULL,
    sales_asia int(10) unsigned NOT NULL,
    sales_africa int(10) unsigned NOT NULL,
    sales_france int(10) unsigned NOT NULL,
    sales_germany int(10) unsigned NOT NULL,
    sales_russia int(10) unsigned NOT NULL,
    sales_china int(10) unsigned NOT NULL,
    sales_india int(10) unsigned NOT NULL,
    sales_uk int(10) unsigned NOT NULL,
    sales_spain int(10) unsigned NOT NULL,
    sales_norway int(10) unsigned NOT NULL,
    sales_sweden int(10) unsigned NOT NULL,
    sales_japan int(10) unsigned NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Sample ranking statement:

SELECT id FROM products WHERE sales_japan > 12345

After which I perform a mysql_num_rows() on the result to get that product's ranking in Japan, for example.
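As an aside, the last step can be done server-side; a minimal sketch using the table from the question:

```sql
-- A minimal sketch: have the server count the matching rows instead of
-- shipping every id to PHP and calling mysql_num_rows(). With an index
-- on sales_japan this becomes a much cheaper index range scan.
SELECT COUNT(*) + 1 AS rank_japan
FROM products
WHERE sales_japan > 12345;
```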
Tom (121 rep)
Feb 6, 2015, 10:38 PM • Last activity: Jan 26, 2025, 01:07 AM
1 vote
1 answer
645 views
mongo cluster collection count showing different every time
I had a replica set earlier, which I have converted to a cluster. The current configuration is:

- 3 shards
- 4 routers
- 4 config servers

When I try to get the count of one of my collections, sometimes it shows null and sometimes it shows some value. Why is this happening? Please help.
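For what it's worth, a minimal sketch (mongosh; the collection name `mycoll` is a placeholder) of a count that does not rely on sharding metadata:

```javascript
// A minimal sketch: on a sharded cluster, plain count() can be served
// from chunk metadata, which can be off after migrations or with
// orphaned documents; an aggregation count inspects the documents.
db.mycoll.aggregate([{ $count: "total" }])
```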
kamal kishore (21 rep)
Jul 4, 2017, 09:14 AM • Last activity: Jan 19, 2025, 01:04 PM
0 votes
1 answer
1608 views
Efficient way to check for number of records more than a limit from a table in SQL Server
I came across queries that can most efficiently get the total count of the records in a table. However, I want to group the records in a table by a column and check if the count of the records is higher than 1000; if not, I need to return the count, and if so, I can return -1.

Table design:

CREATE TABLE [dbo].[Accounts](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Description] [nvarchar](255) NOT NULL,
    [CreatedById] [bigint] NOT NULL,
    [CreatedDate] [datetimeoffset](7) NOT NULL,
    [ModifiedById] [bigint] NOT NULL,
    [ModifiedDate] [datetimeoffset](7) NOT NULL,
    [IsActive] [bit] NOT NULL,
    [StagingStatusId] [bigint] NULL,
    [IsApproved] [bit] NOT NULL
)

Currently, I have the following query that groups the data by the status and then returns the count:

select StagingStatusId, count(1) AS Total
from I8Core.AccountCode
where IsActive = 1 and IsApproved = 1
group by StagingStatusId

However, as the number of records in the table grows (> 100k), the query gets slower and delayed. The aim of this query is to know which statusids are higher than 999 and which are less than 999.

> For groups higher than 999 (e.g., if statusid = 1 and count >= 1244, we need statusid=1, count=999+); otherwise we return the actual count (e.g., statusid=1, count=998 for any value less than 999).

I found that the below query can return the total row count per table, but this cannot be applied to this scenario. Kindly suggest the best practice to achieve this.
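A minimal sketch of one way to avoid counting past the threshold (the 1000 cutoff follows the prose above): TOP inside CROSS APPLY lets the scan stop as soon as a group hits the cap:

```sql
-- A minimal sketch: TOP (1000) caps the per-group scan, so large groups
-- stop counting early; anything that reaches the cap reports -1.
SELECT s.StagingStatusId,
       CASE WHEN c.cnt >= 1000 THEN -1 ELSE c.cnt END AS Total
FROM (SELECT DISTINCT StagingStatusId
      FROM I8Core.AccountCode
      WHERE IsActive = 1 AND IsApproved = 1) AS s
CROSS APPLY (SELECT COUNT(*) AS cnt
             FROM (SELECT TOP (1000) 1 AS x
                   FROM I8Core.AccountCode a
                   WHERE a.StagingStatusId = s.StagingStatusId
                     AND a.IsActive = 1 AND a.IsApproved = 1) AS capped
            ) AS c;
```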
Saran (11 rep)
Nov 1, 2018, 05:26 PM • Last activity: Nov 19, 2024, 02:08 PM
-1 votes
1 answer
155 views
What happened to Oracle's COUNTDISTINCT() function?
I need the count of unique values, and am trying to use Oracle's [COUNTDISTINCT()](https://docs.oracle.com/cd/E93962_01/bigData.Doc/eql_onPrem/src/reql_aggregation_countdistinct.html) function to obtain that:
select COUNTDISTINCT(a.m_label)
        from user_rep a, user_group_rep b, trn_grp_rep c
        where b.m_user_id = a.m_reference
...
This results in ORA-00904: "COUNTDISTINCT": invalid identifier... Using plain COUNT() works, but is not returning correct results. What am I doing wrong? What'd be an efficient work-around?
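The linked page documents EQL (Endeca Query Language), not Oracle Database SQL; in SQL the equivalent is COUNT(DISTINCT ...). A minimal sketch against the query above:

```sql
-- A minimal sketch: COUNT(DISTINCT expr) is the Oracle Database SQL
-- way to count unique values; add the remaining join predicates from
-- the original query in the WHERE clause.
select COUNT(DISTINCT a.m_label)
  from user_rep a, user_group_rep b, trn_grp_rep c
 where b.m_user_id = a.m_reference;
```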
Mikhail T. (166 rep)
Nov 12, 2024, 05:43 PM • Last activity: Nov 12, 2024, 06:07 PM
2 votes
2 answers
232 views
sum() of aggregate count()?
I have a query where I pull the domain portion of the email addresses out of a column and then count how many users I have from each domain. How do I get the total count (all users over all domains) in the same query?
select count(regexp_replace(email_id, '^[^@]+@', '')) as count,
       regexp_replace(email_id, '^[^@]+@', '') as domain
from logins
where last_login between '2024-08-06' and '2024-09-06'
group by regexp_replace(email_id, '^[^@]+@', '')
order by count;
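A minimal sketch (assuming PostgreSQL, which the query's syntax suggests): add the grand total to every row with a window SUM over the aggregate:

```sql
-- A minimal sketch: sum(count(*)) over () attaches the total of all
-- grouped counts to each output row, in the same scan.
select regexp_replace(email_id, '^[^@]+@', '') as domain,
       count(*) as count,
       sum(count(*)) over () as total
from logins
where last_login between '2024-08-06' and '2024-09-06'
group by 1
order by count;
```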
tink (155 rep)
Sep 10, 2024, 08:52 PM • Last activity: Sep 12, 2024, 06:11 AM
0 votes
2 answers
219 views
How to get records which meet conditions stored in another tables
Problem: How to get passed requirements, where there are 4 tables in such a relation:

requirements -> links <- test_cases <- results

Scenario:

Table1 - requirements

|req_id |status|
|:-----:|:----:|
|req-1  |open  |
|req-2  |open  |

Table2 - test_cases

|test_id |status   |
|:------:|:-------:|
|tc-1    |open     |
|tc-2    |cancelled|

Table3 - links

|link_id |req_id |test_id |
|:------:|:-----:|:------:|
|link-1  |req-1  |tc-1    |
|link-2  |req-1  |tc-2    |

Table4 - test results

|result_id|test_id |result |
|:-------:|:------:|:-----:|
|res-1    |tc-1    |passed |
|res-2    |tc-2    |failed |

So far I've tried such an approach: generate a table req_id | cnt_test_cases | cnt_results and treat a requirement as passed when cnt_test_cases = cnt_results (counting only 'passed' results), but it doesn't work, as these calculated columns don't exist for the 'Having' statement at the end:
SELECT DISTINCT requirements.req_id,
(SELECT COUNT(DISTINCT test_cases.test_id)
FROM test_cases, links
WHERE links.req_id = requirements.req_id
AND links.test_id = test_cases.test_id
AND test_cases.status != 'cancelled') as cnt_test_cases,
(SELECT COUNT(DISTINCT results.result)
FROM results, links
WHERE links.req_id = requirements.req_id
AND links.test_id = results.test_id
AND results.result = 'passed') as cnt_results
FROM requirements
HAVING cnt_test_cases = cnt_results

Maybe there is a better approach?
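One standard workaround is to push the computed columns into a derived table; a minimal sketch along those lines (counting distinct *passed test cases* so the two counts are comparable):

```sql
-- A minimal sketch: the derived table's column aliases are visible to
-- the outer WHERE, unlike aliases referenced in HAVING at the same level.
SELECT req_id
FROM (
    SELECT r.req_id,
           (SELECT COUNT(DISTINCT tc.test_id)
              FROM links l
              JOIN test_cases tc ON tc.test_id = l.test_id
             WHERE l.req_id = r.req_id
               AND tc.status <> 'cancelled') AS cnt_test_cases,
           (SELECT COUNT(DISTINCT res.test_id)
              FROM links l
              JOIN results res ON res.test_id = l.test_id
             WHERE l.req_id = r.req_id
               AND res.result = 'passed') AS cnt_results
    FROM requirements r
) AS t
WHERE cnt_test_cases > 0
  AND cnt_test_cases = cnt_results;
```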
Łukasz Góra (11 rep)
Aug 8, 2024, 10:00 AM • Last activity: Aug 12, 2024, 07:31 PM
1 vote
1 answer
7392 views
How do I query only rows where multiple instances with the same ID exist with Postgres?
I have a table where I want to gather some statistics, but only for items that are repeated two or more times. Here is how a simplified table looks:

CREATE TABLE test (
    id bigint,
    something text
);

Most rows have a unique id; some have repeated entries where something differs. What I'm wondering is whether there is a way to read the table but return only items that have 2 or more rows with the same id. I know how to count the number of duplicated rows using COUNT() and GROUP BY:

SELECT id, COUNT(id) AS count FROM test GROUP BY id;

I don't really see how to use the COUNT(id) in a WHERE or FILTER clause (since aggregate functions are not allowed there).

---

Just in case, a very small data sample would be:

 id | something
----|-----------
  1 | first
  2 | second
  2 | third

In the result I want to see:

 id | something
----|-----------
  2 | second
  2 | third
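A minimal sketch of one standard approach: compute the per-id count with a window function, then filter on it in an outer query:

```sql
-- A minimal sketch: a window COUNT keeps every row (unlike GROUP BY),
-- so the duplicate condition can be applied in an outer query.
SELECT id, something
FROM (
    SELECT id, something, COUNT(*) OVER (PARTITION BY id) AS n
    FROM test
) AS sub
WHERE n >= 2;
```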
Alexis Wilke (135 rep)
Feb 2, 2023, 07:58 PM • Last activity: Aug 5, 2024, 11:23 PM
7 votes
2 answers
11020 views
What is the most efficient way to count the number of rows in a table?
I am using Postgres with the following query:

select count(*) from image;

The primary key on this table is non-incrementing; it's a unique serial number for the images stored in the table. Our app often attempts to ingest images that have already been recorded in the database, so the primary key/serial number ensures they are only recorded once. Now we are wondering if we should have gone with an incrementing primary key instead. We have 1,259,369 images in the database and it takes about 7 minutes for the count query to run. Our app will never delete images from this table, so an incrementing primary key would allow us to check the value of the last ID, which would equal the number of rows in the table.
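If an approximate answer is acceptable, a minimal sketch that reads the planner statistics instead of scanning the table:

```sql
-- A minimal sketch: pg_class.reltuples holds the planner's row-count
-- estimate, refreshed by VACUUM/ANALYZE, and answers instantly.
SELECT reltuples::bigint AS approximate_rows
FROM pg_class
WHERE relname = 'image';
```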
andrewniesen (661 rep)
Mar 17, 2015, 11:41 AM • Last activity: Jun 29, 2024, 10:28 PM
1 vote
2 answers
2614 views
How to speed up counting and filtering on table with a lot of columns and filter conditions?
I have TABLE_A in my database that has a lot of columns. Let's say there are 100 columns:

COLUMN_1, COLUMN_2, COLUMN_3, .. COLUMN_100

There are so many columns not because of denormalized design, but because each row represents an Entity with a lot of properties (in this example, with 99 properties; the first column is just an id).

My application should do the following tasks:

- Receive a set of filter conditions from the user
- Execute a query in the database that filters the data on the given conditions and then counts the result rows

There are no limits on the number of filter conditions:

- there may be no filter conditions
- there may be filter conditions on every column of the table
- there may be filter conditions on some columns of the table (for example, on 50 columns out of all)

So below is an example query that my application executes:

SELECT COUNT(*) FROM TABLE_A
WHERE COLUMN_1 = 'test'
  AND COLUMN_2 != 'q'
  AND COLUMN_45 > 5
  AND COLUMN_45 < 1000

TABLE_A doesn't have any indexes: each column can be used for filtering, and there are no sets of queries that run much more frequently than the rest. I also don't use any kind of cache on any level: insert and update operations happen not very often, but still more often than two queries with the same filter conditions occur. So for every query a sequential scan is executed. It was not a problem earlier, but now the run times of the query have become unacceptable (the number of rows in the table has increased a lot).

So.. **Is there any solution on how to speed up the execution of the task?** Any suggestions would be appreciated - I can change anything except one thing: I must use SQL. Currently I use PostgreSQL, but I would change DBMS if it would help.
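A minimal sketch (PostgreSQL) of one option for this arbitrary-filter workload: the bloom extension's multi-column signature index, which can help any subset of *equality* filters (range predicates still need rechecks). The column list here is shortened for illustration:

```sql
-- A minimal sketch: a single bloom index covers many columns at once,
-- letting the planner skip most heap pages for equality predicates on
-- arbitrary column combinations.
CREATE EXTENSION IF NOT EXISTS bloom;
CREATE INDEX table_a_bloom ON TABLE_A
    USING bloom (COLUMN_1, COLUMN_2, COLUMN_3 /* , ... more columns */);
```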
nik (111 rep)
Feb 6, 2018, 02:51 PM • Last activity: Jun 21, 2024, 07:02 AM
1 vote
1 answer
139 views
PostgreSQL - Sum all row that satisfy a condition into a single row
I'm trying to extract some statistics from a Postgres database and I made this query:
SELECT
  city,
  job_count,
  TO_CHAR(job_count * 100 / SUM(job_count) OVER (), '90D00%') AS job_share
FROM (
  SELECT
    localities.name as city,
    COUNT(*) AS job_count
  FROM jobads
  JOIN localities ON jobads.locality_uuid = localities.uuid
  WHERE jobads.external = true
  GROUP BY localities.name
  ORDER BY job_count DESC
) AS job_count_table;
Here's the result it returns:
city          | job_count | job_share
-----------------------+-----------+-----------
 City #1               |      1300 |  13.00%
 City #2               |       700 |   7.00%
 City #3               |       400 |   4.00%
 ...
 City #1200            |         1 |   0.01%
(1200 rows)
It returns hundreds of rows, and most of them have a really low job count. I would like to merge all rows that have a job_count less than 100 into a single row, so the output would look something like this:
city          | job_count | job_share
-----------------------+-----------+-----------
 City #1               |      1300 |  13.00%
 City #2               |       700 |   7.00%
 City #3               |       400 |   4.00%
 Other cities          |      2000 |  20.00%
(4 rows)
Any idea how I can do that?
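A minimal sketch of one way to do it: fold the small groups into an 'Other cities' label in a first grouping pass, then aggregate again:

```sql
-- A minimal sketch: groups with fewer than 100 jobs are relabeled
-- 'Other cities' in the inner query, then summed together outside;
-- the shares are computed over the folded rows.
SELECT city,
       SUM(job_count) AS job_count,
       TO_CHAR(SUM(job_count) * 100 / SUM(SUM(job_count)) OVER (), '90D00%') AS job_share
FROM (
  SELECT CASE WHEN COUNT(*) < 100 THEN 'Other cities'
              ELSE localities.name END AS city,
         COUNT(*) AS job_count
  FROM jobads
  JOIN localities ON jobads.locality_uuid = localities.uuid
  WHERE jobads.external = true
  GROUP BY localities.name
) AS per_city
GROUP BY city
ORDER BY (city = 'Other cities'), job_count DESC;  -- keeps the bucket last
```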
Mateja Maric (13 rep)
May 8, 2024, 01:34 PM • Last activity: May 8, 2024, 02:11 PM
4 votes
2 answers
3491 views
Count 2 tables same query - PG 9.3
I've got two tables that I need a count on.

table1: id
table2: cid

(table1.id = table2.cid -- the join can be done on this.)

I already have two queries that I'm using, and I want to put them into one single query.

Query related to table1:

WITH c_status AS (
    select CASE
             WHEN regdate = 1 THEN 'yes'
             WHEN regdate = 2 THEN 'no'
           END as status_1
    from table1
)
select status_1, count(*) AS c_status_count
from c_status
group by status_1

OUTPUT:

yes 548
no 2269

Query related to table2:

WITH u_status AS (
    select CASE
             WHEN regdate = 1 THEN 'yes'
             WHEN regdate = 2 THEN 'no'
           END as status_2
    from table2
)
select status_2, count(*) AS u_status_count
from u_status
group by status_2

OUTPUT:

yes 564256
no 31452345234

Question: How can I put those two queries together? I want one single query reporting both counts.

UPDATE: Desired output:

u_status yes 548
u_status no 2269
c_status yes 564256
c_status no 31452345234
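A minimal sketch of one way to combine them on PostgreSQL 9.3: label each result set and stack the two grouped counts with UNION ALL:

```sql
-- A minimal sketch: a literal label column distinguishes the two
-- sources, and UNION ALL stacks the grouped counts into one result.
SELECT 'c_status' AS source,
       CASE WHEN regdate = 1 THEN 'yes' WHEN regdate = 2 THEN 'no' END AS status,
       count(*) AS status_count
FROM table1
GROUP BY 2
UNION ALL
SELECT 'u_status',
       CASE WHEN regdate = 1 THEN 'yes' WHEN regdate = 2 THEN 'no' END,
       count(*)
FROM table2
GROUP BY 2;
```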
Patrick B. (311 rep)
Jul 9, 2017, 09:51 PM • Last activity: Apr 14, 2024, 07:15 PM
10 votes
4 answers
27668 views
Postgresql extremely slow count (with index, simple query)
I need to run these simple queries on a table with millions of rows:
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123;
SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL;
The count result for both queries, for project 123, is about 5M. I have an index in place on project_id, and also another index on (project_id, trashed_at):
"index_subscriptions_on_project_id_and_created_at" btree (project_id, created_at DESC)
"index_subscriptions_on_project_id_and_trashed_at" btree (project_id, trashed_at DESC)
The problem is that both queries are extremely slow and take about 17s each. These are the results of EXPLAIN ANALYZE:
QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2068127.29..2068127.30 rows=1 width=0) (actual time=17342.420..17342.420 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=199573.94..2055635.23 rows=4996823 width=0) (actual time=1666.409..16855.610 rows=4994254 loops=1)
         Recheck Cond: (project_id = 123)
         Rows Removed by Index Recheck: 23746378
         Heap Blocks: exact=131205 lossy=1480411
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..198324.74 rows=4996823 width=0) (actual time=1582.717..1582.717 rows=4994877 loops=1)
               Index Cond: (project_id = 123)
 Planning time: 0.090 ms
 Execution time: 17344.182 ms
(9 rows)
QUERY PLAN                                                                                      
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2047881.69..2047881.70 rows=1 width=0) (actual time=17557.218..17557.218 rows=1 loops=1)
   ->  Bitmap Heap Scan on subscriptions  (cost=187953.70..2036810.19 rows=4428599 width=0) (actual time=1644.966..17078.378 rows=4994130 loops=1)
         Recheck Cond: ((project_id = 123) AND (trashed_at IS NULL))
         Rows Removed by Index Recheck: 23746273
         Heap Blocks: exact=131144 lossy=1480409
         ->  Bitmap Index Scan on index_subscriptions_on_project_id_and_trashed_at  (cost=0.00..186846.55 rows=4428599 width=0) (actual time=1566.163..1566.163 rows=4994749 loops=1)
               Index Cond: ((project_id = 123) AND (trashed_at IS NULL))
 Planning time: 0.084 ms
 Execution time: 17558.522 ms
(9 rows)
What is the problem? What can I do to improve the performance (i.e. count in a few seconds)?
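For what it's worth, the lossy heap blocks in both plans indicate the bitmap overflowed work_mem, forcing per-tuple rechecks of ~24M rows. A minimal sketch of the usual first remedies (the setting value is illustrative):

```sql
-- A minimal sketch: more bitmap memory avoids the lossy rechecks, and
-- a fresh visibility map lets COUNT(*) use index-only scans.
SET work_mem = '256MB';              -- session-level; size to taste
VACUUM (ANALYZE) subscriptions;      -- refresh visibility map and stats
SELECT COUNT(*) FROM subscriptions WHERE project_id = 123;
```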
collimarco (653 rep)
Aug 22, 2019, 11:54 AM • Last activity: Apr 10, 2024, 01:12 PM