
Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
1 answer
40 views
Why is my mean much bigger than my Execution Time when using hyperfine to benchmark my query performance?
```none
                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=21267.11..21286.98 rows=461 width=31) (actual time=1.711..1.712 rows=1 loops=1)
   Group Key: *
   ->  Sort  (cost=21267.11..21268.91 rows=719 width=35) (actual time=1.564..1.591 rows=719 loops=1)
         Sort Key: *
         Sort Method: quicksort  Memory: 69kB
         ->  Nested Loop  (cost=70.03..21233.00 rows=719 width=35) (actual time=0.483..1.454 rows=719 loops=1)
               ->  Index Scan using *  (cost=0.28..8.30 rows=1 width=27) (actual time=0.017..0.018 rows=1 loops=1)
                     Index Cond: *
               ->  Bitmap Heap Scan on measurements m  (cost=69.75..21213.91 rows=719 width=32) (actual time=0.240..0.994 rows=719 loops=1)
                     Recheck Cond: *
                     Filter: *
                     Rows Removed by Filter: 5241
                     Heap Blocks: exact=50
                     ->  Bitmap Index Scan on *  (cost=0.00..69.57 rows=6018 width=0) (actual time=0.224..0.224 rows=5960 loops=1)
                           Index Cond: *
 Planning Time: 0.697 ms
 Execution Time: 1.766 ms
(17 rows)
```

```none
                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=21267.11..21286.98 rows=461 width=31) (actual time=0.897..0.898 rows=1 loops=1)
   Group Key: *
   ->  Sort  (cost=21267.11..21268.91 rows=719 width=35) (actual time=0.795..0.831 rows=719 loops=1)
         Sort Key: *
         Sort Method: quicksort  Memory: 69kB
         ->  Nested Loop  (cost=70.03..21233.00 rows=719 width=35) (actual time=0.178..0.718 rows=719 loops=1)
               ->  Index Scan using *  (cost=0.28..8.30 rows=1 width=27) (actual time=0.004..0.005 rows=1 loops=1)
                     Index Cond: *
               ->  Bitmap Heap Scan on measurements m  (cost=69.75..21213.91 rows=719 width=32) (actual time=0.081..0.457 rows=719 loops=1)
                     Recheck Cond: *
                     Filter: *
                     Rows Removed by Filter: 5241
                     Heap Blocks: exact=50
                     ->  Bitmap Index Scan on *  (cost=0.00..69.57 rows=6018 width=0) (actual time=0.073..0.073 rows=5960 loops=1)
                           Index Cond: *
 Planning Time: 0.336 ms
 Execution Time: 0.929 ms
(17 rows)
```

```none
                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=21267.11..21286.98 rows=461 width=31) (actual time=0.873..0.873 rows=1 loops=1)
   Group Key: *
   ->  Sort  (cost=21267.11..21268.91 rows=719 width=35) (actual time=0.794..0.813 rows=719 loops=1)
         Sort Key: *
         Sort Method: quicksort  Memory: 69kB
         ->  Nested Loop  (cost=70.03..21233.00 rows=719 width=35) (actual time=0.168..0.717 rows=719 loops=1)
               ->  Index Scan using *  (cost=0.28..8.30 rows=1 width=27) (actual time=0.004..0.004 rows=1 loops=1)
                     Index Cond: *
               ->  Bitmap Heap Scan on measurements m  (cost=69.75..21213.91 rows=719 width=32) (actual time=0.071..0.457 rows=719 loops=1)
                     Recheck Cond: *
                     Filter: *
                     Rows Removed by Filter: 5241
                     Heap Blocks: exact=50
                     ->  Bitmap Index Scan on *  (cost=0.00..69.57 rows=6018 width=0) (actual time=0.063..0.063 rows=5960 loops=1)
                           Index Cond: *
 Planning Time: 0.304 ms
 Execution Time: 0.903 ms
(17 rows)
```

```none
Time (mean ± σ):      98.1 ms ±  28.1 ms    [User: 30.7 ms, System: 11.1 ms]
Range (min … max):    75.6 ms … 129.5 ms    3 runs
```
I'm using hyperfine to benchmark the performance of my query in PostgreSQL, with the --runs 3 option to run it three times. As you can see, the execution times reported for the three runs are 1.766, 0.929 and 0.903 ms respectively.

My question is: why is the mean 98.1 ms? What does this mean represent? It does not make sense that the execution time is between 0.9 ms and 1.7 ms while their mean is 98.1 ms. I also executed the same query in Postico and it took 0.903 ms. I'm curious what the mean represents if it is not the average execution time.
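The gap is consistent with what hyperfine actually measures: it times the entire client command (process startup, connection handshake, query, teardown), while EXPLAIN ANALYZE's Execution Time covers only server-side query execution. A self-contained sketch of the same effect, with no database involved (pure Python, an editor's illustration rather than anything from the original post):

```python
import subprocess
import sys
import time

# Time a whole child process, as hyperfine does:
# interpreter startup + the work + teardown.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)
whole_process_ms = (time.perf_counter() - start) * 1000

# Time only the "work" itself, as EXPLAIN ANALYZE's Execution Time does.
start = time.perf_counter()
_ = 1 + 1  # the actual work is trivial
work_only_ms = (time.perf_counter() - start) * 1000

# Process startup dominates, just as psql startup and connection setup
# dominate when the query itself takes only ~1 ms.
print(f"whole process: {whole_process_ms:.1f} ms, work only: {work_only_ms:.4f} ms")
```

The tens of milliseconds of process overhead dwarf the sub-millisecond work, which is exactly the relationship between the 98.1 ms hyperfine mean and the ~1 ms server-side execution time.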
Nuh Jama (1 rep)
Apr 26, 2024, 01:57 PM • Last activity: Apr 27, 2024, 06:53 PM
0 votes
1 answer
115 views
Why does MariaDB execution time double for the same query (LOAD DATA INFILE)?
I observed a strange behaviour regarding the execution time of a query that imports a CSV file into an empty table created beforehand: the execution time increases as the import is repeated. I hit this behaviour while importing the same medium-sized CSV file (0.6 GB, 6 columns, 8 million rows) 10 times, using TRUNCATE then LOAD DATA INFILE, repeated within one MariaDB connection.

On the first iteration, the CSV import takes 40 seconds, then about 50 seconds on the second iteration, and from the third to the 10th iteration the execution time reaches a plateau at 85 ± 5 s.

I performed the test twice, on the mariadb shell (aliased "mysql" on GNU/Linux) and in python3 using mysql.connector, and I get the same result, i.e. an execution time that doubles (see figure).

[Figure: execution time vs. iteration. An empty table is created, then the following is repeated 10 times: the CSV file is imported and the table is emptied using TRUNCATE. Only the import time (LOAD DATA INFILE) is shown.]

**What could explain (or avoid) the execution time doubling between the first and the third iteration?**

Steps to reproduce the behaviour:

1. Initiation: create (just once) the empty table with a generated primary key (idRow):

```sql
CREATE TABLE myTable (
    col1 VARCHAR(14), col2 VARCHAR(14), col3 VARCHAR(10),
    col4 VARCHAR(5),  col5 VARCHAR(5),  col6 VARCHAR(19),
    idRow INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (idRow)
);
```

2. Repeat steps A and B 10 times and collect the execution time of step B for each iteration:

A. Empty the table using TRUNCATE:

```sql
TRUNCATE TABLE myTable;
```

B. Then import the 0.6 GB CSV file of 8 million rows and 6 columns:

```sql
LOAD DATA INFILE "/myData/myFile.csv" INTO TABLE myTable
    FIELDS TERMINATED BY ","
    LINES TERMINATED BY "\n"
    IGNORE 1 ROWS
    (col1, col2, col3, col4, col5, col6)
    SET idRow = NULL;
```

Any help understanding this phenomenon would be welcome; don't hesitate to ask for more info.

*Why do I do this? The goal is to build a procedure to robustly measure the statistics of the execution time of any query, and how much it fluctuates determines the number of iterations needed for a relevant sample size. I was surprised that any query could fluctuate by 100% in execution time.*

Giorgio

MariaDB server:
- OS: Linux Mint 20
- mariadb version: 10.3.38-MariaDB-0ubuntu0.20.04.1-log
- innodb version: 10.3.38

**[Update]** I made other interesting observations:

(i) Within the same OS session (i.e. no reboot): closing the mariadb connection, or restarting the mariadb service (systemctl restart mariadb), does not prevent the 2nd iteration from getting slower (50 to 87 s) than the first.

(ii) After rebooting the OS, the B query gets fast again (40 s).
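Whatever the root cause (the reboot observation suggests cold vs. warm OS page cache and InnoDB buffer pool state), a robust timing procedure of the kind described should treat the first iterations as warm-up and compute statistics only over the steady state. A minimal sketch in Python; the `benchmark` helper and the workload lambda are hypothetical stand-ins, not the original mysql.connector code:

```python
import statistics
import time

def benchmark(run_once, runs=10, warmup=2):
    """Time run_once() `runs` times; report stats over the steady-state
    iterations only, discarding the first `warmup` (cold-cache) runs."""
    times_s = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        times_s.append(time.perf_counter() - start)
    steady = times_s[warmup:]
    return {
        "mean": statistics.mean(steady),
        "stdev": statistics.stdev(steady),
        "min": min(steady),
        "max": max(steady),
    }

# Hypothetical stand-in for the TRUNCATE + LOAD DATA INFILE step.
stats = benchmark(lambda: sum(range(200_000)))
print(stats)
```

Comparing the discarded warm-up times against the steady-state mean also quantifies exactly the kind of 2× drift observed here.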
GiorgioAbitbolo (1 rep)
Jun 8, 2023, 07:40 PM • Last activity: Jun 10, 2023, 06:57 AM
6 votes
3 answers
1323 views
Is there an option or hint to improve the performance of a query with multiple values in the IN clause?
We have a table CustomerNote with 4 columns: ID, CustomerID, Note, Date. There is an index on (CustomerID asc, Date desc). When the following query is executed:

```sql
select top 30 Date
from CustomerNote
where CustomerID in (1, 5)
order by Date desc
```

the index is used, but it still fetches ALL CustomerNote rows for CustomerIDs 1 and 5 before sorting and applying the top, causing a lot of CPU usage. [execution plan screenshot] This is due to the multiple values in the IN clause.

I know that the IN clause will never have more than 10 values, so it would be a much better approach if SQL Server iterated over the 10 values, fetched at least 30 rows per CustomerID, and then merged, sorted and applied the top. Is there a query hint or option to achieve this?
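One common rewrite for this pattern on SQL Server, offered here as an untested sketch against the schema as described, is to drive one index seek per CustomerID with CROSS APPLY, so each customer contributes at most 30 rows before the final sort:

```sql
SELECT TOP 30 d.Date
FROM (VALUES (1), (5)) AS c(CustomerID)   -- the small, known list of IDs
CROSS APPLY (
    SELECT TOP 30 n.Date
    FROM CustomerNote AS n
    WHERE n.CustomerID = c.CustomerID     -- one seek on the (CustomerID, Date DESC) index
    ORDER BY n.Date DESC
) AS d
ORDER BY d.Date DESC;
```

With the existing index on (CustomerID asc, Date desc), each apply branch can satisfy its TOP 30 directly from the index, so at most 10 × 30 rows reach the final sort instead of every note for the listed customers.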
MichaelD (573 rep)
May 2, 2023, 01:30 PM • Last activity: May 3, 2023, 10:48 AM
Showing page 1 of 3 total questions