How Can I Make Queries on a DATETIME Column Efficient When the Primary Query Pattern Is a Specific Hour?
0 votes · 1 answer · 83 views
##### Context
Here is the DDL I intend to use to define the table, for a logistics/delivery company.
```sql
CREATE TABLE scraping_details (
    id          INT IDENTITY(1,1) PRIMARY KEY, -- Identity insert and autoincrement
    unique_id   VARCHAR2(64) NOT NULL,
    ts          DATETIME NOT NULL,             -- Timezone naive
    pickup_zip  VARCHAR2(6) NOT NULL,
    pickup_long NUMBER NOT NULL,
    pickup_lat  NUMBER NOT NULL,
    dest_zip    VARCHAR2(6) NOT NULL,
    dest_long   NUMBER NOT NULL,
    dest_lat    NUMBER NOT NULL,
    UNIQUE (unique_id)
);

SET IDENTITY_INSERT scraping_details OFF;
```
##### Query Pattern
The most frequent query pattern I foresee will always select the `ts`, `pickup_zip` and `dest_zip` columns for a specific hour of a specific day. That is, we will want all rows (and the above columns) where `ts` falls between 19 June 2024, 10:00:00 and 10:59:59.
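For concreteness, a sketch of the query I have in mind. The date literals are only an example, and the exact literal syntax presumably depends on the backend; I have written it as a half-open range so the last second of the hour is not an issue:

```sql
-- Example: all pickups/destinations scraped during the 10:00 hour of 19 June 2024.
-- Half-open range: >= start of the hour, < start of the next hour.
SELECT ts,
       pickup_zip,
       dest_zip
  FROM scraping_details
 WHERE ts >= TIMESTAMP '2024-06-19 10:00:00'
   AND ts <  TIMESTAMP '2024-06-19 11:00:00';
```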
##### Questions
* How should I modify the table creation command, especially the `ts` column, to make this query as efficient as possible? Would any kind of clustering or indexing on this column help (see the index sketch below)? I can trade some insertion latency to make this query efficient.
* About the implications of turning off identity insert: can I insert rows from a Polars dataframe (using SQLAlchemy) when the original dataframe does _not_ have the `id` column? Does that mean the database will generate the corresponding numbers itself (see the INSERT sketch below)?
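For the first question, this is the kind of thing I have in mind; it is only a sketch, and whether a plain index like this is the right tool (versus some form of clustering or partitioning by hour/day) is really what I am asking:

```sql
-- Hypothetical plain index on the timestamp column; would this be enough,
-- or would clustering/partitioning by hour or day serve the query better?
CREATE INDEX scraping_details_ts_ix ON scraping_details (ts);
```

For the second question, what I am hoping happens under the hood when the dataframe has no `id` column is an insert that simply omits it, roughly like the statement below. The column values here are invented purely to illustrate the shape of the statement:

```sql
-- Hypothetical insert that omits id entirely; the database would be expected
-- to generate the identity value. All values below are made up.
INSERT INTO scraping_details
    (unique_id, ts, pickup_zip, pickup_long, pickup_lat,
     dest_zip, dest_long, dest_lat)
VALUES
    ('ord-0001', TIMESTAMP '2024-06-19 10:15:00',
     '110001', 77.2090, 28.6139,
     '400001', 72.8777, 19.0760);
```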
##### Backend Technology
In case it matters, my company is using an Oracle ADB (Autonomous Database) for this purpose. I mention this because I believe different backends offer different functionality.
Asked by Della
(73 rep)
Jun 17, 2024, 07:52 AM
Last activity: Jun 18, 2024, 08:07 AM