
How to Make Queries on a DATETIME Column Efficient If My Primary Query Pattern is an Hour?

##### Context

Here is the DDL that I am intending to use to define the table for a logistics/delivery company.

CREATE TABLE scraping_details (
    id INT IDENTITY(1,1) PRIMARY KEY, -- Identity insert and autoincrement 
    unique_id VARCHAR2(64) NOT NULL,
    ts DATETIME NOT NULL, -- Timezone naive 
    pickup_zip VARCHAR2(6) NOT NULL,
    pickup_long NUMBER NOT NULL,
    pickup_lat NUMBER NOT NULL,
    dest_zip VARCHAR2(6) NOT NULL,
    dest_long NUMBER NOT NULL,
    dest_lat NUMBER NOT NULL,
    UNIQUE (unique_id)
);
SET IDENTITY_INSERT scraping_details OFF;

##### Query Pattern

The most frequent query pattern that I foresee will always fetch the ts, pickup_zip and dest_zip columns for a specific hour of a specific day. That is, we will want all the rows (restricted to the above columns) where ts falls between 10:00:00 am and 10:59:59 am on 19th June 2024.

##### Questions

* How should I modify the table creation command, especially ts, to make this query as efficient as possible? Would any kind of clustering or indexing on this column help? I can trade some insertion latency to make this query efficient.
* Regarding the implications of turning off the identity insert: can I insert the rows from a polars dataframe (using SQLAlchemy) where the original dataframe does _not_ have the id column? Does it mean the database will create the corresponding numbers?

##### Backend Technology

If it matters, my company is using an Oracle ADB for this purpose. I mention this because I believe different backends have different functionalities.
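
To make the hour-window lookup and the omitted-id insert concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module purely as a stand-in, not Oracle ADB, so the column types, the index name `ix_scraping_ts`, the helper `rows_for_hour`, and the sample rows are all illustrative assumptions. It shows two ideas that should carry over in spirit: a composite index led by ts, and a half-open range (`ts >= start AND ts < start + 1 hour`) in place of an inclusive upper bound of 10:59:59. The coordinate columns are left out for brevity.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory SQLite database used only as a stand-in for the real backend.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scraping_details (
        id INTEGER PRIMARY KEY,           -- assigned by SQLite when omitted
        unique_id TEXT NOT NULL UNIQUE,
        ts TEXT NOT NULL,                 -- 'YYYY-MM-DD HH:MM:SS', timezone naive
        pickup_zip TEXT NOT NULL,
        dest_zip TEXT NOT NULL
    )
    """
)

# Composite index led by ts: the range predicate can use it, and the two
# zip columns ride along so the query can be answered from the index.
conn.execute(
    "CREATE INDEX ix_scraping_ts ON scraping_details (ts, pickup_zip, dest_zip)"
)

# Hypothetical sample rows; note that no id value is supplied.
sample_rows = [
    ("a", "2024-06-19 09:59:59", "111111", "222222"),
    ("b", "2024-06-19 10:00:00", "333333", "444444"),
    ("c", "2024-06-19 10:59:59", "555555", "666666"),
    ("d", "2024-06-19 11:00:00", "777777", "888888"),
]
conn.executemany(
    "INSERT INTO scraping_details (unique_id, ts, pickup_zip, dest_zip) "
    "VALUES (?, ?, ?, ?)",
    sample_rows,
)


def rows_for_hour(connection, hour_start):
    """Return (ts, pickup_zip, dest_zip) for the hour starting at hour_start."""
    fmt = "%Y-%m-%d %H:%M:%S"
    hour_end = hour_start + timedelta(hours=1)
    # Half-open interval: includes hour_start, excludes hour_end.
    return connection.execute(
        "SELECT ts, pickup_zip, dest_zip FROM scraping_details "
        "WHERE ts >= ? AND ts < ? ORDER BY ts",
        (hour_start.strftime(fmt), hour_end.strftime(fmt)),
    ).fetchall()


hour_rows = rows_for_hour(conn, datetime(2024, 6, 19, 10))
generated_ids = [
    row[0] for row in conn.execute("SELECT id FROM scraping_details ORDER BY id")
]
```

With the sample data, `hour_rows` holds only the two rows stamped 10:00:00 and 10:59:59, and `generated_ids` shows that the database filled in the id values by itself. Whether Oracle behaves the same way depends on how the identity column is declared there, so treat this as a sketch of the idea and not as Oracle syntax.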
Asked by Della (73 rep)
Jun 17, 2024, 07:52 AM
Last activity: Jun 18, 2024, 08:07 AM