How Can I Make Queries on a DATETIME Column Efficient When the Primary Query Pattern Is a Specific Hour?
0 votes · 1 answer · 83 views
##### Context
Here is the DDL I intend to use to define the table, for a logistics/delivery company.
```sql
CREATE TABLE scraping_details (
    id          INT IDENTITY(1,1) PRIMARY KEY, -- Identity insert and autoincrement
    unique_id   VARCHAR2(64) NOT NULL,
    ts          DATETIME NOT NULL,             -- Timezone naive
    pickup_zip  VARCHAR2(6) NOT NULL,
    pickup_long NUMBER NOT NULL,
    pickup_lat  NUMBER NOT NULL,
    dest_zip    VARCHAR2(6) NOT NULL,
    dest_long   NUMBER NOT NULL,
    dest_lat    NUMBER NOT NULL,
    UNIQUE (unique_id)
);

SET IDENTITY_INSERT scraping_details OFF;
```
##### Query Pattern
The most frequent query pattern I foresee will always select the `ts`, `pickup_zip` and `dest_zip` columns for a specific hour of a specific day. That is, we will want all rows (and the above columns) where `ts` falls between 19 June 2024, 10:00:00 and 10:59:59.
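For concreteness, a sketch of the query I have in mind. The date literals are only an example, and the exact literal syntax presumably depends on the backend; I have written it as a half-open range so the last second of the hour is not an issue:

```sql
-- Example: all pickups/destinations scraped during the 10:00 hour of 19 June 2024.
-- Half-open range: >= start of the hour, < start of the next hour.
SELECT ts,
       pickup_zip,
       dest_zip
  FROM scraping_details
 WHERE ts >= TIMESTAMP '2024-06-19 10:00:00'
   AND ts <  TIMESTAMP '2024-06-19 11:00:00';
```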
##### Questions
* How should I modify the table creation command, especially the `ts` column, to make this query as efficient as possible? Would any kind of clustering or indexing on this column help (see the index sketch below)? I can trade some insertion latency to make this query efficient.
* About the implications of turning off identity insert: can I insert rows from a Polars dataframe (using SQLAlchemy) when the original dataframe does _not_ have the `id` column? Does that mean the database will generate the corresponding numbers itself (see the INSERT sketch below)?
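For the first question, this is the kind of thing I have in mind; it is only a sketch, and whether a plain index like this is the right tool (versus some form of clustering or partitioning by hour/day) is really what I am asking:

```sql
-- Hypothetical plain index on the timestamp column; would this be enough,
-- or would clustering/partitioning by hour or day serve the query better?
CREATE INDEX scraping_details_ts_ix ON scraping_details (ts);
```

For the second question, what I am hoping happens under the hood when the dataframe has no `id` column is an insert that simply omits it, roughly like the statement below. The column values here are invented purely to illustrate the shape of the statement:

```sql
-- Hypothetical insert that omits id entirely; the database would be expected
-- to generate the identity value. All values below are made up.
INSERT INTO scraping_details
    (unique_id, ts, pickup_zip, pickup_long, pickup_lat,
     dest_zip, dest_long, dest_lat)
VALUES
    ('ord-0001', TIMESTAMP '2024-06-19 10:15:00',
     '110001', 77.2090, 28.6139,
     '400001', 72.8777, 19.0760);
```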
##### Backend Technology
In case it matters, my company is using an Oracle ADB (Autonomous Database) for this purpose. I mention this because I believe different backends offer different functionality.
Asked by Della
(73 rep)
Jun 17, 2024, 07:52 AM
Last activity: Jun 18, 2024, 08:07 AM