Sample Header Ad - 728x90

Time based incremental load query optimization and index design

1 vote
1 answer
233 views
**Problem** I try to optimize an extract/load query on a highly normalized table structure. Extract/load is implementing incremental loading using a datetime column to determine only records which have changed since the last extract/load. The extract/load query after joins selects hundreds of millions of records. Each individual table joined has a date column "last_modified" which indicates when a record was modified last. To determine the combined last modified date of an joined record I select the greatest value of "last_modified" of each individual table. This solution seams not to scale as with hundred of millions of records the filter predicate on the combined "last_modified" does not use any index and requires full table access on all individual tables and takes too long to complete (1h+) just to filter after the combined "last_modified" information. **Background** - Oracle database (multiple versions starting from 11g) - I'm not a dba or have much Oracle know how - Extract/load process (especially delta loading) is not time critical. If I can guarantee that data is up to date/extracted every 24h its fine. **Sample** The following is a simplified sample setup. Assume there is a index on reference_key.
CREATE TABLE shipment (
    key VARCHAR2(10) NOT NULL,
    last_modified DATE,  -- represents when a record in shipment was last modified
    CONSTRAINT shipment_pk PRIMARY KEY (key)
);

CREATE TABLE invoice (
    key VARCHAR2(10) NOT NULL,
    reference_key VARCHAR2(10),
    last_modified DATE, -- represents when a record in invoice was last modified
    CONSTRAINT invoice_pk PRIMARY KEY (key), 
    CONSTRAINT invoice_fk_shipment FOREIGN KEY (reference_key) REFERENCES shipment(key)
);

CREATE TABLE line (
    key VARCHAR2(10) NOT NULL,
    reference_key VARCHAR2(10),
    last_modified DATE, -- represents when a record in line was last modified   
    CONSTRAINT line_pk PRIMARY KEY (key),
    CONSTRAINT line_fk_invoice FOREIGN KEY (reference_key) REFERENCES invoice(efksid)
);

--To get flat de-normalized records a join like the following is needed

SELECT
    shipment.key || '-' || line.key || '-' || invoice.key AS key,
    GREATEST(line.last_modified, invoice.last_modified, shipment.last_modified) AS last_modified
FROM
    line
    JOIN invoice ON line.reference_key = invoice.key
    JOIN shipment ON invoice.reference_key = shipment.key;
To extract/load data since the next 100000 records from a given date I could run the following.
SELECT *
FROM (
    SELECT
        shipment.key || '-' || line.key || '-' || invoice.key AS key,
        GREATEST(line.last_modified, invoice.last_modified, shipment.last_modified) AS last_modified
    FROM
        line
        JOIN invoice ON line.reference_key = invoice.key
        JOIN shipment ON invoice.reference_key = shipment.key)
WHERE last_modified > '${target_last_modified}'
ORDER BY last_modified;
FETCH NEXT 100000 ROWS ONLY;
The problem this query does not finish/feasible with hundred of millions of records (can even go up to 1-2 billion records). **What I have tried** - Indices on the individual last_modified columns seams not to help as it is not considered? when using greatest and still does a full table access scan. - A functional/and computed column(with an index) seams also not an option as it would need to incorporate multiple tables which is not possible? in Oracle. - Materialized views to preselect the combined last_modified column and create an index on it. If possible I would like to avoid those. I think I think I could not use a refresh-on-commit materialized view due to the high transactions volume on used tables. **Questions** - In this scenario are there any tricks using indices which could speed up my query using the last_modified field. - If materialized views are my only options what kind of refresh strategies would be feasible. A full refresh would (very likely) take too long. Are there any incremental refresh strategies - if yes would I not just move my extract/load problem from the query to the materialized view? Thanks for any input/hint given!
Asked by Clemens K (11 rep)
Mar 8, 2023, 02:46 PM
Last activity: Jun 4, 2025, 02:01 PM