Sample Header Ad - 728x90

How do I get wildcard partition pruning in mssql 2022 , polybase and a s3-storage to work

1 vote
1 answer
103 views
Executing an OPENROWSET query with wildcards in the bulk path and the arguments in the WHERE clause seems not to work when using PolyBase to connect to S3-compatible storage. I have an on-premises SQL Server 2022 (16.0.4120.1) with PolyBase enabled. The external source is an S3-compatible object storage (Minio) running on a Linux Red Hat server. My version of the query: The S3 storage bucket is partitioned in source/year/month/date/file.txt. For the query below, the path is: testbucket/files_txt/year=2024/month=05/date=06/*.txt.
SELECT 
    r.filepath() AS filepath,
    r.filepath(1) AS [year],
    r.filepath(2) AS [month],
    COUNT_BIG(*) AS [rows]
FROM OPENROWSET( 
        BULK '/testbucket/files_txt/year=*/month=*/date=*/*.txt',
        DATA_SOURCE = 's3_ds', 
        FORMAT = 'CSV' ,
        FIRSTROW = 2,
        FIELDTERMINATOR = ';',
        ROWTERMINATOR = '\n'
    ) 
WITH ( 
    col1 varchar(800) 
) AS [r] 
WHERE 
    r.filepath(1) COLLATE Latin1_General_BIN = '2024'
    AND r.filepath(2) COLLATE Latin1_General_BIN = '05'
    AND r.filepath(3) COLLATE Latin1_General_BIN = '06'
GROUP BY 
    r.filepath(),
    r.filepath(1),
    r.filepath(2)
ORDER BY filepath;
When I execute the query as above, it takes several minutes to get back a result. However, when I run the query and replace the wildcards with the year, month, and date figures I have in the WHERE clause to the bulk string, it brings back the result in seconds. SQL Server gave the message that adding collation would enable pushdown, but I’m not sure if that is possible in this case. However, adding the collation did not seem to have any effect. Is this due to the fact that the polybase REST API does not mange to add the wildcard values to the query before it scans the storage partion scheme? I know that I can create a dynamic SQL to fix this. However, I thought this would work better since there are several examples in the documentation. Or have I missed something?
Asked by Stomik (11 rep)
Aug 15, 2024, 12:22 PM
Last activity: Aug 17, 2024, 04:49 AM