Fastest way to split/store a long string for charindex function

8 votes
2 answers
1705 views
I have a 1 TB string of digits. Given a 12-character sequence of digits, I want to get the start position of this sequence in the original string (the charindex function).

I have tested this in SQL Server with a 1 GB string and a 9-digit substring, storing the string as a varchar(max). Charindex takes 10 secs. When I break the 1 GB string into overlapping 900-byte chunks and store them in a table (StartPositionOfChunk, Chunkofstring), with Chunkofstring indexed and in a binary collation, the search takes under 1 sec. With this latter method, a 10 GB string and a 10-digit substring push the search time up to 1.5 min. I would like to find a faster storage method.

### Example

String of digits: 0123456789, substring to search: 345

charindex('345','0123456789') gives 4

*Method 1*: I can store this in a SQL Server table strtable consisting of one column colstr and perform:
select charindex('345',colstr) from strtable
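For reference, the semantics Method 1 relies on can be modeled in a few lines of Python (the language used for all sketches here; the helper name charindex is only a local stand-in, not a SQL Server API). CHARINDEX returns a 1-based position, or 0 when the substring is absent; this sketch ignores the optional start_location argument and collation rules. Like Method 1, it scans the string from the beginning, so its cost grows with the length of the string.

```python
def charindex(needle: str, haystack: str) -> int:
    """Model of T-SQL CHARINDEX(needle, haystack): the 1-based position of the
    first occurrence, or 0 when needle does not occur (start_location ignored)."""
    # str.find returns a 0-based index, or -1 when absent; adding 1 maps both cases
    return haystack.find(needle) + 1


print(charindex("345", "0123456789"))  # → 4
print(charindex("999", "0123456789"))  # → 0
```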
*Method 2*: Or I can build a table **strtable2 (pos,colstr1)** by splitting the original string into every substring of the search length: **1;012 | 2;123 | 3;234**, and so on, and then run the query:
select pos from strtable2 where colstr1='345'
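A minimal in-memory model of Method 2, assuming one row per start position keyed by the window of the search length k. Here a Python dict stands in for strtable2 and its index; build_window_table and lookup are hypothetical names. The model also makes the storage problem visible: the table holds len(s) - k + 1 windows of k characters each, i.e. roughly k times the size of the original string.

```python
from collections import defaultdict


def build_window_table(s: str, k: int) -> dict[str, list[int]]:
    """Model of strtable2 (pos, colstr1): every k-character window of s,
    mapped to the 1-based start positions where it occurs."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(s) - k + 1):
        table[s[i:i + k]].append(i + 1)
    return dict(table)


def lookup(table: dict[str, list[int]], needle: str) -> list[int]:
    # Equivalent of: select pos from strtable2 where colstr1 = needle
    return table.get(needle, [])


t = build_window_table("0123456789", 3)
print(lookup(t, "345"))  # → [4]
```

As a back-of-envelope figure for the real case: with k = 12 and a 1 TB string there are about 10^12 rows, each holding a 12-character key plus a position that needs a bigint (positions exceed the int range), so about 20 bytes per original digit before row and index overhead.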
*Method 3*: I can build a table **strtable2 (pos2,colstr2)** by splitting the original string into **larger overlapping chunks 1;01234 | 4;34567 | 7;6789** and then run:
select pos2+charindex('345',colstr2)-1 from strtable2 where colstr2 like '%345%'
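The chunking of Method 3 can be sketched as follows, assuming chunks of chunk_len characters whose start positions are chunk_len - k + 1 apart, so neighbouring chunks overlap by k - 1 characters (2 in the example, which reproduces 1;01234 | 4;34567 | 7;6789). With that overlap, every k-character match lies wholly inside exactly one chunk. The function names are hypothetical. The search mirrors the Method 3 query; because both pos2 and CHARINDEX are 1-based, the position in the full string is pos2 + charindex - 1.

```python
def build_chunks(s: str, chunk_len: int, k: int) -> list[tuple[int, str]]:
    """Model of strtable2 (pos2, colstr2): (1-based start, chunk) rows whose
    chunks overlap by k - 1 characters, so no k-character match is split."""
    if chunk_len < k:
        raise ValueError("chunk_len must be at least the search length k")
    step = chunk_len - k + 1
    chunks = []
    for start in range(0, len(s), step):
        chunks.append((start + 1, s[start:start + chunk_len]))
        if start + chunk_len >= len(s):
            break  # this chunk already reaches the end of the string
    return chunks


def search_first(chunks: list[tuple[int, str]], needle: str) -> list[int]:
    """Mirror of: select pos2 + charindex(needle, colstr2) - 1
                  from strtable2 where colstr2 like '%needle%'"""
    results = []
    for pos2, chunk in chunks:
        if needle in chunk:  # the LIKE '%needle%' filter
            ci = chunk.find(needle) + 1  # CHARINDEX within the chunk (1-based)
            results.append(pos2 + ci - 1)  # both are 1-based, so subtract 1
    return results


chunks = build_chunks("0123456789", 5, 3)
print(chunks)                       # → [(1, '01234'), (4, '34567'), (7, '6789')]
print(search_first(chunks, "345"))  # → [4]
```

CHARINDEX returns only the first match within each chunk, but since each match belongs to exactly one chunk, the smallest value returned (for example, by wrapping the expression in MIN()) is the first occurrence in the whole string.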
The first method is the slowest. The second method blows up the database storage size!

*Method 3*: With colstr2 set to 900 bytes in a binary collation and an index on this column, the search takes 1 sec for a 1 GB string and a 9-digit substring. For a 10 GB string and a 10-digit substring it takes 90 secs.

**Any other idea how to make this faster (maybe by exploiting the fact that the string consists only of digits, e.g. with long integers, ...)?**

The search is always for a 12-digit substring in a 1 TB string of digits (for performance testing I use 10 digits in a 10 GB string). Environment: SQL Server 2017 Developer Edition, 16 cores, 16 GB RAM. The primary goal is search speed!
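One way to read the "long integers" idea, sketched under assumptions (not benchmarked, and not something the post reports trying): because the search is always for exactly 12 digits, every 12-digit window can be read as an integer key below 10^12, which fits in a SQL Server bigint (8 bytes) instead of a 12-character string. The key can be updated in a rolling fashion as the window slides one digit at a time. window_keys and pos_of are hypothetical names; a real table would be a Method 2 variant with (key bigint, pos bigint) columns.

```python
def window_keys(digits: str, k: int = 12):
    """Yield (pos, key) for every k-digit window of `digits`: pos is the 1-based
    start, key is the window read as a base-10 integer. The mapping is exact
    (not a hash) because all windows have the same length k."""
    mod = 10 ** k
    key = 0
    for i, ch in enumerate(digits):
        # shift in the new digit and drop the one that left the window
        key = (key * 10 + int(ch)) % mod
        if i >= k - 1:
            yield i - k + 2, key


# Hypothetical lookup table: key -> first (smallest) start position
pos_of = {}
for pos, key in window_keys("0123456789", k=3):
    pos_of.setdefault(key, pos)

print(pos_of[int("345")])  # → 4
```

This shrinks each row's key but not the row count: a 1 TB string still yields about 10^12 windows, so the Method 2 storage concern remains, only reduced (16 bytes of key and position per digit instead of about 20, before overhead). The 12-digit search string is converted with the same rule, so leading zeros are harmless.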
Asked by Werner Aumayr (181 rep)
Feb 15, 2019, 07:34 PM
Last activity: Feb 19, 2019, 01:35 AM