Fastest way to split/store a long string for the charindex function
I have a 1 TB string of digits. Given a 12-character sequence of digits, I want to get the start position of this sequence in the original string (the charindex function).
I have tested this in SQL Server with a 1 GB string and a 9-digit substring, storing the string as a varchar(max): charindex takes 10 seconds. Breaking the 1 GB string up into overlapping 900-byte chunks and creating an indexed table (StartPositionOfChunk, Chunkofstring), with Chunkofstring in a binary collation, brings the search under 1 second. With this latter method, a 10-digit substring search in a 10 GB string takes 1.5 minutes. I would like to find a faster storage method.
### Example
String of digits: 0123456789, substring to search: 345
charindex('345','0123456789') gives 4
*Method 1*: I can store the string in a SQL Server table strtable consisting of one column colstr and run:
select charindex('345',colstr) from strtable
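To pin down the 1-based convention that all three methods rely on, here is a small Python model of Method 1. It is not SQL Server code: `charindex` below is a hypothetical helper that mimics the two-argument T-SQL form (1-based result, 0 when the substring is absent) and ignores NULL handling and collations other than a binary one.

```python
def charindex(needle: str, haystack: str) -> int:
    """Model of two-argument T-SQL charindex: 1-based position, 0 if absent."""
    # str.find is 0-based and returns -1 when the needle is missing,
    # so adding 1 gives exactly the 1-based / 0-for-missing convention.
    return haystack.find(needle) + 1


# Method 1: the whole string lives in one value and is scanned linearly.
colstr = "0123456789"
print(charindex("345", colstr))  # 4
print(charindex("999", colstr))  # 0
```

The linear scan is what makes Method 1 the slowest: every search touches the entire string.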
*Method 2*: Or I can create a table **strtable2 (pos,colstr1)** by splitting the original string into every 3-character substring together with its start position, **1;012 | 2;123 | 3;234** and so on, and then run the query
select pos from strtable2 where colstr1='345'
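A minimal Python sketch of Method 2, assuming hypothetical helpers `build_strtable2` and `lookup`. The dictionary stands in for the index on colstr1 (SQL Server would use a B-tree, not a hash table, but the equality lookup is the same idea), and each key maps to a list because the same digit sequence can occur at several positions.

```python
from collections import defaultdict


def build_strtable2(s: str, k: int) -> dict[str, list[int]]:
    """Model of Method 2: one (pos, colstr1) row per start position.

    Every k-character window is stored with its 1-based start position,
    so a string of n characters produces n - k + 1 rows.
    """
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(s) - k + 1):
        table[s[i:i + k]].append(i + 1)  # 1-based, like charindex
    return table


def lookup(table: dict[str, list[int]], needle: str) -> list[int]:
    """Equivalent of: select pos from strtable2 where colstr1 = needle."""
    # .get does not insert missing keys, even on a defaultdict.
    return table.get(needle, [])


table = build_strtable2("0123456789", 3)
print(lookup(table, "345"))                  # [4]
print(sum(len(v) for v in table.values()))   # 8 rows
```

The lookup itself is a single key probe, but the row count (n - k + 1) is what drives the storage cost.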
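*Method 3*: I can create a table **strtable2 (pos2,colstr2)** by splitting the original string into **larger chunks 1;01234 | 4;34567 | 7;6789**, where consecutive chunks overlap by 2 characters (the substring length minus 1) so that no match is cut off at a chunk boundary, and then run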
select pos2+charindex('345',colstr2)-1 from strtable2 where colstr2 like '%345%'
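A Python sketch of Method 3 under stated assumptions: `build_chunks` and `search` are hypothetical helpers, and `find` plays the role of both the LIKE filter and charindex. It shows why the chunks must overlap by k - 1 characters and why the position is pos2 + charindex - 1: pos2 and charindex are both 1-based, so adding them directly would count the chunk's first character twice.

```python
def build_chunks(s: str, chunk_len: int, k: int) -> list[tuple[int, str]]:
    """Model of Method 3: overlapping (pos2, colstr2) rows.

    Consecutive chunks overlap by k - 1 characters, so every k-character
    match lies entirely inside exactly one chunk.
    """
    if chunk_len < k:
        raise ValueError("chunk_len must be at least the search length k")
    step = chunk_len - (k - 1)
    rows = []
    # Only 0-based start positions 0 .. n-k can begin a match.
    for start in range(0, len(s) - k + 1, step):
        rows.append((start + 1, s[start:start + chunk_len]))  # 1-based pos2
    return rows


def search(rows: list[tuple[int, str]], needle: str) -> list[int]:
    """Model of: select pos2 + charindex(needle, colstr2) - 1
    from strtable2 where colstr2 like '%' + needle + '%'"""
    results = []
    for pos2, chunk in rows:
        offset = chunk.find(needle)  # -1 means the LIKE filter rejects the row
        if offset != -1:
            # offset is 0-based, i.e. charindex - 1
            results.append(pos2 + offset)
    return results


rows = build_chunks("0123456789", 5, 3)
print(rows)                 # [(1, '01234'), (4, '34567'), (7, '6789')]
print(search(rows, "345"))  # [4]
```

Like the SQL query, this reports only the first occurrence inside each chunk; a chunk containing the substring twice would need a loop over charindex's optional start-position argument.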
First method is the slowest.
The second method blows up the database storage size, because it stores a separate substring-length row for every position of the original string.
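To put rough numbers on the storage difference, here is a back-of-the-envelope calculation in Python. It assumes 1 byte per digit (varchar, not nvarchar) and counts only the chunk column itself, ignoring the position column, row overhead and index structures, so real sizes would be larger; `method2_rows` and `method3_rows` are hypothetical helper names.

```python
def method2_rows(n: int, k: int) -> int:
    """One row per possible start position of a k-digit match."""
    return n - k + 1


def method3_rows(n: int, k: int, chunk_len: int) -> int:
    """Each chunk covers chunk_len - k + 1 start positions (overlap k - 1)."""
    step = chunk_len - k + 1
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-(n - k + 1) // step)


n, k = 10**12, 12  # 1 TB string, 12-digit searches
m2 = method2_rows(n, k) * k         # bytes of colstr1 data
m3 = method3_rows(n, k, 900) * 900  # bytes of colstr2 data
print(f"Method 2 chunk column: {m2 / 1e12:.2f} TB")  # 12.00 TB
print(f"Method 3 chunk column: {m3 / 1e12:.2f} TB")  # 1.01 TB
```

With 900-byte chunks the overlap costs only about 1% extra, while one row per position multiplies the data by the substring length.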
*Method 3*: With colstr2 set to 900 bytes in a binary collation and an index on this column, a 9-digit substring search in a 1 GB string takes 1 second. For a 10 GB string and a 10-digit substring it takes 90 seconds.
**Any other ideas on how to make this faster (maybe by exploiting the fact that the string consists only of digits, e.g. with long integers)?**
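One way to read the "long integers" idea, sketched in Python as an assumption rather than a tested SQL Server design: a 12-digit window is at most 999,999,999,999, so it fits in a bigint (8 bytes) instead of a 12-character string. Because the window length is fixed, leading zeros lose nothing ('012' maps one-to-one to 12). The hypothetical helper `window_keys` computes every window's value with a rolling update instead of re-parsing each substring; note that this only shrinks Method 2's key, and it still stores one row per position.

```python
from typing import Iterator


def window_keys(s: str, k: int) -> Iterator[tuple[int, int]]:
    """Yield (1-based position, integer value) for every k-digit window.

    Assumes s contains only the characters '0'-'9'. Rolling update: drop
    the leading digit by taking the value modulo 10**(k-1), then shift
    left one decimal place and add the new digit.
    """
    if not 1 <= k <= 18:  # 10**18 - 1 still fits in a signed 64-bit bigint
        raise ValueError("k must be between 1 and 18 for a 64-bit key")
    mod = 10 ** (k - 1)
    value = 0
    for i, ch in enumerate(s):
        value = (value % mod) * 10 + (ord(ch) - ord("0"))
        if i >= k - 1:  # a full window of k digits ends at index i
            yield i - k + 2, value


target = int("345")
print([pos for pos, key in window_keys("0123456789", 3) if key == target])  # [4]
```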
The search is always for a 12-digit substring in a 1 TB string of digits, on SQL Server 2017 Developer Edition with 16 cores and 16 GB RAM. The primary goal is search speed. For performance testing I use 10-digit substrings in a 10 GB string.
Asked by Werner Aumayr
(181 rep)
Feb 15, 2019, 07:34 PM
Last activity: Feb 19, 2019, 01:35 AM