Sample Header Ad - 728x90

Why does SQL Server Full Text Search (FTS) rank go down when match count goes up due to highly rare word

5 votes
1 answer
111 views
Given this query: declare @aboutPredicateOpt nvarchar(4000) = N'IsAbout( PICCO weight(0.1), IC228 weight(0.1) )'; select RowId, BrandId, ObjectId, IsActive, OrderNumber, SearchableDescription, Rank from QuickSearchProducts_Temp qsp join Containstable(QuickSearchProducts_Temp, SearchableDescription, @aboutPredicateOpt) as ct on ct.[key] = qsp.RowId where qsp.IsActive = 1 order by iif(OrderNumber in ( '6403072'), 0, 1), ct.rank desc A number of rows are returned that contain both terms. Note the row of interest has artificially been moved to top (using "order by" even though its rank is same). Two non-rare terms Now add a very rare term to the aboutPredicateOpt: declare @aboutPredicateOpt nvarchar(4000) = N'IsAbout( 15301530 weight(0.1), PICCO weight(0.1), IC228 weight(0.1) )'; And the rank for the row of interest drops dramatically even though it is the only row (of 50k) that contains that (rare) string. Add Rare Term And for just the rare word: declare @aboutPredicateOpt nvarchar(4000) = N'IsAbout( 15301530 weight(0.1) )'; Only that row is returned which is great. Only the rare Chats with AI said the FTS formula likely involves overall rarity across all rows and concluded that high rarity should increase the rank when a match is found. Seems to be the opposite though. Note that each row's SearchableDescription contains a distinct set of terms so that the frequency per row of any given term matches should always be one (ignoring wildcards searches for now). Now a few ways to compensate for this are being considered: - Increase weights for "longer" or more complex terms or rarity (from histogram). - Calculate a ranking that counts matches using like and combine it into the main ranking. But mainly I want to understand why the Rank **drops by 50%** from the top tier it was formally in. (Sql Server 2022) Table Definition with FTS index: CREATE TABLE [QuickSearchProducts_Temp] ( RowId int IDENTITY(1, 1) NOT NULL, ObjectId bigint NOT NULL, BrandId smallint NOT NULL DEFAULT ((0)), OrderNumber nvarchar(200) NOT NULL, IsActive bit NOT NULL, SearchableDescription nvarchar(1000) NULL ); ALTER TABLE QuickSearchProducts_Temp add constraint PK_QSP primary key (RowId); CREATE FULLTEXT CATALOG QuickSearchFullTextCatalog AS DEFAULT; CREATE FULLTEXT INDEX ON QuickSearchProducts_Temp ( SearchableDescription ) KEY INDEX PK_QSP ON QuickSearchFullTextCatalog WITH CHANGE_TRACKING AUTO; set identity_insert QuickSearchProducts_Temp ON; -- The following inserts only 10k rows of the original 50k -- but still illustrates the point with similar results. -- insert into QuickSearchProducts_Temp (RowId, ObjectId, BrandId, OrderNumber, IsActive, SearchableDescription) select Number as RowId, Number+10000 as ObjectId, Number % 5 as BrandId, iif(Number = 0, '6403072', convert(varchar(30), 10000-Number)) as OrderNumber, 1 as IsActive, SearchableDescription = case when Number = 0 then '15301530 Picco Ic228 bing bang boom' when Number <= 410 then 'Picco Ic228 bing bang boom' when Number <= 959 then 'Picco bing bang boom' else 'lotta other rows ' + convert(varchar(10), Number) end -- for this use numbers table or adjust https://www.sqlservercentral.com/steps/bones-of-sql-the-calendar-table from Common.NumberSequence_0_10000_View where Number < 10000
Asked by crokusek (2162 rep)
Aug 20, 2025, 06:45 AM
Last activity: Aug 21, 2025, 08:37 PM