Performance gap between WHERE IN (1,2,3,4) vs IN (select * from STRING_SPLIT('1,2,3,4',','))

3 votes

3 answers

2132 views

sql-server performance execution-plan string-splitting query-performance

I seem to be having a huge performance gap between using hard coded values for a SELECT IN, vs a STRING_SPLIT. The query plans are identical except for the last phase where the index seek is getting executed multiple times for the STRING_SPLIT code. The outcome is a CPU time of about 90000 vs about 15000 (according to dm_exec_query_stats), so the difference is enormous. I have posted the 2 plans here... 1. Hardcoded Plan 2. String Split Plan The interesting thing is the query plan shows the cost as being pretty much the same, but in when I check dm_exec_query_stats the cost (last_worker_time) is massively different. Here are 2 outputs from the query plan... 0x79DEAD79D1F149CD 16199 select * from fn_get_samples(1) s where s.sample_id in (2495,2496,2497,2498,2499,2500,2501,2502,2503,2504) 0x4A073840486B252C 86689 select * from fn_get_samples(1) s where s.sample_id in (select value as id from STRING_SPLIT('2495,2496,2497,2498,2499,2500,2501,2502,2503,2504',',')) The function code is...

CREATE FUNCTION [dbo].[fn_get_samples]
(
	@user_id int
)
RETURNS TABLE
AS
RETURN (
	-- get samples
	select s.sample_id,language_id,native_language_id,s.source_sentence,s.markup_sentence,s.latin_sentence,
		s.translation_source_sentence,s.translation_markup_sentence,s.translation_latin_sentence,
		isnull(sample_vkl.knowledge_level_id,1) as vocab_knowledge_level_id,
		isnull(sample_gkl.grammar_knowledge_level_id,0) as grammar_knowledge_level_id,
		s.polite_level_id,
		case when isnull(tr1.leitner_deck_index,0)=0 then 0 else cast((tr1.leitner_deck_index-1)  as float)/cast((max_leitner_deck_index-1) as float) end as progress_percentage,
		case when isnull(tr2.leitner_deck_index,0)=0 then 0 else cast((tr2.leitner_deck_index-1)  as float)/cast((max_leitner_deck_index-1) as float) end as listening_progress_percentage,
		case when f.object_id is null then 0 else 1 end as is_favorite,
		case when st.object_id is null then 0 else 1 end as is_studied,
		s.has_error,
		s.is_deleted,
		f.create_datetime as favorite_datetime,
		st.create_datetime as studied_datetime,
		s.create_user_id,
		s.create_datetime,
		isnull(s.modify_user_id,s.create_user_id) as modify_user_id,
		isnull(s.modify_datetime,s.create_datetime) as modify_datetime,
		s.display_order
		from samples s
		left outer join sample_knowledge_level_votes klv on klv.sample_id=s.sample_id and klv.user_id=@user_id
		left outer join favorites f on f.user_id=@user_id and f.object_type_id=(select object_type_id from object_types ot where ot.object_type_name='Pattern Sample') and f.object_id=s.sample_id
		left outer join studied st on st.user_id=@user_id and st.object_type_id=(select object_type_id from object_types ot where ot.object_type_name='Pattern Sample') and st.object_id=s.sample_id
		left outer join leitner_tracking tr1 on tr1.user_id=@user_id and tr1.object_type_id=(select object_type_id from object_types ot where ot.object_type_name='Pattern Sample') and tr1.object_id=s.sample_id and tr1.skill_type_id=(select skill_type_id from skill_types where skill_type_name=N'Guess Pronunciation from Meaning')
		left outer join leitner_tracking tr2 on tr2.user_id=@user_id and tr2.object_type_id=(select object_type_id from object_types ot where ot.object_type_name='Pattern Sample') and tr2.object_id=s.sample_id and tr2.skill_type_id=(select skill_type_id from skill_types where skill_type_name=N'Guess Meaning from Pronunciation')
		cross join (select max(leitner_deck_index) as max_leitner_deck_index from leitner_decks) dm
		left outer join vw_sample_user_grammar_kl sample_gkl on sample_gkl.user_id=@user_id and sample_gkl.sample_id=s.sample_id
		left outer join vw_sample_avg_kl sample_vkl on sample_vkl.sample_id=s.sample_id
		where is_deleted=0
)

It appears to be something to do with the 'vw_sample_avg_kl' join. If I comment out that join and the calculated column for 'vocab_knowledge_level_id' then the two query timings become very similar. Soon as I add it back it they differ greatly. Here is the code for that view...

CREATE VIEW [dbo].[vw_sample_avg_kl]
	AS 
	select sample_id,knowledge_level_id from (
		select sample_id,knowledge_level_id,count(*) as frequency,RANK() over (partition by sample_id order by count(*) desc,knowledge_level_id) as myrank
		from sample_knowledge_level_votes
		group by sample_id,knowledge_level_id
	) sample_kl_ranking
	where myrank=1

The id field is INT. My two queries when shown in dm_exec_query_stats look like this...

0x4A073840486B252C	41096	select * from fn_get_samples(@user_id) s    where s.sample_id in (select * from STRING_SPLIT(@sample_id_list,','))

0x79DEAD79D1F149CD	7849	select * from fn_get_samples(1) s where s.sample_id in (2495,2496,2497,2498,2499,2500,2501,2502,2503,2504)

(slightly different example data set from above hence the slightly different timings, but you can see the huge gap in performance)

Asked by raeldor (143 rep)

Mar 13, 2017, 12:36 AM
Last activity: Apr 24, 2020, 03:58 PM

Performance gap between WHERE IN (1,2,3,4) vs IN (select * from STRING_SPLIT('1,2,3,4',','))

Related Questions