NUMAnodes and SQL Server performance
-2
votes
1
answer
942
views
Pretty sure I understand this, but wanted to be sure I do.
We have a SQL Server 2016, that is running with 2 NumaNodes, each with 8 vCPUs. The Max Degrees of Parallelism (MAXDOP) is set to 8.
This doesn't sound right to me. First question: Is that as bad an idea as I think it is?
From my research, I need to tell them to reduce the VM Settings to make this run in a single NUMANode. We seem to be having random periods where queries that were running in 170 ms, are now timing out at 30+ seconds! So, we do a quick look, and it is 5% CPU use, and low disk I/O use, and reasonable network use... Basically, the machine is about idle.. We also looked for queries waiting on locks, and there were none. We are running the queries on the Secondary in a AG Group (a Read Only Query)
So, my guess: It has gotten enough load that it switched and ran one the view in question (gets run about 4,000 times per day) on one of the vCPUs in the 2nd NUMANode, and then decided that execution plan should be always run on that node. The result is that all the data it is accessing is cached in the other node's memory, and it needs to fetch it across the inter-node link (remote memory), so it does that, but that ends up being a lot slower (170 times?), and the queries are now all stacking up running more and more of them across this remote link... until it will always timeout because the remote memory is saturated...
Is that analysis valid? I would hate to submit this as a solution to why the query suddenly jumps up in time if this is totally not correct. And it will be hard to convince them they will get better performance with 8 CPUs than with 16.
Oh, additional evidence to back up my claim: If I
select * into #tmp from myView OPTION (MAXDOP 16)
then I get a performance change of about -5% to -12% - meaning that it takes longer to run the query than if I just used the 8 vCPUs. However, not massively so.
So my question is: Is there any validity to my analysis?
Update: A couple other things, I got a lot of the information from: https://codenotary-compliance.medium.com/vmware-vsphere-why-checking-numa-configuration-is-so-important-9764c16a7e73
Secondly, If I do a select * from sys.[dm_os_nodes]
Then I get foreign_commited_KB as 5,414,260 or 5 GB, from the information above, that would sound like it is committed out of the other node which (is bad?)
Asked by Traderhut Games
(173 rep)
Jul 13, 2022, 11:08 PM
Last activity: Jul 14, 2022, 03:14 PM
Last activity: Jul 14, 2022, 03:14 PM