Sample Header Ad - 728x90

Database Administrators

Q&A for database professionals who wish to improve their database skills

Latest Questions

0 votes
0 answers
627 views
SQL Server and drive formatting (block size)
We are having major performance and latency issues with SQL Server 2019. We are running on a "quiet" **VMware** instance connected to a **vSAN**. According to MSFT documentation, best performance is obtained if the drives are formatted at 64k block size vs 4k. Doing some research, VMware publishes t...
We are having major performance and latency issues with SQL Server 2019. We are running on a "quiet" **VMware** instance connected to a **vSAN**. According to MSFT documentation, best performance is obtained if the drives are formatted at 64k block size vs 4k. Doing some research, VMware publishes that you should NOT use 64k but 4k block size if on a vSAN. So which one is it and what is the science behind this?
Gary P. (1 rep)
Feb 5, 2023, 06:45 PM • Last activity: Feb 6, 2023, 09:43 AM
-2 votes
1 answers
1163 views
In a SAN based SQL server, where does the 64K allocation block size formatting need to occur?
I understand sql data and log file drive must be formatted 64K allocation block size. In a SAN environment, there are physical disks which are part of the SAN storage pool; and from the storage pool we create LUN; and then assign the LUN to the Hyper-V and we create disks out of the LUN. Then a disk...
I understand sql data and log file drive must be formatted 64K allocation block size. In a SAN environment, there are physical disks which are part of the SAN storage pool; and from the storage pool we create LUN; and then assign the LUN to the Hyper-V and we create disks out of the LUN. Then a disk is assigned to the SQL server VM. Volumes (drives) are then created from this disk for the data and log file. Suppose I format the drives to be of 64K allocation block size, then does that really help? Or do I need to format the physical disks in the SAN?
variable (3590 rep)
Jul 18, 2022, 07:49 AM • Last activity: Jul 20, 2022, 07:23 AM
31 votes
2 answers
13746 views
I/O requests taking longer than 15 seconds
Usually our weekly full backups finish in about 35 minutes, with daily diff backups finishing in ~5 minutes. Since tuesday the dailies have taken almost 4 hours to complete, way more than should be required. Coincidentally, this started happening right after we got a new SAN/disk config. Note that t...
Usually our weekly full backups finish in about 35 minutes, with daily diff backups finishing in ~5 minutes. Since tuesday the dailies have taken almost 4 hours to complete, way more than should be required. Coincidentally, this started happening right after we got a new SAN/disk config. Note that the server is running in production and we have no overall issues, it's running smoothly - except for the IO issue that's primarily manifested itself in the backup performance. Looking at dm_exec_requests during the backup, the backup is constantly waiting on ASYNC_IO_COMPLETION. Aha, so we have disk contention! However, neither the MDF (logs are stored on local disk) nor backup drive have any activity (IOPS ~= 0 - we have plenty of memory). Disk queue length ~= 0 as well. CPU hovers around 2-3%, no issue there either. The SAN is a Dell MD3220i, the LUN consisting of 6x10k SAS drives. The server is connected to the SAN through two physical paths, each going through a separate switch with redundant connections to the SAN - a total of four paths, two of them being active at any time. I can verify that both connections are active through task manager - splitting the load perfectly evenly. Both connections are running 1G full duplex. We used to use jumbo frames, but I've disabled them to rule out any issues here - no change. We have another server (same OS+config, 2008 R2) that is connected to other LUNs, and it shows no issues. It is however not running SQL Server, but just sharing CIFS on top of them. However, one of its LUNs preferred path is on the same SAN controller as the troublesome LUNs - so I've ruled that out as well. Running a couple of SQLIO tests (10G test file) seems to indicate that IO is decent, despite the issues: sqlio -kR -t8 -o8 -s30 -frandom -b8 -BN -LS -Fparam.txt IOs/sec: 3582.20 MBs/sec: 27.98 Min_Latency(ms): 0 Avg_Latency(ms): 3 Max_Latency(ms): 98 histogram: ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+ %: 45 9 5 4 4 4 4 4 4 3 2 2 1 1 1 1 1 1 1 0 0 0 0 0 2 sqlio -kW -t8 -o8 -s30 -frandom -b8 -BN -LS -Fparam.txt IOs/sec: 4742.16 MBs/sec: 37.04 Min_Latency(ms): 0 Avg_Latency(ms): 2 Max_Latency(ms): 880 histogram: ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+ %: 46 33 2 2 2 2 2 2 2 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 sqlio -kR -t8 -o8 -s30 -fsequential -b64 -BN -LS -Fparam.txt IOs/sec: 1824.60 MBs/sec: 114.03 Min_Latency(ms): 0 Avg_Latency(ms): 8 Max_Latency(ms): 421 histogram: ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+ %: 1 3 14 4 14 43 4 2 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 6 sqlio -kW -t8 -o8 -s30 -fsequential -b64 -BN -LS -Fparam.txt IOs/sec: 3238.88 MBs/sec: 202.43 Min_Latency(ms): 1 Avg_Latency(ms): 4 Max_Latency(ms): 62 histogram: ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+ %: 0 0 0 9 51 31 6 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 I realize that these aren't exhaustive tests in any way, but they do make me comfortable in knowing that it isn't complete rubbish. Note that the higher write performance is caused by the two active MPIO paths, whereas reading will only use one of them. Checking the Application event log reveals events like these scattered around: SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [J:\XXX.mdf] in database [XXX] (150). The OS file handle is 0x0000000000003294. The offset of the latest long I/O is: 0x00000033da0000 They're not constant, but they do happen regularly (a couple per hour, more during backups). Alongside that event, the System event log will post these: Initiator sent a task management command to reset the target. The target name is given in the dump data. Target did not respond in time for a SCSI request. The CDB is given in the dump data. These also occur on the non-problematic CIFS server running on the same SAN/Controller, and from my Googling they seem to be non-critical. Note that all servers use the same NICs - Broadcom 5709Cs with up-to-date drivers. The servers themselves are Dell R610's. I'm not sure what to check for next. Any suggestions? **Update - Running perfmon** I tried recording the Avg. Disk sec/Read & Write perf counters while performing a backup. The backup starts out blazingly, and then basically stops dead at 50%, crawling slowly towards 100%, but taking 20x the time it should've. Task monitor during start of backup Shows both SAN paths being utilized, then dropping off. Perform during same Backup initiated around 15:38:50 - notice all looking good, and then there's a series of peaks. I'm not concerned with the writes, only reads seem to hang. Task monitor during end of backup Note very little action on/off, though blazing performance at the very end. Perfmon during same Note a 12sec maximum, though average is overall good. **Update - Backing up to NUL device** To isolate read issues and simplify things, I ran the following: BACKUP DATABASE XXX TO DISK = 'NUL' The results were exactly the same - starts out with a burst read and then stalls, resuming operations now and then: Results **Update - IO stalls** I ran the dm_io_virtual_file_stats query from Jonathan Kehayias and Ted Kruegers book (page 29), as recommended by Shawn. Looking at the top 25 files (one data file each - all results being data files), it would seem like reads are worse than writes - perhaps because writes go directly to the SAN cache whereas cold reads needs to hit disk - just a guess though. IO Stalls **Update - Wait stats** I did three tests to gather some wait stats. Wait stats are queried using Glenn Berry / Paul Randals script . And just to confirm - the backups are not being done to tape, but to an iSCSI LUN. Results are similar if done to local disk, with results similar to the NUL backup. Cleared stats. Ran for 10 minutes, normal load: No backup Cleared stats. Ran for 10 minutes, normal load + normal backup running (didn't complete): Normal backup Cleared stats. Ran for 10 minutes, normal load + NUL backup running (didn't complete): NUL backup **Update - Wtf, Broadcom?** Based on Mark Storey-Smiths suggestions and Kyle Brandts previous experiences with Broadcom NICs, I decided to do some experimentation. As we've got multiple active paths, I could relatively easily change the configuration of the NICs one by one without causing any outages. Disabling TOE and Large Send Offload yielded a near perfect run: enter image description here Processed 1064672 pages for database 'XXX', file 'XXX' on file 1. Processed 21 pages for database 'XXX', file 'XXX' on file 1. BACKUP DATABASE successfully processed 1064693 pages in 58.533 seconds (142.106 MB/sec). So which is the culprit, TOE or LSO? TOE enabled, LSO disabled: enter image description here Didn't finish the backup as it took forever - just as the original problem! TOE disabled, LSO enabled - looking good: enter image description here Processed 1064680 pages for database 'XXX', file 'XXX' on file 1. Processed 29 pages for database 'XXX', file 'XXX' on file 1. BACKUP DATABASE successfully processed 1064709 pages in 59.073 seconds (140.809 MB/sec). And as a control, I disabled both TOE and LSO to confirm the issue was gone: enter image description here Processed 1064720 pages for database 'XXX', file 'XXX' on file 1. Processed 13 pages for database 'XXX', file 'XXX' on file 1. BACKUP DATABASE successfully processed 1064733 pages in 60.675 seconds (137.094 MB/sec). In conclusion it seems the enabled Broadcom NICs TCP Offload Engine caused the problems. As soon as TOE was disabled, everything worked like a charm. Guess I won't be ordering any more Broadcom NICs going forward. **Update - Down goes the CIFS server** Today the identical and functioning CIFS server started exhibiting IO requests hanging. This server wasn't running SQL Server, just plain Windows Web Server 2008 R2 serving shares over CIFS. As soon as I disabled TOE on it as well, everything was back to running smooth. Just confirms I won't ever be using TOE on Broadcom NICs again, if I can't avoid the Broadcom NICs at all, that is.
Mark S. Rasmussen (1455 rep)
Jan 19, 2012, 02:17 PM • Last activity: Jul 8, 2022, 08:01 AM
0 votes
1 answers
113 views
Db file and transaction log segregation in SAN
I understand that if we are using HDD then it will be advisable to store the data file and transaction log on different drives (which point to different disks). However, if the server is using SAN, then whether it's still advisable to put them in different drives? I have checked with our infra team,...
I understand that if we are using HDD then it will be advisable to store the data file and transaction log on different drives (which point to different disks). However, if the server is using SAN, then whether it's still advisable to put them in different drives? I have checked with our infra team, our SAN is using flash cards (there are 16 drives in a storage pool where we create logical volumes which is distributed to connected servers).
Lee Weng Ching (1 rep)
Jun 3, 2022, 09:48 AM • Last activity: Jun 3, 2022, 11:59 AM
-1 votes
2 answers
58 views
How is SQL Sever able to improve the performance using partitioned tables in a SAN storage environment?
Link: https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver15#:~:text=SQL%20Server%20supports%20table%20and,are%20mapped%20into%20individual%20partitions. > SQL Server supports table and index partitioning. The data of > partitioned...
Link: https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver15#:~:text=SQL%20Server%20supports%20table%20and,are%20mapped%20into%20individual%20partitions . > SQL Server supports table and index partitioning. The data of > partitioned tables and indexes is divided into units that may > optionally be spread across more than one filegroup in a database. The > data is partitioned horizontally, so that groups of rows are mapped > into individual partitions. The benefit of partitions comes from the fact that each partition can be placed on a separate disk, there by allowing for parallel disk access. However in a SAN storage environment is there any benefit of partitioning because all files are scattered amongst SAN disks.
variable (3590 rep)
Mar 25, 2022, 01:27 PM • Last activity: Mar 25, 2022, 02:54 PM
1 votes
0 answers
67 views
Is the recommendation to have different files on separate disks valid for SQL server in a SAN (RAID) network?
Link: https://learn.microsoft.com/en-us/sql/relational-databases/databases/database-files-and-filegroups?view=sql-server-ver15#Recommendations > To maximize performance, create files or filegroups on different > available disks as possible. Put objects that compete heavily for > space in different f...
Link: https://learn.microsoft.com/en-us/sql/relational-databases/databases/database-files-and-filegroups?view=sql-server-ver15#Recommendations > To maximize performance, create files or filegroups on different > available disks as possible. Put objects that compete heavily for > space in different filegroups. > Use filegroups to enable placement of objects on specific physical > disks. > Put different tables used in the same join queries in different > filegroups. This step will improve performance, because of parallel > disk I/O searching for joined data. > Put heavily accessed tables and the nonclustered indexes that belong > to those tables on different filegroups. Using different filegroups > will improve performance, because of parallel I/O if the files are > located on different physical disks. > Don't put the transaction log file(s) on the same physical disk that > has the other files and filegroups. In a SAN network, is there any benefit to creating files or filegroups on different disks? or any of the above quotes? I am asking because the SAN network has multiple disks and a single file is scattered amongst multiple disks.
variable (3590 rep)
Mar 25, 2022, 01:13 PM
0 votes
1 answers
40 views
Whether there is any benefit of separating data/log files, setting fill factor, index rebuild/reorganize in a SAN storage?
Let's assume that my storage is a SAN. This means it contains multiple disks. Even though we have logical drives configured, essentially everything - including the OS, SQL Server, and applications - all run on the same SAN, which means the data on the disk is highly fragmented. So I want to ask: Is...
Let's assume that my storage is a SAN. This means it contains multiple disks. Even though we have logical drives configured, essentially everything - including the OS, SQL Server, and applications - all run on the same SAN, which means the data on the disk is highly fragmented. So I want to ask: Is there any benefit when applying any of the following when the storage is SAN based? 1. Separating data/log files 2. Setting fill factor 3. Index rebuild/reorganize
variable (3590 rep)
Mar 8, 2022, 08:21 AM • Last activity: Mar 10, 2022, 02:21 PM
2 votes
2 answers
2249 views
Investigating/Troubleshooting I/O Spikes
We have been experiencing some relatively large I/O (IOPS) spikes on the SAN that supports our production SQL Server. The spikes seem to fall at an exact time after the hour, each hour. We have investigated every known scheduled task source (SQL Agent, backups, scheduled SSRS Reports, etc.) and can'...
We have been experiencing some relatively large I/O (IOPS) spikes on the SAN that supports our production SQL Server. The spikes seem to fall at an exact time after the hour, each hour. We have investigated every known scheduled task source (SQL Agent, backups, scheduled SSRS Reports, etc.) and can't find any rhyme or reason to it so far. We have use Activity Monitor as well, but it has not yielded any answers to date. How would one go about definitively finding the source of I/O spikes? Are there monitoring tools (commercial or otherwise) that would help pinpoint the problem?
Phil Sandler (465 rep)
Sep 25, 2012, 05:44 PM • Last activity: Nov 30, 2021, 06:46 AM
1 votes
1 answers
262 views
Performance of temp db files on SAN
It is generally recommended to put the tempdb database on disks that differ from those that are used by user databases. I have 4 databases on my sql server, ranging from 5GB to 300GB in size. `D:` drive is used for user dbs `E:` drive for log files `F:` drive for tempdb Note: the drives are logical....
It is generally recommended to put the tempdb database on disks that differ from those that are used by user databases. I have 4 databases on my sql server, ranging from 5GB to 300GB in size. D: drive is used for user dbs E: drive for log files F: drive for tempdb Note: the drives are logical. They are not physically separate drives because they belong to SAN. Effectively this means that - based on SAN/RAID level, the data will be physically stored/spread across multiple disks. Is this a concern and going against the above recommendation?
variable (3590 rep)
Oct 31, 2021, 01:46 AM • Last activity: Nov 1, 2021, 02:57 PM
0 votes
2 answers
196 views
MS SQL: SSD Trim after TRUNCATE / DROP?
I consider to use one of those fancy new SSD-only SAN systems for our new Microsoft SQL Server. "Fancy" means, that it supports / use stuff as block compression / deduplication / 0x00-recognitation / disk space overprovioning etc. When I drop or truncate a table the SQL server usually only marks the...
I consider to use one of those fancy new SSD-only SAN systems for our new Microsoft SQL Server. "Fancy" means, that it supports / use stuff as block compression / deduplication / 0x00-recognitation / disk space overprovioning etc. When I drop or truncate a table the SQL server usually only marks the pages in the page allocation map as free, but does not touch the data in the pages itself (e.g. overwriting it with 0x00). This is very fine (because fast) with HDDs but SSDs prever to know if a block is free and could be overwritten by its internal routines or if the data are still important and needs to be preserved. So my question: does the Microsoft SQL Server supports the SSD-Trim-Command to deallocate space in some way? Of course without shrinking shrinking the file or doing other manual / slow stuff which fragments my indexes, blocks my queries etc. Or is there a way to force the SQL server to overwrite the unassigned data by 0x00 (which will be recognized by the SAN, so it frees up the space internal), even if this is the worser option compared to the SSD-TRIM? PS: I know that it usually makes sense to have some (!) unassigned space in the database files to use for new data, but just assume that I delete more than I will add in the next 6 months or that I work with partitions and the deleted stuff resists in other files than the current hot data (but the old filegroup is still not empty, so it can't be dropped / shrinked).
Thomas Franz (885 rep)
Oct 25, 2021, 11:59 AM • Last activity: Oct 29, 2021, 02:54 PM
0 votes
2 answers
1111 views
Why is it not important whether data and log files share the same disk or not in a SAN?
I have a hard disk. I create 3 drives on it. One for OS. One for sql data files. One for sql log files. However I am reading that suppose my storage layer is not a hard disk, but a SAN based infrastructure, then it is irrelevant where the data and log files are placed because all access is random. W...
I have a hard disk. I create 3 drives on it. One for OS. One for sql data files. One for sql log files. However I am reading that suppose my storage layer is not a hard disk, but a SAN based infrastructure, then it is irrelevant where the data and log files are placed because all access is random. What does this mean?
variable (3590 rep)
Sep 15, 2021, 03:27 PM • Last activity: Sep 15, 2021, 05:27 PM
1 votes
2 answers
303 views
High Level SAN Migration Strategy. Is this a good idea?
In the company I work for we are planning decommissioning an old SAN serving our production servers Which are 2 Nodes running 2 SQL Server instances using Clustering (one is the backup for one instance and the primary for the other and visceversa). Instead of that old SAN device (I believe it is a V...
In the company I work for we are planning decommissioning an old SAN serving our production servers Which are 2 Nodes running 2 SQL Server instances using Clustering (one is the backup for one instance and the primary for the other and visceversa). Instead of that old SAN device (I believe it is a VNX 500, but the technical specifications really doesn’t matter from what I’m asking here…) we will have a new one that more or less should replace the existing one with better capabilities and more storage. In a nutshell, what we are going to accomplish is this. enter image description here Me as a DBA, have participated in tons of SQL Server Migrations, but in this case we are not really migrating the database servers, we are only pulling the SAN rug underneath and replacing it with a flashier, fancier one… We have discussed an approach with our enterprise architect and it goes more or less like this (veeeery high level) 1. Set up file synchronization using some tool (TBD) that will work at the file-block level to synchronize the contents of the data and log files. This will be running as a service while the existing database server in production is working (using the old SAN). 2. Have this synchronization running for a day or two and observe no errors occur. 3. On the night were we do the Cutover we - Shutdown the SQL Server Services (on both instance) using MS Cluster Management Console - Shutdown the Clusters themselves using MS Cluster Management Console - Present the new SAN device to the SQL Servers - Remove the old partitions (un assign letters) - Assign those letters to the LUNs on the new SAN - Bring the Cluster online - Start SQL Server Service 4. Test, Test, Test: consistency Checks, Verify SQL Server Error Logs and as look for other issues In other words, we are thinking about putting SQL Server to sleep for a while, make the necessary infrastructure changes that should be transparent to SQL Server so when we bring the service online things will look to SQL just the same they were before, but only this time we are reading off the new SAN. The reason for not doing a straight backup/restore of the databases is because we are trying to minimize downtime and we believe if we find the right tool to synchronize at the file level, we could end up just doing a final sync to catch up with the latest changes (once we shut down SQL Server Service) as opposed to major file copy operations (our databases are rather big and the downtime must be minimal). I am not asking for a full review of the strategy or discussing this at a super low level, all I’m asking the community members who would like to chime in is: - Have you done anything remotely similar to this? did it work? - Any gotchas? - Anything missing here (at a high level please, I know I’m missing TONS of stuff, but I’m talking about ciritical high level steps) - DO you know of any tool that may provide the functionality we are looking for in order to synchronize files this way (do an initial sync and then only sync bits/sectors/clusters inside the file that have changed) Thanks and sorry for the long post, but I felt that if I didn’t include at least a bit of detail the questions would be rather vague.
Martin Surasky (773 rep)
Mar 2, 2018, 09:17 PM • Last activity: Aug 24, 2021, 11:01 AM
2 votes
1 answers
3048 views
Checking database files for physical fragmentation
I searched the Internet and did not find clear explanation for when I should try defragmenting physical database files for a DB residing on a SAN. I understand that there are tools and procedures inside some SANs that deal with that, but also there are SANs which lack those features. My question has...
I searched the Internet and did not find clear explanation for when I should try defragmenting physical database files for a DB residing on a SAN. I understand that there are tools and procedures inside some SANs that deal with that, but also there are SANs which lack those features. My question has 3 parts: - How to check for physical fragmentation of database files? - Is it necessary to do this if the files reside on a SAN? - Is it necessary to create bigger physical database files to avoid fragmentation? We use a rented virtual server. The database is growing and we have some strange problems. I want to be sure that defragmentation is not an issue. Also I am just generally curious.
Bogdan Bogdanov (1163 rep)
Oct 23, 2014, 09:06 AM • Last activity: Jun 17, 2021, 03:02 PM
2 votes
1 answers
118 views
Restore from local disk vs SAN on physical machine (SQL Server Linux)
I am new to Linux and our client decided to use SQL Server on Linux. As soon as we started testing, we ran into performance issues. The server in question is a physical machine configured with a SSD local disk and a "natively" attached Storage. I don't know what a quoted "natively" actually mean her...
I am new to Linux and our client decided to use SQL Server on Linux. As soon as we started testing, we ran into performance issues. The server in question is a physical machine configured with a SSD local disk and a "natively" attached Storage. I don't know what a quoted "natively" actually mean here. Linux guy's exact words below > the storage is "natively" attached to the server The storage area network is also built on top of SSD and I copied a 4.7 GB backup file from there over to local SSD within a second. cp \gsfs\sql\backups\testdb01.bak \tmp\backups\testdb01.bak But when I try to restore from 1. SAN > RESTORE DATABASE successfully processed 612466 pages in 369.425 seconds (12.952 MB/sec). 2. Local Disk > RESTORE DATABASE successfully processed 612466 pages in 26.248 seconds (182.295 MB/sec). The database files are also stored on the SAN and while SAN to physical machine has no connectivity issues, I am certain, the way MS SQL reads the files on the SAN has something to do with the slowness. The database size is not big enough and the database is usually idle based on wait statistics enter image description here I want to understand why SQL Server is restoring from SAN slower than local file system. I would understand if the difference was little, but currently, the difference is dramatic. Since our data files and log files are also on SAN, we need to ensure disk is read/written on as fast as possible. - Most of settings are set as default - network packet size is 4096 - No antivirus installed
Shamvil Kazmi (171 rep)
Mar 3, 2021, 03:06 PM • Last activity: Mar 14, 2021, 07:16 PM
-1 votes
1 answers
73 views
RAID 0 for development
in your experiences does of real world testing of RAID5 come "close" or "far" behind RAID0 in terms of performance as the underlying storage for a heavy write OLTP workload? Please for the sake of the question ignore any concerns of the usage RAID0 because it lacks durability, as this is out of the...
in your experiences does of real world testing of RAID5 come "close" or "far" behind RAID0 in terms of performance as the underlying storage for a heavy write OLTP workload? Please for the sake of the question ignore any concerns of the usage RAID0 because it lacks durability, as this is out of the scope of the question. It can be assumed the environment is virtualized in VMWare, the storage median is SAS SSD, the DBMS SQL Server 2017, the OS for the VM is Ubuntu, there is high bandwidth between storage and guest vm 10Gbps for network and 12Gbps for storage. Thanks you
birchmeier (3 rep)
Feb 5, 2021, 04:34 PM • Last activity: Feb 8, 2021, 04:43 PM
2 votes
3 answers
1120 views
Will SQL Server go offline if it loses network connectivity to SAN where master and msdb system databases reside?
I have a setup where 3 servers combined into Availability Group All 3 servers have local directly attached SSD drives, and user database files are hosted on these drives >But the system databases (master and msdb) of each server in AG, are hosted on a SAN device that is accessed over the network Did...
I have a setup where 3 servers combined into Availability Group All 3 servers have local directly attached SSD drives, and user database files are hosted on these drives >But the system databases (master and msdb) of each server in AG, are hosted on a SAN device that is accessed over the network Did not move those to local SSD drives yet Questions: In a hypothetical situation where network connection between any of the servers and the SAN device is lost (bad cable, bad NIC, some temporary network glitch etc.), - Will SQL Server service on that server go offline or stop working properly immediately ? - Or it continues to work for some time if master and msdb were cached in RAM before network went down ?
Aleksey Vitsko (6195 rep)
Jan 19, 2021, 02:04 PM • Last activity: Jan 23, 2021, 04:12 PM
6 votes
1 answers
4134 views
SQL Server on SAN, same LUN: one logical drive vs multiple
First, let me start by saying that I did notice there are multiple similar questions to this, but neither of them is exactly what I want to ask, and not one of them has a definite answer. Second, let me confirm that I do understand, that it's recommended to use different LUNs/spindles for log/data e...
First, let me start by saying that I did notice there are multiple similar questions to this, but neither of them is exactly what I want to ask, and not one of them has a definite answer. Second, let me confirm that I do understand, that it's recommended to use different LUNs/spindles for log/data even on SAN/virtualized environment. Now the question: IF there is just one single LUN assigned to a SQL Server virtual machine, is there a **performance** (NOT management, security, or any other) difference between following configurations: 1. *One* virtual disk file (vhd, vmdk, whatever) with *one* virtual controller with *one* logical partition in virtual OS, and tempdb/data/log in that partition 2. *One* virtual disk file with *one* virtual controller but *multiple* logical partitions/drive letters for tempdb/data/log 3. *Multiple* virtual disk files one *separate* virtual controller with *separate* partitions for tempdb/data/log So far I have heard the following answers: 1. There is *no* or *negligible* performance difference, as in the end, its still just same spindle with same IO capacity 2. There *is* a performance difference in *some* scenarios because virtual controllers have separate (virtual) IO queues and scheduling in the *guest OS* If important, let's assume the workload is very large number of threads with very small requests (so deep small queue). I would like to get this settled once and for all, so I would like to ask to stay on topic of *performance on one LUN*, and refrain from suggestions for optimizing the layout.
mrQQ (81 rep)
Mar 21, 2016, 01:37 PM • Last activity: Oct 28, 2020, 04:17 PM
6 votes
2 answers
3872 views
Slow checkpoint and 15 second I/O warnings on flash storage
Last couple of weeks we've been working on getting to the root cause of what could likely be the cause of the occurrence of these I/O issues and slowdown of the checkpoints. At first glance it looks to be clearly an I/O subsystem error and the SAN admin was to be blamed for it. But recently we chang...
Last couple of weeks we've been working on getting to the root cause of what could likely be the cause of the occurrence of these I/O issues and slowdown of the checkpoints. At first glance it looks to be clearly an I/O subsystem error and the SAN admin was to be blamed for it. But recently we changed the SAN to utilize Full Flash but as of today the error still pops up and I have no clue as to why since every metric, whether wait stats or any other metric, that I run to check if SQL server is a possible culprit seems out to return normal. It doesn't really add up. It could also be very likely that something else is chewing the disk and SQL Server is becoming victimized here...but I am not able to find out what? Dbs are in Availability Groups and as and when these events occur we do see role changes and flip overs occurring along with timeouts. Any help in figuring this out would be highly appreciated. Let me know if any further details is needed. Error msgs. below > SQL Server has encountered 14212 occurrence(s) of I/O requests taking > longer than 15 seconds to complete on file [E:\MSSQL\DATA\ABC.mdf] in > database [ABC] (7). The OS file handle is 0x0000000000000D64. The > offset of the latest long I/O is: 0x0000641262c000 > SQL Server has > encountered 5347 occurrence(s) of I/O requests taking longer than 15 > seconds to complete on file [E:\MSSQL\DATA\XYZ.mdf] in database [XYZ] > (7). The OS file handle is 0x0000000000000D64. The offset of the > latest long I/O is: 0x0000506c060000 > > FlushCache: cleaned up 111476 bufs with 62224 writes in 925084 ms > (avoided 19 new dirty bufs) for db 7:0 > average throughput: 0.94 MB/sec, I/O saturation: 55144, context switches 98407 > last target outstanding: 10240, avgWriteLatency 14171 FlushCache: cleaned up 5616 bufs with 3126 writes in 248687 ms > (avoided 3626 new dirty bufs) for db 6:0 > average throughput: 0.18 MB/sec, I/O saturation: 10080, context switches 20913 > last target outstanding: 2, avgWriteLatency 3 Here's the virtual file stats info over a 30 minute span: enter image description here And wait stats as well: enter image description here Here is the note from the system architect: > We separate workloads for high I/O intense workloads (such as DB) so that we only have one per host. The specs for the current host is Dell R730 with 16 cores of Xeon E5-2620 (2 sockets), 512GB, and 2x10G interconnects for storage. No other VM on the cluster nor host are experiencing these issues. Storage for VMs and workloads is on Pure FA-x20. General System Information: - SQL Server 2012 sp3-cu9 (Enterprise Edition) - Total RAM: 128 GB - Total DB size: Close to 1 TB
Feivel (507 rep)
Apr 2, 2019, 03:05 AM • Last activity: Apr 4, 2019, 12:53 PM
0 votes
1 answers
179 views
Flash storage and auto-grow
Our databases files (data and log) of (a lot of) SQL Servers are hosted on flash storage (SAN), what would you use as auto-grow property for databases having size round 500GB (data) and 150Gb (log)? Knowing in the background that you cannot size 5 years ahead because of storage price. Subsidiary que...
Our databases files (data and log) of (a lot of) SQL Servers are hosted on flash storage (SAN), what would you use as auto-grow property for databases having size round 500GB (data) and 150Gb (log)? Knowing in the background that you cannot size 5 years ahead because of storage price. Subsidiary question: Is it still accurate to take care of file fragmentation on SSD SAN volumes?
Filoo (33 rep)
Mar 28, 2019, 02:57 PM • Last activity: Mar 28, 2019, 05:31 PM
1 votes
2 answers
1686 views
SAN IOPS figure
Recently we've been having high disk latency issue and wanted to benchmark one of our mount points on a preprod server to get an idea of how much IOPS, Throughput and latency our SAN is capable of. Infra Hardware is maintained by another team and we do not have any visibility what they do. We used d...
Recently we've been having high disk latency issue and wanted to benchmark one of our mount points on a preprod server to get an idea of how much IOPS, Throughput and latency our SAN is capable of. Infra Hardware is maintained by another team and we do not have any visibility what they do. We used diskspd to run test with the below parameters enter image description here **Results** enter image description here As you can see, the max IOPS at 8K block size was around 5800. So my question is, for a SAN in a not so small IT shop, is that a good number? Should we be aiming for more? What is an average IOPS figure for an entry level and high end SAN's. **Adding more info based further questions from answers.** 1) Why 8k block - I read that readahead uses higher block sizes, but when I analysed the bytes per read counter it was around 8K, suggesting SQL server is not able to do a lot of read aheads. Does fragmentation have a role here? 2) I did test with a smaller file size(2GB) and the IOPS and through put was much higher, suggesting that the path is not the bottleneck. Also did test with higher block size, getting a through put of around a terabyte. Cheers
IOtester (11 rep)
Oct 1, 2018, 01:14 PM • Last activity: Oct 1, 2018, 02:42 PM
Showing page 1 of 20 total questions