I've inherited a process that consolidates a number of tables from remote servers to a central database once a week. It takes a list of servers/tables, bcp's each table out to a local file, and then bcp's those files in to the central db.
All the imports go into unindexed heaps, but for some reason the imports take a very long time. For example, the bcp out of one table is 100 GB / 316M rows and takes 20 minutes. The bcp in takes 12 hours.
I tried pre-allocating the new db file size instead of letting it autogrow, and that seems to be saving around 15% of the time, but the import is still taking more than an order of magnitude longer than the remote export.
The bcp out is
bcp "" out "" -S "" -T -t "\n" -w -a 65535 -C RAW
The bcp in is
bcp "" in "" -S -T -t "\n" -w -a 65535 -b1000000 -C RAW
It's using the DNS name for the import server (which is the localhost) in -S. I didn't know how much that might impact things. A batch size of a million records also seems a bit unusual.
The database being imported to uses the SIMPLE recovery model. It's SQL Server 2019, but because it's an old script the db is at the 2008 compatibility level.
The process is running now, and while the pre-allocation seems to be helping some, it's not as much as I'd hoped
Taking 15x as long to import as it does to export the same data seems like quite a bit of overhead to me, and I was wondering what options might help improve that?
I've also been toying with the idea of using BULK INSERT or even SELECT INTO (to avoid the file exchange) but at 12 hours a whack, randomly pushing buttons is pretty time consuming.
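For what the BULK INSERT route might look like: a minimal sketch, assuming hypothetical server, database, table, and file names (none of these come from the original script), run through sqlcmd so it stays a one-liner in the same job. The terminator and Unicode options mirror the flags in the bcp commands above.

```shell
# Hypothetical sketch of the BULK INSERT alternative; names and paths are placeholders.
sqlcmd -S localhost -d CentralDb -E -Q "
BULK INSERT dbo.StagingTable
FROM 'D:\bcp_out\StagingTable.dat'
WITH (
    DATAFILETYPE    = 'widechar',  -- matches the -w (Unicode) bcp export
    FIELDTERMINATOR = '\n',        -- mirrors the -t terminator in the original commands
    BATCHSIZE       = 50000,       -- keep per-batch log activity bounded
    TABLOCK                        -- allows minimal logging into a heap under SIMPLE recovery
);"
```

The upside over bcp in is one fewer process hop; the file exchange itself still happens unless you go all the way to SELECT INTO over a linked server.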
EDIT: I took Sergey's suggestion below and started looking at the execution characteristics. Every time I ran sp_whoisactive, the bcp session was in an ASYNC_NETWORK_IO wait. When I googled that, it sounded like SQL Server is waiting for the next batch of bytes from BCP. Not sure if that's impacted significantly by using the DNS name instead of localhost (or just leaving the -S off entirely).
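For repeating that check without eyeballing the full sp_whoisactive output: a sketch that filters to bcp sessions and pulls the detailed wait profile. The 'BCP%' program-name pattern is an assumption about how the client identifies itself.

```shell
# Filter sp_whoisactive (the proc already in use above) to bcp sessions;
# @get_task_info = 2 returns the full wait breakdown rather than just the top wait.
sqlcmd -S localhost -E -Q "EXEC sp_WhoIsActive @filter_type = 'program', @filter = 'BCP%', @get_task_info = 2;"
```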
Then I started looking at some of the other stats. CPU is at 1%. RAM is at 96%. I looked at SQL Server and its max server memory is unbounded, so it's consuming 95% of the memory on the system. Past experience is that once Windows gets above 90%, you start losing a lot of performance to paging. So that might be a big leech on the system.
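Capping max server memory is the standard fix for that; a sketch, with the 56 GB value purely illustrative (size it to leave headroom for the OS and the bcp client itself):

```shell
# Cap SQL Server's buffer pool so Windows and the bcp process aren't starved.
# The 57344 MB (56 GB) figure is a placeholder, not a recommendation for this box.
sqlcmd -S localhost -E -Q "
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 57344; RECONFIGURE;"
```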
EDIT 2: I've been trying some experiments based on the suggestions here, and here's where I am so far.
The machines in question are Windows Server 2019 VMs in GCP, running SQL Server 2019 with ~20 TB virtual drives.
The part that was doing 9 bcp outs and 9 bcp ins that prompted this post was taking a bit over 16 hours (about a half-hour to 40 minutes on the out; over 15 hours on the in).
First experiment: make the data file allocation on db creation closer to the end result from the start. In other words, don't start too small and auto-extend all the way. All the data files ended up around 40 gig anyway, so why start them at 10 and grow?
That cut the process from 16:05 to 12:09. So about a 24% saving.
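The pre-allocation itself is just a matter of creating the target db at roughly its final size with a big fixed growth increment. A sketch, assuming placeholder names, paths, and sizes:

```shell
# Create the target db near its eventual ~40 GB size up front instead of
# autogrowing from a small default. All names/paths/sizes are placeholders.
sqlcmd -S localhost -E -Q "
CREATE DATABASE CentralStage
ON PRIMARY (NAME = CentralStage_data, FILENAME = 'D:\Data\CentralStage.mdf',
            SIZE = 40GB, FILEGROWTH = 4GB)
LOG ON     (NAME = CentralStage_log,  FILENAME = 'L:\Log\CentralStage.ldf',
            SIZE = 8GB,  FILEGROWTH = 2GB);
ALTER DATABASE CentralStage SET RECOVERY SIMPLE;"
```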
Second experiment: cut the bcp in batch size down from a million to 50,000. For testing purposes, I just used the last bcp out files and just did the bcp ins with them.
That cut the import down to 1 hour, 34 minutes, 30 seconds. That appears to have been the long pole in the tent. The batch size of a million appears to have been causing the ASYNC_NETWORK_IO waits, and it caused a lot of transaction log use (even under SIMPLE recovery).
There appears to be a point of diminishing returns on batch size. I've seen posts saying there was no noticeable improvement beyond ~10-20K rows, but a million seems to be well into the "more harm than good" range.
Third experiment: adding -h "TABLOCK" to the above. That got the loads down to 1 hour, 3 minutes, 36 seconds. Still an improvement, but not as dramatic.
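Putting both fixes together, the import command now looks like this (the "" placeholders are kept from the original command and the rest of the flags are unchanged):

```shell
# bcp in with the two wins applied: 50K-row batches instead of a million,
# and TABLOCK so the heap load can be minimally logged.
bcp "" in "" -S localhost -T -t "\n" -w -a 65535 -b 50000 -h "TABLOCK" -C RAW
```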
EDIT 3: So I put all the changes into place and let the job run as a scheduled task - and it all sucked again. It appears that a lot of the slowdown in the bcp imports, and the ASYNC_NETWORK_IO waits, were related to the scheduled task running at a much lower priority (though, interestingly, the bcp exports didn't get nearly as hung up). I found a PowerShell script that let me set the process priority for a given scheduled task higher.
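The core of that priority tweak is small; here's a sketch of the idea (not the author's exact script), bumping any running bcp processes to High priority:

```shell
# Raise the priority of bcp processes started by the scheduled task.
# Sketch only - a real job would target the specific task's child processes.
powershell -NoProfile -Command "Get-Process -Name bcp -ErrorAction SilentlyContinue | ForEach-Object { $_.PriorityClass = 'High' }"
```

Another option is configuring the task itself to run at normal priority, since scheduled tasks default to below-normal.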
Asked by user1664043
(379 rep)
Jun 30, 2025, 11:58 AM
Last activity: Jul 9, 2025, 05:02 PM