I have a MongoDB 3.2 server (WiredTiger) running on Windows Server 2008 R2 Enterprise in Production and another MongoDB 3.2 server (WiredTiger) running on Windows Server 2012 R2 in Testing. Production has 32 GB of RAM and Testing has 16 GB. MongoDB is the only thing running on these two servers.
We have a Python script that runs in the morning and ingests a large amount of data into a single collection.
> EDIT: The data being ingested is from a large .csv file and not coming
> directly from a SQL DB.
This happens every morning. There are two collections, Collection A and Collection B. One is active while the other is dormant. Each morning the dormant one gets loaded with the new data and becomes the active collection.
The problem is that populating this collection takes about 12 hours in Production but only about 4 hours in Testing. It's the same data, from the same .csv file, running the same Python script, same Python version, inserting into the same collections the same way.
Production is much more active, while Testing has little to no activity. However, the resources on Production are not strained: CPU usage is consistently below 10%. According to mongostat and mongotop, our Production server can go tens of seconds or more without doing any inserts, while our Testing server is doing them practically every second.
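To make the kind of load concrete, here is a minimal sketch of a batched CSV ingest that also records how long each batch takes, which would show from the client side where the time goes. This is not the actual script: `load_csv_in_batches`, `FakeCollection`, and the batch size are hypothetical names and values, and the sketch assumes Python 3 and a collection object exposing pymongo's `insert_many(documents, ordered=...)` method.

```python
import csv
import io
import time


def _timed_insert(collection, docs):
    """Insert one batch and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # ordered=False lets the server continue past an individual failed document
    collection.insert_many(docs, ordered=False)
    return time.perf_counter() - start


def load_csv_in_batches(csv_file, collection, batch_size=1000):
    """Read rows from an open CSV file, insert them in batches,
    and return a list with the duration of each batch insert."""
    reader = csv.DictReader(csv_file)
    timings = []
    batch = []
    for row in reader:
        batch.append(dict(row))
        if len(batch) >= batch_size:
            timings.append(_timed_insert(collection, batch))
            batch = []
    if batch:
        # flush the final partial batch
        timings.append(_timed_insert(collection, batch))
    return timings


class FakeCollection:
    """Hypothetical stand-in for a pymongo collection, so the sketch
    runs without a MongoDB server."""

    def __init__(self):
        self.docs = []

    def insert_many(self, docs, ordered=True):
        self.docs.extend(docs)


# Small in-memory demonstration (assumed sample data, not the real file)
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
fake = FakeCollection()
timings = load_csv_in_batches(sample, fake, batch_size=2)
print(len(fake.docs), "rows in", len(timings), "batches")
```

Against a real server, the `collection` argument would be a pymongo collection instead of `FakeCollection`, and comparing the per-batch timings from Production and Testing would show whether every batch is slower or whether a few batches stall for a long time.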
Production: [mongostat/mongotop screenshot not preserved]
Testing: [mongostat/mongotop screenshot not preserved]
At first I thought this might be due to a disk I/O bottleneck. Our Production server was running on a 15k RPM HDD at the time, while Testing was on an iSCSI disk array, which is much faster. So we moved Production to the same kind of storage. It improved the ingestion by maybe an hour.
I've spent weeks researching this and I am at a loss as to why our Production server takes three times longer to ingest the same data as our Testing server. At this point it seems to have something to do with either the activity in Production or the difference in operating system.
Can incoming activity prevent Mongo from making inserts even though there's plenty of CPU and disk IO to do both? Can the OS cause such a massive disparity? What else should I look at?


Asked by iamcyruss
(1 rep)
Nov 25, 2019, 04:41 PM
Last activity: Nov 25, 2019, 09:43 PM