Tesseract: High CPU Usage and slow speed, only when running multiple processes in parallel
# Problem
`pytesseract.image_to_string()` takes too much time when I run the script through supervisord, but executes almost instantaneously when run directly in a shell (on the same server, and at the same time as the supervisord-managed scripts).
Apart from taking too much time, the processes are also showing high CPU usage.
Time taken by `pytesseract.image_to_string()` when run via Supervisord: ~30s
Time taken by `pytesseract.image_to_string()` when run via Bash: 0.1s
This problem only occurs when many processes calling `pytesseract.image_to_string()` are run via supervisord (around 22 instances). If I reduce the number of instances (to around 10), the scripts executed via supervisord also run smoothly.
### System Information
OS: Ubuntu 18.04.2 LTS (bionic)
Supervisord: Version 3.3.1
Tesseract: Version 4.0.0-beta.1
Python: Version 3.6
PyTesseract: Version 0.2.5
Output of `ulimit -a`:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127357
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 8096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127357
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
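The limits above were taken from an interactive shell, and a process started by supervisord may not inherit the same values. A minimal sketch of how the same information could be captured from inside the running script, so the two environments can be compared, is shown below. The `LIMITS` table and the `snapshot()` helper are names made up for this illustration; only the standard-library `resource` and `os` modules are used.

```python
import os
import resource

# Illustrative subset of limits, labelled with the matching `ulimit` flag.
# Note: getrlimit() reports stack size and virtual memory in bytes,
# whereas `ulimit -a` prints them in kbytes.
LIMITS = {
    "open files (-n)": resource.RLIMIT_NOFILE,
    "max user processes (-u)": resource.RLIMIT_NPROC,
    "stack size (-s)": resource.RLIMIT_STACK,
    "cpu time (-t)": resource.RLIMIT_CPU,
    "virtual memory (-v)": resource.RLIMIT_AS,
}


def fmt(value):
    """Render a raw limit the way `ulimit` does for unlimited values."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def snapshot():
    """Return the current process's CPU count, PID and (soft, hard) limits."""
    info = {"cpu_count": os.cpu_count(), "pid": os.getpid()}
    for label, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        info[label] = (fmt(soft), fmt(hard))
    return info
```

Logging `snapshot()` once at startup, both under supervisord and from a shell, would show whether the two environments actually differ.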
Let me know if you need any more information.
## Edit 1 (or I know what's NOT the source of this problem)
I am fairly certain that it is not an issue with Supervisord.
When I run one instance from an ssh shell, the function (`pytesseract.image_to_string()`) executes smoothly (i.e. takes only 0.1s), even while 10 instances are running via Supervisord.
When I start another instance from a new ssh shell, both instances started from ssh run smoothly most of the time.
When I start yet another instance from a new ssh shell, all three instances start choking, taking around 10s to execute the function. This time keeps increasing as I add more instances via shell.
So the problem can be replicated even with a shell.
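The shell experiment above could be turned into a repeatable measurement along these lines. This is only a sketch: `measure`, `sweep` and `ocr_once` are hypothetical helper names, and `sample.png` stands in for whatever image is being processed. It assumes that threads are enough to create the contention, because pytesseract launches the `tesseract` binary as a child process for each call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def timed(fn, *args):
    """Return the wall-clock seconds taken by one call to fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def measure(fn, args=(), workers=1, calls_per_worker=3):
    """Call fn concurrently from `workers` threads; return all per-call timings."""
    def worker(_):
        return [timed(fn, *args) for _ in range(calls_per_worker)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_worker = list(pool.map(worker, range(workers)))
    return [t for timings in per_worker for t in timings]


def sweep(fn, args=(), worker_counts=(1, 2, 3, 10, 22)):
    """Map each worker count to the median per-call time in seconds."""
    return {n: median(measure(fn, args, workers=n)) for n in worker_counts}


def ocr_once(image_path):
    """Hypothetical workload: a single OCR call on one image."""
    import pytesseract       # imported lazily so the harness itself has no dependencies
    from PIL import Image
    return pytesseract.image_to_string(Image.open(image_path))
```

Calling `sweep(ocr_once, ("sample.png",))` would then give the median call time for 1, 2, 3, 10 and 22 concurrent callers, which should make the point at which the slowdown starts easier to pin down.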
### More Information
I ran the program with `strace -T -f`, but I could not figure out what exactly is causing the spike in time.
For a function call that takes 1s
Top 10 system calls sorted by time taken
1.504530 [pid 29921] [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 30166
0.503915 [pid 29932] ) = 0 (Timeout)
0.503472 [pid 29932] ) = 0 (Timeout)
0.500524 [pid 29933] ) = 0 (Timeout)
0.500515 [pid 29933] ) = 0 (Timeout)
0.500514 [pid 29932] ) = 0 (Timeout)
0.500512 [pid 29933] ) = 0 (Timeout)
0.069869 [pid 30169] ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
0.035989 [pid 30167] ) = 0
0.016002 [pid 30168] ) = 0
For a function call that takes 9s
Top 10 system calls sorted by time taken
9.795787 [pid 29921] [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 30106
0.515960 [pid 29933] ) = 0 (Timeout)
0.511955 [pid 29933] ) = 0 (Timeout)
0.507979 [pid 29932] ) = 0 (Timeout)
0.507968 [pid 29932] ) = 0 (Timeout)
0.505257 [pid 29932] ) = 0 (Timeout)
0.503988 [pid 29932] ) = 0 (Timeout)
0.503978 [pid 29932] ) = 0 (Timeout)
0.503975 [pid 29932] ) = 0 (Timeout)
0.503974 [pid 29932] ) = 0 (Timeout)
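The listings above keep only the tail of each strace line, so the syscall names are missing. A rough sketch of a parser that ranks calls by the `<seconds>` suffix that `strace -T` appends, while keeping the syscall name and PID, might look like this. `Call`, `parse_line` and `top_calls` are illustrative names, and the regular expressions assume the `[pid N]` prefix that `strace -f` prints to the terminal.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Call(NamedTuple):
    seconds: float
    pid: Optional[int]
    syscall: str
    line: str


_DURATION = re.compile(r"<(\d+\.\d+)>\s*$")        # trailing "<0.503915>" added by -T
_PID = re.compile(r"^\[pid\s+(\d+)\]\s*")          # "[pid 29932] " prefix added by -f
_RESUMED = re.compile(r"^<\.\.\. (\w+) resumed>")  # second half of an interrupted call
_STARTED = re.compile(r"^(\w+)\(")                 # ordinary "name(args...) = ret" line


def parse_line(line: str) -> Optional[Call]:
    """Parse one strace line; return None if it carries no duration."""
    line = line.rstrip("\n")
    duration = _DURATION.search(line)
    if duration is None:
        return None  # e.g. "<unfinished ...>" halves, signals, exit notices
    rest, pid = line, None
    prefix = _PID.match(rest)
    if prefix:
        pid = int(prefix.group(1))
        rest = rest[prefix.end():]
    name = _RESUMED.match(rest) or _STARTED.match(rest)
    syscall = name.group(1) if name else "?"
    return Call(float(duration.group(1)), pid, syscall, line)


def top_calls(lines: Iterable[str], n: int = 10) -> List[Call]:
    """Return the n slowest calls, slowest first."""
    calls = [c for c in map(parse_line, lines) if c is not None]
    return sorted(calls, key=lambda c: c.seconds, reverse=True)[:n]
```

Feeding it the saved trace, for example `top_calls(open("trace.log"))` with `trace.log` being a hypothetical file holding the strace output, would show which syscall each of the long waits belongs to.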
Asked by Ashish
(270 rep)
Jul 18, 2019, 08:29 AM
Last activity: Jan 28, 2021, 05:56 PM