What happened to Tesseract's "Math / equation detection module"?

I was able to get Tesseract to run via a Python script on my Windows machine to turn non-searchable PDFs into searchable ones. When I installed Tesseract on Windows, the installer asked me which languages I wanted and I selected them; that was when I first learned about the math module. I am not sure how effective the math module was, but I could see that it had been downloaded when I checked the languages.

Now I am trying to install Tesseract on Debian. To install Tesseract I used the command:

    sudo apt install -y tesseract-ocr

Then, to ensure I had the math module, I would always follow that up with:

    sudo apt install tesseract-ocr-equ

I am pretty sure that would install the math module. I remember using that command successfully several times, including earlier this morning. However, now, when I use that command, I get the following messages:

    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    E: Unable to locate package tesseract-ocr-equ

Just to make sure I wasn't crazy, I looked up the language codes used by Tesseract [according to Debian.org](https://manpages.debian.org/testing/tesseract-ocr/tesseract.1.en.html#LANGUAGES_AND_SCRIPTS:~:text=Math%20/%20equation%20detection%20module), and they say that "equ" belongs to the "Math / equation detection module". Admittedly, that is an earlier version.

So, I tried the following command:

    sudo apt-get install -y tesseract-ocr-equ

Among the several lines of output that I got in response were the following:

    Note, selecting 'tesseract-ocr-uzb-cyrl' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-ell' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-eng' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-enm' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-epo' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-est' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-eus' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-que' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-uig' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-ukr' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-urd' for regex 'tesseract-ocr-[equ]'
    Note, selecting 'tesseract-ocr-uzb' for regex 'tesseract-ocr-[equ]'
    tesseract-ocr-eng is already the newest version (1:4.1.0-2).
    tesseract-ocr-eng set to manually installed.

This made me wonder whether there is a different math module for each language, and whether the math module is automatically downloaded with the language you download. I just really remember using the command initially without any problem. That being said, I have had several head injuries, so my memory is not entirely reliable. It's just that if I turn out to have been mistaken here and I have not been using that command as I remember, this will be one of those deeply troubling times due to how vividly I remember this working.

So, the primary question is: how do I download the "Math / equation detection module" for Tesseract onto the Linux (Beta) environment on my Chromebook? Secondarily, could someone tell me whether the behavior of the `sudo apt install tesseract-ocr-equ` command changed recently? This is frustrating me quite a bit.

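
The `for regex 'tesseract-ocr-[equ]'` notes suggest that apt-get was matching a pattern rather than a literal package name. In a regular expression, `[equ]` is a bracket expression that matches a single character (e, q, or u), which would explain why every selected package has a language code starting with one of those letters. The sketch below only illustrates that with `grep -E`; it assumes grep's matching is a fair stand-in for what apt-get does, and `match_class` and the extra `tesseract-ocr-deu` sample are made up for the illustration.

```shell
#!/bin/sh
# Sample names: three taken from the apt-get output, plus one (deu)
# added here as a name that should NOT match the pattern.
names='tesseract-ocr-eng
tesseract-ocr-que
tesseract-ocr-uzb-cyrl
tesseract-ocr-deu'

# match_class is a hypothetical helper for this illustration.
# [equ] matches exactly ONE character: e, q, or u. It does not mean
# the literal string "equ".
match_class() {
    printf '%s\n' "$names" | grep -E 'tesseract-ocr-[equ]'
}

# Prints the names whose language code starts with e, q, or u.
match_class
```

The literal name `tesseract-ocr-equ` contains no bracket characters, so it would only match a package with exactly that name.
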
I am hoping that someone just changed the functionality this morning and math modules are now built into the languages.
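
One way to check what is actually installed, rather than relying on memory, is to ask Tesseract itself which language codes it can see. This is a minimal sketch, assuming `tesseract` is on the PATH; `has_lang` is a hypothetical helper name, not part of Tesseract, and the `apt-cache` lines are left as comments because their results depend on the configured package sources.

```shell
#!/bin/sh
# has_lang CODE: succeed if `tesseract --list-langs` reports CODE.
# (has_lang is a hypothetical helper, not a Tesseract command.)
has_lang() {
    # 2>&1 in case the installed version prints the list to stderr.
    # grep -x requires the whole line to equal the code, so the
    # header line of the listing is never mistaken for a language.
    tesseract --list-langs 2>&1 | grep -qx -- "$1"
}

if has_lang equ; then
    echo "equ is installed"
else
    echo "equ is not installed"
fi

# On Debian, these show whether a package of that name is known to apt
# at all, and which tesseract-ocr-* packages are available:
#   apt-cache policy tesseract-ocr-equ
#   apt-cache search --names-only '^tesseract-ocr-'
```

If `apt-cache policy` prints nothing for the name, the package is not known to the configured sources, which would match the "Unable to locate package" error.
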
Asked by Curious Layman (101 rep)
May 16, 2024, 04:17 PM
Last activity: May 21, 2024, 09:06 AM