Ask Different (Apple)

Q&A for power users of Apple hardware and software

Latest Questions

29 votes

5 answers

32474 views

Built-in OCR in macOS?

macos command-line monterey ocr live-text

Recently I've found on my Mac that I can easily highlight text in an image, which wasn't possible before.

Is there also a built-in CLI option or an AppleScript option to address the OCR program with which I can achieve the same results as tesseract?

user448077

Mar 5, 2022, 02:59 PM • Last activity: Jun 24, 2025, 02:35 AM

13 votes

1 answers

669 views

Quick Look markup toolbar is no longer shown by default

macos screen-capture quicklook ocr

I'm on Ventura 13.0.1 (22A400).

In the past I was able to take a screenshot (CMD+SHIFT+4), click on the thumbnail in the corner of my screen opening the "Quick Look", and immediately begin drawing on the screenshot. The markup toolbar was always enabled by default.

Recently, the toolbar is never enabled, so I have to click on the  button to display it, then click on the  button to select the "Sketch" tool.

This is two clicks more than before!

I assume that this is due to the addition of the ability to copy text from any image, as the text selection cursor is now the default. But I use the Sketch tool hundreds of times more than the text selection tool.

Is it possible to change back to the old default behaviour, where Quick Look defaults to the Sketch tool?

Liam (233 rep)

Dec 8, 2022, 09:41 PM • Last activity: Jan 1, 2025, 04:05 AM

1 votes

1 answers

448 views

Has OCR functionality on scanned docs in Notes changed?

ios notes.app ocr

In iOS 17.0.3, in the Notes app, I’ve been using the scan document feature to keep a copy of a document, saving it within the note and then copying & pasting elsewhere.

Today, OCR is not working. It works for pictures taken in Notes, but not for scanned documents, as it had for months.

Rebecca Tech (11 rep)

Oct 19, 2023, 04:36 PM • Last activity: Dec 23, 2023, 04:04 PM

2 votes

1 answers

716 views

How to prevent preview from recognizing the numbers in PDFs as phone numbers?

macos pdf preview ocr

As a tech worker, I often read documents in PDF format, and Preview often recognizes the numerical parameter sets in those documents as "phone contacts".

How do I prevent Preview from doing so, so that it'll be less likely to butt-dial random people that I don't know?

DannyNiu (225 rep)

Nov 11, 2023, 04:31 AM • Last activity: Nov 30, 2023, 12:46 AM

1 votes

2 answers

110 views

How to customise Safari OCR?

macos safari ocr

How can I make macOS Safari OCR:

- not include "new lines" in the output text
- better at reading Chinese (一 and 了 and 。 are often not selectable at all)

For the purpose of copying speech bubbles in Chinese comics. At the moment a bubble like

can only be copied as

> 个再平凡  
> 不过的中考  
> 狗

(It should be 

> 一个再平凡不过的中考狗。

)
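
One workaround for the unwanted line breaks, independent of Safari itself, is to post-process the copied text. A minimal sketch, assuming the text is piped in (on macOS, `pbpaste` and `pbcopy` read and write the clipboard); `join_lines` is a hypothetical helper name, and this does nothing for characters that the OCR failed to recognise in the first place:

```shell
#!/bin/bash
# join_lines: hypothetical helper that strips line breaks from stdin,
# so text that was split across several lines becomes one continuous string.
join_lines() {
    tr -d '\r\n'
}

# Example (macOS): pbpaste | join_lines | pbcopy
```

Deleting the line breaks outright suits Chinese text, which has no spaces between words; for Latin-script text, replacing each line break with a space would probably be the better choice.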

user150109

Aug 2, 2023, 02:56 PM • Last activity: Aug 23, 2023, 12:50 AM

3 votes

2 answers

5449 views

Command Line Tool to Batch Convert .EML/.EMLX/.MBOX to Searchable PDFs?

email pdf file-conversion ocr

I need to convert about 500k emails into searchable PDFs. By 'searchable' I mean that macOS will be able to scan them for specific words rather than simply treating them as an image. My searches, thus far, for a tool to do this have ended in proprietary database apps and over-priced sketchball x-to-pdf converters which basically perform the built-in macOS functionality of Print To PDF. Is there a single tool or two complementary tools that could be used together in Terminal to just batch convert all the emails to searchable PDFs?

Kerlix (1480 rep)

Dec 29, 2018, 04:16 PM • Last activity: May 18, 2023, 06:08 AM

2 votes

2 answers

748 views

Easiest/cheapest way to OCR an existing PDF that lacks a text layer?

software-recommendation preview ocr

I have photo-scanned one of my most-used reference works, Watching the Skies. The aim was to allow me to do searches instead of using the (somewhat odd) index. I copied the images off my iPhone and used Preview on my Mac to produce a single PDF with all of the pages. The result is surprisingly good.

So now I have a huge document of fairly readable scanned images. But there is no "text layer" that I can search. Is there a way to do that within Preview, or some other tool I can use?

Maury Markowitz (1566 rep)

Apr 18, 2023, 08:03 PM • Last activity: Apr 18, 2023, 10:50 PM

3 votes

1 answers

1680 views

Search for text in image using macOS Preview's Live Text OCR

pdf preview graphics ocr live-text

I want to use Apple's new [Live Text](https://support.apple.com/en-us/HT212630) feature to **search** a PDF for a specific text phrase. The PDF is a large file without any baked-in encoded text. Just many pages of images of scanned text, stored in a PDF.

Per this question, Preview evidently cannot use Live Text in a PDF directly:  
https://apple.stackexchange.com/questions/440711/force-preview-to-live-text-a-pdf 

Using File > Export, I can turn the PDF into a multi-page PNG image file.

Once it's an image, Live Text is immediately available. I can select and copy/paste OCR'd text out of the image. I can press ⌘A to select all the text on a single page.

But how can I **search** the image file for Live Text?

In the image file, Preview's Edit > Find... is not available.

I can copy/paste a single page's Live Text, and then use Find in a text editor. But I can't copy more than one page of the image at a time.

Can I use Find or use Spotlight Search to search Live Text? Please only post answers that use the new iOS/macOS Live Text feature, not any 3rd party OCR programs.

pkamb (9620 rep)

Oct 13, 2022, 12:21 AM • Last activity: Mar 18, 2023, 10:02 PM

15 votes

3 answers

2764 views

For "Live Text" in macOS Monterey, can you have it scan all of your photos, and use the Live Text OCR'd content to search against in Spotlight Search?

macos photos.app spotlight ocr live-text

macOS Monterey includes "Live Text", which is OCR to use when viewing your photos, including handwriting recognition.

But I have thousands of photos, many of which are photos of presentations, pictures of PowerPoint slides taken from my seat in the audience, etc.  I also have hundreds of screenshots with text from YouTube.

I'd like the Mac to slowly work its way through *all* of my photos and add the recognized text as some type of searchable attribute, attached to each photo.

So can I have Live Text OCR my entire Photo Library?

Mark Bennett (431 rep)

Oct 21, 2021, 07:05 PM • Last activity: Jan 10, 2023, 05:39 PM

1 votes

0 answers

383 views

Mac Monterey "Live Text" Not Working for Imported Screenshots in Photos app

macos photos ocr live-text

Live Text seems to work in Photos for pictures that I've taken with my iPhone camera.

However, I want to use the Live Text on screenshots from meetings.  I've imported the screenshot into Photos, with very clear text, but it only wants to treat it as pixels.  I can't share the text.  Some of it is a bit small, but very clear.

On this Mac, I do not have it automatically putting screen shots into photos; I only do that manually for particular screenshots.

I'm on an Intel iMac Pro (Intel Macs *do* now support Live Text; early reports that you need an M1 Mac are out of date), with up-to-date Monterey macOS.

Mark Bennett (431 rep)

Dec 15, 2021, 07:12 PM • Last activity: Jan 10, 2023, 05:38 PM

1 votes

1 answers

760 views

Notes app: search inside scanned document

search scanning image-capture ocr

How do I get to the detailed search result within a camera scanned document in the Notes app?

If I create a note on my iPhone, and use the camera to "scan" a document, it looks like the content of the document is searchable by the global search in the Notes app on my Mac and iPhone. But how do I find the exact location within that document? There appears to be no highlight of the search result, nor any way to search within that document.

Is there an obvious trick that I am missing? Even if I open the attached document in a new window, there is no search interface at all.

I am using macOS Ventura and iOS 16.1.1

kannix (135 rep)

Nov 12, 2022, 09:02 PM • Last activity: Nov 27, 2022, 12:52 PM

1 votes

0 answers

130 views

Live Text from webcam in MacOS

macos preview webcam text-input ocr

Having gotten used to Live Text via the *tap menu in text inputs* on iOS (1), I'd like to use it in the same ubiquitous way on macOS, but there you must use the Preview app. So you first have to take a photo using PhotoBooth.app, navigate to ~/Pictures/Photo Booth Library/Pictures/ and open the file in Preview before you can at last select the text you want.

Is there a way to use the webcam the same way as on iOS devices, that opens a window with the live webcam image and the possibility to select text from it?

A nice way would be to make it available via *Input menu in menu bar* where you can also access the Character Viewer  for Emojis or somehow trigger it via a script or Automator.

 1) Seems to be documented nowhere

cachius (301 rep)

Aug 10, 2022, 03:19 PM

0 votes

3 answers

211 views

Turn a bunch of photos into a PDF book, splitting and "fixing" pages

software-recommendation preview ocr

I have a series of images of the pages from a very old reference book that I use all the time. Some are a single page, some facing pages.

On iOS there are apps that will automatically fix up such images (de-keystone them, split pages and such) and turn them into a single PDF.

Is there something similar on the macOS side?

Maury Markowitz (1566 rep)

Jun 4, 2022, 11:57 AM • Last activity: Jul 4, 2022, 08:03 PM

0 votes

0 answers

31 views

Any ABBYY Finereader users here?

applications ocr

I am trying to scan old source code in BASIC and other languages like Pascal. I have tried a couple of OCR products with mixed results. The big problem is they don't treat spaces as important and run multiple spaces together into one.

I had a short email exchange with ABBYY's tech, who said you can select BASIC as the language. I'm on the demo version and it doesn't have that option, and when I look at the Mac online help I don't see it either.

Does anyone out there use Finereader on the Mac and might comment?

Maury Markowitz (1566 rep)

Apr 30, 2022, 07:20 PM

36 votes

13 answers

53618 views

Make existing PDF searchable ( OCR ) via command line / script

pdf ocr

I am looking for an offline scriptable tool that makes an existing PDF file searchable by running OCR on it, replacing the original non-searchable file with the searchable version, and can run unattended.

E.g., www.pdfscannerapp.com - does exactly what I need, but it's GUI only - not scriptable.

I am aware that Evernote makes PDF files searchable, but they remain searchable only when within Evernote.

I am not looking for perfect OCR, even a moderately acceptable OCR is fine, but I would prefer a small utility rather than a bulky software package.

(I am aware of a similar, but different question on AD: https://apple.stackexchange.com/questions/72676/looking-for-software-to-scan-or-convert-to-searchable-and-signable-pdf  - however, I don't need to sign or fill PDFs, and my requirement is that the solution is scriptable)

EDIT: 

1) Several utilities allow structured text extraction, however in order to be extracted, the text must be there; I am mainly referring to PDFs that are wrapped bitmaps, as is the case with plain PDFs generated by scanners.

2) I am not necessarily looking for a free solution, and I would be more than happy to pay for a good utility that just does what I need, but I am not looking for bulky applications with a million features that include an OCR feature but whose cost does not justify buying them just for the OCR functionality.

3) As stated above, I am not looking for perfect OCR, just a moderately acceptable OCR. Unfortunately, in my experience, tesseract is really below that threshold. I define "moderately acceptable" as an OCR that can, say, OCR a utility bill so that at least the account number (customer number) is recognized correctly.

EDIT: "scriptable" or "automatable", that is, able to be triggered automatically and run unattended without human input whatsoever.
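
The "replace the original and run unattended" requirement can be sketched independently of which OCR engine is used. The following is only an illustration under stated assumptions: `OCR_CMD` is a hypothetical placeholder for any tool that can be called as `OCR_CMD input.pdf output.pdf`, and `ocr_in_place` is a made-up helper name, not part of any existing utility.

```shell
#!/bin/bash
# ocr_in_place FILE
# Runs an OCR command on FILE and replaces the original only if the command succeeds.
# OCR_CMD is a hypothetical placeholder: any command invoked as "OCR_CMD input.pdf output.pdf".
ocr_in_place() {
    local src="$1" tmp
    tmp="$(mktemp "${src}.XXXXXX")" || return 1
    if "$OCR_CMD" "$src" "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$src"     # swap in the searchable version
    else
        rm -f "$tmp"         # on failure, leave the original untouched
        return 1
    fi
}
```

Writing to a temporary file next to the original and moving it into place only on success means a failed or interrupted OCR run never destroys the scan, which matters when nobody is watching the job. A scheduled job could then simply loop over a folder and call `ocr_in_place "$f"` for each PDF.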

magma (958 rep)

Jan 1, 2013, 05:20 PM • Last activity: Feb 25, 2022, 09:48 AM

0 votes

3 answers

11385 views

iOS: App to search All Content in PDF files?

ios search productivity ocr

There are many apps that search over the names of files but not over all the content in the files. It is very slow to test all apps such as iAnnotate, PDF Expert, Amazing PDF Expert and GoodReader. Which app searches over all content in PDFs?

For non-PDF files and files without OCR, please, see 

> https://apple.stackexchange.com/questions/84319/how-to-search-over-different-types-of-documents-such-as-djvu

hhh (3944 rep)

Oct 18, 2012, 06:04 PM • Last activity: Jan 20, 2022, 06:27 AM

0 votes

0 answers

1256 views

OCR on macOS that is actually useful?

pdf preview ocr

I am trying to extract the BASIC code from archive.org's copy of What to do After You Hit Return, a 1975 book of computer games.

The PDF is simply a series of images, without the underlying text. I would like to turn these back into ASCII source. The original text is reasonably high quality:

The only OCR I could find for the Mac that I didn't have to pay for was LEADTools. It converted the first line...

    100 REM *** REVERSE - A GAME OF SKILL

... into ...

    188 REM »-- nsvsnst; - A GAME or SKILL IIB

Now the 8's I get, the original font has a slash in the zeros, so... OK. But the rest, come on. The IIB at the end appears to be the "110" on the next line, although there is clear blank space between them and it adds that to the *rest* of line 110. In any event, the result is useless.

I am wondering if anyone might recommend an OCR on macOS that *isn't* useless. I'm willing to pay, but only if someone can test it on this text for me.

Maury Markowitz (1566 rep)

Dec 25, 2021, 05:18 PM • Last activity: Dec 25, 2021, 06:03 PM

1 votes

0 answers

129 views

automator: sorting image with ocr

macos automator automation image-processing ocr

I have a folder with screenshots of a document; in every image there is an article number in exactly the same position. Ideally, I would like to get those numbers into a text file or add them to a Numbers table.

What is the best approach to do this?

I thought about mass-slicing those images down to the specific area of information and then using a Create ML image classifier, but I couldn't really understand the next steps.
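
Since the number sits at a fixed position, one possible approach is to crop that region and run OCR on it rather than train a classifier. A rough sketch, assuming ImageMagick (`convert`) and Tesseract 4 or later (`tesseract`, with the `--psm` option) are installed; the function name and the example geometry are hypothetical and would need adjusting to the actual screenshots:

```shell
#!/bin/bash
# extract_article_numbers DIR GEOMETRY
# Crops the same region out of every PNG in DIR and prints "filename<TAB>recognised text".
# GEOMETRY is an ImageMagick crop spec such as 200x40+850+120 (width x height + x + y offset).
extract_article_numbers() {
    local dir="$1" geometry="$2" img crop text
    crop="$(mktemp -d)/crop.png" || return 1
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue          # skip if the folder has no PNG files
        convert "$img" -crop "$geometry" +repage "$crop" || continue
        # "stdout" makes tesseract print the text; --psm 7 treats the crop as a single text line
        text="$(tesseract "$crop" stdout --psm 7 2>/dev/null | tr -d '\n\f')"
        printf '%s\t%s\n' "$(basename "$img")" "$text"
    done
}
```

Redirecting the output, for example `extract_article_numbers ~/Screenshots 200x40+850+120 > numbers.tsv`, gives a tab-separated text file that a spreadsheet application can import.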

Omri Cohen (9 rep)

Apr 6, 2020, 09:36 AM • Last activity: Apr 8, 2020, 01:59 PM

17 votes

2 answers

7816 views

OCR on PDFs in OS X with free, open source tools

macos pdf open-source ocr

After reading these blog posts:

* Linux, OCR and PDF - Problem solved 
* Creating a searchable PDF with opensource tools ghostscript, hocr2pdf and tesseract-ocr 
* Using Tesseract OCR with PDF scans 

and going through the snippet below (from this gist) for Linux, I think I found a method to OCR a multi-page PDF and get a PDF in the output that could also work in OS X. Most of the dependencies are available in Homebrew (brew install tesseract and brew install imagemagick), except one, hocr2pdf.

I haven't been able to find a port of it for OS X. Is there one available? If not, how can one OCR a multi-page PDF and get the results back again in a multi-page PDF in OS X, using free, open source tools?

    #!/bin/bash
     
    # This is a script to transform a PDF containing a scanned book into a searchable PDF.
    # Based on previous script and many good tips by Konrad Voelkel:
    # http://blog.konradvoelkel.de/2010/01/linux-ocr-and-pdf-problem-solved/ 
    # http://blog.konradvoelkel.de/2013/03/scan-to-pdfa/ 
    # Depends on convert (ImageMagick), pdftk and hocr2pdf (ExactImage).
    # $ sudo apt-get install imagemagick pdftk exactimage
    # You also need at least one OCR software which can be either tesseract or cuneiform.
    # $ sudo apt-get install tesseract-ocr
    # $ sudo apt-get install cuneiform
    # To install languages into tesseract do (e.g. for Portuguese):
    # $ sudo apt-get install tesseract-ocr-por
     
    echo "usage: ./pdfocr.sh document.pdf ocr-sfw split lang author title"
    # where ocr-sfw is either tesseract or cuneiform
    # split is either 0 (already single-paged) or 1 (2 book-pages per pdf-page)
    # lang is a language as in "tesseract --list-langs" or "cuneiform -l".
    # and author, title are used for the PDF metadata.
    #
    # usage example:
    # ./pdfocr.sh SomeFile.pdf tesseract 1 por "Some Author" "Some Title"
    pdftk "$1" burst dont_ask
    for f in pg_*.pdf
    do
        if [ "1" == "$3" ]; then
            convert -normalize -density 300 -depth 8 -crop 50%x100% +repage $f "$f.png"
        else
            convert -normalize -density 300 -depth 8 $f "$f.png"
        fi
    done
    rm pg_*.pdf

    for f in pg_*.png
    do
        if [ "tesseract" == "$2" ]; then
            tesseract -l $4 -psm 1 $f $f hocr
        elif [ "cuneiform" == "$2" ]; then
            cuneiform -l $4 -f hocr -o "$f.html" $f
        else
            echo "$2 is not a valid OCR software."
        fi
        # The hOCR redirect, the loop end and the two merge steps were garbled in this copy;
        # they are reconstructed from the file names used further down
        # (merged+data.pdf, doc_data.txt, in.info).
        hocr2pdf -i $f -r 300 -s -o "$f.pdf" < "$f.html"
    done

    pdftk pg_*.pdf cat output merged.pdf
    pdftk merged.pdf update_info_utf8 doc_data.txt output merged+data.pdf
    echo "InfoBegin" > in.info
    echo "InfoKey: Author" >> in.info
    echo "InfoValue: $5" >> in.info
    echo "InfoBegin" >> in.info
    echo "InfoKey: Title" >> in.info
    echo "InfoValue: $6" >> in.info
    echo "InfoBegin" >> in.info
    echo "InfoKey: Creator" >> in.info
    echo "InfoValue: PDF OCR scan script" >> in.info
    in_filename="${1%.*}"
    pdftk merged+data.pdf update_info_utf8 in.info output "$in_filename-ocr.pdf"

    rm -r doc_data.txt in.info merged* pg_*
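
The metadata step at the end of the script can be isolated for clarity. The sketch below only restates what the script already does: it emits the `InfoBegin` / `InfoKey` / `InfoValue` records written to `in.info` and derives the output file name with `${1%.*}`. The function names `write_pdf_info` and `ocr_output_name` are hypothetical and do not come from the gist.

```shell
#!/bin/bash
# write_pdf_info AUTHOR TITLE
# Prints metadata records in the InfoBegin/InfoKey/InfoValue layout that the
# script writes to in.info before passing it to pdftk's update_info_utf8.
write_pdf_info() {
    local author="$1" title="$2"
    # printf reuses the format string for each key/value pair
    printf 'InfoBegin\nInfoKey: %s\nInfoValue: %s\n' \
        "Author"  "$author" \
        "Title"   "$title" \
        "Creator" "PDF OCR scan script"
}

# ocr_output_name FILE
# Same naming rule as the script: "${1%.*}" drops the last extension, then "-ocr.pdf" is appended.
ocr_output_name() {
    printf '%s-ocr.pdf\n' "${1%.*}"
}
```

With these helpers, `write_pdf_info "Some Author" "Some Title" > in.info` would stand in for the run of `echo` lines.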



                                

Josh (365 rep)

Apr 22, 2014, 03:17 PM • Last activity: May 28, 2019, 05:15 AM

1 votes

0 answers

226 views

HP LaserJet won't print OCR'ed file or PowerPoint file sent from OS X

pdf printing ocr

I've got an HP Colour LaserJet Pro MFP M281fdw printer.

The printer works fine for Word docs, images etc. and PDFs (which are not OCR'ed).

If a PDF has been created from a scanned document and then OCR'ed, the file will fail to print.

By "fail to print" I mean:

When I try to print a 50-page OCR'ed PDF, the first page, which contains no OCR'ed text, prints fine, but the printer then stalls and doesn't print any of the other pages. I get an error on my laptop, something along the lines of "cannot connect to printer". If I have the same document but not OCR'ed and print it, it works fine.

When I try to print a 3-page OCR'ed PDF, none of the pages print; instead the printer prints an error page with the following:

    ERROR : undefined
    OFFENDING COMMAND : New
    STACK :
    /AAAAB+ *Times
    /FontName

---

I've spoken to HP Support and apparently this is a known issue. They added the following:

- The issue also affects prints from PowerPoint (I don't do any work in PowerPoint, so I have not verified this).
- The issue is to do with OS X. (I have tried using both the HP-provided driver and the standard generic drivers, but the issue continues; apparently it's to do with the way OS X handles printing, prior to the print being sent to the print driver.)
- The issue is not present on Windows (I have not confirmed this).

Are there any known workarounds for this issue, other than converting an OCR'ed document to a non-OCR'ed one before printing?

My laptop is running OS X 10.13.6.x and is connected to the network via Wi-Fi.

The printer is connected to the network via a wired connection.
                                

sam (3985 rep)

May 8, 2019, 11:04 AM • Last activity: May 9, 2019, 10:39 AM

Showing page 1 of 20 total questions