Jim Willis

Troubleshooting PDF OCR using Python on Mac

I wrote a script to extract some text from a PDF (image-based text, so pdftotext wouldn’t work).

Using pdf2image’s convert_from_path, I simply could not get any data out of the PDF. I tried multiple PDFs while testing, and convert_from_path just kept returning an empty result.

It turned out that my Homebrew install of xpdf was interfering with my Homebrew install of poppler.

Uninstalling xpdf (brew uninstall xpdf) and reinstalling poppler (brew install poppler) seemed to fix things. My suspicion is that both packages ship their own version of pdfinfo, which pdf2image uses. That’s just a hunch; I don’t know enough about what’s going on under the hood. So, anyway, if pdf2image isn’t working correctly for you and you’re on a Mac, make sure you’ve got poppler installed and that xpdf’s pdfinfo isn’t the one being used.
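If you want to check which pdfinfo your shell would pick up, here is a rough Python sketch. It is not part of my original script, and the function names (classify_pdfinfo, find_pdfinfo) are made up for illustration. It assumes that poppler's pdfinfo mentions "Poppler" in its -v banner while xpdf's only mentions Glyph & Cog, which matches the builds I have seen but is a heuristic, not a guarantee.

```python
import shutil
import subprocess


def classify_pdfinfo(version_text):
    """Guess which project a pdfinfo binary comes from, based on its -v banner.

    Heuristic (an assumption, not a documented contract): poppler's banner
    mentions "Poppler", while xpdf's banner only mentions Glyph & Cog.
    """
    text = version_text.lower()
    # Check poppler first, since its banner may also credit Glyph & Cog.
    if "poppler" in text:
        return "poppler"
    if "glyph & cog" in text or "xpdf" in text:
        return "xpdf"
    return "unknown"


def find_pdfinfo():
    """Return (path, flavor) for the first pdfinfo on PATH, or (None, None)."""
    path = shutil.which("pdfinfo")
    if path is None:
        return None, None
    # The banner may go to stdout or stderr depending on the build, so read both.
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    return path, classify_pdfinfo(result.stdout + result.stderr)


if __name__ == "__main__":
    path, flavor = find_pdfinfo()
    if path is None:
        print("No pdfinfo found on PATH; try: brew install poppler")
    else:
        print(f"pdfinfo found at {path} (looks like: {flavor})")
```

If this reports anything other than poppler, that would fit the conflict I ran into. As an alternative to uninstalling xpdf, convert_from_path also accepts a poppler_path argument that points it at a specific directory of poppler binaries, though I have not tested that route myself.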

Posted

January 3, 2020

Uncategorized
