Imagine that you are a freelance translator, and your customer has asked you to translate a file in PDF format. Usually, PDF files are recognized (they contain selectable text), so counting words is not a problem: just copy the text to MS Word and use the built-in word count tool. So, you implicitly agree to this job. But when you receive the PDF file and open it, you realize that it is unrecognized. So how do you count words in an unrecognized PDF?
A PDF can also combine recognized text and unrecognized images. Let’s also imagine that, unfortunately, you did not agree with your customer beforehand that scan jobs are paid on a per-hour basis, and therefore your customer demands that the job be done on a per-word basis. So, you need to count the words in this PDF file one way or another. How can you do this? There are two methods to count words in such PDF files: free of charge and paid.
Let’s begin with the free method. To count words in unrecognized PDF files, you first need to recognize them. It is great if you have already bought a good paid OCR program such as Abbyy FineReader or Adobe Acrobat Professional, which has a built-in OCR tool. But we are reviewing free ways to count words in unrecognized PDF files, and therefore we need a free OCR tool to recognize the PDF file.
After searching for free OCR tools, I chose FreeOCR because this program can recognize PDF files. You can download FreeOCR at http://www.paperfile.net/download.html
After installation, run the program (by the way, FreeOCR requires the Microsoft .NET Framework 2.0 to be installed). You will get a window like the one in the attached screenshot. To recognize a PDF file, click the Open PDF button, choose your PDF file, choose the OCR language, and then click the OCR button. After recognition, export the resulting text to Word.
Then get the statistics using the MS Word built-in tool (in MS Word 2007, click Review > Word Count).
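If the recognized text is saved as plain text rather than opened in Word, the same kind of statistics can be approximated with a few lines of Python. This is only a minimal sketch: the function name text_statistics and the sample string are made up for illustration, and a word is assumed to be any run of non-whitespace characters, so the numbers may differ slightly from what MS Word reports.

```python
import re


def text_statistics(text: str) -> dict:
    """Return simple word, character, and line counts for plain text.

    Assumption: a "word" is any run of non-whitespace characters.
    """
    words = re.findall(r"\S+", text)
    return {
        "words": len(words),
        "characters_with_spaces": len(text),
        "characters_without_spaces": sum(1 for ch in text if not ch.isspace()),
        "lines": len(text.splitlines()),
    }


# Hypothetical OCR output, used only to demonstrate the function.
sample = "Scanned page one.\nIt has two lines."
stats = text_statistics(sample)
print(stats)
```

Counting whitespace-separated tokens keeps the sketch short, but OCR errors such as split or merged words will still change the total, so the result is an estimate.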
Please note, however, that the downloaded FreeOCR has only the English OCR language installed. You can find more OCR languages at http://www.paperfile.net/lang.html
So, here is the summary of the free method:
You can also submit your file to a free online OCR service at http://www.free-ocr.com/ (OCR is available only for English, German, French, Italian, Dutch, and Spanish). This method has almost the same pros and cons as the previous one, plus there is a greater risk to the confidentiality of your information, and you have to wait while your file is uploaded to the website.
Note that, for all methods considered in this article, the images in your unrecognized PDF file need to be of good quality and resolution to ensure the most accurate word count.
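As a rough illustration of the resolution point, the effective resolution of a scanned page can be estimated from its width in pixels and the physical width of the paper. The sketch below is hypothetical: the function names are invented for this example, and the 300 DPI threshold is an assumption based on a common rule of thumb for OCR, not a requirement stated by any of the tools mentioned in this article.

```python
# Assumption: 300 DPI is a commonly cited minimum for reliable OCR.
MIN_OCR_DPI = 300


def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Estimate scan resolution from image width and physical page width."""
    return pixel_width / page_width_inches


def is_resolution_sufficient(pixel_width: int, page_width_inches: float,
                             min_dpi: float = MIN_OCR_DPI) -> bool:
    """Return True if the estimated resolution meets the chosen threshold."""
    return effective_dpi(pixel_width, page_width_inches) >= min_dpi


# A US Letter page is 8.5 inches wide.
good_scan = is_resolution_sufficient(2550, 8.5)  # 2550 / 8.5 = 300 DPI
poor_scan = is_resolution_sufficient(1275, 8.5)  # 1275 / 8.5 = 150 DPI
print(good_scan, poor_scan)
```

A scan that falls below the threshold is likely to produce more recognition errors and therefore a less reliable word count, so it may be worth asking the customer for a better copy.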
Now let’s consider the paid alternative. There is software that has been developed especially for counting. As an example, I will consider AnyCount. This program can count words, characters, and lines in 70 formats, and it can also count words in unrecognized PDF files. To perform a word count in such a PDF file, you need to choose a PDF Graphic Recognition language and click the Count button. The program will recognize your PDF file and count the words automatically.
You can evaluate the program by downloading it from https://www.anycount.com/try-free/.
The summary of the paid method of word count in unrecognized PDF files is as follows:
So, as you can see, the free way of counting words in an unrecognized PDF is reasonable as a quick, temporary, one-time solution. If you need fast and extensive word counts (or any other statistics, such as character and line counts), it is better to use professional word count software.