“This Document Contains Renderable Text” Acrobat 8

09/03/2007

Do you ever get this message when trying to OCR a document? It means Acrobat has found renderable text in the document, usually because the document has already gone through the OCR process, either completely or partially. If you are having trouble editing text and you get this message, follow these steps:

  1. Open your PDF document and go to File > Save As.
  2. For ‘Save as type’, choose TIFF, which is an image format.
  3. Acrobat will make separate images for each page in the document.
  4. Next go to File > Create PDF from Multiple Files and choose your TIFF files. Alternatively, you can select all the TIFF files and drag them as a group onto the Adobe Acrobat icon, and Acrobat will ask if you want to combine them (you do).
  5. Then follow the normal OCR process: [How to Edit a Scanned Document in Acrobat 8]
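
If you would rather script the rasterizing step than click through it, steps 1 to 3 can be approximated outside Acrobat. The sketch below is not part of the original tip. It assumes Ghostscript is installed and available as `gs`; the function name, the file names, and the 300 dpi resolution are illustrative choices. It only builds the command, so nothing runs until you pass the list to `subprocess.run`.

```python
import subprocess  # only needed if you uncomment the run line below


def build_rasterize_command(pdf_path, out_pattern="page-%03d.tif", dpi=300):
    """Return a Ghostscript command that renders each PDF page to a TIFF image.

    Assumption: Ghostscript is installed and on PATH as `gs`
    (on Windows the console executable is usually `gswin64c`).
    """
    return [
        "gs",
        "-dNOPAUSE",                    # do not pause between pages
        "-dBATCH",                      # exit once the file has been processed
        "-sDEVICE=tiff24nc",            # 24-bit RGB TIFF output
        f"-r{dpi}",                     # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",  # %03d becomes the page number
        pdf_path,
    ]


if __name__ == "__main__":
    cmd = build_rasterize_command("scanned.pdf")
    print(" ".join(cmd))
    # Uncomment to actually run it (requires Ghostscript and scanned.pdf):
    # subprocess.run(cmd, check=True)
```

The resulting TIFF files can then be combined and OCRed in Acrobat as described in steps 4 and 5.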

33 Responses to ““This Document Contains Renderable Text” Acrobat 8”

  1. Donna Gallaher Says:

    How can I implement the Save As option in VB.NET? I need to save the PDF files to PostScript so finishing commands can be added before sending the file to the printer.

  2. admin Says:

    Hi Donna,

    I’m afraid VB.net is a bit outside the realm of our expertise. Perhaps you could try:

    The developer’s resource:
    http://partners.adobe.com/

    or the Adobe Forums:
    http://www.adobe.com/support/forums/

    Good luck.

  3. D. Peterson Says:

    Wow that’s a horrible kludge.

    What are you thinking of? There is no reason, from a user perspective that this makes any sense. The page I am trying to recognize, in an 800 page document, has nothing on it but a scanned image and an Acrobat footer with the page number. I am asking to recognize JUST THIS PAGE, and yet Acrobat refuses to cooperate.

    Yuck. So you want me to disassemble this entire file and reassemble it? How about fixing your bug instead?

    Thanks.

  4. Mitch Says:

    In your case I would look at how the document was originally scanned in. Open the scanned PDF file, go to File > Properties, and look for the PDF Producer. My guess is that it will say something other than Adobe Acrobat, probably the name of your scanner. If so, the PDF was produced by third-party software and may not work correctly with Acrobat. Adobe has two recommended methods for scanning:
    1. Go to File > Create PDF > From Scanner, choose your scanner, and click Scan, or
    2. Scan to an image format with your scanner software and then convert to PDF.

    Neither of the above methods will produce the “Renderable Text” error.

    Hope this helps,
    Mitch

    Please note that we are not affiliated with Adobe Systems.
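
Editor's sketch for readers who want to check the PDF Producer on many files without opening each one: the snippet below pulls the `/Producer` entry out of the raw bytes of a PDF. It is a rough illustration, not an Acrobat feature. It assumes the Info dictionary is stored uncompressed and that the producer is a plain literal string. The function name and the sample bytes are made up for the example.

```python
import re


def find_pdf_producer(pdf_bytes):
    """Return the /Producer string found in raw PDF bytes, or None.

    Assumption: the Info dictionary is uncompressed and /Producer is a simple
    literal string in parentheses. Files that use object streams, encryption,
    hex or UTF-16 strings, or escaped parentheses need a real PDF parser.
    """
    match = re.search(rb"/Producer\s*\(([^)]*)\)", pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("latin-1")


# Made-up fragment standing in for the contents of a scanned PDF.
sample = b"1 0 obj\n<< /Producer (ExampleScan 2.0) /Creator (Scanner) >>\nendobj\n"
print(find_pdf_producer(sample))  # prints: ExampleScan 2.0
```

For a real file you would read it with `open(path, "rb").read()` and pass the bytes in. A result of `None` only means this simple search found nothing, not that the file has no producer.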

  5. chriscoyier Says:

    @D. Peterson: I can see how this could be a bit frustrating, but it would be unnecessary to “disassemble the entire file and reassemble” it. You could extract the single page, use the above steps to save as a TIF and run OCR on that, and then delete the existing page and insert the new one. Just as easy on an 800 page PDF as a 10 page PDF.

  6. ederosia Says:

    One little bit of renderable text at the bottom of each page makes it impossible to OCR the thing! Frankly, the TIFF workaround is terrible. It’s difficult for me to think of a more tedious solution. Why can’t Acrobat simply IGNORE the renderable text?!? For the kind of money we paid for this program, I expect better solutions than “convert the entire document into TIFF and then import it back into Acrobat!” Honestly, this has been a problem for years and years. Please fix this problem.

  7. ederosia Says:

    Or, if ignoring the renderable text is somehow difficult for Adobe to do, how about a function within Acrobat that converts the entire document into bitmapped form? In essence, it would “flatten” the entire document (renderable text and all) into a bitmapped form. It would accomplish *within* Acrobat what the silly TIFF-export/import workaround accomplishes. For the user, this single extra step wouldn’t be a big deal.

    Let me add that Adobe has often described this issue in support forums as if it were a user problem. In essence, they have said, “The stupid user is trying to OCR a document that doesn’t need it!” (e.g., see http://acrobatsupport.com/document-contains-renderable-text) But, please understand, that we really do get it. The document really does need to be OCRed. It’s just that the OCR is prevented by a little bit of rendered text that someone has added somewhere to the document (e.g., a little notice at the bottom of the page). Don’t write us off as idiots. This error does not ONLY come when someone is trying to OCR a document that has already been OCRed.

  8. Mitch Says:

    ederosia,
    We agree with you – Acrobat’s OCR function is far from perfect. Currently our best workaround is the PDF to TIFF to PDF option.
    Please note that we are not affiliated with Adobe, we merely offer our suggestions to the Acrobat community as a free service.
    Mitch

  9. ederosia Says:

    My mistake. I thought you were affiliated with Adobe because of the URL and your use of the Acrobat logo.
    Can you please tell me what you think of this blog entry, written by an Adobe employee? It’s at http://blogs.adobe.com/acrolaw/2007/06/acrobat_81_update_fix_for_render.html The author describes a fix made by Adobe to this whole problem. However, I’ve tried the steps the writer recommends, and it didn’t solve the problem for me. Furthermore, I’ve read the Adobe Knowledge Base Article to which the author refers, and it doesn’t even refer to the fix he described. But, as I say, he seems at least semi-affiliated with Adobe. Can you comment on whether Adobe really has fixed this problem?

  10. Mitch Says:

    Although the blog author works for Adobe, his posts aren’t really official recommendations.
    -
    The 8.1.1 update addresses OCR, but only with regard to Asian language fonts:
    http://www.adobe.com/support/downloads/detail.jsp?ftpID=3796
    -
    There’s also an 8.1.2 patch out there. You may want to apply that. No guarantees on OCR improvement.
    http://www.adobe.com/support/downloads/detail.jsp?ftpID=3849
    -
    Best of luck,
    Mitch

  11. Bruce Anderson Says:

    Crop the page to remove the renderable text, then Acrobat renders the OCR.
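
Editor's sketch of the arithmetic behind the cropping idea, for documents whose pages differ in size: the helper below computes a per-page crop box that excludes a band at the top or bottom. It is only an illustration. The function name, the page sizes, and the 36-point band are assumptions, and applying the boxes to a real PDF would still require Acrobat or a PDF library. It also cannot tell whether the band contains text you need.

```python
def crop_band(box, top=0, bottom=0):
    """Shrink a page box to exclude a band at the top and/or bottom.

    box is (left, bottom, right, top) in PDF points (1/72 inch), with the
    origin at the lower-left corner, as in a PDF MediaBox.
    """
    left, low, right, high = box
    new_low = low + bottom
    new_high = high - top
    if new_low >= new_high:
        raise ValueError("the bands would remove the whole page")
    return (left, new_low, right, new_high)


# Hypothetical mixed document: one Letter portrait and one Letter landscape page.
pages = [(0, 0, 612, 792), (0, 0, 792, 612)]
for box in pages:
    # Exclude a half-inch (36 pt) footer band from each page.
    print(crop_band(box, bottom=36))
```

Because the box is computed from each page's own dimensions, portrait and landscape pages get the same footer band removed without a fixed crop rectangle.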

  12. Xochi Says:

    I’ve worked with a lot of legal exhibits that have gone to court and come out of court with what some refer to as Court Branding. At the top of every page is blue text that identifies the document and page number. It is that bit of text that interferes with the OCR process. We are talking about thousands of documents (really) that need to be searchable. The simple solution is to delete the text, on each and every page, sometimes 300 or 400 pages. There has got to be a better way. Tonight, I came across a similar problem with Bates numbers digitally stamped at the bottom of the page. I could not delete that. Cropping 600 pages (tonight’s document) is time consuming, and believe me, this was a conglomeration of many different documents, different sizes, portrait and landscape. Cropping would have cut off text that needed to be searched. I guarantee you this is only the beginning; there will be many more of these types of situations. I don’t know how these digital stamps are generated, and I would like to know an easier way than those mentioned previously to get rid of them.

  13. anon Says:

    Just an FYI, Nitro PDF has an option to insert bates numbers, but obviously you guys need the opposite ability.

    Any software developers want to make some money, here’s a great idea for a simple utility, that removes them and does nothing else. If 1000 people paid 50$ for such a utility (and believe me, software customized for the legal industry is expen$$$ive) that would earn you fifty thousand.

    I might just have to program this myself. So what’s the best programming language for this task?

  14. Etienne Says:

    Simply print the file!

    1-Use a PDF printer (adobe, PDF factory, or whatever software to make PDF file)
    2-Print the file (make the PDF file)

    The text will be recognized… (you can copy/paste it)

  15. Jessica Says:

    The entire reason I have Acrobat Pro is so that I can make legal documents searchable and highlight without printing them out. It’s ludicrous to suggest that someone go through a hundred+-page document and do anything page-by-page. I guess I need other software.

  16. Alex Says:

    I was able to print PDF files containing rendered text with Microsoft Office Document Image Writer, which produces a single .tiff file (saving the PDF as TIFF instead produces multiple TIFF images). I chose the output format TIFF - Monochrome Fax, Super Fine (300 DPI) to reduce the chance of additional errors in the subsequent OCR step. I needed to OCR a number of PDF files with rendered text, so I used Print Files [with Microsoft Office Document Image Writer] as the first step of a batch process. In the next batch step, I used the TIFF files from the first step directly as the source for OCR (without converting them to PDF before OCR). The final files were saved as OCRed PDF files.

  17. karen Says:

    Alex — You mention at the end of your 3/22/09 comment using the TIFF files from the first step as a source for OCR… how do you get Acrobat’s batch sequence to use a TIFF file without it converting to PDF before OCR? Also, is this all one batch sequence, or do you have to split it out between printing to Image Writer/saving as TIFF as one, then OCRing the new TIFF as a second sequence? Your idea sounds like it worked for you. Thanks.

  18. Steve Says:

    Thanks for the tip -worked a treat for me.

  19. Mike Says:

    If the items causing an OCR problem are headers or footers, I have had success removing them by going to Document>Add Headers and Footers and clicking on [remove all]. Then OCR usually is possible.

  20. Mike G Says:

    Thx for the workaround, guys!
    Adobe: it’s shameful a big-name company like you guys can’t fix this little bug for your legions of adoring fans….

    >:-(

  21. desai Says:

    fella, you sure saved me one hell of an effort trying to translate from de>en. this makes my life so very easy.
    thanks, once again
    kind regards
    gaurang h. desai, india

  22. Kris Says:

    I tried everything in this thread but still couldn’t get OCR to work. Has anyone found an easy solution to the problem?

    Cheers,
    Kris

  23. Coke Says:

    Awesome!
    This worked perfect for me. Now I was able to launch the OCR option and it recognized all the text from the image. Great tip.

  24. Dej Says:

    It works!
    Print the ‘renderable text’ document to PDF with Acrobat, then do whatever you want with it :)

  25. dan Says:

    ONE STEP to Retrieve ALL the OCR TEXT

    “Save As” the “name.PDF” document as “name_.DOC” or “name.TXT”.

    No need to TIFF.

    Dan

  26. Jesu Says:

    Thanks Dan, converting to DOC is much easier.

  27. Michelle Says:

    Worked perfectly, thanks for the info, this saved me a TON of time and frustration!

  28. P P Says:

    I am new to Acrobat X Pro and these instructions worked well to get me going.

  29. MGF Says:

    Here’s my workaround: Print the document to another PDF, and in the print dialog box click “Advanced” and check the “Print as Image” box. When you open the resulting PDF you’ll be able to run OCR without an issue.

  30. Roosevelt Says:

    This method is very useful for dealing with PDFs that have troublesome fonts. When you try to copy text and the font is not there, the text comes out as gibberish. But using this technique and the help of OCR, the text is now perfect :)!

  32. joanne Says:

    I have found that by going to the File menu in Acrobat and choosing “Export,” and then, instead of going directly to Word, choosing “Text” and then “Plain Text,” Acrobat will create a workable document (I think it opens in Notepad) that can be edited, even skipping the OCR step if it does not work. What I do is, after the .txt document is created, I go in there and hit Control A, which selects everything, then hit Copy, pull up a new Word document, and hit Control V, which pastes it in there. Then all I have to do is correct the formatting.


