A common method for making PDF documents is to place a paper copy of a document into a scanner and view the newly-scanned document as a PDF with Adobe Acrobat. Unfortunately, scanners only create an image of text, not the actual text itself. This means the content is not accessible to users who rely on assistive technology. Additional modifications must be made to make the document accessible.
If the PDF document is not a scanned document or it has previously undergone optical character recognition (OCR), skip this discussion and proceed to “Step 4: Add Form Fields and Set the Tab Order”.
There are many ways to determine if a PDF file originated from a scanned page:
The Page Appears to be Skewed
Sometimes sheets are not fed into the scanner properly. As a result, the page appears crooked, or skewed, on the screen. Lines of text are not straight but slant up or down.
Search for Characters that Appear on the Page
Use the find command in Acrobat to search for text that appears on the page. Select Edit > Find and type a term that appears on the page in the search field.
If the document was scanned and has not undergone OCR, Acrobat will not find the search term and will display the message: “Acrobat has finished searching the document. No matches were found.”
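The search test above amounts to asking whether the pages carry any real text. As a rough illustration outside Acrobat, the same check can be sketched in Python. This is only a heuristic under stated assumptions: the per-page text is assumed to have been extracted already by some PDF text-extraction tool, and the names `pages_without_text`, `looks_scanned`, and `min_chars` are hypothetical, chosen for this example.

```python
def pages_without_text(page_texts, min_chars=1):
    """Return 1-based page numbers whose extracted text is effectively empty.

    page_texts: list of strings, one per page, produced by a PDF
    text-extraction tool (an assumption; this is not an Acrobat feature).
    min_chars: hypothetical threshold of non-whitespace characters below
    which a page is treated as having no text.
    """
    suspect = []
    for number, text in enumerate(page_texts, start=1):
        visible = "".join(text.split())  # drop all whitespace
        if len(visible) < min_chars:
            suspect.append(number)
    return suspect


def looks_scanned(page_texts):
    """Heuristic: if every page lacks text, the file is probably image-only."""
    empty_pages = pages_without_text(page_texts)
    return bool(page_texts) and len(empty_pages) == len(page_texts)
```

A document in which only some pages come back empty may be a mix of scanned and electronic pages, so the list of page numbers is usually more informative than the single yes-or-no answer.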
Zoom in and Check for Jagged Edges on Smooth Characters
Scanned images are bitmaps (See “Figure 6. Bitmapped Text Appearance”). The edges of curves in bitmapped images do not appear smooth or rounded but jagged, as shown in the sample illustrating the word “Writing” in Figure 6. Use the Marquee Zoom tool in Acrobat to define the area and magnify the edges of curved letters such as “c”, “s”, and “o”. Text that has undergone the OCR process using the ClearScan option will display edges that are smoother but still uneven or lumpy where there should be smooth curves, as shown in the illustration of the words “Quality” and “region” in Figure 7.
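The reason zooming exposes jagged edges can be sketched in a few lines of Python. A bitmap stores a fixed grid of pixels, so magnifying it only makes each pixel larger. The example below is a simplified illustration, not how Acrobat renders pages: it assumes a tiny hypothetical bitmap written as strings, where `#` is ink and `.` is paper, and enlarges it by repeating each pixel.

```python
def magnify(bitmap, factor):
    """Enlarge a bitmap by turning each pixel into a factor x factor block."""
    enlarged = []
    for row in bitmap:
        wide_row = "".join(pixel * factor for pixel in row)
        enlarged.extend([wide_row] * factor)
    return enlarged


# A diagonal stroke, standing in for part of a curved letter.
diagonal = [
    "#..",
    ".#.",
    "..#",
]

for line in magnify(diagonal, 2):
    print(line)
```

The enlarged stroke is a staircase of square blocks rather than a smooth line, which is the jaggedness visible on curved letters in a scanned page.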
Acrobat Pro DC can detect the presence of assistive technology, and if it encounters a scanned document, Acrobat will announce an audible empty page warning and display the Scanned Page Alert dialog (See “Figure 8. Scanned Page Alert and Recognize Text Dialogs”).
Perform Optical Character Recognition (OCR) to convert the bitmap image of text to actual characters. In Acrobat Pro DC, this can be done in two ways:
- After opening the document, select “OK” in the Scanned Page Alert dialog to open the Recognize Text dialog (See “Figure 8. Scanned Page Alert and Recognize Text Dialogs”).
- Select Tools > Action Wizard > Make Accessible > Recognize Text using OCR (See “Figure 9. Recognize Text - Settings”).
Text can be recognized in the entire document, on the current page, or in a range of pages within the document. Use the Edit button in the scanned page dialog to set the desired characteristics for the resulting file. The “Recognize Text - General Settings” dialog also appears when the Make Accessible Wizard is run. The options to choose are:
- Primary OCR Language: Acrobat does not detect a document’s language on its own; the user must indicate which language is used.
- PDF Output Style: This option should be set to ClearScan. ClearScan will allow the resulting PDF to “reflow”. Reflow allows the text on the page to be enlarged without displaying horizontal scroll bars. As the text size increases, the text wraps so content is not lost in the margins. The other two options, “Searchable Image” and “Searchable Image Exact”, will also work with assistive technology but will result in a PDF file that does not reflow.
- Downsample to: Set downsampling to the highest available resolution, measured in dots per inch (dpi). This should be 600 dpi.
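The reflow behavior described for the ClearScan output style can be pictured with a small Python sketch. It is an analogy only, not Acrobat's implementation: it assumes a fixed-width viewport and a uniform character width, and the names `reflow`, `viewport_width`, and `char_width` are hypothetical. As the characters get larger, fewer fit on a line, so the text is rewrapped instead of running past the edge.

```python
import textwrap


def reflow(text, viewport_width, char_width):
    """Wrap text to the number of characters that fit across the viewport.

    viewport_width and char_width use the same arbitrary units; both are
    hypothetical stand-ins for the window width and the zoomed glyph width.
    """
    columns = max(1, viewport_width // char_width)
    return textwrap.wrap(text, width=columns)


sample = "Reflow keeps every word visible as the reader zooms in."

normal = reflow(sample, viewport_width=400, char_width=10)  # 40 columns fit
zoomed = reflow(sample, viewport_width=400, char_width=20)  # 20 columns fit

for line in zoomed:
    print(line)
```

Doubling the character width produces more, shorter lines, but every word is still present. A page that does not reflow would instead keep its original line breaks and require horizontal scrolling to read the full line.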
For additional information on performing optical character recognition using Adobe Acrobat, refer to the Acrobat Pro DC Help.
Proceed to Step 4: Add Form Fields and Set the Tab Order.