IBM Watson™ Discovery Service Ideas

Smart Document Understanding to produce a readable PDF from original scanned PDF

In solutions where text archives are digitized to make them searchable in Watson Discovery, Smart Document Understanding is a promising new feature. However, OCRing a scanned PDF does not, as a side effect, produce a searchable (text-based) PDF. IBM's BACA system does offer this capability. It lets a user who has found a relevant document via Discovery open what looks like the original scanned PDF but is a searchable, text-based PDF, so the user can press Ctrl+F to find where a term or phrase of interest occurs, copy text passages to the clipboard, and so on.

The potential benefit has not been estimated, but it is clear for any Discovery use case that searches large enterprise collections that have been OCRed.

  • Sara Elo Dean
  • Jan 13 2020
  • Planned
Why is it useful?
Who would benefit from this IDEA? As an analyst, I have searched my company's digitized archives (previous projects, history, contracts) and can now search within the original document for the larger context of the passages that matched my search.
  • Admin
    Christophe Guittet commented
    30 Mar 12:30pm

    The capability to search within OCRed documents using Discovery will be available in the May release through the "Rich Preview" feature. https://bigblue.aha.io/features/WDS-789
