IBM Watson™ Discovery Service Ideas

Ability to split paragraphs inside a document

Several of our documents place a number of unrelated paragraphs under a single section. We would like to split such documents on paragraph marks at ingestion time.

 

This could be accomplished by allowing other tags (not only H1, H2, Hx...) to split a document.
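
To make the request concrete, here is a minimal sketch of tag-based splitting in plain Python. It is not the Discovery implementation or its API. The names `TagSplitter`, `split_document`, and `split_tags` are hypothetical, and the sketch assumes the source documents are HTML.

```python
from html.parser import HTMLParser


class TagSplitter(HTMLParser):
    """Split HTML text into segments at every opening tag listed in split_tags."""

    def __init__(self, split_tags):
        super().__init__()
        self.split_tags = {t.lower() for t in split_tags}
        self.segments = []  # finished segments, in document order
        self._buffer = []   # text pieces collected for the current segment

    def _flush(self):
        # Join the collected pieces and collapse runs of whitespace.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.segments.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in self.split_tags:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit whatever follows the last split tag


def split_document(html, split_tags=("h1", "h2", "p")):
    """Return the text segments of `html`, split at the given tags (hypothetical helper)."""
    parser = TagSplitter(split_tags)
    parser.feed(html)
    parser.close()
    return parser.segments


page = "<h1>Setup</h1><p>Install the agent.</p><p>Restart the service.</p>"
print(split_document(page))                      # three segments
print(split_document(page, split_tags=("h1",)))  # one segment
```

With `p` in the tag list, each paragraph becomes its own segment. With headings only, everything under the heading stays in one segment, which is the behavior this idea asks to make configurable.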

  • Renato dos Santos Leal
  • Apr 13 2018
  • Percy Shi commented
    April 20, 2018 23:34

    Using predefined tags as an option to define the desired boundary of a paragraph (passage) would be very helpful for getting self-explanatory answers from WDS.

     

    The current passage-level function seems to rely mostly on the standard HTML tag (<p>) and on guessing the paragraph/passage boundary from a trailing \r\n and the leading space of the following text line. This approach cannot preserve context in "malformatted" documents (most technology manuals, troubleshooting guidelines, administration guidelines, etc.), where natural language and computer language are intermingled, so paragraphs have no common boundary format.
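
The following sketch illustrates the kind of whitespace heuristic described above and why it breaks on mixed prose and commands. It is only a guess at the general idea, not how WDS actually detects passages. The function name `guess_passages` and the sample text are made up for illustration.

```python
import re


def guess_passages(text):
    """Start a new passage wherever a line break is followed by an indented line."""
    # Normalize Windows line endings so only "\n" has to be handled.
    text = text.replace("\r\n", "\n")
    # Split at line breaks that are followed by leading whitespace.
    parts = re.split(r"\n+(?=[ \t])", text)
    # Collapse whitespace inside each passage and drop empty ones.
    return [" ".join(p.split()) for p in parts if p.strip()]


manual = (
    "  To restart the agent, run:\r\n"
    "    systemctl restart agent\r\n"
    "    systemctl status agent\r\n"
    "If the status is failed, check the log.\r\n"
)

for passage in guess_passages(manual):
    print(repr(passage))
```

On this sample the single procedure is cut into three passages, and the second command is merged with the unrelated sentence that follows it. An explicit boundary tag chosen by the author would avoid both problems.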