IBM Watson™ Discovery Service Ideas

Stop words should not impact phrases in Discovery Query Language

Watson Discovery Service (WDS) lets you define your own set of stop words. We recently noticed that stop words are removed from searches even inside phrases written in the Discovery Query Language syntax. For example, if we search for the phrase "we the people" and "we" and "the" are in the stop word list, both words are removed from the search. We understand why the natural language syntax ignores stop words, but stop words combined in a particular order within a phrase can be very useful for finding relevant content. We think phrases should not be affected by stop words: a phrase is meant to be treated as a string literal, and Discovery should not modify it.

  • Guest
  • Mar 7 2019
  • Needs review
Why is it useful?
Who would benefit from this IDEA? As a customer, I want to be able to search for phrases without them being modified by the Discovery Query Language parser.
  • Michael McCawleuy commented
    08 Mar 12:43

    Actually, I don't think this is possible as written. I'm posting as a customer here, not as a Watson engineer, but stop words are index-time directives: the terms are removed from the document to simplify it BEFORE it is indexed. If the words are gone, no fiddling with query interpretation can put them back.
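
To illustrate the point about index-time removal, here is a minimal sketch in Python. It is a toy model, not Discovery's actual implementation: the stop word list, the `analyze`, `build_index`, and `phrase_match` helpers, and the sample documents are all made up for illustration. It shows that once stop words are dropped before indexing, a phrase query kept verbatim finds nothing, and a phrase query run through the same filter degrades to a single-word match.

```python
STOP_WORDS = {"we", "the"}  # hypothetical custom stop word list


def analyze(text, stop_words=frozenset()):
    """Lowercase, split on whitespace, and drop any stop words."""
    return [t for t in text.lower().split() if t not in stop_words]


def build_index(docs, stop_words):
    """Store only the analyzed tokens; the removed stop words are not kept anywhere."""
    return {doc_id: analyze(text, stop_words) for doc_id, text in docs.items()}


def phrase_match(index, phrase_tokens):
    """Return ids of documents that contain phrase_tokens as a contiguous run."""
    n = len(phrase_tokens)
    hits = []
    for doc_id, tokens in index.items():
        if n and any(tokens[i:i + n] == phrase_tokens
                     for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits


docs = {
    "constitution": "We the People of the United States",
    "survey": "People of all ages responded",
}
index = build_index(docs, STOP_WORDS)

# Phrase kept as a string literal: "we" and "the" no longer exist in the index.
literal = phrase_match(index, analyze("we the people"))

# Phrase filtered the same way as the documents: it collapses to just "people".
filtered = phrase_match(index, analyze("we the people", STOP_WORDS))

print(literal)   # []
print(filtered)  # ['constitution', 'survey']
```

In this toy model neither behavior gives the requester what they want: the literal phrase cannot match, and the filtered phrase matches every document that mentions "people".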

    I totally agree with the need, though. In technical or commerce domains, you often have search goals involving products with odd names or technical jargon. Finding Volvos with "i Drive" is impossible.

    Something better might be the Common Terms query in Elasticsearch, as described here:

    https://www.elastic.co/blog/stop-stopping-stop-words-a-look-at-common-terms-query

    Then we could still remove stop words from the body, but enrich documents with the common terms they should respond to in separate fields. Would this work for you?
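
One possible reading of the separate-field idea, sketched in Python under stated assumptions: each document keeps a stop-filtered `body` field for ordinary term queries plus an unfiltered `body_exact` field that only phrase queries consult. This is not Elasticsearch's Common Terms query and not something Discovery is known to offer; the field names, the `search` function, and the sample documents are hypothetical.

```python
STOP_WORDS = {"we", "the", "i"}  # hypothetical stop word list


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def index_document(text):
    """Build two fields: one with stop words removed, one left untouched."""
    tokens = tokenize(text)
    return {
        "body": [t for t in tokens if t not in STOP_WORDS],
        "body_exact": tokens,
    }


def contains_phrase(tokens, phrase):
    """True if phrase occurs in tokens as a contiguous run."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase
                         for i in range(len(tokens) - n + 1))


def search(index, query, is_phrase):
    """Phrase queries use the unfiltered field; term queries use the filtered one."""
    q = tokenize(query)
    if is_phrase:
        return [d for d, f in index.items() if contains_phrase(f["body_exact"], q)]
    terms = [t for t in q if t not in STOP_WORDS]
    return [d for d, f in index.items()
            if terms and all(t in f["body"] for t in terms)]


docs = {
    "volvo": "Volvo sedan with i Drive controls",
    "weather": "Drive safely in the rain",
}
index = {doc_id: index_document(text) for doc_id, text in docs.items()}

print(search(index, "i drive", is_phrase=True))   # ['volvo']
print(search(index, "i drive", is_phrase=False))  # ['volvo', 'weather']
```

The trade-off in this sketch is index size: the unfiltered field stores every token a second time, which is the cost stop word removal was meant to avoid.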