IBM Watson™ Discovery Service Ideas

We've moved to our new idea portal: https://ibm-watson.ideas.aha.io

Ability to disable or customize stemming

Watson Discovery uses stemming to identify valid matches. In our experience, stemming appears to be applied in both Natural Language Queries and Discovery Query Language queries, and it also seems to be enforced in phrase searching. This is sometimes useful, but in other cases it makes relevant matches very difficult to find. For example, one of our collections has several documents containing the term "DCS" and several containing the term "DC". These terms are unrelated, and there is no reason for our users to get both from a single query. However, a search for either "DCS" or "DC" returns both, and there is no way to filter out the undesired matches because WDS appears to treat them as the exact same term.
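The collision described above can be reproduced with a toy stemmer. The sketch below assumes a rule loosely modeled on the plural step of the Porter stemmer (drop a trailing "s" unless the word ends in "ss"). Discovery's actual analyzer is not described in this thread, so `naive_stem` and `stem_matches` are hypothetical helpers for illustration only.

```python
def naive_stem(token: str) -> str:
    """Hypothetical stemmer: lowercase, then drop a trailing plural "s" (but keep "ss")."""
    word = token.lower()
    if len(word) > 1 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]  # "dcs" -> "dc", "efs" -> "ef"
    return word


def stem_matches(query: str, document_terms: list[str]) -> list[str]:
    """Return every document term whose stem equals the stem of the query."""
    target = naive_stem(query)
    return [term for term in document_terms if naive_stem(term) == target]


if __name__ == "__main__":
    terms = ["DC", "DCS", "EF", "efs"]
    print(stem_matches("DCS", terms))  # ['DC', 'DCS']
    print(stem_matches("DC", terms))   # ['DC', 'DCS']
```

In this toy model, once "DCS" and "DC" both reduce to "dc" they are indistinguishable at match time, which mirrors the behavior reported above.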

We think WDS should not apply stemming to phrase searches, since this type of search is primarily used to specify exact matches such as titles or document excerpts. Alternatively, we would like some control over when stemming is applied, or over which words or terms should not be stemmed.
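The two behaviors requested above (exact matching inside quotes, and an optional list of protected terms) can be sketched as a small query matcher. This is a hypothetical illustration, not Discovery's query parser: `naive_stem`, `tokenize`, `parse_query`, `document_matches`, and the `protected` parameter are all invented for the example, and tokenization is simplified to runs of word characters.

```python
import re


def naive_stem(token: str) -> str:
    """Hypothetical stemmer: lowercase, then drop a trailing plural "s" (but keep "ss")."""
    word = token.lower()
    if len(word) > 1 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def tokenize(text: str) -> list[str]:
    """Simplified tokenizer: lowercase runs of word characters."""
    return re.findall(r"\w+", text.lower())


def parse_query(query: str, protected: frozenset[str] = frozenset()) -> list[tuple[list[str], bool]]:
    """Split a query into (words, exact) clauses.

    Quoted phrases are matched exactly; unquoted words are stemmed unless they
    appear (lowercased) in the hypothetical `protected` set.
    """
    clauses = []
    for phrase, word in re.findall(r'"([^"]+)"|(\w+)', query):
        if phrase:
            clauses.append((tokenize(phrase), True))
        elif word.lower() in protected:
            clauses.append(([word.lower()], True))
        else:
            clauses.append(([naive_stem(word)], False))
    return clauses


def document_matches(query: str, text: str, protected: frozenset[str] = frozenset()) -> bool:
    """True if every clause occurs as a contiguous run of tokens (exact) or of stems."""
    tokens = tokenize(text)
    stems = [naive_stem(t) for t in tokens]
    for words, exact in parse_query(query, protected):
        haystack = tokens if exact else stems
        n = len(words)
        if not any(haystack[i:i + n] == words for i in range(len(haystack) - n + 1)):
            return False
    return True


if __name__ == "__main__":
    doc = "Replace the DC power supply"
    print(document_matches("DCS", doc))                                # True (stemmed)
    print(document_matches('"DCS"', doc))                              # False (exact phrase)
    print(document_matches("DCS", doc, protected=frozenset({"dcs"})))  # False (protected term)
```

The quoted-phrase rule alone handles the "DCS" case without maintaining any list; the `protected` set is included only because the request mentions it as an alternative.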

  • Eduardo Kaufmann-Malaga
  • Jun 26 2019
  • Future Consideration
Who would benefit from this IDEA? "As a customer, I would like to be able to run searches that retrieve only exact matches."
  • Guest commented
    June 27, 2019 16:36

    Do NOT solve this with a list of words or terms that should not be stemmed. Using quotes (phrase searching) should stop stemming on any word you choose. This is standard practice, and anything else would likely lead to unexpected results in the future. Keep in mind that counter-intuitive search results can quickly make IBM support look "stupid", as if we don't even know our own documentation.

  • alexandre blancke commented
    December 12, 2019 16:53

    Yes, this is not an idea, it is a requirement.

    When I search for "efs", I don't want to see results matching "EF", because the actual matches I should get are lost among all the matches I don't expect.
    We are support, and we are looking for exact matches on technical terms,
    not approximate, random results.

  • Michael Long commented
    December 12, 2019 17:14

    We need a support search feature that works the way most common search tools do natively. The current Support Search tool returns a vast number of "false positives" due to stemming and lemmatization. When we search for the word "integrator", we do not want results that contain the words "integration" or "integrated" but not our original search term.

  • Sara Elo Dean commented
    13 Jan 07:44

    Stemming should be replaced with lemmatization in Discovery, as lemmatization takes into account the part of speech of the original word.
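The difference described in the last comment can be illustrated with a toy comparison. The sketch below assumes a hypothetical suffix-stripping stemmer and a tiny hand-written lemma table keyed by (word, part of speech). Real lemmatizers rely on a full lexicon and a part-of-speech tagger, and how any particular one treats acronyms such as "DCS" is not guaranteed.

```python
# Hypothetical lemma table keyed by (lowercased word, part of speech).
LEMMAS = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("integrated", "VERB"): "integrate",
    ("integrator", "NOUN"): "integrator",
}


def suffix_stem(word: str) -> str:
    """Hypothetical stemmer: strip the first matching suffix, ignoring part of speech."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            return w[: -len(suffix)]
    return w


def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma for (word, pos); unknown words pass through unchanged."""
    w = word.lower()
    return LEMMAS.get((w, pos), w)


if __name__ == "__main__":
    print(suffix_stem("meeting"), lemmatize("meeting", "NOUN"), lemmatize("meeting", "VERB"))
    # meet meeting meet
    print(suffix_stem("DCS"), lemmatize("DCS", "NOUN"))
    # dc dcs
```

In this toy, the lemma for "meeting" depends on its part of speech while the stemmer always returns "meet"; whether a production lemmatizer keeps "DCS" intact depends on its lexicon.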