IBM Watson™ Discovery Service Ideas

Ability to disable or customize stemming

Watson Discovery uses stemming to identify valid matches. In our experience, stemming is applied to both Natural Language Queries and Discovery Query Language (DQL) queries, and it also seems to be enforced in phrase searching. This is sometimes useful, but in other cases it makes it very difficult to find relevant matches. For example, one of our collections has several documents containing the term "DCS" and several containing the term "DC". These terms are unrelated, and there is no reason for our users to get both in a single query. However, if you search for either DCS or DC, you get both, and there is no way to filter out the undesired matches because WDS appears to treat them as exactly the same term.
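
To see why "DCS" and "DC" can collapse into one term, here is a toy suffix-stripping stemmer in Python. It is an assumption for illustration only: it is not Discovery's actual stemmer, whose rules are not described here, and the sample documents are hypothetical. It applies a naive plural rule of the kind many English stemmers use, to both the query and the document text.

```python
def toy_stem(term: str) -> str:
    """Strip a trailing plural "s" (toy rule for illustration; NOT Discovery's stemmer)."""
    t = term.lower()
    if len(t) > 2 and t.endswith("s") and not t.endswith("ss"):
        return t[:-1]
    return t


def stemmed_match(query: str, text: str) -> bool:
    """True if the stemmed query term equals any stemmed token of the text."""
    q = toy_stem(query)
    return any(toy_stem(tok) == q for tok in text.split())


# Hypothetical sample documents standing in for a Discovery collection.
docs = {
    "doc1": "Configuring the DCS cluster",
    "doc2": "DC power supply replacement",
}

for q in ("DCS", "DC"):
    hits = [name for name, text in docs.items() if stemmed_match(q, text)]
    print(q, "->", hits)
# DCS -> ['doc1', 'doc2']
# DC -> ['doc1', 'doc2']
```

In this toy model both terms reduce to the stem "dc", so once stemming has been applied nothing downstream can tell them apart; only skipping stemming for that term (or for a whole quoted phrase) can. The same rule would also merge "EFS" with "EF".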

We think WDS should not enforce stemming in phrase searching, as this type of search is primarily used to specify exact matches such as titles or excerpts of a document. Alternatively, we would like some control over when stemming is applied, or over which words or terms should not be stemmed.
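
A minimal sketch of the behavior requested above, assuming a simple token-based matcher: text inside double quotes is matched exactly with no stemming, and unquoted terms are stemmed unless they appear on a protected list. `NO_STEM`, `normalize`, `parse_query` and `matches` are hypothetical names, and `toy_stem` is a naive stand-in; none of this is anything Discovery exposes.

```python
import re

# Hypothetical per-collection list of terms that must never be stemmed.
NO_STEM = {"dcs", "efs"}


def toy_stem(term: str) -> str:
    """Toy plural-stripping stemmer (assumption; stands in for a real one)."""
    t = term.lower()
    if len(t) > 2 and t.endswith("s") and not t.endswith("ss"):
        return t[:-1]
    return t


def normalize(term: str) -> str:
    """Stem a term unless it is on the protected NO_STEM list."""
    t = term.lower()
    return t if t in NO_STEM else toy_stem(t)


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into quoted phrases and the remaining loose word tokens."""
    phrases = re.findall(r'"([^"]+)"', query)
    loose = re.findall(r"\w+", re.sub(r'"[^"]+"', " ", query))
    return phrases, loose


def matches(query: str, text: str) -> bool:
    phrases, loose = parse_query(query)
    words = re.findall(r"\w+", text.lower())
    # Quoted phrases: exact whole-word sequence, case-insensitive, never stemmed.
    for phrase in phrases:
        target = re.findall(r"\w+", phrase.lower())
        n = len(target)
        if not any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            return False
    # Loose terms: compared after normalization (stemmed unless protected).
    vocab = {normalize(w) for w in words}
    return all(normalize(term) in vocab for term in loose)
```

For example, `matches('"DCS"', "DC power supply replacement")` is False because the quoted term is compared exactly, and `matches("DC", "Configuring the DCS cluster")` is also False because "DCS" is protected from stemming. `matches("clusters", "Configuring the DCS cluster")` is still True, since unquoted, unprotected terms keep the usual stemming.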

  • Eduardo Kaufmann-Malaga
  • Jun 26 2019
  • Future Consideration
Who would benefit from this IDEA? "As a customer I would like to be able to do searches that only retrieve exact matches."
  • Admin
    Phil Anderson commented
    24 Apr 06:34pm

    I'm sorry for the confusion; it appears you are right and double quotes do not disable stemming.

  • Michael Long commented
    24 Apr 04:47pm

    Hello Phil,

    After reading your post, I tried this Support Search, title:"integrator", from CSP after enabling "Advanced Query Syntax". The search results include items such as this title: 'Integration for Application Integration'. This search did not return only items with "integrator" in the title.

    Perhaps I misunderstood your post and the CSP Playbook instructions. Or maybe this is just an implementation hurdle that gives me and many Support Team members an unnecessary amount of grief and wasted effort.

    NLQ should not be the default search option for our Support Team research, as it returns a huge number of false positive results. NLQ greatly reduces the value of the search and increases the time spent filtering.

    Thank you for your attention to this RFE.

  • Kevin Baldwin commented
    24 Apr 12:56pm

    Hello Phil, that doesn't work from Support Search with Watson, which is why Eduardo opened this request.

    If you're saying a "quoted" search should disable stemming, it sounds like SSwW is doing something wrong when it submits a query.

    I'll try to find out what a query looks like from SSwW when we use its Advanced Query Syntax option and perform a quoted search.

    Thanks very much.

  • Admin
    Phil Anderson commented
    24 Apr 12:35pm

    Hi Kevin, you can already disable stemming today via double quotes; just use the Query parameter instead of the Natural Language Query parameter.

  • Kevin Baldwin commented
    23 Apr 09:52pm

    Hello Phil,

    Thanks for the prompt feedback.

    The need to prioritize this requirement has already been covered by the wording of the original submission and subsequent comments from Support Search with Watson users.

    These users are IBM Support professionals that use SSW on a daily basis to search data from a variety of sources, mainly for the purpose of finding information used to solve hardware and software problems submitted to IBM Support by our clients.

    It would be almost impossible to quantify, but I can say with certainty that we spend a significant amount of time trying to "see the wood for the trees" because stemming delivers additional results that we have no interest in using.

    The argument that "relevance" sorting of results can overcome these problems doesn't help when we often use other sort orders to display results.

    The simple ability to know that a "quoted" search will only return results matching the string inside the quotes, either by not enforcing stemming or by giving us an option in the query submission to indicate whether stemming is required, would fix this problem.

    I could go "vote hunting" by way of a blanket-bombing email run to roughly 17,000 users, inviting them to add comments, but these comments would most likely be "variations on a theme".

    You suggested I could... "provide the Offering Managers revenue or other info to help with the prioritization".

    What "revenue" are you referring to?
    Other than the information already provided here, and/or collecting votes, what else should I be looking for?

    Would it help if I got some Support Executives involved?

    Regards, Kevin

  • Puneet Mahajan commented
    23 Apr 09:34pm

    Phil, this is one of the most basic things being asked for in this request. We use very exact words in a product's vocabulary. 1000s of IBM support engineers rely on the search function every day. If you add up the extra cycles each one of them has to spend because the basic search function is lacking, it makes a no-brainer case. In my support org alone, we have 350+ engineers dependent on this.
    Thanks, Puneet

  • Admin
    Phil Anderson commented
    23 Apr 09:00pm

    Hi Kevin,

    Future Consideration means "it's a good idea, we will consider it for our roadmap". We consider highly voted features like this each quarter based on a number of factors. If you want to accelerate a feature getting on our roadmap, you can provide the Offering Managers revenue or other info to help with the prioritization.


  • Kevin Baldwin commented
    23 Apr 08:49pm

    This request appears to have a status of "Future Consideration".
    What exactly does that mean?

    Are we far enough away from the time when that status was set to now be in the future for it to be considered?

    How much longer will it be before users of "Support Search with Watson" have the luxury of seeing results that reflect the search criteria we've used without suffering from the pain introduced by stemming?

    Is there a need to go vote hunting in the IBM Support Organisation to get eyes on this requirement?
    Please let me know so I can do that if it's a necessary evil.

  • Sara Elo Dean commented
    13 Jan 07:44am

    Stemming should be replaced with lemmatization in Discovery, as lemmatization takes into account the part of speech of the original word.

  • Michael Long commented
    12 Dec, 2019 05:14pm

    We need a support search feature that works natively, as most common search tools do. The current Support Search tool returns a vast number of "false positives" due to stemming and lemmatization. When we search for the word "integrator", we do not want results that contain the words "integration" or "integrated" but not our original search term.

  • Michael Long commented
    12 Dec, 2019 05:12pm

    We need a support search feature that works natively, as most common search tools do. The current Support Search tool returns a vast number of "false positives" due to stemming and lemmatization. When we search for the keyword "integrator", we do not want results that do not contain that keyword but contain the words "integration" or "integrated".

  • alexandre blancke commented
    12 Dec, 2019 04:53pm

    Yes, this is not an idea; it is a requirement.

    When I search for "efs", I don't want to see results matching "EF", because the matches I actually need are then lost among the many matches I don't expect.
    We are Support, and we are looking for exact matches on technical things,
    not approximate, random results.

  • Guest commented
    27 Jun, 2019 04:36pm

    Do NOT solve this with a list of words or terms that should not be stemmed. Using quotes (phrase searching) should be able to stop stemming on any word you choose. This is standard practice, and anything else would likely lead to unexpected results in the future. Keep in mind that counter-intuitive search engine results can quickly make IBM Support look "stupid" because, say, it appears we don't even know about our own documentation.

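
Kevin Baldwin's plan to capture what Support Search with Watson actually submits, and Phil Anderson's suggestion to use the Query parameter rather than the Natural Language Query parameter, both come down to which request parameter carries the search text. The Python sketch below builds a Discovery v1-style query URL with either `query` (DQL) or `natural_language_query`, so the two forms can be compared side by side. The base URL, IDs and version date are placeholders, the endpoint path should be checked against the Discovery v1 API reference, authentication is omitted, and, per Phil's later reply in this thread, quoting the term in `query` reportedly still did not disable stemming.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Placeholders (assumptions): use your own service URL, IDs and API version date.
BASE_URL = "https://discovery.example.invalid"
ENVIRONMENT_ID = "my-environment-id"
COLLECTION_ID = "my-collection-id"
API_VERSION = "2019-04-30"


def build_query_url(dql: Optional[str] = None, nlq: Optional[str] = None) -> str:
    """Build a Discovery v1-style GET query URL from either DQL or natural language."""
    if (dql is None) == (nlq is None):
        raise ValueError("pass exactly one of dql or nlq")
    params = {"version": API_VERSION}
    if dql is not None:
        params["query"] = dql  # Discovery Query Language, e.g. title:"integrator"
    else:
        params["natural_language_query"] = nlq
    path = (f"/v1/environments/{quote(ENVIRONMENT_ID)}"
            f"/collections/{quote(COLLECTION_ID)}/query")
    return f"{BASE_URL}{path}?{urlencode(params)}"


print(build_query_url(dql='title:"integrator"'))
print(build_query_url(nlq="integrator"))
```

Requiring exactly one of the two parameters keeps the comparison clean: each URL shows unambiguously whether the text went through DQL or through natural language query processing.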