IBM Watson™ Discovery Service Ideas


NLQ should not search numeric fields, such as the offset of an entity within the text

When the JSON document https://ibm.box.com/s/32mis3zwbzk3t913wdrbmcl30i3at129 is ingested into WDS, the out-of-the-box NLU enrichment detects the following "JobTitle" entity. Its second mention has a location offset ending at 2016:

    {
        "count": 2,
        "text": "program manager",
        "mentions": [
            {
                "text": "program manager",
                "location": [139, 154]
            },
            {
                "text": "program manager",
                "location": [2001, 2016]
            }
        ],
        "relevance": 0.495976,
        "type": "JobTitle"
    }
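The `location` pairs read as character offsets into the source text: both spans are 15 characters wide, exactly the length of "program manager". That is consistent with half-open [start, end) offsets, although this is an inference from the numbers in the excerpt, not something the post states. Under that reading, 2016 is simply the position where the second mention ends. A small Python check of the assumption (the helper name `span_matches_text` is hypothetical):

```python
# Mentions copied from the entity excerpt above.
MENTIONS = [
    {"text": "program manager", "location": [139, 154]},
    {"text": "program manager", "location": [2001, 2016]},
]


def span_matches_text(mention):
    """Return True if end - start equals the mention length.

    Assumes "location" holds half-open [start, end) character offsets.
    """
    start, end = mention["location"]
    return end - start == len(mention["text"])


for m in MENTIONS:
    # Prints each span and whether its width matches the mention text.
    print(m["location"], span_matches_text(m))
```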

When I issue the NLQ query "2016", expecting NLQ to search only the text fields, this document is returned. The string 2016 appears nowhere else in the document.

Enhancement request: WDS NLQ should not search fields that do not represent the content of the document. A numerical offset does not reflect the document's contents, and matching on it causes false positives in search results.
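To illustrate the difference between the reported and the requested behavior, here is a minimal Python sketch over the entity excerpt above. It is a hypothetical model, not how WDS actually indexes enriched documents: `all_values` flattens every leaf value (numbers included), approximating what the post observes, while `text_values` keeps only string leaves, approximating what the request asks for. The function names and the whitespace-token `matches` helper are assumptions made for illustration.

```python
import json

# Entity excerpt from the post (the rest of the document is omitted).
ENTITY = json.loads("""
{
  "count": 2,
  "text": "program manager",
  "mentions": [
    {"text": "program manager", "location": [139, 154]},
    {"text": "program manager", "location": [2001, 2016]}
  ],
  "relevance": 0.495976,
  "type": "JobTitle"
}
""")


def all_values(node):
    """Yield every leaf value as a string, numbers included (reported behavior)."""
    if isinstance(node, dict):
        for value in node.values():
            yield from all_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from all_values(item)
    else:
        yield str(node)


def text_values(node):
    """Yield only string leaves, skipping numeric fields such as offsets and counts."""
    if isinstance(node, dict):
        for value in node.values():
            yield from text_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from text_values(item)
    elif isinstance(node, str):
        yield node


def matches(values, term):
    """Naive match: True if any whitespace-separated token equals the term."""
    return any(term in v.split() for v in values)


print(matches(all_values(ENTITY), "2016"))   # True: hits the offset 2016
print(matches(text_values(ENTITY), "2016"))  # False: only text is searched
```

Restricting the search to string values removes this particular false positive while still letting a query such as "manager" match the entity text.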

  • Guest
  • May 13 2019
  • Future Consideration
Who would benefit from this IDEA? All customers, though the issue was discovered with Project Daisey.
  • JAYSEN OLLERENSHAW commented
    20 Sep 01:10

    Neither count nor location in this example is document content. I would also argue that the entity's text should be included in the search, but the text of the entity mentions may be redundant, since that text already appears in the top-level fields that NLU processed.
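The field policy suggested in this comment can be sketched in Python as follows. This is a hypothetical illustration, not a WDS setting or API: `entity_search_terms` and its `include_mention_text` flag are invented names, and leaving out every key other than the entity's `text` (for example `type` and `relevance`) is an assumption of the sketch. The flag reflects the comment's hedge that mention text "might be redundant".

```python
# Entity excerpt from the post.
ENTITIES = [
    {
        "count": 2,
        "text": "program manager",
        "mentions": [
            {"text": "program manager", "location": [139, 154]},
            {"text": "program manager", "location": [2001, 2016]},
        ],
        "relevance": 0.495976,
        "type": "JobTitle",
    }
]


def entity_search_terms(entities, include_mention_text=False):
    """Collect entity strings to search, following the policy in the comment.

    Always skipped: "count" (metadata) and every "location" (offsets, not content).
    Mention text is skipped by default because it repeats text already present
    in the NLU-processed top-level fields.
    """
    terms = []
    for entity in entities:
        text = entity.get("text")
        if isinstance(text, str) and text not in terms:
            terms.append(text)
        if include_mention_text:
            for mention in entity.get("mentions", []):
                mention_text = mention.get("text")
                if isinstance(mention_text, str) and mention_text not in terms:
                    terms.append(mention_text)
    return terms


print(entity_search_terms(ENTITIES))
```

Here the mention text adds nothing new, because every mention repeats the entity's own text, which supports the comment's point about redundancy.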