IBM Watson™ Discovery Service Ideas


Make ingestion independent from document order

Currently, the data type of each indexed field is fixed when the first document is uploaded (or, more precisely, when the field appears for the first time). If a field does not have a single data type within that first document, the whole document fails indexing and is therefore not ingested.

As an example, consider a document whose metadata is stored in a key-value dictionary like this:

"dict" : [ 
{ "key" : "author", "value" : "John" },
{ "key" : "year", "value" : 2018 },
{ "key" : "confidential", "value" : true}
]

Every element of the dict array contains a "key" field, whose value is always a string, and a "value" field, whose type varies between string, number and boolean.

  • Ingesting a document with the above metadata as the first document will fail, because the "value" field does not have a single type.

  • But ingesting a document with just the "author" entry as the first document, and then one with all three entries, will work.
    The field metadata.dict.value is then typed as STRING, so in the UI it is not possible to enter a query like "metadata.dict.value > 2018", although the same query works via the API.

  • However, ingesting a document with just the "year" entry as the first document, and then one with all three entries, will fail.
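The order dependence in the three cases above can be reproduced with a small model. The Python sketch below is a hypothetical simulation, not how WDS is implemented. It assumes that the type of each field is fixed by the first document in which the field appears, that a new field with mixed types in one document causes the whole document to be rejected, and that a field fixed as STRING accepts values of any type while any other fixed type accepts only matching values. It also assumes the "dict" array sits under a top-level "metadata" object, matching the field name metadata.dict.value. The names type_name, collect_types, Collection and run are illustrative only.

```python
from __future__ import annotations

from typing import Any

# Hypothetical model of the behaviour described above, NOT the actual WDS code.
# Assumption: a field first typed as "string" accepts values of any type
# (stored as text), while a field first typed otherwise accepts only its own type.


def type_name(value: Any) -> str:
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def collect_types(node: Any, path: str, out: dict[str, set[str]]) -> None:
    # Array elements share their parent's path, so every "value" inside
    # metadata.dict contributes to the single field metadata.dict.value.
    if isinstance(node, dict):
        for key, child in node.items():
            collect_types(child, f"{path}.{key}" if path else key, out)
    elif isinstance(node, list):
        for child in node:
            collect_types(child, path, out)
    else:
        out.setdefault(path, set()).add(type_name(node))


class Collection:
    def __init__(self) -> None:
        self.schema: dict[str, str] = {}  # field path -> type fixed on first sight

    def ingest(self, doc: dict) -> bool:
        seen: dict[str, set[str]] = {}
        collect_types(doc, "", seen)
        new_fields: dict[str, str] = {}
        for path, types in seen.items():
            fixed = self.schema.get(path)
            if fixed is None:
                if len(types) > 1:
                    return False  # new field with mixed types: reject the whole document
                new_fields[path] = next(iter(types))
            elif fixed != "string" and types != {fixed}:
                return False  # value does not fit the field's fixed non-string type
        self.schema.update(new_fields)  # only a successful ingest extends the schema
        return True


def run(*docs: dict) -> list[bool]:
    """Ingest docs in order into a fresh collection; report success per document."""
    collection = Collection()
    return [collection.ingest(d) for d in docs]


# Example documents, assuming "dict" sits under a top-level "metadata" object.
doc_all = {"metadata": {"dict": [
    {"key": "author", "value": "John"},
    {"key": "year", "value": 2018},
    {"key": "confidential", "value": True},
]}}
doc_author = {"metadata": {"dict": [{"key": "author", "value": "John"}]}}
doc_year = {"metadata": {"dict": [{"key": "year", "value": 2018}]}}

print(run(doc_all))              # [False]
print(run(doc_author, doc_all))  # [True, True]
print(run(doc_year, doc_all))    # [True, False]
```

The three printed lines correspond to the three cases: the same document with all three entries succeeds or fails depending only on which document arrived first.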

We ask that ingestion be made independent of the order in which documents are ingested, either by allowing a field to hold multiple types or by making it possible to specify the schemas of the expected documents.
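One way to picture the first option is a schema that records, for each field, the set of all types seen so far instead of a single type fixed by the first document. The Python sketch below is hypothetical: MultiTypeCollection and its helpers are illustrative names, not an existing WDS API, and the example again assumes "dict" sits under a top-level "metadata" object. It only models the schema bookkeeping, not how a multi-typed field would be indexed or queried.

```python
from __future__ import annotations

from itertools import permutations
from typing import Any

# Hypothetical sketch of "allowing multiple types on a field", not an existing
# WDS feature: each field records every type it has been seen with.


def type_name(value: Any) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return type(value).__name__


def collect_types(node: Any, path: str, out: dict[str, set[str]]) -> None:
    # Array elements share their parent's path (e.g. metadata.dict.value).
    if isinstance(node, dict):
        for key, child in node.items():
            collect_types(child, f"{path}.{key}" if path else key, out)
    elif isinstance(node, list):
        for child in node:
            collect_types(child, path, out)
    else:
        out.setdefault(path, set()).add(type_name(node))


class MultiTypeCollection:
    def __init__(self) -> None:
        self.schema: dict[str, set[str]] = {}  # field path -> all types seen so far

    def ingest(self, doc: dict) -> None:
        seen: dict[str, set[str]] = {}
        collect_types(doc, "", seen)
        for path, types in seen.items():
            # Set union is commutative and associative, so no document is
            # rejected and the final schema does not depend on arrival order.
            self.schema.setdefault(path, set()).update(types)


doc_all = {"metadata": {"dict": [
    {"key": "author", "value": "John"},
    {"key": "year", "value": 2018},
    {"key": "confidential", "value": True},
]}}
doc_author = {"metadata": {"dict": [{"key": "author", "value": "John"}]}}
doc_year = {"metadata": {"dict": [{"key": "year", "value": 2018}]}}

# Ingest the three documents in every possible order and keep each final schema.
schemas = []
for order in permutations([doc_author, doc_year, doc_all]):
    collection = MultiTypeCollection()
    for doc in order:
        collection.ingest(doc)
    schemas.append(collection.schema)

print(all(s == schemas[0] for s in schemas))      # True
print(sorted(schemas[0]["metadata.dict.value"]))  # ['boolean', 'number', 'string']
```

All six orderings produce the same schema. The second option, a declared schema, would reach order independence differently: each document would be checked only against the schema supplied up front, never against earlier documents.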

  • Sandro Corsi
  • Jun 20 2018
  • Needs review
Why is it useful?
Who would benefit from this IDEA?
As a customer, I want WDS to behave in a way that does not depend on the order in which documents are ingested.