This idea has been merged into another idea. To comment or vote on this idea, please visit WDS-I-72 Ability re-process documents without re-ingestion.
In several current customer deployments of Watson Discovery Service that are implementing their MVPs, collections are expected to contain hundreds of thousands of documents (even millions). Typically, the modelling work for custom, domain-specific analysis is refined in defined time slots, so that the models become more and more accurate over time. This currently implies that, in order to apply a new model to a given collection and obtain the new textual analysis, the only way is to drop the current collection, create a new one configured with the new model, and re-ingest all the documents into the new collection. In a production environment this can be very expensive, whereas the cost could be greatly reduced if a suitable API were available to reprocess all of the collection's content in place.
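The workaround described above can be sketched as a sequence of calls against the Watson Discovery v1 REST API. This is only an illustrative sketch: the base URL, API version date, and all identifiers (`env_id`, collection and configuration IDs, the `NEW_ID` placeholder for the freshly created collection) are assumptions to be checked against the service documentation, not a definitive implementation.

```python
# Sketch of the current workaround: to apply a new configuration (model),
# the collection must be dropped, recreated, and every document re-ingested.
# All identifiers and the API version date below are placeholders; verify
# the exact endpoints against the Watson Discovery v1 API reference.

BASE = "https://gateway.watsonplatform.net/discovery/api/v1"
VERSION = "2018-03-05"  # placeholder API version date

def reprocess_requests(env_id, old_coll_id, new_config_id, doc_files):
    """Return the sequence of REST calls the workaround requires."""
    calls = [
        # 1. Drop the existing collection (all indexed analysis is lost).
        ("DELETE",
         f"{BASE}/environments/{env_id}/collections/{old_coll_id}"
         f"?version={VERSION}",
         None),
        # 2. Create a new collection bound to the updated configuration.
        ("POST",
         f"{BASE}/environments/{env_id}/collections?version={VERSION}",
         {"name": "reprocessed", "configuration_id": new_config_id}),
    ]
    # 3. Re-ingest every document into the new collection -- the expensive
    #    step, proportional to collection size (hundreds of thousands of
    #    documents or more). NEW_ID stands for the collection_id returned
    #    by the create call in step 2.
    for doc in doc_files:
        calls.append(
            ("POST",
             f"{BASE}/environments/{env_id}/collections/NEW_ID"
             f"/documents?version={VERSION}",
             {"file": doc}))
    return calls
```

The point of the sketch is the cost profile: steps 1 and 2 are constant-time, but step 3 scales with the number of documents, which is exactly what a reprocess-without-re-ingestion API would eliminate.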
Why is it useful?
|Who would benefit from this IDEA?|At least a couple of customers in Italy|
How should it work?