IBM Watson™ Discovery Service Ideas


Standard WDS crawler for websites

The standard WDS crawler supports filesystems, SharePoint, and databases.

Since HTML is one of the few document types that WDS can ingest, it would make sense for the standard crawler to also support websites accessible over HTTP and HTTPS.

  https://bigblue.aha.io/features/BNTO-31

On my recent project, the customer wanted to use WDS with documents stored on a filesystem, in SharePoint, and on their intranet and internet websites. To support this, we had to write a custom crawler in Nutch, which was a lot of effort for what seems like a common requirement.

  • JAYSEN OLLERENSHAW
  • Mar 9 2018
  • Already exists
Who would benefit from this IDEA? As a customer, I want to use WDS to query documents and HTML pages from my intranet and internet sites.
  • Admin
    Phil Anderson commented
    May 09, 2018 05:23

    We already have this functionality via a Nutch plug-in: https://console.bluemix.net/docs/services/discovery/adding-content.html#crawling-urls