IBM Watson™ Discovery Service Ideas

Standard WDS crawler for websites

The standard WDS crawler supports filesystems, SharePoint, and databases.

Since HTML is one of the few document types that WDS can ingest, it would make sense for the standard crawler to also support websites accessible over HTTP and HTTPS.


On a recent project, the customer had documents on a filesystem, in SharePoint, and on their intranet and internet websites, all of which they wanted to use via WDS. To support this we had to write a custom crawler in Nutch, which was a lot of effort for what seems like a common requirement.

  • Mar 9 2018
  • Already exists
  • Admin
    Phil Anderson commented
    May 9, 2018 05:23

    We already have this functionality via a Nutch plug-in: