This is a cache of https://developer.ibm.com/articles/introduction-watson-discovery/. It is a snapshot of the page as it appeared on 2025-11-14T13:26:02.737+0000.
With IBM Watson Discovery, you can ingest, normalize, enrich, and search your unstructured data (JSON, HTML, PDF, Word, and more) with speed and accuracy. It packages core Watson APIs such as Natural Language Understanding and Document Conversion along with UI tools that enable you to easily upload, enrich, and index large collections of private or public data.
The following image shows a high-level view of all of the components that make up the Discovery pipeline.
Note: Watson Discovery V2 was released in August 2021. Existing V1 users can compare the features of V1 and V2.
Terms and concepts
Watson Discovery service terms
This section covers the terms and concepts that are specific to Discovery.
Project: A project is a convenient way to collect and manage your resources. You can assign a project type and connect your data to the project by creating a collection.
Enrichments: Discovery contains a powerful analytics engine that provides cognitive enrichments and insights into your data. These enrichments include entities, keywords, parts of speech, and sentiment.
Aggregations: Discovery can return a set of data values, such as the top values for selected enrichments. For example, it can return the top 10 concepts that appear in a data collection.
Smart Document Understanding: Use Smart Document Understanding to break your documents into smaller, more consumable chunks of information. When you help Discovery index the correct set of information in your documents, you improve the answers that your application can find and return.
Tooling: The extensive set of UI tools available from the IBM Cloud console that you can use to create and populate your collection, apply enrichments, and query and test your data.
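As a rough illustration of the "top values" idea described above, the following Node.js sketch builds a term aggregation string and places it in a query request body. The helper name topValuesAggregation is hypothetical, and the field path enriched_text.keywords.text is an assumption about how enrichments are stored in your collection; check the field names in your own project before using it.

```javascript
// Hypothetical helper: builds a term aggregation that asks for the
// top N values of one field.
function topValuesAggregation(field, count = 10) {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('count must be a positive integer');
  }
  return `term(${field},count:${count})`;
}

// Assumed field path; the real path depends on the enrichments
// applied to your collection.
const requestBody = {
  aggregation: topValuesAggregation('enriched_text.keywords.text', 10),
};

console.log(JSON.stringify(requestBody));
```

The resulting object is meant to be sent as the JSON body of a query request; the aggregation results come back alongside any matching documents.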
Enrichments
Discovery has a powerful analytics engine that provides cognitive enrichments and insights into your data. With built-in natural language processing (NLP) capabilities, it can extract enrichments from a wide range of document types, such as JSON, HTML, PDF, and Microsoft™ Word. The following table shows the key enrichments.
For specific types of documents, Discovery can provide detailed information about tables and table-related data.
The ability to make natural language queries on these enrichments provides an advantage over typical keyword search engines.
Data flow
In Watson Discovery, data flows through three stages: Acquire, Enrich, and Analyze and Search.
Acquire: Data is ingested from one or more data sources, stripped of unnecessary content (such as graphics and formatting), and then passed to the Enrich stage.
Enrich: The data (typically, a mix of structured and unstructured text) is processed using techniques such as natural language processing and machine learning to provide meaning and context to the raw text. This enriched data is stored in collections.
Analyze and Search: The enriched data from one or more collections is used to conduct discovery and exploration or to enable expert assistance through search-based applications.
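The three stages can be pictured with a toy Node.js pipeline. This is only a conceptual sketch and not how Discovery is implemented: the function names are hypothetical, the "enrichment" is a simple word-frequency count standing in for real natural language processing, and the collection is a plain in-memory array.

```javascript
// Acquire: strip markup and collapse whitespace.
function acquire(rawHtml) {
  return rawHtml.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Enrich: attach toy "keywords" (the most frequent words of four or
// more letters). A stand-in for real NLP enrichments.
function enrich(text, maxKeywords = 3) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z]{4,}/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  const keywords = [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, maxKeywords)
    .map(([word]) => word);
  return { text, enriched_text: { keywords } };
}

// Analyze and Search: find documents whose keywords include a term.
function search(collection, keyword) {
  const term = keyword.toLowerCase();
  return collection.filter((doc) => doc.enriched_text.keywords.includes(term));
}

const collection = [
  '<p>Install the <b>printer</b> driver, then restart the printer.</p>',
  '<p>Reset the router before you update the router firmware.</p>',
]
  .map((html) => acquire(html))
  .map((text) => enrich(text));

console.log(search(collection, 'printer').map((doc) => doc.text));
```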
Architecture
A common way to use Discovery is by accessing the Discovery APIs from your application. The Watson team releases SDKs that support many programming languages so that you can use Discovery easily in a web or mobile application.
All of the data content is stored and enriched within Watson Discovery collections. The data does not require any specific structure and can come from multiple public and private data sources. Every Discovery project comes with a pre-enriched data collection named Sample Collection, which consists of software installation manuals.
Optionally, with Watson Knowledge Studio, domain experts can help customize Discovery to better understand the unique entities and relationships in your specific industry or organizational data.
Some typical use cases for Discovery include:
The need to search thousands of product reviews at once: Create a Discovery collection and build a UI to query the collection and graph the sentiment over time.
The need to programmatically find text within a document: Use the passage retrieval feature of Discovery to create an FAQ chatbot.
The need to logically organize thousands of documents in different formats: Use Discovery to pull out keywords, concepts, and relationships to sort them.
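For the FAQ chatbot use case above, a request that turns on passage retrieval might look like the following Node.js sketch. The helper names are hypothetical, and both the request fields (natural_language_query, passages.enabled, passages.count) and the response shape (results[].document_passages[].passage_text) are assumptions based on the Discovery v2 query API; verify them against the API reference for your service version.

```javascript
// Hypothetical helper: query body that asks for short passages
// in addition to whole documents.
function buildPassageQuery(question, passageCount = 3) {
  return {
    natural_language_query: question,
    passages: { enabled: true, count: passageCount },
  };
}

// Hypothetical helper: flatten the passages out of a query response.
// Assumes each result may carry a document_passages array.
function extractPassages(response) {
  return (response.results ?? []).flatMap((result) =>
    (result.document_passages ?? []).map((passage) => ({
      documentId: result.document_id,
      text: passage.passage_text,
    }))
  );
}

console.log(JSON.stringify(buildPassageQuery('How do I reset my password?')));
```

A chatbot would typically send the body from buildPassageQuery, then show the first entry returned by extractPassages as the answer.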
Accessing Discovery
Tooling
As mentioned previously, Discovery has its own set of tooling, available through IBM Cloud or IBM Cloud Pak for Data, that provides a UI for manually managing your Discovery projects and collections.
The following video shows how you can use the tool to create a new project, collect data, and then ingest data files for enrichment.
The following Node.js code sample shows how to authorize and query your Discovery project on IBM Cloud. If you need more information on query concepts, look at the Discovery documentation.
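A minimal sketch of such a sample follows. It calls the REST API directly with the fetch function built into Node.js 18 and later, rather than using the Watson SDK. Treat the details as assumptions to verify against the Discovery API reference: the version date 2020-08-30, basic authentication with the literal user name apikey, and the /v2/projects/{project_id}/query path. The environment variable names and function names are hypothetical.

```javascript
// Assumed API version date; check the API reference for current values.
const API_VERSION = '2020-08-30';

// Builds the URL and fetch options for a natural language query.
function buildQueryRequest({ serviceUrl, apikey, projectId, naturalLanguageQuery, count = 5 }) {
  const base = serviceUrl.replace(/\/+$/, ''); // drop trailing slashes
  const url = new URL(`${base}/v2/projects/${encodeURIComponent(projectId)}/query`);
  url.searchParams.set('version', API_VERSION);

  // Assumption: basic auth with user name "apikey" and your API key
  // as the password.
  const token = Buffer.from(`apikey:${apikey}`).toString('base64');

  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: {
        Authorization: `Basic ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ natural_language_query: naturalLanguageQuery, count }),
    },
  };
}

// Sends the query and returns the parsed JSON response.
async function queryProject(config) {
  const { url, options } = buildQueryRequest(config);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Discovery query failed: HTTP ${response.status}`);
  }
  return response.json();
}

// Runs only when all three (hypothetical) environment variables are set.
const { DISCOVERY_URL, DISCOVERY_APIKEY, DISCOVERY_PROJECT_ID } = process.env;
if (DISCOVERY_URL && DISCOVERY_APIKEY && DISCOVERY_PROJECT_ID) {
  queryProject({
    serviceUrl: DISCOVERY_URL,
    apikey: DISCOVERY_APIKEY,
    projectId: DISCOVERY_PROJECT_ID,
    naturalLanguageQuery: 'How do I install the software?',
  })
    .then((result) => console.log(JSON.stringify(result, null, 2)))
    .catch((err) => console.error(err.message));
}
```

Keeping the request construction in its own function makes it possible to inspect or test the request without contacting the service.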
This article gave you an introduction to Watson Discovery, which lets you ingest, normalize, enrich, and search your unstructured data (JSON, HTML, PDF, Word, and more) with speed and accuracy. It packages core Watson APIs such as Natural Language Understanding and Document Conversion along with UI tools that enable you to easily upload, enrich, and index large collections of private or public data.