IBM Releases Deep Search For Scientific Discovery

Written by Nikos Vaggalis

Tuesday, 16 August 2022

IBM's Deep Search for Scientific Discovery (DS4SD) Toolkit has been made available to the public. It comes from IBM's research labs and uses NLP to analyze massive amounts of data.

Deep Search is a cloud-based AI research service, offered as SaaS, that lets researchers load large amounts of structured or unstructured data and quickly find useful connections in it. The sources Deep Search can consume range from journal articles to patents, technical reports and more. Using AI and NLP, it can ingest 20 pages per second, whereas a typical human expert takes 1–2 minutes just to read a single page, and it automatically extracts the semantic units and their relationships. It then builds a searchable knowledge graph which enables its users to:

robustly explore information extracted from tens of thousands of documents without having to read a single paper.
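
To put those rates in perspective, here is a small back-of-the-envelope calculation. The figures of 20 pages per second and 1–2 minutes per page come from the text above; the corpus size of 100,000 pages (say, 10,000 papers of 10 pages each) is purely an assumption for illustration.

```python
# Back-of-the-envelope comparison of the ingestion rates quoted in the article.
MACHINE_PAGES_PER_SECOND = 20      # Deep Search ingestion rate, as quoted
HUMAN_MINUTES_PER_PAGE = (1, 2)    # a typical expert just reading, as quoted


def machine_hours(pages: int) -> float:
    """Hours Deep Search would need to ingest `pages` at the quoted rate."""
    return pages / MACHINE_PAGES_PER_SECOND / 3600


def human_hours(pages: int) -> tuple[float, float]:
    """(best case, worst case) hours for one expert just to read `pages`."""
    fastest, slowest = HUMAN_MINUTES_PER_PAGE
    return pages * fastest / 60, pages * slowest / 60


if __name__ == "__main__":
    pages = 100_000  # assumption: 10,000 papers of 10 pages each
    best, worst = human_hours(pages)
    print(f"Deep Search: {machine_hours(pages):.1f} h")
    print(f"One expert:  {best:,.0f} to {worst:,.0f} h")
```

For the assumed 100,000 pages this works out to roughly 1.4 hours of machine ingestion versus about 1,667 to 3,333 hours of reading for a single person, before any analysis has even started.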

As such, it has been widely adopted in the scientific field, for instance in Covid research, in the search for alternative cancer treatments by working out the connections between individual research papers, and in discovering new molecules. Of course, the use cases are not confined to medical research: Deep Search can be applied anywhere there is document-based data, such as legal briefs, financial statements, technical specifications, research papers, slide decks, you name it.


IBM has made part of the service available in the form of a toolbox called Deep Search for Scientific Discovery (DS4SD). The toolbox is split into two parts: the Deep Search Experience and the Deep Search Toolkit.

The Deep Search Experience is the automatic document conversion service. It lets users upload documents and inspect how well they were converted through a simple drag-and-drop interface that makes it very easy for non-experts to use. This part is not open source, but it has been made publicly available online for anyone to use. To work with the Deep Search Experience service, you upload your document and then let it work its magic. The service:

Inspects the data that can be extracted from one of your documents. Your document is decomposed on the spot, cut into pieces of text, images, and tables. Numeric data, entities, and their relationships are then inferred from these pieces.
Searches and collects data from preprocessed document collections. These data include structured text, numerics, entities, and their relationships.
Processes data into usable information in your workspace, where you connect documents with curated knowledge from databases. The resulting knowledge graphs enable queries and analyses that span the entities and relationships described in both your documents and domain-specific databases.

The Deep Search Toolkit, on the other hand, is an open source Python package that lets users interact with the Deep Search platform programmatically, uploading and converting documents in bulk. They can point the toolkit at a folder and direct it to upload the documents, convert them, and ultimately analyze the contents of the text, tables, and figures. The Deep Search Toolkit is available as a PyPI package and can be installed with standard Python package managers such as pip or Poetry.
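
The toolkit's actual function names are not covered in this article, so the following is only a sketch of the folder-to-batches workflow described above. It assumes a hypothetical `convert_batch` callback standing in for the toolkit's upload-and-convert step, and a hypothetical `papers` folder of PDFs; check the deepsearch-toolkit repository for the real API.

```python
from pathlib import Path
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def convert_folder(
    folder: str,
    convert_batch: Callable[[list[Path]], None],
    pattern: str = "*.pdf",
    batch_size: int = 10,
) -> int:
    """Hand every file in `folder` matching `pattern` to `convert_batch`, batch by batch.

    `convert_batch` is a hypothetical placeholder for the toolkit's own
    upload/convert call; it is NOT the Deep Search Toolkit API.
    Returns the number of files handed over.
    """
    files = sorted(p for p in Path(folder).glob(pattern) if p.is_file())
    for batch in batched(files, batch_size):
        convert_batch(batch)
    return len(files)


if __name__ == "__main__":
    # Dry run against a hypothetical "papers" folder: print each batch instead of uploading it.
    n = convert_folder("papers", lambda batch: print([p.name for p in batch]))
    print(f"{n} files queued")
```

Batching keeps each request to the remote service bounded in size; a real client would also need authentication and error handling, which the toolkit itself is expected to provide.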

The Deep Search Experience is reachable at https://ds4sd.github.io/, while the Python Deep Search Toolkit can be found in its GitHub repository.

The wider context is that we are entering an era where AI and advances in computer science will play a crucial role in moving society forward. That is one ingredient necessary for success; the other is democratization, open sourcing these tools to make them available to as many brains as possible, which multiplies the chances of a groundbreaking discovery that changes the world for the better.

More Information

https://ds4sd.github.io/

https://github.com/DS4SD/deepsearch-toolkit

Last Updated ( Tuesday, 16 August 2022 )
