NLUlite – An NLP Database
Written by Kay Ewbank   
Thursday, 11 September 2014

A new natural language parsing database that reads English texts and can then answer questions about them has been released as a public alpha.

NLUlite has been created to be developer-friendly, and consists of a server and a Python client. You use it by passing texts to it. The text is tagged using the tag frequencies provided in the Open American National Corpus (OANC). Sentences are then parsed using parsing frequencies extracted from the OANC. A “distance” between words is obtained from WordNet (3.1). The parsing is then improved by choosing the sentences that make more sense according to the FrameNet dataset.
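
To give a feel for what tagging by tag frequency means, here is a minimal sketch of a most-frequent-tag tagger. It is not NLUlite's code: the tiny tagged sample, the Penn-style tag names and the function names are all made up for illustration, and a real system would take its counts from a corpus such as the OANC.

```python
from collections import Counter, defaultdict

# Hypothetical hand-made sample of (word, tag) pairs. A real tagger would
# use counts from a large corpus such as the OANC instead.
TAGGED_SAMPLE = [
    ("the", "DT"), ("snake", "NN"), ("can", "MD"), ("swim", "VB"),
    ("the", "DT"), ("can", "NN"), ("is", "VBZ"), ("empty", "JJ"),
    ("snakes", "NNS"), ("can", "MD"), ("climb", "VB"),
]


def build_tag_frequencies(tagged_words):
    """Count how often each word appears with each tag."""
    freqs = defaultdict(Counter)
    for word, tag in tagged_words:
        freqs[word.lower()][tag] += 1
    return freqs


def tag_sentence(sentence, freqs, default_tag="NN"):
    """Give each word its most frequent tag; unknown words get default_tag."""
    tagged = []
    for word in sentence.split():
        counts = freqs.get(word.lower())
        tag = counts.most_common(1)[0][0] if counts else default_tag
        tagged.append((word, tag))
    return tagged


FREQS = build_tag_frequencies(TAGGED_SAMPLE)
print(tag_sentence("the snake can climb", FREQS))
```

In the sample, "can" appears twice as a modal and once as a noun, so the sketch tags it as a modal. Picking the single most frequent tag ignores context, which is one reason a system like the one described goes on to parse the sentence and check the result against other resources.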

As an example of the way it works, if you pass it the text from Wikipedia about snakes, it can then answer questions such as:

what are the snakes able to do?

where do most of the snakes live?

what animal has no limbs?

Texts can include simple inference rules such as “If an animal has no limbs it cannot walk”, after which you (or a subsequent user) could ask “what does not walk” and get an answer based on the submitted text and the inference rules you’ve supplied.
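
The idea of combining submitted facts with an inference rule can be sketched with a few lines of forward chaining. This is only an illustration under simplifying assumptions: facts and rules are hand-written tuples rather than parsed English, and none of the names below come from NLUlite.

```python
# Facts are (subject, predicate) pairs, written by hand for this sketch.
FACTS = {("snake", "has no limbs"), ("lizard", "has limbs")}

# Each rule is (premise, conclusion): "if an animal has no limbs it cannot walk".
RULES = [("has no limbs", "cannot walk")]


def infer(facts, rules):
    """Apply the rules repeatedly until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, predicate in list(known):
            for premise, conclusion in rules:
                if predicate == premise and (subject, conclusion) not in known:
                    known.add((subject, conclusion))
                    changed = True
    return known


def who(predicate, facts, rules):
    """Answer 'what <predicate>?' from the facts plus everything inferred."""
    return sorted(s for s, p in infer(facts, rules) if p == predicate)


print(who("cannot walk", FACTS, RULES))  # prints ['snake']
```

The answer is not stated anywhere in the facts; it only appears once the rule has been applied, which is the behavior the article describes.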

 

Data sources can include web pages and RSS feeds. The data is kept as objects of the Wisdom class. Your code can set up many Wisdom objects, and each one is a separate knowledge base. Currently, you can only use NLUlite to parse texts that are smaller than a megabyte, though the developer plans to increase this limit in future versions. Once the text is parsed, the information is stored as XML.
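
The sketch below shows the general shape of that arrangement: several independent knowledge bases, a size check on incoming text, and XML output. It is a toy under stated assumptions, not the NLUlite client: the class name `ToyKnowledgeBase`, the one-million-byte cutoff and the XML element names are all invented here, and the "parsing" is just a split on full stops.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the "smaller than a megabyte" limit.
MAX_TEXT_BYTES = 1_000_000


class ToyKnowledgeBase:
    """Hypothetical knowledge base object; not NLUlite's own class."""

    def __init__(self, name):
        self.name = name
        self.sentences = []

    def add_text(self, text):
        """Reject oversized text, then store it sentence by sentence."""
        if len(text.encode("utf-8")) >= MAX_TEXT_BYTES:
            raise ValueError("text must be smaller than one megabyte")
        self.sentences.extend(s.strip() for s in text.split(".") if s.strip())

    def to_xml(self):
        """Serialize the stored sentences using an invented XML layout."""
        root = ET.Element("knowledge", name=self.name)
        for sentence in self.sentences:
            ET.SubElement(root, "sentence").text = sentence
        return ET.tostring(root, encoding="unicode")


# Two separate knowledge bases that do not share any content.
snakes = ToyKnowledgeBase("snakes")
birds = ToyKnowledgeBase("birds")
snakes.add_text("Snakes have no limbs. Snakes can swim.")
birds.add_text("Birds have wings.")
print(snakes.to_xml())
```

Keeping each knowledge base in its own object means a question put to one of them is answered only from the texts that were loaded into it.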

NLUlite is available in a single-threaded free version, or in a commercial multi-threaded version that parses pages much faster.

While there are a number of natural language projects, such as those from the Stanford Natural Language Processing Group and the Natural Language Toolkit (NLTK), this field is still developing.

More Information

NLUlite

Related Articles

Handbook of Natural Language Processing, 2nd Ed (book review)

Taming Text (book review)

 


Last Updated ( Thursday, 11 September 2014 )