Lucida For Personal Artificial Intelligence
Written by Nikos Vaggalis   
Wednesday, 07 September 2016

The Clarity Lab research team at the University of Michigan made headlines last year with the release of its own IPA (Intelligent Personal Assistant), called Sirius. Many mistakenly regarded Sirius as the open source version of Apple's Siri, but the two projects are totally unrelated. Maybe that's one reason for rebranding Sirius as Lucida.

As Jeremy Russell, a member of the core team, puts it:


When Sirius was started it was more of an afterthought prototype for research done on what hardware platforms work best for an IPA (Intelligent Personal Assistant), it seemed that a re-brand would be a good idea once it was decided to focus on the AI platform itself

 

A little bit of history is necessary in order to fully comprehend that statement. Initially, the Sirius project was founded to facilitate benchmarking and to extend research into future server architectures capable of handling the astronomical workload placed on the Cloud platforms that support Machine Learning as a service, on which all IPAs (Apple’s Siri, Google’s Google Now, Microsoft’s Cortana, or Amazon’s Echo) rely.

As current datacenter architectures reach their computational limits, Sirius comes forth with a brand new proposition: datacenters can continue doing their job without scaling up by stacking up on hardware, by leveraging highly optimized and dedicated algorithms instead.

 

 

Lucida, however, aims to be more than that and goes beyond what Sirius achieved. Built on Sirius' foundations, it has evolved into the next, more intelligent generation, with modularity and extensibility in mind.

It remains a provider of speech recognition, image matching, natural language processing and question-answering services, but thanks to its newfound modularity it now allows any or all of its main components, Automatic Speech Recognition (ASR), Image Matching (IMM) or the Question-Answering System (QA), to be modified or completely replaced by custom-made components.

Say, for example, that a researcher has come up with his own speech recognition engine. He can now simply replace Lucida's ASR component with his own and still take advantage of the rest of Lucida's backend components. Or, if he is not interested in the Image Matching component, he can remove it and work with a bare-bones version of Lucida instead.
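The swap-or-remove idea can be pictured with a small Python sketch. It is not Lucida's actual code: the Assistant class, its register and remove methods, and the toy components are hypothetical names invented here purely for illustration.

```python
from typing import Callable, Dict

# Hypothetical sketch: a component is simply a callable registered under a name.
Component = Callable[[str], str]


class Assistant:
    """Toy container holding replaceable components keyed by name (ASR, IMM, QA)."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}

    def register(self, name: str, component: Component) -> None:
        # Registering under an existing name replaces the previous component.
        self.components[name] = component

    def remove(self, name: str) -> None:
        # Dropping one component leaves the others untouched.
        self.components.pop(name, None)

    def run(self, name: str, payload: str) -> str:
        if name not in self.components:
            raise KeyError(f"component {name!r} is not installed")
        return self.components[name](payload)


def default_asr(audio: str) -> str:
    return f"default transcript of {audio}"


def custom_asr(audio: str) -> str:
    return f"custom transcript of {audio}"


assistant = Assistant()
assistant.register("ASR", default_asr)
assistant.register("IMM", lambda image: f"label for {image}")
assistant.register("QA", lambda question: f"answer to {question}")

assistant.register("ASR", custom_asr)  # swap in the researcher's own engine
assistant.remove("IMM")                # bare-bones version without image matching
```

After the swap, calls to the "ASR" slot reach the custom engine while the QA component keeps working unchanged, which is the point of the modular design described above.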
 
Lucida's workflow can be summed up in a single sentence:

Lucida consumes queries in the form of speech or image and answers in the form of natural language, just like assigning a task to a human assistant.

A prime demonstration of that can be seen in the project's promotional video, where a human operator talks to a Lucida-powered tablet, asking it a series of questions in natural language:

Who's the author of James Bond? to get a reply of Ian Fleming.

The next two questions,

When was Google's IPO? (!)

followed by

Who invented peanut butter?

highlight the engine's agility in interpreting domain-agnostic questions.

 

 

But there's more, as the mind-blowing moment of the video is yet to arrive: the researcher presents Lucida with a picture of the Leaning Tower of Pisa and asks for its height. That's a mammoth task for any computer, because it first has to identify the building, analyze and understand the spoken request, translate that into a format the backend database can understand, and then retrieve the answer and restructure it in natural language for the user to understand. We've already explored a similar approach, in which human pilots communicate and coordinate with an AI Wingman in humanly understood language, a vital tool when in the middle of an air battle.

 

Technically speaking

Lucida is formed by the fusion of three separate and self-contained components:

The Automatic Speech Recognition (ASR) component, which utilizes Gaussian Mixture Model and/or Deep Neural Network scoring, is backed by the Signal Processing Deep Neural Network backend and supports several speech recognition toolkits: Kaldi (Deep Neural Network-Hidden Markov Model based), Pocketsphinx and Sphinx4 (Gaussian Mixture Model-Hidden Markov Model based).
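To make "Gaussian Mixture Model scoring" concrete: in a GMM-HMM recognizer, each HMM state typically models its acoustic frames with a mixture of Gaussians, and scoring a frame means evaluating its log-likelihood under each state's mixture. The following is a minimal sketch of that computation for diagonal covariances. It is not code from Sphinx or Lucida, and the two toy states and all their parameters are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, means, variances), diagonal covariance assumed.
Gaussian = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian(x: Sequence[float], means: Sequence[float],
                 variances: Sequence[float]) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    total = 0.0
    for xi, mu, var in zip(x, means, variances):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
    return total


def gmm_log_likelihood(x: Sequence[float], mixture: List[Gaussian]) -> float:
    """log(sum_k w_k * N(x | mu_k, var_k)), computed stably via log-sum-exp."""
    terms = [math.log(w) + log_gaussian(x, mu, var) for w, mu, var in mixture]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Made-up acoustic states, each with its own mixture over 2-dimensional features.
states: Dict[str, List[Gaussian]] = {
    "state_a": [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [1.0, 1.0], [1.0, 1.0])],
    "state_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}

frame = [0.2, -0.1]  # one hypothetical feature frame
scores = {name: gmm_log_likelihood(frame, mix) for name, mix in states.items()}
best_state = max(scores, key=scores.get)
```

In a DNN-HMM system such as the Kaldi setup mentioned above, a neural network would supply the per-state scores instead of the mixtures, while the surrounding HMM decoding stays conceptually the same.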

The Image Matching (IMM) component, which utilizes the Feature Extraction (FE) and Feature Description (FD) techniques, is backed by the Image Processing DNN backend, and uses SURF, a class of the OpenCV computer vision and machine learning software library, for extracting Speeded Up Robust Features from an image and using them as queries to a database.
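Once descriptors have been extracted, "using them as queries to a database" usually comes down to nearest-neighbour matching between descriptor sets. The sketch below illustrates that step only, in plain Python, with tiny 3-dimensional vectors standing in for real SURF descriptors. The ratio test used to filter matches is a common heuristic and an assumption here, not something the source confirms Lucida uses; the image names and numbers are invented.

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_good_matches(query: List[Descriptor], candidate: List[Descriptor],
                       ratio: float = 0.75) -> int:
    """Count query descriptors whose nearest neighbour in `candidate` is clearly
    closer than the second nearest (ratio test)."""
    good = 0
    for q in query:
        dists = sorted(distance(q, c) for c in candidate)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            good += 1
    return good


def best_image(query: List[Descriptor],
               database: Dict[str, List[Descriptor]]) -> str:
    """Return the database image with the most good matches."""
    return max(database, key=lambda name: count_good_matches(query, database[name]))


# Toy database: each image is represented by a handful of made-up descriptors.
database = {
    "pisa": [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "eiffel": [[0.5, 0.5, 0.5], [0.6, 0.5, 0.5], [0.5, 0.6, 0.5]],
}
query = [[0.05, 0.0, 0.95], [0.9, 0.1, 0.0]]

match = best_image(query, database)
```

A real system would work with far more and far longer descriptors and would use an approximate nearest-neighbour index rather than the brute-force loop shown here.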

The Question-Answering System (QA), which utilizes Regular Expression (Regex) matching, Porter word stemming (Stemmer) and Conditional Random Fields (CRF) tagging, is backed by the Natural Language Processing DNN backend and utilizes OpenEphyra, a Java platform-independent framework for question answering, plus a Wikipedia database stored in Lemur’s Indri format. This is how Lucida could answer the "How tall is the Tower of Pisa?" question; it looked it up in an embedded Wikipedia database.
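As a rough illustration of the first two techniques, the sketch below uses regular expressions to guess the expected answer type of a question and then reduces the remaining words to search keywords. It is not OpenEphyra's code: the patterns, labels and stopword list are hypothetical, a crude suffix stripper stands in for a real Porter stemmer, and CRF tagging is left out entirely.

```python
import re
from typing import List, Optional

# Hypothetical patterns mapping question wording to an expected answer type.
QUESTION_PATTERNS = [
    (re.compile(r"^who\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^when\b", re.IGNORECASE), "DATE"),
    (re.compile(r"^how (tall|high)\b", re.IGNORECASE), "HEIGHT"),
]

# Small illustrative stopword list, not taken from any real system.
STOPWORDS = {"the", "of", "is", "was", "a", "an", "how", "who", "when", "s"}


def answer_type(question: str) -> Optional[str]:
    """Return the label of the first pattern that matches, or None."""
    for pattern, label in QUESTION_PATTERNS:
        if pattern.search(question.strip()):
            return label
    return None


def crude_stem(word: str) -> str:
    """Stand-in for a Porter stemmer: strips a few common suffixes only."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(question: str) -> List[str]:
    """Lowercase, tokenize, drop stopwords and stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


question = "Who invented peanut butter?"
expected = answer_type(question)   # which kind of answer to look for
terms = keywords(question)         # what to search the document store with
```

The keywords would then be sent to a retrieval engine (Indri, in Lucida's case) and the answer type used to pick a suitable candidate from the retrieved passages.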

These DNN backends, together with seven applications that depend upon them, were united under the Deep-Learning-As-A-Service umbrella, taking shape as the DjiNN and Tonic suite.

The Tonic suite, therefore, is a collection of applications that accept a series of tasks, be they

Image Processing related tasks:

• Image classification (IMC)
• Facial recognition (FACE)
• Digit recognition (DIG)

Speech Processing related tasks:

• Automatic speech recognition (ASR)

Natural Language Processing related tasks:

• Part-of-speech tagging (POS)
• Chunking (CHK)
• Named entity recognition (NER)

all derived from the user supplied queries.

The applications then call into the DNN web service, forwarding it the request; the service takes it from there, processes the request and replies in natural language format.

The flexibility of the system lies in the fact that you can mix and match those services to develop pipelined applications. For example, you could combine the ASR and QA services, or all three (ASR+IMM+QA), to pull off something like taking a picture of a restaurant and asking Lucida "What time does this restaurant close?", with Lucida promptly replying "At 8 o'clock".
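The mix-and-match idea can be sketched as plain function composition, with each service enriching a shared context and passing it on. Everything in this example is a stand-in: the asr, imm and qa functions fake their work with lookups, and the restaurant name and closing time are invented for illustration rather than taken from Lucida.

```python
from typing import Callable, Dict

Context = Dict[str, str]
Stage = Callable[[Context], Context]


def asr(ctx: Context) -> Context:
    # Pretend speech recognition: the "audio" is already text with underscores.
    return {**ctx, "question": ctx["audio"].replace("_", " ")}


def imm(ctx: Context) -> Context:
    # Pretend image matching: look the picture up in a tiny made-up table.
    known = {"photo_of_restaurant.jpg": "Example Bistro"}
    return {**ctx, "entity": known.get(ctx["image"], "unknown")}


def qa(ctx: Context) -> Context:
    # Pretend question answering over a single hard-coded fact.
    facts = {("Example Bistro", "what time does this restaurant close"): "8 o'clock"}
    key = (ctx.get("entity", ""), ctx["question"])
    return {**ctx, "answer": facts.get(key, "I do not know")}


def pipeline(*stages: Stage) -> Stage:
    """Chain stages so that each one receives the previous stage's output."""
    def run(ctx: Context) -> Context:
        for stage in stages:
            ctx = stage(ctx)
        return ctx
    return run


voice_qa = pipeline(asr, qa)             # ASR + QA
voice_image_qa = pipeline(asr, imm, qa)  # ASR + IMM + QA

request = {"audio": "what_time_does_this_restaurant_close",
           "image": "photo_of_restaurant.jpg"}
with_image = voice_image_qa(request)
without_image = voice_qa(request)
```

Only the three-stage pipeline can resolve "this restaurant", because the image matching stage is what supplies the entity the question refers to.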

You can easily see where that leads: wearable or mobile devices having more intimate relationships with their owners, knowing their secrets, habits and belongings, so that they're capable of answering not just general questions of the "where is the nearest tube station?" kind but personal ones too, like "how many pounds does my roof rack hold?", per the promotional video, or the quintessential and potentially life-saving "when is my wife's birthday?" question (pun intended).

Of course, privacy and security are a grand issue for all IoT devices, one that to this day has not been fully addressed (although Bitcoin's Blockchain infrastructure looks like it holds the key, but that's a subject for some other time).

Starting today, Lucida is offered to the world as a Cloud platform, just like IBM Watson Developer Cloud and Hewlett Packard's Haven OnDemand, courtesy of the University of Michigan and Clinc, a company set up specifically for this cause.

The idea here is to expose APIs to the DNN backend that will empower anyone to create intelligent Personal Assistant applications, a move emphasizing an emerging trend of our times: that Machine Learning and AI have reached the status of tradeable commodity. Competition amongst the major stakeholders looks nothing but fierce...

 

 

More Information

Lucida main

Lucida on GitHub

Sirius

Clinc-AI for the Planet

Clarity Lab


 


Last Updated ( Wednesday, 07 September 2016 )