Conversational applications are often over-hyped and underperform. While academia has made significant progress in Natural Language Understanding (NLU) and the market for voice-based technologies is large and growing, NLU performance drops significantly on language with typos and other errors, uncommon vocabulary, or more complex requests. This talk covers how to build a production-quality conversational app that performs well in a real-world setting.
We will demonstrate an end-to-end approach for consistently building conversational interfaces with production-level accuracy, one that has proven to work well across a number of applications in diverse verticals. Building successful conversational interfaces involves choosing the right use case, collecting clean and relevant data, and breaking the NLU problem down into a series of solvable sub-tasks. All of today's most widely used conversational services are built on a similar hierarchical NLU pipeline of domain-intent-entity classification, which has become an industry standard and which we will discuss in detail.
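To make the hierarchical pipeline concrete, here is a minimal sketch of domain-intent-entity classification. The rule-based classifiers and the tiny city gazetteer are stand-ins invented for illustration; in a real system each stage would be a trained model.

```python
# Toy hierarchical NLU pipeline: domain -> intent -> entities.
# All rules and vocabularies here are illustrative placeholders.

def classify_domain(text):
    """Top level: route the query to a broad domain."""
    return "weather" if "weather" in text.lower() or "rain" in text.lower() else "unknown"

def classify_intent(text, domain):
    """Second level: pick an intent within the predicted domain."""
    if domain == "weather":
        return "check_forecast" if "tomorrow" in text.lower() else "check_current"
    return "unsupported"

def extract_entities(text, domain, intent):
    """Third level: tag spans relevant to the intent (toy gazetteer lookup)."""
    cities = {"boston", "paris", "tokyo"}
    return [{"type": "city", "text": t} for t in text.lower().split()
            if t.strip("?,.") in cities]

def parse(text):
    """Run the three stages in sequence, each conditioned on the last."""
    domain = classify_domain(text)
    intent = classify_intent(text, domain)
    entities = extract_entities(text, domain, intent)
    return {"domain": domain, "intent": intent, "entities": entities}

print(parse("Will it rain in Paris tomorrow?"))
```

Conditioning each stage on the previous one keeps every classifier's label space small, which is a large part of why this decomposition works well in production.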
Our architecture further improves on this standard domain-intent-entity classification and dialogue management architecture by leveraging shallow semantic parsing. We have observed that NLU systems for industry applications often require more structured representations of entity relations than the standard hierarchy provides, but do not need full semantic or syntactic parses, which are often inaccurate on real-world conversational data. We describe our approach and demonstrate how it improves the performance of conversational interfaces for non-trivial use cases.
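As a hypothetical illustration of why flat entity tags are not always enough: in "transfer $500 from checking to savings", both accounts receive the same entity type, and only a role layer distinguishes source from destination. The sketch below (toy gazetteer and a preposition cue, both assumptions of ours) adds such a shallow role-assignment step on top of flat tagging, without any full parse.

```python
# Shallow semantic parsing sketch: flat entity tagging followed by a
# role classifier that uses the preceding word as a lexical cue.

import re

def tag_accounts(text):
    """Flat entity tagger: find account mentions (toy gazetteer)."""
    accounts = {"checking", "savings"}
    return [m for m in re.finditer(r"\w+", text.lower()) if m.group() in accounts]

def assign_roles(text, mentions):
    """Assign a semantic role to each mention from its preceding preposition."""
    roled = []
    for m in mentions:
        prefix = text.lower()[: m.start()].split()
        cue = prefix[-1] if prefix else ""
        role = {"from": "source", "to": "destination"}.get(cue, "unknown")
        roled.append({"entity": m.group(), "type": "account", "role": role})
    return roled

utterance = "Transfer $500 from checking to savings"
print(assign_roles(utterance, tag_accounts(utterance)))
```

A real role classifier would of course be learned rather than a cue lookup, but the structure — a light relational layer over flat entities, far short of a full parse — is the point.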
We end the talk by discussing the additional challenges of building a voice assistant rather than a text-based chatbot. Large-vocabulary, domain-agnostic Automatic Speech Recognition (ASR) systems often mis-transcribe domain-specific words and phrases. Since these generic ASR systems are the first component of most voice assistants in production, building NLU systems that are robust to their errors is challenging. We describe a few potential methods for handling ASR errors in the NLU pipeline, especially in the entity classification and resolution component, which is the most susceptible to ASR errors.
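One common mitigation is to make entity resolution fuzzy: match the (possibly garbled) transcript span against a domain catalog by string similarity rather than exact lookup. The sketch below uses normalized edit-distance matching via Python's `difflib`; the artist catalog is hypothetical, and a production system might add phonetic representations or search over n-best ASR hypotheses instead.

```python
# Fuzzy entity resolution that tolerates ASR mis-transcriptions, e.g.
# "coal play" heard in place of "Coldplay". Catalog is a made-up example.

import difflib

CATALOG = ["Coldplay", "Radiohead", "The Beatles"]

def resolve_entity(span, catalog=CATALOG, cutoff=0.6):
    """Map a noisy span to the closest catalog entry, or None if no match."""
    norm = span.replace(" ", "").lower()
    keyed = {c.replace(" ", "").lower(): c for c in catalog}
    hits = difflib.get_close_matches(norm, keyed.keys(), n=1, cutoff=cutoff)
    return keyed[hits[0]] if hits else None

print(resolve_entity("coal play"))
```

Stripping spaces before matching matters here, because ASR systems frequently split unfamiliar proper nouns into familiar shorter words.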
After this talk, attendees will have a better appreciation for the challenges and nuances of building real-world NLU systems, as well as a high-level understanding of the best practices and components needed to build their own production-quality conversational assistant.