Conversational applications are often over-hyped and underperform. While academia has made significant progress in Natural Language Understanding (NLU) and the market for voice-based technologies is large and growing, NLU performance drops significantly on language with typos and other errors, uncommon vocabulary, or more complex requests. This talk covers how to build a production-quality conversational app that performs well in a real-world setting.
We will demonstrate an end-to-end approach for consistently building conversational interfaces with production-level accuracy, one that has proven to work well across a number of applications in diverse verticals. Building successful conversational interfaces involves choosing the right use case, collecting clean and relevant data, and breaking the NLU problem down into a series of solvable sub-tasks. All of today's most widely used conversational services are built on a similar hierarchical NLU pipeline of domain-intent-entity classification, which has become an industry standard and which we will discuss in detail.
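To make the hierarchical pipeline concrete, here is a minimal sketch of domain-intent-entity classification. The rule-based classifiers and the tiny city gazetteer are stand-ins invented for illustration; in a real system each stage would be a trained model.

```python
# Toy hierarchical NLU pipeline: domain -> intent -> entities.
# All rules and vocabularies here are illustrative placeholders.

def classify_domain(text):
    """Top level: route the query to a broad domain."""
    return "weather" if "weather" in text.lower() or "rain" in text.lower() else "unknown"

def classify_intent(text, domain):
    """Second level: pick an intent within the predicted domain."""
    if domain == "weather":
        return "check_forecast" if "tomorrow" in text.lower() else "check_current"
    return "unsupported"

def extract_entities(text, domain, intent):
    """Third level: tag spans relevant to the intent (toy gazetteer lookup)."""
    cities = {"boston", "paris", "tokyo"}
    return [{"type": "city", "text": t} for t in text.lower().split()
            if t.strip("?,.") in cities]

def parse(text):
    """Run the three stages in sequence, each conditioned on the last."""
    domain = classify_domain(text)
    intent = classify_intent(text, domain)
    entities = extract_entities(text, domain, intent)
    return {"domain": domain, "intent": intent, "entities": entities}

print(parse("Will it rain in Paris tomorrow?"))
```

Conditioning each stage on the previous one keeps every classifier's label space small, which is a large part of why this decomposition works well in production.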
Our architecture further improves on this standard domain-intent-entity classification and dialogue management architecture by leveraging shallow semantic parsing. We have observed that NLU systems for industry applications often require more structured representations of entity relations than the standard hierarchy provides, but do not need full semantic or syntactic parses, which are often inaccurate on real-world conversational data. We describe our approach and demonstrate how it improves the performance of conversational interfaces for non-trivial use cases.
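As a hypothetical illustration of why flat entity tags are not always enough: in "transfer $500 from checking to savings", both accounts receive the same entity type, and only a role layer distinguishes source from destination. The sketch below (toy gazetteer and a preposition cue, both assumptions of ours) adds such a shallow role-assignment step on top of flat tagging, without any full parse.

```python
# Shallow semantic parsing sketch: flat entity tagging followed by a
# role classifier that uses the preceding word as a lexical cue.

import re

def tag_accounts(text):
    """Flat entity tagger: find account mentions (toy gazetteer)."""
    accounts = {"checking", "savings"}
    return [m for m in re.finditer(r"\w+", text.lower()) if m.group() in accounts]

def assign_roles(text, mentions):
    """Assign a semantic role to each mention from its preceding preposition."""
    roled = []
    for m in mentions:
        prefix = text.lower()[: m.start()].split()
        cue = prefix[-1] if prefix else ""
        role = {"from": "source", "to": "destination"}.get(cue, "unknown")
        roled.append({"entity": m.group(), "type": "account", "role": role})
    return roled

utterance = "Transfer $500 from checking to savings"
print(assign_roles(utterance, tag_accounts(utterance)))
```

A real role classifier would of course be learned rather than a cue lookup, but the structure — a light relational layer over flat entities, far short of a full parse — is the point.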
We end the talk by discussing the additional challenges of building a voice assistant rather than a text-based chatbot. Large-vocabulary, domain-agnostic Automatic Speech Recognition (ASR) systems often mis-transcribe domain-specific words and phrases. Since these generic ASR systems are the first component of most voice assistants in production, building NLU systems that are robust to their errors is challenging. We describe a few potential methods for handling ASR errors in the NLU pipeline, especially in the entity classification and resolution component, which is the most susceptible to ASR errors.
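One common mitigation is to make entity resolution fuzzy: match the (possibly garbled) transcript span against a domain catalog by string similarity rather than exact lookup. The sketch below uses normalized edit-distance matching via Python's `difflib`; the artist catalog is hypothetical, and a production system might add phonetic representations or search over n-best ASR hypotheses instead.

```python
# Fuzzy entity resolution that tolerates ASR mis-transcriptions, e.g.
# "coal play" heard in place of "Coldplay". Catalog is a made-up example.

import difflib

CATALOG = ["Coldplay", "Radiohead", "The Beatles"]

def resolve_entity(span, catalog=CATALOG, cutoff=0.6):
    """Map a noisy span to the closest catalog entry, or None if no match."""
    norm = span.replace(" ", "").lower()
    keyed = {c.replace(" ", "").lower(): c for c in catalog}
    hits = difflib.get_close_matches(norm, keyed.keys(), n=1, cutoff=cutoff)
    return keyed[hits[0]] if hits else None

print(resolve_entity("coal play"))
```

Stripping spaces before matching matters here, because ASR systems frequently split unfamiliar proper nouns into familiar shorter words.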
After this talk, attendees will have a better appreciation for the challenges and nuances of building real-world NLU systems, as well as a high-level understanding of the best practices and components needed to build their own production-quality conversational assistant.