Instructor: Prof. Pushpak Bhattacharyya, Department of Computer Science and Engineering, IIT Bombay.
This course provides an understanding of natural language processing: its tools, techniques, philosophy, and principles. Topics covered include sound, words and word forms, structures, meaning, and Web 2.0 applications:
Sound: Biology of Speech Processing; Place and Manner of Articulation; Word Boundary Detection; Argmax based computations; HMM and Speech Recognition.
Words and Word Forms: Morphology fundamentals; Morphological Diversity of Indian Languages; Morphology Paradigms; Finite State Machine Based Morphology; Automatic Morphology Learning; Shallow Parsing; Named Entities; Maximum Entropy Models; Random Fields.
Structures: Theories of Parsing; Parsing Algorithms; Robust and Scalable Parsing on Noisy Text as in Web Documents; Hybrid of Rule-Based and Probabilistic Parsing; Scope Ambiguity and Attachment Ambiguity Resolution.
Meaning: Lexical Knowledge Networks, Wordnet Theory; Indian Language Wordnets and Multilingual Dictionaries; Semantic Roles; Word Sense Disambiguation; WSD and Multilinguality; Metaphors; Coreferences.
Web 2.0 Applications: Sentiment Analysis; Text Entailment; Robust and Scalable Machine Translation; Question Answering in Multilingual Setting; Cross-Lingual Information Retrieval (CLIR).