University of Illinois at Urbana-Champaign

Text Retrieval and Search Engines

University of Illinois at Urbana-Champaign via Coursera

Overview

Recent years have seen dramatic growth in natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than by computer systems or sensors, and are thus especially valuable for discovering knowledge about people’s opinions and preferences, in addition to the many other kinds of knowledge that we encode in text. This course covers search engine technologies, which play an important role in any data mining application involving text data, for two reasons. First, while the raw data may be large for any particular problem, often only a relatively small subset of the data is relevant, and a search engine is an essential tool for quickly finding that subset in a large text collection. Second, search engines help analysts interpret any patterns discovered in the data by letting them examine the relevant original text. You will learn the basic concepts, principles, and major techniques of text retrieval, which is the underlying science of search engines.

Syllabus

  • Orientation
    • You will become familiar with the course, your classmates, and our learning environment. The orientation will also help you obtain the technical skills required for the course.
  • Week 1
    • During this week's lessons, you will learn about natural language processing techniques, which are the foundation for all kinds of text-processing applications, the concept of a retrieval model, and the basic idea of the vector space model.
  • Week 2
    • In this week's lessons, you will learn how the vector space model works in detail, the major heuristics used in designing a retrieval function for ranking documents with respect to a query, and how to implement an information retrieval system (i.e., a search engine), including how to build an inverted index and how to score documents quickly for a query.
  • Week 3
    • In this week's lessons, you will learn how to evaluate an information retrieval system (a search engine). Topics include the basic measures for evaluating a set of retrieved results; the major measures for evaluating a ranked list, including average precision (AP) and normalized discounted cumulative gain (nDCG); and practical issues in evaluation, including statistical significance testing and pooling.
  • Week 4
    • In this week's lessons, you will learn about probabilistic retrieval models and statistical language models, in particular the details of the query likelihood retrieval function with two specific smoothing methods, and how the query likelihood retrieval function is connected with the retrieval heuristics used in the vector space model.
  • Week 5
    • In this week's lessons, you will learn feedback techniques in information retrieval, including the Rocchio feedback method for the vector space model, and a mixture model for feedback with language models. You will also learn how web search engines work, including web crawling, web indexing, and how links between web pages can be leveraged to score web pages.
  • Week 6
    • In this week's lessons, you will learn how machine learning can be used to combine multiple scoring factors to optimize ranking of documents in web search (i.e., learning to rank), and learn techniques used in recommender systems (also called filtering systems), including content-based recommendation/filtering and collaborative filtering. You will also have a chance to review the entire course.

Taught by

ChengXiang Zhai

Reviews

3.2 rating, based on 13 Class Central reviews

4.5 rating at Coursera based on 952 ratings

  • Text Retrieval and Search Engines is the second course in Coursera's new data mining specialization offered by the University of Illinois at Urbana-Champaign. The course covers a variety of topics in text data mining and natural language processing…
  • I've taken a number of courses on Coursera and have thoroughly enjoyed some of them, but it's clear that the quality varies. I was very disappointed in this course. Having applied to the University of Illinois' Master of Computer Science - Data Scie…
  • Anonymous
    I was initially excited for this course as it seemed a good dive into unstructured text data. But now I'd say: *skip this course*. I think the instructor is okay and presents the material in a sufficient enough manner to get a decent grasp of it.…
  • Anonymous
    Great class with a nice mix of theoretical and practical lessons. There was a competition at the end of the course which pushed us to come up with new ideas.
  • Anonymous
    Precise and clear explanation of the concepts. This course focuses completely on text retrieval concepts, with a strong intro on what text retrieval is and what challenges it faces, and further gives insight into various models and improvements in this field. Therefore, this course is mostly for people interested in the area of information retrieval.
  • Lien Block
    The course is not very organised and even though they share a lot of information, it's not really very useful for someone who wants to get his/her hands dirty and really learn NLP/Text retrieval.

    (+ Instructor is sometimes very hard to understand)
  • I'd encourage more programming assignments dealing with NLP, a somewhat smaller focus on C++, and more R/Python support. It was a fun experience, and I hope that the theoretical approach will slowly turn into a combination of theory and practice.
  • Anonymous
    It's not complete, but it's a good starting point for anyone who wants to learn more about information retrieval. Great course. I recommend it.
