This course introduces the technologies behind web and enterprise search engines, including document indexing, searching, and ranking. You will also learn performance metrics for evaluating search quality, methods for understanding user intent and document semantics, and advanced applications such as recommendation systems and summarization. Real-life examples and case studies reinforce the understanding of search algorithms.
Search Engines for Web and Enterprise Data
The Hong Kong University of Science and Technology via Coursera
Syllabus
- Introduction to Search Engines for Web and Enterprise Data
- Welcome to the first module of this course! In this module, you will learn: (1) The major tasks involved in web search. (2) The history, evolution, impact, and challenges of web search engines.
- Search Engine Business Model
- In this module, you will learn: (1) Different business models of web search engines.
- TFxIDF
- In this module, you will learn: (1) Different information retrieval models, including Boolean and statistical models. (2) How to determine important words in a document using TFxIDF.
- Vector Space Model
- In this module, you will learn: (1) How to represent a document or query as a vector of keywords. (2) How to determine the degree of similarity between a pair of vectors using different similarity measures, including Inner Product, Cosine Similarity, Jaccard Coefficient, and Dice Coefficient.
- Inverted Files
- In this module, you will learn: (1) How to index documents using inverted files. (2) How to perform updates and deletions on inverted files.
- Extended Boolean Model
- In this module, you will learn: (1) How to use the Extended Boolean Model to rank documents. (2) How to evaluate conjunctive and disjunctive queries using the Extended Boolean Model.
- PageRank
- In this module, you will learn: (1) The history and evolution of link-based ranking methods. (2) How to determine query/document similarities using HyPursuit, WISE, and PageRank. (3) Possible extensions that can be applied to PageRank.
- HITS Algorithm
- In this module, you will learn: (1) How to calculate hub and authority scores of web documents using the HITS algorithm. (2) The re-ranking process involved in the HITS algorithm.
- Performance Evaluation of Information Retrieval System
- In this module, you will learn: (1) How to evaluate the retrieval effectiveness of an information retrieval system using Precision, Recall, F-Measure, Average Precision, DCG, and NDCG. (2) Which subjective relevance measures can be used for an information retrieval system.
- Benchmarking
- In this module, you will learn: (1) How to use the TREC collection for benchmarking. (2) The characteristics of the TREC collection.
- Stopword removal and Stemming
- In this module, you will learn: (1) What stemming is. (2) Different Content-Sensitive and Context-Free stemming algorithms. (3) How to calculate Successor Variety and Entropy for stemming.
- Relevance Feedback
- In this module, you will learn: (1) How to perform document space modification using relevance feedback. (2) How to perform query modification using relevance feedback.
- Personalized Web Search
- In this module, you will learn: (1) Why relative preference is more useful than absolute preference in personalization. (2) The importance of eye-tracking user studies in personalized web search. (3) How to model preferences as a weighted vector.
- Index Term Selection
- In this module, you will learn: (1) How to calculate the discrimination value for index term selection. (2) The importance of word usage in documents in search engine design.
- Discovering Phrases and Correlated Terms
- In this module, you will learn: (1) How to use collocated terms in lieu of strict phrases in search. (2) How to identify collocated terms using Pointwise Mutual Information (PMI). (3) How to utilize N-grams for search.
- Enterprise Search Engine
- In this module, you will learn: (1) The challenges of enterprise search. (2) The differences between web search and enterprise search.
Taught by
Kenneth W T Leung and Dik Lun LEE