Introduction to Information Retrieval (信息检索与搜索引擎) GESC1007
School of Natural Sciences and Humanities
We have discussed:
A complete search system
Brief review of last week
Evaluation in an information retrieval system
Course schedule (日程安排)
Introduction (Chapter 1)
Term vocabulary and posting lists (Chapter 2)
Dictionaries and tolerant retrieval (Chapter 3)
Index construction (Chapter 4)
Scoring, term weighting, the vector space model (Chapter 6)
A complete search system (Chapter 7)
Evaluation in information retrieval
Web search engines, advanced topics, conclusion
1) Initially, we have a set of documents.
2) Linguistic processing is applied to these documents (tokenization, stemming, language detection, …).
After this step, each document is represented as a set of terms.
3) The IR System keeps a copy of each document in a cache (缓存).
This is useful for generating snippets (片段).
Snippet: a short text that accompanies each document in the result list of a search engine.
4) A copy of each document is given to indexers. These programs create different kinds of indexes: positional indexes, indexes for spelling correction, structures for inexact retrieval, …
5) When a user searches using a free-text query, the query parser transforms the query, and spelling correction is applied.
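Steps 1 to 5 can be illustrated with a very small sketch in Python. This is only a toy under several assumptions, not how a real search engine is built: the names TinySearchSystem and tokenize are hypothetical, stemming and language detection are left out, spelling correction is approximated with the standard library function difflib.get_close_matches, and matching is a simple Boolean AND over the query terms.

```python
import re
from collections import defaultdict
from difflib import get_close_matches


def tokenize(text):
    """Step 2 (simplified): lowercase the text and split it into terms.
    Assumption: no stemming or language detection is performed here."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinySearchSystem:
    """Hypothetical toy system covering steps 1 to 5 of the slides."""

    def __init__(self):
        # Step 3: cache of original documents (doc_id -> text), used for snippets.
        self.cache = {}
        # Step 4: positional index (term -> doc_id -> list of positions).
        self.index = defaultdict(lambda: defaultdict(list))

    def add_document(self, doc_id, text):
        """Steps 1 to 4: store a copy in the cache and index every term position."""
        self.cache[doc_id] = text
        for position, term in enumerate(tokenize(text)):
            self.index[term][doc_id].append(position)

    def parse_query(self, query):
        """Step 5: tokenize the query and replace each unknown term with the
        closest term of the vocabulary, if one is similar enough."""
        vocabulary = list(self.index)
        terms = []
        for term in tokenize(query):
            if term not in self.index:
                close = get_close_matches(term, vocabulary, n=1)
                if close:
                    term = close[0]
            terms.append(term)
        return terms

    def search(self, query):
        """Return the sorted ids of documents containing all query terms."""
        terms = self.parse_query(query)
        if not terms:
            return []
        postings = [set(self.index.get(term, {})) for term in terms]
        return sorted(set.intersection(*postings))

    def snippet(self, doc_id, width=40):
        """Build a naive snippet: the first `width` characters of the cached copy."""
        return self.cache[doc_id][:width]


if __name__ == "__main__":
    system = TinySearchSystem()
    system.add_document(1, "Information retrieval with inverted indexes")
    system.add_document(2, "Web search engines crawl the web")
    # The misspelled query term is corrected before the index is consulted.
    for doc_id in system.search("retreival"):
        print(doc_id, system.snippet(doc_id))
```

A real system would use separate index structures for spelling correction (for example k-gram indexes) and would rank the results instead of returning every match; the sketch only shows how the cache, the indexer and the query parser fit together.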