Information Retrieval and Search Engines (信息检索与搜索引擎)
Introduction to Information Retrieval
GESC1007
Philippe Fournier-Viger
Full professor
School of Natural Sciences and Humanities
Spring 2021
Last week
What is Information Retrieval (信息检索)?
We discussed the “Boolean retrieval model” (布尔检索模型): searching documents using terms and Boolean operators (AND, OR, NOT).
QQ Group: 596340260 Website: PPTs
Course schedule (日程安排)
Lecture 1
Introduction
Boolean retrieval (布尔检索模型)
Lecture 2
Term vocabulary and posting lists
Lecture 3
Dictionaries and tolerant retrieval
Lecture 4
Index construction and compression
Lecture 5
Scoring, weighting, and the vector space model
Lecture 6
Computing scores, and a complete search system
Lecture 7
Evaluation in information retrieval
Lecture 8
Web search engines, advanced topics, and conclusion
An exercise
b. Draw the dictionary (also called inverted index representation) for this collection
c. What are the returned results for these queries?
- schizophrenia AND drug
- for AND NOT (drug OR approach)
This is an exercise that you can do at home if you want to review what we learned last week.
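As a reminder of how Boolean queries are evaluated, here is a minimal Python sketch. It uses a small made-up collection (not the collection from the exercise), and the names `docs`, `index`, and `postings` are chosen only for illustration. AND is a set intersection, OR is a set union, and NOT is a set difference against all document IDs.

```python
# Hypothetical toy collection (NOT the exercise's documents).
docs = {
    1: "search engines build an index",
    2: "an index speeds up search",
    3: "boolean query operators",
    4: "web search basics",
}

# Build a simple inverted index: term -> set of document IDs.
index = {}
for doc_id, text in docs.items():
    for term in text.lower().split():
        index.setdefault(term, set()).add(doc_id)

all_docs = set(docs)  # needed to evaluate NOT


def postings(term):
    """Return the set of documents containing the term (empty if unknown)."""
    return index.get(term, set())


# index AND search: intersection of the two posting sets
q1 = postings("index") & postings("search")

# search AND NOT (index OR query): union, then complement, then intersection
q2 = postings("search") & (all_docs - (postings("index") | postings("query")))

print(sorted(q1))  # [1, 2]
print(sorted(q2))  # [4]
```

The same three set operations are enough to answer the queries in part c once the index for the exercise collection has been drawn.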
Introduction
To be able to search for documents quickly, we need to create an index (索引).
What kind of index?
Term-document matrix (关联矩阵)
Term-document matrix (关联矩阵)
Dictionary (词典), also called “inverted index” (倒排索引)
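To compare the two kinds of index, here is a small Python sketch that builds both from the same collection. The three documents and the variable names (`matrix`, `inverted`) are assumptions made up for this example.

```python
# Hypothetical toy collection, used only to compare the two index types.
docs = {
    "Doc1": "shenzhen is a city",
    "Doc2": "china has many cities",
    "Doc3": "shenzhen is in china",
}

doc_ids = sorted(docs)
terms = sorted({t for text in docs.values() for t in text.split()})

# Term-document matrix: one row per term, one 0/1 entry per document.
matrix = {
    t: [1 if t in docs[d].split() else 0 for d in doc_ids]
    for t in terms
}

# Inverted index: each term maps to the list of documents that contain it.
inverted = {
    t: [d for d in doc_ids if t in docs[d].split()]
    for t in terms
}

print(matrix["shenzhen"])    # [1, 0, 1]
print(inverted["shenzhen"])  # ['Doc1', 'Doc3']
```

The matrix stores an entry for every (term, document) pair, so for a large collection it is mostly zeros. The inverted index stores only the pairs where the term actually occurs.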
Four steps to create an index
How to create an index?
Step 1: collect the documents to be indexed
Book1, Book2, Book3, …, Book100
Step 2: tokenize the text (标记文本): separate it into words
Book1, Book2, Book3, …, Book100
Book1: “The city of Shenzhen is located in the South of China…”
Book1 → token1, token2, …, token7, token8, token9, token10, token11
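Step 2 can be sketched in a few lines of Python using the example sentence from Book1. The regular expression below is a simplifying assumption: it treats every run of letters or digits as one token, which works for English text separated by spaces. Chinese text is not separated by spaces and would need a word segmentation step instead.

```python
import re

text = "The city of Shenzhen is located in the South of China"

# Simplified tokenizer (an assumption for this example):
# each run of word characters becomes one token.
tokens = re.findall(r"\w+", text)

for i, tok in enumerate(tokens, start=1):
    print(f"token{i}: {tok}")

print(len(tokens))  # 11
```

The sentence is split into eleven tokens, from token1 ("The") to token11 ("China").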
Step 3: Linguistic preprocessing (语言的预处理)
Keep only the terms that a