Information Retrieval and Search Engines (信息检索与搜索引擎)
Introduction to Information Retrieval, GESC1007
Philippe Fournier-Viger
Full professor
School of Natural Sciences and Humanities
Spring 2021
Last week
What is Information Retrieval (信息检索)?
We discussed the “Boolean retrieval model” (布尔检索模型): searching documents using terms and Boolean operators (AND, OR, NOT).
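As a quick reminder of how Boolean retrieval works, here is a minimal sketch in Python. The three documents and the helper `docs_containing` are hypothetical and chosen only for illustration; they are not taken from the course material. Each operator maps to a set operation on document IDs.

```python
# Hypothetical toy collection: document ID -> text (assumed for illustration).
docs = {
    1: "shenzhen is a city in china",
    2: "beijing is the capital of china",
    3: "shenzhen has a large port",
}

all_ids = set(docs)


def docs_containing(term):
    """Return the set of IDs of the documents that contain the term."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


# AND is set intersection, OR is set union,
# NOT is the complement with respect to all documents.
china_and_shenzhen = docs_containing("china") & docs_containing("shenzhen")
china_or_port = docs_containing("china") | docs_containing("port")
china_and_not_shenzhen = docs_containing("china") & (all_ids - docs_containing("shenzhen"))

print(china_and_shenzhen)      # {1}
print(china_or_port)           # {1, 2, 3}
print(china_and_not_shenzhen)  # {2}
```

Scanning every document for every query term, as this sketch does, becomes too slow for large collections, which is why an index is needed.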
QQ Group: 596340260 Website: PPTs
Course schedule (日程安排)
Lecture 1: Introduction; Boolean retrieval (布尔检索模型)
Lecture 2: Term vocabulary and posting lists
Lecture 3: Dictionaries and tolerant retrieval
Lecture 4: Index construction and compression
Lecture 5: Scoring, weighting, and the vector space model
Lecture 6: Computing scores, and a complete search system
Lecture 7: Evaluation in information retrieval
Lecture 8: Web search engines, advanced topics, and conclusion
An exercise
b. Draw the dictionary (also called the inverted index representation) for this collection.
c. What results are returned for these queries?
- schizophrenia AND drug
- for AND NOT (drug OR approach)
This is an exercise that you can do at home if you want to review what we learned last week.
Introduction
To be able to search for documents quickly, we need to create an index (索引).
What kind of index? 
Term-document matrix (关联矩阵)
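A term-document matrix has one row per term and one column per document, with a 1 where the term occurs in the document and a 0 otherwise. The sketch below builds such a matrix for a small hypothetical collection (the documents are assumptions made for illustration) and answers an AND query by combining two rows bit by bit.

```python
# Hypothetical toy collection (assumed for illustration); column i is docs[i].
docs = [
    "shenzhen is a city in china",
    "beijing is the capital of china",
    "shenzhen has a large port",
]

# The vocabulary is the sorted set of all distinct words.
vocabulary = sorted({word for text in docs for word in text.split()})

# One row per term: 1 if the term occurs in the document, else 0.
matrix = {
    term: [1 if term in text.split() else 0 for text in docs]
    for term in vocabulary
}

print(matrix["china"])     # [1, 1, 0]
print(matrix["shenzhen"])  # [1, 0, 1]

# "china AND shenzhen": bitwise AND of the two rows.
and_row = [a & b for a, b in zip(matrix["china"], matrix["shenzhen"])]
print(and_row)             # [1, 0, 0]
```

Most entries of such a matrix are 0 for a realistic collection, so storing the full matrix wastes a lot of memory.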
Term-document matrix (关联矩阵)
Dictionary (词典), also called the “inverted index” (倒排索引)
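An inverted index stores, for each term, only the list of documents that contain it. The following is a minimal sketch under assumptions: the toy documents and the function names `build_inverted_index` and `intersect` are hypothetical, and the intersection shown is the standard linear merge of two sorted lists, not necessarily the exact procedure used in the lecture.

```python
from collections import defaultdict

# Hypothetical toy collection: document ID -> text (assumed for illustration).
docs = {
    1: "shenzhen is a city in china",
    2: "beijing is the capital of china",
    3: "shenzhen has a large port",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):                 # visiting IDs in order keeps lists sorted
        for term in set(docs[doc_id].split()):  # each term counted once per document
            index[term].append(doc_id)
    return dict(index)


def intersect(p1, p2):
    """Intersect two sorted lists of document IDs with a linear merge."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


index = build_inverted_index(docs)
print(index["china"])                                # [1, 2]
print(index["shenzhen"])                             # [1, 3]
print(intersect(index["china"], index["shenzhen"]))  # [1]
```

Compared with the matrix, only the occurrences that actually exist are stored, and an AND query touches just the lists of the query terms.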

Four steps to create an index
How to create an index?
Step 1: collect the documents to be indexed
Book1, Book2, Book3, …, Book100

How to create an index?
Step 2: tokenize the text (标记文本): separate it into words
[Figure: the text of Book1, e.g. “The city of Shenzhen is located in the South of China…”, is split into tokens token1, token2, …, token11]
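Tokenization can be sketched in a few lines of Python. This is a deliberately simple version that assumes tokens are runs of letters and digits; the function name `tokenize` is hypothetical, and a real tokenizer must also deal with hyphens, apostrophes, and languages such as Chinese that do not separate words with spaces.

```python
import re


def tokenize(text):
    """Split text into tokens, treating each run of word characters as one token."""
    return re.findall(r"\w+", text)


sentence = "The city of Shenzhen is located in the South of China…"
tokens = tokenize(sentence)
print(tokens)
# ['The', 'city', 'of', 'Shenzhen', 'is', 'located', 'in', 'the', 'South', 'of', 'China']
```

Punctuation such as the trailing “…” is dropped because it is not a word character. Note that “The” and “the” are still different tokens at this stage.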
How to create an index?
Step 3: Linguistic preprocessing (语言的预处理)
Keep only the terms that a
