Introduction to Information Retrieval (信息检索与搜索引擎), GESC1007
Philippe Fournier-Viger
Full professor
School of Natural Sciences and Humanities
******@
Spring 2021
Last time
We have discussed:
how to calculate scores
the vector model
Course schedule (日程安排)
Week 1: Introduction (Chapter 1), Boolean retrieval
Week 2: Term vocabulary and posting lists (Chapter 2)
Week 3: Dictionaries and tolerant retrieval (Chapter 3)
Week 4: Index construction (Chapter 4)
Week 5: Scoring, term weighting, the vector space model (Chapter 6)
Week 6: A complete search system (Chapter 7)
Week 7: Evaluation in information retrieval
Week 8: Web search engines, advanced topics, conclusion
Final exam (date to be announced)
LAST WEEK
Term frequency (TF) (词频): how many times a term appears in a document.
Document frequency (DF) (文档频率): how many documents contain a term in a collection of documents.
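The two definitions above can be sketched in a few lines of Python. This is only an illustration: the three-document collection and the function names `term_frequency` and `document_frequency` are assumptions made for the example, not material from the slides.

```python
from collections import Counter

# Hypothetical toy collection (not from the slides); each document is a list of terms.
docs = {
    "doc1": ["shenzhen", "is", "near", "hong", "kong"],
    "doc2": ["beijing", "is", "the", "capital"],
    "doc3": ["shenzhen", "and", "beijing", "and", "shenzhen"],
}

def term_frequency(term, doc_terms):
    """TF: number of times `term` appears in one document."""
    return Counter(doc_terms)[term]

def document_frequency(term, collection):
    """DF: number of documents in the collection that contain `term`."""
    return sum(1 for terms in collection.values() if term in terms)

print(term_frequency("shenzhen", docs["doc3"]))   # occurrences inside doc3
print(document_frequency("shenzhen", docs))       # documents containing the term
```

Note that TF is computed per document, while DF is computed once per term over the whole collection.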
Inverse document frequency (IDF) (逆文档频率) of a term t:
$$\mathrm{IDF}_t = \log \frac{N}{\mathrm{DF}_t}$$
N = number of documents in the collection
DF_t = document frequency of the term t
Example
N = 806,791 documents
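As a rough sketch of the IDF of a term, assuming the usual form log(N / DF_t) with a base-10 logarithm (the base is an assumption, as are the function name and the DF values tried below; only N = 806,791 comes from the example):

```python
import math

def inverse_document_frequency(n_docs, df):
    """IDF of a term: log10(N / DF_t). The base-10 logarithm is an assumption."""
    if df <= 0:
        raise ValueError("df must be positive: the term must occur in at least one document")
    return math.log10(n_docs / df)

N = 806_791  # collection size from the example

# Hypothetical document frequencies, chosen only to show the trend.
for df in (1, 100, 10_000, N):
    print(df, round(inverse_document_frequency(N, df), 2))
```

Rare terms (small DF) get a high IDF, and a term that appears in every document gets an IDF of 0.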
TF-IDF
The TF-IDF of a term t for a document d:
$$\text{TF-IDF}_{t,d} = \mathrm{TF}_{t,d} \times \mathrm{IDF}_t$$
TF_{t,d}: term frequency (词频), the number of times that the term t appears in the document d
IDF_t: inverse document frequency (逆文档频率) of the term t
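A minimal sketch of the TF-IDF weight of a term for a document, assuming it is the product of the raw term frequency and a base-10 IDF. The four-document collection and the function name `tf_idf` are hypothetical and serve only the illustration.

```python
import math
from collections import Counter

# Hypothetical toy collection (not from the slides).
docs = {
    "doc1": ["shenzhen", "shenzhen", "city"],
    "doc2": ["beijing", "city"],
    "doc3": ["shenzhen", "beijing", "city"],
    "doc4": ["beijing", "beijing", "beijing", "city"],
}

def tf_idf(term, doc_id, collection):
    """TF-IDF of `term` for one document: TF(t, d) * log10(N / DF_t)."""
    tf = Counter(collection[doc_id])[term]
    df = sum(1 for terms in collection.values() if term in terms)
    if df == 0:
        return 0.0  # the term does not occur anywhere in the collection
    idf = math.log10(len(collection) / df)
    return tf * idf

print(tf_idf("shenzhen", "doc1", docs))  # frequent in doc1, found in half the collection
print(tf_idf("city", "doc1", docs))      # found in every document, so the weight is 0
```

A term scores high for a document when it occurs often in that document but in few documents overall.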
Vector model (矢量模型)
Documents can be viewed as vectors:
vector(doc1) = [, ]
vector(doc2) = [, ]
vector(doc3) = [, ]
vector(doc4) = [, ]
…
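One way to picture documents as vectors is to fix an ordered vocabulary and store one score per term. The sketch below uses raw term frequency as the score and a two-term vocabulary (Shenzhen, Beijing, the two terms used in these slides). The documents and the function name `to_vector` are assumptions for illustration, so the resulting numbers are not the values from the original slide.

```python
from collections import Counter

# One dimension per term; the order of the vocabulary fixes the order of the components.
vocabulary = ["shenzhen", "beijing"]

# Hypothetical toy collection (not from the slides).
docs = {
    "doc1": ["shenzhen", "shenzhen", "city"],
    "doc2": ["beijing", "city"],
    "doc3": ["shenzhen", "beijing", "city"],
    "doc4": ["beijing", "beijing", "beijing", "city"],
}

def to_vector(doc_terms, vocabulary):
    """Return the raw TF score of each vocabulary term, in vocabulary order."""
    counts = Counter(doc_terms)
    return [counts[term] for term in vocabulary]

vectors = {doc_id: to_vector(terms, vocabulary) for doc_id, terms in docs.items()}
for doc_id, vector in vectors.items():
    print(f"vector({doc_id}) = {vector}")
```

The same structure works with TF-IDF scores in place of raw counts; only the value stored in each component changes.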
[Figure: documents drawn as vectors. Axes: Shenzhen (score using TF-IDF or TF) and Beijing (score using TF-IDF or TF)]
The vector space model can be used to calculate how similar two documents are.
Two documents should be similar if their vectors are close to each other.
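This excerpt does not say how closeness is measured. One common choice, used here purely as an assumption, is cosine similarity, which compares the directions of two vectors. The example vectors are hypothetical two-term document vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical vectors: [score for Shenzhen, score for Beijing].
doc_a = [2, 0]  # only about Shenzhen
doc_b = [0, 1]  # only about Beijing
doc_c = [0, 3]  # only about Beijing, with more occurrences
doc_d = [1, 1]  # about both

print(cosine_similarity(doc_b, doc_c))  # same direction
print(cosine_similarity(doc_a, doc_b))  # no term in common
print(cosine_similarity(doc_a, doc_d))  # partial overlap
```

Because the measure depends only on direction, a long and a short document about the same topic can still come out as very similar.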
