Information Retrieval and Search Engines (信息检索与搜索引擎)
Introduction to Information Retrieval
GESC1007
Philippe Fournier-Viger
Full professor
School of Natural Sciences and Humanities
Spring 2021
Last week
We discussed in more detail how indexes are created.
Tokenization, normalization, lemmatization…
Phrase queries
using positional indexes
QQ Group: 596340260
Website: PPTs…
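As a reminder of how a positional index answers a phrase query, here is a minimal sketch. It assumes naive whitespace tokenization and lowercasing, and the function names `build_positional_index` and `phrase_query` as well as the sample documents are made up for illustration; a real system would also apply the normalization steps discussed earlier.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} (hypothetical helper)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index

def phrase_query(index, phrase):
    """Return sorted doc ids in which the terms of `phrase` appear consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term can match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        # The i-th term must occur exactly i positions after the first term.
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits

# Made-up sample documents.
docs = {
    1: "airplane tickets from Beijing",
    2: "tickets from Beijing airplane",
    3: "cheap airplane tickets from Beijing to Shenzhen",
}
index = build_positional_index(docs)
matches = phrase_query(index, "airplane tickets from Beijing")
```

Document 2 contains all four words but not in consecutive order, so only the stored positions allow it to be rejected.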
Course schedule (日程安排)
Lecture 1
Introduction
Boolean retrieval (布尔检索模型)
Lecture 2
Term vocabulary and posting lists
Lecture 3
Dictionaries and tolerant retrieval
Lecture 4
Index construction and compression
Lecture 5
Scoring, weighting, and the vector space model
Lecture 6
Computing scores, and a complete search system
Lecture 7
Evaluation in information retrieval
Lecture 8
Web search engines, advanced topics, and conclusion
About the last lecture
Normalization (规范化): the process of converting tokens to a standard form
Stemming: removing the endings of words (simple)
cars ⇒ car
airplanes ⇒ airplane
Lemmatization: converting a word to a common base form called a “lemma” (more complicated)
am, are, is ⇒ be
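The difference between the two can be sketched in a few lines. This is only a toy illustration: `naive_stem` applies a single made-up plural rule (it is not a real stemmer such as Porter's), and `LEMMAS` is a hypothetical lookup table that covers only the example above.

```python
def naive_stem(word):
    """Strip a trailing plural 's'. A toy rule, not a real stemming algorithm."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

# Hypothetical lemma table; a real lemmatizer needs a full dictionary
# and usually the part of speech of each word.
LEMMAS = {"am": "be", "are": "be", "is": "be"}

def lemmatize(word):
    """Look the word up in the lemma table; unknown words are returned as-is."""
    return LEMMAS.get(word, word)

stems = [naive_stem(w) for w in ["cars", "airplanes"]]
lemmas = [lemmatize(w) for w in ["am", "are", "is"]]
```

Stemming only needs string rules, which is why it is simple; lemmatization needs vocabulary knowledge, since no suffix rule turns "am" into "be".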
Chapter 3 – Dictionaries and tolerant retrieval
PDF -…
Previous weeks
Boolean retrieval model (布尔检索模型), using Boolean operators: Shenzhen AND food
Phrase (短语) queries: “Airplane tickets from Beijing”
Proximity queries: “Shenzhen (within 5 words of) City”
To find documents, we have used a dictionary (词典) of terms together with their posting lists; this structure is called an inverted index (倒排索引).
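To recall how the query Shenzhen AND food is answered, here is a minimal sketch of an inverted index with sorted posting lists and the merge-style intersection of two lists. The sample documents and the function names `build_inverted_index` and `intersect` are assumptions made for this illustration.

```python
def build_inverted_index(docs):
    """Map each term to a sorted list of doc ids (its posting list)."""
    index = {}
    # Visiting doc ids in increasing order keeps every posting list sorted.
    for doc_id in sorted(docs):
        for token in set(docs[doc_id].lower().split()):
            index.setdefault(token, []).append(doc_id)
    return index

def intersect(p1, p2):
    """Intersect two sorted posting lists by walking both in step."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

# Made-up sample documents.
docs = {
    1: "Shenzhen food guide",
    2: "Beijing food guide",
    3: "food markets in Shenzhen",
}
index = build_inverted_index(docs)
# Shenzhen AND food
result = intersect(index.get("shenzhen", []), index.get("food", []))
```

Because both lists are sorted, the intersection takes time proportional to the sum of their lengths rather than their product.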
Today
How to deal with typographical errors (打字错误)? Shenzhen vs Shenzhennn
often made by accident (无意地)
How to deal with different spellings (拼法)?
color vs colour
analyze vs analyse
How to deal with phonetically similar terms (发音相似的词)?
concede vs conceed
right vs write vs rite vs wright
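One common way to handle typing errors and spelling variants is to measure how many single-character edits separate two strings (the Levenshtein edit distance) and to suggest dictionary terms that are close to the query term. The sketch below is one possible illustration, not necessarily the method used in this course; the `suggest` helper, its `max_dist` threshold, and the small vocabulary are assumptions. Phonetic matching (right vs write) needs a different technique and is not covered by this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def suggest(term, vocabulary, max_dist=2):
    """Return vocabulary terms within max_dist edits of term (hypothetical helper)."""
    return [v for v in vocabulary if edit_distance(term, v) <= max_dist]

# Made-up vocabulary.
vocabulary = ["shenzhen", "shanghai", "beijing"]
corrections = suggest("shenzhennn", vocabulary)
```

Comparing the query against every dictionary term is slow for a large vocabulary, so practical systems first narrow down the candidate terms.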
Wildcard queries (通配符查询)
Wildcard (*) query: a query containing the wildcard (通配符) character “ * ”
* = any sequence of zero or more characters
Example: automat* to search…
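A query such as automat*, where the wildcard is at the end, amounts to finding all dictionary terms that start with a given prefix. The sketch below assumes the dictionary is kept as a sorted list of terms and uses binary search to jump to the first candidate; the function name `prefix_wildcard` and the vocabulary are made up, and wildcards in other positions are not handled.

```python
import bisect

def prefix_wildcard(sorted_terms, query):
    """Return the terms matching a trailing-wildcard query such as 'automat*'."""
    if not query.endswith("*") or "*" in query[:-1]:
        raise ValueError("this sketch only handles a single trailing '*'")
    prefix = query[:-1]
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        # Terms sharing the prefix are contiguous in sorted order.
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches

# Made-up vocabulary, sorted as a dictionary would be.
vocabulary = sorted(["automatic", "automation", "autumn", "auto", "car"])
matches = prefix_wildcard(vocabulary, "automat*")
```

The matching terms can then be looked up in the inverted index and their posting lists combined, so the wildcard query becomes an OR over ordinary terms.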
