信息检索与搜索引擎 Introduction to Information Retrieval GESC1007
Philippe Fournier-Viger
Full professor
School of Natural Sciences and Humanities
Spring 2021
Last week
We discussed in more detail how indexes are created.
Tokenization, normalization, lemmatization…
Phrase queries
using positional indexes
QQ Group: 596340260 Website: PPTs…
Course schedule (日程安排)
Lecture 1
Introduction
Boolean retrieval (布尔检索模型)
Lecture 2
Term vocabulary and posting lists
Lecture 3
Dictionaries and tolerant retrieval
Lecture 4
Index construction and compression
Lecture 5
Scoring, weighting, and the vector space model
Lecture 6
Computing scores, and a complete search system
Lecture 7
Evaluation in information retrieval
Lecture 8
Web search engines, advanced topics, and conclusion
About the last lecture
Normalization (规范化): the process of converting tokens to a standard form
Stemming: removing the endings of words (simple)
cars ⇒ car
airplanes ⇒ airplane
Lemmatization: converting a word to a common base form called a “lemma” (more complicated)
am, are, is ⇒ be
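The difference between the two can be sketched in a few lines of Python. This is only a toy illustration: the single suffix rule and the small lemma table below are assumptions made for this example, and real stemmers and lemmatizers use far more rules and much larger dictionaries.

```python
# Toy stemmer: strips a trailing "s" from tokens longer than 3 characters.
# (Hypothetical rule, for illustration only.)
def naive_stem(token):
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

# Toy lemmatizer: looks the token up in a small hand-made table.
# (Hypothetical table, for illustration only.)
LEMMAS = {"am": "be", "are": "be", "is": "be"}

def naive_lemmatize(token):
    return LEMMAS.get(token, token)

print([naive_stem(t) for t in ["cars", "airplanes"]])
print([naive_lemmatize(t) for t in ["am", "are", "is"]])
# Stemming alone cannot map "is" to "be": it only removes word endings.
print(naive_stem("is"))
```

Stemming needs only simple string rules, while lemmatization needs knowledge about the vocabulary, which is why it is the more complicated of the two.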
Chapter 3 – Dictionaries and tolerant retrieval
PDF -…
Previous weeks
Boolean retrieval model (布尔检索模型), using Boolean operators: Shenzhen AND food
Phrase (短语) queries: “Airplane tickets from Beijing”
Proximity queries: Shenzhen (within 5 words of) City
To find documents, we have used a dictionary (词典) of terms with their posting lists, which together form the inverted index (倒排索引).
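The three query types above can be sketched with a small positional index. This is a minimal sketch, not the course's implementation: the function names, the whitespace tokenizer, and the three sample documents are all assumptions made for this example.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index

def boolean_and(index, t1, t2):
    """Documents that contain both terms."""
    return sorted(set(index.get(t1, {})) & set(index.get(t2, {})))

def within_k(index, t1, t2, k):
    """Documents where t1 and t2 occur at most k positions apart."""
    hits = []
    for doc_id in boolean_and(index, t1, t2):
        p1, p2 = index[t1][doc_id], index[t2][doc_id]
        if any(abs(a - b) <= k for a in p1 for b in p2):
            hits.append(doc_id)
    return hits

def phrase(index, terms):
    """Documents where the terms occur consecutively, in order."""
    if not terms:
        return []
    candidates = set(index.get(terms[0], {}))
    for t in terms[1:]:
        candidates &= set(index.get(t, {}))
    hits = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits

# Hypothetical sample documents.
docs = {
    1: "Shenzhen is a city famous for food",
    2: "airplane tickets from Beijing to Shenzhen",
    3: "food in Beijing",
}
index = build_positional_index(docs)
print(boolean_and(index, "shenzhen", "food"))
print(within_k(index, "shenzhen", "city", 5))
print(phrase(index, ["airplane", "tickets", "from", "beijing"]))
```

Query terms are passed in lowercase because the sketch lowercases tokens when it builds the index, which is a very simple form of normalization.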
Today
How to deal with typographical errors (打字错误)? e.g. Shenzhen vs Shenzhennn
These are often made by accident (无意地).
How to deal with different spellings (拼法)?
e.g. color vs colour, analyze vs analyse
How to deal with phonetically similar terms (发音相似的词)?
e.g. concede vs conceed, right vs write vs rite vs wright
Wildcard queries (通配符查询)
Wildcard (*) query: a query containing the wildcard (通配符) character “ * ”
* = any sequence of characters (possibly empty)
e.g. automat* to search…
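A trailing wildcard query such as automat* can be answered by scanning a sorted dictionary for all terms that share the prefix. The sketch below is one possible way to do this, using binary search over a plain sorted list; the function name and the sample terms are assumptions made for this example.

```python
from bisect import bisect_left

def prefix_matches(sorted_terms, prefix):
    """Return all terms in a sorted list that start with the given prefix."""
    # Binary search finds the first term that is >= prefix.
    lo = bisect_left(sorted_terms, prefix)
    matches = []
    # In sorted order, terms sharing the prefix are contiguous from there.
    for term in sorted_terms[lo:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches

# Hypothetical dictionary of terms.
terms = sorted(["automatic", "automation", "automobile", "autumn", "food", "shenzhen"])
print(prefix_matches(terms, "automat"))
```

Each matching term would then be looked up in the inverted index, and the resulting posting lists merged to obtain the documents that answer the query.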
