Information Retrieval and Search Engines (信息检索与搜索引擎): Introduction to Information Retrieval, GESC1007
Philippe Fournier-Viger
Full professor
School of Natural Sciences and Humanities
******@
Spring 2020
Last week
We discussed:
Hashing (散列) and search trees (搜索树)
Wildcard queries
Spell correction
QQ Group: 1059666166 Website: PPTs…
Homework
The first homework was announced last week.
Please submit your answers no later than 30 March 2020 at 23:59.
Course schedule (日程安排)
Lecture 1
Introduction
Boolean retrieval (布尔检索模型)
Lecture 2
Term vocabulary and posting lists
Lecture 3
Dictionaries and tolerant retrieval
Lecture 4
Index construction and compression
Lecture 5
Scoring, weighting, and the vector space model
Lecture 6
Computing scores in a complete search system
Lecture 7
Evaluation in information retrieval
Web search engines, advanced topics, and conclusion
PHONETIC (语音的) CORRECTION
Write…
Right… Rite… Wright
Phonetic correction
Misspellings are often caused by a user typing a query that sounds like the target term.
Phonetic hashing: try to group together all terms that sound similar.
Soundex algorithms
Turn every term to be indexed into a 4-character reduced form, e.g. Hermann → H655.
Use these reduced forms to build an inverted index (dictionary 词典) from each code to the original terms. This index is called the “soundex index”.
Do the same with query terms
When a new query arrives, search using the soundex index.
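The indexing and lookup steps above can be sketched as follows. This is a minimal illustration, not the course's implementation: the names `build_soundex_index`, `phonetic_lookup`, and `TOY_CODES` are hypothetical, and the codes in `TOY_CODES` are filled in by hand as a stand-in for a real Soundex function.

```python
from collections import defaultdict

# Hand-filled 4-character codes (illustration only, standing in for a
# real Soundex function). "Hermann" -> "H655" is the example from the slide.
TOY_CODES = {
    "hermann": "H655",
    "herman": "H655",
    "robert": "R163",
    "rupert": "R163",
}


def build_soundex_index(terms, code_of):
    """Map each reduced code to the set of vocabulary terms that share it."""
    index = defaultdict(set)
    for term in terms:
        index[code_of(term)].add(term)
    return index


def phonetic_lookup(index, query_term, code_of):
    """Reduce the query term the same way, then look it up in the index."""
    return sorted(index.get(code_of(query_term), set()))


def code_of(term):
    # Assumption: every term used in this demo appears in TOY_CODES.
    return TOY_CODES[term.lower()]


index = build_soundex_index(["hermann", "robert", "rupert"], code_of)
print(phonetic_lookup(index, "herman", code_of))  # ['hermann']
```

The misspelled query "herman" is not in the vocabulary, but it shares the code H655 with "hermann", so the lookup still finds the intended term.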
How to calculate the 4-character codes?
Retain the first letter of the term.
Change all occurrences of the following letters to '0' (zero): 'A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y'.
Change letters to digits as follows: B, F, P, V to 1; C, G, J, K, Q, S, X, Z to 2; D, T to 3; L to 4; M, N to 5; R to 6.
Repeatedly remove one out of each pair of consecutive identical digits
Remove all zeros from the resulting text. Pad the resulting text with trailing zeros and return the first four positions, which will consist of a letter followed by three digits.
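The steps above can be sketched in Python as follows. This is a minimal sketch that follows the listed steps literally; the function name `soundex` and the choice to ignore non-ASCII characters are assumptions made for the example. Other Soundex variants treat the first letter or the letters H and W differently, so their codes may differ.

```python
# Letter-to-digit table taken from the steps above.
DIGITS = {}
for letters, digit in [
    ("AEIOUHWY", "0"),
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
]:
    for letter in letters:
        DIGITS[letter] = digit


def soundex(term):
    """Return the 4-character code for `term`, following the listed steps."""
    # Assumption: only ASCII letters A-Z are considered; anything else is dropped.
    letters = [c for c in term.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    # Step 1: retain the first letter.
    first = letters[0]

    # Steps 2 and 3: replace the remaining letters by '0' or by their digit.
    digits = [DIGITS[c] for c in letters[1:]]

    # Step 4: collapse runs of consecutive identical digits.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)

    # Step 5: remove zeros, pad with trailing zeros, keep four positions.
    body = "".join(d for d in collapsed if d != "0")
    return (first + body + "000")[:4]


print(soundex("Hermann"))  # H655
print(soundex("Herman"))   # H655
```

For "Hermann", the letters after H become 0, 6, 5, 0, 5, 5; collapsing the repeated 5 and removing the zeros leaves 655, which gives H655 as in the example on the earlier slide.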