Using the Web for Automated Translation Extraction in Cross-Language Information Retrieval
Advisor: Dr. Hsu
Presenter: Zih-Hui Lin
Authors: Ying Zhang and Phil Vines
Outline
Motivation
Objective
Previous work
Methodology
Experiments and results
Conclusions
Motivation
One of the major remaining reasons that CLIR does not perform as well as monolingual retrieval is the presence of out-of-vocabulary (OOV) terms.
An OOV term is not recognized by the segmenter, so it is split into smaller sequences of characters or into individual characters.
Example: 北野武 → (north limit military)
Previous work has either relied on manual intervention or has been only partially successful in solving this problem.
Objective
We propose a segmentation-free method that can be applied to both Chinese-English and English-Chinese CLIR. It automatically and correctly extracts translations of OOV terms from the Web, and is thus a significant improvement on earlier work.
English translation extraction in Chinese-English CLIR
Chinese OOV term detection
Example: 北野武 is segmented as (north limit military), so the probability P_value given by the HMM is very low. If P_value < P_min, the query is taken to contain OOV terms.
Web text extraction
We extract strings from the Web that contain the Chinese query terms and some English text.
Collection of co-occurrence statistics
Translation selection
Search for the longest Chinese substring C_t:
1. |C_targets| = max(|C_ij|).
2. f(e_t, C_t) = max(f(e_i, C_targets)).
3. Add (C_t, e_t) into the translation dictionary.
Search for the English term e_t' with the highest frequency:
1. f(e_targets) = max(f(e_i)).
2. (e_t', C_t') = max(f(e_targets, C_ij)).
3. If C_t' ≠ C_t and e_t' ≠ e_t, add (C_t', e_t') into the translation dictionary.
Example strings (Chinese characters c_i, English term e_1):
北野武 (Kitano Takeshi): c4 c5 c6, e1
導演北野武 (Kitano Takeshi): c2 c3 c4 c5 c6, e1
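The two selection procedures on this slide can be sketched in Python as follows. This is only an illustrative reading of the slide, not the authors' implementation. The function name select_translations, the input layout (a dict from (English term, Chinese substring) to co-occurrence frequency), and the choice to take f(e_i) as the sum of a term's co-occurrence counts are all assumptions. The counts in the usage example are made up.

```python
from collections import defaultdict


def select_translations(cooc):
    """Pick translation pairs from co-occurrence counts.

    cooc maps (english_term, chinese_substring) -> frequency f(e_i, C_ij).
    This input layout is an assumption made for the sketch.
    """
    if not cooc:
        return {}
    dictionary = {}

    # Procedure 1: restrict to the longest Chinese substrings (C_targets),
    # then take the English term that co-occurs with one of them most often.
    max_len = max(len(c) for _, c in cooc)
    longest = {pair: f for pair, f in cooc.items() if len(pair[1]) == max_len}
    e_t, c_t = max(longest, key=longest.get)
    dictionary[c_t] = e_t

    # Procedure 2: find the most frequent English term(s) (e_targets).
    # Assumption: f(e_i) is the sum of that term's co-occurrence counts.
    totals = defaultdict(int)
    for (e, _), f in cooc.items():
        totals[e] += f
    top = max(totals.values())
    e_targets = {e for e, f in totals.items() if f == top}

    # Take the Chinese substring that co-occurs most often with e_targets.
    frequent = {pair: f for pair, f in cooc.items() if pair[0] in e_targets}
    e_t2, c_t2 = max(frequent, key=frequent.get)

    # Add the second pair only if both sides differ from the first pair.
    if c_t2 != c_t and e_t2 != e_t:
        dictionary[c_t2] = e_t2
    return dictionary


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = {("Kitano Takeshi", "北野武"): 5, ("Kitano", "北野"): 2}
    print(select_translations(example))
```

Ties are broken by dictionary insertion order here; the slide does not say how ties should be handled.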
Chinese translation extraction in English-Chinese CLIR
Extraction of web text
Use Google to fetch the top 100 Chinese documents, with the English OOV term e_oov as the query.
Collection of co-occurrence statistics

Accumulate the frequency f_oov.
considering all
