Journal of Frontiers of Computer Science and Technology
ISSN 1673-9418 CODEN JKYTA8
1673-9418/2011/05(01)-0068-07
E-mail: ******@
Tel: +86-10-51616056
DOI: .1673-
Research and Optimization of Nutch Distributed Crawler
ZHAN Hengfei 1+, YANG Yuexiang 2, FANG Hong 2
1. School of Computer Science, National University of Defense Technology, Changsha 410073, China
2. Information Center, National University of Defense Technology, Changsha 410073, China
+ Corresponding author: E-mail: zhf_a_******@
ZHAN Hengfei, YANG Yuexiang, FANG Hong. Research and optimization of Nutch distributed crawler. Journal of Frontiers of Computer Science and Technology, 2011,5(1): 68-74.
Abstract: Nutch is a well-regarded open-source search engine whose core code makes extensive use of the MapReduce programming model, and a growing number of businesses and organizations use it to build distributed search engine products customized to their needs. An important prerequisite for a good search engine is crawling as much network data as possible in order to build its indexes. This paper introduces the working mechanism of Nutch's Hadoop-based distributed Web crawler, points out its shortcomings, and proposes an improved scheme that lets the Web crawler use network resources more efficiently.
