ISSN 1673-9418 CODEN JKYTA8
Journal of Frontiers of Computer Science and Technology
1673-9418/2011/05(01)-0068-07
DOI: .1673-
E-mail: ******@
Tel: +86-10-51616056
Research and Optimization of Nutch Distributed Crawler
ZHAN Hengfei1+, YANG Yuexiang2, FANG Hong2
1. School of Computer Science, National University of Defense Technology, Changsha 410073, China
2. Information Center, National University of Defense Technology, Changsha 410073, China
+ Corresponding author: E-mail: zhf_a_******@
ZHAN Hengfei, YANG Yuexiang, FANG Hong. Research and optimization of Nutch distributed crawler. Journal of Frontiers of Computer Science and Technology, 2011,5(1): 68-74.
Abstract: Nutch is a capable open-source search engine whose kernel code makes extensive use of the MapReduce programming model, and a growing number of businesses and organizations use it to build distributed search engine products customized to their needs. An important prerequisite for a good search engine is fetching as much network data as possible to build its indexes. This paper introduces the working mechanism of Nutch's Hadoop-based distributed Web crawler, points out its shortcomings, and proposes an improved scheme that lets the crawler use network resources more efficiently.