
Python Web Crawler Programming (PPT courseware).ppt

Category: IT / Computing | Pages: about 44


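(1) Earlier we saw that BeautifulSoup can look up elements in an HTML document. scrapy has its own powerful way to look up HTML elements: the xpath method. It uses XPath syntax, and this courseware describes it as more flexible and faster than BeautifulSoup's select.

Example 4-2-1:

    from scrapy.selector import Selector

    htmlText = '''<html><body>
    <bookstore>
    <book>
    <title lang="eng">Harry Potter</title>
    <price></price>
    </book>
    <book>
    <title lang="eng">Learning XML</title>
    <price></price>
    </book>
    </bookstore>
    </body></html>'''
    selector = Selector(text=htmlText)
    print(type(selector)); print(selector)
    s = selector.xpath("//title")
    print(type(s))
    print(s)

Program output:

    <class 'scrapy.selector.unified.Selector'>
    <Selector xpath=None data='<html><body>\n<bookstore>\n<book>\n<title'>
    <class 'scrapy.selector.unified.SelectorList'>
    [<Selector xpath='//title' data='<title lang="eng">Harry Potter</title>'>, <Selector xpath='//title' data='<title lang="eng">Learning XML</title>'>]

Notes:

(1) from scrapy.selector import Selector imports the Selector class from scrapy. This is the class used for selecting and searching.

(2) selector = Selector(text=htmlText) builds a Selector from the text in htmlText, that is, it loads the HTML document. Once the document is loaded you have a Selector object, and you can use xpath on it to find elements.

(3) print(type(selector)) shows the type <class 'scrapy.selector.unified.Selector'>, a type that has an xpath method.

(4) s = selector.xpath("//title") finds every <title> element in the document; "//" means "anywhere in the document". In general, selector.xpath("//tagName") searches the whole document for <tagName> tags and returns a list of Selector objects.

(5) print(type(s)): because there are two <title> elements, the result is a scrapy.selector.unified.SelectorList, which behaves like a list of Selector objects.

(6) print(s) shows that s contains two Selector objects. One is <Selector xpath='//title' data='<title lang="eng">Harry Potter</title>'> and the other is <Selector xpath='//title' data='<title lang="eng">Learning XML</title>'>.

So the general way to search for an HTML element <tagName> with a selector is:

    selector.xpath("//tagName")

After the HTML document is loaded with selector = Selector(text=htmlText), the resulting selector corresponds to the top-level <html> element of the document. "//" means a whole-document search, and the result is always a list of Selector objects, even when only one element matches. For example:

    selector.xpath("//body") finds the <body> element; the result is a list containing one Selector.
    selector.xpath("//title") finds two <title> elements; the result is a list containing 2 Selectors.
    selector.xpath("//book") finds two <book> elements; the result is a list containing 2 Selectors.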
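The "whole-document search always yields a list" behavior described above can be tried without installing scrapy. The sketch below is only a rough analogue: it swaps in Python's standard-library xml.etree.ElementTree in place of scrapy's Selector, and it assumes the sample markup is well-formed XML. ElementTree supports a limited XPath subset, so the search path is written ".//tagName" where scrapy would use "//tagName". The helper name find_all and the variable names are illustrative and do not come from the courseware.

```python
import xml.etree.ElementTree as ET

# Same sample markup as Example 4-2-1 (price values are empty, as in the source).
html_text = """<html><body>
<bookstore>
<book><title lang="eng">Harry Potter</title><price></price></book>
<book><title lang="eng">Learning XML</title><price></price></book>
</bookstore>
</body></html>"""

# Parsing returns the top-level <html> element, much as the loaded
# Selector corresponds to <html> in the scrapy version.
root = ET.fromstring(html_text)


def find_all(tag_name):
    """Search the whole document below <html> for <tag_name> elements.

    Always returns a list, even when exactly one element matches.
    """
    # ".//tag" is ElementTree's spelling of a descendant search.
    return root.findall(".//" + tag_name)


# Count the matches for the three tags discussed in the text.
counts = {tag: len(find_all(tag)) for tag in ("body", "title", "book")}

# Collect the text of each matched <title>.
titles = [t.text for t in find_all("title")]

print(counts)   # {'body': 1, 'title': 2, 'book': 2}
print(titles)   # ['Harry Potter', 'Learning XML']
```

With scrapy installed, the corresponding calls would be selector.xpath("//body"), selector.xpath("//title") and selector.xpath("//book"), which return lists of Selector objects instead of ElementTree elements.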

Python Web Crawler Programming (PPT courseware), from Taodocs (www.taodocs.com). Please cite the source when reposting.
