A small Python crawler for the uncensored image site from the front page

This post was last edited by junomak on 2019-9-11 00:28

Without further ado, here are the results first.

[Screenshot: 微信截图_20190911000747.png, uploaded 2019-9-11 00:06]

[Screenshot: 微信截图_20190911000859.png, uploaded 2019-9-11 00:07]

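The code is below. I'm a beginner at web scraping and would be glad to exchange ideas. If you want to use it, remember to pip install the required libraries (requests and lxml) first.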

"""
Scrape the images from https://www.24fa.top/MeiNv/
"""
import os

import requests
from lxml import etree

url = 'https://www.24fa.top/MeiNv/'

HEADERS = {
    'Cookie': '__cfduid=d5a0f1131fb1f8ab99233892dcd308bec1568092497; HstCfa4220097=1568092500185; HstCmu4220097=1568092500185; __dtsu=3DD172A75531775D56417C1F02D64A0A; mpc=1; HstCnv4220097=2; HstCla4220097=1568107788057; HstPn4220097=6; HstPt4220097=7; HstCns4220097=3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'
}

def get_urls():
    """
    Get the links to all the image sets on the index page.
    """
    link_list = []
    response = requests.get(url, headers=HEADERS)
    response.encoding = 'utf-8'
    text = response.text
    html = etree.HTML(text)
    trs = html.xpath("//div[@class='mx']//tr")
    for tr in trs:
        tds = tr.xpath(".//td")
        for td in tds:
            # hrefs are relative ("../..."), so swap ".." for the site root
            link = td.xpath("./a/@href")[0].replace('..', 'https://www.24fa.top')
            link_list.append(link)
    get_pic(link_list)

def get_pic(link_list):
    """
    The links are collected; what remains is the bulk download.
    """
    for link in link_list:
        response = requests.get(link, headers=HEADERS)
        response.encoding = 'utf-8'
        text = response.text
        html = etree.HTML(text)
        title = html.xpath("//h1[@class='title2']/text()")[0]  # get the title
        page_num = html.xpath("//div[@class='pager']/ul/li")
        page_num = len(page_num) - 3  # work out the total number of pages
        download_pic(html, title)
        print("Downloading {} page 1".format(title))
        for num in range(2, page_num + 1):
            # download page 2 and onward
            plink = link.replace(".html", 'p{}.html'.format(num))
            response = requests.get(plink, headers=HEADERS)
            response.encoding = 'utf-8'
            text = response.text
            html = etree.HTML(text)
            download_pic(html, title)
            print("Downloading {} page {}".format(title, num))

def download_pic(html, title):
    pic_links = html.xpath("//div[@id='content']/div/p/img/@src")
    for pic_link in pic_links:
        # image paths are relative ("../../..."), so swap in the site root
        pic_link = pic_link.replace('../..', 'https://www.24fa.top')
        img = requests.get(pic_link)
        # use the last path segment of the URL as the file name
        file_name = pic_link.split('/')[-1]
        path = "D:\\PICS\\{}".format(title)
        if not os.path.exists(path):
            os.makedirs(path)
        # the with block closes the file on exit
        with open("{}\\{}".format(path, file_name), 'wb') as f:
            f.write(img.content)
        print("{} downloaded successfully".format(file_name))

if __name__ == '__main__':
    get_urls()
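
The script turns relative links into absolute ones by replacing ".." with the site root, which depends on this site's exact layout. A more general alternative, sketched here, is urljoin from the standard library, which resolves a relative link against the URL of the page it was found on. The helper name absolute_url and the example paths are made up for illustration and are not taken from the site.

```python
from urllib.parse import urljoin


def absolute_url(page_url, href):
    """Resolve a relative href against the URL of the page containing it."""
    return urljoin(page_url, href)


# Hypothetical paths, for illustration only
index_page = 'https://www.24fa.top/MeiNv/'
print(absolute_url(index_page, '../MeiNv/2019-09/example.html'))
```

With this approach the same call covers both the "../" links on the index page and the "../../" image paths on the gallery pages, as long as the page URL passed in is the one the link came from.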

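#2: That was fast!

#3: Looking forward to a packaged set of the images.

#4: Looking forward to a packaged set of the images. Please zip them and upload them to a cloud drive.

#5: Looking forward to the images packaged on Baidu Cloud. Thanks!

#6: Looking forward to the packaged images, mark

#7: Looking forward to the packaged images, mark

#8: Looking forward to the packaged images, mark

#9: Looking forward to the packaged images

#10: How about a crawler for umei.fun?

#11: Looking forward to the images packaged on Baidu Cloud. Thanks!

#12: Emmmm, how do I run this? Is there a tutorial?

#13: Looking forward to the packaged images, mark

#14: That's really fast... impressive.

#15: Looking forward to follow-up updates~~

#16: Impressive, OP. I studied Python for two weeks and just could not write a crawler, so I gave up.

#17: Hardcore stuff, showing my support! Purple sweet potato pudding

#18: A question, if I may: I once ran crawler code written for PC on a Mac, and it created a folder that I cannot delete no matter what. Any idea why? (I have searched Baidu and Google countless times... sigh)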
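
On macOS the backslashes in a path such as "D:\\PICS\\{}" are ordinary file name characters, not separators, so a script like the one above creates a single oddly named folder in the current working directory. Why that folder could not be deleted is not clear from the post. A minimal sketch of building the download folder portably with pathlib follows; the function name gallery_dir and the default location (a PICS folder in the home directory) are assumptions for illustration.

```python
from pathlib import Path


def gallery_dir(title, base=None):
    """Return the download folder for one gallery, creating it if needed."""
    # Assumption: default to a "PICS" folder in the user's home directory
    base = Path(base) if base is not None else Path.home() / "PICS"
    target = base / title
    # parents=True creates missing parent folders; exist_ok avoids an error on reruns
    target.mkdir(parents=True, exist_ok=True)
    return target
```

The "/" operator on Path objects inserts the correct separator for whichever system the script runs on.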

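#19: Looking forward to the packaged images, mark

#20: Please package them up!

#21: I ran it once. It created an empty folder, but the downloaded images were not put into it.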
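
A guess at one possible cause: if the file name taken from the image URL still contains "/" characters, open() targets a subfolder that was never created, so nothing is written even though the gallery folder exists. The sketch below keeps only the last path segment of the URL. The helper names and the URL in the comment are hypothetical.

```python
import os
from urllib.parse import urlsplit


def file_name_from_url(pic_url):
    """Return the last path segment of a URL, ignoring any query string."""
    path = urlsplit(pic_url).path
    return path.rsplit('/', 1)[-1]


def target_path(folder, pic_url):
    """Join the download folder and the file name with the OS separator."""
    return os.path.join(folder, file_name_from_url(pic_url))


# Hypothetical URL, for illustration only:
# file_name_from_url('https://www.24fa.top/upload/2019-09/pic_01.jpg')
```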

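#22: mark. Looking forward to the packaged images.

#23: Thanks for sharing, OP~

#24: It only crawls the first listing page; there are 38 more pages after it. The loop has no exception handling, so once one request fails it cannot continue.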
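
The missing exception handling mentioned here could be addressed roughly as follows. This is a sketch, not the original author's code: the helper process_all is a made-up name, and it catches Exception broadly for brevity. In the real script it would be reasonable to narrow that to requests.RequestException and IndexError.

```python
def process_all(items, handler):
    """Run handler on each item; record failures instead of stopping."""
    failed = []
    for item in items:
        try:
            handler(item)
        except Exception as exc:  # broad on purpose in this sketch
            print("skipping {}: {}".format(item, exc))
            failed.append(item)
    return failed
```

In the script above, handler could be a function that downloads one gallery link, so that a single bad page is reported and skipped while the rest continue. The returned list can be retried afterwards.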

#25: Running single-threaded is also rather slow. I added multithreading for you, but I don't know how to post it here.
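
The multithreaded version mentioned here was never posted, so the following is only a sketch of one common approach, using ThreadPoolExecutor from the standard library. The function name run_in_threads and the default of 8 workers are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_threads(items, worker, max_workers=8):
    """Apply worker to every item in a thread pool.

    Returns a dict mapping each item to its result, or to the exception
    it raised, so one failure does not abort the other downloads.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                results[item] = exc
    return results
```

Here worker would be a function that downloads one gallery or one image. Threads suit this job because the time is spent waiting on the network, but a large worker count can put heavy load on the site.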

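#26: 666, awesome! Is there a script that can be run directly?

#27: Serious technical skills, impressive, bro.

#28: I wrote one myself using beautifulsoup, got stuck on the multithreading part, and was too lazy to take it further.

#29: All hail the master. Please write a detailed tutorial.

#30: Marking this too. Looking forward to a Baidu Cloud link for the packaged images.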
