[Repost, modified] Download every image from 100 pages in a single run! Multithreaded. - 优刊号

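[Statement]

Adapted from the original post by liushao11a.

The original code targeted CL; this version is modified to target ⑨①.

If this infringes on anyone's rights, please contact me and I will remove it.

The code is as follows: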

import os
import re
import time
from multiprocessing import Pool

import requests

# Headers for forum pages
headers = {
    'Host': 'f.wonderfulday29.live',
    'Upgrade-Insecure-Requests': '1',
    'Pragma': 'no-cache',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3702.0 Safari/537.36'
}

# Headers for image downloads (same as above, without Host)
header = {
    'Upgrade-Insecure-Requests': '1',
    'Pragma': 'no-cache',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3702.0 Safari/537.36'
}

clurl = 'http://f.wonderfulday29.live'  # base URL of the forum
path_down = 'E:\\setu\\91PNGs\\'  # folder to save into


def downpic(ms):
    """Download every image from the thread whose tid is ms."""
    time.sleep(2)
    nus = ms
    marukl = clurl + f'/viewthread.php?tid={ms}'
    r = requests.get(marukl, headers=headers)
    r.encoding = 'utf-8'
    mm = r.text.replace(" ", "").replace("\r\n", "")
    res = re.compile('file="(.*?)"')
    s_fmrs = res.findall(mm)  # get the image URLs
    # Assumption: the leading "<title>" was missing from the posted pattern and is restored here
    rest = re.compile('<title>(.*?)91自拍达人原创申请')
    title = rest.findall(mm)[0]  # get the page title
    # Replace characters that would make folder creation fail
    title = title.replace("：", "").replace(":", "").replace("?", "").replace("？", "").replace("|", "&").replace(".", "=")
    downpath = path_down + str(nus) + title  # folder name = thread ID + thread title, for easy browsing
    if not os.path.exists(downpath):
        os.makedirs(downpath)
    lisf = os.listdir(downpath)
    for irl, imgurl in enumerate(s_fmrs):
        xu = f"S{irl:03d}"  # sequence prefix: S000, S001, ...
        f = imgurl.split("/")[-1]  # file name part of the URL
        # Normalize the name to 13 characters: keep the last 13, or left-pad with zeros
        if len(f) > 13:
            df = xu + f[-13:]
        else:
            df = xu + "0" * (13 - len(f)) + f
        if df not in lisf:
            try:
                html = requests.get(imgurl, headers=header, timeout=10)
                with open(os.path.join(downpath, df), 'wb') as file:
                    file.write(html.content)
                print(imgurl, f"{df} done!")
            except Exception:
                print(imgurl, "no!")
        else:
            print(f"{df} is done!")


if __name__ == '__main__':
    for page_nub in range(1, 11):  # all threads on pages 1-10; adjust as needed
        mainurl = clurl + f'/forumdisplay.php?fid=19&page={page_nub}'
        main_r = requests.get(mainurl, headers=headers, timeout=10)
        main_t = main_r.text.replace(" ", "").replace("\r\n", "")
        # Assumption: the pattern is empty in the post as it survives; this one captures
        # the tid of each thread link, which is what downpic expects
        main_c = re.compile(r'viewthread\.php\?tid=(\d+)')
        main_l = list(dict.fromkeys(main_c.findall(main_t)))  # drop duplicate tids
        pool = Pool(10)  # 10 processes download in parallel
        pool.map(downpic, main_l)
        pool.close()
        pool.join()

Post 2: Haha, showing my support.

Post 3: multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\xxx\Desktop\91img.py", line 34, in downpic
    title = rest.findall(mm)[0] # get the page title
IndexError: list index out of range
"""

It throws an error.

Post 4: Thumbs up for the tech crowd.

Post 5: halrj posted on 2019-10-17 15:32

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):

It runs without any errors on my end.

Post 6: Python is hard to run on a virtual machine. Is there a PHP version?

Post 7: Is this for Python 2.7 or Python 3?

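Post 8: 853100013 posted on 2019-10-17 16:13

Python is hard to run on a virtual machine. Is there a PHP version?

I only modified the original author's code; there is no PHP version.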

Post 9: sdushmily posted on 2019-10-17 16:33

I only modified the original author's code; there is no PHP version.

I'm using Python 3.

Post 10: ilalien posted on 2019-10-17 16:25

Is this for Python 2.7 or Python 3?

I'm currently using Python 3.

Post 11: Last edited by ilalien on 2019-10-17 16:55

Tested, it works.

Could you add a proxy (daili) configuration, though? It feels like it would run faster through a proxy.
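
One way the proxy request above could be handled: `requests` accepts a `proxies` mapping on each call. The following is only a minimal sketch, assuming a local HTTP proxy at 127.0.0.1:1080 (a made-up address; substitute your own). `build_proxies`, `fetch` and `PROXIES` are hypothetical names, not part of the original script.

```python
def build_proxies(host, port, scheme="http"):
    """Return a proxies mapping in the shape requests expects.

    host and port are placeholders for whatever proxy you actually run.
    """
    proxy_url = f"{scheme}://{host}:{port}"
    # requests picks the entry whose key matches the scheme of the target URL
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, headers, proxies, timeout=10):
    """GET a URL through the given proxy (hypothetical helper)."""
    import requests  # third-party; imported here so build_proxies has no dependencies
    return requests.get(url, headers=headers, proxies=proxies, timeout=timeout)


# Assumed local proxy address, for illustration only
PROXIES = build_proxies("127.0.0.1", 1080)
```

Inside `downpic`, the two `requests.get` calls would then take `proxies=PROXIES`. Whether that is actually faster depends on the proxy and on where the forum is hosted.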

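Post 12: 853100013 posted on 2019-10-17 16:13

Python is hard to run on a virtual machine. Is there a PHP version?

Can't Python just be run directly on Windows?

I run it from PowerShell on Win10.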

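Post 13: Does anyone have the downloads ready to share?

Post 14: Don't know how to use it, just passing by.

Post 15: I don't know how to use it. Awkward.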

Post 16: multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\xxx\Desktop\91img.py", line 34, in downpic
    title = rest.findall(mm)[0] # get the page title
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/Users/xxx/Desktop/91img.py", line 77, in <module>
    pool.map(downpic, main_l)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
IndexError: list index out of range
>>>

It runs for a while and downloads some images, but then it throws this error. I don't know why!
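
The traceback shows why the whole run stops: `pool.map` re-raises an exception from a worker in the main process, so a single thread page that fails to parse ends the script. Below is a minimal sketch of one workaround that catches errors per task. It uses `ThreadPoolExecutor` from the standard library rather than the original `multiprocessing.Pool` (a swapped-in choice, since downloading is I/O-bound), and `safe_call`, `run_all` and `flaky` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def safe_call(func, arg):
    """Run func(arg); return (arg, error) instead of letting the error escape."""
    try:
        func(arg)
        return arg, None
    except Exception as exc:  # record the failure and keep going
        return arg, exc


def run_all(func, items, workers=10):
    """Apply func to every item in parallel and return the failures."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(lambda item: safe_call(func, item), items))
    return [(arg, exc) for arg, exc in results if exc is not None]


def flaky(tid):
    """Stand-in for downpic: fails on some inputs the way the traceback does."""
    if tid % 2 == 0:
        raise IndexError("list index out of range")


failures = run_all(flaky, [1, 2, 3, 4])
for tid, exc in failures:
    print(f"thread {tid} failed: {exc!r}")
```

With this shape, a thread whose title cannot be parsed is reported at the end while the remaining threads still download.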

Post 17: halrj posted on 2019-10-18 07:49

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):

I didn't hit this problem. What I hit was a folder-creation failure, because some titles contain illegal characters.
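
The folder-creation failure mentioned here comes from characters that Windows does not accept in file and folder names. The script replaces only a handful of them by hand. A broader filter might look like the sketch below, assuming the Windows reserved set `\ / : * ? " < > |` plus control characters; `sanitize_title` is a hypothetical helper, not from the original post.

```python
import re

# Characters Windows rejects in file and folder names, plus ASCII control characters
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize_title(title, replacement="_"):
    """Make a thread title safe to use as a Windows folder name."""
    cleaned = _ILLEGAL.sub(replacement, title)
    # Names ending in a dot or a space are also problematic on Windows
    cleaned = cleaned.rstrip(". ")
    return cleaned or "untitled"


print(sanitize_title('a/b:c?d*e|f'))  # a_b_c_d_e_f
```

In `downpic`, this would stand in for the chain of `replace` calls applied to `title`.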

Post 18: halrj posted on 2019-10-18 07:49

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):

I guess the ⑨㈠ forum has anti-scraping measures.

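Post 19: yy8y posted on 2019-10-18 09:43

I guess the ⑨㈠ forum has anti-scraping measures.

No. It's the same as the earlier 10251: they all share one problem, and taken literally it is an index problem!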

Post 20: Honestly, I can't follow any of this. What kind of technique is this? I suddenly feel a huge gap between me and the experts.

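Post 21: ilalien posted on 2019-10-17 16:41

Tested, it works.

Could you add a proxy (daili) configuration, though? It feels like it would run faster through a proxy.

How long did you run it? Are you sure there were no errors?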

Post 22: Last edited by liushao11a on 2019-10-18 11:28

rest = re.compile('<title>(.*?)91自拍达人原创申请')

I suggest changing it to rest = re.compile('<title>(.*?)<')
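
The IndexError reported earlier in the thread arises because `findall` returns an empty list when the page title does not contain the expected suffix, so `[0]` has nothing to index. A guarded version could look like this minimal sketch, which assumes the title sits inside a `<title>` tag; `extract_title` and its fallback value are hypothetical.

```python
import re

# Lazy match up to the next "<", as suggested in the post above
TITLE_RE = re.compile(r"<title>(.*?)<", re.S)


def extract_title(html, fallback="untitled"):
    """Return the text of the <title> tag, or fallback when none is found."""
    match = TITLE_RE.search(html)
    if match is None:
        return fallback
    return match.group(1).strip() or fallback


print(extract_title("<html><title>Some thread</title></html>"))  # Some thread
print(extract_title("<html>login required</html>"))              # untitled
```

Using `search` with an explicit `None` check means a page without a usable title (for example an error or login page) yields a placeholder name instead of crashing the worker.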

Post 23: halrj posted on 2019-10-18 11:03

How long did you run it? Are you sure there were no errors?

It runs, but it throws an error after about 15 threads.

Post 24: Python 3. Also, this really is the kind of thing programmers write for their own use, tweaking and debugging as they go...

Post 25: akira7788 posted on 2019-10-17 17:21

Can't Python just be run directly on Windows?

I run it from PowerShell on Win10.

Uh, I meant a public-facing virtual host that runs PHP websites...

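Post 26: 853100013 posted on 2019-10-18 13:31

Uh, I meant a public-facing virtual host that runs PHP websites...

You mean the kind that lets you quickly set up an MM site?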

Post 27: halrj posted on 2019-10-18 11:03

How long did you run it? Are you sure there were no errors?

It ran for less than 10 minutes before throwing an error!

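Post 28: halrj posted on 2019-10-18 10:02

No. It's the same as the earlier 10251: they all share one problem, and taken literally it is an index problem!

Right. Thanks for the explanation. On my end it usually runs for only about 5 minutes before it hangs and throws an error.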
