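This is a working example for crawling news from the military category of Tencent News.
Note that the code for parsing Tencent News article details is generic and is not included here (use tencent_parser.py).
Note that this page relies on dynamic loading (inherit from DynamicCrawler; there is no need to override _fetch_page()).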
```python
import requests
from bs4 import BeautifulSoup

# Military channel listing page on Tencent News.
URL = "https://news.qq.com/ch/milite"
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
}
resp = requests.get(URL, headers=headers, timeout=10)
resp.raise_for_status()
resp.encoding = "utf-8"
# print(resp.text)
# with open("example/example-13.html", "r", encoding="utf-8") as f:
#     html = f.read()

soup = BeautifulSoup(resp.text, "lxml")
# soup = BeautifulSoup(html, "lxml")

# Keep only feed items whose dt-params attribute contains article_type=0.
div_list = soup.select(
    "div[id='channel-feed-area'] div.channel-feed-list "
    "div.channel-feed-item[dt-params*='article_type=0']"
)
for div in div_list:
    link = div.select_one("a.article-title")
    # Skip items without a title link instead of raising AttributeError.
    if link is None:
        continue
    href = link.get("href")
    print(href)
```
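
The loop above prints the raw `href` values. The following is a minimal sketch of a possible post-processing step, assuming that some of those values may be relative, protocol-relative, empty, or repeated (the source does not confirm this). The helper name `normalize_links` and the sample paths are hypothetical and only serve as an illustration.

```python
from urllib.parse import urljoin

# Channel URL taken from the example above; used as the base for relative links.
BASE_URL = "https://news.qq.com/ch/milite"


def normalize_links(hrefs, base_url=BASE_URL):
    """Return absolute, de-duplicated URLs in first-seen order.

    Hypothetical helper: it is not part of tencent_parser.py or DynamicCrawler.
    """
    seen = set()
    result = []
    for href in hrefs:
        # select_one(...).get("href") can yield None or an empty string.
        if not href:
            continue
        absolute = urljoin(base_url, href.strip())
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    # Made-up sample values, not real article links.
    sample = ["/a/example-1", "//news.qq.com/a/example-1", None]
    for url in normalize_links(sample):
        print(url)
```

Collecting the links into a list before printing would also make it easier to pass them on to the article-detail parser later.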