How to use the requests module in Python 3 to crawl page content: a worked example
1. Install pip
My desktop runs Linux Mint, which does not ship with pip by default. Since pip is needed for the steps below, the first step is to install it:
$ sudo apt install python-pip
Once it is installed, check the pip version:
$ pip -V
2. Install requests
$ pip install requests
To test whether the installation succeeded, run import requests in the Python interpreter; if no error is raised, the module is installed.
3. Install beautifulsoup4
Beautiful Soup is a Python library for extracting data from HTML and XML files. Working with your favorite parser, it provides idiomatic ways of navigating, searching, and modifying a parse tree. Beautiful Soup can save you hours or even days of work.
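As a minimal sketch of the kind of thing Beautiful Soup does (the HTML fragment below is invented purely for illustration; the built-in html.parser is used so no extra parser is required):

```python
from bs4 import BeautifulSoup

# A small, made-up HTML fragment to demonstrate parsing
html = '<html><body><h1 id="title">Hello</h1><p class="intro">World</p></body></html>'

# html.parser is Python's built-in parser; lxml can be swapped in if installed
soup = BeautifulSoup(html, 'html.parser')

print(soup.find(id='title').get_text())           # Hello
print(soup.find('p', class_='intro').get_text())  # World
```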
$ sudo apt-get install python3-bs4
Note: the command above installs for Python 3. If you are using Python 2, install with pip instead:
$ sudo pip install beautifulsoup4
4. A quick look at the requests module
1) Send a request
First of all, of course, import the requests module:
>>> import requests
Then fetch the target web page. For example:
>>> r = requests.get('http://www.example.com/')
2) Pass URL parameters
You often want to send some kind of data in the URL's query string, as key=value pairs after a question mark (e.g. /get?key=val). Requests allows you to use the params keyword argument to provide these parameters as a dictionary of strings.
For example, when we Google the keyword "python crawler", the parameters newwindow (open in a new window), q and oq (the search keyword) could be assembled into a URL by hand, but instead you can use the following code:
>>> payload = {'newwindow': '1', 'q': 'python crawler', 'oq': 'python crawler'}
>>> r = requests.get("https://www.google.com/search", params=payload)
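To see how the params dictionary is encoded into the final URL without actually sending a request, requests' Request/prepare API can be used (the google.com/search endpoint here is just an illustration):

```python
import requests

# Illustrative only: build the request without sending it
payload = {'newwindow': '1', 'q': 'python crawler', 'oq': 'python crawler'}
req = requests.Request('GET', 'https://www.google.com/search', params=payload)
prepared = req.prepare()

# The dictionary has been URL-encoded into the query string
print(prepared.url)
# https://www.google.com/search?newwindow=1&q=python+crawler&oq=python+crawler
```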
4) Get the response encoding
>>> r.encoding
'utf-8'
5) Get the response status code
We can check the response status code:
>>> r = requests.get('http://www.example.com/')
>>> r.status_code
200
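Besides inspecting status_code directly, requests can raise an exception for error responses via raise_for_status(). The sketch below constructs a Response object by hand, an assumption made purely so the example runs without network access; in real code the object comes back from requests.get:

```python
import requests

# Hand-built Response so no network call is needed (illustrative only)
resp = requests.Response()
resp.status_code = 404

try:
    resp.raise_for_status()  # raises requests.HTTPError for 4xx/5xx codes
except requests.HTTPError as err:
    print('request failed:', err)

# A 2xx status passes silently
resp.status_code = 200
resp.raise_for_status()
print('ok')
```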
5. Case Study
Recently my company introduced an OA system. Here I take its official documentation page as an example and crawl only the useful information on the page, such as the article title and body.
Demo Environment
Operating System: linuxmint
Python Version: python 3.5.2
Using Modules: requests, beautifulsoup4
The code is as follows:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'GavinHsueh'
import requests
import bs4
# Address of the target page to crawl
url = 'http://www.ranzhi.org/book/ranzhi/about-ranzhi-4.html'
# Fetch the page content; a response object is returned
response = requests.get(url)
# Check the response status code
status_code = response.status_code
# Parse the page with BeautifulSoup and locate the content of the specified tag
content = bs4.BeautifulSoup(response.content.decode("utf-8"), "lxml")
element = content.find_all(id='book')
print(status_code)
print(element)
Running the program returns the crawled results:
The crawl succeeded.
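find_all(id='book') returns a list of matching tags, markup included. To keep only the readable text, get_text() can be called on each element. A small offline sketch (the HTML fragment below is a made-up stand-in for the fetched page content):

```python
import bs4

# Made-up stand-in for response.content.decode("utf-8") from the crawl above
html = '<div id="book"><h1>About Ranzhi</h1><p>Documentation text.</p></div>'

content = bs4.BeautifulSoup(html, 'html.parser')
for element in content.find_all(id='book'):
    # get_text strips the tags and keeps only the readable text
    print(element.get_text(separator=' ', strip=True))
    # About Ranzhi Documentation text.
```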
6. The problem of garbled characters in the crawled results
In fact, I started out with the system's default Python 2, but spent half a day stuck on garbled encoding in the crawled content, and every solution I googled was ineffective. After Python 2 nearly drove me crazy, I gave in and switched to Python 3. As for the garbled-content problem when crawling with Python 2, I welcome experienced readers to share their solutions, so that I and other students can avoid the detour.
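A common cause of such garbling is decoding UTF-8 bytes with the wrong codec. A minimal, network-free illustration:

```python
# UTF-8 bytes decoded with the wrong codec produce mojibake
raw = '中文内容'.encode('utf-8')

wrong = raw.decode('latin-1')  # garbled: each byte becomes one Latin-1 character
fixed = wrong.encode('latin-1').decode('utf-8')  # round-trip, then decode correctly

print(fixed)  # 中文内容
```

With requests specifically, setting response.encoding = response.apparent_encoding before reading response.text often resolves this kind of garbling, since requests otherwise guesses the encoding from the HTTP headers alone.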