How does Python capture web content?

If you want to crawl web pages with Python, there are several modules to learn, such as urllib, urllib2, urllib3, requests and httplib, plus the re module for regular expressions. Choose the module that fits the scenario so you can solve the problem quickly and efficiently.

To begin with, I suggest starting with the simplest module, urllib, for example by fetching Sina's homepage (note: this is for academic research only, with no intent to attack the site).
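A minimal sketch of such a fetch, written for Python 3's urllib.request rather than the Python 2 urllib the author used. The helper name fetch_page, the timeout value, the utf-8 fallback and the Sina URL in the comment are assumptions made for illustration:

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()
        # Use the charset declared in the response headers, if any;
        # fall back to utf-8 (an assumption, not every site uses it).
        charset = resp.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Example call (needs network access; the URL is assumed):
# html = fetch_page("http://www.sina.com.cn")
# print(html[:200])
```

The function returns the whole page source as one string, which is what the later extraction steps work on.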

Fetching the page this way returns the source of the Sina homepage, that is, the HTML of the whole page. To pull out only the parts you find useful, you need to learn string methods or regular expressions.
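As a rough illustration of both approaches, the sketch below extracts a page title with str.find and with re, and collects link targets with re.findall. SAMPLE_HTML is a made-up stand-in for downloaded page source, and the function names are hypothetical. Regular expressions like these are fine for simple, regular markup but are not a general HTML parser:

```python
import re

# Made-up page source standing in for a downloaded page.
SAMPLE_HTML = """<html><head><title>Example News</title></head>
<body>
<a href="http://example.com/a">First</a>
<a href="http://example.com/b">Second</a>
</body></html>"""


def title_via_find(html):
    """Extract the title using only string methods."""
    start = html.find("<title>")
    end = html.find("</title>", start)
    if start == -1 or end == -1:
        return None
    return html[start + len("<title>"):end].strip()


def title_via_regex(html):
    """Extract the title using a regular expression."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def extract_links(html):
    """Return every double-quoted href value found in <a> tags."""
    return re.findall(r'<a\s+[^>]*href="([^"]+)"', html, re.IGNORECASE)


print(title_via_find(SAMPLE_HTML))
print(title_via_regex(SAMPLE_HTML))
print(extract_links(SAMPLE_HTML))
```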

Reading articles and tutorials online regularly will help you pick these up quickly.

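One more thing: the environment I used above is Python 2. In Python 3, urllib and urllib2 have been merged into a single urllib package (with submodules such as urllib.request, urllib.parse and urllib.error), so there is no longer a module named urllib2, and httplib has been renamed http.client. urllib3 and requests are separate third-party libraries and are not part of that package.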
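One common way to cope with the renaming is a guarded import. This is a minimal sketch using only standard-library names; the sample query dictionary is made up:

```python
try:
    # Python 3: urllib and urllib2 live in one package with submodules.
    from urllib.request import urlopen
    from urllib.parse import urlencode
except ImportError:
    # Python 2: the same functions sat in two top-level modules.
    from urllib2 import urlopen
    from urllib import urlencode

# urlencode builds a query string the same way in both versions.
query = urlencode({"q": "python"})
print(query)
```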
