urllib is a Python library for accessing the internet and getting information from websites, including their publicly available source code. Data obtained through the library is generally in JSON, HTML, or XML format. In this tutorial, you will see how to get data from a website using the urllib library. By the end of this tutorial, you will know:
- How to send a request to a URL
- How to read HTML files from a URL
- How to get response headers from a URL
Let’s jump into it.
How to Send a Request to a URL
You can send a request to a URL using the urlopen() function from the urllib.request module. Let us walk through the code.
```python
# import the request module from urllib
from urllib import request

# send a request to open the website
url = request.urlopen('https://www.h2kinfosys.com/blog/')

# print the result code
print('The result code is', url.getcode())

# print the status
print('The status is', url.status)
```
Output:

```
The result code is 200
The status is 200
```
Let’s unpack the code above. We begin by importing the request module from the urllib library. Next, we open the URL we wish to access with the urlopen() function. Finally, we check whether the request was successful or not by printing the result code or status.
In both cases, the number 200 was returned. 200 is an HTTP status code indicating that the request was processed successfully. Codes in the 300 range, such as 301 (a permanent redirect), also mean the request was handled, whereas codes such as 404 or 500 indicate errors.
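Note that urlopen() does not hand these error codes back directly; it raises an exception instead. Here is a minimal sketch of handling failures with urllib's built-in HTTPError and URLError exceptions, using the same URL as above:

```python
# a minimal sketch of handling failed requests
from urllib import request, error

try:
    url = request.urlopen('https://www.h2kinfosys.com/blog/')
    print('The status is', url.status)
except error.HTTPError as e:
    # raised for HTTP error codes such as 404 or 500
    print('The server returned an error:', e.code)
except error.URLError as e:
    # raised when the server could not be reached at all
    print('Failed to reach the server:', e.reason)
```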
How to read HTML files from a URL
You can read the HTML of a page by calling the read() method on the response object returned by urlopen(). The code below is a minimal sketch that reads the HTML of the same website used above; note that read() returns bytes, so we decode them to a string.
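```python
# import the request module from urllib
from urllib import request

# send a request to open the website
url = request.urlopen('https://www.h2kinfosys.com/blog/')

# read() returns the page as bytes; decode it to a string
# (UTF-8 is assumed here, matching the site's Content-Type header)
html = url.read().decode('utf-8')
print(html)
```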
Output:
The output is the raw HTML source of the page.
How to get response headers from a URL
You can get a website's response headers using the getheaders() method. If you don’t know what a header is, HTTP headers are simply metadata the server sends back alongside the page, such as its content type, caching rules, and the date. The code below gets the headers of the URL passed in.
```python
# import the request module from urllib
from urllib import request

# send a request to open the website
url = request.urlopen('https://www.h2kinfosys.com/blog/')

# print the headers
print(url.getheaders())
```
Output:
```
[('Date', 'Sun, 07 Feb 2021 14:32:30 GMT'), ('Server', 'Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/7.4.5'), ('X-Powered-By', 'PHP/7.4.5'), ('Link', '<https://www.h2kinfosys.com/blog/wp-json/>; rel="https://api.w.org/"'), ('Link', '<https://www.h2kinfosys.com/blog/>; rel=shortlink'), ('Vary', 'Accept-Encoding'), ('Cache-Control', 'max-age=172800'), ('Expires', 'Tue, 09 Feb 2021 14:32:30 GMT'), ('Strict-Transport-Security', 'max-age=31536000'), ('Connection', 'close'), ('Transfer-Encoding', 'chunked'), ('Content-Type', 'text/html; charset=UTF-8')]
```
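If you only need a single header, the response object also provides a getheader() method, which returns one value (or None if the header is absent). A quick sketch:

```python
from urllib import request

url = request.urlopen('https://www.h2kinfosys.com/blog/')

# getheader() returns a single header value, or None if it is missing
print('Content-Type:', url.getheader('Content-Type'))
```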
Note that there is a cleaner way of scraping data from a website: using Beautiful Soup together with the requests library. You may still decide to use the urllib library, however, to avoid external dependencies.
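For comparison, here is a minimal sketch of the same request with those third-party packages (this assumes you have installed them, for example with pip install requests beautifulsoup4):

```python
# a sketch using the third-party requests and Beautiful Soup packages
import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.h2kinfosys.com/blog/')
soup = BeautifulSoup(response.text, 'html.parser')

# Beautiful Soup parses the HTML so you can query it, e.g. the page title
print(soup.title.get_text())
```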
One more thing. It is crucial to point out that many popular websites, such as Google, Twitter, Facebook, Amazon, and Wikipedia, do not support scraping their pages directly. They would rather have you use their APIs to access data, as it is cleaner and reduces the traffic hitting their servers. Scraping pages over a period of time may trigger their defenses and get your IP blocked, especially if you send too many requests in a short window.
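If you do request pages directly, you can at least identify your script and space out your requests. The sketch below uses urllib's Request class to set a User-Agent header and time.sleep() to pause between requests; the second URL and the 5-second delay are purely illustrative values, not values from any particular site's policy:

```python
# a sketch of polite scraping; the URL list and delay are illustrative
import time
from urllib import request

urls = [
    'https://www.h2kinfosys.com/blog/',
    'https://www.h2kinfosys.com/blog/page/2/',  # hypothetical second page
]

for page in urls:
    # identify the script via a custom User-Agent header
    req = request.Request(page, headers={'User-Agent': 'my-tutorial-script'})
    with request.urlopen(req) as response:
        print(page, response.status)
    time.sleep(5)  # wait between requests to avoid flooding the server
```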
If you have any questions, feel free to leave them in the comment section, and I’ll do my best to answer them.