Note: This tutorial is part of our “Python Web Scraping Tutorial for Beginners” series.
What is Requests?
Requests is a popular Python library for making HTTP requests. It lets you interact with web services, retrieve data from websites, and handle the responses with very little code. For web scraping, you use it to send HTTP GET requests to web pages and then parse the returned HTML to extract the information you need.
You can install the “requests” library using pip if you haven’t already:
pip install requests
Requests for scraping static websites:
The “requests” library is the usual starting point for scraping static websites, meaning pages whose content is already present in the HTML the server returns. It sends an HTTP GET request and gives you the page's HTML as text. Fetching is only the first step: additional libraries such as BeautifulSoup (covered in our next tutorial) are typically used alongside requests to parse that HTML and extract specific information.
Here is how you can use it:
- First, import the library
import requests
- Now we will send a GET request to our targeted website
url = ""
response = requests.get(url)
- Now get its page source (HTML)
html_content = response.text
# Now, you have the raw HTML content in the 'html_content' variable
print(html_content)
This way, you can fetch the HTML content of any static website and then parse it (we will cover parsing in detail in upcoming tutorials).
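The steps above assume the request succeeds. As a minimal sketch, the same fetch can be wrapped in a helper that sets a timeout and raises on HTTP error codes. The names `fetch_html` and `html_from_response` and the 10-second timeout are illustrative assumptions, not part of the original steps.

```python
import requests


def html_from_response(response):
    """Return the page source from a response, raising on 4xx/5xx status codes."""
    # raise_for_status() raises requests.HTTPError for error responses
    response.raise_for_status()
    return response.text


def fetch_html(url, timeout=10):
    """Send a GET request and return the HTML as text (hypothetical helper)."""
    # Without a timeout, requests can wait indefinitely for a slow server
    response = requests.get(url, timeout=timeout)
    return html_from_response(response)
```

Calling `fetch_html(url)` with the address of a static page should return the same text as `response.text` above, while a missing page (for example a 404) raises `requests.HTTPError` instead of handing back an error page as if it were content.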
Using headers in requests for scraping:
Headers matter when scraping because they tell the web server about your request, and setting them lets you mimic a real web browser. Websites often inspect headers to determine the user agent (the browser or client making the request), language preferences, and more. By default, requests identifies itself as a Python client, which some sites block or rate-limit, so sending browser-like headers can reduce the chance of being refused and helps you receive the same content a regular visitor would see.
Here is how you can send headers:
import requests

# Define headers to mimic a web browser (you can customize these headers)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.5",
}

# Specify the URL you want to scrape
url = ""

# Send an HTTP GET request with the custom headers
response = requests.get(url, headers=headers)

# HTML content
html_content = response.text
print(html_content)
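If you fetch several pages from the same site, passing `headers=` on every call gets repetitive. One possible approach, sketched here under the assumption that a `requests.Session` suits your scraper, is to store the headers on a session once. The helper name `make_browser_session` is hypothetical.

```python
import requests


def make_browser_session(headers):
    """Return a Session that sends the given headers with every request."""
    session = requests.Session()
    # Session.headers holds defaults that are merged into each request
    session.headers.update(headers)
    return session


headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.9999.99 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.5",
}
session = make_browser_session(headers)
```

After this, `session.get(url)` sends the same User-Agent and Accept-Language values without repeating the `headers` argument, and the session also reuses the underlying connection between requests to the same host.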
Using requests to scrape data from APIs:
APIs are a way to get data from a server in a structured format, usually JSON. When a website offers one, it is typically the easiest way to retrieve its data, so before scraping a website you should first check whether it provides an API.
To get data from an API in Python, you can use the requests library to make the HTTP request and decode the JSON response. Here’s a basic example that uses requests to fetch data from the GitHub API:
First, we send a request to the API endpoint:
import requests

# Make a request to the GitHub API to get a list of users.
response = requests.get('')
Now, we will get the JSON data from the response:
# The json() method returns the JSON data from the response
json_data = response.json()
print(json_data)
Now you can print the JSON data and parse it however you want.
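An API call can fail or return something that is not JSON, for instance an HTML error page. The following is a minimal sketch of a more defensive decode step. The helper name `json_or_none` is an assumption made for illustration, and returning `None` on a bad body is one design choice among several.

```python
import requests


def json_or_none(response):
    """Return the decoded JSON body, or None if the body is not valid JSON."""
    # Stop early on 4xx/5xx responses such as rate limiting
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:
        # json() raises a ValueError subclass when decoding fails
        return None
```

With this helper, `json_data = json_or_none(response)` behaves like `response.json()` for a valid reply, raises `requests.HTTPError` for an error status, and gives `None` when the body cannot be decoded.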
Here is the JSON data returned by the GitHub API:
[ { "login": "mojombo", "id": 1, "node_id": "MDQ6VXNlcjE=", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "defunkt", "id": 2, "node_id": "MDQ6VXNlcjI=", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "pjhyett", "id": 3, "node_id": "MDQ6VXNlcjM=", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "wycats", "id": 4, "node_id": "MDQ6VXNlcjQ=", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "ezmobius", "id": 5, "node_id": "MDQ6VXNlcjU=", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { 
"login": "ivey", "id": 6, "node_id": "MDQ6VXNlcjY=", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "evanphx", "id": 7, "node_id": "MDQ6VXNlcjc=", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "vanpelt", "id": 17, "node_id": "MDQ6VXNlcjE3", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "wayneeseguin", "id": 18, "node_id": "MDQ6VXNlcjE4", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "brynary", "id": 19, "node_id": "MDQ6VXNlcjE5", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { 
"login": "kevinclark", "id": 20, "node_id": "MDQ6VXNlcjIw", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "technoweenie", "id": 21, "node_id": "MDQ6VXNlcjIx", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "macournoyer", "id": 22, "node_id": "MDQ6VXNlcjIy", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "takeo", "id": 23, "node_id": "MDQ6VXNlcjIz", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "caged", "id": 25, "node_id": "MDQ6VXNlcjI1", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": 
false }, { "login": "topfunky", "id": 26, "node_id": "MDQ6VXNlcjI2", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "anotherjesse", "id": 27, "node_id": "MDQ6VXNlcjI3", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "roland", "id": 28, "node_id": "MDQ6VXNlcjI4", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "lukas", "id": 29, "node_id": "MDQ6VXNlcjI5", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "fanvsfan", "id": 30, "node_id": "MDQ6VXNlcjMw", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", 
"site_admin": false }, { "login": "tomtt", "id": 31, "node_id": "MDQ6VXNlcjMx", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "railsjitsu", "id": 32, "node_id": "MDQ6VXNlcjMy", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "nitay", "id": 34, "node_id": "MDQ6VXNlcjM0", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "kevwil", "id": 35, "node_id": "MDQ6VXNlcjM1", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "KirinDave", "id": 36, "node_id": "MDQ6VXNlcjM2", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", 
"site_admin": false }, { "login": "jamesgolick", "id": 37, "node_id": "MDQ6VXNlcjM3", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "atmos", "id": 38, "node_id": "MDQ6VXNlcjM4", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "errfree", "id": 44, "node_id": "MDEyOk9yZ2FuaXphdGlvbjQ0", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "Organization", "site_admin": false }, { "login": "mojodna", "id": 45, "node_id": "MDQ6VXNlcjQ1", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", "received_events_url": "", "type": "User", "site_admin": false }, { "login": "bmizerany", "id": 46, "node_id": "MDQ6VXNlcjQ2", "avatar_url": "", "gravatar_id": "", "url": "", "html_url": "", "followers_url": "", "following_url": "{/other_user}", "gists_url": "{/gist_id}", "starred_url": "{/owner}{/repo}", "subscriptions_url": "", "organizations_url": "", "repos_url": "", "events_url": "{/privacy}", 
"received_events_url": "", "type": "User", "site_admin": false } ]
For example, if you want to extract the login names of the users, you can do it this way:
for user in json_data:
    print(user['login'])
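The same loop extends to other fields. As a small sketch, the function below groups login names by the "type" field that appears in each record. The function name `logins_by_type` and the three-record `sample` list (an excerpt of the response shown above) are introduced only for illustration.

```python
from collections import defaultdict


def logins_by_type(users):
    """Group login names by the account 'type' field."""
    grouped = defaultdict(list)
    for user in users:
        # .get() avoids a KeyError if a record lacks the 'type' field
        grouped[user.get("type", "Unknown")].append(user["login"])
    return dict(grouped)


# A three-record excerpt of the response shown above
sample = [
    {"login": "mojombo", "id": 1, "type": "User"},
    {"login": "errfree", "id": 44, "type": "Organization"},
    {"login": "mojodna", "id": 45, "type": "User"},
]
print(logins_by_type(sample))
```

For this sample it prints `{'User': ['mojombo', 'mojodna'], 'Organization': ['errfree']}`. Passing the full `json_data` list instead of `sample` would group every record returned by the API.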