Here’s a Python snippet that demonstrates how to rotate proxies using the requests library. The example incorporates best practices such as error handling and user-agent rotation to reduce the risk of detection while web scraping.

Example Code

import requests
import random

# List of proxies
proxies = [
    {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'},
    {'http': 'http://10.10.1.11:3128', 'https': 'http://10.10.1.11:1080'},
    {'http': 'http://10.10.1.12:3128', 'https': 'http://10.10.1.12:1080'},
]

# List of user-agents
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15',
]


def get_page(url):
    # Choose a random proxy
    proxy = random.choice(proxies)

    # Choose a random user-agent
    user_agent = random.choice(user_agents)
    headers = {'User-Agent': user_agent}

    try:
        response = requests.get(url, proxies=proxy, headers=headers, timeout=10)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

# Example usage
if __name__ == "__main__":
    url = 'https://www.example.com'
    content = get_page(url)
    if content:
        print("Successfully fetched the page.")
        # Process the content here
        # print(content)
    else:
        print("Failed to fetch the page.")

Explanation

  • Proxy List: A list of dictionaries, where each dictionary specifies the HTTP and HTTPS proxies.
  • User-Agent List: A list of user-agent strings to mimic different browsers.
  • get_page Function:
    • Chooses a random proxy and user-agent for each request.
    • Uses requests.get to fetch the content from the specified URL through the chosen proxy.
    • Includes a timeout to prevent the script from hanging indefinitely.
    • Raises an HTTPError for bad responses (4xx or 5xx status codes).
  • Error Handling: The try...except block catches request exceptions, such as connection errors or timeouts, and prints an error message; a retry variant is sketched below.
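
As a sketch of how that error handling could be extended, here is one way to retry with a fresh proxy and user-agent when a request fails. The retry_get_page name and the max_attempts parameter are illustrative additions, not part of the original snippet; the code reuses the proxies and user_agents lists defined above.

def retry_get_page(url, max_attempts=3):
    # max_attempts is an illustrative choice; tune it to your needs.
    for attempt in range(max_attempts):
        proxy = random.choice(proxies)
        headers = {'User-Agent': random.choice(user_agents)}
        try:
            response = requests.get(url, proxies=proxy, headers=headers, timeout=10)
            response.raise_for_status()
            return response.text
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt + 1} failed: {e}")
    # All attempts failed
    return None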

Key Improvements and Best Practices

  • Random Proxy Selection: Choosing a proxy at random spreads requests across the proxies roughly evenly over many requests (a strictly even, round-robin alternative is sketched after this list).
  • User-Agent Rotation: Rotating user-agents helps to further disguise the scraper as a normal user.
  • Error Handling: The try...except block ensures that the script handles any request-related errors gracefully, preventing it from crashing.
  • Timeout: Setting a timeout prevents the script from waiting indefinitely for a response from a slow or unresponsive server.
  • Status Code Check: The response.raise_for_status() method raises an HTTPError for bad responses, allowing you to handle them appropriately.
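
If you want strictly even distribution rather than random selection, round-robin rotation with itertools.cycle is a minimal alternative. This sketch assumes the proxies list defined earlier; proxy_cycle and next_proxy are illustrative names:

import itertools

# Endless iterator that yields proxies[0], proxies[1], proxies[2],
# then wraps back to proxies[0], guaranteeing even distribution.
proxy_cycle = itertools.cycle(proxies)

def next_proxy():
    return next(proxy_cycle)

In get_page, replacing random.choice(proxies) with next_proxy() switches the rotation strategy without touching anything else.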

This code provides a basic example of how to rotate proxies in Python using the requests library. For more advanced use cases, you might consider using a more sophisticated proxy management library or service.
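
As one illustration of what such management might look like, the sketch below keeps a pool that retires a proxy after repeated failures. ProxyPool and its max_failures threshold are hypothetical design choices, not an existing library API:

import random

class ProxyPool:
    # Hypothetical pool that drops proxies after repeated failures.
    def __init__(self, proxies, max_failures=3):
        self.proxies = list(proxies)
        self.max_failures = max_failures
        # Proxy dicts aren't hashable, so failure counts are keyed by
        # object identity; callers must pass back the dict from get().
        self.failures = {id(p): 0 for p in self.proxies}

    def get(self):
        if not self.proxies:
            raise RuntimeError("No working proxies left")
        return random.choice(self.proxies)

    def report_failure(self, proxy):
        self.failures[id(proxy)] += 1
        if self.failures[id(proxy)] >= self.max_failures:
            self.proxies.remove(proxy)

In get_page, you would call pool.get() in place of random.choice(proxies) and pool.report_failure(proxy) inside the except block.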
