How to Bypass Google Recaptcha V2 Using Python and HTTPX
How to Bypass Google Recaptcha V2 Using Python and HTTPX, When I try to scrape some websites that use google ReCaptcha v2 to search the product.
The Problem
We won’t get the data before we pass the captcha, the alternative way is we can use playwright and selenium but it’s too heavy to run with our application server, so I try as much as possible not to use the browser in a GUI using drivers and such, but I use httpx as an alternative requests module that I normally use.
The Solution
one of the safest solutions is to use a third party to solve this captcha problem, I use the services of anti-captcha to solve this captcha bypass problem here is the module I use
by combining these two modules we can solve this captcha problem let's see the code execution below
Execute the solution
First, we can install the module first using a pip
pip install install anticaptchaofficial httpx
the next step is to create an account at anticapcha.com then copy the API key from the web then enter the dashboard menu, copy the API key there
here is an example code from the official documentation
# code example
from anticaptchaofficial.recaptchav2proxyless import *
solver = recaptchaV2Proxyless()
solver.set_verbose(1)
solver.set_key("YOUR_API_KEY")
solver.set_website_url("https://website.com")
solver.set_website_key("SITE_KEY")
# Specify softId to earn 10% commission with your app.
# Get your softId here: https://anti-captcha.com/clients/tools/devcenter
solver.set_soft_id(0)
g_response = solver.solve_and_return_solution()
if g_response != 0:
print "g-response: "+g_response
else:
print "task finished with error "+solver.error_code
after reading the above code let’s combine it with httpx
from httpx import Client
from selectolax.parser import HTMLParser
from typing import Optional, Any
from anticaptchaofficial.recaptchav2proxyless import recaptchaV2Proxyless
def get_payload(self, keyword: str) -> dict[str, Any]:
# generating new payload for get new form data automaticaly
response = self.client.get(url=self.url, params=self.params, headers=self.headers)
print(f"Excecute URL {response.url} with status code: {response.status_code}")
# create teamporary area
try:
os.mkdir(join(bc.BASE_DIR, 'temp'))
except FileExistsError:
pass
with open(join(join(bc.BASE_DIR, 'temp'), "res.html"), "w") as file:
file.write(response.text)
file.close()
# save sample cookies
soup = HTMLParser(response.text)
container = soup.css_first("form#SLSearchForm")
district_id = container.css_first("#DistrictGUID").attrs['value']
district_code = container.css_first("input[name=DISTRICTCODE]").attrs['value']
data_status = container.css_first("#DataStatus").attrs['value']
initial_search = container.css_first("input[name=InitialSearch]").attrs['value']
search_type = container.css_first("#SearchType").attrs['value']
sitekey = container.css_first("#recaptcha-div").attrs["data-sitekey"]
# set capcha solver here
print("Solver Setup")
print("Using Anti capcha API KEY: {}".format(self.api_key))
print("website Sitekey: {}".format(sitekey))
self.solver.set_website_url(str(response.url))
self.solver.set_verbose(1)
self.solver.set_key(self.api_key)
self.solver.set_website_key(sitekey)
# return solution
g_recaptcha_response = self.solver.solve_and_return_solution()
# make sure gcapcha is Return a Solution
print("Recaptcha Key: {}".format(g_recaptcha_response))
# append all value
data_dict: dict[str, Any] = {
"DistrictGUID": district_id,
"DISTRICTCODE": district_code,
"DataStatus": data_status,
"InitialSearch": initial_search,
"SearchType": search_type,
"txtAddress_name": keyword,
"g-recaptcha-response": g_recaptcha_response
}
print("Generating new payload")
return data_dict
def get_data(self, keyword: str):
# generate from temporary file
payload: dict[str, Any] = self.get_payload(keyword=keyword)
# using token
token: str = self.get_token()
params: dict[str, Any] = {
"token": token,
"event": "action.PublicSchoolLocatorResults",
"DistrictCode": payload['DISTRICTCODE'],
}
headers: dict[str, Any] = {
# "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
# "Accept-Language": "en-US,en;q=0.9",
# "Accept-Encoding": "gzip, deflate, br",
# "Cache-Control": "max-age=0",
# "Connection": "keep-alive",
# "Content-Length": "5964",
# "Content-Type": "application/x-www-form-urlencoded",
# "Host": "mybaragar.com",
# "Origin": "https://mybaragar.com",
# "Referer": "https://mybaragar.com/index.cfm?event=page.SchoolLocatorPublic&DistrictCode=BC45",
# "sec-ch-ua": '"Not_A Brand";v="99", "Google Chrome";v="109", "Chromium";v="109"',
# 'sec-ch-ua-mobile': "?0",
# "sec-ch-ua-platform": "Sec-Fetch-Dest",
# "Sec-Fetch-Mode": "navigate",
# "Sec-Fetch-Site": "same-origin",
# "Sec-Fetch-User": '?1',
# "Upgrade-Insecure-Requests": '1',
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36"
}
response = self.client.post(url=self.url, params=params, headers=headers, data=payload)
return response
# response: 200
Conclusion
the conclusion is that we can bypass the captcha without using automation tools such as selenium and playwright, like using httpx and anti captcha, hope this is useful and good luck