Network capturing with Selenium/PhantomJS

I am using a proxy (BrowserMob Proxy) for this:

from selenium import webdriver
from browsermobproxy import Server

server = Server(environment.b_mob_proxy_path)  # path to the browsermob-proxy binary
server.start()
proxy = server.create_proxy()

# route PhantomJS traffic through the proxy (--proxy is the PhantomJS flag)
service_args = ["--proxy=%s" % proxy.proxy]
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()  # start recording a new HAR
driver.get('url_to_open')
print(proxy.har)  # this is the archive
# for example, all request URLs:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]

The HAR (HTTP Archive format) contains a lot of other information about the requests and responses (headers, status codes, timings, and so on), which is very useful to me.
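For example, a rough sketch of summarizing each captured entry (the field names follow the standard HAR layout that browsermob-proxy produces):

# sketch: print method, URL, status and timing for every captured entry
for entry in proxy.har['log']['entries']:
    request = entry['request']
    response = entry['response']
    print(request['method'], request['url'])
    print('  status:', response['status'])
    print('  time (ms):', entry['time'])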

Installing the Python client on Linux:

pip install browsermob-proxy
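Note that the pip package is only the Python client; the BrowserMob Proxy server itself is a separate (Java) download, and its path is what gets passed to Server() above. A minimal sketch, with an assumed install location:

from browsermobproxy import Server

# assumed path; point this at wherever you unpacked browsermob-proxy
server = Server("/opt/browsermob-proxy/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()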

I use a solution without a proxy server for this. I modified the Selenium source code according to the link below in order to add an executePhantomJS function:

https://github.com/SeleniumHQ/selenium/pull/2331/files
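If you prefer not to patch Selenium, a workaround I have seen (which, as far as I can tell, hits the same GhostDriver endpoint that the pull request wraps) is to register the command on the driver at runtime:

from selenium import webdriver

driver = webdriver.PhantomJS()

# register GhostDriver's execute endpoint on this driver instance
driver.command_executor._commands['executePhantomScript'] = (
    'POST', '/session/$sessionId/phantom/execute'
)

def execute_phantomjs(script, args=None):
    return driver.execute('executePhantomScript',
                          {'script': script, 'args': args or []})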

Then I execute the following script after getting the PhantomJS driver:

from selenium.webdriver import PhantomJS

driver = PhantomJS()

# inside PhantomJS, 'this' is the current page object; hook its resource events
script = """
    var page = this;
    page.onResourceRequested = function (req) {
        console.log('requested: ' + JSON.stringify(req, undefined, 4));
    };
    page.onResourceReceived = function (res) {
        console.log('received: ' + JSON.stringify(res, undefined, 4));
    };
"""

driver.execute_phantomjs(script)  # the method added by the patch above
driver.get("http://ariya.github.com/js/random/")
driver.quit()

Then all the requests are logged to the console output, which usually ends up in the ghostdriver.log file.
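If you want the log somewhere predictable, the location can be set when the driver is created (a sketch; service_log_path is the standard Selenium argument for the PhantomJS service log):

from selenium.webdriver import PhantomJS

# write GhostDriver's console output (including the captured requests) to a known file
driver = PhantomJS(service_log_path='/tmp/ghostdriver.log')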


If anyone here is looking for a pure Selenium/Python solution, the following snippet might help. It uses Chrome's performance log to record all requests and, as an example, prints the URL and body of every JSON response.

import json
from time import sleep

from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

# make chrome log requests
capabilities = DesiredCapabilities.CHROME

capabilities["loggingPrefs"] = {"performance": "ALL"}  # chromedriver < ~75
# capabilities["goog:loggingPrefs"] = {"performance": "ALL"}  # chromedriver 75+

driver = webdriver.Chrome(
    desired_capabilities=capabilities, executable_path="./chromedriver"
)

# fetch a site that does xhr requests
driver.get("https://sitewithajaxorsomething.com")
sleep(5)  # wait for the requests to take place

# extract requests from logs
logs_raw = driver.get_log("performance")
logs = [json.loads(lr["message"])["message"] for lr in logs_raw]

def log_filter(log_):
    return (
        # is an actual response
        log_["method"] == "Network.responseReceived"
        # and json
        and "json" in log_["params"]["response"]["mimeType"]
    )

for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    print(f"Caught {resp_url}")
    print(driver.execute_cdp_cmd("Network.getResponseBody", {"requestId": request_id}))
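One caveat: Network.getResponseBody can fail for entries whose body the browser no longer holds (redirects, evicted cache entries, and so on); in my experience Selenium surfaces that as a WebDriverException, so it may be worth guarding the call. A sketch under the same setup as above:

from selenium.common.exceptions import WebDriverException

for log in filter(log_filter, logs):
    request_id = log["params"]["requestId"]
    resp_url = log["params"]["response"]["url"]
    print(f"Caught {resp_url}")
    try:
        body = driver.execute_cdp_cmd(
            "Network.getResponseBody", {"requestId": request_id}
        )
        print(body["body"])
    except WebDriverException:
        print("  (response body no longer available)")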