Do you know that while you are browsing a website, you are mostly tracked by third parties who compile detailed records of your browsing behaviors and of course without your consent? In any case, even the domain owner is unaware of this practice.

Third-party data collection creates a variety of risks. Identifiable details of the person or company can be sold, often for illegal purposes such as fraud or phishing attempts.

webXray is a tool for analyzing third-party content on web pages and identifying the companies which collect user data. This professional tool is designed for academic research, but it can also be used by site administrators as well as people who are usually curious about hidden data streams on the Web.

WHAT YOU WILL NEED

To follow this tutorial, you will need to have Google Chrome installed on your machine along with Chromedriver, Python3 (Version >= 3.4) and Selenium.


INSTALLATION ON UBUNTU/DEBIAN OR OTHER DERIVED DISTRIBUTIONS

Install Chromedriver

Chromedriver allows other programs to control Chrome. To download and install chromedriver, you will need first to execute the below command to find out what is the Google Chrome version you are currently using.

google-chrome --version

The output should look like "Google Chrome 79.0.3945.117". You can now open a web browser and go to https://sites.google.com/a/chromium.org/chromedriver/downloads to grab and copy the URL of the appropriate version of Chromedriver associated to your version of Google Chrome (In our case ChromeDriver 79.0.3945.36).

How to use WebXray to Identify Domains collecting User Data

How to use WebXray to Identify Domains collecting User Data

Once you got the download link of the chromedriver needed by your version of Google Chrome, open a terminal and run the following commands.

cd /tmp/

# Replace the below URL by the one you just grab
wget https://chromedriver.storage.googleapis.com/79.0.3945.36/chromedriver_linux64.zip
unzip chromedriver_linux64.zip

# Need root password
sudo mv chromedriver /usr/bin/

Now run the following command to make sure chromedriver is installed. If you get an error try the above steps again or search the web for advice.

chromedriver --version

Install pip3 and Selenium

Pip is a de facto standard package-management system used to install and manage software packages written in Python. By default pip3 is not installed in Linux distribution. To install it simply run the following commands.

sudo apt install python3-pip

Selenium WebDriver is the most popular tools for Web Automation that can execute automatic actions performed in a web browser window like navigating to a website, filling forms that include dealing with text boxes, radio buttons, and dropdowns, submitting the forms, browsing through web pages, handling pop-ups and so on.

sudo pip3 install selenium

If everything has been done properly, you are now ready to download and use "webXray".


INSTALLATION AND FIRST RUN

To get started you will need first to clone the webXray repository from Github in your computer using the below command.

git clone https://github.com/timlib/webxray
cd webxray && chmod +x run_webxray.py

You are now ready to start using webXray. Before you begin, you must choose between the two audit modes offered by the tool.

Interactive Mode

The interactive mode consists of creating an SQLite database that will be used to store the information transmitted by webXray and analyze it. This mode is very easy to use since you will simply have to execute the below command below and answer the questions displayed on the screen. The advantage of this mode is that you can set up a list of several sites to scan then go to do something else while webXray is scanning your list.

python3 run_webxray.py

For the purpose of this tutorial, we created a file called "custom-list.txt" into the "page_lists" directory of webXray and in which we included 4 URLs to scan.

How to use WebXray to Identify Domains collecting User Data

To get all the third-party elements possible, webXray waits for 45 seconds after loading a page. You can edit this value to make it longer or shorter by changing the line "browser_wait = 45" in "run_webxray.py". When the script has finished scanning your sites list, you can run the "[A] Analyze Data" option to get your stored data analyzed and the reports to be saved.

How to use WebXray to Identify Domains collecting User Data

Once it is done you will find in the "reports" folder a collection of CSV files as per the below details.

How to use WebXray to Identify Domains collecting User Data

Manual Mode

The manual mode can be used to scan a single URL and it will return all the data directly in your terminal.

python3 run_webxray.py -s https://neoslab.com

How to use WebXray to Identify Domains collecting User Data

If you are having problems installing the software or find bugs, please open an issue on GitHub. If you have advanced needs and require assistance, please visit webXray website. This tool is developed, distributed, and hosted by Tim Libert.