Proxy Detection

There are many use cases for proxy servers other than anonymous web browsing. Some common examples are:

Caching Proxies
Proxies with Firewall & Filtering Purposes (For example proxies that remove JavaScript from websites)
Content-Control Proxies (Often used in schools to filter illicit traffic)
Reverse Proxies used on servers, for example for:

Load Balancing
Serving cached static content
Compression
TLS acceleration

Translation Proxies
Proxies used for NAT purposes

The detection engine is designed to detect only those proxy configurations that are used for anonymization purposes.

Furthermore, a proxy server can be located anywhere on the path from client to server. For example, a proxy server may run locally on the same host as the client. Alternatively, the proxy server can sit inside the local network of an organization. In large organizations, employees often do not access the Internet directly, but through an Internet-facing proxy.

With anonymous proxies, by contrast, the point is to camouflage the source IP address of the client. Therefore, it doesn't make sense to use a proxy in the local network (or close to the local network) if the goal is to remain anonymous.

Setting Up a SOCKS5 Proxy Server

This section shows how a SOCKS5 proxy server can be set up in a few easy steps. The purpose is to show how easy it is to create a dedicated proxy infrastructure from scratch. The open source project Simple Socks Server is used. It lets you create a simple SOCKS5 server and exposes additional SOCKS5 proxy events.

You will need a recent installation of Node.js and npm. Simple Socks Server can be installed with the following command:

$ npm install simple-socks

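Then create a JavaScript file named proxy.js with the following contents (the file uses ES module syntax, so depending on your Node.js version the surrounding package.json may need "type": "module"):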

import socks5 from 'simple-socks'

const server = socks5.createServer().listen(1080);

// When a request arrives for a remote destination
server.on('proxyConnect', (info, destination) => {
  console.log('connected to remote server at %s:%d', info.address, info.port);

  destination.on('data', (data) => {
    console.log(data.length);
  });
});

// When data arrives from the remote connection
server.on('proxyData', (data) => {
  console.log(data.length);
});

// When an error occurs connecting to remote destination
server.on('proxyError', (err) => {
  console.error('unable to connect to remote server');
  console.error(err);
});

// When a request for a remote destination ends
server.on('proxyDisconnect', (originInfo, destinationInfo, hadError) => {
  console.log(
    'client %s:%d request has disconnected from remote server at %s:%d with %serror',
    originInfo.address,
    originInfo.port,
    destinationInfo.address,
    destinationInfo.port,
    hadError ? '' : 'no ');
});

// When a proxy connection ends
server.on('proxyEnd', (response, args) => {
  console.log('socket closed with code %d', response);
  console.log(args);
});

In order to start your proxy server, enter the following command in a terminal:

$ node proxy.js

And in a separate terminal, you can use your local proxy server with curl as follows:

$ curl proxydetect.live --socks5 127.0.0.1:1080

This curl request is now tunneled through the proxy. You can of course also use your proxy server with your favorite browser. If you are using Google Chrome, start Chrome as follows:

$ google-chrome --proxy-server="socks5://127.0.0.1:1080" --user-data-dir="/tmp/" "proxydetect.live"

Proxy Detection Methods

It is crucial to understand that the presence of a proxied connection can be detected on different levels of the network stack. Combining independent tests gives a proxy score with high confidence.
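As a rough illustration of how independent test results could be combined into one score, the sketch below uses a simple weighted average. The function name, the test names and the weights are all hypothetical; nothing here reflects how the actual detection engine computes its score.

```javascript
// Hypothetical sketch: each test reports a score in [0, 1] (1 = proxy suspected)
// together with an illustrative weight. The result is the weighted average.
function combineProxyScore(results) {
  let weighted = 0;
  let totalWeight = 0;
  for (const { score, weight } of results) {
    weighted += score * weight;
    totalWeight += weight;
  }
  // Without any test results there is no evidence for a proxy
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Made-up example values, not real measurements
const exampleScore = combineProxyScore([
  { name: 'webrtc', score: 1, weight: 3 },
  { name: 'datacenter', score: 1, weight: 2 },
  { name: 'latency', score: 0, weight: 1 },
]);
console.log(exampleScore.toFixed(2)); // prints "0.83"
```

A weighted average is only one possible design; it keeps a single noisy test from dominating the result.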

HTTP Headers

Some HTTP proxy servers forward the real IP address of the client. The real client IP address is passed as part of the HTTP headers when the proxy server communicates with the destination server. Even though the name of the HTTP header can be anything, some common HTTP header names that carry the client IP address are X-Forwarded-For (exposed as HTTP_X_FORWARDED_FOR in CGI-style environments), Forwarded and X-Real-IP. The Via header does not contain the client IP address, but its presence reveals that a proxy handled the request.
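On the server side, such a check could look roughly like the sketch below. It assumes a plain object of lowercase header names, as Node.js provides in req.headers. The helper names and the header list are illustrative choices, not part of any library, and the list is not exhaustive.

```javascript
// Header names that commonly indicate a forwarding proxy (lowercase).
// Illustrative selection only.
const PROXY_HEADERS = ['x-forwarded-for', 'forwarded', 'x-real-ip', 'via'];

// Return the subset of proxy-related headers present in the request
function findProxyHeaders(headers) {
  const found = {};
  for (const name of PROXY_HEADERS) {
    if (headers[name] !== undefined) {
      found[name] = headers[name];
    }
  }
  return found;
}

// X-Forwarded-For is a comma-separated list. The left-most entry is the
// address the first proxy claims to have seen. The value is controlled by
// the sender and can be spoofed, so treat it as a hint only.
function firstForwardedAddress(headers) {
  const value = headers['x-forwarded-for'];
  if (typeof value !== 'string' || value.trim() === '') return null;
  return value.split(',')[0].trim();
}

// Example with made-up header values
const sampleHeaders = {
  host: 'example.com',
  'x-forwarded-for': '203.0.113.7, 10.0.0.1',
  via: '1.1 proxy.example',
};
console.log(findProxyHeaders(sampleHeaders));
console.log(firstForwardedAddress(sampleHeaders));
```

A request without any of these headers can still be proxied, since anonymizing proxies usually strip them.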

WebRTC

WebRTC is a set of standards that adds real-time communication capabilities, such as video, voice and generic data transmission, to web applications. It lets peers exchange data directly, without needing an intermediary (such as a server).

One of the protocols in WebRTC is called STUN which stands for Session Traversal Utilities for NAT. According to Wikipedia, STUN provides the following:

STUN provides a tool for hosts to discover the presence of a network address translator, and to discover the mapped, usually public, Internet Protocol (IP) address and port number that the NAT has allocated for the application's User Datagram Protocol (UDP) flows to remote hosts.

STUN and the related TURN protocol typically use the User Datagram Protocol (UDP). WebRTC can be used to detect proxies for two reasons:

Most browsers configured with proxies send only TCP traffic over the proxy, whereas RTCPeerConnection uses UDP by default.
RTCPeerConnection makes it possible to obtain the public IP address of your device over this protocol.

The JavaScript example below shows how to use WebRTC to display your public IP address:

function leakWebRTC() {
  // Fall back to vendor-prefixed constructors in older browsers
  var RTCPeerConnection = window.RTCPeerConnection
    || window.mozRTCPeerConnection
    || window.webkitRTCPeerConnection;
  // Public STUN server that reports the address it sees for this client
  var servers = { iceServers: [{ urls: "stun:stun.l.google.com:19302" }] };
  // Construct a new RTCPeerConnection
  var pc = new RTCPeerConnection(servers);
  // Listen for candidate events
  pc.onicecandidate = function (ice) {
    // Skip non-candidate events (a null candidate marks the end of gathering)
    if (ice.candidate) {
      // Candidates of type "srflx" carry the public address seen by the STUN server
      console.log(ice.candidate.type, ice.candidate.address, ice.candidate.port);
    }
  };
  // Create a bogus data channel so there is something to negotiate
  pc.createDataChannel("");
  // Create an offer SDP; setting it as local description triggers the STUN request
  pc.createOffer()
    .then(function (offer) { return pc.setLocalDescription(offer); })
    .catch(function (err) { console.error(err); });
}
leakWebRTC();

IP Blocklists

The detection engine from proxydetect.live does not use IP blocklists. IP blocklists are lists of IP addresses, compiled from various sources, whose maintainers claim that the listed addresses were observed to be proxies or VPNs. Examples of such lists:

FireHOL cybercrime feeds
lists_vpn is a list of common VPN providers
bad-asn-list is an open source list of ASNs known to belong to cloud, managed hosting, and colo facilities.

It is easy to see why blocklists are not reliable. For example, if someone sets up a WireGuard VPN server in their home network and uses the VPN privately when working remotely, how would any contributor to such a blocklist know that this residential IP address is hosting a VPN server?

Network Behavior

The networking behavior of a browser configured with a proxy differs from that of a browser without one. The reason is simple: from the network's point of view, the browser appears to be situated on the proxy server. For example, if the proxy server does not support connections on all TCP ports, the browser's connection attempts to those ports will only fail after a certain delay. Furthermore, the standard error page for an HTTP request to a location that does not exist is different if a proxy is configured. These discrepancies can be used to infer proxy usage.

Datacenter Check

The cheapest way for criminals to use proxies is to host them on a cloud instance. Cloud computing can be purchased from large cloud providers such as Amazon AWS, Microsoft Azure or Google Cloud. However, the IP address ranges of large cloud providers are known. If a client has a datacenter / cloud IP address, it is a reason to be concerned. End users usually do not access the Internet through a cloud / hosting provider.
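A datacenter check boils down to testing whether an address falls into a known range. The sketch below does this for IPv4 and CIDR notation. The ranges are placeholders taken from the documentation blocks of RFC 5737, not real cloud provider ranges, and all function names are made up for illustration.

```javascript
// Convert a dotted IPv4 string to an unsigned 32-bit integer
function ipv4ToInt(ip) {
  const parts = ip.split('.').map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p > 255)) {
    throw new Error('invalid IPv4 address: ' + ip);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// Check whether an IPv4 address lies inside a CIDR range such as "198.51.100.0/24"
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  // A /0 prefix needs special handling: shift counts are taken modulo 32 in JavaScript
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Placeholder ranges (RFC 5737 documentation blocks), NOT real cloud ranges
const DATACENTER_RANGES = ['198.51.100.0/24', '203.0.113.0/25'];

function isDatacenterIp(ip) {
  return DATACENTER_RANGES.some((cidr) => inCidr(ip, cidr));
}

console.log(isDatacenterIp('198.51.100.42')); // true
console.log(isDatacenterIp('203.0.113.200')); // false, outside the /25
```

In practice the range list would be loaded from the address ranges that the providers publish, and IPv6 would need separate handling.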

Response Time

Proxies and VPNs often exhibit considerable "lag". This large latency can be measured. Of course, large response times are not only a consequence of using proxies. Therefore, the detection engine picks up on differential aspects of response time measurements.

Since the Internet is implemented according to a packet-switched architecture and IP packets (even in the same network session) can take different routes, round trip times are often not reliable if the sample size is not large enough.

Furthermore, intermediate hops in the Internet might cause routing/processing delays of IP packets. Therefore, it is never clear whether delays are caused by proxies or other side-effects of IP routing. However, it is possible to reasonably cancel out this noise by making repeated network connections.
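One simple way to reduce such noise is to collect several round trip time samples and use a robust statistic such as the median instead of a single measurement. The sketch below shows only the aggregation step. The sample values are invented, and the choice of the median is an assumption made for illustration, not a description of the detection engine.

```javascript
// Median of a list of numbers; robust against a few extreme outliers
function median(samples) {
  if (samples.length === 0) {
    throw new Error('no samples');
  }
  // Copy before sorting so the caller's array is left untouched
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical round trip times in milliseconds; 310 is a routing outlier
const rttSamples = [42, 40, 310, 41, 43];
console.log(median(rttSamples)); // prints 42
```

The arithmetic mean of the same samples would be 95.2 ms, which shows how strongly a single delayed packet can distort an average.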

OS Fingerprinting

Based on network characteristics, the likely operating system of a client can be inferred. Different layers of the OSI model yield different OS fingerprints. For instance, all major browsers send the User-Agent as part of the HTTP headers. More recently, Chrome introduced Sec-CH-UA-* headers for brand identification. For example, the Sec-CH-UA-Platform header exposes the platform or operating system on which the user agent is running.

How can OS fingerprinting be used to detect proxies? The idea is to find a "lie" between two independently derived OS fingerprints from different layers of the network stack. For example, one could compare the operating system derived from the User-Agent with the OS derived from the TCP congestion control algorithm.

A mismatch between the two fingerprints suggests an intermediate proxy server: when a proxy terminates the TCP connection, the TCP-level fingerprint reflects the operating system of the proxy, while the HTTP headers still describe the client.
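The comparison itself can be sketched in a few lines. The example below maps a User-Agent string to a coarse kernel family and compares it with a TCP-level result. The TCP-level value is assumed to come from a separate fingerprinting component that is not shown, and the function names and family labels are hypothetical.

```javascript
// Very rough User-Agent classification into kernel families.
// Android is grouped with Linux and iOS with macOS because they share
// a kernel and therefore a similar TCP/IP stack.
function kernelFamilyFromUserAgent(userAgent) {
  if (/Windows NT/.test(userAgent)) return 'windows';
  if (/iPhone|iPad|Mac OS X/.test(userAgent)) return 'darwin';
  if (/Android|Linux/.test(userAgent)) return 'linux';
  return null; // unknown
}

// tcpFamily is assumed to be 'windows', 'darwin', 'linux' or null and to be
// provided by a TCP/IP fingerprinting component outside this sketch.
function hasOsMismatch(userAgent, tcpFamily) {
  const uaFamily = kernelFamilyFromUserAgent(userAgent);
  // Without two known values there is nothing to compare
  if (uaFamily === null || tcpFamily === null) return false;
  return uaFamily !== tcpFamily;
}

const windowsChrome =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
console.log(hasOsMismatch(windowsChrome, 'linux'));   // true
console.log(hasOsMismatch(windowsChrome, 'windows')); // false
```

A mismatch is only a hint, since User-Agent strings can be changed by the user.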