HTTP Inspection

From wiki.measurementlab.net

HTTP request line path blocking

HTTP header alteration

HTTP host field blocking

HTTP header manipulation

URL-path-based HTTP response modification

HTTP Block pages


HTTP Headers

Variety of tests:

HTTP Header Manipulation: The client contacts a control backend that encapsulates the request headers it received in the response body, which gives a control result. Field names and content are sent in alternating capitalization; if they do not arrive that way, there may be a proxy on the path.

Invalid HTTP Request Lines: The request method name is some random string, and the version number may be invalid. The goal is to trigger errors in a transparent proxy device and maybe get a version number; the test may also find content blocking.

Response Comparison: Make an HTTP request once over Tor and once in country, then compare the results. Open question: are we sure that the Tor exit node is not subject to its own interference, e.g. an exit node operator who did not opt out of the UK porn filter?

HTTP Header Manipulation OONI Tests:
- Bidirectional test between clients and a single server: both sides check the expected headers to see if there have been any significant changes.
- Comparison of a page fetched over Tor vs. over the local connection.

Further Possible Improvements (graduate level of test, compare results):
- Send and receive (POST/GET) known body data; see if it changes from one end to the other.
  - Difficulty: differentiating injection from legitimate customizations enacted by servers, e.g. i18n.
  - Solution: identify relatively static sites and rely on those.
- Make this kind of test work with arbitrary websites, based on distributed data, pre-existing databases, or "known good" pathways.
- Integrate the server into Cloudflare; look for signatures.
- Vary IPs and locations of servers; run the test multiple times; have all servers check the incoming client headers as well (then synchronize the data somehow).
- Integrate the test into a browser; use that integration to make the request look like a standard browser request.
- Consider running the test as an automated tool as well, to check for differences.
- Test with cookies and without cookies.
- Run the test over SSL and without SSL.
- Use known, neutral headers from external sites, rather than custom headers, to avoid identifying small numbers of users in particular regions.
- Work with a large, neutral company to forward steganographically hidden probe requests to M-Lab.
- Use session cookies, which are very high entropy, to embed information.

Additional tests in the OONI suite: Requesting live websites from multiple external sites; comparing bodies.

Assumptions and Problems:
- Concerns about liability for published data and for the tests themselves.
- Dangerous for users to be identified in data, or to be identified as contacting different sites.
- Headers may be unique enough to be identifiable as related to measurement; this may allow authorities to target experimental users, or to selectively not modify these headers.
- Assumption that the IP address of the server is static.

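For the invalid request line test, a probe might be built roughly as follows. This is a sketch under assumptions: `build_invalid_request_line` is a hypothetical helper, and the choice of random letters for the version field is just one way of making it invalid. The bytes would have to be written to a raw TCP socket, since ordinary HTTP client libraries generally refuse to send a malformed request line.

```python
import random
import string

# Methods a proxy is likely to recognize; the probe avoids these on purpose.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "CONNECT",
                 "OPTIONS", "TRACE", "PATCH"}


def build_invalid_request_line(path="/", seed=None, bad_version=True):
    """Build a request with a random method name and, optionally, a bad version.

    Returns bytes ready to be written to a socket. The seed parameter only
    exists to make the output reproducible in tests.
    """
    rng = random.Random(seed)
    method = "GET"
    while method in KNOWN_METHODS:
        length = rng.randint(4, 10)
        method = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
    if bad_version:
        # A valid version looks like HTTP/1.1; letters here are never valid.
        version = "HTTP/" + "".join(rng.choice(string.ascii_letters) for _ in range(3))
    else:
        version = "HTTP/1.1"
    return f"{method} {path} {version}\r\n\r\n".encode("ascii")


if __name__ == "__main__":
    print(build_invalid_request_line(seed=1))
```

An error page returned for such a request, especially one that differs from what the origin server would send, may reveal the proxy product or its version.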

Web Content Filtering

Pertinent forms of interference identified: HTTP block pages, DNS injection and transparent proxies.

Detection of blocking through comparing HTML:
- Fetch from two locations and compare the pages through means such as term frequency vector analysis.
- Build a tree of tags and compare DOM structure.
- Length of page is a good enough predictor.

Indirectly detecting through aggregate samples: Do thousands of tests and look for duplicates that may be block pages. The title of the pages might be good enough; alternatively, render whole pages to evaluate similarity, similar to anti-phishing efforts.

Outstanding questions:
- What can adversaries do to defeat these?
- Can we get the block page companies to add a header or XML tag to indicate that it’s a block page?
- Which do we care more about, precision or recall (avoiding false positives or avoiding false negatives)?
- What about regimes that allow HTML but block images? That’s content manipulation, not filtering.
- What about captive portals (related issue, necessary to solve)? Use a control page to test for a captive portal.
- What if the network is down? Step zero would be to run a baseline network test: ping, DNS, keyword filtering. Baseline tests need to be more innocuous than the themes of the measurement. Reporting network failure is worthwhile too (e.g. if Egypt unplugs the network, we want all measurements to report it).

DNS injection could respond with:
- the IP of the block page
- NXDOMAIN (or other failures such as SERVFAIL)
- a random IP address (China has done so historically but stopped to reduce collateral damage)
- etc.

How to detect two responses is obvious.
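The page comparison idea above (term frequency vectors plus page length) could look something like this. It is a minimal sketch: the function names and both thresholds are assumptions chosen for illustration, and a real deployment would need tuning against known block pages. Tags are stripped with a simple regular expression rather than a full HTML parser.

```python
import math
import re
from collections import Counter


def term_vector(html):
    """Strip tags crudely and count lower-cased alphanumeric terms."""
    text = re.sub(r"<[^>]+>", " ", html)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def length_ratio(x, y):
    """Shorter length divided by longer length; 1.0 means equal length."""
    longer = max(len(x), len(y))
    return 1.0 if longer == 0 else min(len(x), len(y)) / longer


def compare_pages(control_html, local_html, sim_threshold=0.5, len_threshold=0.5):
    """Score two fetches of the same URL and flag a likely block page.

    The thresholds are hypothetical defaults, not measured values.
    """
    sim = cosine_similarity(term_vector(control_html), term_vector(local_html))
    ratio = length_ratio(control_html, local_html)
    return {
        "similarity": sim,
        "length_ratio": ratio,
        "suspicious": sim < sim_threshold or ratio < len_threshold,
    }


if __name__ == "__main__":
    control = "<html><body><h1>News</h1><p>Weather report and sports news</p></body></html>"
    local = "<html><body>Access denied</body></html>"
    print(compare_pages(control, local))
```

A flag from this comparison says only that the two fetches differ substantially; legitimate localization or dynamic content can trigger it as well.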