Volunteer populations, non-volunteer meetings, and associated risks for each group
Axes: Risk, Informed Consent, Incentives
Users:
VPN-> lower risk, low awareness. Issues-> hosting providers-> representativeness of measurements. We don't know what the coverage is.
Raspberry Pi-> high risk, high awareness
BISmark-> GT home routers
My Speed Test-> GT Android deployment
University VMs-> PlanetLab, GENI, etc. Somewhat risky, somewhat aware. Problem-> representativeness on academic networks. Risks-> government going after university PI-> professors get fired
More open questions: What are the options for doing this? Can we trust users to assess risk? How do we assess risk? User risk vs. risk to the platform developer. How does the size of the platform affect risk?
Informed consent-> give users info from several sources
Changing risk: Context dependent. Magnitude of risk and probability. How many others are there?
Incentives:
Societal good/altruism
Circumvention-> use research to make circumvention better. Improve operational security as well. Can also reduce risk.
Ease of participation-> is the software usable? Improve operational security as well.
How the data will be used-> in contrast to other circumvention tools
Immediate steps:
Could partition off experiments into different risk levels-> different populations for different experiments.
Existing tools-> think of their users as low awareness, so give them low-risk experiments. Users have given limited informed consent.
Could we tell users the anonymity we provide?
Probability of being identified by running our tool
Probability of being identified from data after measurements
Informed consent database separate from measurements.
People will likely care about this-> they don't want their government reading their emails, but don't care if other people access their email. If Google gets their data, it doesn't matter, because in the worst case they get targeted ads.
Security of the data, as well as the state it is stored in, are important.
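One way to picture partitioning experiments by risk level, together with keeping the informed-consent database separate from measurements, is the small Python sketch here. Everything in it is hypothetical: the `RiskLevel` scale, the experiment names, and the `Stores` class are illustrative assumptions, not part of any existing tool. Each participant is offered only experiments at or below the risk level they consented to, and measurement rows carry no participant identifier.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class RiskLevel(IntEnum):
    """Hypothetical ordering of experiment risk; higher means riskier."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Experiment:
    name: str
    risk: RiskLevel


@dataclass
class Participant:
    participant_id: str
    # Highest risk level this participant has given informed consent to.
    consented_risk: RiskLevel


def eligible_experiments(participant, experiments):
    """Return only the experiments at or below the participant's consented risk."""
    return [e for e in experiments if e.risk <= participant.consented_risk]


@dataclass
class Stores:
    # Consent records and measurement results live in separate stores;
    # measurement rows deliberately carry no participant identifier.
    consent: dict = field(default_factory=dict)
    measurements: list = field(default_factory=list)

    def record_consent(self, participant):
        self.consent[participant.participant_id] = participant.consented_risk

    def record_measurement(self, experiment, result):
        self.measurements.append({"experiment": experiment.name, "result": result})
```

Users recruited through existing tools could default to `RiskLevel.LOW`, matching the low-awareness assumption. Keeping the two stores apart means a leak of the measurement data alone does not reveal who consented, though it also means individual results cannot later be traced back and withdrawn.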
What if we decentralize informed consent to reduce likelihood of corruption?
Research questions and data risk
Collection: Who do we endanger? Users, neighbors, whoever owns the collector…?
Goal: Build a taxonomy of risks
Sensitive: location of the test operator in the network; details of the machine used by the test operator; demographics of operators
The thing is... Rule #1 in Internet measurement: “Collect it all.” E.g., Sam has a dataset of sensitive info, and yeah, it worries him.
Mapping out the authority space: Who can judge the value decisions, and are they competent to do it? Conflicting/contradicting authorities. E.g., legal risk to test operators in Pakistan -- who is supposed to assess it?
IRB approval for Sam’s program -- even if it sanitizes user data before publishing?
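A minimal sketch of what sanitizing user data before publishing might look like, assuming measurement records are flat dictionaries. The field names (`operator_location`, `machine_details`, `operator_demographics`, `client_ip`) are hypothetical stand-ins for the sensitive categories listed earlier, and truncating addresses to a /24 (IPv4) or /48 (IPv6) prefix is one common coarsening choice, not a method taken from Sam's program.

```python
import ipaddress

# Hypothetical field names standing in for the sensitive categories:
# operator location, machine details, operator demographics.
SENSITIVE_FIELDS = {"operator_location", "machine_details", "operator_demographics"}


def coarsen_ip(ip):
    """Truncate an address to its /24 (IPv4) or /48 (IPv6) network."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def sanitize(record):
    """Return a copy of a measurement record with sensitive fields dropped
    and the client address coarsened; the input record is not modified."""
    clean = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    if "client_ip" in clean:
        clean["client_ip"] = coarsen_ip(clean["client_ip"])
    return clean
```

A coarsened prefix can still point to a small ISP or a single institution, so a transformation like this reduces, but does not remove, the risk to test operators.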
Things to look at: Network architecture. Who collects what? Many tests look for many different answers. Relevant authority? Sam would like to delegate this question to... someone! Exercise: Try to get consensus…
One simple test that can serve as a tangible example: HTTP Header Insertion test.
What output do we upload? One bit -- yes or no? User IP as well? Headers that were added in flight? Maybe these include PII (e.g., “X-your-IP-Is”)? There is much meta-risk involved in being a test operator (who's running these tests, anyway?) -- volunteers, all Chrome users…? Who operates the web servers? Can an in-country adversary make a list of Tor users who access the site?
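To make the upload question concrete, here is a hypothetical sketch of the reporting side of an HTTP Header Insertion test. It assumes a cooperating measurement server that echoes back the request headers it received; `inserted_headers`, `build_report`, and the `detail` levels are illustrative names, not an existing tool's API. The client compares what it sent with what arrived and uploads either one bit or, at most, the names of headers added in flight. Header values are never included, since a header such as “X-your-IP-Is” may itself carry PII.

```python
def inserted_headers(sent, received):
    """Names of headers seen by the server that the client did not send.

    HTTP header names are case-insensitive, so names are compared lowercased.
    """
    sent_names = {name.lower() for name in sent}
    return sorted({name.lower() for name in received} - sent_names)


def build_report(sent, received, detail="bit"):
    """Build the record to upload; header values are never included.

    detail="bit"   -> only whether any header was inserted
    detail="names" -> also the (lowercased) names of inserted headers
    """
    added = inserted_headers(sent, received)
    if detail == "bit":
        return {"inserted": bool(added)}
    if detail == "names":
        return {"inserted": bool(added), "header_names": added}
    raise ValueError(f"unknown detail level: {detail!r}")
```

Choosing the default `detail` level is exactly the trade-off raised above: one bit is safest to upload, while header names help diagnose which middlebox is responsible.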
It appears we need a social solution -- a critical mass of “normal” people running the tests, making it difficult to profile test runners as “X”.