Re: [rssac-caucus] Handing the anonymization document off to RSSAC

April 10, 2018

Ray Bellis wrote:
...
> On 09/04/2018 21:42, Paul Vixie wrote:
...
>> anonymizing at /48 for v6 and /24 for v4 isn't enough. even the
>> least capable data scientist, using data that's less than a millionth
>> of the other data in google's or facebook's or cambridge analytica's
>> possession, can _trivially_ deanonymize that.
>
> Couldn't that same data scientist also reverse anything that maintains a
> 1:1 relationship between input and output?

this isn't 1:1, so i don't understand your question.

...
> You appear to be shifting the goal posts.  The document doesn't mention
> safety.

i apologize-- i was not viewing your comments in the context of the
document.

...
> The entire documented rationale for the entire RSSAC study and therefore
> anonymization seems to be this single sentence in the Introduction:
...
>> Some operators are uncomfortable sharing IP addresses of the query
>> sources and some are even legally prevented from doing so.
>
> GDPR seems to be the main driver for this right now.  I'm (currently)
> satisfied that pseudonymization of IP addresses by truncation satisfies
> any obligations we might have there.

i think there's a crypto-lite proposal that preserves identity of 
endpoints but destroys their associativity. if true, this is likely to 
do a better job of assuaging GDPR and similar concerns.
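
One way to picture the property described above (each endpoint keeps a stable identity, but nothing links it to its neighbours) is a keyed hash of the full address. This is only a sketch: the proposal referred to is not specified in this thread, so HMAC-SHA256 is used here as a stand-in, and the function name `pseudonymize` and the key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize(addr: str, key: bytes) -> str:
    """Map an IP address to a stable pseudonym using a keyed hash.

    Illustrative stand-in only, not the proposal mentioned in the thread.
    """
    # Use the packed (binary) form so v4 and v6 are handled uniformly.
    packed = ipaddress.ip_address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Keep 64 bits of the digest; the length is an arbitrary choice here.
    return digest[:8].hex()


# Hypothetical secret held only by the operator doing the anonymization.
key = b"hypothetical-secret-held-by-the-operator"

a = pseudonymize("192.0.2.10", key)  # same endpoint ...
b = pseudonymize("192.0.2.10", key)  # ... always gets the same pseudonym
c = pseudonymize("192.0.2.11", key)  # a neighbour gets an unrelated one
```

Under these assumptions, repeated queries from one endpoint stay linkable to each other, while two addresses in the same /24 share no visible relationship. The key has to stay secret: the IPv4 space is small enough that anyone holding the key could enumerate it and rebuild the mapping.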

some of my sensor operators just always report 0.0.0.0 for the source 
address. the document ought to mention this as an extreme example of 
privacy preservation. and for my dayjob's purposes, this doesn't matter. 
but for most forms of dns research, it's important to know that the same 
endpoint re-asked the same question, or asked several questions. in that 
sense, prefix masking discards more information than the crypto-lite 
proposal i saw. this means there is a tradeoff between the 
deanonymization risk of exposing that a dns transaction and some non-dns 
activity came from the same network, vs. knowing that two or more dns 
transactions came from the same endpoint.
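
The prefix-masking side of this tradeoff can be sketched with Python's standard `ipaddress` module, using the /24 and /48 lengths mentioned earlier in the thread. The helper name `mask_prefix` and its parameters are assumptions for illustration; a prefix length of 0 reproduces the always-0.0.0.0 case.

```python
import ipaddress


def mask_prefix(addr: str, v4_len: int = 24, v6_len: int = 48) -> str:
    """Zero the host bits of an address, keeping only the leading prefix."""
    ip = ipaddress.ip_address(addr)
    prefix_len = v4_len if ip.version == 4 else v6_len
    # strict=False lets us pass a host address rather than a network address.
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(net.network_address)


print(mask_prefix("192.0.2.10"))             # 192.0.2.0
print(mask_prefix("192.0.2.77"))             # 192.0.2.0 (same as above)
print(mask_prefix("2001:db8:1234:5678::1"))  # 2001:db8:1234::
print(mask_prefix("192.0.2.10", v4_len=0))   # 0.0.0.0
```

The first two calls show the information loss: two distinct endpoints in the same /24 become indistinguishable, so a repeated question from one of them can no longer be told apart from a question from the other, while the surviving prefix still says which network the traffic came from.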

the context of my remarks is the giant pendulum of history, which swung 
too far in the direction of "let google and facebook run giant vacuums 
and hope they and their customers behave ethically, even though there is 
no transparency", and is now swinging in the direction of "if the user 
or operator doesn't believe the collection is in their best interests, 
and if they have not given verifiable and revocable permission for it 
to be collected, and if the laws of the land don't support it, then 
assume it's bad and prohibit it blanket-wise."

the right answer is somewhere in the middle of those pendulum swings. 
here in rssac-caucus we can afford to consider what's right rather than 
only what's practical or inevitable. i hope we take that opportunity.

-- 
P Vixie