Ray Bellis wrote:
On 09/04/2018 21:42, Paul Vixie wrote:
anonymizing at /48 for v6 and /24 for v4 isn't enough. even the least capable data scientist, using data that's less than a millionth of the other data in google's or facebook's or cambridge analytica's possession, can _trivially_ deanonymize that.
Couldn't that same data scientist also reverse anything that maintains a 1:1 relationship between input and output?
this isn't 1:1, so i don't understand your question.
You appear to be shifting the goal posts. The document doesn't mention safety.
i apologize-- i was not viewing your comments in the context of the document.
The entire documented rationale for the entire RSSAC study and therefore anonymization seems to be this single sentence in the Introduction:
"Some operators are uncomfortable sharing IP addresses of the query sources and some are even legally prevented from doing so."
GDPR seems to be the main driver for this right now. I'm (currently) satisfied that pseudonymization of IP addresses by truncation satisfies any obligations we might have there.
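For readers who want to see what truncation at /24 (v4) and /48 (v6) does to an address, here is a minimal sketch using Python's standard `ipaddress` module. The function name `truncate_address` and its parameters are illustrative assumptions, not anything defined by the RSSAC document.

```python
import ipaddress


def truncate_address(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an address (hypothetical helper for illustration)."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network accept an address with host bits set
    # and mask them off instead of raising ValueError.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(truncate_address("192.0.2.130"))            # 192.0.2.0
print(truncate_address("192.0.2.7"))              # 192.0.2.0
print(truncate_address("2001:db8:1234:5678::1"))  # 2001:db8:1234::
```

The mapping is many-to-one: every host in the same /24 or /48 collapses to a single value, so the output still reveals which network a query came from but no longer distinguishes endpoints inside that network.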
i think there's a crypto-lite proposal that preserves identity of endpoints but destroys their associativity. if true, this is likely to do a better job of assuaging GDPR and similar concerns.

some of my sensor operators just always report 0.0.0.0 for the source address. the document ought to mention this as an extreme example of privacy preservation. and for my dayjob's purposes, this doesn't matter. but for most forms of dns research, it's important to know that the same endpoint re-asked the same question, or asked several questions. in that sense, prefix masking discards more information than the crypto-lite proposal i saw.

this means there is a tradeoff between the deanonymization risk of exposing that a dns transaction and some non-dns activity came from the same network, vs. knowing that two or more dns transactions came from the same endpoint.

the context of my remarks is the giant pendulum of history, which swung too far in the direction of "let google and facebook run giant vacuums and hope they and their customers behave ethically, even though there is no transparency", and is now swinging in the direction of "if the user or operator doesn't believe the collection is in their best interests, and if they have not given verifiable and revocable permission for it to be collected, and if the laws of the land don't support it, then assume it's bad and prohibit it blanket-wise."

the right answer is somewhere in the middle of those pendulum swings. here in rssac-caucus we can afford to consider what's right rather than only what's practical or inevitable. i hope we take that opportunity.

--
P Vixie
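The proposal referred to above is not identified in this message, so the following is only a sketch of one technique with the property described (the same endpoint always maps to the same token, while neighbouring addresses map to unrelated tokens): a keyed hash of the full address. HMAC-SHA256, the function name `pseudonymize_address`, the 128-bit token length and the key handling are all assumptions made for illustration.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_address(addr: str, key: bytes) -> str:
    """Map an address to an opaque token (hypothetical helper, not the cited proposal)."""
    ip = ipaddress.ip_address(addr)
    # Prefix the packed address with its IP version so a v4 address and a
    # v6 address are always hashed as distinct messages.
    message = bytes([ip.version]) + ip.packed
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return digest[:32]  # keep 128 bits of the digest as the token


key = b"operator-secret-key"  # placeholder; a real key would be random and kept private

same_1 = pseudonymize_address("192.0.2.130", key)
same_2 = pseudonymize_address("192.0.2.130", key)
neighbour = pseudonymize_address("192.0.2.131", key)

print(same_1 == same_2)     # True: repeat queries from one endpoint stay linkable
print(same_1 == neighbour)  # False: the shared /24 is no longer visible
```

Unlike truncation, this mapping is 1:1 for a given key, so anyone who holds the key can recompute the tokens for the entire IPv4 space and reverse them. The privacy of such a scheme therefore rests on how the key is protected and how often it is rotated.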