On 09/04/2018 21:42, Paul Vixie wrote:
> anonymizing at /48 for v6 and /24 for v4 isn't enough. even the least capable data scientist, using data that's less than a millionth of the other data in google's or facebook's or cambridge analytica's possession, can _trivially_ deanonymize that.
Couldn't that same data scientist also reverse anything that maintains a 1:1 relationship between input and output? At least truncating the data does ensure that some portion of the input data is intentionally destroyed. I think there's a balance to be struck between the (to us desirable) property that the scheme is prefix-preserving and the extra difficulty of reversal that comes from the N:1 mapping truncation creates. If there are arguments to be made against prefix truncation then they should be properly documented *in the paper*.
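To make the N:1 point concrete, here is a rough sketch of truncation at /24 and /48 using Python's standard ipaddress module. The function name truncate and its parameters are my own for illustration; this is not the method specified in the paper, just the general idea of zeroing the host bits.

```python
import ipaddress


def truncate(addr: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero every bit after the prefix (hypothetical helper, illustration only)."""
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False allows host bits to be set in the input; they are discarded.
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net.network_address)


# Two different sources in the same /24 become indistinguishable.
print(truncate("192.0.2.57"))
print(truncate("192.0.2.200"))
print(truncate("2001:db8:1234:5678::1"))
```

At /24 each output stands for 256 possible inputs, and at /48 for 2^80, so the discarded bits cannot be recovered from the output alone. Whether what remains can be correlated with other data is a separate question.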
> please re-think this. you're making decisions about third party safety
You appear to be shifting the goalposts. The document doesn't mention safety. The entire documented rationale for the RSSAC study, and therefore for anonymization, seems to be this single sentence in the Introduction:
Some operators are uncomfortable sharing IP addresses of the query sources and some are even legally prevented from doing so.
GDPR seems to be the main driver for this right now. I'm (currently) satisfied that pseudonymization of IP addresses by truncation meets any obligations we might have there.

Ray