-----Original Message-----
From: John Levine [mailto:johnl@taugh.com]
Sent: Wednesday, February 03, 2016 12:06 PM
To: gtld-tech@icann.org
Cc: Hollenbeck, Scott
Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content
That will only work if a crawler reads robots.txt and respects the published directive(s). Not all do.
All of the search engines used by consumers do.
I've personally seen one operated by a company whose name starts with "G" GETting content on a site I operate in violation of the directives I publish in the site's robots.txt file. YMMV.
I'm still having trouble understanding what the problem is here. The specific set of records that Scott noticed are in fact just some examples linked from a public web page, and I see no reason to think that it'd be hard to keep RDAP info out of the usual search engines if that's what you want to do. For a very long time, Domaintools and others have scraped WHOIS info and provided a little of it for free and more for pay. RDAP doesn't change that.
RDAP *could* change that.
If you want to redact information beyond what's in WHOIS, that's a reasonable discussion to have, but it's exactly the same for WHOIS or RDAP.
If "it" ("it's exactly the same") refers to the source of the data, yes, they are the same. If "it" refers to the tools we have available to control access to the data, I have to disagree. The example Gavin found is as you and others have noted. The problem (as I see it, anyway) would be more obvious if the indexed response contained PII.

As I've said before, I want to deploy RDAP in a way that addresses the issues we have with WHOIS. Functional equivalence provides no significant benefit.

Scott