Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content

Feb. 3, 2016

      ...
-----Original Message-----
From: John Levine [mailto:johnl@taugh.com]
Sent: Wednesday, February 03, 2016 12:06 PM
To: gtld-tech@icann.org
Cc: Hollenbeck, Scott
Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server
Content
...
>> That will only work if a crawler reads robots.txt and respects the
>> published directive(s). Not all do.
> All of the search engines used by consumers do.
I've personally seen one operated by a company whose name starts with "G" GETting content on a site I operate in violation of the directives I publish in the site's robots.txt file. YMMV.
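
To make concrete what "reads robots.txt and respects the published directive(s)" involves on the crawler side, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the /rdap/ path, the rdap.example host, and the "ExampleBot" user agent are all assumptions made up for illustration; none of them come from a real deployment. The check is voluntary: a crawler that never runs it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an RDAP server; the /rdap/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /rdap/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler runs this check before every GET.
    print(may_fetch("ExampleBot", "https://rdap.example/rdap/domain/example.com"))
    print(may_fetch("ExampleBot", "https://rdap.example/help.html"))
```

Nothing in this sketch is enforced by the server, which is the point above: robots.txt only describes what the operator wants, and compliance is left to the client.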
...
> I'm still having trouble understanding what the problem is here.  The
> specific set of records that Scott noticed are in fact just some
> examples linked from a public web page, and I see no reason to think
> that it'd be hard to keep RDAP info out of the usual search engines if
> that's what you want to do.  For a very long time, Domaintools and
> others have scraped WHOIS info and provide a little of it for free and
> more for pay.  RDAP doesn't change that.
RDAP *could* change that.
...
> If you want to redact information beyond what's in WHOIS, that's a
> reasonable discussion to have, but it's exactly the same for WHOIS or
> RDAP.
If "it" ("it's exactly the same") refers to the source of the data, yes, they are the same. If "it" refers to the tools we have available to control access to the data, I have to disagree. The example Gavin found is as you and others have noted. The problem (as I see it, anyway) would be more obvious if the indexed response contained PII.

As I've said before, I want to deploy RDAP in a way that addresses the issues we have with WHOIS. Functional equivalence provides no significant benefit.

Scott