-----Original Message-----
From: gtld-tech-bounces@icann.org [mailto:gtld-tech-bounces@icann.org] On Behalf Of Stephane Bortzmeyer
Sent: Wednesday, February 03, 2016 5:05 AM
To: Francisco Arias
Cc: gtld-tech@icann.org
Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content
On Wed, Feb 03, 2016 at 12:23:42AM +0000, Francisco Arias <francisco.arias@icann.org> wrote a message of 77 lines which said:
The search page (https://www.google.co.uk/search?q=site:rdg.afilias.info) appears to be the result of crawling links from the first link that appears there (http://rdg.afilias.info/rdap/help). The help page contains links to search and lookup examples that return several objects with their directly-related objects, which are in turn shown in the search results. This could have happened in web-Whois if someone were to publish a page containing example queries.
It seems to me that having a robots.txt at the root of the RDAP server would solve the problem (if you regard it as a problem).
User-agent: *
Disallow: /
That will only work if a crawler reads robots.txt and respects the published directive(s). Not all do.

Scott
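
For illustration, here is a minimal Python sketch of the check a well-behaved crawler would run against the robots.txt quoted above, using the standard library's urllib.robotparser. The helper name is_allowed and the user-agent string "ExampleBot" are hypothetical, chosen only for this example. The check is voluntary: a crawler that never runs it is not affected by the file at all.

```python
from urllib.robotparser import RobotFileParser

# The two directives proposed in the thread for the root of the RDAP server.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /",
]


def is_allowed(robots_lines, user_agent, url):
    """Return True if robots_lines permit user_agent to fetch url.

    Hypothetical helper: it parses the rules from memory instead of
    downloading /robots.txt, so the sketch runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    help_url = "http://rdg.afilias.info/rdap/help"
    # "ExampleBot" is a made-up user-agent name.
    if is_allowed(ROBOTS_TXT, "ExampleBot", help_url):
        print("polite crawler would fetch", help_url)
    else:
        print("polite crawler would skip", help_url)
```

A crawler that ignores robots.txt simply issues the HTTP request without this step, so the file discourages indexing by cooperative search engines but does not restrict access.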