Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content


Francisco Arias

Jan. 29, 2016

4:54 p.m.

Hi Scott,

In all the cross-posting, it seems you may have left out the list where people have currently in scope doing something about this https://community.icann.org/display/gTLDRDS/Next-Generation+gTLD+Registratio...

Regards,
--
Francisco

On 1/29/16, 5:34 AM, "weirds on behalf of Hollenbeck, Scott" <weirds-bounces@ietf.org on behalf of shollenbeck@verisign.com> wrote:

...

(cross-posting to multiple lists - sorry if that's inconvenient)

So I saw a tweet from Gavin Brown (@GavinBrown) that describes how one particular search engine has indexed the RDAP server of a gTLD registry operator:

https://twitter.com/GavinBrown/status/692718904058191872

This is all the more reason to work on a client authentication specification that includes support for varying responses based on client identity and authorization. I've been working on such a specification and welcome feedback on the approach:

https://datatracker.ietf.org/doc/draft-hollenbeck-weirds-rdap-openid/

It also begs the question of the need for a BCP describing operational practices for server operators. There are ways for web servers to influence or restrict crawler behavior, but what's appropriate in this context?

Scott

_______________________________________________ weirds mailing list weirds@ietf.org https://www.ietf.org/mailman/listinfo/weirds
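Scott's draft is about varying RDAP responses by client identity and authorization. As a rough illustration of the server-side half of that idea only (not the OpenID mechanics in the draft), here is a minimal sketch that redacts jCard properties for anonymous clients. The role names and the per-role field lists are hypothetical assumptions, not taken from the draft or any gTLD profile.

```python
import copy

# Hypothetical policy: which jCard properties each client role may see.
# Role names and field lists are illustrative assumptions only.
VISIBLE_VCARD_FIELDS = {
    "anonymous": {"version", "fn"},
    "authenticated": {"version", "fn", "adr", "tel", "email"},
}


def filter_entity(entity: dict, role: str) -> dict:
    """Return a copy of an RDAP entity keeping only the jCard properties the role may see."""
    allowed = VISIBLE_VCARD_FIELDS.get(role, VISIBLE_VCARD_FIELDS["anonymous"])
    result = copy.deepcopy(entity)
    if "vcardArray" in result:
        kind, properties = result["vcardArray"]
        result["vcardArray"] = [kind, [p for p in properties if p[0] in allowed]]
    return result


def filter_response(response: dict, role: str) -> dict:
    """Apply filter_entity to each top-level entity (nested entities are not handled here)."""
    result = copy.deepcopy(response)
    result["entities"] = [filter_entity(e, role) for e in response.get("entities", [])]
    return result


# Invented sample response; unknown roles fall back to the anonymous view.
SAMPLE_DOMAIN = {
    "objectClassName": "domain",
    "ldhName": "example.tld",
    "entities": [{
        "objectClassName": "entity",
        "handle": "C1",
        "roles": ["registrant"],
        "vcardArray": ["vcard", [
            ["version", {}, "text", "4.0"],
            ["fn", {}, "text", "Joe User"],
            ["email", {}, "text", "joe@example.tld"],
        ]],
    }],
}

print(filter_response(SAMPLE_DOMAIN, "anonymous")["entities"][0]["vcardArray"])
```

How the role is established (the actual subject of the draft) is left out entirely. The sketch only shows that the same stored object can yield different responses once the server knows who is asking.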


Andrew Sullivan

January 2016

5:25 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

(Sticking to this list 'cause I happen to be subscribed here.)

On Fri, Jan 29, 2016 at 04:54:49PM +0000, Francisco Arias wrote:

...

In all the cross-posting, it seems you may have left out the list where people have currently in scope doing something about this https://community.icann.org/display/gTLDRDS/Next-Generation+gTLD+Registratio...

While that's a fair point to make, I think part of Scott's observation is that we're learning there's a technical mistake in treating all registry data services as though they're interchangeable. Because Whois doesn't provide links, it doesn't encourage crawlers to build an independent database of linked data the way RDAP does. So, without privacy protections, deploying RDAP as though it's just Whois-on-the-web actually introduces new vulnerabilities.

That seems important to take into consideration in the new profile, regardless of what the policy documents say. Surely the policy documents do not require the introduction of new data vulnerabilities just because the policy implies that?

Best regards,

A

-- Andrew Sullivan Dyn asullivan@dyn.com
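To make the point about links concrete: RDAP responses (RFC 7483) carry `links` arrays whose `href` members point at related objects, and entities and nameservers are nested inside a domain response. The sketch below gathers every followable URL from one response the way a generic crawler might. The sample response and the `rdap.example` host are invented; a port-43 Whois reply offers no comparable structure to follow.

```python
import json


def collect_hrefs(node) -> list:
    """Recursively gather every 'href' in any 'links' array of an RDAP JSON document."""
    found = []
    if isinstance(node, dict):
        for link in node.get("links", []):
            if isinstance(link, dict) and "href" in link:
                found.append(link["href"])
        for key, value in node.items():
            if key != "links":
                found.extend(collect_hrefs(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_hrefs(item))
    return found


# Invented, trimmed-down domain response; rdap.example is a placeholder host.
SAMPLE = json.loads("""
{
  "objectClassName": "domain",
  "ldhName": "example.tld",
  "links": [{"rel": "self", "type": "application/rdap+json",
             "href": "https://rdap.example/domain/example.tld"}],
  "entities": [{"objectClassName": "entity", "handle": "REG-1", "roles": ["registrant"],
                "links": [{"rel": "self", "type": "application/rdap+json",
                           "href": "https://rdap.example/entity/REG-1"}]}],
  "nameservers": [{"objectClassName": "nameserver", "ldhName": "ns1.example.tld",
                   "links": [{"rel": "self", "type": "application/rdap+json",
                              "href": "https://rdap.example/nameserver/ns1.example.tld"}]}]
}
""")

for href in collect_hrefs(SAMPLE):
    print(href)
```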

Francisco Arias

10:39 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On 1/29/16, 9:25 AM, "gtld-tech-bounces@icann.org on behalf of Andrew Sullivan" <gtld-tech-bounces@icann.org on behalf of asullivan@dyn.com> wrote:

...


In the gTLD world we have a set of services called Registration Data Directory Services (RDDS) comprised of WHOIS (port-43) and web-Whois (i.e., an HTML web page). RDAP is intended to join the group and eventually, WHOIS would go away. The behavior described as vulnerability has the same potential to appear in the so-called web-Whois that has been there for years and it is not being proposed to disappear in neither gTLD registries nor registrars. As John said, perhaps the issue is how the alluded RDAP service was implemented.

"Beauty is in the eye of the beholder”. What you call a vulnerability others may call it a feature. Please don’t get me wrong, this is not about what I think. The fact of the matter is that gTLD contracts state that all information must be shown in RDDS services, period. If we don’t like it, there is the RDS policy development process that is tasked, among other things, to revisit differentiated access.

With the exception of Scott, I don’t see any of the people that have complained about the lack of differentiated access in RDDS in the RDS list at https://community.icann.org/pages/viewpage.action?pageId=56986659. If you care about this issue, please participate in RDS.

Regards,
--
Francisco

Hollenbeck, Scott

11:15 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On Jan 29, 2016, at 5:40 PM, Francisco Arias <francisco.arias@icann.org> wrote:

...

On 1/29/16, 9:25 AM, "gtld-tech-bounces@icann.org on behalf of Andrew Sullivan" <gtld-tech-bounces@icann.org on behalf of asullivan@dyn.com> wrote:

...
[snip]

...

...
With the exception of Scott, I don’t see any of the people that have complained about the lack of differentiated access in RDDS in the RDS list at https://community.icann.org/pages/viewpage.action?pageId=56986659. If you care about this issue, please participate in RDS.

It's not in that archive, but you should re-read the EWG report and the comments received thus far on the proposed profile. I'm not alone.

Scott

Hollenbeck, Scott

11:22 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

Sorry, the note below might be confusing. There are several members of the EWG that volunteered for the PDP. The topic *will* come up, so Francisco's suggestion is a good one. Get involved.

Scott

...

On Jan 29, 2016, at 6:16 PM, Hollenbeck, Scott <shollenbeck@verisign.com> wrote:

...

Andrew Sullivan

2:49 a.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On Fri, Jan 29, 2016 at 10:39:29PM +0000, Francisco Arias wrote:

...

The behavior described as vulnerability has the same potential to appear in the so-called web-Whois that has been there for years and it is not being proposed to disappear in neither gTLD registries nor registrars.

Poppycock. The RDAP provides, on purpose, links among the objects in its responses. Web whois basically provides a terminal-scrape of what people would get if they still knew how to type whois at a command line. Since crawlers respond automatically to the very machine-readable markup that RDAP was precisely designed to emit, this means that crawlers that were never intending to catalogue the entire whois will now do so as a matter of course.

...

"Beauty is in the eye of the beholder”. What you call a vulnerability others may call it a feature.

Yes. And when my customers are giving me their information and I am forced by contractual terms with ICANN to deploy that in a way that causes a whole new class of people to suck all that up into widely-searchable machine-readable archives, that seems to me to be a new [feature|vulnerability] that I was never in a position to warn people about and to which they didn't agree.

...

The fact of the matter is that gTLD contracts state that all information must be shown in RDDS services, period. If we don’t like it, there is the RDS policy development process that is tasked, among other things, to revisit differentiated access.

With respect, what you are claiming is that the procedure is being followed and therefore this is ok. I am claiming that Scott has uncovered a new consequence of the policy that seems to have consequences for the implementation, and that needs to be taken into consideration. I'm reasonably willing to believe that, if it turned out using RDAP caused you accidentally to forego your first-born child, we'd be having a different discussion about the implementation. So where, exactly, does the line fall here?

...

With the exception of Scott, I don’t see any of the people that have complained about the lack of differentiated access in RDDS in the RDS list at https://community.icann.org/pages/viewpage.action?pageId=56986659. If you care about this issue, please participate in RDS.

I have submitted my name, but I have to admit that part of the difficulty in getting permission to spend yet more time on this is the absurd way that ICANN develops policies around things affecting the Internet: anyone who wants to be a "participant" has to promise to join inconveniently-timed phone calls (well, ok, Internet-carried phone calls), fly to far away places for face to face meetings, and so on. If one could actually participate in Internet policy discussions using, you know, the Internet, it might be somewhat easier to justify participation.

Best regards,

A

-- Andrew Sullivan Dyn asullivan@dyn.com
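Andrew's claim that crawlers will catalogue the data "as a matter of course" can be illustrated with a toy breadth-first crawl. The sketch walks an in-memory, entirely hypothetical set of RDAP-style documents, seeded from a help page much like the one Francisco describes for the indexed server. It fetches nothing over the network; it only shows how far plain link-following reaches from a single seed.

```python
from collections import deque

# Hypothetical server content: path -> hrefs found in that document's "links".
FAKE_SERVER = {
    "/rdap/help": ["/rdap/domain/a.tld", "/rdap/domains?name=b*"],
    "/rdap/domain/a.tld": ["/rdap/entity/R1", "/rdap/nameserver/ns1.a.tld"],
    "/rdap/domains?name=b*": ["/rdap/domain/b1.tld", "/rdap/domain/b2.tld"],
    "/rdap/domain/b1.tld": ["/rdap/entity/R1", "/rdap/entity/R2"],
    "/rdap/domain/b2.tld": ["/rdap/entity/R3"],
    "/rdap/entity/R1": [],
    "/rdap/entity/R2": [],
    "/rdap/entity/R3": [],
    "/rdap/nameserver/ns1.a.tld": [],
}


def crawl(seed: str, fetch_links, max_pages: int = 1000) -> list:
    """Breadth-first crawl from seed; fetch_links(url) returns the hrefs in that document."""
    seen = {seed}
    order = []
    queue = deque([seed])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in fetch_links(url):
            if href not in seen:  # visit each document once
                seen.add(href)
                queue.append(href)
    return order


visited = crawl("/rdap/help", lambda url: FAKE_SERVER.get(url, []))
print(len(visited), "documents reached from one help page")
```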

Francisco Arias

February 2016

12:23 a.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

I talked with Andrew about the email below and I think we clarified things. I thought I’d share with the list the assessment that Gustavo and I did on the issue. Andrew, please feel free to correct me.

Gustavo and I double checked the draft RDAP profile and do not see any element in there that would expose more data than the current Whois does, e.g., a domain name links to a few entities (e.g., registrant, registrar, admin, and tech contacts), a registrar, and zero or more name servers.

The search page (https://www.google.co.uk/search?q=site:rdg.afilias.info) appears to be the result of crawling links from the first link that appears there (http://rdg.afilias.info/rdap/help). The help page contains links to search and lookup examples that return several objects with their directly-related objects, which are in turn shown in the search results. This could have happened in web-Whois if someone were to publish a page containing example queries.

In other words, the alluded behavior is not something enabled by RDAP or the profile. Please let me know if we are missing something.

Regards,
--
Francisco

On 1/29/16, 6:49 PM, "gtld-tech-bounces@icann.org on behalf of Andrew Sullivan" <gtld-tech-bounces@icann.org on behalf of asullivan@dyn.com> wrote:

...


Stephane Bortzmeyer

10:05 a.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On Wed, Feb 03, 2016 at 12:23:42AM +0000, Francisco Arias <francisco.arias@icann.org> wrote a message of 77 lines which said:

...

The search page (https://www.google.co.uk/search?q=site:rdg.afilias.info) appears to be the result of crawling links from the first link that appears there (http://rdg.afilias.info/rdap/help). The help page contains links to search and lookup examples that return several objects with their directly-related objects, which are in turn shown in the search results. This could have happened in web-Whois if someone were to publish a page containing example queries.

It seems to me that having a robots.txt at the root of the RDAP server would solve the problem (if you regard it as a problem). Is there anything in the tons of ICANN rules and requirements that prevents us from installing such a robots.txt?

User-agent: *
Disallow: /
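For reference, the Python standard library's `urllib.robotparser` shows how a compliant crawler would read Stephane's two-line file. The `ExampleBot` user agent and the `rdap.example` URLs are placeholders. As Scott points out in reply, this only binds crawlers that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Stephane's proposed robots.txt for the root of an RDAP server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return whether a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(ROBOTS_TXT, "ExampleBot", "https://rdap.example/rdap/domain/example.tld"))  # False
```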

Hollenbeck, Scott

12:06 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

...

-----Original Message-----
From: gtld-tech-bounces@icann.org [mailto:gtld-tech-bounces@icann.org] On Behalf Of Stephane Bortzmeyer
Sent: Wednesday, February 03, 2016 5:05 AM
To: Francisco Arias
Cc: gtld-tech@icann.org
Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content

On Wed, Feb 03, 2016 at 12:23:42AM +0000, Francisco Arias <francisco.arias@icann.org> wrote a message of 77 lines which said:

...
The search page (https://www.google.co.uk/search?q=site:rdg.afilias.info) appears to be the result of crawling links from the first link that appears there (http://rdg.afilias.info/rdap/help). The help page contains links to search and lookup examples that return several objects with their directly-related objects, which are in turn shown in the search results. This could have happened in web-Whois if someone were to publish a page containing example queries.

It seems to me that having a robots.txt at the root of the RDAP server would solve the problem (if you regard it as a problem).

User-agent: *
Disallow: /

That will only work if a crawler reads robots.txt and respects the published directive(s). Not all do.

Scott
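Because robots.txt is advisory, any enforcement has to happen on the server. One common technique, offered purely as an illustration and not something this thread or the gTLD profile prescribes, is per-client rate limiting. RFC 7480 suggests a 429 (Too Many Requests) response when an RDAP server declines a query because of rate limits. Below is a minimal token-bucket sketch. The capacity and refill rate are arbitrary assumptions, and the client key (for example, a source address) is whatever the server chooses.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Hypothetical per-client limiter: one bucket per client key."""

    def __init__(self, capacity: float = 10, rate: float = 1.0, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets = {}  # client key -> TokenBucket

    def status_for(self, client_key: str) -> int:
        """Return 200 if the request may proceed, 429 if the client is over its limit."""
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, self.clock)
            self.buckets[client_key] = bucket
        return 200 if bucket.allow() else 429
```

The injectable `clock` exists so the behavior can be checked deterministically. In a real server, `status_for` would be consulted before answering each query. A scraper that rotates addresses defeats this, which is one reason the thread keeps returning to authenticated access.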

Michele Neylon - Blacknight

12:38 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On 03/02/2016, 12:06, "gtld-tech-bounces@icann.org on behalf of Hollenbeck, Scott" <gtld-tech-bounces@icann.org on behalf of shollenbeck@verisign.com> wrote:

...


That will only work if a crawler reads robots.txt and respects the published directive(s). Not all do.

Scott

The nastier bots ignore the robots.txt directives ..

As Scott and others have pointed out, unauthenticated access *is* a problem

Trying to draw parallels between current whois (web or otherwise) and RDAP might work with less technical types, but with this audience it simply won’t fly.

RDAP’s entire “power” lies in the way that you can traverse the database in multiple ways

You cannot do that with “normal” whois and this is both a security and a privacy issue

Repeatedly telling us you don’t think it is doesn’t change the fact that it is

Regards

Michele

--
Mr Michele Neylon
Blacknight Solutions
Hosting, Colocation & Domains
http://www.blacknight.host/
http://blog.blacknight.com/
http://ceo.hosting/
Intl. +353 (0) 59 9183072
Direct Dial: +353 (0)59 9183090
-------------------------------
Blacknight Internet Solutions Ltd, Unit 12A,Barrowside Business Park,Sleaty Road,Graiguecullen,Carlow,Ireland
Company No.: 370845

...

Andrew Newton

4:58 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On Wed, Feb 3, 2016 at 7:38 AM, Michele Neylon - Blacknight <michele@blacknight.com> wrote:

...


This is incidental crawling though. Data miners have been targeting Whois for years with great success.

-andy

Francisco Arias

8:54 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On 2/3/16, 4:38 AM, "Michele Neylon - Blacknight" <michele@blacknight.com> wrote:

...

As Scott and others have pointed out, unauthenticated access *is* a problem

Trying to draw parallels between current whois (web or otherwise) and RDAP might work with less technical types, but with this audience it simply won’t fly.

RDAP’s entire “power” lies in the way that you can traverse the database in multiple ways

You cannot do that with “normal” whois and this is both a security and a privacy issue

Hi Michele, would you care to elaborate on this? -- Francisco

...

Repeatedly telling us you don’t think it is doesn’t change the fact that it is

Regards

Michele

John Levine

5:06 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

...

That will only work if a crawler reads robots.txt and respects the published directive(s). Not all do.

All of the search engines used by consumers do.

I'm still having trouble understanding what the problem is here. The specific set of records that Scott noticed are in fact just some examples linked from a public web page, and I see no reason to think that it'd be hard to keep RDAP info out of the usual search engines if that's what you want to do. For a very long time, Domaintools and others have scraped WHOIS info and provide a little of it for free and more for pay. RDAP doesn't change that.

If you want to redact information beyond what's in WHOIS, that's a reasonable discussion to have, but it's exactly the same for WHOIS or RDAP.

R's,
John

Hollenbeck, Scott

5:40 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

...

-----Original Message-----
From: John Levine [mailto:johnl@taugh.com]
Sent: Wednesday, February 03, 2016 12:06 PM
To: gtld-tech@icann.org
Cc: Hollenbeck, Scott
Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content

...
That will only work if a crawler reads robots.txt and respects the published directive(s). Not all do.

All of the search engines used by consumers do.

I've personally seen one operated by a company whose name starts with "G" GETting content on a site I operate in violation of the directives I publish in the site's robots.txt file. YMMV.

...

I'm still having trouble understanding what the problem is here. The specific set of records that Scott noticed are in fact just some examples linked from a public web page, and I see no reason to think that it'd be hard to keep RDAP info out of the usual search engines if that's what you want to do. For a very long time, Domaintools and others have scraped WHOIS info and provide a little of it for free and more for pay. RDAP doesn't change that.

RDAP *could* change that.

...

If you want to redact information beyond what's in WHOIS, that's a reasonable discussion to have, but it's exactly the same for WHOIS or RDAP.

If "it" ("it's exactly the same") refers to the source of the data, yes, they are the same. If "it" refers to the tools we have available to control access to the data, I have to disagree.

The example Gavin found is as you and others have noted. The problem (as I see it anyway) would be more obvious if the indexed response contained PII.

As I've said before, I want to deploy RDAP in a way that addresses the issues we have with WHOIS. Functional equivalence provides no significant benefit.

Scott

Francisco Arias

8:52 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On 2/3/16, 9:40 AM, "gtld-tech-bounces@icann.org on behalf of Hollenbeck, Scott" <gtld-tech-bounces@icann.org on behalf of shollenbeck@verisign.com> wrote:

...

As I've said before, I want to deploy RDAP in a way that addresses the issues we have with WHOIS. Functional equivalence provides no significant benefit.

Just to be clear, differentiated access is not the only benefit you get from RDAP. I can think of at least the below benefits:

1. Internationalization support for registration data
2. Standardized query, response, and error messages
3. Standardized extensibility
4. Supports private access to data (i.e., over HTTPS)
5. Bootstrapping mechanism to easily find the authoritative server for a given query
6. Standardized redirection/reference mechanism (e.g., from a thin registry to a registrar)
7. Flexibility to support various policies

Regards,
--
Francisco

luvingnc＠aol.com

5:22 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

Really? You would elevate these purported benefits to the same level of importance as authenticated (or "differentiated") access? My take on these purported benefits is:

1. Internationalization? Ok, this is a stretch, but I'll give it to you. But it's not like internationalized characters are disallowed by whois. See, for example:
   whois -h whois.nic.xn--cg4bki nic.xn--cg4bki
   There are others.
2. Standardized query/response/errors? Again, a stretch. 301, 302, 404, 429... big deal. Anyway, AWIP and CNRA have tried to do the first two.
3. Ahem... extensibility? Really? Anyone wishing to support anything beyond the profile must undergo the dreaded RSEP, effectively muting this benefit.
4. HTTPS? That's more important than authenticated/differentiated access? Who is clamoring for this? Nobody!
5. Standardized bootstrapping? New gTLDs must all support whois.nic.<tld>
6. Standardized redirection for thin? There are three thin registries: com, net, tv. All provide the reference to registrar with the same key "whois server:"
7. Flexibility to support various policies? Okay, maybe someone at ICANN cares a lot about this, but few in the community care, especially given #3 above.

What the community does care about is authenticated/differentiated access. As several have already written, RDAP without this is simply a repackaging of existing capabilities. *NONE* of the above are significant or impactful changes to existing capabilities that benefit any entity, individual or corporate.

I get the impression of ICANN internal pressure to force-feed the community RDAP to meet a date, rather than a desire to put forth a noteworthy advancement in RDDS. Rather than dictate compliance with a profile that *seems* a product of ICANN's doing, rather than a grass-roots initiative, why doesn't ICANN invest effort in coordinating a solution to the obvious need for authenticated/differentiated access?

And, if ICANN wishes to dictate something useful, federated authentication amongst registries would be a good start. Better yet, ICANN could be the authenticator - THAT would be useful.

Ann Hammond

-----Original Message-----
From: Francisco Arias <francisco.arias@icann.org>
To: Hollenbeck, Scott <shollenbeck@verisign.com>; gtld-tech <gtld-tech@icann.org>
Sent: Fri, Feb 5, 2016 3:53 pm
Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content

On 2/3/16, 9:40 AM, "gtld-tech-bounces@icann.org on behalf of Hollenbeck, Scott" <gtld-tech-bounces@icann.org on behalf of shollenbeck@verisign.com> wrote:

...


Andrew Sullivan

11:37 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

Hi,

In Francisco's defence, which is nowise to suggest I think authenticated-and-differentiated access less important:

On Sat, Feb 06, 2016 at 12:22:48PM -0500, Ann Hammond via gtld-tech wrote:

...

1. Internationalization? Ok, this is a stretch, but I'll give it to you. But it's not like internationalized characters are disallowed by whois. See, for example: whois -h whois.nic.xn--cg4bki nic.xn--cg4bki There are others.

Actually, it's not that i18n is disallowed or allowed by whois. It's that the whois protocol is _completely broken_ for this, because it has no way to negotiate anything. With whois, you connect on port 43, send something, and get something vomited back. It's probably ASCII, but no promises. That's the whole protocol. That we are still using whois even a little ought to embarrass us all.
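Andrew's description of the port-43 protocol (RFC 3912) fits in a few lines: open TCP port 43, send the query terminated by CRLF, and read until the server closes the connection. Nothing in that exchange declares a character set, so the client must guess how to decode the bytes. RDAP, by contrast, carries JSON over HTTP with a registered media type (`application/rdap+json`). In this sketch `whois_query` is illustrative and is not called, since it needs a live server. The registrant line is invented.

```python
import socket


def whois_query(host: str, query: str, port: int = 43, timeout: float = 10.0) -> bytes:
    """Send one RFC 3912 query and return the raw reply; the protocol declares no encoding."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")  # choosing UTF-8 here is itself a guess
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server ends the reply by closing the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# A reply line a server might send if it happened to use UTF-8 (invented data).
raw = "Registrant Name: Müller\r\n".encode("utf-8")

print(raw.decode("utf-8").strip())    # Registrant Name: Müller
print(raw.decode("latin-1").strip())  # Registrant Name: MÃ¼ller
```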

...

2. Standardized query/response/errors? Again, a stretch. 301, 302, 404, 429... big deal. Anyway, AWIP and CNRA have tried to do the first two.

For machine-to-machine communication, this is a very big deal. And it's not only response codes. You also get JSON objects as opposed to structured text, so the entire result is machine parseable. This was squarely in our sights as a goal when we got the WEIRDS BoF going, and I think it's a successful part of the RDAP design. (Of course, this very machine-parsability is what has so many of us worried that there are unseen opportunities for data leakage without authenticated differential access. I think so far we've not demonstrated such a leak, but that doesn't mean there isn't one.)

...

3. Ahem... extensibility? Really? Anyone wishing to support anything beyond the profile must undergo the dreaded RSEP, effectively muting this benefit.

The ability is certainly there, though I must concede that an ability that can't be exercised for contractual reasons is perhaps somewhat less useful. Let us hope the PDP is successful!

...

4. HTTPS? That's more important than authenticated/differentiated access? Who is clamoring for this? Nobody!

I am. Lookups by innocent parties of registration data should no more be trivially observable on the network than anything else. Encrypt it all.

...

5. Standardized bootstrapping? New gTLDs must all support whois.nic.<tld>

To me, the bootstrapping's sort of a kludge anyway. I would have liked something more flexible and more in keeping with the extensibility, for greater use down the tree, but I was in the rough in the working group.
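For reference, RDAP bootstrapping for domain names (RFC 7484) works from a JSON file published by IANA. Each service entry pairs a list of domain labels with a list of base RDAP URLs, and the client uses the entry with the label-wise longest match. The sketch below runs against an inline, made-up registry that uses the reserved `.example`, `.test` and `.invalid` names so no real server is implied. A real client would fetch the IANA file and cache it.

```python
import json

# Made-up data in the shape of the IANA DNS bootstrap file described in RFC 7484.
BOOTSTRAP = json.loads("""
{
  "version": "1.0",
  "publication": "2016-01-01T00:00:00Z",
  "services": [
    [["example"], ["https://rdap.nic.example/"]],
    [["test", "invalid"], ["https://rdap.registry.example/rdap/"]]
  ]
}
""")


def rdap_base_url(domain: str, registry: dict):
    """Return the first base URL of the entry with the label-wise longest match, or None."""
    labels = domain.lower().rstrip(".").split(".")
    best_len, best_urls = 0, None
    for entries, urls in registry["services"]:
        for entry in entries:
            entry_labels = entry.lower().split(".")
            if len(entry_labels) > best_len and labels[-len(entry_labels):] == entry_labels:
                best_len, best_urls = len(entry_labels), urls
    return best_urls[0] if best_urls else None


def domain_lookup_url(domain: str, registry: dict):
    """Build an RFC 7482-style domain lookup URL from the bootstrap result."""
    name = domain.lower().rstrip(".")
    base = rdap_base_url(name, registry)
    return None if base is None else base.rstrip("/") + "/domain/" + name


print(domain_lookup_url("nic.example", BOOTSTRAP))
```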

...

6. Standardized redirection for thin? There are three thin registries: com, net, tv. All provide the reference to registrar with the same key "whois server:"

The point is that rwhois hasn't been that reliable, and RDAP solves this.
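RDAP handles redirection partly through ordinary HTTP (RFC 7480). A server that is not authoritative for a query can answer with a 3xx status and a `Location` header, and the client simply follows it. Below is a sketch of that client loop. The `fetch` callable and the `Reply` type are hypothetical stand-ins for a real HTTP library, and the two-server data is invented (192.0.2.0/24 is a documentation range).

```python
from typing import Callable, NamedTuple, Optional, Tuple
from urllib.parse import urljoin


class Reply(NamedTuple):
    """Hypothetical stand-in for an HTTP response."""
    status: int
    location: Optional[str] = None
    body: Optional[dict] = None


REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url: str, fetch: Callable[[str], Reply], max_hops: int = 5) -> Tuple[str, Reply]:
    """Follow HTTP redirects until a non-redirect reply; give up after max_hops redirects."""
    for _ in range(max_hops + 1):
        reply = fetch(url)
        if reply.status in REDIRECT_CODES and reply.location:
            url = urljoin(url, reply.location)  # Location may be relative
            continue
        return url, reply
    raise RuntimeError("too many redirects")


# Invented two-server example: the first server no longer holds this address space.
FAKE_HTTP = {
    "https://rdap.rir-a.example/ip/192.0.2.1":
        Reply(301, "https://rdap.rir-b.example/ip/192.0.2.1"),
    "https://rdap.rir-b.example/ip/192.0.2.1":
        Reply(200, body={"objectClassName": "ip network", "startAddress": "192.0.2.0"}),
}

final_url, reply = follow_redirects(
    "https://rdap.rir-a.example/ip/192.0.2.1",
    lambda u: FAKE_HTTP.get(u, Reply(404)),
)
print(reply.status, final_url)
```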

...

7. Flexibility to support various policies? Okay, maybe someone at ICANN cares a lot about this, but few in the community care, especially given #3 above.

I cared quite a lot about this when WEIRDS was going, but the IANA-based bootstrapping and the tight contractual control make this feature rather less interesting.

...

I get the impression of ICANN internal pressure to force-feed the community RDAP to meet a date, rather than a desire to put forth a noteworthy advancement in RDDS. Rather than dictate compliance with a profile that *seems* a product of ICANN's doing, rather than a grass-roots initiative, why doesn't ICANN invest effort in coordinating a solution to the obvious need for authenticated/differentiated access?

I think Francisco's point is that ICANN is doing that co-ordination, but under a PDP; and therefore changes can't be made now. The argument I made before, and still think is true, is that standing up a mandatory profile for contracted parties that contained all the protocol features would mean lower barriers to deployment later. I think ICANN's staff disagree. I think informed people of good will can disagree about this; and in any case I have learned that they have a couple of regular consensus-policy deployment windows a year, which means at least that we can have specific targets. I'd still prefer my approach, though :)

...

And, if ICANN wishes to dictate something useful, federated authentication amongst registries would be a good start. Better yet, ICANN could be the authenticator - THAT would be useful.

I think Scott has already suggested using the federated-authentication stuff in http(s) to do some auth-provider work. ICANN's ability to authenticate a large number of some kinds of client would indeed seem to be a valuable contribution to that development/PoC effort. Best regards, A

John Levine

2:46 a.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

To pile on a little:

...

...
2. Standardized query/response/errors? Again, a stretch. 301, 302, 404, 429... big deal. Anyway, AWIP and CNRA have tried to do the first two.

For machine-to-machine communication, this is a very big deal. And it's not only response codes. ...

Anyone who thinks that standardized queries and responses are not a big deal has clearly never tried to write a WHOIS parser. Mine is over 3500 lines of Perl and still misses lots of cases. Standardized redirection is already a big win for IP address lookups, since the RIRs swap space around all the time, but there are plenty of domains where subtrees are handled by different entities, with .ac.uk vs. the rest of .uk being an obvious example. I am the registry for a bunch of tiny geographical subdomains of .us. Currently there's no way to ask about my registrants, but with RDAP there should be.
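A minimal sketch of why standardized status codes matter. With RDAP over HTTP (RFC 7480), a client's handling of success, redirection, absence, malformed queries and rate limiting reduces to a small lookup on the status code, instead of pattern-matching free-text WHOIS replies. The function name and action labels below are hypothetical, and the sketch assumes the HTTP request has already been made.

```python
from typing import Mapping, Optional, Tuple

REDIRECTS = {301, 302, 303, 307}


def next_action(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Decide what a simple RDAP client does with an HTTP response.

    Returns (action, detail); the action labels are invented for this sketch.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    if status == 200:
        return ("parse", None)                     # JSON body holds the answer
    if status in REDIRECTS:
        return ("follow", h.get("location"))       # another server is authoritative
    if status == 404:
        return ("not-found", None)                 # no such object
    if status == 400:
        return ("bad-query", None)                 # malformed query
    if status == 429:
        return ("back-off", h.get("retry-after"))  # rate limited
    return ("error", str(status))


print(next_action(302, {"Location": "https://rdap.rir.example/ip/192.0.2.1"}))
# ('follow', 'https://rdap.rir.example/ip/192.0.2.1')
```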

...

...
3. Ahem... extensibility? Really? Anyone wishing to support anything beyond the profile must undergo the dreaded RSEP, effectively muting this benefit.

There are over 200 ccTLDs.

...

...
5. Standardized bootstrapping? New gTLDs must all support whois.nic.<tld>

There are over 200 ccTLDs. It's also impressive how many of the WHOIS names for new gTLDs in the IANA database are not whois.nic.<tld>.

...

...
6. Standardized redirection for thin? There are three thin registries: com, net, tv. All provide the reference to registrar with the same key "whois server:"

There are over 200 ccTLDs. Oh, and .jobs is thin too, although it's hard to use since the name of its whois server in the IANA database is wrong. R's, John

Michele Neylon - Blacknight

11:33 a.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

Re: whois for new gTLDs - whois.nic.$tld will work no matter what is in the IANA DB - they might be using redirects / pointers of some kind, but I haven’t seen any that don’t work - and they’re contractually obliged to use that format. -- Mr Michele Neylon Blacknight Solutions Hosting, Colocation & Domains http://www.blacknight.host/ http://blog.blacknight.com/ http://ceo.hosting/ Intl. +353 (0) 59 9183072 Direct Dial: +353 (0)59 9183090 ------------------------------- Blacknight Internet Solutions Ltd, Unit 12A,Barrowside Business Park,Sleaty Road,Graiguecullen,Carlow,Ireland Company No.: 370845
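The convention Michele describes is easy to use in code. Below is a minimal Python sketch. It assumes plain WHOIS over TCP port 43 as described in RFC 3912: one CRLF-terminated query line, after which the server replies and closes the connection. The helper names are hypothetical, and the sketch does no referral following or response parsing.

```python
import socket


def whois_host_for(tld: str) -> str:
    """Conventional WHOIS host for a TLD, e.g. "shop" -> "whois.nic.shop"."""
    return "whois.nic." + tld.strip(".").lower()


def whois_query(query: str, host: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query (RFC 3912) and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The query is a single line terminated by CRLF.
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        # The server signals the end of its reply by closing the connection.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # WHOIS defines no character set, so decode leniently.
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois_query(name, whois_host_for(tld))` needs network access to the registry's server. The reply comes back as raw text because WHOIS output has no standard format.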

Mark Elkins

1:03 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

It would be great if "old" gTLDs did the same thing... and even ccTLDs, for that matter. On 07/02/2016 13:33, Michele Neylon - Blacknight wrote:

...

Re: whois for new gTLDs - whois.nic.$tld will work no matter what is in the IANA DB - they might be using redirects / pointers of some kind, but I haven’t seen any that don’t work - and they’re contractually obliged to use that format.

-- Mark James ELKINS - Posix Systems - (South) Africa mje@posix.co.za Tel: +27.128070590 Cell: +27.826010496 For fast, reliable, low cost Internet in ZA: https://ftth.posix.co.za

Michele Neylon - Blacknight

2:05 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

Mark - totally agree :) -- Mr Michele Neylon

On 07/02/2016, 13:03, "Mark Elkins" <gtld-tech-bounces@icann.org on behalf of mje@posix.co.za> wrote:

...

It would be great if "old" gTLDs did the same thing... and even ccTLDs, for that matter.


John Levine

4:33 p.m.

New subject: weirds bootstrap, was Search Engines Indexing RDAP Server Content

In article <56B740BA.6070208@posix.co.za> you write:

...

It would be great if "old" gTLDs did the same thing... and even ccTLDs, for that matter.

We went through this at great length in WEIRDS, and I'd encourage you to review the mailing list archive. We considered a bunch of in-band signals such as reserved names and SRV records, but we decided that we couldn't tell zone operators, particularly ccTLD zone operators, what to put in their zone files. I'm not thrilled with what we came up with, but having written an IP address RDAP client I'm using in small-scale production, I can report that it works fine and the coding is not difficult. I'd be happy if the gTLDs just kept the info in the IANA database up to date. I'm looking at you, Verisign. R's, John PS: For all of your gTLD WHOIS location needs, check out <tld>.ws.sp.am. Now with DNSSEC!
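John says the IP bootstrap coding is not difficult. Here is a minimal Python sketch of that lookup (not his client) against an RFC 7484-style IPv4/IPv6 bootstrap file: find the most specific prefix that contains the address and use its base URL. The registry contents and host names are invented. Preferring an https URL when several are listed is this sketch's own choice.

```python
import ipaddress
from typing import List, Optional

# Invented registry contents in the RFC 7484 layout (documentation prefixes).
BOOTSTRAP = {
    "version": "1.0",
    "services": [
        [["192.0.2.0/24"], ["https://rdap.rir-a.example/"]],
        [["192.0.2.128/25", "198.51.100.0/24"],
         ["http://rdap.rir-b.example/", "https://rdap.rir-b.example/"]],
    ],
}


def rdap_base_for_ip(addr: str, bootstrap: dict) -> Optional[str]:
    """Return the RDAP base URL whose prefix most specifically covers addr."""
    ip = ipaddress.ip_address(addr)
    best_len = -1
    best_urls: List[str] = []
    for prefixes, urls in bootstrap["services"]:
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            # Longest-prefix match; v4 and v6 entries never match each other.
            if net.version == ip.version and ip in net and net.prefixlen > best_len:
                best_len, best_urls = net.prefixlen, urls
    if not best_urls:
        return None
    https = [u for u in best_urls if u.startswith("https://")]
    return (https or best_urls)[0]


def ip_query_url(base: str, addr: str) -> str:
    """Build the RDAP IP lookup URL, e.g. <base>/ip/192.0.2.200."""
    return base.rstrip("/") + "/ip/" + addr


base = rdap_base_for_ip("192.0.2.200", BOOTSTRAP)
print(ip_query_url(base, "192.0.2.200"))
# https://rdap.rir-b.example/ip/192.0.2.200
```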

Marc Blanchet

4:40 p.m.

New subject: weirds bootstrap, was Search Engines Indexing RDAP Server Content

On 7 Feb 2016, at 11:33, John Levine wrote:

...

In article <56B740BA.6070208@posix.co.za> you write:

...
It would be great if "old" gTLDs did the same thing... and even ccTLDs, for that matter.

We went through this at great length in WEIRDS, and I'd encourage you to review the mailing list archive. We considered a bunch of in-band signals such as reserved names and SRV records, but we decided that we couldn't tell zone operators, particularly ccTLD zone operators, what to put in their zone files. I'm not thrilled with what we came up with,

No one was. It is an example of a compromise between various alternatives and design constraints. And as you wrote, I'm not sure we want to start that long discussion again. ;-)

...

but having written an IP address RDAP client I'm using in small scale production, I can report that it works fine and the coding is not difficult.

I'd be happy if the gTLDs just kept the info in the IANA database up to date.

Well, in fact, the RDAP bootstrap registry is just that, but with additional flexibility for RDAP. So the registry's value will depend on the correctness of its content. Marc.
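For domains, the RDAP bootstrap registry Marc refers to maps DNS labels to RDAP base URLs (RFC 7484). Below is a minimal Python sketch of a client-side lookup. It assumes the registry's `{"services": [[[labels...], [urls...]], ...]}` layout, and the entries and host names are invented. The longest matching suffix wins, and preferring https is this sketch's own choice.

```python
from typing import Dict, List, Optional

# Invented registry contents in the RFC 7484 layout.
DNS_BOOTSTRAP = {
    "version": "1.0",
    "services": [
        [["example", "test"], ["https://rdap.registry-a.example/"]],
        [["co.example"], ["http://rdap.registry-b.example/",
                          "https://rdap.registry-b.example/"]],
    ],
}


def rdap_base_for_domain(name: str, bootstrap: dict) -> Optional[str]:
    """Return the RDAP base URL for a domain name, or None if no entry matches."""
    table: Dict[str, List[str]] = {}
    for entries, urls in bootstrap["services"]:
        for entry in entries:
            table[entry.lower()] = urls
    labels = name.strip(".").lower().split(".")
    # Try the whole name first, then drop leading labels: the longest match wins.
    for i in range(len(labels)):
        urls = table.get(".".join(labels[i:]))
        if urls:
            https = [u for u in urls if u.startswith("https://")]
            return (https or urls)[0]
    return None


def domain_query_url(base: str, name: str) -> str:
    """Build the RDAP domain lookup URL, e.g. <base>/domain/example.test."""
    return base.rstrip("/") + "/domain/" + name


base = rdap_base_for_domain("shop.co.example", DNS_BOOTSTRAP)
print(domain_query_url(base, "shop.co.example"))
# https://rdap.registry-b.example/domain/shop.co.example
```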

...

I'm looking at you, Verisign.

R's, John

PS: For all of your gTLD WHOIS location needs, check out <tld>.ws.sp.am. Now with DNSSEC!

Hollenbeck, Scott

11:51 a.m.

New subject: weirds bootstrap, was Search Engines Indexing RDAP Server Content

...

-----Original Message----- From: gtld-tech-bounces@icann.org [mailto:gtld-tech-bounces@icann.org] On Behalf Of John Levine Sent: Sunday, February 07, 2016 11:33 AM To: gtld-tech@icann.org Subject: Re: [gtld-tech] weirds bootstrap, was Search Engines Indexing RDAP Server Content

In article <56B740BA.6070208@posix.co.za> you write:

...
It would be great if "old" gTLDs did the same thing... and even ccTLDs, for that matter.

We went through this at great length in WEIRDS, and I'd encourage you to review the mailing list archive. We considered a bunch of in-band signals such as reserved names and SRV records, but we decided that we couldn't tell zone operators, particularly ccTLD zone operators, what to put in their zone files. I'm not thrilled with what we came up with, but having written an IP address RDAP client I'm using in small-scale production, I can report that it works fine and the coding is not difficult.

I'd be happy if the gTLDs just kept the info in the IANA database up to date. I'm looking at you, Verisign.

What exactly are you looking at Verisign for? We're not running any production RDAP services. When the time comes to add information to the registry, it will be taken care of. Scott

Siemen Roorda

6:26 a.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

I would _love_ it for ccTLDs, but unfortunately nic.* is not always the registry's website. nic.ru is a good example of that. Not to mention registries that do not even know what whois is! Taking IANA as a leading source is what we do as far as the information exists, and maybe from that side a 'wrapper' like $tld.whois.iana.org could be provided. I know there is such functionality around but I've never found one that is 100% free of errors. (Okay, maybe such a feature at the IANA level wouldn't be either, but still...) Siemen On 07/02/16 14:03, Mark Elkins wrote:

...

It would be great if "old" gTLDs did the same thing... and even ccTLDs, for that matter.


-- Kind regards, Siemen Roorda Product developer Openprovider Hosting Concepts B.V. Willem Buytewechstraat 40 3024 BN Rotterdam The Netherlands Tel +31 (0)10 448 22 96 Fax +31 (0)10 244 02 50 www.openprovider.nl www.twitter.com/openprovider www.openprovider.co.uk www.twitter.com/openprovider_en www.openprovider.es www.twitter.com/openprovider_es The information contained in this communication is confidential and may be legally privileged. It is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. If you are not the intended recipient you are hereby (a): notified that any disclosure, copying, distribution or taking any action with respect to the content of this information is strictly prohibited and may be unlawful, and (b): kindly requested to inform the sender immediately and destroy any copies.

John Levine

12:38 a.m.

New subject: whois bootstrap, was [weirds] Search Engines Indexing RDAP Server Content

...

Taking IANA as a leading source is what we do as far as the information exists, and maybe from that side a 'wrapper' like $tld.whois.iana.org could be provided. I know there is such functionality around but I've never found one that is 100% free of errors.

I have a WHOIS locator list updated daily at <tld>.ws.sp.am. (WS is Whois Server.) It uses a variety of methods to find the servers: the IANA list, the common whois.nic.<tld>, and a manual list for the ones that are just hard to find. As far as I know it's correct and complete. Oh, and it's DNSSEC signed, too, although that doesn't help much since very few of the WHOIS names it points to are. R's, John
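The kind of "secret sauce" under discussion can be sketched as a fallback chain. This is a hypothetical Python outline, not John's implementation. It assumes a hand-maintained override table is consulted first, so that a wrong IANA listing can be corrected. Next comes the whois server from the IANA database, and last the conventional whois.nic.<tld> name if it resolves. The order, table contents and function names are all assumptions.

```python
import socket
from typing import Callable, Mapping, Optional


def host_resolves(name: str) -> bool:
    """True if the system resolver can find an address for name."""
    try:
        socket.getaddrinfo(name, 43)
        return True
    except socket.gaierror:
        return False


def locate_whois(tld: str,
                 iana: Mapping[str, str],
                 manual: Mapping[str, str],
                 resolves: Callable[[str], bool] = host_resolves) -> Optional[str]:
    """Guess the WHOIS server for a TLD using a fixed (assumed) order of sources."""
    tld = tld.strip(".").lower()
    # 1. Hand-maintained overrides win, so a wrong IANA entry can be corrected.
    if tld in manual:
        return manual[tld]
    # 2. The whois server recorded in the IANA database, if there is one.
    if iana.get(tld):
        return iana[tld]
    # 3. Fall back to the conventional name, but only if it exists in the DNS.
    guess = "whois.nic." + tld
    return guess if resolves(guess) else None
```

Passing `resolves` as a parameter keeps the DNS check swappable, which is also what makes the chain testable offline.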

Marc Blanchet

12:52 a.m.

New subject: whois bootstrap, was [weirds] Search Engines Indexing RDAP Server Content

On 8 Feb 2016, at 19:38, John Levine wrote:

...

...
Taking IANA as a leading source is what we do as far as the information exists, and maybe from that side a 'wrapper' like $tld.whois.iana.org could be provided. I know there is such functionality around but I've never found one that is 100% free of errors.

I have a WHOIS locator list updated daily at <tld>.ws.sp.am. (WS is Whois Server.)

It uses a variety of methods to find the servers: the IANA list, the common whois.nic.<tld>, and a manual list for the ones that are just hard to find. As far as I know it's correct and complete.

That is the whole problem. Everyone currently has their own secret sauce to find those. We need some authoritative way to get the servers one is looking for, in a predictable and standard way. RDAP bootstrap, while not perfect, does provide that capability. Marc.

...

Oh, and it's DNSSEC signed, too, although that doesn't help much since very few of the WHOIS names it points to are.

R's, John

Andrew Sullivan

7:18 a.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On Sun, Feb 07, 2016 at 03:03:54PM +0200, Mark Elkins wrote:

...

It would be great if "old" gTLDs did the same thing... and even ccTLDs, for that matter.

What would be great is if we ditched whois altogether in favour of RDAP, which has a mechanism to look this up for every top-level DNS name registry. Then we wouldn't have to rely on whois server name heuristics. I'm very much in favour of proceeding with RDAP; I just think the initial profile could be better. Best regards, A

Francisco Arias

7:39 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

On 2/3/16, 2:05 AM, "Stephane Bortzmeyer" <bortzmeyer@nic.fr> wrote:

...

On Wed, Feb 03, 2016 at 12:23:42AM +0000, Francisco Arias <francisco.arias@icann.org> wrote:

...
The search page (https://www.google.co.uk/search?q=site:rdg.afilias.info) appears to be the result of crawling links from the first link that appears there (http://rdg.afilias.info/rdap/help). The help page contains links to search and lookup examples that return several objects with their directly-related objects, which are in turn shown in the search results. This could have happened in web-Whois if someone were to publish a page containing example queries.

It seems to me that having a robots.txt at the root of the RDAP server would solve the problem (if you regard it as a problem). Is there anything in the tons of ICANN rules and requirements that prevents us from installing such a robots.txt?

User-agent: *
Disallow: /

There is nothing in the draft RDAP profile, or in any other ICANN rule that I'm aware of, that would prohibit a registry/registrar from doing what you suggest for both web-Whois and RDAP, Stephane. Regards, -- Francisco
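Python's standard `urllib.robotparser` can confirm what Stephane's two-line robots.txt does to a compliant crawler. Every path on the host becomes off-limits, including the help page that seeded the index. The domain lookup path below is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt Stephane proposes for the root of an RDAP server.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://rdg.afilias.info/rdap/help",
            "http://rdg.afilias.info/rdap/domain/example.info"):  # second path is hypothetical
    print(url, parser.can_fetch("Googlebot", url))
# both lines end in False
```

robots.txt is advisory. It keeps well-behaved crawlers out but does nothing to stop a client that ignores it.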

Hollenbeck, Scott

12:05 p.m.

New subject: [weirds] Search Engines Indexing RDAP Server Content

...

-----Original Message----- From: gtld-tech-bounces@icann.org [mailto:gtld-tech-bounces@icann.org] On Behalf Of Francisco Arias Sent: Tuesday, February 02, 2016 7:24 PM To: gtld-tech@icann.org Subject: Re: [gtld-tech] [weirds] Search Engines Indexing RDAP Server Content

I talked with Andrew about the email below and I think we clarified things. I thought I'd share with the list the assessment that Gustavo and I did on the issue. Andrew, please feel free to correct me.

Gustavo and I double-checked the draft RDAP profile and do not see any element in there that leads to exposing more data than the current Whois does, e.g., a domain name links to a few entities (e.g., registrant, registrar, admin, and tech contacts), a registrar, and zero or more name servers.

The search page (https://www.google.co.uk/search?q=site:rdg.afilias.info) appears to be the result of crawling links from the first link that appears there (http://rdg.afilias.info/rdap/help). The help page contains links to search and lookup examples that return several objects with their directly-related objects, which are in turn shown in the search results. This could have happened in web-Whois if someone were to publish a page containing example queries.
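A crawler can fan out from a single help page because RDAP responses (RFC 7483) embed `links` objects whose `href` points at related objects. Below is a minimal Python sketch that gathers every such `href` from a response. The response is an invented, heavily trimmed example, not real data from rdg.afilias.info.

```python
from typing import Any, List


def collect_links(obj: Any) -> List[str]:
    """Return every href found in "links" arrays anywhere in an RDAP response."""
    found: List[str] = []
    if isinstance(obj, dict):
        for link in obj.get("links", []):
            if isinstance(link, dict) and link.get("href"):
                found.append(link["href"])
        # Recurse into nested objects (entities, nameservers, ...).
        for value in obj.values():
            found.extend(collect_links(value))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(collect_links(item))
    return found


BASE = "https://rdap.registry.example"  # invented server
RESPONSE = {
    "objectClassName": "domain",
    "ldhName": "example.info",
    "links": [{"value": BASE + "/domain/example.info", "rel": "self",
               "href": BASE + "/domain/example.info"}],
    "entities": [{
        "objectClassName": "entity", "handle": "C-1", "roles": ["registrant"],
        "links": [{"value": BASE + "/domain/example.info", "rel": "self",
                   "href": BASE + "/entity/C-1"}],
    }],
    "nameservers": [{
        "objectClassName": "nameserver", "ldhName": "ns1.example.info",
        "links": [{"value": BASE + "/domain/example.info", "rel": "self",
                   "href": BASE + "/nameserver/ns1.example.info"}],
    }],
}

for href in collect_links(RESPONSE):
    print(href)
```

A crawler that fed each `href` back into its queue would, starting from one page of example queries, reach every contact and nameserver object those queries touch.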

In other words, the alluded behavior is not something enabled by RDAP or the profile.

Please let me know if we are missing something.

It's not about exposing more data. It's about making it even easier for that data to be extracted, indexed, archived, and accessed. Unauthenticated RDAP and the current profile proposal continue the WHOIS practice of making PII easily accessible to anyone who asks (including search engines). The fact that the issue isn't new doesn't make it any less of an issue. Scott
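Scott's alternative is to vary responses by client identity. Below is a minimal Python sketch of the server-side half of that idea, under a hypothetical policy. Unauthenticated clients keep entity handles and roles but lose the `vcardArray` contact data. The response also gains a remark of type "object truncated due to authorization", a remark type registered for RDAP. How the client is authenticated (for example via the OpenID Connect approach in Scott's draft) is out of scope here, so `authorized` is simply a boolean input.

```python
import copy
from typing import Any, Dict, List


def _strip_vcards(entities: List[Dict[str, Any]]) -> bool:
    """Drop vcardArray from entities and their nested entities; report any removal."""
    removed = False
    for entity in entities:
        if entity.pop("vcardArray", None) is not None:
            removed = True
        if _strip_vcards(entity.get("entities", [])):
            removed = True
    return removed


def redact_for_client(response: Dict[str, Any], authorized: bool) -> Dict[str, Any]:
    """Return the response a client should see under a hypothetical policy."""
    if authorized:
        return response
    out = copy.deepcopy(response)  # never mutate the stored object
    if _strip_vcards(out.get("entities", [])):
        out.setdefault("remarks", []).append({
            "title": "Redacted",
            "type": "object truncated due to authorization",
            "description": ["Contact details are available to authenticated clients."],
        })
    return out


# Invented, heavily trimmed domain response.
DOMAIN = {
    "objectClassName": "domain",
    "ldhName": "example.info",
    "entities": [{
        "objectClassName": "entity",
        "handle": "C-1",
        "roles": ["registrant"],
        "vcardArray": ["vcard", [["version", {}, "text", "4.0"],
                                 ["fn", {}, "text", "Jane Doe"]]],
    }],
}

public = redact_for_client(DOMAIN, authorized=False)
print(public["entities"][0])
# {'objectClassName': 'entity', 'handle': 'C-1', 'roles': ['registrant']}
```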


29 comments, 11 participants:

Andrew Newton
Andrew Sullivan
Francisco Arias
Hollenbeck, Scott
John Levine
luvingnc＠aol.com
Marc Blanchet
Mark Elkins
Michele Neylon - Blacknight
Siemen Roorda
Stephane Bortzmeyer