This is extremely interesting and important information. Could you please explain how registrants and users might access such a distributed database (technically) to modify/correct their data or just to view it?
Thank you in advance,
Nathalie
Sent from my iPhone
On Jun 25, 2016, at 6:25 AM, James Gannon <james@cyberinvasion.net> wrote:
Suffice it to say I agree with all of Andrew's points. I also had been feeling uncomfortable with our direction, but had not known the best way in which to bring it up. Thanks, Andrew, for doing it for me!
-J
On 25/06/2016, 03:03, "gnso-rds-pdp-wg-bounces@icann.org on behalf of Andrew Sullivan" <gnso-rds-pdp-wg-bounces@icann.org on behalf of ajs@anvilwalrusden.com> wrote:
Dear colleagues,
Apologies first. I'm not going to be in Helsinki. I'm in the middle of a move from NH back to Toronto, and it turns out that my movers' understanding of, "I need to leave on $date," entails arranging things such that goods will arrive after $date. Alas, in this case the goods arrive Monday. I will attempt to follow the ICANN meetings remotely next week, but I expect it will be tricky.
I have been deeply dissatisfied with the way the work is going, and I believe it is because I see a mismatch between what we are trying to do and the kind of system we are trying to do it to. In particular, I think we are trying to treat the RDS as a single monolithic system, and attempting to build "requirements" that match that assumption. Here is an effort to sketch why I think that. I didn't have time to write a short note, &c. &c. Sorry this is long.
Since the very introduction of the competitive-registrar model (and arguably before that), the RDS has been a distributed database. It is far less successful than the other distributed database we all know and love -- DNS -- but it is nevertheless distributed.
The distribution comes from different parties having various parts of the data. In so-called "thin" registries, this was always the case. The registry has names and nameservers, and since the invention of registrars knows who the registrar is. But if you wanted to know certain kinds of data, you had to ask the registrar in question.
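A minimal sketch of the thin-registry arrangement described above, assuming in-memory stand-ins for one registry and two registrars. All names, records, and the `lookup` helper are hypothetical and exist only to illustrate the referral step; they are not any real service's data model.

```python
# Hypothetical thin registry: it knows names, nameservers, and the registrar.
REGISTRY = {
    "example.org": {
        "nameservers": ["ns1.example.net", "ns2.example.net"],
        "registrar": "registrar-a",
    },
}

# Hypothetical registrars: each holds the contact data for its own customers.
REGISTRARS = {
    "registrar-a": {
        "example.org": {"registrant": "Example Org", "email": "hostmaster@example.org"},
    },
    "registrar-b": {},
}


def lookup(domain):
    """Ask the registry first, then follow its pointer to the registrar."""
    thin = REGISTRY.get(domain)
    if thin is None:
        return None
    # The registry can only say which registrar to ask; the contact data
    # lives with that registrar.
    contact = REGISTRARS[thin["registrar"]].get(domain, {})
    return {**thin, **contact}
```

Calling `lookup("example.org")` merges the registry's part of the answer with the registrar's part, which is the two-step query a client of a thin registry has to perform.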
Because in (say) 1999-2001 nobody had anything better than the whois/rwhois/whois++ protocol(s) to deliver this kind of data, a whole bunch of bad compromises got enshrined in policy. First, we continued to use whois and its descendants (anything on port 43) as the model for all of this. The plain fact is that whois was obsolete nearly at birth. It's a terrible protocol, and should be taken behind the ice house and put out of its misery.
Second, in order to "fix up" whois, clients were created all over the Internet that built in a bunch of assumptions about whom to ask for what data. The consequence of this was that clients routinely got bad data as they queried the wrong server. Old registrar data hung around even after a transfer. When I worked on the org transition from Verisign to PIR in 2003 (?), it took a long time before whois clients stopped asking Verisign about org data. And so on.
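The failure mode described above can be sketched as a client that ships with a frozen table of which server to ask. The server names and both tables below are made up for illustration; real clients differed in how they stored these assumptions.

```python
# Hypothetical table compiled into a whois client at release time.
CLIENT_BUILTIN = {"org": "whois.old-operator.example"}

# Hypothetical view of who is actually authoritative after a transition.
CURRENT_AUTHORITY = {"org": "whois.new-operator.example"}


def server_for(domain, table):
    """Pick a server for the domain's TLD from the given table."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return table.get(tld)


def asks_wrong_server(domain):
    """True when the client's built-in guess no longer matches reality."""
    return server_for(domain, CLIENT_BUILTIN) != server_for(domain, CURRENT_AUTHORITY)
```

Nothing in the whois protocol itself tells the client that its table is stale, so the mismatch persists until the client software is updated.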
Third, in an attempt to hack around the above technical flaws in an already-obsolete protocol, "thick whois" gained popularity in possibly the worst possible arrangement known to data science. Instead of insisting that registries hold the data and that registrars and everyone else treat the registry data as The Truth, we created "thick" whois in registries _without allowing registrars to stop their service_. Any half-competent database person will tell you that storing "the same data" in two places that don't have tight connections is an excellent way to create data inconsistency, but is not a good way to arrive at the truth. (Latterly, as though illustrating the tendency of people to double down on bad ideas, there have been suggestions that ICANN should run the One Giant RDS of the Universe and hold all the data in a central place. What could possibly go wrong?)
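A small sketch of the inconsistency problem, assuming two independent in-memory stores that start from the same record. The store names, the field name, and the helper functions are hypothetical.

```python
import copy

# Hypothetical record as first collected by the registrar.
registrar_db = {"example.org": {"registrant_email": "old@example.org"}}

# The "thick" registry receives a one-time copy, with no ongoing link.
registry_db = copy.deepcopy(registrar_db)


def update_at_registrar(domain, email):
    """The registrant corrects the record at the registrar only."""
    registrar_db[domain]["registrant_email"] = email


def copies_agree(domain):
    """True while both services would give the same answer."""
    return registrar_db[domain] == registry_db[domain]
```

After a single call to `update_at_registrar`, the two services answer differently, and a client has no way to tell from the answers alone which one is current.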
The thread running through this history of error is the idea that the RDS is one system. But like the DNS, it only appears to be one system. It's actually a "distributed database", where in this case the distribution is separable on organization lines. That is, registries -- including ICANN, who can be thought of in this case as both the registry and registrar for the root zone -- have some data. Registrars have some other data. Resellers and privacy/proxy services have yet other data. In many cases, the data does not need to be shared across these organizational lines to make it queryable by humans.
The reason that isn't clear to most of us is that whois -- the RDS we use today -- _was_ designed as a monolithic system. It was designed that way because back when it was created -- RFC 812 is from _1982_! -- the database _was_ a monolithic database. Whois (the protocol and the client program) continues to have all the deficiencies for distributed use that you might expect of a program or protocol designed to talk to exactly one authoritative service. Whois++ and rwhois attempted to graft onto this basic protocol some distributed operation, but the graft didn't really take and the ornamental shrub now looks like a weed.
People have nevertheless internalized the whois-based thinking, which is why we keep asking things like, "What data should be collected?" In a distributed system like this, that's barely interesting, for the commercial interests in this case all militate against collecting data that nobody needs for any function. Instead, we should ask what data should be collected _by different actors_. This implicitly involves describing what those actors are doing to require the data.
The nice thing, of course, is that protocol designers have done _a lot_ of this work for us, when they were working on RDAP. They did this because they were trying to come up with use cases for the protocol, which finally did away with the monolithic-system thinking of whois and offers us a protocol designed precisely to work in the distributed-database environment that is the actual registration system. That we even still have a work step that involves evaluating what protocol we're going to use for all this makes me a little ill.
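To make the distributed design concrete, here is a minimal sketch of how an RDAP client can work out where to send a domain query. It assumes a dictionary shaped like the IANA bootstrap data for domains (RFC 7484), in which each service entry pairs a list of TLDs with a list of base URLs, and it uses the `domain/<name>` path segment from RFC 7482. The TLD assignments and base URLs in the sample are made up, and `rdap_url` is a hypothetical helper, not part of any library.

```python
# Sample data shaped like the RDAP bootstrap registry for domains.
# The base URLs below are invented for illustration.
BOOTSTRAP = {
    "services": [
        [["org"], ["https://rdap.registry-one.example/"]],
        [["net", "com"], ["https://rdap.registry-two.example"]],
    ]
}


def rdap_url(domain, bootstrap=BOOTSTRAP):
    """Return the RDAP query URL for a domain, or None if no server is listed."""
    name = domain.rstrip(".").lower()
    tld = name.rsplit(".", 1)[-1]
    for tlds, base_urls in bootstrap["services"]:
        if tld in tlds:
            base = base_urls[0]
            if not base.endswith("/"):
                base += "/"
            return base + "domain/" + name
    return None
```

The client no longer carries built-in guesses about who holds what: it looks up the responsible server from published data and queries that server over HTTPS.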
It seems to me that we can just say that we have to embrace the distributed-database fact. First, it's a fact of how registration actually works now. If we don't agree with that, I think we should give up. Second, it's consistent with how every single other thing on the Internet that has not crashed and burned works. The Internet cannot scale depending on monolithic systems. And nobody has the power to impose one anyway.
Once we have done that, there are still important policy issues about what data ought to be collected by anyone, under what conditions they might reveal it to someone else (and who that someone else is), and so on. But there are empirical tests for whether some of the answers people are proposing really match the distributed nature of the system. If they don't, we can close off those avenues of inquiry, because they'll never be productive.
Best regards,
A
--
Andrew Sullivan
ajs@anvilwalrusden.com
_______________________________________________
gnso-rds-pdp-wg mailing list
gnso-rds-pdp-wg@icann.org
https://mm.icann.org/mailman/listinfo/gnso-rds-pdp-wg