Hi,

First, let me start by agreeing very strongly with Greg that we can make some big gains by distinguishing between what he calls "thin data" (and what I think of as "name data" -- that is, data about the name _as such_) and "contact data" (what I think of as "registration data" -- that is, data about the registration: who did it, whom you can contact in the event of trouble, how to do so, and so on).

In the interests of pushing forward along those lines, I'd like to take the position in this mail that the first class of these is one "tranche", and that each field in it can be considered separately. Below I consider each such field and the arguments for and against completely unauthenticated, public access to it. I'm not actually sure I agree with Greg that this data is not PII and is noncontroversial, but I certainly agree that it is _less_ PII and way less controversial.

On Wed, Dec 07, 2016 at 02:55:00PM +0000, Greg Aaron wrote:
called the THIN DATA. This is the basic data about a domain name registration:
the domain name,
For an RDS query about a domain name, this is the primary key by which the data must be fetched. Therefore, this is a necessary condition for an RDS at all. It must be included. Consequently, if someone disagrees that this is required data, that is someone who thinks we should not have an RDS. We should then have the discussion about whether we should have an RDS at all.
the sponsoring registrar name
For an RDS query about a domain name, this data is helpful to humans who are trying to track down the registration of the domain name. Since the point of an RDS is to allow someone who needs certain data about a registration to find that data, there may be an argument that being able to find out the source of the registration could be important. So, that is a reason to include the data. In addition, in a distributed system (so, for instance, if we reverted to thin registries, which a technology like RDAP makes easy), it is necessary to get a referral to the _authoritative_ source of the data, and since the data actually comes from registrars rather than registries, getting the sponsoring registrar is needed. (Whether the name is what's necessary for that is a different question; see below.)

One could argue that this data should not be included because it is extraneous and could be looked up another way. One could argue that this data should not be included because it gives those who wish to do unauthorized transfers additional information in service of that transfer. Registrars could argue that they don't want their domains under management leaked (because this would allow people to harvest numbers and profile registrar operations).
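To make the referral point concrete, here is a minimal sketch of how a client might pick the registrar's server out of a thin-registry answer. It assumes an RDAP-style JSON response in which the referral is carried as a link with `rel` set to `related`; that convention, the `referral_url` helper, and the example hostnames are assumptions for illustration, not something this mail specifies.

```python
from typing import Optional


def referral_url(response: dict) -> Optional[str]:
    """Return the first link marked as a referral, or None if there is none.

    Assumption: the registry marks the authoritative registrar source
    with rel == "related" in the response's "links" array.
    """
    for link in response.get("links", []):
        if link.get("rel") == "related":
            return link.get("href")
    return None


# Hand-written, hypothetical thin-registry response (not real data).
thin_response = {
    "objectClassName": "domain",
    "ldhName": "example.com",
    "links": [
        {"rel": "self", "href": "https://registry.example/rdap/domain/example.com"},
        {"rel": "related", "href": "https://registrar.example/rdap/domain/example.com"},
    ],
}

print(referral_url(thin_response))
```

A client would then repeat its query against the returned URL to reach the registrar-held data.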
and ID,
The arguments for and against here are the same as for the registrar name, except for the human consumption part. This ID is much preferable for automatic handling of the data.
the domain's status(es) ,
One could argue that this data needs to be public because one needs to know whether a name ought to be working on the Internet.

One could argue that this data should not be included because most of it is not directly relevant to whether a domain name ought to be working. (For instance, whether an update is pending is not necessarily relevant to whether a name ought to resolve on the Internet right now.) Moreover, one could argue that at least some status values radiate information about what a registrant may have done, and also potentially support attempts to game the registration system to obtain a domain name contrary to the interests of the previous registrant.
created- [date]
One can argue that this data needs to be public in order that one can understand whether the domain name one is querying about is in fact the name that is registered. For instance, if I want to know about example.com that was registered in 1998, and I get a response about a name created in 2017, then that tells me that the domain I am naming is not in fact the same name as the one that is currently registered. (A way to think about this is that the RDS is an atemporal database, but we often ask things that have an implicit temporal reference.) The counter-argument is that the above use is an indirect way of achieving a unique key, and the correct response would be to use unique IDs (perhaps Registry Object IDs or ROIDs) to uniquely identify the domain name rather than proxying by date.
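As a minimal sketch of the created-date check described above: if we know a name was registered at some moment, and the RDS reports a creation date later than that moment, the current registration cannot be the one we had in mind. The function name and the sample dates are hypothetical.

```python
from datetime import datetime, timezone


def same_registration(known_active: datetime, created: datetime) -> bool:
    """Return True if the current registration could be the one we knew about.

    known_active: a moment at which we know the name was registered.
    created: the creation date the RDS reports for the current registration.
    A creation date after known_active means the name was re-registered.
    """
    return created <= known_active


# We believe example.com was registered as of mid-1998 (illustrative date).
known = datetime(1998, 6, 1, tzinfo=timezone.utc)

print(same_registration(known, datetime(1995, 8, 14, tzinfo=timezone.utc)))  # True
print(same_registration(known, datetime(2017, 1, 10, tzinfo=timezone.utc)))  # False
```

Note that a True result only means the dates are consistent; it does not prove the registration is the same one, which is part of why a unique ID is the cleaner answer.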
updated- [date]
One can argue that this data needs to be public in order to aid in troubleshooting: if a name worked an hour ago and one can see the updated timestamp as having happened within that window, then the troubleshooter may infer that the update may be a factor in the failure. The counter-argument is that this data radiates information about actions taken on a domain name, and therefore could be used as part of an analysis that yields PII even if it is not PII itself.
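The troubleshooting inference above amounts to an interval test, sketched here under the assumption that the troubleshooter knows when the name last worked and when the failure was first seen. The function name and timestamps are hypothetical.

```python
from datetime import datetime, timezone


def update_in_failure_window(last_known_good: datetime,
                             failure_seen: datetime,
                             updated: datetime) -> bool:
    """Return True if the registration was updated between the last time the
    name was known to work and the time the failure was observed."""
    return last_known_good <= updated <= failure_seen


worked_at = datetime(2016, 12, 7, 12, 0, tzinfo=timezone.utc)
failed_at = datetime(2016, 12, 7, 13, 0, tzinfo=timezone.utc)

# Updated 40 minutes after the name last worked: the update is a suspect.
print(update_in_failure_window(
    worked_at, failed_at,
    datetime(2016, 12, 7, 12, 40, tzinfo=timezone.utc)))  # True

# Updated the day before: the update is unlikely to explain the failure.
print(update_in_failure_window(
    worked_at, failed_at,
    datetime(2016, 12, 6, 9, 0, tzinfo=timezone.utc)))  # False
```

A True result only suggests the update may be a factor; it does not establish cause.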
expiration date[s]
One can argue that this data needs to be public in order to understand whether there is a threat to ongoing operations. One can argue that this data should not be public because it does not directly aid Internet operations, and can provide help to those who would attempt to game the registration system to "take over" a domain.
nameservers.
One can argue that this data needs to be public because it helps in troubleshooting failures: if a domain is not working and the DNS and registration data do not match, that may be the source of the problem. Follow-on efforts might include finding the gap between the name servers and registration system, waiting for the propagation time of the registry to pass, or whatever. Moreover, in principle if the registry and DNS are in harmony this data is already public, so there is no harm in including it in another public repository.

It is hard for me to come up with an argument why this should not be public except for the case where someone thinks the RDDS is a bad idea in general.

I hope this outline helps in narrowing the discussion about these data elements.

Best regards,

A

--
Andrew Sullivan
ajs@anvilwalrusden.com