At 19:53 09/09/2011, Naela Sarras wrote:
From: doc <raymond.doctor@gmail.com>
Date: Fri, 19 Aug 2011 20:13:48 -0700
To: "vip@icann.org" <vip@icann.org>
Subject: WHITE PAPER ON "DEFINITIONS" : YOUR COMMENTS AND FEEDBACK REQUESTED
Dear colleagues,

I have drafted a white paper on the definitions and questionnaires put forward at the Singapore meet, which I attended remotely. The paper comprises three sections:

2.1. On Unicode
2.2. DNS
2.3. The main part of the paper, which deals with the notion of variant-hood and links variants with a script typology to arrive at a "unified theory" of variants.
Dear Naela,

I am currently overloaded, but I will try to read all of the documents sent by Raymond. However, I wish we would first reach a clear and common understanding of where we are and what we are currently doing before we try doing anything else.

The Internet is a communication process that aims at permitting every human and machine to relate digitally to one another. As such, it is the most complex system ever built by humans, and the first one that has attained a universal nature.

1. VIP wants to create some new order in the use of this system by replacing supposedly bijective resolution/registration relations (one name -> one IP) with surjective relations (several variant names -> one IP). I say "supposedly" because the DNS system is already surjective (the same IP can support several hosts, as with HTTP/1.1). This means that the bijection is today "one name -> one IP + one name -> one host". If we want to be complete, multicast addressing is intensely injective, in the one-to-many sense (one name -> multiple hosts).

2. We also have an additional problem, which is IDNA. IDNA introduces four major issues:

- Punycode does not transport the characteristics of a variance (what makes the variant equivalent) into ASCII. To my knowledge, the impact of this on security and on the certain identification of the destination has not been studied yet.
- Punycode is not complete, because the metadata injection method has not been defined at this time. This method is necessary for supporting, for example, French majuscules, which may or may not lead to a transliteration into uppercase.
- IDNA is an incorrect architecture on the user side that has to be changed, because it is defined as being supported at the application level. On the client side, several applications with different versions or parameters may, therefore, resolve different "address+domain-name"s.
On the host side, one becomes dependent on the distant application architecture, and one does not know for sure (otherwise, this is a VPN) what may happen on the user side. Anyway, the relation becomes:
"one out of several names -> client punycode -> server punycode -> IP + one out of several names -> host -> application".
Sometimes the host/application dichotomy will be reduced, but we have to live with it for now and be sure that it does not introduce too many discrepancies or security risks.
- IDNA is based upon Unicode. IDNA2008 has reduced the impact of the use of Unicode and of its versioning. However, it has not eliminated the noise, limitations, and constraints introduced by the use of a foreign system in the middle. "Foreign" in the sense that ISO 10646 was not designed to support IDNA. This means that the relation now actually becomes:
"one out of several names -> unicode -> client punycode -> server punycode -> unicode -> IP + one out of several names -> host -> application"

3. We have another important problem, which is IPv6. IPv6 provides each Internet user with:

- a way to be independently called.
- more IIDs (the second part of the IPv6 address, which for clarity I name IDv6) than the whole number of IP addresses on the existing Internet.

It is, therefore, possible for every user to scale his/her naming scheme accordingly. There is no technical restriction to that; it is just a matter of the database size on his/her PC. Plug and Play will most probably result in such weird local name-spaces, populated by different SDOs with their own possible support of variants. This should lead ICANN to publish variant support rules in a way that other SDOs can use and adapt, and to adopt a strategy that supports the transition to such a brave new naming world.

4. All this is obviously subject to information theory and to algorithmic information theory <http://en.wikipedia.org/wiki/Algorithmic_information_theory>, which takes into account that domain names are information to processes and people.

Let's look at the issue as a general issue for the general DNS family of systems: DDDS <http://en.wikipedia.org/wiki/Dynamic_Delegation_Discovery_System>. The DDDS should be reversible, like the DNS. Do we want such systems to be transparently reversible to variants, and how do we make them so? That is, if a variant is entered and results in an IP+host+application, how do we make sure that the reversion (the reverse process operation) does not result in another variant? This calls for some additional implicit, passive, referent, or active metadata (i.e. in the copper, in the header, in the context, or in the system's intelligent dynamic). Architecturally, our chain becomes:
"one out of several names -> metadata -> unicode -> client punycode -> server punycode -> unicode -> metadata -> IP + one out of several names -> host -> application"

5. Then, there are morphological, semantic, and pragmatic issues to be considered by the linguists (e.g. cf. Raymond). Not a small task, but one that has to be carried out within the framework I describe.

6. Then, we have the multilinguistic problem of homography, i.e. finding a canonicalization algorithm to prevent the signs of a script used by one language from being confusable with signs used by the script of another language. We started from linguistic diversity and its implications, and we have to check what we decide against the consequences for linguistic mutuality in the linguistic ecosystem.

Now, what do we have that will enable us to discuss this? We have seven fundamental concepts that we can define "à la" Gregory Bateson:

- data: the differences necessary for a process.
- information: the differences that make a difference (Bateson).
- variants: the differences that make no difference.
- canonicalization: reducing the unnecessary differences.
- consistency: the differences do not conflict.
- protocol: what documents data interchanges.
- languages: human communication protocols.

This means that every other notion that we may need (glossary [I fully agree with Raymond here]) has to be referenced in relation to new concepts that we first have to accept as pertinent and coherent with the seven master concepts above.

Why so? Because we need to ensure that we do not introduce any flaws (logic, security, etc.) into the reasoning and its consequences. This is based upon RFC 1958 (we are to be ready for every possible "change"; here, a new kind of variant) and RFC 3439 (in a very large system like the Internet, whose naming is larger than the Internet itself since it may extend to other technologies, the prevalent principle is the principle of simplicity). Reasoning at the conceptual level gives us a better chance to keep things simple and coherent at the operational level.

jfc
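
Editorial note: the reversibility concern in point 4, and the first two IDNA issues in point 2, can be illustrated with a minimal sketch. It rests on several assumptions that are not in the message above: it uses Python's built-in "idna" codec, which implements the older IDNA2003 rules (with case folding) rather than IDNA2008; the label "École", the names under "example", and the documentation address 192.0.2.1 are all hypothetical.

```python
# Sketch only. Assumptions: Python's built-in "idna" codec (IDNA2003 with
# nameprep case folding, NOT IDNA2008); hypothetical labels and address.

def to_ascii(label: str) -> bytes:
    """Encode one label to its ASCII-compatible ("xn--") form."""
    return label.encode("idna")

def to_unicode(ace: bytes) -> str:
    """Decode an ASCII-compatible label back to Unicode."""
    return ace.decode("idna")

upper = "École"   # French majuscule form
lower = "école"   # lowercase form

ace_upper = to_ascii(upper)
ace_lower = to_ascii(lower)

# Both forms collapse to the same ASCII label: the difference between
# them is not transported into ASCII.
same_ace = ace_upper == ace_lower

# Reversing the process yields the lowercase form, i.e. another variant
# than the one that was entered.
round_trip = to_unicode(ace_upper)
reversible = round_trip == upper

# Toy forward table: several variant names -> one IP (hypothetical data).
forward = {
    "école.example": "192.0.2.1",
    "ecole.example": "192.0.2.1",
}

# Inverting it gives one IP -> several names, so the reverse operation
# alone cannot tell which variant was originally entered.
reverse: dict[str, list[str]] = {}
for name, ip in forward.items():
    reverse.setdefault(ip, []).append(name)

print(same_ace, round_trip, reversible)
print(reverse["192.0.2.1"])
```

In this sketch, the information about which variant was entered is lost on the way to ASCII and cannot be recovered on the way back without some extra metadata, which is the gap the "metadata" steps in the chain of point 4 are meant to fill.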