With the exception of the hyperbole at the outset and some slight garbling on confidence level definitions, I largely agree with George.

The extremely low response rate and (more importantly) the lack of randomness in the sample of responders essentially prohibit one from drawing any conclusions about the population as a whole. If the selection were random, then the survey margin of error would be about +/- 18% at a 95% confidence level. This would mean that, for any reported percentage, we could be 95% sure the true figure for the whole membership lies within 18 points of the survey result, with a 5% chance that it lies further away.
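
For anyone who wants to check the margin-of-error figure, here is a minimal sketch in Python. It assumes a simple random sample, the worst-case proportion p = 0.5, z = 1.96 for 95% confidence, and a finite population correction. The function name and parameters are my own illustration, not anything taken from the study.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample.

    Assumptions (not from the study): worst-case p = 0.5, z = 1.96 for a
    95% confidence level, and a finite population correction.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    finite_population_correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * finite_population_correction

# 33 respondents out of roughly 6600 members
moe = margin_of_error(33, 6600)
print(f"margin of error: +/- {moe:.1%}")
```

With these inputs the result comes out at roughly 17%, in line with the rough figure quoted above. It applies only if the 33 respondents had been drawn at random, which they were not.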

However, "random" means that we picked the 33 members that responded essentially out of a hat filled with the names of the 6600 members. But that is not what happened. The survey probably received responses from the most DNS-savvy members: those who found the purpose of the survey interesting, or to whom the questions seemed more straightforward. This significantly skews the results. George’s email demonstrates this in more detail.

I don’t think the survey deals with the skewed data set fairly or honestly. The survey characterizes the findings as traits of the entire membership rather than as traits of the population that responded. This can’t be defended.

For instance, I don’t think it is correct to say: "Vast majority (97%) of members registered domain names in past 24 months, with 9 in 10 registering new TLDs," and "9 in 10 members have registered new TLD domains in the past two years in the Sunrise Period." I think new TLD owners would be very surprised (and happy) to hear this. It would be accurate to say that "97% of the respondents registered…." (See slides 9 and 11)

More harmful to the credibility of the study are statements such as:

3 in 4 members (76%) have incurred costs for internet monitoring of trademarks in the past 2 years, with more than half (57%) of the members spending $10k or more. (see slide 12)
On average, INTA members spend $150,000 per year on defensive actions (see slides 10 & 27)

These are the types of quotes that find their way into print and become believed. (“INTA members spend $150K each in defensive efforts, a ~$1 billion cost to industry!”) As George noted, it was generally the larger companies that responded, so this survey cannot establish that the remaining (smaller) INTA members each average $150,000 per year in defensive spend.

There is another interesting facet to the asserted $150K / year spend rate. One company spent $5.2MM. Assuming this $5.2MM spend was over a two-year period, the other 32 respondents averaged (33 x $292K - $5.2MM) / (32 x 2) = about $69,000 / year. So, setting aside that one outlier, the per-year spend by the brand owners that chose to answer the study is less than half of what the study states. Why didn’t the study make this clear? (see slide 10).
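
The outlier arithmetic above can be reproduced with a short Python sketch. The inputs are the figures used in the formula. Treating $292K as the two-year average per respondent, and the $5.2MM as a two-year total, are assumptions, and the variable names are only illustrative.

```python
respondents = 33
mean_two_year_spend = 292_000  # per-respondent average; assumed to cover two years
outlier_spend = 5_200_000      # largest single respondent; assumed to cover two years
years = 2

# Total spend across all respondents, then remove the outlier
total_spend = respondents * mean_two_year_spend
rest_total = total_spend - outlier_spend

# Per-year average with and without the outlier
all_per_year = mean_two_year_spend / years
rest_per_year = rest_total / ((respondents - 1) * years)

print(f"all 33 respondents: ${all_per_year:,.0f} per year")
print(f"other 32 respondents: ${rest_per_year:,.0f} per year")
```

Under those assumptions the average for the other 32 respondents is a little over $69,000 per year, against about $146,000 per year when the outlier is included.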

I am not sure of the purpose of the study but there are uses that can be made of it:

There was one conclusion I could draw. The study states that UDRP and Sunrise were the favored rights protection mechanisms, used to a major or moderate extent by 67% and 64% of the respondents respectively. The next most utilized RPMs were Trademark Claims and URS (by 36% and 27% respectively). To me this says that, to those who are in the know, Sunrise is a highly valued RPM and, therefore, should be continued. (Sorry, George) (see slides 15 and 51)

Also, the study makes clear one fact that we had already supposed: that businesses are not aware of new gTLDs and domain utility in general. There are several data sets that point to this. Rather than limiting education efforts to identifying costs and targeting abuse prevention and mitigation, brand education could also describe the benefits of domains as strategic tools that provide greater access to products and indicia of reliability to brands’ customers.

I know this was way pedantic. Sorry. I can’t be on the call as it is at 4AM my time but I’d be pleased to respond to comments or questions.

Best regards,

Kurt

On Aug 29, 2017, at 10:05 PM, George Kirikos <icann@leap.com> wrote:

Hi folks,

I'm not sure how many have had a chance to read the INTA materials for
tomorrow's call yet, or have any background in statistics, but the
survey has truly deep and fatal flaws, making any conclusions drawn
from it entirely unreliable and non-robust.

I could write 50 pages on this (I've read the report three times now,
in horror), but I'll keep it relatively brief (and make these
statements in advance of the call, so that Lori or INTA/Nielsen have a
chance to rebut).

The entire basis of statistical inference is that one can make
statements about an entire population with a certain level of
confidence using only data from a subset of that population (i.e. the
sample in question). Prerequisites are that (a) the sample be random,
and (b) the sample be of sufficient size. INTA's study fails on both
counts (self-selected and unrepresentative sample, and a mere 33
responses).

INTA claims to represent 7,000 organizations as members:

https://www.inta.org/About/Pages/Overview.aspx

While they acknowledge on page 5 of the slides the small sample size
and suggest "some caution", alarm bells should be ringing regarding
that small sample size. Page 6 then demonstrates how unrepresentative
and non-random that sample is, with 52% of the 33 respondents having
total revenue exceeding $5 billion/year, and a whopping 77% (27%+52%)
having revenues exceeding $1 billion. This is hardly representative of
typical TM owners. Similarly, 39% of this sample had 25,000 or more
employees, and 78% (39%+39%) had 5,000 or more employees.

All throughout the report, the slides say "INTA members" (i.e. wrongly
attempting to extrapolate and assert a truth about the entire
population, rather than limiting the statements to be applicable only
to the sample of 33 respondents).

Basic sanity checks were not done with those
extrapolations/inferences. On page 25, the report asserts that "more
than 4 in 10 members have applied to operate a new TLD"?

45% of 7000 members implies 3,150 INTA members applied for new gTLDs.
That's not correct. Total applications by everyone numbered 1,930 -- see
https://newgtlds.icann.org/en/program-status/statistics, and the
number by brand owners is a subset of that total (664 according to
https://icannwiki.org/Brand_TLD and that will be a bit high, due to
multiple applications). If one extrapolated that to the entire
universe of trademark holders (i.e. including non-INTA members),
millions of TM owners, it would be even more obvious how
unrepresentative and non-random the data in this sample is relative to
a "typical" TM holder. This sample is highly skewed to the largest of
the large organizations who happened to self-select a response to this
survey.

All throughout the report, important data on confidence intervals is
missing, obscuring the fact that the level of confidence is extremely
low (and the margin of error is high) due to the small sample size.
[confidence intervals are statements like "+/- 5%, 19 times out of 20"]

There are actually calculators that let one know how big a sample
should be, in order to have a certain level of confidence and/or a
margin of error.

e.g. see: https://www.surveymonkey.com/mp/sample-size-calculator/

For a population size of 7000 members (INTA's total membership) and a
95% confidence level, with a huge 10% margin of error, you'd still
need 95 survey responses. Yet, there were only 33 responses. This is
particularly important to be kept in mind for charts with percentages
(pp. 17 and beyond), where the margin of error, even if sampled
properly, would be enormous. Furthermore, those would have had to
have been RANDOMLY sampled responses to be proper, which we know isn't
the case. If you wanted smaller margins of error, say +/- 5%, you need
an even larger sample size (in this case, 365). Another useful
calculator is at:

https://www.surveymonkey.com/mp/margin-of-error-calculator/
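
For those who prefer to check the sample-size figures without an online
calculator, here is a minimal sketch in Python. It assumes the standard
formula for a proportion with a finite population correction, a
worst-case proportion p = 0.5, and z = 1.96 for 95% confidence. Whether
the calculators linked above use exactly this formula is an assumption,
and the function name is only illustrative.

```python
import math

def required_sample_size(population, margin, p=0.5, z=1.96):
    """Responses needed for a given margin of error at ~95% confidence.

    Assumes simple random sampling, worst-case p = 0.5, and a finite
    population correction (standard textbook formula).
    """
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it to account for the finite population
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# 7000 members, at 10% and 5% margins of error
n_10 = required_sample_size(7000, 0.10)
n_5 = required_sample_size(7000, 0.05)
print(n_10, n_5)
```

Under these assumptions the sketch gives 95 responses for a 10% margin
and 365 for a 5% margin, against the 33 actually received. Both figures
still presume randomly sampled responses.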

ICANN has done surveys, by Nielsen even, that didn't suffer from these
deficiencies, e.g. see:

https://newgtlds.icann.org/en/reviews/cct/registrant-survey-faqs-25sep15-en

A key takeaway from that work was "Due to a low response rate to
emailed invitations to complete the survey, ICANN then worked with
Domain Tools to procure a larger sample of WHOIS records." They took
greater care in that study to have *randomized* samples, too, along
with the larger sample size.

While it is somewhat interesting to have a glimpse into brand
protection of some of the largest companies, ultimately this study is
not robust.

In summary, any conclusions from this INTA study really need to be
taken with a grain of salt, due to the small sample size, combined
with the non-random and unrepresentative sample itself. Indeed, many
of the conclusions need to be read as the *opposite* of what the study
suggests (i.e. if defensive costs are $150K/year for companies with $5
billion+ in revenues, that's a drop in the bucket, and would be much,
much smaller for a "typical" TM owner). To correct these deficiencies,
future surveys need to be random (easily done, e.g. randomly sample the
USPTO database or other national registries) and have a much larger
sample size. Understandably, that costs money, but that's what it
takes to do things properly.

Sincerely,

George Kirikos
416-588-0269
http://www.leap.com/

On Mon, Aug 28, 2017 at 4:02 PM, Mary Wong <mary.wong@icann.org> wrote:
Dear all,

The proposed agenda for our next Working Group call, scheduled for 0300 UTC
on Thursday 31 August, is as follows:

1. Roll call (via Adobe Connect and phone bridge only); updates to Statements
of Interest
2. Review and discuss results of INTA Cost Impact Survey
3. Next steps/next meeting

For Agenda Item #2, please review the survey results here:
https://community.icann.org/download/attachments/61606864/INTA%20Cost%20Impact%20Report%20revised%204-13-17%20v2.1.pdf?version=1&modificationDate=1500376749000&api=v2

Lori Schulman of INTA, and a member of this Working Group, also did a
presentation of the results to the Competition, Consumer Protection &
Consumer Trust (CCT) Review Team recently that may be helpful to review:
https://community.icann.org/download/attachments/61606864/ICANN%20New%20gTLD%20Survey%20Update%2010May%20Final.pdf?version=1&modificationDate=1501098808000&api=v2.
We are hopeful that Lori will be able to join us for this call, to
facilitate our review and discussion.

Thanks and cheers

Mary

_______________________________________________
gnso-rpm-wg mailing list
gnso-rpm-wg@icann.org
https://mm.icann.org/mailman/listinfo/gnso-rpm-wg