Hi All,

I had a kind developer at Tucows 'screen scrape' all of the PPSAI public comments. This means they wrote a program that visited all* of the submitted comments and captured:

- the sender
- the subject
- the body of the message
- a URL for any attachments
- a URL for the comment itself online

To those fields I've added:

- Has Attachment (Y/N): allows for easy filtering of comments with attachments
- NameCheap (Y/N): this flag is set if the message body contains the words "regardless of whether the request comes from a private individual", which come from the templated Namecheap comments.**
- Word count: allows for sorting by comment length

Screen scraping is never exact, and it has a tough time with some formatting. By and large, though, it's pretty good, and I've found it useful so far for triaging and prioritizing comments.

Caveats:

* I know it's missing about 15 or so comments. I've figured out a way to identify which are missing and will send those along tomorrow.

** There are many Namecheap comments where the sender chose to write their own text, so the above phrase is not included; these don't have the Y flag. Similarly, many with the flag will include extra content the sender chose to add. These can be identified by applying a filter on the NameCheap column and then sorting by word count. The above phrase was chosen because it was long enough to be unlikely to show up in other comments. This is obviously not a perfect way of identifying those comments.

With a bit of Excel expertise you should be able to filter and sort the submitted comments as you see fit.

We have an obligation to read what's been submitted, and I hope you find the attached makes reading the comments easier, and that it's helpful in understanding what the public is telling us.
Graeme

(Also, apologies to ICANN for the punishment we gave their webservers while testing and scraping the comments)

--
_________________________
Graeme Bunton
Manager, Management Information Systems
Manager, Public Policy
Tucows Inc.
PH: 416 535 0123 ext 1634
Hello Again All,

Please find attached a new version of my spreadsheet, which now includes about 15 or so missing comments. All of them were templated.

Graeme

On 2015-07-13 11:07 PM, Graeme Bunton wrote:
> [quoted text of the previous message omitted]
_______________________________________________
Gnso-ppsai-pdp-wg mailing list
Gnso-ppsai-pdp-wg@icann.org
https://mm.icann.org/mailman/listinfo/gnso-ppsai-pdp-wg