I'm thinking of this at a higher layer: balancing user privacy against the desire to collect data for analysis. How much control should the user have over the decision to publish or not publish? In option #3, publication of the data is almost an afterthought (active opt-in). It requires the user to do something extra, which they might skip or forget once they see their results. Option #2 means the user has to actively do something to NOT publish (active opt-out).
-Ken
On Wed, Jul 7, 2021 at 6:14 AM Andrew McConachie <andrew.mcconachie@icann.org> wrote:
Hi Afifa,
I don’t see how adding hashes solves anything.
Here is how I understand this option working as designed:
1) User runs tool
2) Tool writes to a file
3) User runs publish_data.sh script
4) publish_data.sh reads file from disk and pushes data to repository
If the file is edited after #2 you either get bad data or you get a broken push in #4. Adding a hash would theoretically prevent bad data and force a push failure, but there’s nothing stopping the user from recomputing the hash after they've edited the file.
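To make the recomputation point concrete, here is a minimal Python sketch. It assumes a plain SHA-256 digest recorded next to the output; the JSON payload and the field name rtt_ms are made up for illustration and are not the tool's actual format.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


# Step 2: the tool writes its results and records a digest (hypothetical format).
original = json.dumps({"rtt_ms": 42}).encode()
recorded_digest = sha256_hex(original)

# The user edits the results before running the publish step.
edited = json.dumps({"rtt_ms": 4}).encode()

# Checking against the recorded digest does catch the edit...
edit_detected = sha256_hex(edited) != recorded_digest

# ...but nothing stops the user from recomputing the digest over the edited
# data and replacing the recorded one, after which the same check passes.
forged_digest = sha256_hex(edited)
forgery_passes = sha256_hex(edited) == forged_digest
```

An unkeyed hash that the user can regenerate only guards against accidental corruption, not against deliberate edits.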
And adding hashes isn't going to help if #2 fails because the tool doesn’t have write access to disk.
You can avoid all this mess by simply never writing anything to the local disk.
I’m happy to be overridden and told that I should shut up because I’m ICANN staff. But I only see future headaches if the Caucus decides the tool requires read/write interaction with the local disk.
Thanks, Andrew
On 7 Jul 2021, at 11:53, Afifa Abbas <afifa.abbas118@gmail.com> wrote:
Sorry to interrupt the conversation. I support Ken's 3rd option. To check the integrity of the file, we can use hashes to confirm that the data has not been edited. The hash would need to be checked twice, before upload and after upload, and the two values should match. Just a thought; open to suggestions.
Regards, Afifa
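The double check described in the message above could look roughly like the following Python sketch. It assumes SHA-256 and assumes the repository can hand back the bytes it stored; the function name upload_is_intact and the sample payload are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def upload_is_intact(digest_before_upload: str, stored_copy: bytes) -> bool:
    """Compare the pre-upload digest with a digest of what the repository stored."""
    return digest(stored_copy) == digest_before_upload


payload = b'{"rtt_ms": 42}'      # hypothetical tool output
before = digest(payload)         # first check: taken before upload

# Second check: recompute over the copy the repository reports it received.
intact = upload_is_intact(before, payload)
corrupted = upload_is_intact(before, payload[:-1])  # simulated truncated upload
```

A check of this shape detects changes between the first digest and the stored copy. It cannot detect an edit made before the first digest is taken.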
On Wed, 7 Jul 2021, 14:43 Andrew McConachie, <andrew.mcconachie@icann.org> wrote:
On 6 Jul 2021, at 17:51, Ken Renard <kdrenard2@gmail.com> wrote:
3. User runs the tool and output is generated. Users are then asked (and encouraged) to take an independent action to publish the data (such as run a publish_data.sh script). The user may review the data before executing the action to publish.
Hi Ken,
This one is tricky from a quality control standpoint.
First, because it requires that the tool store data locally somewhere, and then be able to retrieve it later. This can break for a number of reasons.
Second, it gives the user a chance to edit the file before it is sent. With this choice we’re going to end up with lots of user-edited data entries, or failed data push events where the user tries to edit the data and breaks its format, thereby breaking the whole process.
--Andrew
_______________________________________________
rssac-caucus mailing list
rssac-caucus@icann.org
https://mm.icann.org/mailman/listinfo/rssac-caucus