PII = personally identifiable information. (Anyone who can address the question probably already knows that, but I was curious and figured I'd spare others the look-up.)
--
http://FarmBillPrimer.org
http://www.BaltimoreUrbanAg.org (Please send events; This site is hungry.)
http://www.ExcellentNutrition.org
http://www.packtpub.com/drupal-5-views-recipes/book

On Thu, Feb 6, 2014 at 3:49 PM, Tom Lee <t...@sunlightfoundation.com> wrote:

> We've been kicking around an idea at Sunlight that aims to use
> cryptographic ideas to resolve some of the concerns around the publication
> of publicly identifiable information in government disclosures. I could use
> some smart people to tell me what's dumb about it.
>
> We often face challenges related to disambiguating entities: is the John
> Smith who gave political donation A the same John Smith that gave political
> donation B? One obvious solution to this problem is to push to expand the
> information that's collected and disclosed -- if we had John's driver's
> license number (DLN), for instance, it'd be easy to disambiguate these
> records. But that could introduce privacy concerns for John. One approach
> to this problem (which I don't think government has tried) is employing a
> one-way hash.
>
> Obviously the input key space for DLNs and most other personal ID numbers
> is so small that reversing this with a dictionary attack would be trivial.
> You can add a salt, but only on a per-entity basis (not a per-record basis)
> if you want to preserve the capacity to disambiguate. That in turn calls
> for a lookup table in which the input keys are stored, which kind of
> defeats the point of using a hash (you might as well just assign random
> output IDs for each input ID). I would worry about government's ability to
> keep this lookup table secure, and I worry about the brittleness of such a
> system.
>
> Alternately, you can use a single system-wide secret (or set of secrets)
> to transform inputs into reliable outputs. I think this is less brittle and
> maybe easier to preserve as a secret, but this system might be too easily
> reversible given the ability to observe its outputs and know the universe
> of possible inputs. I'm unsure of the cryptographic options that might be
> appropriate here.
>
> For all I know, the lack of implementations using this kind of one-way
> transformation isn't about government sluggishness but rather about its
> feasibility. I'd be very curious to hear folks' ideas on this score, though.
> My general hunch is that something must be possible -- even a few bits'
> worth of disambiguating information would be hugely useful to us, and
> presumably you're not leaking important amounts of information by, say,
> sharing the last digit of a DLN. So there must be a spectrum of options.
> But as is probably apparent, I don't think I've got a handle on how to
> think about this problem rigorously.
>
> Tom
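
For anyone who wants to poke at the two options Tom contrasts, here is a rough sketch in Python. It is not anything Sunlight or a government agency actually runs. I am assuming HMAC-SHA256 as one possible form of the "single system-wide secret" transformation (Tom does not name a construction), and the 5-digit ID space and all function names are made up for illustration. It shows that a plain hash of an ID falls to a dictionary attack, while the keyed version still gives the same output for the same ID, so records can be matched.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Optional


def all_ids() -> Iterable[str]:
    """Hypothetical toy ID space: every 5-digit number, zero-padded."""
    return (f"{n:05d}" for n in range(100_000))


def plain_hash(id_number: str) -> str:
    """Unkeyed SHA-256 of the ID: stable across records, but guessable."""
    return hashlib.sha256(id_number.encode("utf-8")).hexdigest()


def keyed_pseudonym(id_number: str, key: bytes) -> str:
    """HMAC-SHA256 of the ID under a system-wide secret key (an assumption)."""
    return hmac.new(key, id_number.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(published: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash every candidate ID without a key and look for the published value."""
    for candidate in candidates:
        if plain_hash(candidate) == published:
            return candidate
    return None


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # would have to be generated once and kept secret
    john = "04217"                 # made-up ID for illustration

    # Plain hash: the attacker recovers the ID by enumerating the small space.
    print("plain hash reversed to:", dictionary_attack(plain_hash(john), all_ids()))

    # Keyed hash: same ID gives the same pseudonym, so donations still link up,
    # but the key-less enumeration above no longer finds a match.
    print("same pseudonym twice:", keyed_pseudonym(john, key) == keyed_pseudonym(john, key))
    print("keyed value reversed to:", dictionary_attack(keyed_pseudonym(john, key), all_ids()))
```

The catch is the one Tom already raises: anyone who holds the key can run exactly the same enumeration, so the scheme is only as private as the key is secret. As far as I understand, observing outputs alone should not help an attacker who lacks the key, but I would want someone with real cryptographic training to confirm that before relying on it.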