I haven't done a full analysis, but I do have a few questions.
On 9/9/2013 5:58 PM, Chris Peterson wrote:
Our private database maps access point hash IDs to locations (and
other metadata). Assuming:
H1 = Hash(AP1.MAC + AP1.SSID)
H2 = Hash(AP2.MAC + AP2.SSID)
I assume + means concatenation. I might suggest XORing the values. SSIDs
are usually human-readable, are not meant to be secure, and thus follow
predictable patterns. I also hope you're not using the patterned MAC
notation (e.g., 00:1A:2B:3C:4D:5E) but rather the 48-bit binary
representation of the address.
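A minimal sketch of H1 = Hash(AP1.MAC + AP1.SSID), assuming + is byte concatenation, the MAC is first normalized from its text notation to the raw 48-bit (6-byte) form, the SSID is UTF-8 encoded, and SHA-256 is the hash (the thread does not fix one). The helper names mac_to_bytes and ap_hash_id are hypothetical. It shows why normalizing matters: the same access point written in two notations should produce the same ID.

```python
import hashlib

def mac_to_bytes(mac: str) -> bytes:
    """Normalize a text MAC ('00:1A:2B:3C:4D:5E', '00-1a-...', '001A.2B3C.4D5E') to 6 raw bytes."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "")
    raw = bytes.fromhex(digits)  # raises ValueError on non-hex characters
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return raw

def ap_hash_id(mac: str, ssid: str) -> bytes:
    """H = Hash(MAC + SSID): raw 48-bit MAC followed by the UTF-8 SSID, hashed with SHA-256."""
    # The MAC is fixed-length and comes first, so the concatenation is unambiguous.
    return hashlib.sha256(mac_to_bytes(mac) + ssid.encode("utf-8")).digest()

# The same access point written in two notations yields the same ID.
assert ap_hash_id("00:1a:2b:3c:4d:5e", "HomeNet") == ap_hash_id("00-1A-2B-3C-4D-5E", "HomeNet")
```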
Our private database's schema looks something like:
Hash(AP1.MAC + AP1.SSID) ==> AP1.latitude, AP1.longitude, ...
Hash(AP2.MAC + AP2.SSID) ==> AP2.latitude, AP2.longitude, ...
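The private schema above can be sketched as a dictionary keyed by hash ID. This assumes a raw 6-byte MAC and SHA-256; APRecord, record_location, and lookup are hypothetical names, and the "..." metadata is omitted. Note that this sketch simply overwrites an entry when an AP is seen somewhere new, so it keeps no location history; whether the real database does is exactly the aging question below.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class APRecord:
    latitude: float
    longitude: float
    # Other metadata ("...") omitted in this sketch.

def ap_hash_id(mac: bytes, ssid: str) -> bytes:
    """Hash(AP.MAC + AP.SSID), assuming a raw 6-byte MAC and SHA-256."""
    return hashlib.sha256(mac + ssid.encode("utf-8")).digest()

# Private table: Hash(AP.MAC + AP.SSID) ==> AP.latitude, AP.longitude, ...
private_db: Dict[bytes, APRecord] = {}

def record_location(mac: bytes, ssid: str, lat: float, lon: float) -> None:
    # Overwrites any earlier entry: this sketch keeps no location history.
    private_db[ap_hash_id(mac, ssid)] = APRecord(lat, lon)

def lookup(mac: bytes, ssid: str) -> Optional[APRecord]:
    return private_db.get(ap_hash_id(mac, ssid))
```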
Is the data aged? What happens if I move? Does this give Mozilla the
ability to historically track me if I move my device? Is that a problem?
(I'm not saying it is, just an observation).
You mention below filtering out APs seen in multiple locations, but
clearly APs can move as people relocate.
What is the granularity of the lat/long?
Our published database would include two tables. The first table would
map a random row id to metadata about an anonymous access point:
Random1 ==> AP1.latitude, AP1.longitude, ...
Random2 ==> AP2.latitude, AP2.longitude, ...
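One way the first published table might be built, assuming row ids come from a cryptographically secure random source (Python's secrets module) so they carry no information about the AP itself. The function name is hypothetical, and keeping the hash-ID-to-row-id mapping private (it is only needed to build the second table) is an assumption of this sketch.

```python
import secrets
from typing import Dict, Tuple

Location = Tuple[float, float]

def build_published_locations(
    private_rows: Dict[bytes, Location],
) -> Tuple[Dict[str, Location], Dict[bytes, str]]:
    """Return (published table: random row id -> location, private map: hash ID -> row id)."""
    published: Dict[str, Location] = {}
    row_id_for: Dict[bytes, str] = {}  # not published; used to build the pair table
    for hash_id, location in private_rows.items():
        row_id = secrets.token_hex(8)
        while row_id in published:  # regenerate on the (unlikely) collision
            row_id = secrets.token_hex(8)
        published[row_id] = location
        row_id_for[hash_id] = row_id
    return published, row_id_for
```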
I would be hesitant to use the word anonymous here. Lat/long is easily
combined with other publicly available databases that could identify
individual addresses and thus individuals. Again, it comes down to the
granularity of the data.
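To make the granularity question concrete, here is a sketch of coarsening coordinates by rounding to a fixed number of decimal places, with the approximate size of the resulting grid cell. It assumes roughly 111,320 m per degree of latitude (the true figure varies slightly) and that east-west spacing shrinks with the cosine of the latitude; the function names are hypothetical. Rounding only enlarges the cell; it is not by itself an anonymity guarantee.

```python
import math
from typing import Tuple

METERS_PER_DEGREE_LAT = 111_320  # approximate; varies slightly with latitude

def coarsen(lat: float, lon: float, decimals: int) -> Tuple[float, float]:
    """Round both coordinates to the given number of decimal places."""
    return round(lat, decimals), round(lon, decimals)

def cell_size_m(lat: float, decimals: int) -> Tuple[float, float]:
    """Approximate (north-south, east-west) size in meters of one rounding step."""
    step_deg = 10 ** -decimals
    north_south = step_deg * METERS_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west
```

For example, four decimal places is a cell of roughly 11 m on a side at the equator, which is fine enough to single out a house.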
The second table's primary key would be a hash of hashes. It would map
a hash of two neighboring access points' hash IDs to a row id of the
first table. Something like:
Hash(H1 + H2) ==> Random1
Hash(H2 + H1) ==> Random2
Someone querying the published database would need to know the MAC
addresses and current SSIDs of two neighboring access points to look
up either's location.
When you say published, do you mean that the entire DB is published for
use by "researchers", or that it just has a publicly exposed API that
responds to queries?
I'm assuming that if AP3 through AP10 were also in the vicinity, then
Hash(H1 + Hx) ==> Random1 for x in {2, ..., 10}, correct?
If so, does whichever Hy is the prefix of the concatenation determine the
row, i.e., Hash(Hy + Hx) ==> APy's random id?
btw, should we use SHA-2 instead of SHA-1? In 2009, NIST recommended
that "Federal agencies should stop using SHA-1 for applications that
require collision resistance as soon as practical, and must use the
SHA-2 family of hash functions for these applications after 2010."
Yes
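If the hash is made a parameter, moving from SHA-1 to a SHA-2 member such as SHA-256 is a one-line change. A sketch using hashlib.new with a hypothetical hash_id helper and a raw 6-byte MAC; note that switching algorithms changes every stored ID, so existing table keys would have to be recomputed.

```python
import hashlib

def hash_id(mac: bytes, ssid: str, algorithm: str = "sha256") -> bytes:
    """Hash(MAC + SSID) with a selectable algorithm name ('sha1', 'sha256', ...)."""
    return hashlib.new(algorithm, mac + ssid.encode("utf-8")).digest()
```

SHA-1 produces 20-byte digests and SHA-256 produces 32-byte digests, so any fixed-width key columns would also need resizing.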
*R. Jason Cronk, Esq., CIPP/US*
/Privacy Engineering Consultant/, *Enterprivacy Consulting Group*
<enterprivacy.com>
* phone: (828) 4RJCESQ
* twitter: @privacymaverick.com
* blog: http://blog.privacymaverick.com
_______________________________________________
dev-security mailing list
dev-security@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security