Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-06 Thread David Koslicki
Thanks, John!

On Saturday, December 6, 2014 11:43:21 AM UTC-8, John Myles White wrote:
>
> I finally updated METADATA to point to the fixed version of BloomFilters.
>
> -- John
>
> On Dec 6, 2014, at 9:23 AM, Stefan Karpinski wrote:
>
> On Sat, Dec 6, 2014 at 12:14 PM, David Koslicki wrote:
>

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-06 Thread John Myles White
I finally updated METADATA to point to the fixed version of BloomFilters.

-- John

On Dec 6, 2014, at 9:23 AM, Stefan Karpinski wrote:
> On Sat, Dec 6, 2014 at 12:14 PM, David Koslicki wrote:
>
> Implementing your own Bloom filter really shouldn't be too hard.
> Alternatively, it might not be

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-06 Thread Stefan Karpinski
On Sat, Dec 6, 2014 at 12:14 PM, David Koslicki wrote:
>
>> Implementing your own Bloom filter really shouldn't be too hard.
>> Alternatively, it might not be too hard to file some issues against John's
>> package and get it into better working state. If you mention me in an
>> issue, I can also

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-06 Thread David Koslicki
On Saturday, December 6, 2014 7:56:49 AM UTC-8, Stefan Karpinski wrote:
>
> On Fri, Dec 5, 2014 at 8:48 PM, David Koslicki wrote:
>
>> Thanks for the suggestions:
>>
>> On Friday, December 5, 2014 5:40:57 PM UTC-8, Jason Merrill wrote:
>>>
>>> This is the best you can do if
>>>
>>> 1. Every in

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-06 Thread Stefan Karpinski
On Fri, Dec 5, 2014 at 8:48 PM, David Koslicki wrote:
> Thanks for the suggestions:
>
> On Friday, December 5, 2014 5:40:57 PM UTC-8, Jason Merrill wrote:
>>
>> This is the best you can do if
>>
>> 1. Every input in your space of possibilities is equally likely, and
>> 2. You need to remember eve

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread David Koslicki
Thanks for the suggestions:

On Friday, December 5, 2014 5:40:57 PM UTC-8, Jason Merrill wrote:
>
> This is the best you can do if
>
> 1. Every input in your space of possibilities is equally likely, and
> 2. You need to remember every input that you've seen
> 3. You need to know the order you saw

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread Jameson Nash
Being able to suggest an appropriate compression algorithm would require knowing both how you intend to use the data (so that you don’t kill performance by packing the data wrong) and what the data looks like (where the “entropy” http://en.wikipedia.org/wiki/Entropy_%28information_theory%29 exists
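To make the entropy remark concrete, here is a minimal sketch (in Python, purely for illustration) that estimates the Shannon entropy per symbol of a sequence from its symbol frequencies. The function name `entropy_bits_per_symbol` is hypothetical, and the estimate assumes symbols are independent, which real sequence data may not satisfy.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(seq):
    """Estimate Shannon entropy (bits per symbol) from symbol frequencies.

    Assumes symbols are independent; correlations between neighbouring
    symbols are ignored, so this is only a rough upper-level estimate.
    """
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Under this estimate, a sequence in which A, C, G and T are equally frequent comes out at 2 bits per symbol, so a fixed 2-bit code leaves nothing to gain; a skewed distribution comes out lower, which is where a compression scheme could help.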

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread Jason Merrill
This is the best you can do if

1. Every input in your space of possibilities is equally likely, and
2. You need to remember every input that you've seen
3. You need to know the order you saw them in

If you just need to know "have I seen this input before," and you can accept some false positives
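For the "have I seen this input before, with some false positives" case, a Bloom filter is the usual structure. The following is a minimal Python sketch, not the API of the BloomFilters package mentioned in this thread. The class name, the parameters `num_bits` and `num_hashes`, and the use of SHA-256 with double hashing to derive bit positions are all assumptions made for illustration.

```python
import hashlib

class BloomFilter:
    """Illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing: h1 + i * h2 (h2 forced odd).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Usage would look like `bf = BloomFilter(1024, 3); bf.add("ACTG"); "ACTG" in bf`. The filter cannot return the stored strings, only answer membership queries, so it does not provide anything like a dehash().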

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread David Koslicki
Duly noted, though I did get my answer (no) pretty quickly! ;)

Of course, the main problem is still an issue. But then again, it's kind of an "open problem" in bioinformatics (so I don't think this would be the correct forum to ask it in). I appreciate your help!

On Friday, December 5, 2014 5:2

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread David Koslicki
Good suggestion, but I've tried that already. Besides the fact that the HDF5 package (https://github.com/timholy/HDF5.jl) doesn't yet support Int128, this would result in file sizes upwards of 750 GB (too large for my purposes).

On Friday, December 5, 2014 5:19:00 PM UTC-8, Jason Merrill wro

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread Jason Merrill
As a meta point, beware the XY problem: http://meta.stackexchange.com/a/66378

In other words, you'll typically get better answers faster if you start with the broad context, like

On Friday, December 5, 2014 5:13:49 PM UTC-8, David Koslicki wrote:
>>
>> I have strings (on the alphabet {A,C,T,G})

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread Jason Merrill
Here's one possibility: Interpret A, C, T, G as two-bit integers, i.e. A=00, C=01, T=10, G=11. A string of up to 50 of these takes at most 2*50=100 bits, so you could store any such string as a unique Int128.

On Friday, December 5, 2014 5:13:49 PM UTC-8, David Koslicki wrote:
>
> I have strings (on the a
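A minimal sketch of the two-bit packing suggested in the message above, written in Python for illustration (Python integers are unbounded; in Julia the same idea would presumably target a 128-bit integer type). The names `encode` and `decode` are hypothetical. One assumption goes beyond the message: because the strings vary in length and A maps to 00, a leading sentinel 1 bit is added so that, for example, "A" and "AA" do not pack to the same value. With the sentinel, a 50-base string needs 101 bits, which still fits in 128.

```python
# Two-bit code taken from the message above: A=00, C=01, T=10, G=11.
CODE = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}
BASE = {bits: base for base, bits in CODE.items()}

def encode(seq):
    """Pack a string over {A,C,T,G} into one integer, 2 bits per base."""
    value = 1  # sentinel bit (an assumption): preserves leading A's and length
    for ch in seq:
        value = (value << 2) | CODE[ch]
    return value

def decode(value):
    """Invert encode(): peel off 2 bits at a time until only the sentinel remains."""
    bases = []
    while value > 1:
        bases.append(BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))
```

Unlike hash(), this mapping is one-to-one, so `decode(encode("ACTG")) == "ACTG"` holds for any string over the four-letter alphabet.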

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread David Koslicki
I have strings (on the alphabet {A,C,T,G}) of length 30 to 50. I am trying to hash them to save on space (as I have a few million to a billion of them). I know I should be using a Bloom filter (http://en.wikipedia.org/wiki/Bloom_filter) or some other such space-saving data structure, but I'm too

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread David Koslicki
So a better question to ask would have been: "Is the built-in Julia function for hashing strings a perfect hash function?" I assume the answer is no...

On Friday, December 5, 2014 5:08:08 PM UTC-8, John Myles White wrote:
>
> For specialized cases it is possible to achieve 1-1-ness:
> http://en

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread Jason Merrill
There might be a good solution to the particular problem you're trying to solve, though. What are you trying to do?

On Friday, December 5, 2014 5:08:08 PM UTC-8, John Myles White wrote:
>
> For specialized cases it is possible to achieve 1-1-ness:
> http://en.wikipedia.org/wiki/Perfect_hash_func

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread John Myles White
For specialized cases it is possible to achieve 1-1-ness: http://en.wikipedia.org/wiki/Perfect_hash_function

But this is not something that most people aspire to do for most types since 1-1-ness isn't essential in most applications and is costly to achieve.

-- John

On Dec 5, 2014, at 5:03 PM,

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread David Koslicki
Ah, of course! I was hoping that on certain data types it was 1-1, but I guess that was a long shot. Thanks for clarifying.

On Friday, December 5, 2014 4:57:41 PM UTC-8, Jason Merrill wrote:
>
> If the space of possible hashes is smaller than the space of possible
> inputs (e.g. the hash is repr

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread Jason Merrill
If the space of possible hashes is smaller than the space of possible inputs (e.g. the hash is represented with fewer bits than the input data is), which is typically the case, then you can use the Pigeonhole Principle to prove what John wrote: https://en.wikipedia.org/wiki/Pigeonhole_principle
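The pigeonhole argument can be demonstrated with a deliberately tiny hash. In this Python sketch, `tiny_hash` is a hypothetical 8-bit hash (the first byte of SHA-256, chosen only for illustration; it is not Julia's hash()). There are 4^5 = 1024 strings of length 5 over {A,C,G,T} but only 256 possible hash values, so at least two strings must share a value.

```python
import hashlib
from itertools import product

def tiny_hash(s):
    """Hypothetical 8-bit hash: first byte of SHA-256, so 256 possible outputs."""
    return hashlib.sha256(s.encode("ascii")).digest()[0]

def find_collision(strings):
    """Return the first pair of distinct inputs with equal tiny_hash, or None."""
    seen = {}  # hash value -> first string that produced it
    for s in strings:
        h = tiny_hash(s)
        if h in seen:
            return seen[h], s
        seen[h] = s
    return None

# 1024 distinct inputs into 256 buckets: a collision is guaranteed.
inputs = ("".join(p) for p in product("ACGT", repeat=5))
a, b = find_collision(inputs)
```

Given only the shared hash value, nothing distinguishes `a` from `b`, which is why a general dehash() cannot exist.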

Re: [julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread John Myles White
This function is impossible to write in generality since hash functions aren't one-to-one.

-- John

On Dec 5, 2014, at 4:32 PM, David Koslicki wrote:
> Hello,
>
> Is there a built in function that will undo hash()?
>
> i.e. I am looking for a function "dehash()" such that
> dehash(hash("ACTG

[julia-users] undo hash (dehash, unhash, etc)

2014-12-05 Thread David Koslicki
Hello,

Is there a built-in function that will undo hash()?

That is, I am looking for a function "dehash()" such that

dehash(hash("ACTG")) == "ACTG"

I can't seem to find this anywhere (documentation, Google, this user group, etc.).

Thanks,
~David