Thanks John!
On Saturday, December 6, 2014 11:43:21 AM UTC-8, John Myles White wrote:
>
> I finally updated METADATA to point to the fixed version of BloomFilters.
>
> -- John
>
> On Dec 6, 2014, at 9:23 AM, Stefan Karpinski wrote:
>
> On Sat, Dec 6, 2014 at 12:14 PM, David Koslicki wrote:
>
I finally updated METADATA to point to the fixed version of BloomFilters.
-- John
On Dec 6, 2014, at 9:23 AM, Stefan Karpinski wrote:
> On Sat, Dec 6, 2014 at 12:14 PM, David Koslicki wrote:
>
> Implementing your own Bloom filter really shouldn't be too hard.
> Alternatively, it might not be
On Sat, Dec 6, 2014 at 12:14 PM, David Koslicki wrote:
>
>> Implementing your own Bloom filter really shouldn't be too hard.
>> Alternatively, it might not be too hard to file some issues against John's
>> package and get it into better working state. If you mention me in an
>> issue, I can also
On Saturday, December 6, 2014 7:56:49 AM UTC-8, Stefan Karpinski wrote:
>
> On Fri, Dec 5, 2014 at 8:48 PM, David Koslicki wrote:
>
>> Thanks for the suggestions:
>>
>> On Friday, December 5, 2014 5:40:57 PM UTC-8, Jason Merrill wrote:
>>>
>>> This is the best you can do if
>>>
>>> 1. Every in
On Fri, Dec 5, 2014 at 8:48 PM, David Koslicki wrote:
> Thanks for the suggestions:
>
> On Friday, December 5, 2014 5:40:57 PM UTC-8, Jason Merrill wrote:
>>
>> This is the best you can do if
>>
>> 1. Every input in your space of possibilities is equally likely, and
>> 2. You need to remember eve
Thanks for the suggestions:
On Friday, December 5, 2014 5:40:57 PM UTC-8, Jason Merrill wrote:
>
> This is the best you can do if
>
> 1. Every input in your space of possibilities is equally likely, and
> 2. You need to remember every input that you've seen
> 3. You need to know the order you saw
Being able to suggest an appropriate compression algorithm would require
knowing both how you intend to use the data (so that you don’t kill
performance by packing the data wrong) and what the data looks like (where
the “entropy” http://en.wikipedia.org/wiki/Entropy_%28information_theory%29
exists
This is the best you can do if
1. Every input in your space of possibilities is equally likely, and
2. You need to remember every input that you've seen
3. You need to know the order you saw them in
If you just need to know "have I seen this input before," and you can
accept some false positives
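
The trade-off described here (answering "have I seen this input before" while tolerating some false positives) is the one a Bloom filter makes, the structure mentioned elsewhere in this thread. Below is a minimal sketch in Python, not Julia, and not the BloomFilters package's API: the class name, the bit-array size, and the use of salted SHA-256 to derive bit positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: may report false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index
        # (an assumption of this sketch; real implementations vary).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # "Seen" only if every derived bit is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter(num_bits=1 << 16, num_hashes=4)
seen.add("ACTGACTG")
print("ACTGACTG" in seen)  # True: added items are always reported as seen
```

Note that the filter stores only bits, so the original strings cannot be recovered from it. It answers membership queries and nothing else.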
Duly noted, though I did get my answer (no) pretty quickly! ;)
Of course, the main problem is still an issue. But then again, it's kind of
an "open problem" in bioinformatics (so I don't think this would be the
correct forum to ask it in).
I appreciate your help!
On Friday, December 5, 2014 5:2
Good suggestion, but I've already tried that. Besides the fact that the
HDF5 package (https://github.com/timholy/HDF5.jl) doesn't yet support
Int128, this would result in file sizes upwards of 750 GB (too large for my
purposes).
On Friday, December 5, 2014 5:19:00 PM UTC-8, Jason Merrill wro
As a meta point, beware the XY problem:
http://meta.stackexchange.com/a/66378
In other words, you'll typically get better answers faster if you start
with the broad context, like
On Friday, December 5, 2014 5:13:49 PM UTC-8, David Koslicki wrote:
>>
>> I have strings (on the alphabet {A,C,T,G})
Here's one possibility:
Interpret A, C, T, G as two-bit integers, i.e. A=00, C=01, T=10, G=11. A
string of up to 50 of these takes at most 2*50=100 bits, so you could store
any such string as a unique Int128.
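
A minimal sketch of this packing, written in Python (whose integers are arbitrary precision) rather than Julia. The two-bit codes are the ones given above; the function names and the leading sentinel bit are assumptions of this sketch. The sentinel is there because the strings vary in length (30 to 50): since A=00, a string like "AAC" would otherwise pack to the same integer as "C". With the sentinel, a 50-character string needs 101 bits, which still fits in 128.

```python
# Two-bit codes as given in the message above.
CODE = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}
BASE = {bits: base for base, bits in CODE.items()}


def pack(seq: str) -> int:
    """Pack a DNA string into one integer, two bits per base."""
    value = 1  # sentinel bit (assumption of this sketch): marks where the string starts
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int) -> str:
    """Invert pack(): read two bits at a time until only the sentinel remains."""
    bases = []
    while value > 1:
        bases.append(BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


packed = pack("ACTG")
assert unpack(packed) == "ACTG"
assert pack("A" * 50).bit_length() <= 128  # longest string still fits in 128 bits
```

Unlike a general-purpose hash, this mapping is one-to-one, so it can be undone exactly.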
On Friday, December 5, 2014 5:13:49 PM UTC-8, David Koslicki wrote:
>
> I have strings (on the a
I have strings (on the alphabet {A,C,T,G}) of length 30 to 50. I am trying
to hash them to save on space (as I have a few million to billion of them).
I know I should be using a bloom filter
(http://en.wikipedia.org/wiki/Bloom_filter) or some other such space-saving
data structure, but I'm too
So a better question to ask would have been: "Is the built-in Julia
function for hashing strings a perfect hash function?" I assume the answer
is no...
On Friday, December 5, 2014 5:08:08 PM UTC-8, John Myles White wrote:
>
> For specialized cases it is possible to achieve 1-1-ness:
> http://en
There might be a good solution to the particular problem you're trying to
solve, though. What are you trying to do?
On Friday, December 5, 2014 5:08:08 PM UTC-8, John Myles White wrote:
>
> For specialized cases it is possible to achieve 1-1-ness:
> http://en.wikipedia.org/wiki/Perfect_hash_func
For specialized cases it is possible to achieve 1-1-ness:
http://en.wikipedia.org/wiki/Perfect_hash_function
But this is not something that most people aspire to do for most types since
1-1-ness isn't essential in most applications and is costly to achieve.
-- John
On Dec 5, 2014, at 5:03 PM,
Ah, of course! I was hoping that on certain data types it was 1-1, but I
guess that was a long shot. Thanks for clarifying.
On Friday, December 5, 2014 4:57:41 PM UTC-8, Jason Merrill wrote:
>
> If the space of possible hashes is smaller than the space of possible
> inputs (e.g. the hash is repr
If the space of possible hashes is smaller than the space of possible
inputs (e.g. the hash is represented with fewer bits than the input data
is), which is typically the case, then you can use the Pigeonhole Principle
to prove what John wrote:
https://en.wikipedia.org/wiki/Pigeonhole_principle
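
A small illustration of the argument, sketched in Python. The 8-bit hash here (`tiny_hash`, the first byte of SHA-256) is a hypothetical stand-in chosen so the output space is obviously smaller than the input space; it is not how Julia's hash() works. There are 4^5 = 1024 strings of length 5 over {A, C, T, G} but only 256 possible hash values, so at least one value must be shared by four or more inputs, and no "dehash" could tell them apart.

```python
import hashlib
from itertools import product


def tiny_hash(s: str) -> int:
    """Hypothetical 8-bit hash: the first byte of SHA-256 (256 possible outputs)."""
    return hashlib.sha256(s.encode("ascii")).digest()[0]


# All 4**5 = 1024 strings of length 5 over the alphabet {A, C, T, G}.
inputs = ["".join(p) for p in product("ACTG", repeat=5)]

# Group inputs by hash value.
buckets = {}
for s in inputs:
    buckets.setdefault(tiny_hash(s), []).append(s)

# 1024 inputs into at most 256 outputs: some output must be shared.
collisions = [group for group in buckets.values() if len(group) > 1]
print(len(inputs), "inputs,", len(buckets), "distinct hash values")
```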
This function is impossible to write in general since hash functions aren't
one-to-one.
-- John
On Dec 5, 2014, at 4:32 PM, David Koslicki wrote:
> Hello,
>
> Is there a built in function that will undo hash()?
>
> i.e. I am looking for a function "dehash()" such that
> dehash(hash("ACTG
Hello,
Is there a built-in function that will undo hash()?
i.e. I am looking for a function "dehash()" such that
dehash(hash("ACTG")) == "ACTG"
I can't seem to find this anywhere (documentation, Google, this user group,
etc.).
Thanks,
~David