[issue14621] Hash function is not randomized properly

2016-04-22 Thread Serhiy Storchaka
Changes by Serhiy Storchaka : -- Removed message: http://bugs.python.org/msg263978 ___ Python tracker ___

[issue14621] Hash function is not randomized properly

2016-04-22 Thread SilentGhost
Changes by SilentGhost : -- nosy: +Arfrever, Bob.Ziuchkovski, Giovanni.Bajo, PaulMcMillan, ReneSac, Vlado.Boza, alex, arigo, benjamin.peterson, bkabrda, camara, christian.heimes, cvrebert, dmalcolm, dstufft, gregory.p.smith, haypo, iElectric, isoschiz, konk,

[issue14621] Hash function is not randomized properly

2016-04-22 Thread lissacoffey
lissacoffey added the comment: How about Python grows a additional btree implementation in its collections module? I know that it's not going to fix existing code. However in the long run it's the best safeguard against hash collision attacks. I'm thinking about a simple, self balancing btree

[issue14621] Hash function is not randomized properly

2013-12-09 Thread Nick Coghlan
Nick Coghlan added the comment: This issue has belatedly had a CVE assigned: CVE-2013-7040 (CPython hash secret can be recovered remotely) 3.4 is not affected (due to PEP 456), but 3.3 and 2.7 are still affected. -- nosy: +bkabrda status: pending - open versions: +Python 2.7, Python

[issue14621] Hash function is not randomized properly

2013-12-09 Thread Benjamin Peterson
Benjamin Peterson added the comment: I think that's just WONTFIX at this point. -- resolution: - fixed status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___

[issue14621] Hash function is not randomized properly

2013-11-20 Thread Christian Heimes
Christian Heimes added the comment: The issue has been solved for Python 3.4 with the integration of PEP 456. -- stage: - committed/rejected status: open - pending ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621

[issue14621] Hash function is not randomized properly

2013-06-01 Thread Donald Stufft
Changes by Donald Stufft donald.stu...@gmail.com: -- nosy: +dstufft ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___ Python-bugs-list

[issue14621] Hash function is not randomized properly

2013-04-20 Thread Martin Morrison
Changes by Martin Morrison m...@ensoft.co.uk: -- nosy: +isoschiz, pconnell ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 02/gen/2013, alle ore 00:20, Domen Kožar rep...@bugs.python.org ha scritto: Domen Kožar added the comment: According to talk at 29c3: http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html Quote: We also describe a vulnerability of

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 02/gen/2013, alle ore 06:52, Christian Heimes rep...@bugs.python.org ha scritto: Christian Heimes added the comment: Thanks for the information! I'm working on a PEP for the issue at hand. Since you're collecting ideas on this, I would like to

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Benjamin Peterson
Benjamin Peterson added the comment: 2013/1/2 Giovanni Bajo rep...@bugs.python.org: Giovanni Bajo added the comment: Il giorno 02/gen/2013, alle ore 06:52, Christian Heimes rep...@bugs.python.org ha scritto: Christian Heimes added the comment: Thanks for the information! I'm working

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Christian Heimes
Christian Heimes added the comment: Giovanni, why do you think that hashing of unicode strings is slower than byte strings? First of all ASCII only unicode strings are packed and use just one byte per char. CPython's FNV implementation processes one element in each cycle, that is one byte

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 02/gen/2013, alle ore 19:51, Christian Heimes rep...@bugs.python.org ha scritto: Christian Heimes added the comment: Giovanni, why do you think that hashing of unicode strings is slower than byte strings? First of all ASCII only unicode

[issue14621] Hash function is not randomized properly

2013-01-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: See microbenchmark results in issue16427. Really hashing of ASCII/UCS1 strings more than 2x slower than bytes hashing. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621

[issue14621] Hash function is not randomized properly

2013-01-01 Thread Domen Kožar
Domen Kožar added the comment: According to talk at 29c3: http://events.ccc.de/congress/2012/Fahrplan/events/5152.en.html Quote: We also describe a vulnerability of Python's new randomized hash, allowing an attacker to easily recover the 128-bit secret seed. As a reliable fix to

[issue14621] Hash function is not randomized properly

2013-01-01 Thread Christian Heimes
Christian Heimes added the comment: Thanks for the information! I'm working on a PEP for the issue at hand. -- assignee: - christian.heimes ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621

[issue14621] Hash function is not randomized properly

2013-01-01 Thread Nick Coghlan
Nick Coghlan added the comment: Bob, the hash invariant isn't a mere implementation detail, it is critical to making hash based data structures work properly - if two equal objects (say the integer zero and the float zero) ever end up in different hash bins, then the uniqueness property of

[issue14621] Hash function is not randomized properly

2012-12-02 Thread Bob Ziuchkovski
Bob Ziuchkovski added the comment: Why not redefine -R to mean use secure hashing algorithms for built-in types? When specified, use hashing algorithms that are secure against denial-of-service and other known attacks, at the possible expense of performance. When not specified, use whatever

[issue14621] Hash function is not randomized properly

2012-11-30 Thread René
René added the comment: Christian Heimes: It has always been trivial to artificially generate collisions for fast hashes designed for hash tables, like MurmurHash. I wouldn't call Murmurhash3 busted because of that, as this was never a design goal. It is a known propriety of this type of

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Christian Heimes
Christian Heimes added the comment: No, Murmur3 *is* busted. Some clever people have found a way to perform a universal multicollision attack, that's a key independent attack. An attacker doesn't need to know the seed for an attack. Collision counting as not a solution for the issue, just a

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: However a tree data structure is vulnerable to sorted data. May be worth it to have the two hash functions (switchable by interpreter option or environment variable), strong for applications which can be attacked, and fast for applications which run in safe

[issue14621] Hash function is not randomized properly

2012-11-30 Thread René
René added the comment: Serhiy Storchaka: I said a O(log n) data structure, so I was referring to balanced trees, like AVL, RBT or B+-tree. They are not vulnerable to sorted data. The downside is that they need the keys to provide robust comparison methods (like, if z y x, then z x).

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: René, a balanced tree requires rebalancing on every (or almost every) item for some special (sorted) data sequences. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Michal Petrucha
Michal Petrucha added the comment: On Fri, Nov 30, 2012 at 08:06:32PM +, Serhiy Storchaka wrote: René, a balanced tree requires rebalancing on every (or almost every) item for some special (sorted) data sequences. That's perfectly true and it holds for most unsorted sequences as well --

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 30.11.2012 21:06, Serhiy Storchaka wrote: Serhiy Storchaka added the comment: René, a balanced tree requires rebalancing on every (or almost every) item for some special (sorted) data sequences. Sure, but that's still O(N*log N) for an attack

[issue14621] Hash function is not randomized properly

2012-11-30 Thread René
René added the comment: Serhiy Storchaka: Yes, but it is still O(log n) worst case. Even in the worst case rebalancing, you only need to walk up/down rotating/spliting every node in your path. As the tree height is guaranteed to be x * log n (x from 1 to 2, depending on the algorithm), the

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Serhiy Storchaka: Yes, but it is still O(log n) worst case. Even in the worst case rebalancing, you only need to walk up/down rotating/spliting every node in your path. As the tree height is guaranteed to be x * log n (x from 1 to 2, depending on the

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: try: mapping = {} mapping.max_collisions = 100 mapping.update(source) except CollisionLimitError: return 'no thank you' May be use a more general solution? try: with run_with_timeout(timeout=100, timer=collisions_count):

[issue14621] Hash function is not randomized properly

2012-11-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 30.11.2012 22:27, Serhiy Storchaka wrote: Serhiy Storchaka added the comment: try: mapping = {} mapping.max_collisions = 100 mapping.update(source) except CollisionLimitError: return 'no thank you' May be use a more general

[issue14621] Hash function is not randomized properly

2012-11-29 Thread Łukasz Rekucki
Łukasz Rekucki added the comment: Note that a few weeks ago, Ruby has switched from Murmur to SipHash for this exact reason: http://www.ruby-lang.org/en/news/2012/11/09/ruby19-hashdos-cve-2012-5371/ -- nosy: +Łukasz.Rekucki ___ Python tracker

[issue14621] Hash function is not randomized properly

2012-11-11 Thread STINNER Victor
STINNER Victor added the comment: Did qomeone start to write a PEP? Le 11 nov. 2012 05:56, Chris Rebert rep...@bugs.python.org a écrit : Chris Rebert added the comment: What about CityHash? (http://code.google.com/p/cityhash/ ; unofficial C port: http://code.google.com/p/cityhash-c/ )

[issue14621] Hash function is not randomized properly

2012-11-11 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 11/nov/2012, alle ore 05:56, Chris Rebert rep...@bugs.python.org ha scritto: Chris Rebert added the comment: What about CityHash? (http://code.google.com/p/cityhash/ ; unofficial C port: http://code.google.com/p/cityhash-c/ ) It's good enough

[issue14621] Hash function is not randomized properly

2012-11-10 Thread Gregory P. Smith
Gregory P. Smith added the comment: People have been posting micro-benchmarks (often run wrong) rather than actual useful benchmarks. Running our real world benchmarks would be more interesting. -- nosy: +gregory.p.smith ___ Python tracker

[issue14621] Hash function is not randomized properly

2012-11-10 Thread Chris Rebert
Chris Rebert added the comment: What about CityHash? (http://code.google.com/p/cityhash/ ; unofficial C port: http://code.google.com/p/cityhash-c/ ) It's good enough for Google... -- nosy: +cvrebert ___ Python tracker rep...@bugs.python.org

[issue14621] Hash function is not randomized properly

2012-11-08 Thread Sasha B
Sasha B added the comment: Ruby uses the Murmur hash for some types (string integer at least): http://murmurhash.googlepages.com/ src: http://stackoverflow.com/a/3270836/1332819 The Perl hash implementation: http://cpansearch.perl.org/src/NWCLARK/perl-5.8.8/hv.c PHP5 hash implementation:

[issue14621] Hash function is not randomized properly

2012-11-08 Thread Christian Heimes
Christian Heimes added the comment: I considered MurMur a year ago, too. Nowadays I don't think it's an option anymore. JPA and DJB have released a C++ program that is able to generate lots of collisions: https://www.131002.net/siphash/ C++ program to find universal (key-independent)

[issue14621] Hash function is not randomized properly

2012-11-08 Thread Christian Heimes
Christian Heimes added the comment: From the header of murmurcollisions.cc: * multicollisions for MurmurHash3 * * MurmurHash3 C++ implementation is available at * http://code.google.com/p/smhasher/wiki/MurmurHash3 * * the function Murmur3Multicollisions finds many different inputs *

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Here's a demo patch (against Python 2.7) which counts hash value collisions and slot collisions. I had posted that in the original ticket where we discussed the hash problem (http://bugs.python.org/issue14621). This avoids issues like attack 1 mentioned

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.11.2012 09:34, Marc-Andre Lemburg wrote: Here's a demo patch (against Python 2.7) which counts hash value collisions and slot collisions. I had posted that in the original ticket where we discussed the hash problem

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo
Giovanni Bajo added the comment: Until it's broken with a yet-unknown attack, SipHash is a pseudo-random function and as such it does uniformly distribute values across the output space, and never leak any information on the key (the randomized seed). Being designed by cryptographers, it is

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Armin Rigo
Armin Rigo added the comment: Marc-André: estimating the risks of giving up on a valid query for a truly random hash, at an overestimated one billion queries per second, in a 2/3 full dictionary: * for 1000: 4E159 years between mistakes * for 100: 12.9 years between mistakes * for 150: 8E9

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.11.2012 12:06, Armin Rigo wrote: Armin Rigo added the comment: Marc-André: estimating the risks of giving up on a valid query for a truly random hash, at an overestimated one billion queries per second, in a 2/3 full dictionary: * for

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 07/nov/2012, alle ore 08:40, Serhiy Storchaka rep...@bugs.python.org ha scritto: Serhiy Storchaka added the comment: I tested different kind of strings. $ ./python -m timeit -n 1 -s t = b'a' * 10**8 hash(t) $ ./python -m timeit -n 1 -s t =

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: can you please attach the generated assembly code for the siphash function with your compiler and your optimization flags (that is, the one that produces the above results)? GCC (Ubuntu 4.4.3-4ubuntu5.1) options: -pthread -c -Wno-unused-result -DNDEBUG

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Christian Heimes
Christian Heimes added the comment: Serhiy, the performance of hash() for long strings isn't very relevant for the general performance of a Python program. Short strings dominate. I've modified the timeit to create a new string object every time. for I in 5 10 15 20 30 40 50 60; do echo -ne

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Mark Dickinson
Mark Dickinson added the comment: [MAL] I don't understand why we are only trying to fix the string problem and completely ignore other key types. [Armin] estimating the risks of giving up on a valid query for a truly random hash, at an overestimated one billion queries per second ...

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.11.2012 12:55, Mark Dickinson wrote: Mark Dickinson added the comment: [MAL] I don't understand why we are only trying to fix the string problem and completely ignore other key types. [Armin] estimating the risks of giving up on a valid

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Mark Dickinson
Mark Dickinson added the comment: And I'm probably repeating myself too, but: the predictability of (and difficulty of changing of) hashing for numeric types is why I'm strongly opposed to hash collision / slot collision limits: they'd end up disallowing reasonably natural looking Python

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.11.2012 13:06, Mark Dickinson wrote: Mark Dickinson added the comment: And I'm probably repeating myself too, but: the predictability of (and difficulty of changing of) hashing for numeric types is why I'm strongly opposed to hash collision /

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: See issue16427. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___ Python-bugs-list mailing list

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Armin Rigo
Armin Rigo added the comment: I won't try to influence the outcome of this discussion, but I'd like to correct myself: in the measures I posted, true randomness is not needed at all. The exact criterion might be hard to pin down, but as a first approximation, we get the same answers as long

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Serhiy, the performance of hash() for long strings isn't very relevant for the general performance of a Python program. It exposes the raw speed of hashing algorithm. It is good as a first estimate, because more real cases require more sophisticated

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Armin Rigo
Armin Rigo added the comment: Wrong, sorry. On a 32-bit Python 2.7, (2**32-1)*n has the same hash -2, for any value of n. Of course if you build a dict containing thousands of such integers as keys, then right now you get unexpectedly bad performance. I wonder if I should open another bug

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Mark Dickinson
Mark Dickinson added the comment: [Armin] You can build a sequence of (long) integers that have all exactly the same hash, but doing that is not as easy as 2**k. Sure it is. The hash for integers is (by design) repeated modulo a number of the form 2**n - 1: we use 2**61 - 1 on 64-bit

[issue14621] Hash function is not randomized properly

2012-11-07 Thread Giovanni Bajo
Giovanni Bajo added the comment: Il giorno 07/nov/2012, alle ore 12:59, Marc-Andre Lemburg rep...@bugs.python.org ha scritto: Marc-Andre Lemburg added the comment: On 07.11.2012 12:55, Mark Dickinson wrote: Mark Dickinson added the comment: [MAL] I don't understand why we are only

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Benjamin Peterson
Benjamin Peterson added the comment: Here's the message that helped convince us to go against collision counting originally: http://mail.python.org/pipermail/python-dev/2012-January/115726.html -- ___ Python tracker rep...@bugs.python.org

[issue14621] Hash function is not randomized properly

2012-11-06 Thread John M Camara
John M Camara added the comment: How about using a secure hash algorithm that's implemented in HW when available. It doesn't eliminate the issue on systems that lack this support but at least it limits the scope of the problem. Of course some testing would need to be done to make sure the

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Alex Gaynor
Changes by Alex Gaynor alex.gay...@gmail.com: -- nosy: +alex ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___ Python-bugs-list mailing

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: Our hash randomization will always leak some information about the randomization keys. The only way to properly secure our secrets is a cryptographic secure algorithms, for example a crypto hashing function in combination with a message authentication code

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: I deem hash randomization and collision counting as a poor man's workaround for the actual issue. Perhaps we shouldn't try too hard to fix an unsuitable data type. Hash maps have a known worst case complexity of O(n). A O(log n) algorithm should be used to

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Giovanni Bajo
Giovanni Bajo added the comment: Christian, there are good semi-crypto hash functions that don't leak as bad as Python's own modified FNV hash, without going all the way to HMAC. SipHash has very good collision resistance and doesn't leak anything: https://www.131002.net/siphash/ (notice: they

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: Thanks! SipHash looks interesting. It's using a XOR + ROT approach with a seed. And it's written by DJB which is usually a good sign. He writes secure code with good quality. Just his coding style tends to be ... unique. :) I'm going to try the algorithm.

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: I modified crypto_auth() a bit: Py_uhash_t crypto_auth(const unsigned char *in, unsigned long long inlen) ... u64 k0 = _Py_HashSecret.prefix; u64 k1 = _Py_HashSecret.suffix; ... return (Py_uhash_t)b; and replaced the loop in _Py_HashBytes() with a

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Giovanni Bajo
Giovanni Bajo added the comment: For short strings, you might want to have a look at the way you fetch the final partial word from memory. If the string is = 8 bytes, you can fetch the last partial word as an unaligned memory fetch followed by a shift, instead of using a switch like in the

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: We can explore the various optimization options later. Also unaligned memory address is not allowed on some architectures like SPARC. If somebody likes to play with the algorithm: http://hg.python.org/sandbox/cheimes/shortlog/2cb7e97ca8d0 --

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: $ ./python -m timeit -s x = b'a' * int(1E7) hash(x) Note that hash calculated only once. Add -n 1 option and use a larger data. If somebody likes to play with the algorithm: $ ./python -m timeit -n 1 -s t = 'abcdefgh' * 10**8 hash(t) 1 loops, best of 3:

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: Thanks to Snakebit I was able to tests the code on a 32bit BSD installation with GCC 4.2. The ASCII unicode and bytes performance is about 8% slower, UCS2 unicode is about 37% slower. There might be room for improvements, though. % ./python -m timeit -r20

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Christian Heimes
Christian Heimes added the comment: Serhiy's trick #define U8TO64_LE(p) ((u64)((u32 *)(p))[0] | ((u64)((u32 *)(p))[1] 32)) gives a nice speedup. Now UCS2 is down to 0.133 usec (12% slower than the current algorithm) and ASCII down to 0.105 usec (3% faster). --

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: % ./python -m timeit -r20 -n100 -s h = hash; x = 'a' * 10**7 -- h(x) Here is only one hash calculation and 99 cached calls. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621

[issue14621] Hash function is not randomized properly

2012-11-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I tested different kind of strings. $ ./python -m timeit -n 1 -s t = b'a' * 10**8 hash(t) $ ./python -m timeit -n 1 -s t = 'a' * 10**8 hash(t) $ ./python -m timeit -n 1 -s t = '\u0100' * 10**8 hash(t) $ ./python -m timeit -n 1 -s t = '\U0001' * 10**8

[issue14621] Hash function is not randomized properly

2012-10-22 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 21.10.2012 23:42, STINNER Victor wrote: STINNER Victor added the comment: It's interesting to note how this whole -R discussion made very long threads on python-dev, and python-dev has subsequently ignored (for the past 6 months!) the fact that

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Armin Rigo
Armin Rigo added the comment: Just to make it extra clear: Vlado showed that the -R switch of Python can easily be made fully pointless, with only a bit of extra work. Here is how: * Assume you have an algo that gives you as many strings with colliding hashes as you want, provided you know

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Armin Rigo
Armin Rigo added the comment: For reference, the above means that we can implement -R support for PyPy as a dummy ignored flag, and get security that is very close to CPython's. :-) -- keywords: +easy ___ Python tracker rep...@bugs.python.org

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Benjamin Peterson
Benjamin Peterson added the comment: That doesn't make it any easy CPython issue. :) -- keywords: -easy ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Christian Heimes
Christian Heimes added the comment: As far as my understanding goes the issue can't be solved with our current hash algorithm. We'd have to use a crypto hash function or at least a hash algorithm that has an increased avalanche effect on the outcome. The current hash algorithm is designed and

[issue14621] Hash function is not randomized properly

2012-10-21 Thread STINNER Victor
STINNER Victor added the comment: It's interesting to note how this whole -R discussion made very long threads on python-dev, and python-dev has subsequently ignored (for the past 6 months!) the fact that their fix can be worked around in a matter of minutes. No, this issue has no been

[issue14621] Hash function is not randomized properly

2012-10-21 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: $ ./python -m timeit -s t = 'abcdefgh' * int(1E8) hash(t) I got another numbers (32-bit Linux, AMD Athlon 64 X2 4600+). Python's current hash algorithm: 10 loops, best of 3: 343 msec per loop V8's algorithm: 10 loops, best of 3: 244 msec per loop

[issue14621] Hash function is not randomized properly

2012-10-17 Thread Christian Heimes
Christian Heimes added the comment: I've modified unicodeobject's unicode_hash() function. V8's algorithm is about 55% slower for a 800 MB ASCII string on my box. Python's current hash algorithm for bytes and unicode: while (--len = 0) x = (_PyHASH_MULTIPLIER * x) ^ (Py_uhash_t)

[issue14621] Hash function is not randomized properly

2012-10-11 Thread Christian Heimes
Changes by Christian Heimes li...@cheimes.de: -- nosy: +christian.heimes ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___

[issue14621] Hash function is not randomized properly

2012-04-26 Thread STINNER Victor
STINNER Victor victor.stin...@gmail.com added the comment: One possible fix: ... Main loop looks like this: .. Well, it was decided to not impact performances to workaround one specific class of attack, whereas there are many other ways to DoS Python. So we chose to keep the main loop

[issue14621] Hash function is not randomized properly

2012-04-26 Thread STINNER Victor
Changes by STINNER Victor victor.stin...@gmail.com: Removed file: http://bugs.python.org/file25282/hash.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___

[issue14621] Hash function is not randomized properly

2012-04-26 Thread STINNER Victor
STINNER Victor victor.stin...@gmail.com added the comment: Oops, I attached the wrong hash.py file. -- Added file: http://bugs.python.org/file25375/hash.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621

[issue14621] Hash function is not randomized properly

2012-04-26 Thread Benjamin Peterson
Benjamin Peterson benja...@python.org added the comment: We should see if more security can be gained without sacrificing speed. -- nosy: +benjamin.peterson ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621

[issue14621] Hash function is not randomized properly

2012-04-26 Thread STINNER Victor
STINNER Victor victor.stin...@gmail.com added the comment: Problem with current randomization of hash function is following: Suffix does not influence whether two keys have some hash or not (it is xor-ed after everything). Yes, the suffix is used to protect the secret. Without the suffix, it

[issue14621] Hash function is not randomized properly

2012-04-20 Thread Vlado Boza
Vlado Boza us...@ksp.sk added the comment: One possible fix: Look for StringHasher in google v8 code (http://code.google.com/p/v8/source/search?q=stringhasherorigq=stringhasherbtnG=Search+Trunk). Main loop looks like this: raw_running_hash_ += c;

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Vlado Boza
New submission from Vlado Boza us...@ksp.sk: Fix of this http://bugs.python.org/issue13703 is broken. tl;dr: There only 256 different hash functions (compare it to size of _Py_HashSecret prefix and suffix). And whether keys collide or not depends only on the last 8 bits of prefix. Problem

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Vlado Boza
Vlado Boza us...@ksp.sk added the comment: E.g this strings collide for every prefix ending on 0xcd: 0x27fd5a18, 0x26fe78fa -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Dave Malcolm
Changes by Dave Malcolm dmalc...@redhat.com: -- nosy: +dmalcolm ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___ Python-bugs-list

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Michal Petrucha
Changes by Michal Petrucha michal.petru...@ksp.sk: -- nosy: +konk ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___ Python-bugs-list

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +haypo ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___ Python-bugs-list mailing

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Dave Malcolm
Dave Malcolm dmalc...@redhat.com added the comment: Thanks for filing this bug report. I'm not seeing the equal hashes you describe. I'm using this recipe to hardcode a specific prefix and print the hashes using it: $ gdb --eval-command=break _PyRandom_Init --eval-command=run

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- nosy: +PaulMcMillan ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14621 ___ ___ Python-bugs-list

[issue14621] Hash function is not randomized properly

2012-04-19 Thread STINNER Victor
STINNER Victor victor.stin...@gmail.com added the comment: I don't understand this issue: can you write a short script to test a collision? E.g this strings collide for every prefix ending on 0xcd Do you mean that prefix 0xff == 0xcd? 0x27fd5a18, 0x26fe78fa Is it a byte string or an

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Vlado Boza
Vlado Boza us...@ksp.sk added the comment: My bad (I checked only function in C++, not result in python). This should work on 32bit: Prefix: anything ending on 0x00 Suffix: anything Strings: \x00\xcf\x0b\x96\x19, \x00\x6d\x29\x45\x18 -- ___ Python

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Vlado Boza
Vlado Boza us...@ksp.sk added the comment: For example take this script (on 32bit): ha = hash(\x00\xcf\x0b\x96\x19) hb = hash(\x00\x6d\x29\x45\x18) if ha == hb: print collision And run following: for i in `seq 0 25`; do echo $i; for j in `seq 0 100`; do ./python -R x.py; done; done; It

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Dave Malcolm
Dave Malcolm dmalc...@redhat.com added the comment: $ gdb --eval-command=break _PyRandom_Init --eval-command=run --eval-command=print _Py_HashSecret --eval-command=set _Py_HashSecret.prefix=0xcdcdcd00 --eval-command=print _Py_HashSecret --eval-command=continue -eval-command=continue --args

[issue14621] Hash function is not randomized properly

2012-04-19 Thread STINNER Victor
STINNER Victor victor.stin...@gmail.com added the comment: For example take this script (on 32bit): (...) It gives collison too many times (around 9 out of 2500). I tried this script on Linux 32 bits and Linux 64 bits: I didn't see any collision. What is your operating system and the version

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Michal Petrucha
Michal Petrucha michal.petru...@ksp.sk added the comment: @dmalcolm: As for the gdb example, you need to add --eval-command=set _Py_HashSecret_Initialized=1, otherwise _Py_HashSecret will get overwritten immediately after it is set by gdb, either to 0 if run without the -R switch, or to a

[issue14621] Hash function is not randomized properly

2012-04-19 Thread Vlado Boza
Vlado Boza us...@ksp.sk added the comment: I tried this script on Linux 32 bits and Linux 64 bits: I didn't see any collision. What is your operating system and the version of your operating system please? uname -a Linux 3.0.0-17-generic #30-Ubuntu SMP Thu Mar 8 20:45:39 UTC 2012 x86_64

[issue14621] Hash function is not randomized properly

2012-04-19 Thread STINNER Victor
STINNER Victor victor.stin...@gmail.com added the comment: hash.py: Python implementation of the 32-bit version of hash(bytes). Ok, I now see that only the lower 8 bits are really mixed with the input string. -- Added file: http://bugs.python.org/file25282/hash.py

  1   2   >