[issue32797] Tracebacks from Cython modules no longer work

2018-08-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > The one possible fly in the ointment is if there are use cases that we need to support where a single .so/.pyd file is built from *multiple* source files, as get_source doesn't allow for that. Yes, we must support that. A cython module may have multi

[issue32797] Tracebacks from Cython modules no longer work

2018-08-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > So, where is the filename coming from? Python 3.7.0 (default, Jul 23 2018, 10:07:21) [GCC 5.4.0 20160609] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from sage.all

[issue32797] Tracebacks from Cython modules no longer work

2018-08-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Now, though, we need a way for SageMath to get it working on releases up to > and including 3.7, *without* any help from Cython or CPython That's not really what I need. We already have a work-around in SageMath: try: from importlib.machinery

[issue32797] Tracebacks from Cython modules no longer work

2018-08-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > One other thing I will say is that the PEP 302 APIs were seemingly designed > for working with Python source and were not necessarily oriented towards > alternative code representations that were executed That's also my impression. I'm perfe

[issue32797] Tracebacks from Cython modules no longer work

2018-08-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > If you're OK with that, I don't see the problem. I recall that we already agreed several months ago, when I submitted https://github.com/python/cpython/pull/6653 -- ___ Python tracker <https://bugs.pyth

[issue32797] Tracebacks from Cython modules no longer work

2018-08-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I just realized the existence of https://docs.python.org/3/library/importlib.html#importlib.abc.ResourceReader Isn't this *exactly* what we need here? -- ___ Python tracker <https://bugs.python.org/issue32

[issue32797] Tracebacks from Cython modules no longer work

2018-08-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I think I could make linecache use the ResourceReader API to find the source file instead of looking in the sys.path entries. However, that's orthogonal to PR 6653: we still need PR 6653 to continue looking for the source file in the first place

[issue29502] Should PyObject_Call() call the profiler on C functions, use C_TRACE() macro?

2018-08-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I would prefer to wait with this issue until PEP 580 has been decided. If it's accepted, it will change function calling code a lot. As I wrote in PEP 580, I was planning to have a PEP on profiling anyway

[issue29259] Add tp_fastcall to PyTypeObject: support FASTCALL calling convention for all callable objects

2018-07-31 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > For compatibility with extensions built with older Pythons you should define > new type flag and read tp_fastcall only if the flag is set. Can you comment on https://github.com/python/cpython/pull/4944 why you think that such compatibility

[issue32797] Tracebacks from Cython modules no longer work

2018-08-06 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Or, define a new "get_sourcemap()" method that could return additional > metadata, e.g. a line number mapping. What's your use case for that? I am asking because it might make more sense for get_source() (or whatever its replaceme

[issue32797] Tracebacks from Cython modules no longer work

2018-08-06 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > I would love, for example, to be able to get stack traces within extension > modules integrated into Python tracebacks if they are compiled with some > appropriate debug flags. Awesome idea, I want to work on that :-) This should be possi

[issue34126] Profiling dict.get() crashes Python

2018-07-16 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Simple patch coming up... -- ___ Python tracker <https://bugs.python.org/issue34126> ___ ___ Python-bugs-list mailin

[issue34126] Profiling certain invalid calls crash Python

2018-07-16 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Also: >>> import sys; sys.setprofile(lambda *args: None) >>> dict.get([], "foo") Segmentation fault -- title: Profiling dict.get() crashes Python -> Profiling cert

[issue34126] Profiling dict.get() crashes Python

2018-07-16 Thread Jeroen Demeyer
New submission from Jeroen Demeyer : >>> import sys; sys.setprofile(lambda *args:None) >>> dict.get() Segmentation fault -- components: Interpreter Core messages: 321747 nosy: jdemeyer priority: normal severity: normal status: open title: Profiling dict.get() crash

[issue34125] Profiling depends on whether **kwargs is given

2018-07-16 Thread Jeroen Demeyer
New submission from Jeroen Demeyer : Enable profiling for C functions: >>> def prof(frame, typ, arg): ... if typ.startswith("c_"): ... print(arg, typ) >>> import sys; sys.setprofile(prof) and notice how profiling depends on **kwargs:

[issue34126] Profiling certain invalid calls crash Python

2018-07-16 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- keywords: +patch pull_requests: +7834 stage: -> patch review ___ Python tracker <https://bugs.python.org/issue34126> ___ ___ Py

[issue34190] x86 Gentoo Refleaks 3.x: test_sys_setprofile leaked [1, 1, 1] references, sum=3

2018-07-23 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Just to avoid duplicate work: I can have a look at this. -- ___ Python tracker <https://bugs.python.org/issue34190> ___ ___

[issue34190] x86 Gentoo Refleaks 3.x: test_sys_setprofile leaked [1, 1, 1] references, sum=3

2018-07-23 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- keywords: +patch pull_requests: +7939 stage: -> patch review ___ Python tracker <https://bugs.python.org/issue34190> ___ ___ Py

[issue34125] Profiling depends on whether **kwargs is given

2018-07-23 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- keywords: +patch pull_requests: +7942 stage: -> patch review ___ Python tracker <https://bugs.python.org/issue34125> ___ ___ Py

[issue25750] tp_descr_get(self, obj, type) is called without owning a reference to "self"

2018-09-06 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- pull_requests: +8542 ___ Python tracker <https://bugs.python.org/issue25750> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue19660] decorator syntax: allow testlist instead of just dotted_name

2018-07-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Real world example where this actually came up: https://github.com/jupyter-widgets/ipywidgets/issues/430#issuecomment-247016263 -- nosy: +jdemeyer ___ Python tracker <https://bugs.python.org/issue19

[issue30537] Using PyNumber_AsSsize_t in itertools.islice

2018-01-12 Thread Jeroen Demeyer
Jeroen Demeyer <jdeme...@cage.ugent.be> added the comment: Just to note that this bug affects SageMath too: https://trac.sagemath.org/ticket/24528 Thanks for fixing! -- nosy: +jdemeyer ___ Python tracker <rep...@bugs.python.or

[issue30071] Duck-typing inspect.isfunction()

2018-04-04 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: Superseded by https://www.python.org/dev/peps/pep-0575/ -- stage: test needed -> resolved status: open -> closed ___ Python tracker <rep...@bugs.python.org> <https://bugs.

[issue33222] Various test failures if PYTHONUSERBASE is not canonicalized

2018-04-04 Thread Jeroen Demeyer
New submission from Jeroen Demeyer <j.deme...@ugent.be>: Setting PYTHONUSERBASE=/tmp/x/.. causes the Python test suite to fail: == FAIL: test_user_similar (test.test_sysconfig.TestSys

[issue33261] inspect.isgeneratorfunction fails on hand-created methods

2018-04-11 Thread Jeroen Demeyer
New submission from Jeroen Demeyer <j.deme...@ugent.be>: The inspect functions isgeneratorfunction, iscoroutinefunction, isasyncgenfunction can fail on methods that do not have a __code__ attribute: >>> from types import MethodType >>> class Callable: ...

[issue33261] inspect.isgeneratorfunction fails on hand-created methods

2018-04-11 Thread Jeroen Demeyer
Change by Jeroen Demeyer <j.deme...@ugent.be>: -- keywords: +patch pull_requests: +6144 stage: -> patch review ___ Python tracker <rep...@bugs.python.org> <https://bugs.pyt

[issue13585] Add contextlib.ExitStack

2018-04-11 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: Why this? _exit_wrapper.__self__ = cm It seems that you are trying to create something which is exactly like a method except that it's not a method. Is there any reason to not use an actual method? It would actually simplify the co

[issue33261] inspect.isgeneratorfunction fails on hand-created methods

2018-04-14 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: > I am wondering if, instead, the bug is in m, the object returned by > MethodType, or in attribute lookup thereupon. What would you expect m.__code__ to return then? Methods support arbitrary callables and certainly not a

[issue33261] inspect.isgeneratorfunction fails on hand-created methods

2018-04-14 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: The only attributes that a method is guaranteed to have are __func__ and __self__ (which are the arguments passed to the MethodType constructor). All other attributes are looked up through __func__ by the C version of the following

[issue13585] Add contextlib.ExitStack

2018-04-11 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: Follow-up: https://bugs.python.org/issue33265 -- ___ Python tracker <rep...@bugs.python.org> <https://bugs.python

[issue33265] contextlib.ExitStack abuses __self__

2018-04-11 Thread Jeroen Demeyer
New submission from Jeroen Demeyer <j.deme...@ugent.be>: In contextlib, there is code which roughly looks like def _exit_wrapper(exc_type, exc, tb): return cm_exit(cm, exc_type, exc, tb) _exit_wrapper.__self__ = cm This creates a new function _exit_wrappe

[issue33265] contextlib.ExitStack abuses __self__

2018-04-11 Thread Jeroen Demeyer
Change by Jeroen Demeyer <j.deme...@ugent.be>: -- keywords: +patch pull_requests: +6151 stage: -> patch review ___ Python tracker <rep...@bugs.python.org> <https://bugs.pyt

[issue33261] inspect.isgeneratorfunction fails on hand-created methods

2018-04-14 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: > I would like python_coded_callable.__code__ to be the code object executed > when python_coded_callable is called First of all, that doesn't answer the question of what to do with non-Python coded callables where there is

[issue33261] inspect.isgeneratorfunction fails on hand-created methods

2018-04-18 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: Can we please go back to the original issue? If you think that __code__ should be an alias for __call__.__code__, that is a different issue. On https://github.com/python/cpython/pull/6448#issuecomment-381507329 I posted an alter

[issue32797] Tracebacks from Cython modules no longer work

2018-04-24 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: Alternatively, instead of not implementing `loader.get_source()`, we could define a new semantic: if it returns `NotImplemented`, assume that the loader doesn't know how to get the sources. In that case, linecache would fal

[issue11566] hypot define in pyconfig.h clashes with g++'s cmath

2018-03-20 Thread Jeroen Demeyer
Change by Jeroen Demeyer <j.deme...@ugent.be>: -- nosy: +jdemeyer ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue11566> ___ _

[issue30071] Duck-typing inspect.isfunction()

2018-03-22 Thread Jeroen Demeyer
Jeroen Demeyer <j.deme...@ugent.be> added the comment: See https://mail.python.org/pipermail/python-ideas/2018-March/049398.html -- ___ Python tracker <rep...@bugs.python.org> <https://bugs.python

[issue33939] Raise TypeError in __length_hint__ for consistently infinite iterators

2018-06-22 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Why TypeError? Wouldn't OverflowError be more suitable? -- nosy: +jdemeyer ___ Python tracker <https://bugs.python.org/issue33

[issue33939] Raise TypeError in __length_hint__ for consistently infinite iterators

2018-06-22 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I'm certainly in favor of adding a way for infinite iterators to state that they are infinite. I just feel like TypeError is way overused in Python. I also disagree that it's a category error: it's an iterable, so it does make sense to ask for its length

[issue33939] Raise TypeError in __length_hint__ for consistently infinite iterators

2018-06-22 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: By the way, there is a precedent for OverflowError on infinities: >>> import math >>> int(math.inf) Traceback (most recent call last): File "", line 1, in OverflowError: cannot conve

[issue25750] tp_descr_get(self, obj, type) is called without owning a reference to "self"

2018-11-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Is it necessary to use METH_FASTCALL? In Python 3, the bug only occurs with METH_FASTCALL. The issue is a reference counting bug and the temporary tuple used for a METH_VARARGS method avoids the

[issue34751] Hash collisions for tuples

2018-11-05 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Many thanks! -- ___ Python tracker <https://bugs.python.org/issue34751> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue34751] Hash collisions for tuples

2018-10-04 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Taking an algorithm in wide use that's already known to get a top score on > SMHasher and fiddling it to make a "slight" improvement in one tiny Python > test doesn't make sense to me. OK, I won't do that. The difference is n

[issue25592] distutils docs: data_files always uses sys.prefix

2018-10-10 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Did you try with a minimal project containing a C extension? > Did you install in a system where sys.prefix != sys.exec_prefix? Yes to both questions. -- ___ Python tracker <https://bugs.python.org/i

[issue34751] Hash collisions for tuples

2018-10-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Changes initialization to add in the length: What's the rationale for that change? You always asked me to stay as close as possible to the "official" hash function which adds in the length at the end. Is there an actual benef

[issue34751] Hash collisions for tuples

2018-10-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I pushed an update at PR 9471. I think I took into account all your comments, except for moving the length addition from the end to the begin of the function. -- ___ Python tracker <https://bugs.python.

[issue32797] Tracebacks from Cython modules no longer work

2018-10-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I pushed a documentation-only patch on PR 9540 to better document status quo. Can somebody please review either PR 6653 or PR 9540? -- ___ Python tracker <https://bugs.python.org/issue32

[issue25592] distutils docs: data_files always uses sys.prefix

2018-10-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: There is also commit fa2f4b6d8e297eda09d8ee52dc4a3600b7d458e7 Author: Greg Ward Date: Sat Jun 24 17:22:39 2000 + Changed the default installation directory for data files (used by the "install_data" command to the installation b

[issue25592] distutils docs: data_files always uses sys.prefix

2018-10-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Well, I did try it on a minimal Python project. I also read the distutils sources and understood why it installs data_files in sys.prefix by default. So what more do you need to be convinced? -- ___ Python

[issue25592] distutils docs: data_files always uses sys.prefix

2018-10-09 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- pull_requests: +9154 ___ Python tracker <https://bugs.python.org/issue25592> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue33261] inspect.isgeneratorfunction fails on hand-created methods

2018-10-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Can somebody please review PR 6448? -- ___ Python tracker <https://bugs.python.org/issue33261> ___ ___ Python-bugs-list mailin

[issue25592] distutils docs: data_files always uses sys.prefix

2018-10-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > If you’re not sure about the reason for that sentence, I think you should not > remove it from the docs If the docs are wrong, their history doesn't matter that much: the docs should be fixed regardless. > test the conditions (package with

[issue25592] distutils docs: data_files always uses sys.prefix

2018-10-09 Thread Jeroen Demeyer
Change by Jeroen Demeyer : Removed file: https://bugs.python.org/file40993/data_files_doc.patch ___ Python tracker <https://bugs.python.org/issue25592> ___ ___ Python-bug

[issue25592] distutils docs: data_files always uses sys.prefix

2018-10-09 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Just for fun, let's look at the history. That piece of documentation goes back to commit 632bda3aa06879396561dde5ed3d93ee8fb8900c Author: Fred Drake Date: Fri Mar 8 22:02:06 2002 + Add more explanation of how data_files is used (esp. where

[issue34751] Hash collisions for tuples

2018-10-10 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > it will typically change only the last two bits of the final result Which is great if all that you care about is avoiding collisions. -- ___ Python tracker <https://bugs.python.org/issu

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: SeaHash seems to be designed for 64 bits. I'm guessing that replacing the shifts by x ^= ((x >> 16) >> (x >> 29)) would be what you'd do for a 32-bit hash. Alternatively, we could always compute the hash with 64 bits (using uint64_t)

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Correction: the FNV variant of SeaHash only fails the new testsuite, not the old one. The DJB variant of SeaHash fails both. -- ___ Python tracker <https://bugs.python.org/issue34

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > 100% pure SeaHash does x ^= t at the start first, instead of `t ^ (t << 1)` > on the RHS. Indeed. Some initial testing shows that this kind of "input mangling" (applying such a permutation on the inputs) actually plays a much more im

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > I've noted before, e.g., that sticking to a prime eliminates a world of > regular bit patterns in the multiplier. Why do you think this? 0x1fff is prime :-) Having regular bit patterns and being prime are independent properties. To be

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > For that reason, I've only been looking at those that scored 10 (best > possible) on Appleby's SMHasher[1] test suite, which is used by everyone who > does recognized work in this field. So it seems that this SMHasher test suite doesn't catch th

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > For that reason, I've only been looking at those that scored 10 (best > possible) on Appleby's SMHasher[1] test suite Do you have a list of such hash functions? -- ___ Python tracker <https://bugs.p

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: >>> from itertools import product >>> len(set(map(hash, product([0.5, 0.25], repeat=20 32 Good catch! Would you like me to add this to the testsuite? -- ___ Python tracker <https://bugs.

[issue34751] Hash collisions for tuples

2018-09-28 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Replacing DJBX33A's multiplier of 33 is also a different algorithm. So is > working with inputs other than unsigned bytes. I would argue that this is just extending the parameters of the algorithm. If the algorithm is general enough, then that sho

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > the author wants this transformation to be easily invertible, so a prime is > necessary A multiplication by any odd number modulo 2**64 is invertible. As I argued before, the concept of primes is meaningless (except for the prime 2) when computing

[issue34751] Hash collisions for tuples

2018-10-02 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: This weekend I realized something important which I didn't realize before: some hash functions which I assumed to be good (i.e. small chance of collisions between any given two tuples) turned out to often fail the tests. This is because you don't just want

[issue34751] Hash collisions for tuples

2018-10-04 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Note: I'm assuming that by "PRIME32_2" you mean 2246822519U Yes indeed. > and that "MULTIPLIER" means 2654435761U. No, I mean a randomly chosen multiplier which is 3 mod 8. --

[issue34751] Hash collisions for tuples

2018-10-04 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > In the 64-bit build there are no collisions across my tests except for 11 in > the new tuple test. That's pretty bad actually. With 64 bits, you statistically expect something in the order of 10**-8 collisions. So what you're seeing is 9

[issue34751] Hash collisions for tuples

2018-10-04 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > people already wrote substantial test suites dedicated to that sole purpose, > and we should aim to be "mere consumers" of functions that pass _those_ tests. There are hash functions that pass those tests which are still bad in practice wh

[issue34751] Hash collisions for tuples

2018-10-04 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Taking an algorithm in wide use that's already known to get a top score on > SMHasher and fiddling it to make a "slight" improvement in one tiny Python > test doesn't make sense to me. What I'm doing is the most innocent change: ju

[issue34751] Hash collisions for tuples

2018-10-04 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > I've posted several SeaHash cores that suffer no collisions at all in any of > our tests (including across every "bad example" in these 100+ messages), > except for "the new" tuple test. Which it also passed, most

[issue34751] Hash collisions for tuples

2018-10-03 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: A (simplified and slightly modified version of) xxHash seems to work very well, much better than SeaHash. Just like SeaHash, xxHash also works in parallel. But I'm not doing that and just using this for the loop: for y in t: y ^= y * (PRIME32_2

[issue34751] Hash collisions for tuples

2018-10-03 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > I know of no such hash functions short of crypto-strength ones. Being crypto-strength and having few collisions statistically are different properties. For non-crypto hash functions it's typically very easy to generate collisions once you k

[issue34751] Hash collisions for tuples

2018-10-03 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I'm having a look at xxHash, the second-fastest hash mentioned on https://docs.rs/seahash/3.0.5/seahash/ -- ___ Python tracker <https://bugs.python.org/issue34

[issue34751] Hash collisions for tuples

2018-10-07 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I updated PR 9471 with a tuple hash function based on xxHash. The only change w.r.t. the official xxHash specification is that I'm not using parallellism and just using 1 accumulator. Please have a look

[issue34751] Hash collisions for tuples

2018-09-20 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Since I don't believe we've had other reports of real-life pathologies since You just did in this bug report. Why doesn't that count? -- ___ Python tracker <https://bugs.python.org/issu

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: To come back to the topic of hashing, with "Bernstein hash" for tuples I did indeed mean the simple (in Python pseudocode): # t is a tuple h = INITIAL_VALUE for x in t: z = hash(x) h = (h * MULTIPLIER) + z return h I actually start

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: For the record: my collision is not contrived. It came up in actual code where I was hashing some custom class using tuples and found an unexpected high number of collisions, which I eventually traced back to the collision I reported here. By the way, I

[issue34751] Hash collisions for tuples

2018-09-20 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > ISTM, you're being somewhat aggressive and condescending. Interestingly, I felt the same with your reply, which explains my rapid reversion of your closing of the ticket. The way I see it: I pointed out a bug in tuple hashing and you just clo

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > So, jdemeyer, if it's possible to show (or describe) to us an example of a > problem you had, such that we could repeat it, that would be helpful (and > necessary) to evaluate any proposed changes. What were the inputs to hash() > that caus

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- keywords: +patch pull_requests: +8884 stage: resolved -> patch review ___ Python tracker <https://bugs.python.org/issu

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I pushed a minor update to the PR, changing the MULTIPLIER and explaining the chosen constants. -- ___ Python tracker <https://bugs.python.org/issue34

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I should also make it clear that the collision hash((1,0,0)) == hash((1,-2,-2)) that I reported is due to the algorithm, it's not due to some bad luck that 2 numbers happen to be the same. There are also many more similar hash collisions for tuples (all

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Further, that example specifically exploits that "hash(-1) == hash(-2)" No, it does not. There is no hashing of -1 involved in the example hash((1,0,0)) == hash((1,-2,-2)). -- ___ Python tr

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: Concerning the MULTIPLIER: > Why do you claim the original was "too small"? Too small for what purpose? If the multiplier is too small, then the resulting hash values are small too. This causes collisions to appear for smaller numbers:

[issue34751] Hash collisions for tuples

2018-09-21 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > FWIW, the current algorithm also has some advantages that we don't want to > lose. In particular, the combination of the int hash and tuple hash are good > at producing distinct values for non-negative numbers: >>> from iterto

[issue34751] Hash collisions for tuples

2018-09-24 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > stuff like "t += t >> 16" is a many-to-one function, not a permutation Yes, I am aware of that. However, the number of collisions here is really quite small. It's very unlikely to hit one by accident. I also chose >> over <<

[issue34751] Hash collisions for tuples

2018-09-24 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: While writing up the analysis above, it occurred to me that collisions already happen for 2-tuples: >>> hash((3, -2)) == hash((-3, 0)) True These kind of 2-tuples of small integers don't look contrived at all. I can easily see them

[issue34751] Hash collisions for tuples

2018-09-24 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Has anyone figured out the real source of the degeneration when mixing in > negative integers? The underlying reason for the collisions is the following mathematical relation: x ^ -x = -2 << i where i is the index of the smallest

[issue34751] Hash collisions for tuples

2018-09-22 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Which was your original suggestion. Which you appear to be opposed to now? > I'm unclear about why, if so. I'm not strictly opposed to that. It's just that I have less confidence in the current ad-hoc hash compared to something following the D

[issue34751] Hash collisions for tuples

2018-09-22 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > I don't know that primes are important here, but neither do I know that > they're _not_ important here. Hashes are effectively computed modulo 2**N. "Primes" are meaningless in that setting (except for the prime 2 which does have a meani

[issue34751] Hash collisions for tuples

2018-09-24 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > In the absence of a real analysis, the intuition is simply that "t ^= t << 7" > will clear masses of leading sign bits when hashing "small" negative integers. That's a clever solution. If you want to go that route, I would r

[issue34751] Hash collisions for tuples

2018-09-24 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > For example, FNV-1a has far better "avalanche" behavior than Bernstein We don't care about that though. We want to have no collisions, not a random output. -- ___ Python tracker <https:/

[issue34751] Hash collisions for tuples

2018-09-22 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > I'm not aware of any research papers about picking multipliers in this > context, but would love to see one. The only real condition that I can think of is that the order should be large: we do not want MULTIPLIER**n = 1 (mod 2**N) for a small nu

[issue34751] Hash collisions for tuples

2018-09-24 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- pull_requests: +8937 ___ Python tracker <https://bugs.python.org/issue34751> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue34751] Hash collisions for tuples

2018-09-24 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: I created a new PR based on Tim's t ^= t << 7 idea, except that I'm using << 1 instead, to have more mixing in the lower bits. With the standard FNV multiplier on 64 bits, I did get collisions while testing. I haven't figured out exactly why th

[issue32797] Tracebacks from Cython modules no longer work

2018-09-24 Thread Jeroen Demeyer
Change by Jeroen Demeyer : -- pull_requests: +8942 ___ Python tracker <https://bugs.python.org/issue32797> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue34751] Hash collisions for tuples

2018-09-26 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > Do you believe any other multiplier would work better toward that end? Absolutely. Ideally, the multiplier should just be a random 64-bit number. -- ___ Python tracker <https://bugs.python.org/issu

[issue34751] Hash collisions for tuples

2018-09-26 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > please restore the original tuple hash test. Sure. I wasn't sure what to do and was I afraid that having 2 tests for tuple hashes would be too much. If that's OK for you, then surely I will restore the t

[issue34751] Hash collisions for tuples

2018-09-26 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > The two-liner above with the xor in the second line is exactly Bernstein 33A, > followed by a permutation of 33A's _output_ space. Not output space, but internal state (I assume that you do that operation inside the loop). It's replacing D

[issue34751] Hash collisions for tuples

2018-09-25 Thread Jeroen Demeyer
Jeroen Demeyer added the comment: > When testing what, specifically? And the standard 32-bit FNV multiplier, or > the standard 64-bit FNV multiplier? FNV-1a with the t ^= 2 * t mangling running my new testsuite on either PR 9471 or PR 9534 using the 64-bit FNV multiplier to produce

<    1   2   3   4   5   6   >