Jeroen Demeyer added the comment:
> The one possible fly in the ointment is if there are use cases that we
need to support where a single .so/.pyd file is built from *multiple*
source files, as get_source doesn't allow for that.
Yes, we must support that. A Cython module may have multi
Jeroen Demeyer added the comment:
> So, where is the filename coming from?
Python 3.7.0 (default, Jul 23 2018, 10:07:21)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sage.all
Jeroen Demeyer added the comment:
> Now, though, we need a way for SageMath to get it working on releases up to
> and including 3.7, *without* any help from Cython or CPython
That's not really what I need. We already have a work-around in SageMath:
try:
from importlib.machinery
Jeroen Demeyer added the comment:
> One other thing I will say is that the PEP 302 APIs were seemingly designed
> for working with Python source and were not necessarily oriented towards
> alternative code representations that were executed
That's also my impression. I'm perfe
Jeroen Demeyer added the comment:
> If you're OK with that, I don't see the problem.
I recall that we already agreed several months ago, when I submitted
https://github.com/python/cpython/pull/6653
--
___
Python tracker
<https://bugs.pyth
Jeroen Demeyer added the comment:
I just realized the existence of
https://docs.python.org/3/library/importlib.html#importlib.abc.ResourceReader
Isn't this *exactly* what we need here?
--
Python tracker
<https://bugs.python.org/issue32
Jeroen Demeyer added the comment:
I think I could make linecache use the ResourceReader API to find the source
file instead of looking in the sys.path entries. However, that's orthogonal to
PR 6653: we still need PR 6653 to continue looking for the source file in the
first place
Jeroen Demeyer added the comment:
I would prefer to wait with this issue until PEP 580 has been decided. If it's
accepted, it will change function calling code a lot. As I wrote in PEP 580, I
was planning to have a PEP on profiling anyway
Jeroen Demeyer added the comment:
> For compatibility with extensions built with older Pythons you should define
> new type flag and read tp_fastcall only if the flag is set.
Can you comment on https://github.com/python/cpython/pull/4944 why you think
that such compatibility
Jeroen Demeyer added the comment:
> Or, define a new "get_sourcemap()" method that could return additional
> metadata, e.g. a line number mapping.
What's your use case for that? I am asking because it might make more sense for
get_source() (or whatever its replaceme
Jeroen Demeyer added the comment:
> I would love, for example, to be able to get stack traces within extension
> modules integrated into Python tracebacks if they are compiled with some
> appropriate debug flags.
Awesome idea, I want to work on that :-)
This should be possi
Jeroen Demeyer added the comment:
Simple patch coming up...
--
Python tracker
<https://bugs.python.org/issue34126>
Jeroen Demeyer added the comment:
Also:
>>> import sys; sys.setprofile(lambda *args: None)
>>> dict.get([], "foo")
Segmentation fault
--
title: Profiling dict.get() crashes Python -> Profiling cert
New submission from Jeroen Demeyer :
>>> import sys; sys.setprofile(lambda *args:None)
>>> dict.get()
Segmentation fault
--
components: Interpreter Core
messages: 321747
nosy: jdemeyer
priority: normal
severity: normal
status: open
title: Profiling dict.get() crash
New submission from Jeroen Demeyer :
Enable profiling for C functions:
>>> def prof(frame, typ, arg):
... if typ.startswith("c_"):
... print(arg, typ)
>>> import sys; sys.setprofile(prof)
and notice how profiling depends on **kwargs:
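The snippet above is cut off; a self-contained sketch of such a profiler follows. This is an illustrative reconstruction, not the original example: the event names `c_call`, `c_return`, `c_exception` are the ones `sys.setprofile` documents for C-function events, and `len` is just a convenient built-in to trigger them.

```python
import sys

events = []

def prof(frame, typ, arg):
    # Record only the C-function events: "c_call", "c_return", "c_exception".
    # For these events, arg is the C function object being called.
    if typ.startswith("c_"):
        events.append((typ, getattr(arg, "__name__", arg)))

sys.setprofile(prof)
len([1, 2, 3])        # a C function call: fires c_call / c_return
sys.setprofile(None)  # disable profiling before inspecting the result

print(events)
```

Note that the profile callback itself is not profiled: CPython suspends profiling while the callback runs, so the `events.append(...)` call inside `prof` does not recurse.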
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +7834
stage: -> patch review
Python tracker
<https://bugs.python.org/issue34126>
Jeroen Demeyer added the comment:
Just to avoid duplicate work: I can have a look at this.
--
Python tracker
<https://bugs.python.org/issue34190>
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +7939
stage: -> patch review
Python tracker
<https://bugs.python.org/issue34190>
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +7942
stage: -> patch review
Python tracker
<https://bugs.python.org/issue34125>
Change by Jeroen Demeyer :
--
pull_requests: +8542
Python tracker
<https://bugs.python.org/issue25750>
Jeroen Demeyer added the comment:
Real world example where this actually came up:
https://github.com/jupyter-widgets/ipywidgets/issues/430#issuecomment-247016263
--
nosy: +jdemeyer
Python tracker
<https://bugs.python.org/issue19
Jeroen Demeyer <jdeme...@cage.ugent.be> added the comment:
Just to note that this bug affects SageMath too:
https://trac.sagemath.org/ticket/24528
Thanks for fixing!
--
nosy: +jdemeyer
Python tracker <rep...@bugs.python.or
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
Superseded by https://www.python.org/dev/peps/pep-0575/
--
stage: test needed -> resolved
status: open -> closed
Python tracker <rep...@bugs.python.org>
<https://bugs.
New submission from Jeroen Demeyer <j.deme...@ugent.be>:
Setting PYTHONUSERBASE=/tmp/x/.. causes the Python test suite to fail:
==
FAIL: test_user_similar (test.test_sysconfig.TestSys
New submission from Jeroen Demeyer <j.deme...@ugent.be>:
The inspect functions isgeneratorfunction, iscoroutinefunction,
isasyncgenfunction can fail on methods that do not have a __code__ attribute:
>>> from types import MethodType
>>> class Callable:
...
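The example above is truncated; a reconstruction of the kind of object involved (the class body here is a guess at the shape of the original example, not a copy of it):

```python
from types import MethodType

class Callable:
    def __call__(self):
        pass

# Bind a callable *instance* (not a plain function) as a method.
# The resulting method has no __code__: attribute lookup on a method
# is delegated to __func__, and a Callable instance defines no __code__.
m = MethodType(Callable(), object())
print(hasattr(m, "__code__"))  # False
```

This is exactly the situation where `inspect.isgeneratorfunction` and friends can fail: they reach through `__code__` without checking that the underlying callable has one.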
Change by Jeroen Demeyer <j.deme...@ugent.be>:
--
keywords: +patch
pull_requests: +6144
stage: -> patch review
Python tracker <rep...@bugs.python.org>
<https://bugs.pyt
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
Why this?
_exit_wrapper.__self__ = cm
It seems that you are trying to create something which is exactly like a method
except that it's not a method. Is there any reason to not use an actual method?
It would actually simplify the co
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
> I am wondering if, instead, the bug is in m, the object returned by
> MethodType, or in attribute lookup thereupon.
What would you expect m.__code__ to return then? Methods support arbitrary
callables and certainly not a
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
The only attributes that a method is guaranteed to have are __func__ and
__self__ (which are the arguments passed to the MethodType constructor).
All other attributes are looked up through __func__ by the C version of the
following
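The truncated "following" presumably showed a `__getattr__`-style delegation. A rough pure-Python sketch of that behavior (an approximation for illustration, not CPython's actual implementation):

```python
class Method:
    # Rough model of a bound method: __func__ and __self__ are stored
    # directly; any attribute *not* found on the method itself falls
    # through __getattr__ and is fetched from __func__.
    def __init__(self, func, obj):
        self.__func__ = func
        self.__self__ = obj

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, i.e. for everything
        # except __func__, __self__ and attributes of the Method type.
        return getattr(self.__func__, name)

    def __call__(self, *args, **kwargs):
        return self.__func__(self.__self__, *args, **kwargs)

def f(self, x):
    return x

m = Method(f, object())
assert m.__func__ is f          # stored on the method itself
assert m.__name__ == "f"        # looked up through __func__
assert m.__code__ is f.__code__ # likewise delegated
assert m(10) == 10
```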
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
Follow-up: https://bugs.python.org/issue33265
--
Python tracker <rep...@bugs.python.org>
<https://bugs.python
New submission from Jeroen Demeyer <j.deme...@ugent.be>:
In contextlib, there is code which roughly looks like
def _exit_wrapper(exc_type, exc, tb):
return cm_exit(cm, exc_type, exc, tb)
_exit_wrapper.__self__ = cm
This creates a new function _exit_wrappe
Change by Jeroen Demeyer <j.deme...@ugent.be>:
--
keywords: +patch
pull_requests: +6151
stage: -> patch review
Python tracker <rep...@bugs.python.org>
<https://bugs.pyt
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
> I would like python_coded_callable.__code__ to be the code object executed
> when python_coded_callable is called
First of all, that doesn't answer the question of what to do with non-Python
coded callables where there is
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
Can we please go back to the original issue? If you think that __code__ should
be an alias for __call__.__code__, that is a different issue.
On https://github.com/python/cpython/pull/6448#issuecomment-381507329 I posted
an alter
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
Alternatively, instead of not implementing `loader.get_source()`, we could
define a new semantic: if it returns `NotImplemented`, assume that the loader
doesn't know how to get the sources. In that case, linecache would fal
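The proposed semantic can be sketched from the consumer's side. This is purely illustrative of the proposal in the message (returning `NotImplemented` from `get_source()` is not an adopted API; the function and class names here are invented):

```python
def get_source_or_fallback(loader, module_name):
    # Proposed semantic: a loader may return NotImplemented to say
    # "I don't know how to get the sources". The caller (linecache)
    # would then fall back to searching sys.path entries, modeled
    # here by returning None.
    src = loader.get_source(module_name)
    if src is NotImplemented:
        return None
    return src

class NoSourceLoader:
    def get_source(self, name):
        return NotImplemented

assert get_source_or_fallback(NoSourceLoader(), "somemod") is None
```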
Change by Jeroen Demeyer <j.deme...@ugent.be>:
--
nosy: +jdemeyer
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue11566>
Jeroen Demeyer <j.deme...@ugent.be> added the comment:
See https://mail.python.org/pipermail/python-ideas/2018-March/049398.html
--
Python tracker <rep...@bugs.python.org>
<https://bugs.python
Jeroen Demeyer added the comment:
Why TypeError? Wouldn't OverflowError be more suitable?
--
nosy: +jdemeyer
Python tracker
<https://bugs.python.org/issue33
Jeroen Demeyer added the comment:
I'm certainly in favor of adding a way for infinite iterators to state that
they are infinite. I just feel like TypeError is way overused in Python. I also
disagree that it's a category error: it's an iterable, so it does make sense to
ask for its length
Jeroen Demeyer added the comment:
By the way, there is a precedent for OverflowError on infinities:
>>> import math
>>> int(math.inf)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: cannot conve
Jeroen Demeyer added the comment:
> Is it necessary to use METH_FASTCALL?
In Python 3, the bug only occurs with METH_FASTCALL. The issue is a reference
counting bug and the temporary tuple used for a METH_VARARGS method avoids the
Jeroen Demeyer added the comment:
Many thanks!
--
Python tracker
<https://bugs.python.org/issue34751>
Jeroen Demeyer added the comment:
> Taking an algorithm in wide use that's already known to get a top score on
> SMHasher and fiddling it to make a "slight" improvement in one tiny Python
> test doesn't make sense to me.
OK, I won't do that. The difference is n
Jeroen Demeyer added the comment:
> Did you try with a minimal project containing a C extension?
> Did you install in a system where sys.prefix != sys.exec_prefix?
Yes to both questions.
--
Python tracker
<https://bugs.python.org/i
Jeroen Demeyer added the comment:
> Changes initialization to add in the length:
What's the rationale for that change? You always asked me to stay as close as
possible to the "official" hash function which adds in the length at the end.
Is there an actual benef
Jeroen Demeyer added the comment:
I pushed an update at PR 9471. I think I took into account all your comments,
except for moving the length addition from the end to the beginning of the function.
--
Python tracker
<https://bugs.python.
Jeroen Demeyer added the comment:
I pushed a documentation-only patch on PR 9540 to better document the status quo.
Can somebody please review either PR 6653 or PR 9540?
--
Python tracker
<https://bugs.python.org/issue32
Jeroen Demeyer added the comment:
There is also
commit fa2f4b6d8e297eda09d8ee52dc4a3600b7d458e7
Author: Greg Ward
Date: Sat Jun 24 17:22:39 2000 +
Changed the default installation directory for data files (used by
the "install_data" command to the installation b
Jeroen Demeyer added the comment:
Well, I did try it on a minimal Python project. I also read the distutils
sources and understood why it installs data_files in sys.prefix by default. So
what more do you need to be convinced?
--
Python
Change by Jeroen Demeyer :
--
pull_requests: +9154
___
Python tracker
<https://bugs.python.org/issue25592>
___
___
Python-bugs-list mailing list
Unsubscribe:
Jeroen Demeyer added the comment:
Can somebody please review PR 6448?
--
Python tracker
<https://bugs.python.org/issue33261>
Jeroen Demeyer added the comment:
> If you’re not sure about the reason for that sentence, I think you should not
> remove it from the docs
If the docs are wrong, their history doesn't matter that much: the docs should
be fixed regardless.
> test the conditions (package with
Change by Jeroen Demeyer :
Removed file: https://bugs.python.org/file40993/data_files_doc.patch
Python tracker
<https://bugs.python.org/issue25592>
Jeroen Demeyer added the comment:
Just for fun, let's look at the history. That piece of documentation goes back
to
commit 632bda3aa06879396561dde5ed3d93ee8fb8900c
Author: Fred Drake
Date: Fri Mar 8 22:02:06 2002 +
Add more explanation of how data_files is used (esp. where
Jeroen Demeyer added the comment:
> it will typically change only the last two bits of the final result
Which is great if all that you care about is avoiding collisions.
--
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
SeaHash seems to be designed for 64 bits. I'm guessing that replacing the
shifts by
x ^= ((x >> 16) >> (x >> 29))
would be what you'd do for a 32-bit hash. Alternatively, we could always
compute the hash with 64 bits (using uint64_t)
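For reference, the 64-bit SeaHash "diffuse" step under discussion looks roughly like this. The multiplier constant is quoted from the SeaHash reference documentation from memory, so treat this as a sketch rather than an authoritative transcription:

```python
MASK64 = (1 << 64) - 1
P = 0x6eed0e9da4d94a4f  # SeaHash's multiplier (from the reference docs)

def diffuse(x):
    # SeaHash diffusion: multiply, apply a data-dependent shift-xor,
    # then multiply again. All arithmetic is modulo 2**64.
    x = (x * P) & MASK64
    a = x >> 32
    b = x >> 60          # top 4 bits pick the extra shift amount
    x ^= a >> b
    x = (x * P) & MASK64
    return x
```

Because the multiplier is odd and the shift-xor never touches the bits that determine `b`, each step is invertible, so `diffuse` is a bijection on 64-bit values.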
Jeroen Demeyer added the comment:
Correction: the FNV variant of SeaHash only fails the new testsuite, not the
old one. The DJB variant of SeaHash fails both.
--
Python tracker
<https://bugs.python.org/issue34
Jeroen Demeyer added the comment:
> 100% pure SeaHash does x ^= t at the start first, instead of `t ^ (t << 1)`
> on the RHS.
Indeed. Some initial testing shows that this kind of "input mangling" (applying
such a permutation on the inputs) actually plays a much more im
Jeroen Demeyer added the comment:
> I've noted before, e.g., that sticking to a prime eliminates a world of
> regular bit patterns in the multiplier.
Why do you think this? 0x1fff is prime :-)
Having regular bit patterns and being prime are independent properties.
To be
Jeroen Demeyer added the comment:
> For that reason, I've only been looking at those that scored 10 (best
> possible) on Appleby's SMHasher[1] test suite, which is used by everyone who
> does recognized work in this field.
So it seems that this SMHasher test suite doesn't catch th
Jeroen Demeyer added the comment:
> For that reason, I've only been looking at those that scored 10 (best
> possible) on Appleby's SMHasher[1] test suite
Do you have a list of such hash functions?
--
Python tracker
<https://bugs.p
Jeroen Demeyer added the comment:
>>> from itertools import product
>>> len(set(map(hash, product([0.5, 0.25], repeat=20
32
Good catch! Would you like me to add this to the testsuite?
--
Python tracker
<https://bugs.
Jeroen Demeyer added the comment:
> Replacing DJBX33A's multiplier of 33 is also a different algorithm. So is
> working with inputs other than unsigned bytes.
I would argue that this is just extending the parameters of the algorithm. If
the algorithm is general enough, then that sho
Jeroen Demeyer added the comment:
> the author wants this transformation to be easily invertible, so a prime is
> necessary
A multiplication by any odd number modulo 2**64 is invertible. As I argued
before, the concept of primes is meaningless (except for the prime 2) when
computing
Jeroen Demeyer added the comment:
This weekend I realized something important which I didn't realize before: some
hash functions which I assumed to be good (i.e. small chance of collisions
between any given two tuples) turned out to often fail the tests. This is
because you don't just want
Jeroen Demeyer added the comment:
> Note: I'm assuming that by "PRIME32_2" you mean 2246822519U
Yes indeed.
> and that "MULTIPLIER" means 2654435761U.
No, I mean a randomly chosen multiplier which is 3 mod 8.
--
Jeroen Demeyer added the comment:
> In the 64-bit build there are no collisions across my tests except for 11 in
> the new tuple test.
That's pretty bad actually. With 64 bits, you statistically expect something in
the order of 10**-8 collisions. So what you're seeing is 9
Jeroen Demeyer added the comment:
> people already wrote substantial test suites dedicated to that sole purpose,
> and we should aim to be "mere consumers" of functions that pass _those_ tests.
There are hash functions that pass those tests which are still bad in practice
wh
Jeroen Demeyer added the comment:
> Taking an algorithm in wide use that's already known to get a top score on
> SMHasher and fiddling it to make a "slight" improvement in one tiny Python
> test doesn't make sense to me.
What I'm doing is the most innocent change: ju
Jeroen Demeyer added the comment:
> I've posted several SeaHash cores that suffer no collisions at all in any of
> our tests (including across every "bad example" in these 100+ messages),
> except for "the new" tuple test. Which it also passed, most
Jeroen Demeyer added the comment:
A (simplified and slightly modified version of) xxHash seems to work very well,
much better than SeaHash. Just like SeaHash, xxHash also works in parallel. But
I'm not doing that and just using this for the loop:
for y in t:
y ^= y * (PRIME32_2
Jeroen Demeyer added the comment:
> I know of no such hash functions short of crypto-strength ones.
Being crypto-strength and having few collisions statistically are different
properties.
For non-crypto hash functions it's typically very easy to generate collisions
once you k
Jeroen Demeyer added the comment:
I'm having a look at xxHash, the second-fastest hash mentioned on
https://docs.rs/seahash/3.0.5/seahash/
--
Python tracker
<https://bugs.python.org/issue34
Jeroen Demeyer added the comment:
I updated PR 9471 with a tuple hash function based on xxHash. The only change
w.r.t. the official xxHash specification is that I'm not using parallelism and
just using 1 accumulator. Please have a look
Jeroen Demeyer added the comment:
> Since I don't believe we've had other reports of real-life pathologies since
You just did in this bug report. Why doesn't that count?
--
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
To come back to the topic of hashing, with "Bernstein hash" for tuples I did
indeed mean the simple (in Python pseudocode):
# t is a tuple
h = INITIAL_VALUE
for x in t:
z = hash(x)
h = (h * MULTIPLIER) + z
return h
I actually start
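Made concrete with the classic DJBX33A parameters (initial value 5381, multiplier 33), the pseudocode above becomes the following. These particular constants are the classic Bernstein ones, not necessarily the ones CPython uses:

```python
def bernstein_tuple_hash(t, initial=5381, multiplier=33):
    # Plain Bernstein-style combining: h = h*m + hash(x) per item,
    # reduced modulo 2**64 at the end, as fixed-width C code would do
    # implicitly on every operation.
    h = initial
    for x in t:
        z = hash(x)
        h = h * multiplier + z
    return h & ((1 << 64) - 1)
```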
Jeroen Demeyer added the comment:
For the record: my collision is not contrived. It came up in actual code where
I was hashing some custom class using tuples and found an unexpected high
number of collisions, which I eventually traced back to the collision I
reported here.
By the way, I
Jeroen Demeyer added the comment:
> ISTM, you're being somewhat aggressive and condescending.
Interestingly, I felt the same with your reply, which explains my rapid
reversion of your closing of the ticket. The way I see it: I pointed out a bug
in tuple hashing and you just clo
Jeroen Demeyer added the comment:
> So, jdemeyer, if it's possible to show (or describe) to us an example of a
> problem you had, such that we could repeat it, that would be helpful (and
> necessary) to evaluate any proposed changes. What were the inputs to hash()
> that caus
Change by Jeroen Demeyer :
--
keywords: +patch
pull_requests: +8884
stage: resolved -> patch review
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
I pushed a minor update to the PR, changing the MULTIPLIER and explaining the
chosen constants.
--
Python tracker
<https://bugs.python.org/issue34
Jeroen Demeyer added the comment:
I should also make it clear that the collision hash((1,0,0)) == hash((1,-2,-2))
that I reported is due to the algorithm, it's not due to some bad luck that 2
numbers happen to be the same. There are also many more similar hash collisions
for tuples (all
Jeroen Demeyer added the comment:
> Further, that example specifically exploits that "hash(-1) == hash(-2)"
No, it does not. There is no hashing of -1 involved in the example
hash((1,0,0)) == hash((1,-2,-2)).
--
Python tr
Jeroen Demeyer added the comment:
Concerning the MULTIPLIER:
> Why do you claim the original was "too small"? Too small for what purpose?
If the multiplier is too small, then the resulting hash values are small too.
This causes collisions to appear for smaller numbers:
Jeroen Demeyer added the comment:
> FWIW, the current algorithm also has some advantages that we don't want to
> lose. In particular, the combination of the int hash and tuple hash are good
> at producing distinct values for non-negative numbers:
>>> from iterto
Jeroen Demeyer added the comment:
> stuff like "t += t >> 16" is a many-to-one function, not a permutation
Yes, I am aware of that. However, the number of collisions here is really quite
small. It's very unlikely to hit one by accident.
I also chose >> over <<
Jeroen Demeyer added the comment:
While writing up the analysis above, it occurred to me that collisions already
happen for 2-tuples:
>>> hash((3, -2)) == hash((-3, 0))
True
These kinds of 2-tuples of small integers don't look contrived at all. I can
easily see them
Jeroen Demeyer added the comment:
> Has anyone figured out the real source of the degeneration when mixing in
> negative integers?
The underlying reason for the collisions is the following mathematical relation:
x ^ -x = -2 << i
where i is the index of the smallest
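The truncated sentence presumably ends with "smallest set bit of x". The identity is easy to verify exhaustively over a range of small integers:

```python
def lowest_set_bit_index(x):
    # Index i of the smallest set bit of x (x != 0):
    # x & -x isolates that bit, bit_length()-1 gives its position.
    return (x & -x).bit_length() - 1

# x ^ -x == -2 << i for every nonzero x (two's-complement arithmetic,
# which Python's unbounded ints model exactly for this identity).
for x in range(1, 1000):
    i = lowest_set_bit_index(x)
    assert x ^ -x == -2 << i
```

For example, with x = 12 (binary 1100) the lowest set bit is at index 2, and indeed 12 ^ -12 == -8 == -2 << 2.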
Jeroen Demeyer added the comment:
> Which was your original suggestion. Which you appear to be opposed to now?
> I'm unclear about why, if so.
I'm not strictly opposed to that. It's just that I have less confidence in the
current ad-hoc hash compared to something following the D
Jeroen Demeyer added the comment:
> I don't know that primes are important here, but neither do I know that
> they're _not_ important here.
Hashes are effectively computed modulo 2**N. "Primes" are meaningless in that
setting (except for the prime 2 which does have a meani
Jeroen Demeyer added the comment:
> In the absence of a real analysis, the intuition is simply that "t ^= t << 7"
> will clear masses of leading sign bits when hashing "small" negative integers.
That's a clever solution. If you want to go that route, I would r
Jeroen Demeyer added the comment:
> For example, FNV-1a has far better "avalanche" behavior than Bernstein
We don't care about that though. We want to have no collisions, not a random
output.
--
___
Python tracker
<https:/
Jeroen Demeyer added the comment:
> I'm not aware of any research papers about picking multipliers in this
> context, but would love to see one.
The only real condition that I can think of is that the order should be large:
we do not want MULTIPLIER**n = 1 (mod 2**N) for a small nu
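This condition is cheap to check: the multiplicative group modulo 2**N (N >= 3) has order 2**(N-1), so the order of any odd multiplier is a power of two dividing 2**(N-2) and can be found by repeated squaring. A sketch (the FNV-64 multiplier used in the test is the standard 1099511628211):

```python
def order_mod_2n(m, N):
    # Multiplicative order of odd m modulo 2**N. Every element's order
    # in this group is a power of 2, so squaring until we hit 1 finds
    # the order exactly.
    mod = 1 << N
    x = m % mod
    order = 1
    while x != 1:
        x = (x * x) % mod
        order *= 2
    return order
```

For instance, 3 has order 64 modulo 2**8, the maximum possible there; multipliers congruent to 3 or 5 mod 8 always attain the maximal order 2**(N-2).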
Change by Jeroen Demeyer :
--
pull_requests: +8937
___
Python tracker
<https://bugs.python.org/issue34751>
___
___
Python-bugs-list mailing list
Unsubscribe:
Jeroen Demeyer added the comment:
I created a new PR based on Tim's t ^= t << 7 idea, except that I'm using << 1
instead, to have more mixing in the lower bits.
With the standard FNV multiplier on 64 bits, I did get collisions while
testing. I haven't figured out exactly why th
Change by Jeroen Demeyer :
--
pull_requests: +8942
___
Python tracker
<https://bugs.python.org/issue32797>
___
___
Python-bugs-list mailing list
Unsubscribe:
Jeroen Demeyer added the comment:
> Do you believe any other multiplier would work better toward that end?
Absolutely. Ideally, the multiplier should just be a random 64-bit number.
--
Python tracker
<https://bugs.python.org/issu
Jeroen Demeyer added the comment:
> please restore the original tuple hash test.
Sure. I wasn't sure what to do and was afraid that having 2 tests for tuple
hashes would be too much. If that's OK for you, then surely I will restore the
t
Jeroen Demeyer added the comment:
> The two-liner above with the xor in the second line is exactly Bernstein 33A,
> followed by a permutation of 33A's _output_ space.
Not output space, but internal state (I assume that you do that operation
inside the loop). It's replacing D
Jeroen Demeyer added the comment:
> When testing what, specifically? And the standard 32-bit FNV multiplier, or
> the standard 64-bit FNV multiplier?
FNV-1a with the t ^= 2 * t mangling running my new testsuite on either PR 9471
or PR 9534 using the 64-bit FNV multiplier to produce