[issue28199] Compact dict resizing is doing too much work

2016-11-06 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Re-committed. It might be dangerous to commit this in 3.6 at that stage. -- resolution: -> fixed stage: commit review -> resolved status: open -> closed ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-11-06 Thread Roundup Robot
Roundup Robot added the comment: New changeset 39f33c15243b by Serhiy Storchaka in branch 'default': Issue #28199: Microoptimized dict resizing. Based on patch by Naoki Inada. https://hg.python.org/cpython/rev/39f33c15243b -- ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-11-04 Thread Xiang Zhang
Xiang Zhang added the comment: #28580 and #28583 are resolved now. I think dictresize4 can be recommited now. -- stage: needs patch -> commit review ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-11-02 Thread INADA Naoki
INADA Naoki added the comment: Thanks, Xiang. Shard-key dict is very hard to maintain... -- ___ Python tracker ___

[issue28199] Compact dict resizing is doing too much work

2016-11-01 Thread Xiang Zhang
Xiang Zhang added the comment: Open #28583 and #28580 to tackle this. -- dependencies: +Optimize _PyDict_Next for split table, PyDict_SetDefault doesn't combine split table when needed ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-11-01 Thread Xiang Zhang
Xiang Zhang added the comment: I use gdb to run setuptools test suite and find the assumption, split tables are always dense is broken for both dictresize3 and dictresize4. #0 0x771171c7 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 #1 0x77118e2a

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Ned Deily
Ned Deily added the comment: Excellent, thanks everyone! I'll leave this open for re-evaluation for 3.7. -- priority: release blocker -> resolution: duplicate -> stage: test needed -> needs patch ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Jason R. Coombs
Jason R. Coombs added the comment: Testing now... -- ___ Python tracker ___ ___ Python-bugs-list mailing list

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Jason R. Coombs
Jason R. Coombs added the comment: Confirmed. Tests are now passing where they were failing before. Thanks for the quick response! -- ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Ned Deily
Ned Deily added the comment: Thanks, Serhiy! Jason, can you verify that there is no longer a 3.6 regression with your tests? -- ___ Python tracker ___

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I have rolled back the changes. -- ___ Python tracker ___ ___

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Xiang Zhang
Changes by Xiang Zhang : -- Removed message: http://bugs.python.org/msg279811 ___ Python tracker ___

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Xiang Zhang
Xiang Zhang added the comment: The bug seems to lies in https://hg.python.org/cpython/file/tip/Objects/dictobject.c#l1291. We should use oldkeys->dk_nentries instead of numentries. -- ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Ned Deily
Ned Deily added the comment: Thanks, Jason, for the heads-up. Serhiy, can you take a look at this quickly? I'm going to hold 360b3 until we have a better idea what's going on. -- priority: normal -> release blocker resolution: fixed -> duplicate stage: resolved -> test needed status:

[issue28199] Compact dict resizing is doing too much work

2016-10-31 Thread Jason R. Coombs
Jason R. Coombs added the comment: In https://github.com/pypa/setuptools/issues/836, I've pinpointed this commit as implicated in dictionaries spontaneously losing keys. I have not yet attempted to replicate the issue in a standalone environment, and I'm hoping someone with a better

[issue28199] Compact dict resizing is doing too much work

2016-10-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: I don't have strong preferences, but at this moment dictresize4.patch looks a little better to me. Maybe I wrong and in future optimizations we will returned to insert_index(). Yet one opportunity for future optimization -- inline dk_get_index() and

[issue28199] Compact dict resizing is doing too much work

2016-10-29 Thread Roundup Robot
Roundup Robot added the comment: New changeset 6b88dfc7b25d by Serhiy Storchaka in branch '3.6': Issue #28199: Microoptimized dict resizing. Based on patch by Naoki Inada. https://hg.python.org/cpython/rev/6b88dfc7b25d New changeset f0fbc6071d7e by Serhiy Storchaka in branch 'default': Issue

[issue28199] Compact dict resizing is doing too much work

2016-10-29 Thread INADA Naoki
INADA Naoki added the comment: Serhiy, would you commit it by 3.6b3? -- sent from mobile -- ___ Python tracker ___

[issue28199] Compact dict resizing is doing too much work

2016-10-29 Thread Xiang Zhang
Xiang Zhang added the comment: If you inline insert_index, dictresize3 is not that bad. ./python3 -m perf timeit -s 'x = list(range(1000))' -- 'dict.fromkeys(x)' dictresize3: Median +- std dev: 43.9 us +- 0.7 us dictresize3(insert_index inlined): Median +- std dev: 41.6 us +- 0.6 us

[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Yes, the difference is pretty small. The largest difference I got is 7% in following microbenchmark: $ ./python -m perf timeit -s 'x = list(range(1000))' -- 'dict.fromkeys(x)' Unpatched: Median +- std dev: 80.6 us +- 0.5 us dictresize3.patch:

[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread STINNER Victor
STINNER Victor added the comment: Remark about perf timeout: you can use --compare-to with the patched Python binary to check if the result is significant and compute the "x.xx faster" number. -- ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Xiang Zhang
Xiang Zhang added the comment: I doubt how many memcpy could benefit. Two pass does not necessarily make faster. I make a simple test: With dictresize3, (I make insert_index inline): [bin]$ ./python3 -m perf timeit -s 'd = {i:i for i in range(6)}' 'dict(d)' Median +- std

[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread INADA Naoki
INADA Naoki added the comment: Current code and my patch called insertdict_clean() or insert_index() for each entry. On the other hand, Serhiy's patch calls build_indices() once. This may be faster when compiler doesn't inlining the helper function. As a bonus, we can use memcpy to copy

[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The advantage is using memcpy in case of combined table without deletions. This is common case of creating dict without pre-resizing: dict(list), dict(iterator), etc. In future, when will be possible to reuse old entries array, this also could help in case

[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread INADA Naoki
INADA Naoki added the comment: LGTM. -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Xiang Zhang
Xiang Zhang added the comment: Hmm, what's the advantage? -- ___ Python tracker ___ ___ Python-bugs-list

[issue28199] Compact dict resizing is doing too much work

2016-10-28 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: What if split copying entries from inserting indices? First fill a continuous array of entries (could use memcpy if there were not deletions), then build a hashtable of continuous indices. Following patch implements this idea. -- Added file:

[issue28199] Compact dict resizing is doing too much work

2016-10-25 Thread INADA Naoki
INADA Naoki added the comment: @haypo, could you review this? -- ___ Python tracker ___ ___ Python-bugs-list

[issue28199] Compact dict resizing is doing too much work

2016-10-06 Thread INADA Naoki
INADA Naoki added the comment: Since entries array is embedded in PyDictKeysObject, we can't realloc entries. And while values are split array, dictresize() convert split table into combine table. Split table may have enough size of ma_values at first in typical case. And in not typical case,

[issue28199] Compact dict resizing is doing too much work

2016-10-06 Thread Raymond Hettinger
Raymond Hettinger added the comment: For the simple case with no dummy entries, I was expecting a fast path that just realloced the keys/values/hashes arrays and then updated the index table with reinsertion logic that only touches the indices. Use realloc() is nice because it makes it

[issue28199] Compact dict resizing is doing too much work

2016-10-06 Thread INADA Naoki
Changes by INADA Naoki : Added file: http://bugs.python.org/file44983/dictresize3.patch ___ Python tracker ___

[issue28199] Compact dict resizing is doing too much work

2016-09-30 Thread INADA Naoki
INADA Naoki added the comment: Ah, I'm sorry. I forget to remove some changes relating to inplace compaction (reusing oldkeys when oldsize==newsize). -- Added file: http://bugs.python.org/file44893/dictresize2.patch ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-09-30 Thread STINNER Victor
STINNER Victor added the comment: dictresize.patch is quite big, it seems like half of the patch is more cleanup/refactoring. Can you please split the patch into two parts, to only a smaller patch just for the optimization part? -- ___ Python

[issue28199] Compact dict resizing is doing too much work

2016-09-30 Thread STINNER Victor
STINNER Victor added the comment: Oh, your message reminded me that I always wanted an option in the timeit module to run the benchmark on two Python versions and then directly compare the result. I just added the feature to perf and then I released perf 0.7.11, enjoy! The output is more

[issue28199] Compact dict resizing is doing too much work

2016-09-30 Thread INADA Naoki
INADA Naoki added the comment: $ ./install/bin/python3.6-default -m perf timeit -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];' Median +- std dev: 363 us +- 11 us $ ./install/bin/python3.6 -m perf timeit -s 'x = range(1000); d={}' 'for i in x: d[i]=i; del d[i];'

[issue28199] Compact dict resizing is doing too much work

2016-09-29 Thread INADA Naoki
INADA Naoki added the comment: As written in comment, reusing keys object breaks odict implementation. I think we can't avoid copy for Python 3.6. -- keywords: +patch Added file: http://bugs.python.org/file44888/dictresize.patch ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > (After OrderedDict implementation is improved, functools.lru_cache can use > OrderedDict and remove doubly linked list too.) functools.lru_cache can use just ordered dict. But simple implementation is 1.5 times slower. I'm working on this. I think that

[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread INADA Naoki
INADA Naoki added the comment: > We can still clean this up for Python 3.6. We're in feature freeze, not > development freeze. Does it mean there is a chance to improve OrderedDict to use new dict implementation, if it seems safe enough? Is new implementation a feature? (After OrderedDict

[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread INADA Naoki
INADA Naoki added the comment: Current compact ordered dict implementation is bit different from yours. When there was removed item, there are NULL entries in ma_entries, instead of swapping last item and deleted item. It's important to keep insertion order. But it's easy to detect clean dict.

[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread Xiang Zhang
Xiang Zhang added the comment: Then how about entries(key, value pairs)? The size of entries does not match the available hash slots. For example, an empty dict allocates a hash array with 5 available slots and 5 entries. Now we resize the hash array to size 16 and it can afford 10 entries

[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread Raymond Hettinger
Raymond Hettinger added the comment: Just before the re-insertion, we should also do a compaction-in-place for the keys/values/hashes array if it has a significant number of holes for previously deleted entries. -- ___ Python tracker

[issue28199] Compact dict resizing is doing too much work

2016-09-18 Thread Raymond Hettinger
New submission from Raymond Hettinger: The dictresize() method unnecessarily copies the keys, values, and hashes as well as making insertions in the index table. Only the latter step is necessary or desirable. Here in the pure Python code for resizing taking from the original