[issue3944] faster long multiplication

2009-11-28 Thread Mark Dickinson
Mark Dickinson added the comment: I'm going to close this: it's a really nice idea, but after the 30-bit long digits were implemented, the speedup doesn't seem to be worth the extra code complication. The only situation I've found where this optimization really does make a big difference is

[issue3944] faster long multiplication

2009-11-15 Thread Mark Dickinson
Changes by Mark Dickinson : -- priority: high -> normal ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: http://m

[issue3944] faster long multiplication

2009-03-24 Thread Mark Dickinson
Mark Dickinson added the comment: Updated version of longobject_diff1: - add mul1 back in - rename MAX_PARTIALS to the more descriptive BLOCK_MUL_SIZE - rewrite digits_multiply so that the call to digits_multiply_add always has b_size=BLOCK_MUL_SIZE, then hard-code this and get rid of

[issue3944] faster long multiplication

2009-03-24 Thread Mark Dickinson
Mark Dickinson added the comment: Thanks! Unfortunately, it looks like I messed this up yesterday by removing mul1 (after the division patch went in, mul1 wasn't used any more). I'll fix this. -- ___ Python tracker

[issue3944] faster long multiplication

2009-03-22 Thread Pernici Mario
Pernici Mario added the comment: This patch comes from 30bit_longdigit13+optimizations1.patch in issue4258 with modification for the case of multiplication by 0; it passes test_long.py and pidigits is a bit faster. -- Added file: http://bugs.python.org/file13394/longobject_diff1 __

[issue3944] faster long multiplication

2008-11-11 Thread Mark Dickinson
Mark Dickinson <[EMAIL PROTECTED]> added the comment: See issue 4258 for a patch that combines 30-bit digits with this multiplication optimization. The code is quite different from the patch posted here, but the basic idea is the same. ___ Python tracker <[EMA

[issue3944] faster long multiplication

2008-11-10 Thread STINNER Victor
Changes by STINNER Victor <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11595/longobject1.diff ___ Python tracker <[EMAIL PROTECTED]> ___ __

[issue3944] faster long multiplication

2008-11-10 Thread STINNER Victor
Changes by STINNER Victor <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11567/longobject_diff ___ Python tracker <[EMAIL PROTECTED]> ___ ___

[issue3944] faster long multiplication

2008-11-04 Thread Mark Dickinson
Changes by Mark Dickinson <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11724/30bit.patch ___ Python tracker <[EMAIL PROTECTED]> ___ ___

[issue3944] faster long multiplication

2008-11-04 Thread Mark Dickinson
Mark Dickinson <[EMAIL PROTECTED]> added the comment: I've opened a separate issue (issue 4258) for the idea of using 30-bit long digits instead of 15-bit ones. I'll remove the patch from this issue. Pernici Mario's idea applies even better to base 2**30 longs: one can clump together 16 (!) o

[issue3944] faster long multiplication

2008-10-07 Thread Mark Dickinson
Mark Dickinson <[EMAIL PROTECTED]> added the comment: It looks as though changing PyLong_SHIFT to 30 everywhere is much simpler than I feared. Here's a short patch that does exactly that. It: - changes the definitions in longintrepr.h - changes marshal.c to write digits as longs, not shorts

[issue3944] faster long multiplication

2008-09-29 Thread Christian Heimes
Christian Heimes <[EMAIL PROTECTED]> added the comment: Nice work :) I'm changing the target versions to 2.7 and 3.1. The proposed changes are too large for a maintenance release. -- components: +Interpreter Core nosy: +christian.heimes priority: -> high versions: +Python 2.7, Python 3

[issue3944] faster long multiplication

2008-09-29 Thread Mark Dickinson
Mark Dickinson <[EMAIL PROTECTED]> added the comment: Nice work! Seems like we're going to be able to look forward to faster integer arithmetic in Python 2.7 / 3.1. I'll try to find time to review your patch properly sometime soon. Regarding the HAVE_INT64, there's a standard autoconf macro

[issue3944] faster long multiplication

2008-09-29 Thread Pernici Mario
Pernici Mario <[EMAIL PROTECTED]> added the comment: Mark, following your suggestions about using bigger integer types, I added code to convert Python numbers to arrays of twodigits, when a 64 bit integer type is supported, and for numbers with size larger than 20; otherwise the code of the previ

[issue3944] faster long multiplication

2008-09-26 Thread Fredrik Johansson
Changes by Fredrik Johansson <[EMAIL PROTECTED]>: -- nosy: +fredrikj ___ Python tracker <[EMAIL PROTECTED]> ___ ___ Python-bugs-list mai

[issue3944] faster long multiplication

2008-09-25 Thread Mark Dickinson
Mark Dickinson <[EMAIL PROTECTED]> added the comment: Thanks for the updated patch! Looks good, on a quick scan. (One comment typo I noticed: there's a line BASE - 3 = 2*MASK - 1 presumably this should be 2*BASE - 3 on the LHS.) Just out of interest, is it possible to go further, and combine 4

[issue3944] faster long multiplication

2008-09-24 Thread Pernici Mario
Pernici Mario <[EMAIL PROTECTED]> added the comment: Yes, I think that the speed-up is due to reducing the number of shifts and masks. Changing PyLong_SHIFT to 16 would be complicated; for instance in v_iadd() carry could not be a digit of 16 bits anymore; writing code specific for 64 bit mach

[issue3944] faster long multiplication

2008-09-24 Thread Mark Dickinson
Changes by Mark Dickinson <[EMAIL PROTECTED]>: -- assignee: -> marketdickinson ___ Python tracker <[EMAIL PROTECTED]> ___ ___ Python-bu

[issue3944] faster long multiplication

2008-09-24 Thread Mark Dickinson
Mark Dickinson <[EMAIL PROTECTED]> added the comment: Just to be clear: this patch halves the number of shifts and masks, asymptotically; it doesn't affect the number of adds and multiplies (except for saving a few additions to zero by setting the initial carry intelligently). Is that corre

[issue3944] faster long multiplication

2008-09-23 Thread Pernici Mario
New submission from Pernici Mario <[EMAIL PROTECTED]>: In this patch x_mul(a, b) uses fewer bit operations for a != b, asymptotically half of them. On the three computers I tried the speed-up is around 5% for size=4 and it increases up to 45-60% just below the Karatsuba cutoff, then it decreases