John Machin wrote:
The factor of 30 indeed does not seem right -- I have done somewhat
similar stuff (calculating Levenshtein distance [edit distance] on words
read from very large files), coded the same algorithm in pure Python and
C++ (using linked lists in C++) and Python version was 2.5
Johannes Bauer dfnsonfsdu...@gmx.de writes:
Yup, I changed the Python code to behave the same way the C code did -
however overall it's not much of an improvement: Takes about 15 minutes
to execute (still factor 23).
Not sure this is completely fair if you're only looking for a pure
Python
On Mon, 12 Jan 2009 21:26:27 -0500, Steve Holden wrote:
The very idea of mapping part of a process's virtual address space onto
an area in which low-level system code resides, so that writing to this
region may corrupt the system with potentially catastrophic
consequences, seems to be asking for
Grant Edwards inva...@invalid wrote:
On 2009-01-09, Sion Arrowsmith si...@chiark.greenend.org.uk wrote:
Grant Edwards inva...@invalid wrote:
If I were you, I'd try mmap()ing the file instead of reading it
into string objects one chunk at a time.
You've snipped the bit further on in that
On Jan 9, 6:41 pm, Sion Arrowsmith si...@chiark.greenend.org.uk
wrote:
You've snipped the bit further on in that sentence where the OP
says that the file of interest is 2GB. Do you still want to try
mmap'ing it?
Python's mmap object does not take an offset parameter. If it did, one
could mmap
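For reference: as far as I know, from Python 2.6 onward mmap.mmap() does accept an offset argument, so a window of a large file can be mapped instead of the whole thing. A minimal sketch, assuming a read-only mapping; map_window is a hypothetical helper name, not something from the thread:

```python
import mmap


def map_window(path, offset, length):
    """Return `length` bytes of `path` starting at byte `offset`.

    Hypothetical helper: only a window of the file is mapped, so the
    whole file never has to fit into the address space at once.
    """
    # The offset passed to mmap must be a multiple of the allocation
    # granularity, so round down and remember the remainder.
    granularity = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // granularity) * granularity
    delta = offset - aligned
    with open(path, "rb") as f:
        window = mmap.mmap(f.fileno(), delta + length,
                           access=mmap.ACCESS_READ, offset=aligned)
        try:
            # Slicing copies the requested bytes out of the mapping.
            return window[delta:delta + length]
        finally:
            window.close()
```

The caller has to make sure that offset + length does not run past the end of the file; the sketch does not check that.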
In case the cancel didn't get through:
Sion Arrowsmith si...@chiark.greenend.org.uk wrote:
Grant Edwards inva...@invalid wrote:
2GB should easily fit within the process's virtual memory
space.
Assuming you're in a 64bit world. Me, I've only got 2GB of address
space available to play in --
On Jan 12, 1:52 pm, Sion Arrowsmith si...@chiark.greenend.org.uk
wrote:
And today's moral is: try it before posting. Yeah, I can map a 2GB
file no problem, complete with associated 2GB+ allocated VM. The
addressing is clearly not working how I was expecting it to.
The virtual memory space of
sturlamolden sturlamol...@yahoo.no writes:
On Jan 9, 6:41 pm, Sion Arrowsmith si...@chiark.greenend.org.uk
wrote:
You've snipped the bit further on in that sentence where the OP
says that the file of interest is 2GB. Do you still want to try
mmap'ing it?
Python's mmap object does not take
On 2009-01-12, Sion Arrowsmith si...@chiark.greenend.org.uk wrote:
Grant Edwards inva...@invalid wrote:
On 2009-01-09, Sion Arrowsmith si...@chiark.greenend.org.uk wrote:
Grant Edwards inva...@invalid wrote:
If I were you, I'd try mmap()ing the file instead of reading it
into string objects
On 2009-01-12, Sion Arrowsmith si...@chiark.greenend.org.uk wrote:
In case the cancel didn't get through:
Sion Arrowsmith si...@chiark.greenend.org.uk wrote:
Grant Edwards inva...@invalid wrote:
2GB should easily fit within the process's virtual memory
space.
Assuming you're in a 64bit world.
sturlamolden wrote:
On Jan 12, 1:52 pm, Sion Arrowsmith si...@chiark.greenend.org.uk
wrote:
And today's moral is: try it before posting. Yeah, I can map a 2GB
file no problem, complete with associated 2GB+ allocated VM. The
addressing is clearly not working how I was expecting it to.
sturlamolden wrote:
On Jan 12, 1:52 pm, Sion Arrowsmith si...@chiark.greenend.org.uk
wrote:
And today's moral is: try it before posting. Yeah, I can map a 2GB
file no problem, complete with associated 2GB+ allocated VM. The
addressing is clearly not working how I was expecting it to.
On 2009-01-13, Steve Holden st...@holdenweb.com wrote:
sturlamolden wrote:
On Jan 12, 1:52 pm, Sion Arrowsmith si...@chiark.greenend.org.uk
wrote:
And today's moral is: try it before posting. Yeah, I can map a 2GB
file no problem, complete with associated 2GB+ allocated VM. The
addressing
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
Marc 'BlackJack' Rintsch wrote:
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
thing in C also:
Yours took ~37 minutes for 2 GiB here. This just ~15
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
I've first tried Python. Please don't beat me, it's slow as hell and
probably a horrible solution:
#!/usr/bin/python
import sys
import os
f = open(sys.argv[1], "r")
Mode should be 'rb'.
filesize = os.stat(sys.argv[1])[6]
On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:
print("Filesize : %d" % (filesize))
print("Image size : %dx%d" % (width, height))
print("Bytes per Pixel: %d" % (blocksize))
Why parentheses around ``print``\s argument? In Python <3 ``print`` is
a statement
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote:
On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch
bj_...@gmx.net wrote:
Why parentheses around ``print``\s argument? In Python <3 ``print``
is a statement and not a function.
Not true as of 2.6+ and 3.0+
print is now a
On Fri, Jan 9, 2009 at 7:41 PM, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:
Please read again what I wrote.
Lol I thought <3 was a smiley! :)
Sorry!
cheers
James
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
datamap = { }
for i in range(len(data)):
    datamap[ord(data[i])] = datamap.get(data[i], 0) + 1
Here is an error by the way: You call `ord()` just on the left side of
the ``=``, so all keys in the dictionary
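The mismatch is that the value is stored under the integer ord(data[i]) but looked up under the one-character string data[i], so the lookup never finds an existing count. A minimal sketch of a consistent version, assuming the goal is a byte-value histogram; count_bytes is a hypothetical name, and bytearray() is used so the loop sees integers on both Python 2 and Python 3:

```python
def count_bytes(data):
    """Count how often each byte value occurs in `data`."""
    counts = {}
    # bytearray() yields integers 0..255 on Python 2 and Python 3 alike,
    # so the same key is used for storing and for looking up.
    for value in bytearray(data):
        counts[value] = counts.get(value, 0) + 1
    return counts
```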
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote:
On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch
bj_...@gmx.net wrote:
print("Filesize : %d" % (filesize))
print("Image size : %dx%d" % (width, height))
print("Bytes per Pixel: %d" % (blocksize))
Why parentheses around
On Fri, 09 Jan 2009 09:15:20 +, Marc 'BlackJack' Rintsch wrote:
picture = { }
havepixels = 0
while True:
    data = f.read(blocksize)
    if len(data) <= 0: break
    if data:
        break
is enough.
You've reversed the sense of the test. The OP exits the loop when data is
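For the record, the loop has to stop when read() returns an empty string, so the short form of the test is "if not data". A minimal sketch, assuming a file-like object opened in binary mode; iter_blocks is a hypothetical helper name:

```python
def iter_blocks(f, blocksize):
    """Yield successive blocks of at most `blocksize` bytes from `f`."""
    while True:
        data = f.read(blocksize)
        # read() returns an empty string at end of file; an empty
        # string is false, so this exits exactly when nothing is left.
        if not data:
            break
        yield data
```

The last block may be shorter than blocksize if the file size is not a multiple of it.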
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
thing in C also:
Yours took ~37 minutes for 2 GiB here. This just ~15 minutes:
#!/usr/bin/env python
from __future__ import division, with_statement
import os
Marc 'BlackJack' Rintsch wrote:
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
[...]
print("Filesize : %d" % (filesize))
print("Image size : %dx%d" % (width, height))
print("Bytes per Pixel: %d" % (blocksize))
Why parentheses around ``print``\s argument? In Python <3 ``print``
Steven D'Aprano wrote:
On Fri, 09 Jan 2009 19:33:53 +1000, James Mills wrote:
On Fri, Jan 9, 2009 at 7:15 PM, Marc 'BlackJack' Rintsch
bj_...@gmx.net wrote:
print("Filesize : %d" % (filesize))
print("Image size : %dx%d" % (width, height))
print("Bytes per Pixel: %d" % (blocksize))
Why
Johannes Bauer wrote:
Which takes about 40 seconds. I want the niceness of Python but a little
more speed than I'm getting (I'd settle for factor 2 or 3 slower, but
factor 30 is just too much).
This probably doesn't contribute much, but have you tried using Python
profiler? You might have
Marc 'BlackJack' Rintsch schrieb:
f = open(sys.argv[1], "r")
Mode should be 'rb'.
Check.
filesize = os.stat(sys.argv[1])[6]
`os.path.getsize()` is a little bit more readable.
Check.
print("Filesize : %d" % (filesize))
print("Image size : %dx%d" % (width, height))
print("Bytes per
James Mills schrieb:
What does this little tool do anyway?
The images it creates out of files are
very interesting. What is it called?
It has no particular name. I was toying around with the Princeton Cold
Boot Attack (http://citp.princeton.edu/memory/). In particular I was
interested in
Marc 'BlackJack' Rintsch schrieb:
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
thing in C also:
Yours took ~37 minutes for 2 GiB here. This just ~15 minutes:
Ah, ok... when implementing your suggestions
mk schrieb:
Johannes Bauer wrote:
Which takes about 40 seconds. I want the niceness of Python but a little
more speed than I'm getting (I'd settle for factor 2 or 3 slower, but
factor 30 is just too much).
This probably doesn't contribute much, but have you tried using Python
profiler?
On Jan 9, 8:48 am, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
No - I didn't know there was a profiler, and I haven't found anything
meaningful yet (there seems to be a profiling C interface, but that won't
get me anywhere). Is that a separate tool or something? Could you
provide a link?
Thanks,
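The profiler is not a separate tool; it ships with the standard library as the cProfile module (with pstats for reading the results). A minimal sketch of how it might be used; work() is a hypothetical stand-in for the real per-block loop, and profile() is just a convenience wrapper made up for this example:

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical stand-in for the real processing loop.
    total = 0
    for value in bytearray(b"\x00\x01\x02" * 100000):
        total += value
    return total


def profile(func):
    """Run `func` under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```

A whole script can also be profiled from the shell without touching the code, with "python -m cProfile yourscript.py".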
Marc 'BlackJack' Rintsch wrote:
On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
As this was horribly slow (20 Minutes for a 2GB file) I coded the whole
thing in C also:
Yours took ~37 minutes for 2 GiB here. This just ~15 minutes:
#!/usr/bin/env python
from __future__ import
On Jan 9, 6:48 am, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
mk schrieb:
The factor of 30 indeed does not seem right -- I have done somewhat
similar stuff (calculating Levenshtein distance [edit distance] on words
read from very large files), coded the same algorithm in pure Python and
On 2009-01-09, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
I've come from C/C++ and am now trying to code some Python because I
absolutely love the language. However I still have trouble getting
Python code to run efficiently. Right now I have an easy task: Get a
file,
If I were you, I'd try
Johannes Bauer, I was about to start writing a faster version. I think
with some care and Psyco you can get to about 5 times slower than C or
something like that.
To do that you need to use almost the same code for the C version,
with a list of 256 ints for the frequencies, not using max() but a
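A rough sketch of what that could look like in plain Python. The list of 256 ints is from the suggestion above; the explicit loop that replaces max() is only a guess, since the sentence is cut off, and most_frequent_count is a hypothetical name:

```python
def most_frequent_count(block):
    """Return how often the most common byte value occurs in `block`."""
    # One counter per possible byte value, as in the C version.
    freqs = [0] * 256
    for value in bytearray(block):
        freqs[value] += 1
    # Assumption: a plain loop stands in for max() here.
    best = 0
    for count in freqs:
        if count > best:
            best = count
    return best
```

Whether this actually beats a dict plus max() would have to be measured; the point of the plain list and loop is that Psyco can compile them well.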
Grant Edwards inva...@invalid wrote:
On 2009-01-09, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
I've come from C/C++ and am now trying to code some Python because I
absolutely love the language. However I still have trouble getting
Python code to run efficiently. Right now I have an easy task:
On 2009-01-09, Sion Arrowsmith si...@chiark.greenend.org.uk wrote:
Grant Edwards inva...@invalid wrote:
On 2009-01-09, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
I've come from C/C++ and am now trying to code some Python because I
absolutely love the language. However I still have trouble
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
Marc 'BlackJack' Rintsch wrote:
def iter_max_values(blocks, block_count):
    for i, block in enumerate(blocks):
        histogram = defaultdict(int)
        for byte in block:
            histogram[byte] += 1
        yield
On 2009-01-09, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
Marc 'BlackJack' Rintsch wrote:
def iter_max_values(blocks, block_count):
    for i, block in enumerate(blocks):
        histogram = defaultdict(int)
        for byte in block:
On Jan 9, 9:56 pm, mk mrk...@gmail.com wrote:
The factor of 30 indeed does not seem right -- I have done somewhat
similar stuff (calculating Levenshtein distance [edit distance] on words
read from very large files), coded the same algorithm in pure Python and
C++ (using linked lists in C++)
On Jan 9, 2:14 pm, Marc 'BlackJack' Rintsch bj_...@gmx.net wrote:
On Fri, 09 Jan 2009 15:34:17 +, MRAB wrote:
Marc 'BlackJack' Rintsch wrote:
def iter_max_values(blocks, block_count):
    for i, block in enumerate(blocks):
        histogram = defaultdict(int)
        for byte in
Johannes Bauer wrote:
Hello group,
I've come from C/C++ and am now trying to code some Python because I
absolutely love the language. However I still have trouble getting
Python code to run efficiently. Right now I have an easy task: Get a
file, split it up into a million chunks, count the most
On Fri, Jan 9, 2009 at 1:04 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
Hello group,
Hello.
(...)
Which takes about 40 seconds. I want the niceness of Python but a little
more speed than I'm getting (I'd settle for factor 2 or 3 slower, but
factor 30 is just too much).
Can anyone point
James Mills schrieb:
I have tested this against a randomly generated
file from /dev/urandom (10M). Yes the Python
one is much slower, but I believe it's because
the Python implementation is _correct_ where
the C one is _wrong_ :)
The resulting test.bin.pgm from python is exactly
3.5M
On Fri, Jan 9, 2009 at 3:13 PM, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
Uhh, yes, you're right there... I must admit that I was too lazy to
include all the stat headers and do a proper st_size check in the C
version (just a quick hack), so it's practically hardcoded.
With files of exactly
On Fri, Jan 9, 2009 at 2:29 PM, James Mills
prolo...@shortcircuit.net.au wrote:
I shall attempt to optimize this :)
I have a funny feeling you might be caught up with
some features of Python - one notable one being that
some things in Python are immutable.
psyco might help here though ...
MRAB wrote:
Johannes Bauer wrote:
Hello group,
[and about 200 other lines there was no need to quote]
[...]
Have a look at psyco: http://psyco.sourceforge.net/
Have a little consideration for others when making a short reply to a
long post, please. Trim what isn't necessary. Thanks.
regards