Re: Text Processing

2011-12-22 Thread Yigit Turgut
On Dec 21, 2:01 am, Alexander Kapps alex.ka...@web.de wrote: On 20.12.2011 22:04, Nick Dokos wrote: I have a text file containing such data ;          A                B                C --- -2.0100e-01    8.000e-02    

Text Processing

2011-12-20 Thread Yigit Turgut
Hi all, I have a text file containing such data ; A B C --- -2.0100e-01 8.000e-02 8.000e-05 -2.e-01 0.000e+00 4.800e-04 -1.9900e-01 4.000e-02 1.600e-04 But I only need Section B, and I
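
A minimal sketch of one way to pull out column B, assuming the columns are whitespace-separated and the header is followed by a "---" line as in the post above. The function name column_b and the two sample rows are illustrative only; this is not any poster's actual solution, since the replies are truncated in this listing.

```python
def column_b(lines):
    """Return the second whitespace-separated field of each data row as a float."""
    values = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the "---" separator (fewer than three fields).
        if len(fields) < 3:
            continue
        try:
            values.append(float(fields[1]))
        except ValueError:
            continue  # non-numeric row such as the "A B C" header
    return values


sample = """\
A B C
---
-2.0100e-01 8.000e-02 8.000e-05
-1.9900e-01 4.000e-02 1.600e-04
"""
print(column_b(sample.splitlines()))  # [0.08, 0.04]
```

For a real file, passing the open file object (for example column_b(open("data.txt")), with a hypothetical file name) should behave the same way, because file objects iterate line by line.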

Re: Text Processing

2011-12-20 Thread Dave Angel
On 12/20/2011 02:17 PM, Yigit Turgut wrote: Hi all, I have a text file containing such data ; A B C --- -2.0100e-01 8.000e-02 8.000e-05 -2.e-01 0.000e+00 4.800e-04 -1.9900e-01 4.000e-02

Re: Text Processing

2011-12-20 Thread Jérôme
Tue, 20 Dec 2011 11:17:15 -0800 (PST) Yigit Turgut wrote: Hi all, I have a text file containing such data ; A B C --- -2.0100e-01 8.000e-02 8.000e-05 -2.e-01 0.000e+00 4.800e-04

Re: Text Processing

2011-12-20 Thread Nick Dokos
Jérôme jer...@jolimont.fr wrote: Tue, 20 Dec 2011 11:17:15 -0800 (PST) Yigit Turgut wrote: Hi all, I have a text file containing such data ; A B C --- -2.0100e-01 8.000e-02

Re: Text Processing

2011-12-20 Thread Alexander Kapps
On 20.12.2011 22:04, Nick Dokos wrote: I have a text file containing such data ; A B C --- -2.0100e-01 8.000e-02 8.000e-05 -2.e-01 0.000e+00 4.800e-04 -1.9900e-01 4.000e-02 1.600e-04

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Ian Kelly
On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee xah...@gmail.com wrote: So, a solution by regex is out. Actually, none of the complications you listed appear to exclude regexes. Here's a possible (untested) solution: div class=img ((?:\s*img src=[^.]+\.(?:jpg|png|gif) alt=[^]+ width=[0-9]+

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Xah Lee
On Jul 4, 12:13 pm, S.Mandl stefanma...@web.de wrote: Nice. I guess that XSLT would be another (the official) approach for such a task. Is there an XSLT-engine for Emacs? -- Stefan haven't used XSLT, and don't know if there's one in emacs... it'd be nice if someone actually give a

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Xah Lee
On Jul 5, 12:17 pm, Ian Kelly ian.g.ke...@gmail.com wrote: On Mon, Jul 4, 2011 at 12:36 AM, Xah Lee xah...@gmail.com wrote: So, a solution by regex is out. Actually, none of the complications you listed appear to exclude regexes.  Here's a possible (untested) solution: div class=img

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread Ian Kelly
On Tue, Jul 5, 2011 at 2:37 PM, Xah Lee xah...@gmail.com wrote: but in anycase, i can't see how this part would work p class=cpt((?:[^]|(?!/p))+)/p It's not that different from the pattern 「alt=[^]+」 earlier in the regex. The capture group accepts one or more characters that either aren't '',

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-05 Thread S.Mandl
haven't used XSLT, and don't know if there's one in emacs... it'd be nice if someone actually give a example... Hi Xah, actually I have to correct myself. HTML is not XML. If it were, you could use a stylesheet like this: ?xml version=1.0 encoding=ISO-8859-1? xsl:stylesheet version=1.0

emacs lisp text processing example (html5 figure/figcaption)

2011-07-04 Thread Xah Lee
. -- Emacs Lisp: Processing HTML: Transform Tags to HTML5 “figure” and “figcaption” Tags Xah Lee, 2011-07-03 Another triumph of using elisp for text processing over perl/python. The Problem -- Summary I want batch transform

Re: emacs lisp text processing example (html5 figure/figcaption)

2011-07-04 Thread S.Mandl
Nice. I guess that XSLT would be another (the official) approach for such a task. Is there an XSLT-engine for Emacs? -- Stefan -- http://mail.python.org/mailman/listinfo/python-list

Is text processing with dicts a good use case for Python cross-compilers like Cython/Pyrex or ShedSkin?

2010-12-16 Thread python
Is text processing with dicts a good use case for Python cross-compilers like Cython/Pyrex or ShedSkin? (I've read the cross compiler claims about massive increases in pure numeric performance). I have 3 use cases I'm considering for Python-to-C++ cross-compilers for generating 32-bit Python

Re: Is text processing with dicts a good use case for Python cross-compilers like Cython/Pyrex or ShedSkin?

2010-12-16 Thread Stefan Behnel
pyt...@bdurham.com, 16.12.2010 21:03: Is text processing with dicts a good use case for Python cross-compilers like Cython/Pyrex or ShedSkin? (I've read the cross compiler claims about massive increases in pure numeric performance). Cython is generally a good choice for string processing

Simple Text Processing

2009-09-10 Thread AJAskey
New to Python. I can solve the problem in perl by using split() to an array. Can't figure it out in Python. I'm reading variable lines of text. I want to use the first number I find. The problem is the lines are variable. Input example: this is a number: 1 here are some numbers 1 2 3 4
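
A hedged sketch of one way to get the first number on a variable line, using the two input examples quoted in the post. It assumes "number" means a run of decimal digits; the helper name first_number is made up for illustration and is not the poster's own solution.

```python
import re


def first_number(line):
    """Return the first run of digits in line as an int, or None if there is none."""
    match = re.search(r"\d+", line)
    return int(match.group()) if match else None


print(first_number("this is a number: 1"))            # 1
print(first_number("here are some numbers 1 2 3 4"))  # 1
print(first_number("no digits here"))                 # None
```

A split()-based loop that tests each token with str.isdigit() would be closer to the Perl approach the poster mentions; the regex version simply avoids the explicit loop.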

Re: Simple Text Processing

2009-09-10 Thread Benjamin Kaplan
On Thu, Sep 10, 2009 at 11:36 AM, AJAskey aske...@gmail.com wrote: New to Python. I can solve the problem in perl by using split() to an array. Can't figure it out in Python. I'm reading variable lines of text. I want to use the first number I find. The problem is the lines are variable.

Re: Simple Text Processing

2009-09-10 Thread AJAskey
Never mind. I guess I had been trying to make it more difficult than it is. As a note, I can work on something for 10 hours and not figure it out. But the second I post to a group, then I immediately figure it out myself. Strange snake this Python... Example for anyone else interested: line =

Re: text processing SOLVED

2008-09-27 Thread [EMAIL PROTECTED]
Thanks Black Jack Working

text processing

2008-09-25 Thread [EMAIL PROTECTED]
I have string like follow 12560/ABC,12567/BC,123,567,890/JK I want above string to group like as follow (12560,ABC) (12567,BC) (123,567,890,JK) i try regular expression i am able to get first two not the third one. can regular expression given data in different groups --
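
One possible regular expression for the grouping asked about here, offered as a sketch rather than as any poster's answer. It assumes every group is one or more comma-separated digit runs followed by "/" and an uppercase tag, as in the sample string; the name group_codes is hypothetical.

```python
import re

# One or more digit runs joined by commas, then "/", then uppercase letters.
PATTERN = re.compile(r"((?:\d+,)*\d+)/([A-Z]+)")


def group_codes(text):
    """Split text into tuples of (number, ..., number, tag)."""
    return [
        tuple(numbers.split(",")) + (tag,)
        for numbers, tag in PATTERN.findall(text)
    ]


print(group_codes("12560/ABC,12567/BC,123,567,890/JK"))
# [('12560', 'ABC'), ('12567', 'BC'), ('123', '567', '890', 'JK')]
```

The repeated non-capturing group is what lets the third item keep all three numbers, which a pattern matching a single digit run before the slash would not do.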

Re: text processing

2008-09-25 Thread Marc 'BlackJack' Rintsch
On Thu, 25 Sep 2008 15:51:28 +0100, [EMAIL PROTECTED] wrote: I have string like follow 12560/ABC,12567/BC,123,567,890/JK I want above string to group like as follow (12560,ABC) (12567,BC) (123,567,890,JK) i try regular expression i am able to get first two not the third one. can

Re: text processing

2008-09-25 Thread kib2
You can do it with regexps too : -- import re to_watch = re.compile(r"(?P<number>\d+)[/](?P<letter>[A-Z]+)") final_list = to_watch.findall("12560/ABC,12567/BC,123,567,890/JK") for number,word in final_list : print "number:%s -- word:

Re: text processing

2008-09-25 Thread MRAB
On Sep 25, 6:34 pm, Marc 'BlackJack' Rintsch [EMAIL PROTECTED] wrote: On Thu, 25 Sep 2008 15:51:28 +0100, [EMAIL PROTECTED] wrote: I have string like follow 12560/ABC,12567/BC,123,567,890/JK I want above string to group like as follow (12560,ABC) (12567,BC) (123,567,890,JK) i try

Re: text processing

2008-09-25 Thread Paul McGuire
On Sep 25, 9:51 am, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I have string like follow 12560/ABC,12567/BC,123,567,890/JK I want above string to group like as follow (12560,ABC) (12567,BC) (123,567,890,JK) i try regular expression i am able to get first two not the third one. can

emacs lisp as text processing language...

2007-10-29 Thread Xah Lee
Text Processing with Emacs Lisp Xah Lee, 2007-10-29 This page gives an outline of how to use emacs lisp to do text processing, using a specific real-world problem as example. If you don't know elisp, first take a gander at Emacs Lisp Basics. HTML version with links and colors is at: http

Re: emacs lisp as text processing language...

2007-10-29 Thread Xah Lee
... continued from previous post. PS I'm cross-posting this post to perl and python groups because i find that it being a little know fact that emacs lisp's power in the area of text processing, are far beyond Perl (or Python). ... i worked as a professional perl programer since 1998. I started

Re: Simple Text Processing Help

2007-10-17 Thread Tim Roberts
[EMAIL PROTECTED] wrote: And now for something completely different... I've been reading up a bit about Python and Excel and I quickly told the program to output to Excel quite easily. However, what if the input file were a Word document? I can't seem to find much information about parsing

Re: Simple Text Processing Help

2007-10-16 Thread Peter Otten
patrick.waldo wrote: manipulation? Also, I conceptually get it, but would you mind walking me through for key, group in groupby(instream, unicode.isspace): if not key: yield .join(group) itertools.groupby() splits a sequence into groups with the same key; e. g. to
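
A small Python 3 sketch of the groupby idea described above, using str.isspace where the quoted Python 2 code uses unicode.isspace. The join separator in the quoted snippet is not visible in this listing, so an empty string is assumed here; the function name records and the sample lines are hypothetical.

```python
from itertools import groupby


def records(lines):
    """Yield each run of consecutive non-blank lines joined into one string."""
    # groupby starts a new group whenever the key (blank or not) changes.
    for is_blank, group in groupby(lines, str.isspace):
        if not is_blank:
            yield "".join(group)


lines = ["a\n", "b\n", "\n", "c\n", "d\n"]
print(list(records(lines)))  # ['a\nb\n', 'c\nd\n']
```

Because groupby only compares adjacent items, the blank lines act as record separators without the input having to be sorted or held in memory all at once.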

Re: Simple Text Processing Help

2007-10-16 Thread patrick . waldo
And now for something completely different... I see a lot of COM stuff with Python for excel...and I quickly made the same program output to excel. What if the input file were a Word document? Where is there information about manipulating word documents, or what could I add to make the same

Re: Simple Text Processing Help

2007-10-16 Thread patrick . waldo
And now for something completely different... I've been reading up a bit about Python and Excel and I quickly told the program to output to Excel quite easily. However, what if the input file were a Word document? I can't seem to find much information about parsing Word files. What could I add

Re: Simple Text Processing Help

2007-10-15 Thread patrick . waldo
lines = open('your_file.txt').readlines()[:4] print lines print map(len, lines) gave me: ['\xef\xbb\xbf200-720-769-93-2\n', 'kyselina mo\xc4\x8dov \xc3\xa1 C5H4N4O3\n', '\n', '200-001-8\t50-00-0\n'] [28, 32, 1, 18] I think it means that I'm still at option 3. I got

Re: Simple Text Processing Help

2007-10-15 Thread Marc 'BlackJack' Rintsch
On Mon, 15 Oct 2007 10:47:16 +, patrick.waldo wrote: my sample input file looks like this( not organized,as you see it): 200-720-769-93-2 kyselina mocová C5H4N4O3 200-001-8 50-00-0 formaldehyd CH2O 200-002-3 50-01-1 guanidínium-chlorid CH5N3.ClH

Re: Simple Text Processing Help

2007-10-15 Thread Paul Hankin
On Oct 15, 12:20 pm, Marc 'BlackJack' Rintsch [EMAIL PROTECTED] wrote: On Mon, 15 Oct 2007 10:47:16 +, patrick.waldo wrote: my sample input file looks like this( not organized,as you see it): 200-720-769-93-2 kyselina mocová C5H4N4O3 200-001-8 50-00-0 formaldehyd

Re: Simple Text Processing Help

2007-10-15 Thread Peter Otten
patrick.waldo wrote: my sample input file looks like this( not organized,as you see it): 200-720-769-93-2 kyselina mocová C5H4N4O3 200-001-8 50-00-0 formaldehyd CH2O 200-002-3 50-01-1 guanidínium-chlorid CH5N3.ClH Assuming that the records are always

Re: Simple Text Processing Help

2007-10-15 Thread patrick . waldo
Wow, thank you all. All three work. To output correctly I needed to add: output.write(\r\n) This is really a great help!! Because of my limited Python knowledge, I will need to try to figure out exactly how they work for future text manipulation and for my own knowledge. Could you recommend

Re: Simple Text Processing Help

2007-10-15 Thread Paul Hankin
On Oct 15, 10:08 pm, [EMAIL PROTECTED] wrote: Because of my limited Python knowledge, I will need to try to figure out exactly how they work for future text manipulation and for my own knowledge. Could you recommend some resources for this kind of text manipulation? Also, I conceptually get

Re: Simple Text Processing Help

2007-10-15 Thread Paul McGuire
On Oct 14, 8:48 am, [EMAIL PROTECTED] wrote: Hi all, I started Python just a little while ago and I am stuck on something that is really simple, but I just can't figure out. Essentially I need to take a text document with some chemical information in Czech and organize it into another text

Simple Text Processing Help

2007-10-14 Thread patrick . waldo
Hi all, I started Python just a little while ago and I am stuck on something that is really simple, but I just can't figure out. Essentially I need to take a text document with some chemical information in Czech and organize it into another text file. The information is always EINECS number,

Re: Simple Text Processing Help

2007-10-14 Thread Marc 'BlackJack' Rintsch
On Sun, 14 Oct 2007 13:48:51 +, patrick.waldo wrote: Essentially I need to take a text document with some chemical information in Czech and organize it into another text file. The information is always EINECS number, CAS, chemical name, and formula in tables. I need to organize them

Re: Simple Text Processing Help

2007-10-14 Thread Paul Hankin
On Oct 14, 2:48 pm, [EMAIL PROTECTED] wrote: Hi all, I started Python just a little while ago and I am stuck on something that is really simple, but I just can't figure out. Essentially I need to take a text document with some chemical information in Czech and organize it into another text

Re: Simple Text Processing Help

2007-10-14 Thread patrick . waldo
Thank you both for helping me out. I am still rather new to Python and so I'm probably trying to reinvent the wheel here. When I try to do Paul's response, I get tokens = line.strip().split() [] So I am not quite sure how to read line by line. tokens = input.read().split() gets me all the

Re: Simple Text Processing Help

2007-10-14 Thread Marc 'BlackJack' Rintsch
On Sun, 14 Oct 2007 16:57:06 +, patrick.waldo wrote: Thank you both for helping me out. I am still rather new to Python and so I'm probably trying to reinvent the wheel here. When I try to do Paul's response, I get tokens = line.strip().split() [] What is in `line`? Paul wrote this

Re: Simple Text Processing Help

2007-10-14 Thread John Machin
On Oct 14, 11:48 pm, [EMAIL PROTECTED] wrote: Hi all, I started Python just a little while ago and I am stuck on something that is really simple, but I just can't figure out. Essentially I need to take a text document with some chemical information in Czech and organize it into another text

Re: Text processing and file creation

2007-09-07 Thread Paddy
On Sep 7, 3:50 am, George Sakkis [EMAIL PROTECTED] wrote: On Sep 5, 5:17 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: If this was a code golf challenge, I'd choose the Unix split solution and be both maintainable as well as concise :-) - Paddy. --

Re: Text processing and file creation

2007-09-06 Thread Alberto Griggio
Thanks for making me aware of the (UNIX) split command (split -l 5 inFile.txt), it's short, it's fast, it's beautiful. I am still wondering how to do this efficiently in Python (being kind of new to it... and it's not for homework). Something like this should do the job: def nlines(num,

Re: Text processing and file creation

2007-09-06 Thread Arnau Sanchez
[EMAIL PROTECTED] wrote: I am still wondering how to do this efficiently in Python (being kind of new to it... and it's not for homework). You should post some code anyway, it would be easier to give useful advice (it would also demonstrate that you put some effort on it). Anyway, here is

Re: Text processing and file creation

2007-09-06 Thread Shawn Milochik
Here's my solution, for what it's worth: #!/usr/bin/env python import os input = open("test.txt", "r") counter = 0 fileNum = 0 fileName = "" def newFileName(): global fileNum, fileName while os.path.exists(fileName) or fileName == "": fileNum += 1 x = "%0.5d" % fileNum

Re: Text processing and file creation

2007-09-06 Thread George Sakkis
On Sep 5, 5:17 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: On Sep 5, 1:28 pm, Paddy [EMAIL PROTECTED] wrote: On Sep 5, 5:13 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines.From this file, I like to write the first 5 lines to a new

Re: Text processing and file creation

2007-09-06 Thread Ricardo Aráoz
Shawn Milochik wrote: On 9/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines. From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new

Text processing and file creation

2007-09-05 Thread [EMAIL PROTECTED]
I have a text source file of about 20.000 lines. From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new files until processing of all 20.000 lines is done. Is there an efficient way to
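
A sketch of the five-lines-per-file split described in this post, assuming Python 3 and itertools.islice. The function name split_file, the "part" prefix and the output naming scheme are invented for illustration and do not come from the thread.

```python
import os
import tempfile
from itertools import islice


def split_file(path, lines_per_file=5, prefix="part"):
    """Write consecutive chunks of lines to numbered files; return their paths."""
    out_dir = os.path.dirname(os.path.abspath(path))
    written = []
    with open(path) as src:
        while True:
            # islice pulls at most lines_per_file lines without reading the whole file.
            chunk = list(islice(src, lines_per_file))
            if not chunk:
                break
            name = "%s_%05d.txt" % (prefix, len(written) + 1)
            out_path = os.path.join(out_dir, name)
            with open(out_path, "w") as dst:
                dst.writelines(chunk)
            written.append(out_path)
    return written


# Small demonstration on a throwaway 12-line file.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "input.txt")
    with open(source, "w") as f:
        f.writelines("line %d\n" % i for i in range(1, 13))
    parts = split_file(source)
    print(len(parts))  # 3 (chunks of 5, 5 and 2 lines)
```

Reading with islice keeps only one chunk in memory at a time, so the same approach should scale from the 20,000-line file in the question to much larger inputs.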

Re: Text processing and file creation

2007-09-05 Thread kyosohma
On Sep 5, 11:13 am, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines.From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new files until

Re: Text processing and file creation

2007-09-05 Thread Arnau Sanchez
[EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines. From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new files until processing of all 20.000 lines is

Re: Text processing and file creation

2007-09-05 Thread Bjoern Schliessmann
[EMAIL PROTECTED] wrote: I would use a counter in a for loop using the readline method to iterate over the 20,000 line file. file objects are iterables themselves, so there's no need to do that by using a method. Reset the counter every 5 lines/ iterations and close the file. I'd use a

Re: Text processing and file creation

2007-09-05 Thread Shawn Milochik
On 9/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines. From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new files until processing

Re: Text processing and file creation

2007-09-05 Thread kyosohma
On Sep 5, 11:57 am, Bjoern Schliessmann usenet- [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: I would use a counter in a for loop using the readline method to iterate over the 20,000 line file. file objects are iterables themselves, so there's no need to do that by using a method.

Re: Text processing and file creation

2007-09-05 Thread James Stroud
[EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines. From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new files until processing of all 20.000 lines is done.

Re: Text processing and file creation

2007-09-05 Thread Paddy
On Sep 5, 5:13 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines.From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new files until

Re: Text processing and file creation

2007-09-05 Thread [EMAIL PROTECTED]
On Sep 5, 1:28 pm, Paddy [EMAIL PROTECTED] wrote: On Sep 5, 5:13 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines.From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new

Re: Text processing and file creation

2007-09-05 Thread Arnaud Delobelle
On Sep 5, 5:13 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I have a text source file of about 20.000 lines.From this file, I like to write the first 5 lines to a new file. Close that file, grab the next 5 lines write these to a new file... grabbing 5 lines and creating new files until

Re: Text processing and file creation

2007-09-05 Thread Steve Holden
Arnaud Delobelle wrote: [...] from my_useful_functions import new_file, write_first_5_lines, done_processing_file, grab_next_5_lines, another_new_file, write_these in_f = open('myfile') out_f = new_file() write_first_5_lines(in_f, out_f) # write first 5 lines close(out_f) while not

Re: Text processing and file creation

2007-09-05 Thread Ginger
can parse lines from read buffer freely. have fun! - Original Message - From: Shawn Milochik [EMAIL PROTECTED] To: python-list@python.org Sent: Thursday, September 06, 2007 1:03 AM Subject: Re: Text processing and file creation On 9/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I

Re: Text processing and file creation

2007-09-05 Thread Arnaud Delobelle
On Sep 6, 12:46 am, Steve Holden [EMAIL PROTECTED] wrote: Arnaud Delobelle wrote: [...] print all done! # All done print Now there are 4000 files in this directory... Python 3.0 - ready (I've used open() instead of file()) bzzt! Python 3.0a1 (py3k:57844, Aug 31 2007, 16:54:27)

Re: On text processing

2007-03-24 Thread Daniel Nogradi
I'm in the process of rewriting a bash/awk/sed script -- that grew too big -- in python. I can rewrite it in a simple line-by-line way but that results in ugly python code and I'm sure there is a simple pythonic way. The bash script processed text files of the form:

On text processing

2007-03-23 Thread Daniel Nogradi
Hi list, I'm in the process of rewriting a bash/awk/sed script -- that grew too big -- in python. I can rewrite it in a simple line-by-line way but that results in ugly python code and I'm sure there is a simple pythonic way. The bash script processed text files of the form:

Re: On text processing

2007-03-23 Thread bearophileHUGS
Daniel Nogradi: Any elegant solution for this? This is my first try: ddata = {} inside_matrix = False for row in file("data.txt"): if row.strip(): fields = row.split() if len(fields) == 2: inside_matrix = False ddata[fields[0]] = [fields[1]]

Re: On text processing

2007-03-23 Thread Daniel Nogradi
This is my first try: ddata = {} inside_matrix = False for row in file("data.txt"): if row.strip(): fields = row.split() if len(fields) == 2: inside_matrix = False ddata[fields[0]] = [fields[1]] lastkey = fields[0] else:

Re: On text processing

2007-03-23 Thread Paddy
On Mar 23, 10:30 pm, Daniel Nogradi [EMAIL PROTECTED] wrote: Hi list, I'm in the process of rewriting a bash/awk/sed script -- that grew too big -- in python. I can rewrite it in a simple line-by-line way but that results in ugly python code and I'm sure there is a simple pythonic way. The

Re: On text processing

2007-03-23 Thread Paul McGuire
On Mar 23, 5:30 pm, Daniel Nogradi [EMAIL PROTECTED] wrote: Hi list, I'm in the process of rewriting a bash/awk/sed script -- that grew too big -- in python. I can rewrite it in a simple line-by-line way but that results in ugly python code and I'm sure there is a simple pythonic way. The

Suitability for long-running text processing?

2007-01-08 Thread tsuraan
I have a pair of python programs that parse and index files on my computer to make them searchable. The problem that I have is that they continually grow until my system is out of memory, and then things get ugly. I remember, when I was first learning python, reading that the python interpreter

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
After reading http://www.python.org/doc/faq/general/#how-does-python-manage-memory, I tried modifying this program as below: a=[] for i in xrange(33,127): for j in xrange(33,127): for k in xrange(33,127): for l in xrange(33, 127): a.append(chr(i)+chr(j)+chr(k)+chr(l)) import sys

Re: Suitability for long-running text processing?

2007-01-08 Thread Felipe Almeida Lessa
On 1/8/07, tsuraan [EMAIL PROTECTED] wrote: [snip] The loop is deep enough that I always interrupt it once python's size is around 250 MB. Once the gc.collect() call is finished, python's size has not changed a bit. [snip] This has been tried under python 2.4.3 in gentoo linux and python 2.3

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
I just tried on my system (Python is using 2.9 MiB) a = ['a' * (1 << 20) for i in xrange(300)] (Python is using 304.1 MiB) del a (Python is using 2.9 MiB -- as before) And I didn't even need to tell the garbage collector to do its job. Some info: It looks like the big difference between our

Re: Suitability for long-running text processing?

2007-01-08 Thread Felipe Almeida Lessa
On 1/8/07, tsuraan [EMAIL PROTECTED] wrote: I just tried on my system (Python is using 2.9 MiB) a = ['a' * (1 << 20) for i in xrange(300)] (Python is using 304.1 MiB) del a (Python is using 2.9 MiB -- as before) And I didn't even need to tell the garbage collector to do its

Re: Suitability for long-running text processing?

2007-01-08 Thread Chris Mellon
On 1/8/07, Felipe Almeida Lessa [EMAIL PROTECTED] wrote: On 1/8/07, tsuraan [EMAIL PROTECTED] wrote: I just tried on my system (Python is using 2.9 MiB) a = ['a' * (1 << 20) for i in xrange(300)] (Python is using 304.1 MiB) del a (Python is using 2.9 MiB -- as before)

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
$ python Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02) [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2 Type help, copyright, credits or license for more information. # Python is using 2.7 MiB ... a = ['1234' for i in xrange(10 << 20)] # Python is using 42.9 MiB ... del a #

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
My first thought was that interned strings were causing the growth, but that doesn't seem to be the case. Interned strings, as of 2.3, are no longer immortal, right? The intern doc says you have to keep a reference around to the string now, anyhow. I really wish I could find that thing I

Re: Suitability for long-running text processing?

2007-01-08 Thread Chris Mellon
On 1/8/07, tsuraan [EMAIL PROTECTED] wrote: My first thought was that interned strings were causing the growth, but that doesn't seem to be the case. Interned strings, as of 2.3, are no longer immortal, right? The intern doc says you have to keep a reference around to the string now,

Re: Suitability for long-running text processing?

2007-01-08 Thread tsuraan
I remember something about it coming up in some of the discussions of free lists and better behavior in this regard in 2.5, but I don't remember the details. Under Python 2.5, my original code posting no longer exhibits the bug - upon calling del(a), python's size shrinks back to ~4 MB, which

Beginner question on text processing

2006-12-29 Thread Doran, Harold
I am beginning to use python primarily to organize data into formats needed for input into some statistical packages. I do not have much programming experience outside of LaTeX and R, so some of this is a bit new. I am attempting to write a program that reads in a text file that contains some

Re: Beginner question on text processing

2006-12-29 Thread skip
Harold To illustrate, assume I have a text file, call it test.txt, with Harold the following information: Harold X11 .32 Harold X22 .45 Harold My goal in the python program is to manipulate this file such Harold that a new file would be created that looks like:

fast text processing

2006-02-21 Thread Alexis Gallagher
(I tried to post this yesterday but I think my ISP ate it. Apologies if this is a double-post.) Is it possible to do very fast string processing in python? My bioinformatics application needs to scan very large ASCII files (80GB+), compare adjacent lines, and conditionally do some further

Re: fast text processing

2006-02-21 Thread Steve Holden
Alexis Gallagher wrote: (I tried to post this yesterday but I think my ISP ate it. Apologies if this is a double-post.) Is it possible to do very fast string processing in python? My bioinformatics application needs to scan very large ASCII files (80GB+), compare adjacent lines, and

Re: fast text processing

2006-02-21 Thread Ben Sizer
Maybe this code will be faster? (If it even does the same thing: largely untested) filehandle = open(data,'r',buffering=1000) fileIter = iter(filehandle) lastLine = fileIter.next() lastTokens = lastLine.strip().split(delimiter) lastGeno = extract(lastTokens[0]) for currentLine in fileIter:

Re: fast text processing

2006-02-21 Thread Alexis Gallagher
Steve, First, many thanks! Steve Holden wrote: Alexis Gallagher wrote: filehandle = open(data,'r',buffering=1000) This buffer size seems, shall we say, unadventurous? It's likely to slow things down considerably, since the filesystem is probably going to naturally want to use a rather

Re: fast text processing

2006-02-21 Thread Larry Bates
Alexis Gallagher wrote: Steve, First, many thanks! Steve Holden wrote: Alexis Gallagher wrote: filehandle = open(data,'r',buffering=1000) This buffer size seems, shall we say, unadventurous? It's likely to slow things down considerably, since the filesystem is probably going to

Newbie Text Processing Question

2005-10-04 Thread gshepherd281
Hi, I'm a total newbie to Python so any and all advice is greatly appreciated. I'm trying to use regular expressions to process text in an SGML file but only in one section. So the input would look like this: ch-part no=ItitleRESEARCH GUIDE sec-main no=1.01titlecontent paracontent sec-main

Re: Newbie Text Processing Question

2005-10-04 Thread Gregory Piñero
That's how Python works. You read in the whole file, edit it, and write it back out. As far as I know there's no way to edit a file in place which I'm assuming is what you're asking? And now, cue the responses telling you to use a fancy parser (XML?) for your project ;-) -Greg On 4 Oct 2005

Re: Newbie Text Processing Question

2005-10-04 Thread James Stroud
You can edit a file in place, but it is not applicable to what you are doing. As soon as you insert the first biblio, you've shifted everything downstream by those 8 bytes. Since they map to a physically located blocks on a physical drive, you will have to rewrite those blocks. If it is a big

Re: Newbie Text Processing Question

2005-10-04 Thread Mike Meyer
[EMAIL PROTECTED] writes: I'm a total newbie to Python so any and all advice is greatly appreciated. Well, I've got some for you. I'm trying to use regular expressions to process text in an SGML file but only in one section. This is generally a bad idea. SGML family languages aren't easy to

Re: Newbie Text Processing Question

2005-10-04 Thread Fredrik Lundh
Gregory Piñero wrote: That's how Python works. You read in the whole file, edit it, and write it back out. that's how file systems work. if file systems generally supported insert operations, Python would of course support that feature. /F --

Re: Improving my text processing script

2005-09-01 Thread Paul McGuire
Even though you are using re's to try to look for specific substrings (which you sort of fake in by splitting on Identifier, and then prepending Identifier to every list element, so that the re will match...), this program has quite a few holes. What if the word Identifier is inside one of the

Re: Improving my text processing script

2005-09-01 Thread Miki Tebeka
Hello pruebauno, import re f=file('tlst') tlst=f.read().split('\n') f.close() tlst = open(tlst).readlines() f=file('plst') sep=re.compile('Identifier (.*?)') plst=[] for elem in f.read().split('Identifier'): content='Identifier'+elem match=sep.search(content) if

Re: Improving my text processing script

2005-09-01 Thread pruebauno
Paul McGuire wrote: match...), this program has quite a few holes. What if the word Identifier is inside one of the quoted strings? What if the actual value is tablename10? This will match your tablename1 string search, but it is certainly not what you want. Did you know there are trailing

Re: Improving my text processing script

2005-09-01 Thread pruebauno
Miki Tebeka wrote: Look at re.findall, I think it'll be easier. Minor changes aside the interesting thing, as you pointed out, would be using re.findall. I could not figure out how to.

Re: Improving my text processing script

2005-09-01 Thread pruebauno
[EMAIL PROTECTED] wrote: Paul McGuire wrote: match...), this program has quite a few holes. tried run it though and it is not working for me. The following code runs but prints nothing at all: import pyparsing as prs And this is the point where I have to post the real stuff because your

Re: Improving my text processing script

2005-09-01 Thread Paul McGuire
Yes indeed, the real data often has surprising differences from the simulations! :) It turns out that pyparsing LineStart()'s are pretty fussy. Usually, pyparsing is very forgiving about whitespace between expressions, but it turns out that LineStart *must* be followed by the next expression,

Improving my text processing script

2005-08-31 Thread pruebauno
I am sure there is a better way of writing this, but how? import re f=file('tlst') tlst=f.read().split('\n') f.close() f=file('plst') sep=re.compile('Identifier (.*?)') plst=[] for elem in f.read().split('Identifier'): content='Identifier'+elem match=sep.search(content) if

Re: text processing problem

2005-04-08 Thread Matt
Maurice LING wrote: Matt wrote: I'd HIGHLY suggest purchasing the excellent Mastering Regular Expressions (http://www.oreilly.com/catalog/regex2/index.html) by Jeff Friedl. Although it's mostly geared towards Perl, it will answer all your questions about regular expressions.
