"John Gunderman" <[EMAIL PROTECTED]> wrote

> I am looking to parse a plaintext from a document.
When you say "a document", what kind of document do you mean? Is the document also in plain text, like HTML, or is it in a binary format like MS Word?

> some of the words will be multiple digits or characters.
> However, I don't know the length of the words before the parse.

Look at the regular expression module, re. Regular expressions allow you to define patterns and then search for those patterns within a string.

> Is there a way to somehow have open() grab something
> until it sees a /t or ' '?

open() doesn't grab anything; it simply makes the file available for reading. You can then use read() to read either the whole file or a fixed number of characters. You can also use readline() to read a single line, or readlines() to read the entire file into a list of lines. Which one you use will depend a lot on the format of your data.

> I was thinking I could have it count ahead the
> number of spaces till the stopping point and then
> parse till that point using read(), but that seems sort
> of inefficient.

It may or may not be efficient, but it's certainly complex, since it requires you to know in advance what the next bit of data looks like. If the data follows a set pattern, that may be OK.

If possible, you would probably be better off reading the data line by line and parsing each line. However, if the data spills across lines, that will probably not be viable. If the file is not too big (a few MB, say), then simply reading the entire file as a single string and using regular expressions may be the easiest way.

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld

_______________________________________________
Tutor maillist - [email protected]
http://mail.python.org/mailman/listinfo/tutor
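A minimal sketch of the two approaches suggested in the reply: parsing line by line, and reading the whole file into one string and searching it with the re module. It assumes a "word" is any run of non-whitespace characters, which covers the "stop at a tab or space" requirement without knowing word lengths in advance. The helper names words_by_line and words_whole_file and the sample text are hypothetical, and io.StringIO stands in for a real file returned by open().

```python
import io
import re

# Assumption: a "word" is any run of non-whitespace characters.
# \S+ stops at the first space, tab, newline (or other whitespace),
# so the length of each word never needs to be known before the parse.
WORD = re.compile(r"\S+")


def words_by_line(f):
    """Hypothetical helper: parse one line at a time.

    Suitable when no item you are matching spills across lines.
    """
    words = []
    for line in f:  # iterating a file object yields one line at a time
        words.extend(WORD.findall(line))
    return words


def words_whole_file(f):
    """Hypothetical helper: read everything into one string, then search it.

    Assumes the file is small enough to hold in memory.
    """
    return WORD.findall(f.read())


if __name__ == "__main__":
    # Hypothetical sample data; io.StringIO behaves like an opened text file.
    sample = "alpha 42\tbeta\ngamma-7   delta 1000\n"
    # Both print ['alpha', '42', 'beta', 'gamma-7', 'delta', '1000']
    print(words_by_line(io.StringIO(sample)))
    print(words_whole_file(io.StringIO(sample)))
```

With a real file you would pass the object from open() instead, e.g. `with open("data.txt") as f: words = words_by_line(f)` (the filename is a placeholder). For a simple word pattern both helpers return the same list; the whole-file version only earns its keep when a pattern has to match across line breaks.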
