Re: [Tutor] splits and pops

Marcel Wunderlich Sat, 12 Jul 2008 09:25:03 -0700

Hi Eric,

I tried the following and it seems to work:


fullstring = """l1r1\tll1r2\tl1r3\tl1
r4\tl1r5
l2r1\tl2r3\tl3
r3\tl2r4\tl2r5
l3r1\tl3r2\tl3r3\tl3r4\tl3r5
"""

# This should be a string like yours: "\t"-separated columns, "\n"-separated
# rows, with "\n" in some columns.

rowlength = 5
# for you it would be 9, but I was lazy when I wrote the string

prefetch = ""
lines = []
i = 0
for tab in fullstring.split("\t"):
        if i < rowlength-1:  # i.e. working on all but the last column
                # offtopic: is the last comment correct English?
                prefetch += tab + "\t" # +"\t" because split removes the tab
                i += 1
        else: # last column, still glued to the first column of the next row
                cut = tab.find("\n") # assumes the first "\n" ends the row
                prefetch += tab[:cut]
                lines.append(prefetch)
                # start the next row with its first column, without the "\n"
                prefetch = tab[cut+1:] + "\t"
                i = 1 # since we already added the first column

# End

After that "print lines" produces the following output:

['l1r1\tll1r2\tl1r3\tl1\nr4\tl1r5', 'l2r1\tl2r3\tl3\nr3\tl2r4\tl2r5', 'l3r1\tl3r2\tl3r3\tl3r4\tl3r5']

So you've got a list of the lines. Instead of strings you could also use
lists, by making prefetch a list and appending each column to it instead of
adding the tabs.
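
Here is a rough sketch of that list variant. It is untested against your real dump and still assumes that the first "\n" in the chunk after the last tab of a row is the row break. The names split_records and sample are made up for illustration, and str.partition needs Python 2.5 or later.

```python
def split_records(fullstring, rowlength):
    """Return a list of records, each a list of rowlength fields.

    Assumes the first newline in a boundary chunk ends the record.
    """
    records = []
    fields = []  # plays the role of prefetch, but as a list
    for chunk in fullstring.split("\t"):
        if len(fields) < rowlength - 1:
            # any column except the last one of the row
            fields.append(chunk)
        else:
            # chunk is "last-field\nfirst-field-of-next-row"
            last, sep, first = chunk.partition("\n")
            fields.append(last)
            records.append(fields)
            # start the next record with its first field, if there is one
            fields = [first] if sep else []
    return records


# Same test data as the string above, written with explicit escapes.
sample = ("l1r1\tll1r2\tl1r3\tl1\nr4\tl1r5\n"
          "l2r1\tl2r3\tl3\nr3\tl2r4\tl2r5\n"
          "l3r1\tl3r2\tl3r3\tl3r4\tl3r5\n")
records = split_records(sample, 5)
```

Each entry in records is then a plain list of five fields, so you don't need a second split on "\t" later.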


However, I assumed that a new row always starts after the first line break
in the last column. If that's not the case, I think you have to check for
multiple line breaks there and, if you find any, choose manually which one
to split on.
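
A minimal sketch of such a check, assuming the same tab-separated layout. It does not decide anything for you; it only lists the chunks that hold the end of one row and the start of the next and contain more than one "\n". The name ambiguous_boundaries and the small test string are hypothetical.

```python
def ambiguous_boundaries(fullstring, rowlength):
    """Return (index, chunk) pairs for boundary chunks with more than one newline."""
    chunks = fullstring.split("\t")
    # After splitting on tabs, every (rowlength - 1)-th chunk holds the last
    # field of one row and the first field of the next row.
    step = rowlength - 1
    suspects = []
    for index in range(step, len(chunks), step):
        if chunks[index].count("\n") > 1:
            suspects.append((index, chunks[index]))
    return suspects


# Three columns per row; the chunk "c\nx\nd" has two candidate row breaks.
tricky = "a\tb\tc\nx\nd\te\tf\n"
suspects = ambiguous_boundaries(tricky, 3)
```

If suspects comes back empty, the first-line-break assumption is at least not contradicted by the data.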

Hope this helps,

Marcel

I have a horribly stupid text parsing problem that is driving me crazy, and making me think my Python skills have a long, long way to go...
What I've got is a poorly-thought-out SQL dump, in the form of a text file, where each record is separated by a newline, and each field in each record is separated by a tab. BUT, and this is what sinks me, there are also newlines within some of the fields. Newlines are not 'safe' – they could appear anywhere – but tabs are 'safe' – they only appear as field delimiters.
There are nine fields per record. All I can think to do is read the file in as a string, then split on tabs. That gives me a list where every eighth item is a string like this: u'last-field\nfirst-field'. Now I want to iterate through the list of strings, taking every eighth item, splitting it on '\n', and replacing it with the two resulting strings. Then I'll have the proper flat list where every nine list items constitutes one complete record, and I'm good to go from there.
I've been fooling around with variations on the following (assuming splitlist = fullstring.split('\t')):
for x in xrange(8, sys.maxint, 8):
     try:
         splitlist[x:x] = splitlist.pop(x).split('\n')
     except IndexError:
         break
The first line correctly steps over all the list items that need to be split, but I can't come up with a line that correctly replaces those list items with the two strings I want. Either the cycle goes off and splits the wrong strings, or I get nested list items, which is not what I want. Can someone please point me in the right direction here?
Thanks,
Eric
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


