Jay Mutter III wrote:
> Luke;
I'm a bit pressed for time right now and I can't look over this e-mail.
Please reply on-list in the future using the 'reply-all' feature.
You're more likely to get a prompt response.
(This e-mail is carbon copied to the list, so don't worry about sending another.)
>
> Actually it did help but the following
>
> for line in text:
>     if len(line) > 1 and line[-2] in ';,-':
>         line = line.rstrip()
>         output.write(line)
>     else: output.write(line)
>
> does not have any apparent effect on my data.
>
> I start with lines
>
>
> A.-C. Manufacturing Company. (See Sebastian, A. A.,
> and Capes, assignors.)
> A. G. A. Railway Light & Signal Co. (See Meden, Elof
> H„ assignor.)
> A-N Company, The. (See Alexander and Nasb, as-
> signors.;
> AN Company, The. (See Nash, It. J., and Alexander, as-
> signors.)
> A/S. Arendal Smelteverk. (See Kaaten, Einar, assignor.)
> A/S. Bjorgums Gevaei'kompani. (See Bjorguni, Nils, as-
> signor.)
> A/S Mekano. (Sec Schepeler, Herman A., assignor.)
> A/S Myrens Verkstad. (See Klling, Jens W. A., assignor.)
> A/S Stordo Kisgruber. (See Nielsen, C., and Ilelleland,
> assignors.)
>
> and I end up with the same.
> My goal is to strip out the CR or LF or whatever so that all info for
> one entity is on 1 line.
>
> Any ideas of where I am going wrong?
>
> Thanks
>
> Jay
>
>
> On Mar 21, 2007, at 1:41 AM, Luke Paireepinart wrote:
>
>>
>>> # The next 5 lines are so I have an idea of how many lines I started with in the file.
>>>
>>> in_filename = raw_input('What is the COMPLETE name of the file you want to open: ')
>>> in_file = open(in_filename, 'r')
>>> text = in_file.read()
>> read() returns the whole file as one string, not a list with
>> one element per line.
>> Use readlines() for this functionality.
>> (E.g. a file with contents 'hello\nhoware\nyou?' would have this
>> string returned by read(), but
>> readlines() would return ['hello\n','howare\n','you?'].)
>>> num_lines = text.count('\n')
>> or just len(text) if you're using readlines()
>>> print 'There are', num_lines, 'lines in the file', in_filename
>>>
>>> output = open("cleandata.txt","a")  # file for writing data to after stripping newline character
>> You might want to open this file in 'write' mode while you're
>> testing, so previous test results don't confuse you.
>>>
>>> # read file, copying each line to new file
>>> for line in text:
>> since read() returns a single string, you're looping over every
>> character in the file, not every line.
>>>     if line[:-1] in '-':
>> In this case 'line' holds a single character, so line[:-1] is the
>> empty string, and '' in '-' is always True: the test passes for
>> every character.
>>>         line = line.rstrip()
>>>         output.write(line)
>>>     else: output.write(line)
>>>
>>> print "Data written to cleandata.txt."
>>>
>>> # close the files
>>> in_file.close()
>>> output.close()
>>>
>>> The above ran with no errors, gave me the number of lines in my
>>> original file but then when I opened the cleandata.txt file
>>> I got:
>>>
>>> A.-C.䴀愀渀甀昀愀挀琀甀爀椀渀最 �Company.⠀匀攀攀�Sebastian,䄀⸀�A.,
>>> �and 䌀愀瀀攀猀Ⰰ�assignors.)�A.䜀⸀�A.刀愀椀氀眀愀礀 �Light☀�Signal䌀
>>> 漀⸀� (See䴀攀搀攀渀Ⰰ�Elof�Hassignor.)�A-N䌀漀洀瀀愀渀礀Ⰰ�The.⠀匀攀
>>> 攀 �Alexander愀渀搀�Nasb,愀猀ⴀ�猀椀最渀漀爀猀⸀㬀�䄀一�Company,吀栀攀
>>> ⸀� (See一愀猀栀Ⰰ�It.䨀⸀Ⰰ�and䄀氀攀砀愀渀搀攀爀Ⰰ�as-�
>> Not sure what caused all of those characters.
>> HTH,
>> -Luke
>
>
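For what it's worth, here is a minimal sketch of one way to reach the goal
stated above (all info for one entity on one line). It is not code from
either message. It assumes the lines come from readlines(), so that each
element really is a line; that a line ending in ',', ';' or '-' continues on
the next line; and that a trailing '-' marks a word split across lines, so
the hyphen is dropped. The name join_continuations is made up for this
illustration. Each line is stripped before it is tested, so a trailing '\r',
space or '\n' cannot hide the last visible character, which is one possible
reason a line[-2] test might never match.

```python
def join_continuations(lines):
    """Merge each line ending in ',', ';' or '-' with the line after it.

    Assumptions (not from the original messages): 'lines' is a list of
    strings such as readlines() returns, and a trailing '-' means a word
    was split across two lines, so the hyphen is removed when joining.
    """
    records = []
    pending = ''  # text carried over from lines that are continued
    for line in lines:
        text = line.strip()  # drops '\r', '\n' and stray spaces
        if not text:
            continue  # ignore blank lines
        if text.endswith('-'):
            pending += text[:-1]  # rejoin the split word without the hyphen
        elif text.endswith((',', ';')):
            pending += text + ' '  # keep the punctuation, add one space
        else:
            records.append(pending + text)  # the record is complete
            pending = ''
    if pending:
        # The input ended on a continued line; keep what we have.
        records.append(pending.rstrip())
    return records


if __name__ == '__main__':
    sample = [
        "A.-C. Manufacturing Company. (See Sebastian, A. A.,\r\n",
        "and Capes, assignors.)\r\n",
        "A/S. Bjorgums Gevaei'kompani. (See Bjorguni, Nils, as-\n",
        "signor.)\n",
    ]
    for record in join_continuations(sample):
        print(record)
```

The rule is only a heuristic. An OCR slip such as 'signors.;' in the sample
ends in ';' and would be glued to the next entry, and a line ending in 'Elof'
is not recognised as continued at all. If the input turns out to be UTF-16
(the garbled output quoted above looks like UTF-16 text whose bytes got out
of step, but that is only a guess), it would need decoding before any of
this is applied.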
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor