On 04/03/2019 11:44, Edward Kanja wrote:
> Hi there,
> Earlier i had sent an email on how to use re.sub function to eliminate
> square brackets. I have simplified the statements. Attached txt file named
> unon.Txt has the data im extracting from.

Thank you, that's much better. In fact the code is now short enough to just
paste into the mail:

#######################
import pandas as pd
from pandas import DataFrame
#creating my dataframes manually.
import re
#import textfile
with open ('unon.txt') as csvfile:
    mydata=pd.read_csv('unon.txt')
    for line in csvfile:
        index_no=re.findall(r'(\|\s\d{5,8}\s)',line)
        names=re.findall(r'(\|[A-Za-z]\w*\s\w*\s\w*\s\w*\s)',line)
        #Address=re.findall(r'\|\s([A-Z0-9-,/]\w*\s\w*\s)',line)
        duty_station=re.findall(r'\|\s[A-Z]*\d{2}\-\w\w\w\w\w\w\w\s',line)
        print((index_no),(names),(duty_station))
############

You don't need the pandas line since you don't do anything with mydata.
Also, your data is not really a csv file, so I'm not sure how Pandas
will cope with it.

> codes I'm using to extract the data.The regular expression works fine but
> my output has too many square brackets. How do i do away with them thanks.

Can you show us the output?
I can't see how you can have any square brackets since your data has none.
I'm guessing you maybe mean the vertical bar symbols?

I suspect the best way to remove them is to improve the regexes
used for extraction. There are web sites that let you paste in sample
data, try different regexes and see the output; that may be a useful
way forward. Here is one such:

https://regex101.com/

Assuming that you have multiple input records like the one shown,
I'd suggest you put the code to read one record into a function
and then call that repeatedly until the file is processed.
You can also create a function to write one record at a time
to a file, using whatever format you prefer (csv, json, yaml etc.).
I suspect csv may not be ideal for this data, but you may
want to process it elsewhere that requires csv, I don't know.

The functions will probably want to use the file as an iterable
since they need to process multiple lines per record.
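
As a rough sketch of the function-per-record idea, and not a definitive
implementation: the layout of unon.txt is not shown in this thread, so the
sample data, the blank-line record separator and the names read_record,
read_records and write_record are all assumptions made for illustration.
Only two of the fields are handled. The two patterns are the posted ones with
the capture group moved so that the vertical bar is matched but not returned.
Note too that re.findall always returns a list, and printing a list shows
square brackets; that may or may not be what is meant by brackets here, but
the sketch uses search() and group(1) so each field comes back as a plain
string either way.

```python
import csv
import io
import re

# Patterns adapted from the posted code, with the capture group placed so the
# leading "|" and surrounding whitespace are matched but not returned.
INDEX_RE = re.compile(r'\|\s(\d{5,8})\s')
STATION_RE = re.compile(r'\|\s([A-Z]*\d{2}-\w{7})\s')

FIELDS = ['index_no', 'duty_station']

# Made-up sample; the real layout of unon.txt is not shown in this thread.
SAMPLE = (
    "| 123456 | some text\n"
    "| NBI12-ABCDEFG |\n"
    "\n"
    "| 7654321 | other text\n"
    "| XY34-HIJKLMN |\n"
)


def read_record(lines):
    """Consume lines up to the next blank line and return one record.

    ASSUMPTION: records are separated by blank lines. `lines` must be an
    iterator (an open file works) so each call resumes where the last stopped.
    Returns None once the input is exhausted.
    """
    record = {}
    seen_data = False
    for line in lines:
        if not line.strip():
            if seen_data:
                break        # blank line ends the current record
            continue         # skip blank lines before a record
        seen_data = True
        for field, pattern in (('index_no', INDEX_RE),
                               ('duty_station', STATION_RE)):
            match = pattern.search(line)
            if match and field not in record:
                record[field] = match.group(1)   # a plain string, not a list
    return record if seen_data else None


def read_records(lines):
    """Yield records by calling read_record until the input runs out."""
    lines = iter(lines)
    while True:
        record = read_record(lines)
        if record is None:
            return
        yield record


def write_record(writer, record):
    """Write one record; fields that were not found are left empty."""
    writer.writerow(record)


if __name__ == '__main__':
    out = io.StringIO()   # stand-in for an output file
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval='')
    writer.writeheader()
    for rec in read_records(io.StringIO(SAMPLE)):
        write_record(writer, rec)
    print(out.getvalue())
```

With a real file you would pass the open file object to read_records in place
of the io.StringIO wrapper, and open the output with newline='' as the csv
module recommends. Swapping write_record for a json or yaml writer leaves the
reading side untouched, which is the main reason for keeping the two apart.
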

Finally, the parens in the last print line are not needed,
just use:

print(index_no, names, duty_station)

Although I assume this is just debug information, so it's
not really too important.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos