On 04/03/2019 11:44, Edward Kanja wrote:
> Hi there ,
> Earlier i had sent an email on how to use re.sub function to eliminate
> square brackets. I have simplified the statements. Attached txt file named
> unon.Txt has the data im extracting from. 

Thank you, that's much better. Although the code is now short enough to
just paste into the mail:

#######################
import pandas as pd
from pandas import DataFrame #creating my dataframes manually.
import re
#import textfile

with open ('unon.txt') as csvfile:
   mydata=pd.read_csv('unon.txt')
   for line in csvfile:
      index_no=re.findall(r'(\|\s\d{5,8}\s)',line)
      names=re.findall(r'(\|[A-Za-z]\w*\s\w*\s\w*\s\w*\s)',line)
      #Address=re.findall(r'\|\s([A-Z0-9-,/]\w*\s\w*\s)',line)
      duty_station=re.findall(r'\|\s[A-Z]*\d{2}\-\w\w\w\w\w\w\w\s',line)
      print((index_no),(names),(duty_station))
############

You don't need the pandas line since you don't do anything
with mydata.

Also your data is not really a csv file so I'm not sure how Pandas
will cope with it.

> codes I'm using to extract the data.The regular expression works fine but
> my output has too many square brackets. How do i do away with them thanks.

Can you show us the output? I can't see how you can have any square
brackets since your data has none. I'm guessing you maybe mean the
vertical bar symbols? I suspect the best way to remove them is to
improve the regexes used for extraction. There are web sites that allow
you to paste sample data, try different regexes and see the output.
That may be a useful way forward.

Here is one such:

https://regex101.com/
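
As a rough sketch of what "improving the regexes" could look like: if the
parentheses go around only the part you want to keep, findall returns that
part without the vertical bar or the surrounding spaces. The sample line
below is invented for illustration, so the real layout of unon.txt may
need different patterns. Also, if the brackets do turn out to be in the
printed output rather than in the data, one possible cause is that
findall returns a list, and printing a list shows its square brackets.

```python
import re

# Hypothetical sample line; the real layout of unon.txt may differ.
line = "| 1234567 |John Smith Doe Jr | KE01-NAIROBI "

# The parentheses mark the part to keep; the bar and spaces sit outside them.
index_no = re.findall(r'\|\s(\d{5,8})\s', line)
duty_station = re.findall(r'\|\s([A-Z]*\d{2}-\w{7})\s', line)

print(index_no, duty_station)   # lists, so they print with square brackets
print(", ".join(index_no))      # just the items, no brackets
print(", ".join(duty_station))
```

Joining the list is only one option; looping over the matches would work
just as well.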

Assuming that you have multiple input records like the one shown I'd
suggest you put the code to read one record into a function and then
call that repeatedly until the file is processed. You can also create
a function to write one record at a time to a file, using whatever
format you prefer (csv, json, yaml etc). I suspect csv may not be
ideal for this data, but you may want to process it with some other
tool that requires csv; I don't know. The functions will probably want
to use the file as an iterable since they need to process multiple
lines per record.
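
Something along these lines might do as a starting point. It is only a
sketch: the names read_record and write_record are made up, I am assuming
records are separated by blank lines (which may not be true of your
file), and the sample data is invented. A StringIO object stands in for
the open file so the example runs on its own.

```python
import csv
import io
import re

def read_record(lines):
    """Collect lines up to the next blank line; return None at end of input.

    Assumption: records are separated by blank lines. 'lines' must be an
    iterator such as an open file, so each call resumes where the last
    one stopped.
    """
    record = []
    for line in lines:
        if not line.strip():
            if record:
                break       # blank line ends the current record
            continue        # skip blank lines before a record starts
        record.append(line.rstrip("\n"))
    return record or None

def write_record(writer, record):
    """Write the index numbers found in one record as a single csv row."""
    text = " ".join(record)
    writer.writerow(re.findall(r'\|\s(\d{5,8})\s', text))

# Invented sample data standing in for the real file.
sample = io.StringIO(
    "| 1234567 |\n| KE01-NAIROBI |\n\n| 7654321 |\n| UG02-KAMPALA |\n"
)
out = io.StringIO()
writer = csv.writer(out)

while True:
    record = read_record(sample)
    if record is None:
        break
    write_record(writer, record)

print(out.getvalue())
```

With a real file you would pass the file object from open() in place of
sample, and an output file opened with newline='' in place of out.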

Finally, the parens in the last print line are not needed,
just use:

print(index_no, names, duty_station)

Although I assume this is just debug information so not
really too important.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


