On 3/21/19 11:54 PM, Edward Kanja wrote:
> Greetings,
> I'm referring to the question I sent earlier. If you have a hint on how I
> can solve my problem, I will really appreciate it. After running regular
> expressions using Python, my output has a lot of square brackets, i.e.
> [][][][][][][][][]. How do I substitute these with an empty string so as
> to have a clear output, which I will later export to an Excel file?
> Thanks a lot.
I think you got the key part of the answer already: you're getting empty
lists as matches, which look like [] when printed. Let's try to be more
explicit:

$ python3
Python 3.7.2 (default, Jan 16 2019, 19:49:22)
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> help(re.findall)
Help on function findall in module re:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

re.findall *always* returns a list, even if there is no match.

If we add more debug prints to your code so it looks like this:

import re

with open('unon.txt') as csvfile:
    for line in csvfile:
        print("line=", line)
        index_no = re.findall(r'(\|\s\d{5,8}\s)', line)
        print("index_no (type %s)" % type(index_no), index_no)
        names = re.findall(r'(\|[A-Za-z]\w*\s\w*\s\w*\s\w*\s)', line)
        print("names (type %s)" % type(names), names)
        #Address=re.findall(r'\|\s([A-Z0-9-,/]\w*\s\w*\s)',line)
        duty_station = re.findall(r'\|\s[A-Z]*\d{2}\-\w\w\w\w\w\w\w\s', line)
        print("duty_station (type %s)" % type(duty_station), duty_station)

you can easily see what happens as your data is processed. I ran this on
your data file, and the first few times through the loop look like this:

line= ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
index_no (type <class 'list'>) []
names (type <class 'list'>) []
duty_station (type <class 'list'>) []
line= |Rawzeea NLKPP | VE11-Nairobi | 20002254-MADIZ | 00 | 00 |Regular Scheme B | 15-JAN-2019 To 31-DEC-2019 | No |
index_no (type <class 'list'>) []
names (type <class 'list'>) ['|Rawzeea NLKPP ']
duty_station (type <class 'list'>) ['| VE11-Nairobi ']
line= |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
index_no (type <class 'list'>) []
names (type <class 'list'>) []
duty_station (type <class 'list'>) []

You can see that each call to re.findall has given you a list, and most
are empty. The first and third lines are separators containing no useful
data, so you get no matches at all. The second line gave you a match for
"names" and for "duty_station", but not for "index_no". Your code will
need to be prepared for those sorts of outcomes.

Just looking at the data, it's table data, presumably from a spreadsheet,
but it is not really presented in a format that is easy to process,
because individual lines are not complete records. A separator line of
all dashes seems to mark the boundary between complete entries, each of
which then takes up 14 lines, including additional marker lines that
follow slightly different patterns: they may contain | marks or leading
spaces. You will need to decide how regular your table data is and how to
work with it. Most examples of handling table data assume that one row is
a complete entry, so you probably won't find a lot of information on this.
In your case I'm looking at line 2 containing 8 fields, line 4 containing
9 fields, line 6 containing 10 fields, and then lines 8-14 being
relatively free-form, spread over multiple lines.

Is there any chance you can generate your data file in a different way to
make it easier to process?

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
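Since re.findall always returns a list, one way to avoid printing [] is to turn each result into a single string (empty when nothing matched), skip lines where nothing matched at all, and write the remaining rows to a CSV file, which Excel can open. The sketch below assumes that approach. The sample lines, the two simplified patterns (NAME_RE, DUTY_STATION_RE) and the helper names are hypothetical illustrations, not the original poster's code, and the real file's spacing may differ from what the debug output shows.

```python
import csv
import re
import sys

# Hypothetical sample lines, shaped like the debug output in the reply.
SAMPLE_LINES = [
    "-" * 40,
    "|Rawzeea NLKPP | VE11-Nairobi | 20002254-MADIZ | 00 | 00 |",
    "|" + "-" * 40 + "|",
]

# Assumed, simplified patterns. Each has exactly one capturing group,
# so re.findall returns a list of plain strings rather than tuples.
NAME_RE = r'^\|([A-Za-z][A-Za-z ]*?)\s*\|'
DUTY_STATION_RE = r'\|\s([A-Z]*\d{2}-\w+)\s'


def first_or_empty(matches):
    """Return the first re.findall() match, or "" if the list is empty."""
    return matches[0] if matches else ""


def extract_rows(lines):
    """Collect [name, duty_station] rows, skipping lines with no matches."""
    rows = []
    for line in lines:
        name = first_or_empty(re.findall(NAME_RE, line))
        duty_station = first_or_empty(re.findall(DUTY_STATION_RE, line))
        # Separator lines produce two empty strings and are dropped here.
        if name or duty_station:
            rows.append([name, duty_station])
    return rows


if __name__ == "__main__":
    writer = csv.writer(sys.stdout)
    writer.writerow(["name", "duty_station"])
    writer.writerows(extract_rows(SAMPLE_LINES))
```

To write a real file instead of printing, the csv module documentation recommends opening it with newline="", e.g. open("output.csv", "w", newline="") (the file name here is only an example).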
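The observation that a line of dashes seems to separate complete entries, with |-style marker lines inside each entry, suggests grouping lines into records before running any regular expressions. The sketch below assumes exactly that layout: a line made only of dashes starts a new entry, and lines made only of |, - and spaces are inner markers to skip. The sample data, including the second person, is hypothetical, and a real file may need different rules.

```python
from typing import Iterable, List

# Hypothetical sample in the assumed layout.
SAMPLE = [
    "-" * 20 + "\n",
    "|Rawzeea NLKPP | VE11-Nairobi | 00 |\n",
    "|" + "-" * 18 + "|\n",
    "| second row of the same entry |\n",
    "-" * 20 + "\n",
    "|Hypothetical Person | XX22-Example |\n",
]


def is_record_separator(line: str) -> bool:
    """Assumed rule: a line made only of dashes starts a new entry."""
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {"-"}


def is_inner_marker(line: str) -> bool:
    """Assumed rule: lines of only '|', '-' and spaces separate rows in an entry."""
    stripped = line.strip()
    return bool(stripped) and set(stripped) <= {"|", "-", " "}


def group_records(lines: Iterable[str]) -> List[List[str]]:
    """Group data lines into one list per entry."""
    records: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        # Check the outer separator first: it would also pass is_inner_marker.
        if is_record_separator(line):
            if current:
                records.append(current)
            current = []
        elif is_inner_marker(line) or not line.strip():
            continue
        else:
            current.append(line.rstrip("\n"))
    if current:
        records.append(current)
    return records


def split_fields(line: str) -> List[str]:
    """Split a |-delimited line into stripped field values."""
    return [field.strip() for field in line.strip().strip("|").split("|")]


if __name__ == "__main__":
    for record in group_records(SAMPLE):
        print([split_fields(row) for row in record])
```

Grouping first means each entry can then be checked field by field, which copes better with rows that have 8, 9 or 10 fields than matching each physical line in isolation.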