Sending to list. Please use reply-all when responding to the list. -------- Original Message -------- Subject: RE: [Tutor] Data pattern query. From: mhysnm1...@gmail.com Sent: Monday, 7 January 2019 11:58 To: 'Alan Gauld' <alan.ga...@yahoo.co.uk> CC:
Alan, Thanks for responding. I am storing the spreadsheet into a dictionary using a two dimension list. The key for the dictionary is the sheet name. Possibly in the future I might just have a simple list. 6 columns across with variable rows. Max is over 1000 rows. Below is the code: import openpyxl # module to import spreadsheet. # module found on the net to remove duplicates. from more_itertools import unique_everseen # Loding the workbook and names for the sheets. wb = openpyxl.load_workbook("test.xlsx") # Storing sheet names into a list for the loop. wss = wb.get_sheet_names() # defining variables for data structure transaction = {} # dictionary of bank transactions. description = [] # the descriptions of transactions for ws_name in wss: # sheet being worked upon. ws = wb.get_sheet_by_name(ws_name) data = []# temp variable to store 2 dimension list. for column in ws.columns: c = [] # temp variable for current column for cell in column: c.append(cell.value) # is this correct? # end for cell loop if c[0] == "Description": #Populating the description list description.extend(c) data.append(c) # creating two dimension list # end for for column # creating dictionary with 2 dimension list transaction[ws_name] = data #end for /*** the below code is not working ***/ If I sort the description list. I get untyped errors. When I use the debug. The items (elements) in the description list at the end are 'None" value. Not sure if this is due to the location of defining the empty list or the method of using append with the list c. If the below code could sort, then the remaining lines would perform the duplicate on the whole value of the item. While I want to identify duplicates with the vendor name only. description.sort() for result in unique_everseen (description): print (result) Some example data structure. I have modified for privacy reasons. PAY/SALARY FROM vendor-name EFTPOS vendor-name ####HORNSBY country-code bank-name BANKING FUNDS TFER TRANSFER ###### TO ###### vendor-name suburb state vendor-name #### suburb DEPOSIT vendor-name PAYMENT ######## state-name WITHDRAWAL BY EFTPOS #### vendor-name #### suburb 12/06 DEPOSIT-vendor-name Aust organisation_name ######### -----Original Message----- From: Tutor <tutor-bounces+mhysnm1964=gmail....@python.org> On Behalf Of Alan Gauld via Tutor Sent: Monday, 7 January 2019 8:20 PM To: tutor@python.org Subject: Re: [Tutor] Data pattern query. On 07/01/2019 02:38, mhysnm1...@gmail.com wrote: > All the descriptions of the transactions are in a single column. I am > trying to work out the easiest method of identifying the same pattern > of text in the fields. What does a singe column mean? That presumably is how it appears in the spreadsheet? But how is it stored in your Python code? A list? a list of lists? a dictionary? We don't know what your data looks like. Post a sample along with an explanation of how it is structured. In general when looking for patterns in text a regular expression is the tool of choice. But only if you know what the pattern looks like. Identifying patterns as you go is a much more difficult challenge > Then I am going to group these vendors by categories. And how do you categorize them? Is the category also in the data or is it some arbitrary thing that you have devised? > In the field, there is the vendor name, suburb/town, type of transaction, etc. etc is kind of vague! Show us some data and tel;l us which field is which. Without that its difficult to impossible to tell you how to extract anything! The important thing is not how it looked in the spreadsheet but how it looks now you have it in Python. > How can I teach the program to learn new vendor names? Usually you would use a set or dictionary and add new names as you find them. > I was thinking of removing all the duplicate entries Using a set would do that for you automatically > Was thinking of using dictionaries for this. > But not sure if this is the best approach. If you make the vendor name the key of a dictionary then it has the same effect as using a set. But whether a set or dict is best depends on what else you need to store. If its only the vendor names then a set is best. If you want to store associated data then a dict is better. You need to be much more specific about what your data looks like, how you identify the fields you want, and how you will categorize them. -- Alan G Author of the Learn to Program web site http://www.alan-g.me.uk/ http://www.amazon.com/author/alan_gauld Follow my photo-blog on Flickr at: http://www.flickr.com/photos/alangauldphotos _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor