As part of my learning, I was participating in a discussion at:
https://discuss.codecademy.com/t/python-lists-of-duplicated-elements/78151 It’s about making a function that returns a list of lists, with each list being all of the elements that are the same as another element in the original list. function([1, 2, 3, 4, 1, 1, 2, 3]) should return [[1, 1, 1], [2, 2], [3, 3], [4]] I wrote and submitted the following function: def sublist_duplicates(lst): sublists = [] for item in set(lst): sublists.append([item] * lst.count(item)) return sublists lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's'] print sublist_duplicates(lst_of_duplicates) A participant told me that that'll grind into a near-halt if there are many different elements, for example range(1_000_000) (1 million unique values, so it checks the whole list (length 1 million) for the count, a million times, a total of 1000 billion operations… I thought he was right. I wrote another version of the function, this time using collections.Counter. from collections import Counter def sublist_duplicates(lst): counts = Counter(lst) sublists = [[k] * counts[k] for k in counts] return sublists _lst = list(range(1000000)) import timeit time = timeit.timeit("sublist_duplicates(_lst)", setup = "from __main__ import sublist_duplicates, _lst", number=1) print("Number of elements in the list: {}".format(len(_lst))) print("Execution time of the function: {} seconds".format(round(time, 2))) I found that the first version works better with a small list, but the second version outperforms the first one when it’s given a considerably big list of data. I’m asking for your comments on these two functions. I’m not even sure if this is how it should be coded. Sri _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor