Re: [Tutor] deleting elements out of a list.

Cameron Simpson Sat, 15 Jun 2019 00:58:15 -0700

On 15Jun2019 14:51, Sean Murphy <mhysnm1...@gmail.com> wrote:

I am not sure how to tackle this issue. I am using Windows 10 and Python 3.6 from Activestate.


I have a list of x number of elements. Some of the elements have similar
words in them. For example:

Dog food Pal
Dog Food Pal qx1323
Cat food kitty
Absolute cleaning inv123
Absolute Domestic cleaning inv 222
Absolute d 3333
Fitness first 02/19
Fitness first

I'm going to assume that you have a list of strings, each being a line from a file.

I wish to remove duplicates. I could use the collection.Count method. This
fails due to the strings are not unique, only some of the words are.

You need to define this more tightly. Suppose the above were your input. What would it look like after "removing duplicates"? By providing an explicit example of what you expect afterwards it is easier for us to understand you, and will also help you with your implementation.

Do you intend to discard the second occurrence of every word, turning line 2 above into "qx1323"? Or to remove similar lines, for some definition of "similar", which might discard line 2 above?

Your code examples below seem to suggest that you want to discard words you've already seen.

My
thinking and is only rough sudo code as I am not sure how to do this and


Aside: "pseudo", not "sudo".

wish to learn and not sure how to do without causing traceback errors. I
want to delete the match pattern from the list of strings. Below is my
attempt and I hope this makes sense.

description = load_files() # returns a list
for text in description:
   words = text.split()
   for i in enumerate(words):


enumerate() yields a sequence of (i, v), so you need i, v in the loop:

 for i, word in enumerate(words):

Or you need the loop variable to be a tuple and to pull out the enumeration counter and the associated value inside the loop:


 for x in enumerate(words):
   i, word = x

       Word = ' '.join(words[:i])

Variable names in Python are case sensitive. You want "word", not "Word".

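However, if you really want each word of the line, you've already got that from text.split(). Note that "words[:i]" slices the list "words", giving the first i words (indices 0 through i-1), whereas slicing a single word, as in "word[:i]", gives its letters from index 0 through i-1. For example, "kitt" if "word" were "kitty" and "i" were 4.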

The join string operation joins an iterable of strings. Unfortunately for you, a string is itself iterable: you get each character, but as a string (Python does not have a distinct "character" type, it just has single character strings). So if "word" were "kitt" above, you get:


 "k i t t"

from the join. Likely not what you want.
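
To make the difference concrete, here is a small sketch of join applied to a list of words and to a single string. The sample values are only illustrative, not taken from your program:

```python
# Joining a list of words gives the words separated by spaces.
words = "Cat food kitty".split()
joined_words = ' '.join(words[:2])   # the first two words of the list

# Joining a single string iterates over its characters instead.
joined_letters = ' '.join("kitt")

print(joined_words)
print(joined_letters)
```

The first join works on whole words because it is given a list; the second splits the string into single-character strings before joining them.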

What _do_ you want?

       print (word)
       answer = input('Keep word?')
       if answer == 'n':
           continue
       for i, v in enumerate(description):
           if word in description[i]:
               description.pop[i]

There are some problems here. The big one is that you're modifying a list while you're iterating over it. This is always hazardous - it usually leads to accidentally skipping elements. Or not, depending how the iteration happens.

It is generally safer to iterate over the list and construct a distinct new list to replace it, without modifying the original list. This way the enumerate cannot get confused. So instead of discarding from the list, you conditionally add to the new list:


 new_description = []
 for i, text in enumerate(description):
   # "word" is the word chosen earlier; keep the lines that do not contain it
   if word not in text:
     new_description.append(text)

Note the "not" above. We invert the condition ("not in" instead of "in") because we're inverting the action (appending something instead of discarding it).
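
As a minimal runnable sketch of this pattern, assuming the sample lines from the start of this message and a hypothetical chosen word "Absolute" (the names "descriptions" and "kept" are mine, not from your program):

```python
descriptions = [
    "Dog food Pal",
    "Dog Food Pal qx1323",
    "Cat food kitty",
    "Absolute cleaning inv123",
    "Absolute Domestic cleaning inv 222",
    "Absolute d 3333",
    "Fitness first 02/19",
    "Fitness first",
]
word = "Absolute"  # hypothetical: the word whose lines we want to discard

# Build a new list instead of removing from the one being iterated.
kept = []
for description in descriptions:
    if word not in description:
        kept.append(description)

print(kept)
```

The original list is left untouched, so the iteration cannot be disturbed. Be aware that "in" on strings is a substring test, so a short word can also match inside a longer one.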

However, I think you have some fundamental confusion about what you're iterating over.

I recommend that you adopt better variable names, and more formally describe your data.

If "description" is actually a list of descriptions then give it a plural name like "descriptions". When you iterate over it, you can then use the singular form for each element i.e. "description" instead of "text".


Instead of writing loops like:

 for i, v in enumerate(descriptions):

give "v" a better name, like "description". That way your code inside the loop is better described, and mistakes more obvious because the code will suddenly read badly in some way.

The initial issues I see with the above is the popping of an element from
description list will cause an error.

It often won't. Instead it will mangle your iteration because after the pop the index "i" no longer refers to what you expect, it now points one element further along.

Towards the _end_ of the loop you may get an error, but only once "i" starts to exceed the length of the list (because you've been shortening it).
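
A small sketch of the skipping effect, using made-up data. The loop is deliberately wrong: it pops from the very list it is enumerating:

```python
items = ["a", "b", "b", "c"]

# Deliberately wrong: popping from the list we are enumerating.
for i, item in enumerate(items):
    if item == "b":
        items.pop(i)

print(items)
```

In this particular run no exception is raised. After the first "b" is popped, the second "b" shuffles down into the position just visited, so the loop never looks at it and it survives in the list.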

If I copy the description list into a
new list. And use the new list for the outer loop. I will receive multiple
occurrences of the same text. This could be addressed by a if test. But I am
wondering if there is a better method.

The common idiom is to leave the original unchanged and copy into a new list as in my example above. But taking a copy and iterating over that is also reasonable.

You will still have issues with the popping, because the index "i" will no longer be aligned with the modified list.

If you really want to modify in place, avoid enumerate. Instead, make "i" an index into the list as you do, but maintain it yourself. Loop from left to right in the list until you come off the end:


 i = 0
 while i < len(description):
   if ... we want to pop the element ...:
     description.pop(i)
   else:
     i = i + 1

Here we _either_ discard from the list and _do not_ advance "i", or we advance "i". Either way "i" then points at the next element, in the former case because the next element has shuffled down one position and in the latter because "i" has moved forwards. Either way "i" gets closer to the end of the list. We leave the loop when "i" gets past the end.
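
Filled in with a concrete condition, the loop might look like the sketch below. The data and the chosen word "Absolute" are assumptions for illustration only:

```python
descriptions = [
    "Absolute cleaning inv123",
    "Absolute d 3333",
    "Cat food kitty",
    "Fitness first",
]
word = "Absolute"  # hypothetical: remove every line containing this word

i = 0
while i < len(descriptions):
    if word in descriptions[i]:
        descriptions.pop(i)   # the next element shuffles down into position i
    else:
        i += 1                # nothing removed, so step forward

print(descriptions)
```

The two adjacent "Absolute" lines are both removed, which is exactly the case that a for loop with enumerate tends to get wrong.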

2nd code example:

description = load_files() # returns a list
search_txt = description.copy() # I have not verify if this is the right
syntax for the copy method.]


A quick way is:

 search_txt = description[:]

but lists have a .copy method which does the same thing.
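
A brief sketch showing that both forms give a new list that is independent of the original (the sample data is made up):

```python
description = ["Dog food Pal", "Cat food kitty"]

by_slice = description[:]        # shallow copy via slicing
by_method = description.copy()   # shallow copy via the list method

description.pop(0)               # modify the original only

print(description)
print(by_slice)
print(by_method)
```

Both copies still hold the two original strings after the pop. They are shallow copies, which is all you need for a list of strings.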

for text in search_txt:
   words = text.split()
   for i in enumerate(words):
       Word = ' '.join(words[:i])
       print (word)
       answer = input('Keep word (ynq)?')
       if answer == 'n':
           continue
       elif answer = 'q':
           break
       for i, v in enumerate(description):
           if word in description[i]:
               description.pop[i]

The inner for loop still has all the same issues as before. The outer loop is now more robust because you're iterating over the copy.


Cheers,
Cameron Simpson <c...@cskk.id.au>
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
