On 10 December 2015 at 12:38, Spyros Charonis <s.charo...@gmail.com> wrote:
> Dear All,
>
> I am learning about analysis of algorithms (Python 2.7.6). I am reading a
> book (Problem Solving with Algorithms and Data Structures) where Python is
> the language used for implementations. The author introduces algorithm
> analysis in a clear and understandable way, and uses an anagram detection
> program as a template to compare different runtime implementations
> (quadratic, log-linear, linear). In the linear, and most efficient,
> implementation, the code is as follows (comments added by me):
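[The quoted code did not come through in this message. For context, a linear
count-based anagram check along the lines the book describes might look like
the sketch below -- my own reconstruction assuming lower-case ASCII input,
not the poster's exact code:]

```python
def anagram_linear(s1, s2):
    """Linear-time anagram check using letter-count tables.

    Assumes both strings contain only lower-case ASCII letters.
    """
    if len(s1) != len(s2):
        return False
    # One counter slot per letter 'a'..'z' for each string.
    c1 = [0] * 26
    c2 = [0] * 26
    for ch in s1:
        c1[ord(ch) - ord('a')] += 1
    for ch in s2:
        c2[ord(ch) - ord('a')] += 1
    # Anagrams if and only if every letter occurs equally often.
    return c1 == c2
```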
This is a good example problem for demonstrating the idea of computational
complexity, but having actually wanted to write a program that used anagram
detection recently, I concluded that quadratic performance can be acceptable,
or maybe even preferable, for this. Here is my own anagram function:

def anagram(s1, s2):
    seen = [False] * len(s2)
    for c1 in s1:
        for n2, c2 in enumerate(s2):
            if c1 == c2 and not seen[n2]:
                seen[n2] = True
                break
        else:
            return False
    else:
        return all(seen)

This has O(N**2) complexity, with N the string length. However, the real
average complexity depends on the distribution of strings. The loops above
only run to completion if the two strings are anagrams. Most of the time, if
you need to check whether two strings are anagrams, the probability that
they are is small. When two strings are not anagrams, the chances are
typically high that the first character of s1 is not in s2 at all. What that
means is that the most common runtime behaviour is to take the first
character c1 from s1 and then loop once through s2, verifying that none of
the characters in s2 is equal to c1. So in the most common case we have a
single iteration through just one of the strings, whereas the "linear"
complexity algorithm you showed will always loop through both strings fully.
There are other differences, such as not assuming lower-case ASCII
characters etc., but these are not so important to this point.

Of course, in practice in Python the above code is inefficient because it is
executed by the interpreter. The following is O(N*log(N)) but will probably
out-compete any of the examples shown:

def anagram(s1, s2):
    return sorted(s1) == sorted(s2)

The sorted function in Python is very efficient, so it will be hard to beat
its performance unless the strings really are long.
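[For comparison, the linear-time counting idea can also be written without
any lower-case ASCII assumption using collections.Counter -- my own sketch,
not something proposed in the thread:]

```python
from collections import Counter

def anagram_counter(s1, s2):
    # Counter builds a multiset of characters in O(N); comparing
    # the two multisets checks that every character occurs the
    # same number of times in both strings.
    return Counter(s1) == Counter(s2)
```

This works for any characters at all, including accented or non-Latin text,
though for short strings the sorted() version is likely still faster in
practice because of its lower constant overhead.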
This is why anagrams are also a poor example of algorithmic performance: in
most contexts where we might practically want to check for anagrams, the
strings involved are likely to be very short, meaning that big-O type
analysis is useless.

> My questions are: ...
>
> 2) How could I go about adapting this algorithm for multiple strings (say
> I had 20 strings and wanted to check if they are anagrams of one another)?
>
> def are_anagrams(*args):
>     """ Accepts a tuple of strings and checks if
>     they are anagrams of each other """
>
>     # Check that neither of strings are null
>     for i in args:
>         if not i:
>             raise TypeError, "Invalid input"
>             return None
>
>     # Initialize a list of counters for each string
>     c = ( [] for i in range(len(args) ) ???

The complexity problem here is that you want to avoid comparing all N
strings against each other. If you do, then you're comparing N*(N-1)/2
(quadratic) pairs of strings for anagram-ness. Peter suggested using tuples
of counts so that a dict can check them. I would use sorted string keys:

def anagrams(*words):
    bykey = {}
    for word in words:
        key = ''.join(sorted(word))
        if key not in bykey:
            bykey[key] = [word]
        else:
            bykey[key].append(word)
    return bykey.values()

--
Oscar
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor