Coming in late to the conversation:

On Sun, Nov 09, 2014 at 04:34:29PM -0800, Clayton Kirkwood wrote:
> I have the following code:
> 
> import urllib.request,re,string
> months = ['Jan.', 'Feb.', 'Mar.', 'Apr.', 'May.', 'Jun.', 'Jul.', 'Aug.',
> 'Sep.', 'Oct.', 'Nov.', 'Dec.']
> from urllib.request import urlopen
> for line in urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
>     line = line.decode('utf-8')  # Decoding the binary data to text.
>     if 'EST' in line or 'EDT' in line:  # look for Eastern Time
>          blah = re.search(
>          r'<\w\w>(\w{3}\.)\s+(\d{2}),\s+(\d{2}).+([AP]M)\s+(E[SD]T)', line)
>          (month, day, time, ap, offset) = blah.group(1,2,3,4,5)
>          print(blah,'\n',ap,month, offset,day, time )


In programming, just like real life, it usually helps to try to isolate 
the fault to the smallest possible component. When confused by some 
programming feature, eliminate everything you can and focus only on that 
feature.

In this case, all that business about downloading the output of a Perl 
script from the web, decoding it, and iterating over it line by line is 
completely irrelevant. You can recognise this by simplifying the code 
until either the problem goes away or you have isolated where the 
problem is.

Here, I would replace the regular expression with something much 
simpler, and apply it to a single known string:

import re

text = 'xxxx1234 5678xxx'
regex = r'(\d*) (\d*)'  # Match <digits><space><digits>
mo = re.search(regex, text)  # "mo" = Match Object
a, b = mo.group(1, 2)
print(a, b)


Now we can focus on the part that is confusing you, namely the need to 
manually write out the group numbers. In this case, writing 1,2 is no 
big deal, but what if you had twenty groups?

a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t = mo.group(1, 2, 3, 4, 5, ...

When you find yourself doing something ridiculously error-prone like 
that, chances are there is a better way. And in this case, we have this:

a, b = mo.groups()

mo.groups() returns a tuple of all the groups. You can treat it like any 
other tuple:

mo.groups() + (1, 2, 3)
=> returns a new tuple with five items ('1234', '5678', 1, 2, 3)
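
To tie this back to the original code, here is a minimal sketch of the 
same idea applied to the regex from the question. The sample line is 
made up for illustration (I am assuming the page returns something of 
roughly this shape), so treat it as a sketch rather than a drop-in fix:

```python
import re

# Hypothetical sample line; the real output of the page may differ.
line = '<BR>Nov. 09, 07:15:46 PM EST'
pattern = r'<\w\w>(\w{3}\.)\s+(\d{2}),\s+(\d{2}).+([AP]M)\s+(E[SD]T)'

mo = re.search(pattern, line)
if mo is not None:  # re.search returns None when nothing matches
    # groups() hands back every group at once, so no 1,2,3,4,5 is needed.
    month, day, time, ap, offset = mo.groups()
    print(ap, month, offset, day, time)
```

Note that the third group only captures two digits, so "time" here ends 
up holding just the hour.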

mo.groups() gives you *all* the groups. What if you only wanted some of 
them? Well, it's a tuple, so once you have it, you can slice it the same 
as any other tuple:

mo.groups()[1:]  # skip the zeroth item, keep all the rest
mo.groups()[:-1]  # skip the last item, keep all the rest
mo.groups()[3::2]  # every second item, starting at index 3 (the fourth item)
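
Here is a small runnable sketch of those slices, using a throwaway 
six-group regex of my own invention. The thing to watch is that the 
tuple is indexed from 0 while the groups are numbered from 1, so 
mo.group(1) is the same as mo.groups()[0]:

```python
import re

# Six single-digit groups, purely for illustration.
mo = re.search(r'(\d)(\d)(\d)(\d)(\d)(\d)', 'abc123456xyz')
groups = mo.groups()

print(groups)        # ('1', '2', '3', '4', '5', '6')
print(groups[1:])    # ('2', '3', '4', '5', '6')
print(groups[:-1])   # ('1', '2', '3', '4', '5')
print(groups[3::2])  # ('4', '6')

# Group numbers start at 1, tuple indexes start at 0.
print(mo.group(1) == groups[0])  # True
```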


Let's go back to mo.group(1, 2) again, and suppose that there are more 
than two groups. Let's pretend that there are 5 groups. How can you 
do it using range?

mo.group(*range(1, 5+1))
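
If you don't want to hard-code the 5 either, the compiled pattern knows 
how many groups it has. A minimal sketch, with a made-up three-group 
regex and string of my own:

```python
import re

mo = re.search(r'(\w+)@(\w+)\.(\w+)', 'mail steve@example.org today')

# mo.re is the compiled pattern; its .groups attribute is the number
# of capturing groups in the regex.
n = mo.re.groups

explicit = mo.group(1, 2, 3)
unpacked = mo.group(*range(1, n + 1))

print(unpacked)                 # ('steve', 'example', 'org')
print(unpacked == explicit)     # True
print(unpacked == mo.groups())  # True
```

One wrinkle: when called with a single group number, mo.group() returns 
a plain string rather than a one-item tuple, so for a regex with only 
one group the two spellings are not interchangeable. That is one more 
reason to prefer mo.groups().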

In this case, the asterisk * behaves like a unary operator. Binary 
operators take two arguments:

10-6

Unary operators take one:

-6

The single * isn't a "real" operator, because in this role it is only 
legal in a few places, such as the argument list of a function call. 
But it takes a single operand, which must be some sort of iterable, 
like range, lists, tuples, strings, even dicts (which yield their keys).

With a small helper function, we can experiment with this:

def test(a, b, *args):
    print("First argument:", a)
    print("Second argument:", b)
    print("All the rest:", args)

And in use:

py> test(*[1, 2, 3, 4])
First argument: 1
Second argument: 2
All the rest: (3, 4)


What works with our test() function will work with mo.group() as well, 
and what works with a hard-coded list will work with range:

py> test(*range(1, 10))
First argument: 1
Second argument: 2
All the rest: (3, 4, 5, 6, 7, 8, 9)


There is no need to turn the range() object into a list first.
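
The same goes for other iterables. Here is a quick sketch using a 
variant of the helper that returns its arguments instead of printing 
them (the name show() is mine, just for this example). The dict case 
assumes Python 3.7 or later, where dicts keep insertion order:

```python
def show(a, b, *args):
    # Same shape as test() above, but returns instead of printing.
    return a, b, args

print(show(*"abcd"))
# ('a', 'b', ('c', 'd'))

print(show(*{"x": 1, "y": 2, "z": 3}))  # unpacking a dict gives its keys
# ('x', 'y', ('z',))

print(show(*(n * n for n in range(4))))  # generator expressions work too
# (0, 1, (4, 9))
```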

Iterable unpacking does require an iterable object. You can't iterate 
over integers:

py> for x in 10:
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable

nor can you unpack them:

py> test(*10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: test() argument after * must be a sequence, not int

Note that the error message is a bit too conservative: in fact, any 
iterable is allowed, not just sequences. (Newer versions of Python 
word the message as "must be an iterable" instead.)


> This works fine, but in the (month... line, I have blah.group(1,2,3,4,5),
> but this is problematic for me. I shouldn't have to use that 1,2,3,4,5
> sequence. I tried to use many alternatives using:  range(5) which doesn't
> work, list(range(5)) which actually lists the numbers in a list, and several
> others. As I read it, the search puts out a tuple. I was hoping to just
> assign the re.search to month, day, time, ap, offset directly. Why wouldn't
> that work? Why won't a range(5) work? I couldn't find a way to get the len
> of blah.

The length of a Match Object is not well defined. What do you mean by 
the length of it?

- the total number of groups in the regular expression?

- the number of groups which actually matched something?

- the total number of characters matched?

- something else?

The idea of the Match Object itself having a length is problematic. 
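
Each of the first three readings can be computed explicitly, and they 
generally give different answers. A rough sketch, using an example regex 
of my own with one optional group that fails to match:

```python
import re

mo = re.search(r'(\d+)-(\d+)(x)?', 'order 12-3456 shipped')

# Total number of groups in the regular expression.
total_groups = mo.re.groups

# Number of groups which actually matched something; a group that
# did not take part in the match is reported as None.
matched_groups = sum(g is not None for g in mo.groups())

# Total number of characters matched (group 0 is the whole match).
matched_chars = len(mo.group(0))

print(total_groups, matched_groups, matched_chars)  # 3 2 7
```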


-- 
Steven