On 31 Jan 2005, [EMAIL PROTECTED] wrote:

> I've got an issue that's been driving me a bit nuts.  I'm sure it _can_ 
> be done with a regexp, although I'm missing a piece needed to tie it 
> together to work for all cases.
>
> I need to parse out a list of RPMs in this case, but it seems the RPM 
> naming convention has changed, as there are files I'll need to parse 
> that are NOT in the normal name-version-release.arch.rpm format.
>
> I need to be able to grab the 'basename' for each file, as well as the 
> version and arch, although these can be done separately.  The problem
> can be shown by the following list of filenames:
>
> XFree86-ISO8859-15-75dpi-fonts-4.3.0-78.EL.i386.rpm           (Note the EL embedded in name)
> xfig-3.2.3d-12.i386.rpm               (standard naming)
> rhel-ig-ppc-multi-zh_tw-3-4.noarch.rpm
> perl-DateManip-5.42a-0.rhel3.noarch.rpm
> openoffice.org-style-gnome-1.1.0-16.9.EL.i386.rpm

Perhaps try the following regexp.

import re
reg = re.compile(
    r'''
    (?P<name>
    ^[-\w]+     # the name consists of word characters and hyphens;
                # if it contains a point,
     (?:\.\D[-\w]+?)?) # the point is followed by a non-digit char;
                       # we search till we find
    -           # a hyphen
    (?P<version>
    \d+[-.]\d+  # version always starts with one or more digits, a hyphen or
                # a point, and one or more digits
    .*)         # we grab everything else up to
    \.          # the last point
    (?P<arch>
    .+?)        # arch is everything between that point and .rpm
    \.rpm$'''
    , re.X)

In an interactive session, with names being a list of the RPM filenames,
I get:

,----
| >>> for name in names:
| ...     m = reg.search(name)
| ...     print m.groupdict()
| ...
| {'version': '4.3.0-78.EL', 'arch': 'i386', 'name': 'XFree86-ISO8859-15-75dpi-fonts'}
| {'version': '3.2.3d-12', 'arch': 'i386', 'name': 'xfig'}
| {'version': '3-4', 'arch': 'noarch', 'name': 'rhel-ig-ppc-multi-zh_tw'}
| {'version': '5.42a-0.rhel3', 'arch': 'noarch', 'name': 'perl-DateManip'}
| {'version': '1.1.0-16.9.EL', 'arch': 'i386', 'name': 'openoffice.org-style-gnome'}
`----
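For comparison, the same fields can be obtained without a regexp by splitting from the right. This technique is swapped in here and is not part of the reply above. The minimal Python 3 sketch assumes every filename ends in .rpm and follows the name-version-release.arch.rpm convention, so that arch is the last dot-separated field and version and release are the last two hyphen-separated fields; split_rpm_filename is a hypothetical helper name.

```python
def split_rpm_filename(filename):
    """Split 'name-version-release.arch.rpm' into its four parts.

    Assumption: arch is the last dot-separated field before '.rpm', and
    version and release are the last two hyphen-separated fields, so the
    name itself may contain hyphens and dots.
    """
    if not filename.endswith(".rpm"):
        raise ValueError("not an .rpm filename: %r" % filename)
    stem = filename[:-len(".rpm")]
    # Split off the arch at the last point; raises ValueError if there is none.
    rest, arch = stem.rsplit(".", 1)
    # Split off version and release at the last two hyphens;
    # raises ValueError if there are fewer than two.
    name, version, release = rest.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


names = [
    "XFree86-ISO8859-15-75dpi-fonts-4.3.0-78.EL.i386.rpm",
    "xfig-3.2.3d-12.i386.rpm",
    "rhel-ig-ppc-multi-zh_tw-3-4.noarch.rpm",
    "perl-DateManip-5.42a-0.rhel3.noarch.rpm",
    "openoffice.org-style-gnome-1.1.0-16.9.EL.i386.rpm",
]

for name in names:
    print(split_rpm_filename(name))
```

Because it splits from the right, the hyphens and points inside names such as XFree86-ISO8859-15-75dpi-fonts and openoffice.org-style-gnome are left alone. For the five sample names, version and release joined by a hyphen give the same version field as the regexp.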

 I'm not sure about the version of perl-DateManip; you didn't include
 the trailing 'rhel3'.  Did you forget it or should that be trimmed?
 But it shouldn't be too complicated to augment the above regexp in that
 way if needed.
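One possible augmentation, sketched for Python 3 and assuming the goal is to separate the release (e.g. 0.rhel3) from the version rather than to discard it: keep the name part of the regexp unchanged, let the version be a digit-led field without hyphens, and treat the next hyphen-free field before the arch as the release. RPM_RE and parse_rpm are hypothetical names chosen for illustration.

```python
import re

# Hypothetical variant of the regexp above: the name part is unchanged,
# but the version field is split into 'version' and 'release'.
RPM_RE = re.compile(r'''
    ^(?P<name>[-\w]+(?:\.\D[-\w]+?)?)  # name: same as in the original regexp
    -
    (?P<version>\d[^-]*)               # version: starts with a digit, no hyphens
    -
    (?P<release>[^-]+)                 # release: no hyphens, runs up to the arch
    \.
    (?P<arch>[^.]+)                    # arch: the last field without points
    \.rpm$
    ''', re.X)


def parse_rpm(filename):
    """Return a dict with name, version, release and arch, or None if no match."""
    m = RPM_RE.match(filename)
    return m.groupdict() if m else None


names = [
    "XFree86-ISO8859-15-75dpi-fonts-4.3.0-78.EL.i386.rpm",
    "xfig-3.2.3d-12.i386.rpm",
    "rhel-ig-ppc-multi-zh_tw-3-4.noarch.rpm",
    "perl-DateManip-5.42a-0.rhel3.noarch.rpm",
    "openoffice.org-style-gnome-1.1.0-16.9.EL.i386.rpm",
]

for name in names:
    print(parse_rpm(name))
```

Returning None for non-matching filenames avoids the AttributeError that m.groupdict() would raise when search() finds nothing.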



   Karl
-- 
Please do *not* send copies of replies to me.
I read the list
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
