On 13Jul2016 22:03, Mike Wright <nob...@nospam.hostisimo.com> wrote:
> OK, thanks everybody.
>
> Had to use egrep. This works:
>
> PATTERN='https?://[^/]*\.in(/.*)*'
> egrep $PATTERN file.of.links > links.in

You need quotes around $PATTERN when you use it, thus:

 egrep "$PATTERN" file.of.links > links.in

You may be getting away with it here, but another pattern may well be broken up by the shell on whitespace, not to mention globbing (unquoted asterisks, question marks, etc.).
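
A quick way to see both effects for yourself. This is only a sketch: count_args is a made-up helper, and the pattern values and file names are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments it was handed.
count_args() { echo "$#"; }

# Whitespace: the unquoted expansion is split into two words.
pattern='foo bar'
unquoted=$(count_args $pattern)
quoted=$(count_args "$pattern")
echo "whitespace: unquoted=$unquoted quoted=$quoted"

# Globbing: in a directory holding matching files, the unquoted
# expansion is replaced by the file names.
workdir=$(mktemp -d)
touch "$workdir/abc" "$workdir/abd"
glob='a*'
globbed=$(cd "$workdir" && count_args $glob)
literal=$(cd "$workdir" && count_args "$glob")
echo "globbing: unquoted=$globbed quoted=$literal"
rm -rf "$workdir"
```

In both cases the quoted form passes exactly one argument, which is what egrep needs to receive.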

> Covers cases with https and where nothing follows the .in

Your:

 (/.*)*

is better written:

 (/.*)?

i.e. it is either there or it is not. As it happens, the "*" form you used will be matched just as efficiently in this case, but there are plenty of patterns where using "*" instead of something more constrained can lead to exponential cost, as the regexp engine tries many more combinations while attempting to match. Always write these things as pickily/conservatively as possible.
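
If you want to check that the "?" form selects the same lines as the "*" form, something like this should do. The sample URLs are invented for the test and the variable names are mine; it assumes a grep that supports -E.

```shell
#!/bin/sh
pattern_star='https?://[^/]*\.in(/.*)*'
pattern_opt='https?://[^/]*\.in(/.*)?'

# Invented sample data: two .in links and one that should not match.
sample='http://example.in
https://example.in/path/page.html
http://example.com/index.html'

star_hits=$(printf '%s\n' "$sample" | grep -E "$pattern_star")
opt_hits=$(printf '%s\n' "$sample" | grep -E "$pattern_opt")
opt_count=$(printf '%s\n' "$sample" | grep -c -E "$pattern_opt")

if [ "$star_hits" = "$opt_hits" ]; then
    echo "both forms select the same $opt_count lines"
else
    echo "the two forms differ"
fi
```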

The other nit is that you should use $lowercase variable names in the shell instead of $UPPERCASE names for script-local variables which you do not intend to export. This is a good-practice thing, but quite important, for reasons I can explain at length if requested.
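
One commonly cited reason, sketched briefly: uppercase names are by convention the ones that live in the environment, and a variable a script inherits from the environment stays exported when the script assigns to it, so the new value leaks into every command the script runs. The names and values below are made up for the demonstration, and it assumes link_pattern is not already set in your environment.

```shell
#!/bin/sh
# PATTERN arrives via the environment, so the script's assignment
# is passed on to its child process.
inherited=$(env PATTERN=from-parent sh -c 'PATTERN=changed; sh -c "echo \$PATTERN"')
echo "child sees PATTERN=$inherited"

# link_pattern is a plain shell variable, so the child sees nothing.
local_only=$(sh -c 'link_pattern=private; sh -c "echo \"[\$link_pattern]\""')
echo "child sees link_pattern=$local_only"
```

The same thing happens silently with names such as PATH or HOME, where overwriting the value can break the commands that follow.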

Cheers,
Cameron Simpson <c...@zip.com.au>