Re: Curious to see alternate approach on a search/replace via regex

2013-02-15 Thread Serhiy Storchaka
On 08.02.13 03:08, Ian Kelly wrote: I think what we're seeing here is that the time needed to look up the compiled regular expression in the cache is a significant fraction of the time needed to actually execute it. There is a bug issue for this. See http://bugs.python.org/issue16389 . -- http
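A minimal sketch for anyone who wants to reproduce the cache-lookup overhead described above. It assumes Python 3; the sample text and the helper names (`via_module`, `via_compiled`, `compare`) are made up for illustration and do not come from the thread.

```python
import re
import timeit

PATTERN = r"s[ai]t"                          # the small pattern discussed in the thread
TEXT = "the cat sat on the mat, sit still"   # hypothetical sample data

COMPILED = re.compile(PATTERN)


def via_module():
    # re.sub() has to find the compiled pattern in the re module's cache on every call
    return re.sub(PATTERN, "x", TEXT)


def via_compiled():
    # the compiled pattern object is already in hand, so no cache lookup happens
    return COMPILED.sub("x", TEXT)


def compare(number=100000):
    """Return total seconds for `number` calls of each variant."""
    return {
        "module": timeit.timeit(via_module, number=number),
        "compiled": timeit.timeit(via_compiled, number=number),
    }
```

Calling `compare()` returns the two totals side by side. The gap between them is roughly the per-call cost of the cache lookup, and it will vary with the Python version and the machine.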

Re: Curious to see alternate approach on a search/replace via regex

2013-02-08 Thread Ian Kelly
On Fri, Feb 8, 2013 at 4:43 AM, Steven D'Aprano wrote: > Ian Kelly wrote: > Surely that depends on the size of the pattern, and the size of the data > being worked on. Naturally. > Compiling the pattern "s[ai]t" doesn't take that much work, it's only six > characters and very simple. Applying it

Re: Curious to see alternate approach on a search/replace via regex

2013-02-08 Thread Steven D'Aprano
Ian Kelly wrote: > On Thu, Feb 7, 2013 at 10:57 PM, rh wrote: >> On Thu, 7 Feb 2013 18:08:00 -0700 >> Ian Kelly wrote: >> >>> Which is approximately 30 times slower, so clearly the regular >>> expression *is* being cached. I think what we're seeing here is that >>> the time needed to look up th

Re: Curious to see alternate approach on a search/replace via regex

2013-02-08 Thread Peter Otten
Serhiy Storchaka wrote:
> On 07.02.13 11:49, Peter Otten wrote:
>> ILLEGAL = "-:./?&="
>> try:
>>     TRANS = string.maketrans(ILLEGAL, "_" * len(ILLEGAL))
>> except AttributeError:
>>     # python 3
>>     TRANS = dict.fromkeys(map(ord, ILLEGAL), "_")
>
> str.maketrans()

D'oh.

ILLEGAL = "-:

Re: Curious to see alternate approach on a search/replace via regex

2013-02-08 Thread Nick Mellor
Hi RH, It's essential to know about regex, of course, but often there's a better, easier-to-read way to do things in Python. One of Python's aims is clarity and ease of reading. Regex is complex, potentially inefficient and hard to read (as well as being the only reasonable way to do things so

Re: Curious to see alternate approach on a search/replace via regex

2013-02-08 Thread Ian Kelly
On Thu, Feb 7, 2013 at 10:57 PM, rh wrote: > On Thu, 7 Feb 2013 18:08:00 -0700 > Ian Kelly wrote: > >> Which is approximately 30 times slower, so clearly the regular >> expression *is* being cached. I think what we're seeing here is that >> the time needed to look up the compiled regular express

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Dave Angel
On 02/07/2013 06:13 PM, rh wrote: On Fri, 08 Feb 2013 09:45:41 +1100 Steven D'Aprano wrote: But since you don't demonstrate any actual working code, you could be correct, or you could be timing it wrong. Without seeing your timing code, my guess is that you are doing it wrong. Timing code is

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Steven D'Aprano
Ian Kelly wrote: > On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano > wrote: >> Oh, one last thing... pulling out "re.compile" outside of the function >> does absolutely nothing. You don't even compile anything. It basically >> looks up that a compile function exists in the re module, and that's a

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Ian Kelly
On Thu, Feb 7, 2013 at 5:55 PM, Ian Kelly wrote: > Whatever caching is being done by re.compile, that's still a 24% > savings by moving the compile calls into the setup. On the other hand, if you add an re.purge() call to the start of t1 to clear the cache: >>> t3 = Timer(""" ... re.purge() ...
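A rough sketch of the same experiment, assuming Python 3. `re.purge()` is the real standard-library call that clears the regular expression cache; the function names and the sample string are hypothetical.

```python
import re
import timeit

SAMPLE = "sat sit set"  # hypothetical sample data


def cached():
    # after the first call the compiled pattern is served from the re cache
    return re.sub(r"s[ai]t", "x", SAMPLE)


def purged():
    # clearing the cache first forces the pattern to be recompiled on every call
    re.purge()
    return re.sub(r"s[ai]t", "x", SAMPLE)


def compare(number=20000):
    """Return total seconds for `number` calls of each variant."""
    return {
        "cached": timeit.timeit(cached, number=number),
        "purged": timeit.timeit(purged, number=number),
    }
```

If the purged variant comes out many times slower, that supports the claim that the pattern is normally being cached. The exact ratio depends on the interpreter and the pattern.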

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Steven D'Aprano
rh wrote: > On Fri, 08 Feb 2013 09:45:41 +1100 > Steven D'Aprano wrote: > >> rh wrote: >> >> > I am using 2.7.3 and I put the re.compile outside the function and >> > it performed faster than urlparse. I don't print out the data. >> >> I find that hard to believe. re.compile caches its results

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Steven D'Aprano
rh wrote: > I am using 2.7.3 and I put the re.compile outside the function and it > performed faster than urlparse. I don't print out the data. I find that hard to believe. re.compile caches its results, so except for the very first time it is called, it is very fast -- basically a function call

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Serhiy Storchaka
On 07.02.13 11:49, Peter Otten wrote:
> ILLEGAL = "-:./?&="
> try:
>     TRANS = string.maketrans(ILLEGAL, "_" * len(ILLEGAL))
> except AttributeError:
>     # python 3
>     TRANS = dict.fromkeys(map(ord, ILLEGAL), "_")

str.maketrans()
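Spelled out, the `str.maketrans()` suggestion might look like the sketch below on Python 3. The `ILLEGAL` string is taken from the quoted code; the `flatten` helper is a hypothetical name added for illustration.

```python
ILLEGAL = "-:./?&="

# str.maketrans() builds the code point mapping that str.translate() expects
TRANS = str.maketrans(ILLEGAL, "_" * len(ILLEGAL))


def flatten(text):
    # replace every character listed in ILLEGAL with an underscore
    return text.translate(TRANS)
```

For example, `flatten("q?sports=run&a=1&b=1")` gives `"q_sports_run_a_1_b_1"`. The table is built once at import time and reused on every call.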

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Demian Brecht
On 2013-02-06 7:04 PM, "Steven D'Aprano" wrote: >I dispute those results. I think you are mostly measuring the time to >print the result, and I/O is quite slow. Good call, hadn't even considered that. >My tests show that using urlparse >is 33% faster than using regexes, and far more understanda

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Nick Mellor
Hi RH, translate methods might be faster (and a little easier to read) for your use case. Just precompute and re-use the translation table punct_flatten. Note that the translate method has changed somewhat for Python 3 due to the separation of text from bytes. This is a Python 3 version. from u

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Chris Angelico
On Thu, Feb 7, 2013 at 10:08 PM, jmfauth wrote: > The future is bright for ... ascii users. > > jmf So you're admitting to being not very bright? *ducks* Seriously jmf, please don't hijack threads just to whine about contrived issues of Unicode performance yet again. That horse is dead. Go fork

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread jmfauth
On 7 fév, 04:04, Steven D'Aprano wrote: > On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote: > > Well, an alternative /could/ be: > > ... > py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1' > py> assert u2f(s) == mangle(s) > py> > py> from timeit import Timer > py> setup = 'f

Re: Curious to see alternate approach on a search/replace via regex

2013-02-07 Thread Peter Otten
rh wrote: > I am curious to know if others would have done this differently. And if so > how so? > > This converts a url to a more easily managed filename, stripping the > http protocol off. > > This: > > http://alongnameofasite1234567.com/q?sports=run&a=1&b=1 > > becomes this: > > alongname

Re: Curious to see alternate approach on a search/replace via regex

2013-02-06 Thread MRAB
On 2013-02-06 21:41, rh wrote: I am curious to know if others would have done this differently. And if so how so? This converts a url to a more easily managed filename, stripping the http protocol off. This: http://alongnameofasite1234567.com/q?sports=run&a=1&b=1 becomes this: alongnameofasi

Re: Curious to see alternate approach on a search/replace via regex

2013-02-06 Thread Demian Brecht
python -m cProfile [script_name].py http://docs.python.org/2/library/profile.html#module-cProfile Demian Brecht http://demianbrecht.github.com On 2013-02-06 2:30 PM, "richard_hubbe11" wrote: >I see that urlparse uses split and not re at all and, in my tests, >urlparse >completes in less ti

Re: Curious to see alternate approach on a search/replace via regex

2013-02-06 Thread Demian Brecht
Well, an alternative /could/ be:

from urlparse import urlparse

parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
print '%s%s_%s' % (parts.netloc.replace('.', '_'),
                   parts.path.replace('/', '_'),
                   parts.query.replace('&', '_').replace('=', '_'))

Although wit

Re: Curious to see alternate approach on a search/replace via regex

2013-02-06 Thread Roy Smith
In article , rh wrote: > I am curious to know if others would have done this differently. And if so > how so? > > This converts a url to a more easily managed filename, stripping the > http protocol off. I would have used the urlparse module. http://docs.python.org/2/library/urlparse.html --
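A sketch of the urlparse approach for Python 3, where the module lives at `urllib.parse`. It follows the netloc/path/query layout shown elsewhere in the thread; the function name `url_to_filename` is hypothetical, and whether the result matches the original poster's intended filename exactly is an assumption.

```python
from urllib.parse import urlparse


def url_to_filename(url):
    # split the URL into components instead of pattern matching on the whole string
    parts = urlparse(url)
    netloc = parts.netloc.replace(".", "_")
    path = parts.path.replace("/", "_")
    query = parts.query.replace("&", "_").replace("=", "_")
    # the scheme (http) is dropped simply by not using parts.scheme
    return "%s%s_%s" % (netloc, path, query)
```

With the URL from the thread, `url_to_filename("http://alongnameofasite1234567.com/q?sports=run&a=1&b=1")` returns `"alongnameofasite1234567_com_q_sports_run_a_1_b_1"`. A URL with no query string would end in a trailing underscore, which may or may not be wanted.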