Re: [Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

2011-01-26 Thread Karl Ove Hufthammer
Karl Ove Hufthammer wrote:
> Anyway, do you think it’s worth trying to change the ‘table’ function the
> way I outlined in my first post¹? This should eliminate the performance
> hit on all platforms.

Some additional notes: ‘table’ uses ‘factor’ directly, but also indirectly, in ‘addNA’. The def

Re: [Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

2011-01-26 Thread Karl Ove Hufthammer
Simon Urbanek wrote:
>> I could *not* reproduce it; that is, ‘table’ is as fast on the non-ASCII
>> factor as it is on the ASCII factor.
>
> Strange - are you sure you get the right locale names? Make sure it's
> listed in locale -a.

Yes, I managed to reproduce it now, using a locale listed in ‘

Re: [Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

2011-01-25 Thread Matthew Dowle
Thanks Simon! I can reproduce this on Linux now, too. locale -a didn't show en_US.iso88591 for me so I needed 'sudo locale-gen en_US' first. Then running R with
$ LANG="en_US.ISO-8859-1" R
is enough to reproduce the problem.

Karl - can you use tabulate instead as Simon suggests?

Matthew
-- V

Re: [Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

2011-01-25 Thread Simon Urbanek
On Jan 25, 2011, at 5:49 AM, Karl Ove Hufthammer wrote:
> Matthew Dowle wrote:
>
>> I'm not sure, but note the difference in locale between
>> Linux (UTF-8) and Windows (non UTF-8). As far as I
>> understand it R much prefers UTF-8, which Windows doesn't
>> natively support. Otherwise you could

Re: [Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

2011-01-25 Thread Karl Ove Hufthammer
Matthew Dowle wrote:
> I'm not sure, but note the difference in locale between
> Linux (UTF-8) and Windows (non UTF-8). As far as I
> understand it R much prefers UTF-8, which Windows doesn't
> natively support. Otherwise you could just change your
> Windows locale to a UTF-8 locale to make R happ

Re: [Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows

2011-01-24 Thread Matthew Dowle
I'm not sure, but note the difference in locale between Linux (UTF-8) and Windows (non UTF-8). As far as I understand it R much prefers UTF-8, which Windows doesn't natively support. Otherwise you could just change your Windows locale to a UTF-8 locale to make R happier. My stab in the dark would