Re: Analyze table compute statistics on wide table taking too long

2015-04-08 Thread Gopal Vijayaraghavan
I'm happy to look into improving the Regex serde performance, any tips on where I should start looking? There are three things off the top of my head. First up, the matcher needs to be reused within a single scan. You can also check the groupCount exactly once for a given pattern.
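A minimal Java sketch of the first two suggestions, assuming a RegexSerDe-style parser where each capture group maps to one column. The class name `ReusedMatcherParser` and its `parse` method are hypothetical; this is not Hive's RegexSerDe source. It compiles the pattern once, reads `groupCount()` once in the constructor, and rebinds a single `Matcher` to each row with `reset(CharSequence)` instead of allocating a new `Matcher` per row.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical row parser illustrating the two tips:
 * count the pattern's groups once, and reuse one Matcher for every row of a scan.
 */
public class ReusedMatcherParser {
    private final Matcher matcher;   // one Matcher reused for every row of the scan
    private final int groupCount;    // fixed for a given pattern, so computed once

    public ReusedMatcherParser(String regex) {
        Pattern pattern = Pattern.compile(regex);   // compile once per scan, not per row
        this.matcher = pattern.matcher("");         // bound to empty input until the first reset
        this.groupCount = matcher.groupCount();     // number of capture groups in the pattern
    }

    /** Returns the captured columns for one row, or null if the row does not match. */
    public String[] parse(String row) {
        matcher.reset(row);                         // rebinds the existing Matcher; no new Matcher allocated
        if (!matcher.matches()) {
            return null;
        }
        String[] columns = new String[groupCount];
        for (int i = 0; i < groupCount; i++) {
            columns[i] = matcher.group(i + 1);      // group 0 is the whole match, columns start at 1
        }
        return columns;
    }

    public static void main(String[] args) {
        ReusedMatcherParser parser = new ReusedMatcherParser("(\\w+)\\s+(\\d+)");
        String[] rows = {"alpha 1", "beta 22", "not-a-match"};
        for (String row : rows) {
            String[] cols = parser.parse(row);
            System.out.println(cols == null ? "no match" : String.join(",", cols));
        }
    }
}
```

A `Matcher` is not thread-safe, so this assumes one parser instance per scan (per task thread). The per-row `String[]` is still allocated; the sketch only removes the per-row `Matcher` allocation and the repeated group count.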

Re: Analyze table compute statistics on wide table taking too long

2015-04-07 Thread Gopal Vijayaraghavan
The table also has a large Regex serde. There are no stats fast paths for Regex SerDe. The statistics computation is lifting each row into memory, parsing it and throwing it away. Most of your time would be spent in GC (check the GC time millis), due to the huge expense of the Regex Serde.

Re: Analyze table compute statistics on wide table taking too long

2015-04-07 Thread Roger Marin
Hi Gopal, Thanks for that. I'm happy to look into improving the Regex serde performance, any tips on where I should start looking? Regards, Roger On 08/04/2015 11:44 AM, Gopal Vijayaraghavan gop...@apache.org wrote: The table also has a large Regex serde. There are no stats fast paths

Analyze table compute statistics on wide table taking too long

2015-04-07 Thread Roger Marin
Hi, I have a Hive table with 300 columns, all strings, and around 180k rows. When I run analyze table compute statistics, it takes about 40 minutes to complete regardless of the execution engine. The table also has a large Regex serde. I am running Hive 0.13.0. Any help