I'm happy to look into improving the Regex serde performance, any tips
on where I should start looking?
There are three things off the top of my head.
First up, the matcher needs to be reused within a single scan. You can
also check the groupCount exactly once for a given pattern.
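As a rough illustration of those two points, here is a minimal sketch in plain Java. It is not the actual RegexSerDe code: the class name ReusedMatcherSketch and its parse method are hypothetical, and it assumes one instance is used by a single thread for a single scan.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: shows one Matcher reused across rows.
public class ReusedMatcherSketch {
    private final Matcher matcher;  // created once, then reset for every row
    private final int groupCount;   // fixed for a given pattern, so read it once

    public ReusedMatcherSketch(String regex) {
        // Compile once and bind the matcher to an empty input for now.
        this.matcher = Pattern.compile(regex).matcher("");
        this.groupCount = matcher.groupCount();
    }

    // Returns the captured groups for one row, or null if the row does not match.
    public String[] parse(CharSequence row) {
        matcher.reset(row);  // reuse the existing Matcher instead of allocating a new one
        if (!matcher.matches()) {
            return null;
        }
        String[] fields = new String[groupCount];
        for (int i = 0; i < groupCount; i++) {
            fields[i] = matcher.group(i + 1);  // group 0 is the whole match
        }
        return fields;
    }
}
```

A Matcher is not thread-safe, so this only works if each scan owns its own instance.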
The table also has a large Regex serde.
There are no stats fast paths for Regex SerDe.
The statistics computation is lifting each row into memory, parsing it and
throwing it away.
Most of your time would be spent in GC (check the GC time millis), due to
the huge expense of the Regex Serde.
Hi Gopal,
Thanks for that.
I'm happy to look into improving the Regex serde performance, any tips on
where I should start looking?
Regards,
Roger
On 08/04/2015 11:44 AM, Gopal Vijayaraghavan gop...@apache.org wrote:
The table also has a large Regex serde.
There are no stats fast paths
Hi,
I have a Hive table with 300 columns, all strings, and around 180k rows.
When I run analyze table compute statistics, it takes about 40 minutes to
complete, regardless of the execution engine.
The table also has a large Regex serde.
I am running Hive 0.13.0.
Any help