Hi,

I am trying to figure out why Pig uses so many ZooKeeper connections (from the frontend machine) when using HBaseStorage().

I added a trace in the constructor of HBaseStorage().
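
For anyone who wants to reproduce this kind of trace, here is a minimal sketch of the idea. It assumes a hypothetical stand-in class named TracedLoader (it is not the real HBaseStorage, and the names below are illustrative only). Besides logging each construction, it prints the immediate caller, which may help show which part of the frontend creates each instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a load function; NOT the real HBaseStorage.
class TracedLoader {
    // Counts how many instances have been constructed in this JVM.
    private static final AtomicInteger INSTANCES = new AtomicInteger();

    private final String columns;

    TracedLoader(String columns) {
        this.columns = columns;
        int n = INSTANCES.incrementAndGet();
        // The Throwable is created inside this constructor, so frame 0 is
        // the constructor itself and frame 1 is whoever called it.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        String caller = stack.length > 1 ? stack[1].toString() : "unknown";
        System.err.println("TracedLoader instance #" + n + " created from " + caller);
    }

    static int instanceCount() {
        return INSTANCES.get();
    }

    String columns() {
        return columns;
    }
}
```

Printing the full stack trace instead of a single frame would give more context, at the cost of a noisier log.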

I wrote a simple script loading an HBase table:

sessions = LOAD 'hbase://mytable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp') AS (sid:chararray, start:long);
dump sessions;

When I run the script:

vbarat@lancelot:~$ pig -x local -f /Users/vbarat/ermin/pig/script/test.pig
2011-10-25 11:32:41,482 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/vbarat/pig_1319535161481.log
2011-10-25 11:32:41,563 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2011-10-25 11:32:41,884 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:41,970 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,035 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,073 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,184 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,207 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,233 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,256 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-10-25 11:32:42,317 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-10-25 11:32:42,374 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,391 [main] INFO org.apache.pig.backend.hadoop.hbase.HBaseStorage - *********** HBASESTORAGE *********************
2011-10-25 11:32:42,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-10-25 11:32:42,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1

So HBaseStorage is created 10 times (and so the table is opened 9 times).

I'd like to know why it is created so many times.

Also, when I change my script to load 2 tables and join them, the HBaseStorage object is created 40 times!

Can someone give me some insight to help me investigate the issue?

Thanks a lot
