Hi,
I am trying to figure out why Pig is using so many ZooKeeper connections
(from the frontend machine) when using HBaseStorage().
I added a trace in the constructor of HBaseStorage().
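A trace like this can also record who triggered each construction, which helps when the same class is instantiated many times. The following is only a minimal sketch under stated assumptions: TracedStorage is a hypothetical stand-in class, not the real HBaseStorage source, and the counter and caller field are names invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a load function; NOT the real HBaseStorage class.
public class TracedStorage {
    // Counts how many instances have been constructed in this JVM.
    public static final AtomicInteger INSTANCES = new AtomicInteger();

    // "Class.method" of the frame that invoked this constructor.
    public final String caller;

    public TracedStorage() {
        int n = INSTANCES.incrementAndGet();
        // Capture the current call stack without throwing anything.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is this constructor; stack[1] is whoever invoked it.
        StackTraceElement from = stack.length > 1 ? stack[1] : stack[0];
        caller = from.getClassName() + "." + from.getMethodName();
        System.err.println("TracedStorage instance #" + n
                + " created from " + caller);
    }

    public static void main(String[] args) {
        // Two constructions, each reported with its caller.
        new TracedStorage();
        new TracedStorage();
    }
}
```

If the class is instantiated reflectively, the immediate caller will be a reflection frame, so printing several frames (or the whole stack) is likely to be more informative than printing only the first one.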
I wrote a simple script loading an HBase table:
sessions = LOAD 'hbase://mytable'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp')
    AS (sid:chararray, start:long);
dump sessions;
When I run the script:
vbarat@lancelot:~$ pig -x local -f /Users/vbarat/ermin/pig/script/test.pig
2011-10-25 11:32:41,482 [main] INFO org.apache.pig.Main - Logging
error messages to: /Users/vbarat/pig_1319535161481.log
2011-10-25 11:32:41,563 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///
2011-10-25 11:32:41,884 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:41,970 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:42,035 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:42,073 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:42,184 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:42,207 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:42,233 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:42,256 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2011-10-25 11:32:42,317 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics
with processName=JobTracker, sessionId=
2011-10-25 11:32:42,374 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:42,391 [main] INFO
org.apache.pig.backend.hadoop.hbase.HBaseStorage - ***********
HBASESTORAGE *********************
2011-10-25 11:32:42,425 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
2011-10-25 11:32:42,449 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
So HBaseStorage is created 10 times (and so the table is opened 9 times).
I'd like to know why it is instantiated so many times.
Also, when I change my script to load 2 tables and join them, the
HBaseStorage object is created 40 times!
Can someone give me some insight to help me investigate the issue?
Thanks a lot