Re: Question about pig & HDFS

2011-08-26 Daniel Dai
Pig by default uses plain text files as input/output, unless you write a custom LoadFunc/StoreFunc. There is no special Pig storage format. You can copy the file to the local file system using copyToLocal. If you want to export directly to an SQL table, you need to write a StoreFunc. Pig works on tuples rather than K,V pairs
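Pig's default load/store function, PigStorage, reads and writes delimited plain text (tab-separated unless another delimiter is given). A file copied out with copyToLocal can therefore be consumed with ordinary line parsing. This is a minimal sketch, assuming flat fields with no embedded delimiters and no nested bags or tuples. `PigTextReader` and its methods are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PigTextReader {
    // Splits one line of PigStorage-style text output into its fields.
    // The -1 limit keeps trailing empty fields instead of silently dropping them.
    public static List<String> parseLine(String line, char delimiter) {
        return Arrays.asList(line.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }

    // Reads every line of a local copy of the output (e.g. after copyToLocal) into rows.
    public static List<List<String>> readAll(BufferedReader in, char delimiter) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            rows.add(parseLine(line, delimiter));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample standing in for a part file; the second row has an empty last field.
        String sample = "alice\t42\nbob\t\n";
        try (BufferedReader in = new BufferedReader(new StringReader(sample))) {
            System.out.println(readAll(in, '\t'));
        }
    }
}
```

The -1 limit matters here. Without it, Java's `String.split` drops trailing empty strings, so a row whose last field is empty would come back one column short.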

Re: PIG 0.8.1 leaks Zookeeper connections when using HBaseStorage

2011-08-26 Ashutosh Chauhan
Hey Vincent, would it be easy for you to isolate this in test code? That will help in debugging the issue and also in fixing it.
Ashutosh
On Fri, Aug 26, 2011 at 05:30, Vincent Barat wrote:
> Hi,
>
> I run PIG jobs from a Java process (using PigServer). Most of them use
> HBaseStorage to load data fr

Re: PIG 0.8.1 leaks Zookeeper connections when using HBaseStorage

2011-08-26 Bill Graham
> Should I report this as an issue?
Yes, please. I've found other resource leaks when using PigServer this way, so this seems like a likely bug. Also, the fact that HTables are never closed by HBaseStorage is not a good sign.
On Fri, Aug 26, 2011 at 5:30 AM, Vincent Barat wrote:
> Hi,
>
> I r
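One way to isolate a leak like this in test code is to run the same job many times and check whether the count of open connections returns to zero. The sketch below is purely illustrative and uses no HBase, ZooKeeper, or Pig APIs. The hypothetical `TrackedConnection` stands in for an HTable or ZooKeeper connection. `leakyJob` models a loader that never closes its table, and `fixedJob` closes it with try-with-resources:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LeakCheck {
    // Number of hypothetical connections currently open.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for an HTable / ZooKeeper connection.
    static final class TrackedConnection implements AutoCloseable {
        private boolean closed;

        TrackedConnection() {
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            // Idempotent close, so a double close does not corrupt the count.
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    // Models a job whose loader opens a connection and never closes it.
    public static void leakyJob() {
        TrackedConnection conn = new TrackedConnection();
        // ... read data using conn, then return without closing it ...
    }

    // Models the same job with the connection closed even if the work throws.
    public static void fixedJob() {
        try (TrackedConnection conn = new TrackedConnection()) {
            // ... read data using conn ...
        }
    }

    // Runs a job n times and reports how many connections are still open afterwards.
    public static int openAfter(Runnable job, int n) {
        OPEN.set(0);
        for (int i = 0; i < n; i++) {
            job.run();
        }
        return OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + openAfter(LeakCheck::leakyJob, 100));
        System.out.println("fixed: " + openAfter(LeakCheck::fixedJob, 100));
    }
}
```

Running `main` reports 100 connections still open for the leaky job and none for the fixed one. A real test would need an actual measure of open ZooKeeper connections in place of the counter, which this sketch does not attempt.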

Re: Random behavior of PIG ???

2011-08-26 Ashutosh Chauhan
Thanks, Vincent, for confirming that the issue is resolved.
Ashutosh
On Fri, Aug 26, 2011 at 07:54, Vincent Barat wrote:
> FYI, this was fixed by PIG-2193.
>
> On 26/07/11 19:40, Vincent Barat wrote:
>> Hi,
>>
>> I'm using PIG 0.8.1 with HBase 0.90 and the following script sometimes
>> returns an e

Re: Removing local job traces in local mode

2011-08-26 Ashutosh Chauhan
Vincent, glad that you were able to solve the issue. Ideally, one should be able to configure log4j externally through a log4j.properties config file rather than by setting the properties explicitly in code. Did you try that?
Ashutosh
On Fri, Aug 26, 2011 at 07:56, Vincent Barat wrote:
> Here is how I solved th

Re: Removing local job traces in local mode

2011-08-26 Vincent Barat
Here is how I solved this issue (it was only related to log4j configuration):
/* Deactivate most traces from Hadoop and PIG (keep ERROR) */
props.setProperty("log4j.logger.org.apache.hadoop", "ERROR");
props.setProperty("log4j.logger.org.apache.zookeeper", "ERROR");
props
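Vincent's snippet sets log4j 1.x logger levels on a `java.util.Properties` object. Ashutosh suggests the same `log4j.logger.<name>=<LEVEL>` keys could instead live in an external log4j.properties file. The sketch below builds the programmatic version from the two keys visible in the snippet (the rest of the list is cut off in the archive). It then checks that parsing equivalent file text yields the same properties. How the snippet applies `props` is not shown. One way to hand such properties to log4j 1.x is `PropertyConfigurator.configure(Properties)`, omitted here so the example runs without log4j on the classpath. `QuietLogging` and its methods are hypothetical names:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class QuietLogging {
    // Logger names taken from the visible part of the snippet; the original list continues.
    static final List<String> NOISY_LOGGERS = Arrays.asList("org.apache.hadoop", "org.apache.zookeeper");

    // Programmatic form: one "log4j.logger.<name>" entry per logger, set to the given level.
    public static Properties programmatic(String level) {
        Properties props = new Properties();
        for (String logger : NOISY_LOGGERS) {
            props.setProperty("log4j.logger." + logger, level);
        }
        return props;
    }

    // External form: the same entries parsed from log4j.properties-style text.
    public static Properties fromFileText(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String fileText = String.join("\n",
                "# Keep only ERROR traces from Hadoop and ZooKeeper",
                "log4j.logger.org.apache.hadoop=ERROR",
                "log4j.logger.org.apache.zookeeper=ERROR");
        // Prints whether both forms produce identical key/value pairs.
        System.out.println(programmatic("ERROR").equals(fromFileText(fileText)));
    }
}
```

Keeping the levels in a file means they can be changed without recompiling. The programmatic form keeps everything inside the Java process that embeds PigServer.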

Re: Random behavior of PIG ???

2011-08-26 Vincent Barat
FYI, this was fixed by PIG-2193.
On 26/07/11 19:40, Vincent Barat wrote:
> Hi,
>
> I'm using PIG 0.8.1 with HBase 0.90 and the following script sometimes returns an empty set, and sometimes works!
> start_sessions = LOAD 'startSession' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:

PIG 0.8.1 leaks Zookeeper connections when using HBaseStorage

2011-08-26 Vincent Barat
Hi,
I run PIG jobs from a Java process (using PigServer). Most of them use HBaseStorage to load data from HBase. Each job is run using a new PigServer object, and I correctly call pigServer.shutdown() when my pig server is no longer used. Nevertheless, after a few hours of running, I notice that

Re: Question about request optimization

2011-08-26 Vincent Barat
On 23/08/11 20:28, Dmitriy Ryaboy wrote:
> We should add merge join support to HBaseStorage; it should be able to do that for joins on the table key.
It would be great!
> Are your locids skewed? Have you tried using 'skewed' join for the last job? Actually, if locations are small, you can ev
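A merge join (Pig's `JOIN ... USING 'merge'`) relies on both inputs already being sorted on the join key. This is why HBase tables, which are kept sorted by row key, suit joins on the table key. The following is a rough in-memory illustration of the sort-merge technique only, not Pig's or HBaseStorage's implementation. It joins two key-sorted lists; the `Row` class, method names, and sample data are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoin {
    // Hypothetical row: a join key plus one value column.
    public static final class Row {
        final String key;
        final String value;

        public Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Joins two inputs that are both sorted ascending by key, in a single forward pass.
    // Duplicate keys are handled by pairing every left row of a key run with every right row of it.
    public static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++; // left key has no match on the right: skip it
            } else if (cmp > 0) {
                j++; // right key has no match on the left: skip it
            } else {
                String key = left.get(i).key;
                // Find the end of the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd).key.equals(key)) iEnd++;
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key.equals(key)) jEnd++;
                // Emit the cross product of the two runs.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(key + "," + left.get(a).value + "," + right.get(b).value);
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up data: both lists are already sorted by key.
        List<Row> sessions = Arrays.asList(new Row("loc1", "s1"), new Row("loc1", "s2"), new Row("loc3", "s3"));
        List<Row> locations = Arrays.asList(new Row("loc1", "Paris"), new Row("loc2", "Lyon"), new Row("loc3", "Nice"));
        System.out.println(join(sessions, locations));
    }
}
```

Each input is scanned once in key order. A streaming version therefore only needs to buffer the current run of equal keys rather than a whole input, which is what makes merge join attractive when both sides are already sorted.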