Re: Could not estimate number of reducers

2014-03-25 Thread Vincent Barat
I hit https://issues.apache.org/jira/browse/PIG-3512. On 24/03/2014 14:40, Vincent Barat wrote: Hi, since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation of the number of reducers no longer works. My script: A = load 'data'; B = group A by $0; store B into 'out'; My data

Could not estimate number of reducers

2014-03-24 Thread Vincent Barat
Hi, since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation of the number of reducers no longer works. My script: A = load 'data'; B = group A by $0; store B into 'out'; My data: grunt> ls hdfs://computation-master.dev.ubithere.com:9000/user/root/.staging dir

Re: How to read a file generated by Pig+BinStorage using the HDFS API ?

2014-01-10 Thread Vincent Barat
On Thu, Dec 26, 2013 at 3:35 AM, Vincent Barat vincent.ba...@gmail.com wrote: Hi all and merry Christmas! I generate a file using a Pig script embedded in a Java process and store it using BinStorage. Then, I would like to read this file directly from another Java client, but without starting

Re: How to read a file generated by Pig+BinStorage using the HDFS API ?

2014-01-01 Thread Vincent Barat
storage is associated with an InputFormat/OutputFormat as well as a RecordReader/RecordWriter. As for BinStorage, you can take a look at BinStorageRecordReader: https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/io/BinStorageRecordReader.java#L40 On Thu, Dec 26, 2013 at 3:35 AM, Vincent Barat vincent.ba

How to read a file generated by Pig+BinStorage using the HDFS API ?

2013-12-26 Thread Vincent Barat
Hi all and merry Christmas! I generate a file using a Pig script embedded in a Java process and store it using BinStorage. Then, I would like to read this file directly from another Java client, but without starting a Pig script (i.e., only by using the Hadoop API and Pig's BinStorage class).

Re: Nb of reduce tasks when GROUPing

2013-05-22 Thread Vincent Barat
and pig.exec.reducers.max might be useful: https://issues.apache.org/jira/browse/PIG-1249 Norbert On Tue, May 21, 2013 at 9:23 AM, Vincent Barat vincent.ba...@gmail.com wrote: Thanks for your reply. My goal is actually to AVOID using PARALLEL, in order to let PIG guess a good number of reducers by itself
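
For context, Pig's default input-size heuristic, as far as I know, divides the total input size by `pig.exec.reducers.bytes.per.reducer` and caps the result at `pig.exec.reducers.max`. The minimal Java sketch below reproduces only that arithmetic. The class name is hypothetical, and the defaults (1,000,000,000 bytes per reducer, at most 999 reducers) are assumptions to check against the Pig version in use; this is not Pig's estimator code.

```java
public final class ReducerEstimate {
    // Assumed defaults; check pig.exec.reducers.bytes.per.reducer and
    // pig.exec.reducers.max in the Pig version you actually run.
    static final long DEFAULT_BYTES_PER_REDUCER = 1_000_000_000L;
    static final int DEFAULT_MAX_REDUCERS = 999;

    /** One reducer per bytesPerReducer of input, at least 1, at most maxReducers. */
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        long reducers = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        reducers = Math.max(1, reducers);
        return (int) Math.min(maxReducers, reducers);
    }

    public static void main(String[] args) {
        // 2.5 GB of input with the assumed defaults gives 3 reducers.
        System.out.println(estimate(2_500_000_000L, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS));
    }
}
```

Under this model, lowering `pig.exec.reducers.max` caps the estimate without fixing a `PARALLEL` value in the script.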

Re: Nb of reduce tasks when GROUPing

2013-05-21 Thread Vincent Barat
Vincent Barat vincent.ba...@gmail.com wrote: Hi, I use this query to remove duplicate entries from a set of input files (I cannot use DISTINCT since some fields can differ): grp = GROUP alias BY key; alias = FOREACH grp { record = LIMIT alias 1; GENERATE FLATTEN(record
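
The GROUP + LIMIT 1 idiom keeps a single record per key. Here is a plain-Java sketch of the same idea, with hypothetical names and data. It keeps the first record seen for each key, whereas Pig does not promise which record LIMIT 1 picks from an unordered bag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public final class DedupByKey {
    /** Keeps one record per key (the first one seen), like GROUP ... BY key followed by LIMIT 1. */
    static <R, K> List<R> dedup(List<R> records, Function<R, K> keyOf) {
        Map<K, R> firstByKey = new LinkedHashMap<>(); // preserves first-seen key order
        for (R r : records) {
            firstByKey.putIfAbsent(keyOf.apply(r), r);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        // Hypothetical records {key, payload}: rows sharing a key differ in other fields.
        List<String[]> rows = List.of(
                new String[] {"a", "1"},
                new String[] {"b", "2"},
                new String[] {"a", "3"});
        for (String[] r : dedup(rows, r -> r[0])) {
            System.out.println(r[0] + "," + r[1]);
        }
    }
}
```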

Is there a way to limit the number of maps produced by HBaseStorage ?

2013-01-21 Thread Vincent Barat
Hi, we are using HBaseStorage intensively to load data from tables with more than 100 regions. HBaseStorage generates 1 map per region, and since our cluster has 50 map slots, our PIG scripts end up starting 50 maps that read data from HBase concurrently. The problem is that our HBase

pig.import.search.path not working in 0.10.0

2012-09-10 Thread Vincent Barat
-- Vincent Barat CTO Contact info vba...@capptain.com www.capptain.com Cell: +33 6 15

pig.import.search.path not working in 0.10.0 ?

2012-09-05 Thread Vincent Barat
Hi, I have a very simple script that tries to import a PIG file: set pig.import.search.path '/tmp' import 'event.pig'; Even though the file /tmp/event.pig exists, it cannot be found. It seems that the function getImportScriptAsReader, which deals with the pig.import.search.path property, is not even
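
For comparison, here is a minimal sketch of the lookup the script expects: use the name as given if it exists, otherwise try it against each directory listed in `pig.import.search.path`. The class and method names are hypothetical, the comma-separated format of the property is an assumption, and this is not Pig's `getImportScriptAsReader`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public final class ImportResolver {
    /**
     * Hypothetical resolver: returns the first existing file among the name itself
     * and the name resolved against each comma-separated search-path entry.
     */
    static Optional<Path> resolve(String scriptName, String searchPath) {
        Path direct = Paths.get(scriptName);
        if (Files.isRegularFile(direct)) {
            return Optional.of(direct);
        }
        if (direct.isAbsolute() || searchPath == null) {
            return Optional.empty(); // absolute names are not searched
        }
        for (String dir : searchPath.split(",")) {
            String trimmed = dir.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(trimmed).resolve(scriptName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pig-import");
        Files.writeString(dir.resolve("event.pig"), "-- imported script\n");
        // Should find <tmpdir>/event.pig through the search path.
        System.out.println(resolve("event.pig", dir.toString()).isPresent());
    }
}
```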

Re: Loading data from a SQL database?

2012-08-10 Thread Vincent Barat
Really, no advice on that from anybody? On 08/08/12 18:33, Vincent Barat wrote: Hi, I'm looking for a way to load data from a SQL database. It seems that the SQLLoader (still mentioned in the Wiki) no longer exists. Sqoop seems to be a good solution, but it is not so convenient in my us

Loading data from a SQL database?

2012-08-08 Thread Vincent Barat
to my customers an easy-to-use PIG environment that allows them to perform computations on data coming from HBase, SQL and Hadoop files, if possible without having to deal with workflow tools like Oozie). What are your recommendations about that? Cheers -- Vincent

Loading data from a SQL database?

2012-08-08 Thread Vincent Barat
Hi, I'm looking for a way to load data from a SQL database. Something like: It seems that the SQLLoader (still mentioned in the Wiki) no longer exists. Sqoop seems to be a good solution, but it is not so convenient in my use case (I need to deliver to my customers an easy-to-use PIG

Re: How to extract the keys of a map ?

2012-06-29 Thread Vincent Barat
Thank you Bill, that should do the job :-) On 29/06/12 18:16, Bill Graham wrote: There's support for this in the trunk FYI: https://issues.apache.org/jira/browse/PIG-2600 On Fri, Jun 29, 2012 at 5:05 AM, Vincent Barat vincent.ba...@gmail.com wrote

replicated join vs regular ?

2012-01-27 Thread Vincent Barat
Hi folks, I use replicated joins, and recently I encountered an issue: my rightmost relation seems to have become too big and, even though I don't get any Java heap space error, the time it takes to finish the maps becomes exponentially long (I cannot figure out exactly why). Removing 'replicated' fixes the issue,
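
A replicated (fragment-replicate) join loads the replicated relation into memory as a lookup table and streams the other relation past it, which is why the replicated side has to stay small. Below is a minimal in-memory hash-join sketch of that idea, with hypothetical names and data; it is not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ReplicatedJoinSketch {
    record Row(String key, String value) {}

    /** Inner join: 'small' is held in memory as a hash table, 'big' is streamed once. */
    static List<String> join(Iterable<Row> big, List<Row> small) {
        Map<String, List<Row>> table = new HashMap<>();
        for (Row r : small) {
            table.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        List<String> out = new ArrayList<>();
        for (Row b : big) {
            // Each streamed row probes the in-memory table; no sort or shuffle needed.
            for (Row s : table.getOrDefault(b.key(), List.of())) {
                out.add(b.key() + "," + b.value() + "," + s.value());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> sessions = List.of(new Row("loc1", "s1"), new Row("loc2", "s2"), new Row("loc1", "s3"));
        List<Row> locations = List.of(new Row("loc1", "Paris"));
        join(sessions, locations).forEach(System.out::println);
    }
}
```

Memory use grows with the replicated relation. One plausible, unconfirmed explanation for maps that slow down sharply without an outright heap error is a JVM spending most of its time in garbage collection as the table approaches the heap limit.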

PIG 0.9 leaks HBaseStorage instances

2011-10-25 Thread Vincent Barat
Hi, I'm trying to figure out why PIG is using so many ZooKeeper connections (from the frontend machine) when using HBaseStorage(). I added a trace in the constructor of HBaseStorage() and wrote a simple script loading an HBase table: sessions = LOAD 'hbase://mytable' USING
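
The trace-in-the-constructor technique described in this message can be sketched as a static counter incremented in the constructor plus a printed stack trace, so each instantiation reports its caller. `TracedLoader` is a hypothetical stand-in; the same lines would go into the loader class under investigation.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class TracedLoader {
    private static final AtomicInteger INSTANCES = new AtomicInteger();

    TracedLoader() {
        int n = INSTANCES.incrementAndGet();
        // Print the creating call stack so each instantiation can be attributed to its caller.
        new Throwable("TracedLoader instance #" + n).printStackTrace();
    }

    /** Number of instances created so far in this JVM. */
    static int count() {
        return INSTANCES.get();
    }

    public static void main(String[] args) {
        new TracedLoader();
        new TracedLoader();
        System.out.println("instances: " + count());
    }
}
```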

PIG regression in 0.9.1's BinStorage()

2011-10-06 Thread Vincent Barat
Hi, I investigated further and updated the issue to provide a very easy way to reproduce it. This seems to be a significant regression in BinStorage(): https://issues.apache.org/jira/browse/PIG-2271 On 09/09/11 11:36, Vincent Barat wrote: Issue reported: https://issues.apache.org

Re: PIG regression between 0.8.1 and 0.9.x

2011-09-09 Thread Vincent Barat
:**chararray), but I cannot find an explanation for that. I verified that such a conversion should be valid in 0.9. Can you show me the script? Daniel On Tue, Aug 30, 2011 at 5:14 AM, Vincent Barat vincent.ba...@gmail.com wrote: Hi, I have experienced the same issue when loading the data from raw text

Re: PIG behavior changed between 0.8.1 and 0.9.x ?

2011-08-30 Thread Vincent Barat
Hi, I have experienced the same issue when loading the data from raw text files (using PigServer in local mode and the regular PIG loader) and from HBaseStorage. The issue is exactly the same in both cases: each time a NULL string is encountered, the cast to a data bag fails. On

Re: Question about request optimization

2011-08-26 Thread Vincent Barat
On 23/08/11 20:28, Dmitriy Ryaboy wrote: We should add merge join support to HBaseStorage; it should be able to do that for joins on the table key. That would be great! Are your locids skewed? Have you tried using a 'skewed' join for the last job? Actually, if locations are small, you can
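
A merge join, as suggested for joins on the HBase table key, walks two inputs that are already sorted by key in step, so it needs no in-memory hash table. A minimal sketch, assuming both lists are sorted by key and that keys on the right side are unique (as HBase row keys are); names and data are hypothetical. HBase orders row keys by raw bytes, which matches `String.compareTo` only for ASCII keys.

```java
import java.util.ArrayList;
import java.util.List;

public final class MergeJoinSketch {
    record Row(String key, String value) {}

    /**
     * Inner merge join. Assumes both lists are sorted by key (String order)
     * and that keys on the right side are unique.
     */
    static List<String> join(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key().compareTo(right.get(j).key());
            if (cmp < 0) {
                i++; // left key is behind: advance left
            } else if (cmp > 0) {
                j++; // right key is behind: advance right
            } else {
                out.add(left.get(i).key() + "," + left.get(i).value() + "," + right.get(j).value());
                i++; // keep j: the next left row may share the same key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> sessions = List.of(new Row("loc1", "s1"), new Row("loc1", "s2"), new Row("loc3", "s3"));
        List<Row> locations = List.of(new Row("loc1", "Paris"), new Row("loc2", "Rennes"), new Row("loc3", "Lyon"));
        join(sessions, locations).forEach(System.out::println);
    }
}
```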

Re: Random behavior of PIG???

2011-08-26 Thread Vincent Barat
FYI, this was fixed by PIG-2193. On 26/07/11 19:40, Vincent Barat wrote: Hi, I'm using PIG 0.8.1 with HBase 0.90 and the following script sometimes returns an empty set, and sometimes works! start_sessions = LOAD 'startSession' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage

Re: Removing local job traces in local mode

2011-08-26 Thread Vincent Barat
("log4j.logger.org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher", "INFO"); props.setProperty("log4j.logger.org.apache.pig.tools.pigstats.PigStats", "INFO"); On 27/06/11 13:21, Vincent Barat wrote: Thanks, but it is actually not what I'm looking for. First, I use PIG from Java, not from the pig shell, so the -d option cannot
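
The reply configures log4j through a `java.util.Properties` object set from Java. A fuller minimal sketch of that approach: WARN by default, INFO only for the two Pig classes named in the reply. The root-logger line, the appender name `console` and the layout pattern are assumptions for illustration.

```java
import java.util.Properties;

public final class PigLogConfig {
    /** Builds log4j 1.x-style properties: WARN by default, INFO for selected Pig classes. */
    static Properties build() {
        Properties props = new Properties();
        // Assumed root configuration: WARN level, one console appender named "console".
        props.setProperty("log4j.rootLogger", "WARN, console");
        props.setProperty("log4j.appender.console", "org.apache.log4j.ConsoleAppender");
        props.setProperty("log4j.appender.console.layout", "org.apache.log4j.PatternLayout");
        props.setProperty("log4j.appender.console.layout.ConversionPattern", "%d %p %c{1} - %m%n");
        // Keep progress messages from the two classes named in the reply.
        props.setProperty(
                "log4j.logger.org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher",
                "INFO");
        props.setProperty("log4j.logger.org.apache.pig.tools.pigstats.PigStats", "INFO");
        return props;
    }

    public static void main(String[] args) {
        build().list(System.out);
    }
}
```

With log4j 1.x on the classpath, `org.apache.log4j.PropertyConfigurator.configure(props)` applies such a `Properties` object; how Pig itself picks up these settings from the properties handed to it may differ between versions.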

Question about request optimization

2011-08-23 Thread Vincent Barat
Hi, of all the queries I run using PIG 0.8.1, the heaviest one is the following: /* load session data from HBase */ start_sessions = load ... (start of sessions) end_sessions = load ... (end of sessions) location = load ... (session location) info = load ... (session

Re: Blocking issue with HBase 0.90.3 and PIG 0.8.1

2011-07-28 Thread Vincent Barat
So, I've tried the exact same query but loading the data from HDFS files (using the regular Pig loader): it works! Here is the query loading from HDFS: start_sessions = LOAD 'start_sessions' AS (sid:chararray, infoid:chararray, imei:chararray, start:long); end_sessions = LOAD

Re: Blocking issue with HBase 0.90.3 and PIG 0.8.1

2011-07-28 Thread Vincent Barat
I've reported the issue here: https://issues.apache.org/jira/browse/PIG-2193 Still investigating, but so far it seems that the FILTER clause makes the HBase loader lose all fields that are not explicitly used in the script. I stripped down the query to: start_sessions = LOAD

How to load a subset of an HBase table (timestamp based) ?

2011-07-28 Thread Vincent Barat
Hi, I'd like to make PIG load only a subset of an HBase table, based on the timestamp of the records, or on the key of the rows. As an example, I'd like to load only records that have a timestamp N, or a key something. I know that HBase can handle scanners that are highly optimized to

Re: How to load a subset of an HBase table (timestamp based) ?

2011-07-28 Thread Vincent Barat
Thanks for the input. [3] is more related to timestamp storage; anyway, I added my 2 cents to the issue concerning loading by timestamp. On 28/07/11 13:19, Norbert Burger wrote: You can instruct HBaseStorage to load a subset of the rows using its -gt and -lt options,
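
Because HBase keeps rows sorted by row key, a key range such as the one selected by the -gt and -lt options corresponds to a contiguous slice of the table. A minimal sketch of that slice with a `TreeMap`, treating both bounds as exclusive (my reading of gt/lt) and using string order, which matches HBase's byte order only for ASCII keys. Names and data are hypothetical.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public final class RowKeyRange {
    /** Rows with gt < key < lt (both bounds exclusive), in key order. */
    static NavigableMap<String, String> between(NavigableMap<String, String> rows, String gt, String lt) {
        return rows.subMap(gt, false, lt, false); // a view, not a copy
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put("row1", "a");
        rows.put("row2", "b");
        rows.put("row3", "c");
        rows.put("row4", "d");
        System.out.println(between(rows, "row1", "row4").keySet()); // prints [row2, row3]
    }
}
```

This covers row-key ranges only; selecting by cell timestamp is a separate question.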

Blocking issue with HBase 0.90.3 and PIG 0.8.1

2011-07-27 Thread Vincent Barat
More info on this issue: 1- I use PIG 0.8.1, HBase 0.90.3 and Hadoop 0.20-append. 2- The issue can be reproduced with PIG trunk too. The script: start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid

Re: Blocking issue with HBase 0.90.3 and PIG 0.8.1

2011-07-27 Thread Vincent Barat
A clarification: the HBase classes of the PIG trunk cannot be compiled inside PIG 0.8.1, so I was unable to test whether a fix was introduced in the latest version of these classes. So point 2- should be disregarded. On 27/07/11 14:38, Vincent Barat wrote: More info on this issue: 1- I use PIG 0.8.1

Re: Blocking issue with HBase 0.90.3 and PIG 0.8.1

2011-07-27 Thread Vincent Barat
27/07/11 14:38, Vincent Barat wrote: More info on this issue: 1- I use PIG 0.8.1, HBase 0.90.3 and Hadoop 0.20-append. 2- The issue can be reproduced with PIG trunk too. The script: start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING

Re: Blocking issue with HBase 0.90.3 and PIG 0.8.1

2011-07-27 Thread Vincent Barat
working consistently? Thanks, Thejas On 7/27/11 7:31 AM, Vincent Barat wrote: I built Pig trunk with the HBase 0.90.3 client lib (ant -Dhbase.version=0.90.3) and the issue is still there. It makes me think of an issue in the optimizer... Anyway, the fact is that my query is not complex, so I

Re: Blocking issue with HBase 0.90.3 and PIG 0.8.1

2011-07-27 Thread Vincent Barat
The behavior is not random. The first dump is always empty, and the second always works. I will try what you ask, and if I have more details, I will create a JIRA issue. Thanks. On 27/07/11 16:59, Raghu Angadi wrote: Vincent, is the behavior random or the same each time? Couple of

Random behavior of PIG???

2011-07-26 Thread Vincent Barat
Hi, I'm using PIG 0.8.1 with HBase 0.90 and the following script sometimes returns an empty set, and sometimes works! start_sessions = LOAD 'startSession' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp') AS (sid:chararray,

Re: Error when using Pig 0.8.1 in local mode from Java

2011-06-27 Thread Vincent Barat
case. As a workaround, I created a symbolic link: lrwxr-xr-x 1 vbarat wheel 13 27 jui 12:20 hadoop-${user.name}@ -> hadoop-vbarat drwxr-xr-x 4 vbarat wheel 136 27 jui 12:20 hadoop-vbarat/ and it works. On 17/05/11 17:02, Vincent Barat wrote: Hi, I'm trying to run PIG 0.8.1
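
The literal directory name `hadoop-${user.name}` suggests that a path template like Hadoop's default `hadoop.tmp.dir` (`/tmp/hadoop-${user.name}`) was used somewhere without variable expansion. A minimal sketch of that expansion step, resolving `${name}` from Java system properties; the helper is hypothetical, not Hadoop's `Configuration` code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class VarExpansion {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$]+)\\}");

    /** Replaces each ${name} with System.getProperty(name); unknown names are left untouched. */
    static String expand(String template) {
        Matcher m = VAR.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = System.getProperty(m.group(1));
            String replacement = value != null ? value : m.group(0);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Without expansion, this template becomes a directory literally named "hadoop-${user.name}".
        System.out.println(expand("/tmp/hadoop-${user.name}"));
    }
}
```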

Re: Removing local job traces in local mode

2011-06-27 Thread Vincent Barat
Thanks, but it is actually not what I'm looking for. First, I use PIG from Java, not from the pig shell, so the -d option cannot be used. Second, I want to be able to choose which traces I want and which I don't, depending on their origin. I guess log4j can be configured for

Re: HBaseStorage does not load all my regions!

2011-05-24 Thread Vincent Barat
You're right, I tested Pig 0.8.0 with HBase 0.20.6, not 0.8.1 (very sorry). On 24/05/11 07:44, Dmitriy Ryaboy wrote: You couldn't have possibly tested with Pig 0.8.1 successfully, as it does not work with HBase 0.20.6 at all. This issue should not show up if you use Pig 0.8.1 and HBase

HBaseStorage does not load all my regions!

2011-05-23 Thread Vincent Barat
); DUMP nbRows; -- Vincent BARAT, UBIKOD, CTO vba...@ubikod.com Mob +33 (0)6 15 41 15 18 UBIKOD Paris, c/o ESSEC VENTURES, Avenue Bernard Hirsch, 95021 Cergy-Pontoise cedex, FRANCE, Tel +33 (0)1 34 43 28 89 UBIKOD Rennes, 10 rue Duhamel, 35000 Rennes, FRANCE, Tel

Re: HBaseStorage does not load all my regions!

2011-05-23 Thread Vincent Barat
combination is on.

Error when using Pig 0.8.1 in local mode from Java

2011-05-17 Thread Vincent Barat
Hi, I'm trying to run PIG 0.8.1 in local mode from Java. My code used to work with PIG 0.6.1. Now I get the following exception: java.io.FileNotFoundException: File file:/tmp/hadoop-vbarat/mapred/system/job_local_0001/job.xml does not exist. at

Re: How to remove the field key from bags tuples after a GROUP ?

2011-04-21 Thread Vincent Barat
space before storing this rich_sessions relation in a file. Is there any way to do this? Thanks for your help,

Re: pig speed in local mode

2011-04-21 Thread Vincent Barat
tons of CFCs to the stratosphere, and increase their population by 263,000 Bowers, C.A. The Culture of Denial: Why the Environmental Movement Needs a Strategy for Reforming Universities and Public Schools. New York: State University of New York Press, 1997: (4) (5) (p.206)

Re: How to remove the field key from bags tuples after a GROUP ?

2011-04-20 Thread Vincent Barat
all activity tuples in order to save space before storing this rich_sessions relation in a file. Is there any way to do this? Thanks for your help,
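
After a GROUP, each tuple in the bag still carries the grouping key, which is already stored once in the `group` field; in Pig this is typically handled by projecting only the non-key fields of the bag in a FOREACH. A plain-Java sketch of the resulting data shape, with hypothetical names and data: rows are grouped by one field and stored in the bag without it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class GroupWithoutKey {
    /**
     * Groups rows by the field at keyIndex and stores each row in its bag
     * without that field, so the key is kept only once per group.
     */
    static Map<String, List<List<String>>> group(List<List<String>> rows, int keyIndex) {
        Map<String, List<List<String>>> groups = new LinkedHashMap<>();
        for (List<String> row : rows) {
            List<String> rest = new ArrayList<>(row);
            String key = rest.remove(keyIndex); // drop the key from the bag tuple
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(rest);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical activity rows: (session id, activity, time).
        List<List<String>> activities = List.of(
                List.of("s1", "click", "10"),
                List.of("s1", "view", "12"),
                List.of("s2", "click", "15"));
        System.out.println(group(activities, 0));
    }
}
```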

PIG 0.8.0 in local mode: FileNotFoundException

2011-02-08 Thread Vincent Barat
Hi, I use PIG 0.6.0 through its Java API. I'm trying to upgrade from 0.6.0 to 0.8.0, but I get the following error when I run my PIG jobs from Java: 2011-02-08 16:59:38,244 | INFO | main | MapReduceLauncher | 0% complete 2011-02-08 16:59:39,170 | INFO | main | MapReduceLauncher | job null