Re: Is it possible to fix MR jobs order in Pig?

2014-07-20 Thread Bertrand Dechoux
Well, a user doesn't really know how many jobs will be scheduled, so their order is not something that should matter. A Pig script should really be seen as a graph of operators. Your problem was that a dependency between two operators was implicit. Exec lets you 'flush' the existing graph and make...
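To make the "graph of operators" view concrete, here is a toy Python model. It is not Pig's actual planner. It describes a hypothetical script in which a UDF reads a file written by an earlier STORE. Each statement lists only the aliases Pig can see it depend on. `must_precede` and `with_exec` are hypothetical helpers, and `exec` is modeled as "everything before it finishes before anything after it starts".

```python
# Toy model (hypothetical, not Pig's planner): each statement maps to the set of
# statements it explicitly depends on through aliases.

def ancestors(graph, node):
    """All statements that `node` transitively depends on."""
    seen, stack = set(), list(graph[node])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen

def must_precede(graph, first, second):
    """True if every valid execution order runs `first` before `second`."""
    return first in ancestors(graph, second)

def with_exec(graph, before, after):
    """Model an exec: statements in `before` must finish before any in `after`."""
    flushed = {node: set(deps) for node, deps in graph.items()}
    for node in after:
        flushed[node] |= set(before)
    return flushed

script = {
    "A": set(),            # A = LOAD 'data1';
    "store_out1": {"A"},   # STORE A INTO 'out1';
    "B": set(),            # B = LOAD 'data2';
    "C": {"B"},            # C = FOREACH B GENERATE MYUDF($0, 'out1');  (reads out1!)
    "store_out2": {"C"},   # STORE C INTO 'out2';
}

# The UDF's need for out1 is invisible to the graph, so nothing orders it.
print(must_precede(script, "store_out1", "C"))  # False

# An exec placed after STORE A INTO 'out1' adds the missing ordering.
fixed = with_exec(script, ["A", "store_out1"], ["B", "C", "store_out2"])
print(must_precede(fixed, "store_out1", "C"))   # True
```

Only dependencies expressed through aliases are visible in the graph. Anything a UDF does on the side must be made explicit, which is what the exec boundary does in this model.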

Re: AvroStorage issue

2013-10-15 Thread Bertrand Dechoux
> at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:103) at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58) at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257) > Any idea what's going wrong? > Thanks, > Anup -- Bertrand Dechoux

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread Bertrand Dechoux
Or Lipstick: https://github.com/Netflix/Lipstick It's Netflix this time instead of Twitter. ;) http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html But simply running the script will display the information you are looking for at the end of the job. Bertrand

Re: Getting dimension values for Facts

2013-07-18 Thread Bertrand Dechoux
I would say either generate the script using another language (e.g. Python) or use a true programming language with an API offering the same level of abstraction (e.g. Java and Cascading). Bertrand On Thu, Jul 18, 2013 at 8:44 AM, Something Something <mailinglist...@gmail.com> wrote: > There must be
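Here is a minimal sketch of the first option. It assumes the facts must be joined with a variable list of dimension tables; the excerpt does not show the actual schema. A Python function can emit one LOAD and one JOIN per dimension. The function name, paths, field names and tab-separated layout are all hypothetical.

```python
def generate_join_script(fact_path, fact_schema, dimensions, output_path):
    """Emit a Pig script joining a fact relation with each dimension in turn.

    `dimensions` is a list of (alias, path, schema, fact_key, dim_key) tuples
    (a hypothetical layout chosen for this sketch).
    """
    lines = [f"facts = LOAD '{fact_path}' USING PigStorage('\\t') AS ({fact_schema});"]
    current = "facts"
    for i, (alias, path, schema, fact_key, dim_key) in enumerate(dimensions):
        lines.append(f"{alias} = LOAD '{path}' USING PigStorage('\\t') AS ({schema});")
        joined = f"joined_{i}"
        # fact_key is referenced by its short name, which Pig accepts only
        # while it is unambiguous in the relation produced by earlier joins.
        lines.append(f"{joined} = JOIN {current} BY {fact_key}, {alias} BY {dim_key};")
        current = joined
    lines.append(f"STORE {current} INTO '{output_path}' USING PigStorage('\\t');")
    return "\n".join(lines)

# Hypothetical usage: sales facts enriched with product and store dimensions.
dims = [
    ("products", "/dims/products", "id:int, name:chararray", "product_id", "id"),
    ("stores", "/dims/stores", "id:int, city:chararray", "store_id", "id"),
]
print(generate_join_script("/facts/sales",
                           "product_id:int, store_id:int, amount:double",
                           dims, "/out/enriched"))
```

The generated text can be saved to a `.pig` file and run as usual. Adding a dimension then means adding a tuple rather than editing the script by hand.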

Re: save several 64MB files in Pig Latin

2013-06-10 Thread Bertrand Dechoux
n an example with 10 maps (we forget the reducers now), it means that each map will read more or less 50MB? > On 10 June 2013 11:21, Bertrand Dechoux wrote: > I wasn't clear. Specifying the size of the f

Re: save several 64MB files in Pig Latin

2013-06-10 Thread Bertrand Dechoux
ls explain it very clearly. I want to split a single 500MB txt file in HDFS into multiple files using Pig Latin. Is it possible? E.g., > A = LOAD 'myfile.txt' USING PigStorage() AS (t); > STORE A INTO 'multiplefiles' USING PigStorage(); -- and here creates

Re: save several 64MB files in Pig Latin

2013-06-09 Thread Bertrand Dechoux
nto 'result-australia-0' using PigStorage('\t'); > to store the data in HDFS. But the problem is that this creates 1 file of 500MB. Instead, I want to save several 64MB files. How do I do this? -- Bertrand Dechoux
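This sketch assumes a map-only job (a plain LOAD followed by a STORE), where each map task reads one input split and writes its own part file. Under that assumption, the number and size of the output files follow from how the input is split. The numbers are the ones from this thread. The helper names are hypothetical, and how Pig or Hadoop actually choose the split size is not covered by the excerpt.

```python
MB = 1024 * 1024

def split_sizes(total_bytes, split_bytes):
    """Cut `total_bytes` into consecutive chunks of at most `split_bytes`."""
    full, rest = divmod(total_bytes, split_bytes)
    return [split_bytes] * full + ([rest] if rest else [])

def avg_bytes_per_map(total_bytes, num_maps):
    """Input each map reads when the data is spread evenly over `num_maps` maps."""
    return total_bytes / num_maps

# 500MB input cut into 64MB splits: one part file per split in a map-only job.
sizes = split_sizes(500 * MB, 64 * MB)
print(len(sizes), [s // MB for s in sizes])  # 8 [64, 64, 64, 64, 64, 64, 64, 52]

# The follow-up question: 10 maps over the same 500MB.
print(avg_bytes_per_map(500 * MB, 10) / MB)  # 50.0
```

So "several 64MB files" means 7 full files plus one 52MB remainder. With 10 maps, each one reads about 50MB, as the follow-up question assumes.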

Extract inputs/outputs and required parameters from script?

2013-05-16 Thread Bertrand Dechoux
Hi, The command line and its output explain what the required parameters and the inputs/outputs of a script are. I was wondering: is there a simple way to extract them automatically from the script? For the parameters, I could parse the file with my own logic, inputs/outputs should also be doable
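One way to do the "parse the file with my own logic" approach is with regular expressions in Python. This is a rough, hypothetical sketch and does not use any Pig API. It treats `$name` references as parameters and any name set by `%default` or `%declare` as defined inside the script. It reads the first quoted path after LOAD and after STORE ... INTO. It ignores quoting subtleties, macros and paths built from expressions.

```python
import re

# $name is a parameter; positional fields like $0 are excluded because a
# parameter name must start with a letter or underscore.
PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")
# %default / %declare give a parameter a value inside the script itself.
DEFINED = re.compile(r"^\s*%(?:default|declare)\s+([A-Za-z_][A-Za-z0-9_]*)",
                     re.IGNORECASE | re.MULTILINE)
LOAD = re.compile(r"\bLOAD\s+'([^']*)'", re.IGNORECASE)
STORE = re.compile(r"\bSTORE\s+\w+\s+INTO\s+'([^']*)'", re.IGNORECASE)


def strip_comments(script):
    """Drop /* ... */ and -- comments (naively: also inside string literals)."""
    script = re.sub(r"/\*.*?\*/", "", script, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", "", script)


def describe_script(script):
    """Regex-based summary of a Pig script's parameters and I/O paths."""
    code = strip_comments(script)
    used = set(PARAM.findall(code))
    defined = set(DEFINED.findall(code))
    return {
        "required": sorted(used - defined),
        "defined_in_script": sorted(used & defined),
        "inputs": LOAD.findall(code),
        "outputs": STORE.findall(code),
    }


example = """
%default output '/tmp/out'
-- $ignored only appears in a comment
raw = LOAD '$input' USING PigStorage(',') AS (id:int, name:chararray);
STORE raw INTO '$output';
"""
print(describe_script(example))
```

A regex pass is only an approximation of Pig's own parser. It is enough to list what a wrapper script must pass on the command line, but it cannot see paths that are computed at run time.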

Re: Disable ColumnMapKeyPrune rule with PigTest

2013-05-13 Thread Bertrand Dechoux
ira/browse/PIG-3317 > So you will be able to set the properties in PigContext and pass it to PigServer. > The patch is not committed yet, but it's likely to be in the next release. > Thanks, > Cheolsoo > On Mon, May 13, 2013 at 2:25 AM, Bertrand Decho

Disable ColumnMapKeyPrune rule with PigTest

2013-05-13 Thread Bertrand Dechoux
Hi, I am using PigTest to verify a script that reads and stores data in Avro format. However, at the moment, the script fails due to the optimisation rule ColumnMapKeyPrune. I know I can disable it using the -optimizer_off flag. But is there a way to do that using PigTest? It seems to me