java.lang.RuntimeException: native-lzo library not available

2013-07-18 Thread Bhavesh Shah
Hello, I have written a Pig script and tried to execute it, but partway through it fails with java.io.IOException: Spill failed. I have included the statements below in my script, and I have also set the classpath for the hadoop-LZO jar. 1) set mapred.compress.map.output true; 2) set ma

Re: Compiling PigUnit

2013-07-18 Thread j.barrett Strausser
I'm unable to recreate this by doing the following:

    tar xvzf pig-0.11.1.tar.gz
    cd pig-0.11.1
    ant
    ant pigunit-jar

I'm running on Mint 14:

    % java -version
    java version "1.7.0_25"
    Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

Compiling PigUnit

2013-07-18 Thread Siegfried Bilstein
Hi everyone, I'm attempting to compile a pigunit jar so that I can begin unit testing my pig scripts, but I'm running into this issue during compilation: [ivy:resolve] [ivy:resolve] :: problems summary :: [ivy:resolve] WARNINGS [ivy:resolve] ::

Re: python version with Jython/Pig

2013-07-18 Thread Dexin Wang
Thanks. Instead, I found a Python implementation of the erf function, so that'll be good for now. http://stackoverflow.com/questions/457408/is-there-an-easily-available-implementation-of-erf-for-python On Wed, Jul 17, 2013 at 5:08 PM, Cheolsoo Park wrote: > Hi Dexin, > > Unfortunately, Pig is
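Jython releases that track Python 2.5 do not include `math.erf` (it arrived in CPython 2.7), which is presumably why a standalone implementation was needed here. Below is a minimal sketch of one approach discussed in the linked thread. It assumes that the Abramowitz and Stegun approximation 7.1.26 (maximum absolute error about 1.5e-7) is accurate enough for your UDF. Because it uses only `math.exp`, it should run under older Jython as well as CPython:

```python
import math

# Abramowitz & Stegun formula 7.1.26 coefficients (max abs error ~1.5e-7).
_A1 = 0.254829592
_A2 = -0.284496736
_A3 = 1.421413741
_A4 = -1.453152027
_A5 = 1.061405429
_P = 0.3275911


def erf(x):
    """Approximate the error function without relying on math.erf."""
    # erf is odd, erf(-x) == -erf(x), so compute on |x| and restore the sign.
    sign = 1.0
    if x < 0:
        sign = -1.0
    x = abs(x)
    t = 1.0 / (1.0 + _P * x)
    # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
    poly = ((((_A5 * t + _A4) * t + _A3) * t + _A2) * t + _A1) * t
    return sign * (1.0 - poly * math.exp(-x * x))
```

To call it from Pig, the file could be registered as a Jython UDF, e.g. `REGISTER 'erf_udf.py' USING jython AS mathudfs;`. The file name and namespace there are hypothetical. The function would also need Pig's `@outputSchema("erf:double")` decorator so Pig knows it returns a double.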

Re: Getting dimension values for Facts

2013-07-18 Thread Pradeep Gollakota
Unfortunately I can't think of any good way of doing this (other than what Bertrand suggested with using a different language to generate the script). I'd also recommend Hive... it may be easier to do this in Hive since you have SQL like syntax. (Haven't used Hive, but it looks like this type of t

Re: Want to add data in same file in Apache PIG?

2013-07-18 Thread Xuefu Zhang
One thing you can do though, is to let pig create new files every time and have a post-pig task/job to combine the old file and the new file. It's a little abnormal to require a single file on HDFS. Normally, MR or other jobs deal with a folder of files, not just a single file. Regards, Xuefu O
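Xuefu's suggestion can be sketched as follows: each Pig run writes a fresh output directory, and a separate post-Pig step combines it with the old data. This is a local-filesystem stand-in written in Python, not an HDFS tool. The function name and paths are hypothetical, and the sketch assumes Pig's usual layout of `part-*` files inside the output directory, skipping marker files such as `_SUCCESS`. It writes to a temporary file and renames it into place, so readers never see a half-written result:

```python
import glob
import os
import shutil
import tempfile


def combine_outputs(old_file, new_output_dir, dest_file):
    """Concatenate an existing file with the part files of a new Pig run.

    Hypothetical local-filesystem sketch: new data is appended in
    part-file name order (part-r-00000, part-r-00001, ...).
    """
    parts = sorted(glob.glob(os.path.join(new_output_dir, "part-*")))
    dest_dir = os.path.dirname(os.path.abspath(dest_file))
    # Temp file in the destination directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            for path in [old_file] + parts:
                if os.path.exists(path):  # the old file may not exist on the first run
                    with open(path, "rb") as src:
                        shutil.copyfileobj(src, out)
        # Swap the combined file into place in one step.
        os.replace(tmp_path, dest_file)
    except BaseException:
        os.remove(tmp_path)
        raise
```

The write-new-then-swap pattern is the point here. HDFS does not support modifying a file in place, so the same shape applies there, but the copying would be done with Hadoop's own tools rather than this code.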

Re: DISTINCT and paritioner

2013-07-18 Thread Alan Gates
You're correct. It looks like an optimization was put in to make distinct use a special partitioner which prevents the user from setting the partitioner. Could you file a JIRA against the docs so we can get that fixed? Alan. On Jul 17, 2013, at 11:27 AM, William Oberman wrote: > The docs say

Re: Getting dimension values for Facts

2013-07-18 Thread Something Something
I don't think this is macro-able, Pradeep. Every step of the way a different column gets updated. For example, for FACT_TABLE3 we update 'col1' from DIMENSION1, for FACT_TABLE5 we update 'col2' from DIMENSION2 & so on. Feel free to correct me if I am wrong. Thanks. On Thu, Jul 18, 2013 at

Re: Getting dimension values for Facts

2013-07-18 Thread Pradeep Gollakota
Looks like this might be macroable. Not entirely sure how that can be done yet... but I'd look into that if I were you. On Thu, Jul 18, 2013 at 11:16 AM, Something Something < mailinglist...@gmail.com> wrote: > Wow, Bertrand, on the Pig mailing list you're recommending not to use > Pig... LOL!

Re: Getting dimension values for Facts

2013-07-18 Thread Something Something
Wow, Bertrand, on the Pig mailing list you're recommending not to use Pig... LOL! Jokes apart, I would think this would be a common use case for Pig, no? Generating a Pig script on the fly is a decent idea, but we're hoping to avoid that - unless there's no other way. Thanks for the pointers.

DESCRIBE alias in local mode

2013-07-18 Thread Serega Sheypak
Hi, we've created a simple utility project for testing Pig scripts. At the core we do: def pigServer = new PigServer(ExecType.LOCAL) pigServer.setBatchOn() try { pigServer.registerScript(new FileInputStream(scriptFile.absolutePath), params, null) pigServer.dumpSchema(

Re: Want to add data in same file in Apache PIG?

2013-07-18 Thread Serega Sheypak
Use ORDER if the set is not too big, or write an MR job with a single reducer. You can even try using the default mapper and reducer if there is no problem with the input format. 2013/7/18 Bhavesh Shah > Thanks Serega and Pradeep for your quick replies. > > > > Serega, As i am new to PIG, I didn't understand "Pi
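A single reducer produces one output file, and it receives its keys in sorted order. Suppose you already have several individually sorted files, such as the old data plus a new sorted run, and want one sorted file. A k-way merge gives that result without re-sorting everything. Below is a minimal local sketch using Python's `heapq.merge`. It assumes each input is a text file sorted line by line and ending with a newline, and the function name is hypothetical:

```python
import contextlib
import heapq


def merge_sorted_files(paths, dest_file):
    """Merge line-sorted text files into one sorted file (k-way merge).

    dest_file must not be one of the inputs, since it is truncated on open.
    """
    with contextlib.ExitStack() as stack:
        inputs = [stack.enter_context(open(p)) for p in paths]
        out = stack.enter_context(open(dest_file, "w"))
        # heapq.merge lazily yields the smallest pending line across all inputs.
        for line in heapq.merge(*inputs):
            out.write(line)
```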

RE: Want to add data in same file in Apache PIG?

2013-07-18 Thread Bhavesh Shah
Thanks Serega and Pradeep for your quick replies. Serega, as I am new to Pig, I didn't understand "Pig Script with one reduce action". Do you mean to write the reduce action in Pig Latin or in some other language? - Bhavesh. > Date: Thu, 18 Jul 2013 16:03:54 +0400 > Subject: Re: Want to

Re: Want to add data in same file in Apache PIG?

2013-07-18 Thread Serega Sheypak
> *merge* and sort them to only one file on *local fs*. <src> is kept. Are you sure that you want to merge several HDFS files into one LOCAL file? A local file would be in your local file system. The simplest way is to use UNION in Pig to combine the existing files in HDFS with the new one generated by the Pig script.

RE: Want to add data in same file in Apache PIG?

2013-07-18 Thread Pradeep Gollakota
If you want persistent storage like that, your best bet is to use a database like HBase. On Jul 18, 2013 7:56 AM, "Bhavesh Shah" wrote: > Thanks for reply. :) > > I just came across one command -getmerge > > > > -getmerge : Get all the files in the directories that > match the source file pa

RE: Want to add data in same file in Apache PIG?

2013-07-18 Thread Bhavesh Shah
Thanks for the reply. :) I just came across the -getmerge command: -getmerge <src> <localdst>: Get all the files in the directories that match the source file pattern and merge and sort them to only one file on local fs. <src> is kept. I am thinking if I STORE the data in some other file say TMP_Name and l

Re: Want to add data in same file in Apache PIG?

2013-07-18 Thread Serega Sheypak
It's not possible. It's HDFS. 2013/7/18 Bhavesh Shah > Hello, > > Actually I have a use case in which I will receive the data from some > source and I have to dump it in the same file after every regular interval > and use that file for further operation. I tried to search on it, but I > didn't

Want to add data in same file in Apache PIG?

2013-07-18 Thread Bhavesh Shah
Hello, I have a use case in which I receive data from some source and have to dump it into the same file at regular intervals, then use that file for further operations. I tried to search for this, but I didn't find anything related to it. I am using the STORE function,

Re: Getting dimension values for Facts

2013-07-18 Thread Bertrand Dechoux
I would say either generate the script using another language (e.g. Python) or use a true programming language with an API having the same level of abstraction (e.g. Java and Cascading). Bertrand On Thu, Jul 18, 2013 at 8:44 AM, Something Something < mailinglist...@gmail.com> wrote: > There must be
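Bertrand's first option, generating the Pig script from another language, could look like the Python sketch below. Everything in it is hypothetical and only illustrates the idea. The fact-to-dimension mapping reuses the example names from this thread (FACT_TABLE3 with col1 from DIMENSION1, FACT_TABLE5 with col2 from DIMENSION2). The paths, the two-column schemas, the `key`/`value` dimension fields and the left-outer-join semantics are assumptions to replace with your real ones:

```python
# Hypothetical mapping: (fact table, column to fill, dimension table).
MAPPINGS = [
    ("FACT_TABLE3", "col1", "DIMENSION1"),
    ("FACT_TABLE5", "col2", "DIMENSION2"),
]

# One Pig Latin block per fact table. Aliases are prefixed with the fact
# name so that no two blocks reuse an alias. Schemas are illustrative only.
TEMPLATE = """\
{fact}_raw = LOAD '{base}/{fact}' USING PigStorage() AS (id:chararray, {col}:chararray);
{fact}_dim = LOAD '{base}/{dim}' USING PigStorage() AS (key:chararray, value:chararray);
{fact}_j = JOIN {fact}_raw BY {col} LEFT OUTER, {fact}_dim BY key;
{fact}_out = FOREACH {fact}_j GENERATE {fact}_raw::id AS id, {fact}_dim::value AS {col};
STORE {fact}_out INTO '{base}/{fact}_resolved' USING PigStorage();
"""


def generate_script(mappings, base_dir):
    """Return a Pig Latin script with one join block per mapping entry."""
    blocks = [
        TEMPLATE.format(fact=fact, col=col, dim=dim, base=base_dir)
        for fact, col, dim in mappings
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(generate_script(MAPPINGS, "/data/warehouse"))
```

Writing the output to a file and running it with `pig -f generated.pig` keeps the per-table differences in data (the mapping list) instead of in copy-pasted Pig Latin. That per-step variation is what made the macro approach awkward earlier in this thread.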