Re: Re: Re: distributed cache

2012-11-16 Thread yingnan.ma
When I use the distributed cache, I find that when the file is larger than 100 MB or has more than 10 million records, it cannot be cached in memory. I tried setting io.sort.mb to 200 MB, but it still does not work. Any suggestions would be welcome. Thank you!

Re: Is Programming Pig book outdated?

2012-11-16 Thread Jagat Singh
In the open source community no book can ever be fully up to date, so we have to live with this :) I would suggest you start with this book and read the latest documentation on the Pig website side by side to see the latest features. Good luck. On Fri, Nov 16, 2012 at 8:41 PM, Majid Azimi

Re: Accessing tuple field names from within a python udf

2012-11-16 Thread Martin Goodson
Unfortunately I've realised that boundscript.describe doesn't return a string. It returns void but prints to stdout. This means I have to go through a rather painful process of calling a separate python process that calls boundscript.describe and then capture the stdout of that process in order to
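The workaround described above (run the call in a separate process and capture what it prints) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the `subprocess` module from a recent CPython rather than the Jython environment Pig embeds, the child script is a stand-in that prints a made-up schema line instead of actually calling describe, and `capture_child_stdout` and `field_names` are hypothetical helper names. The parsing assumes a flat schema printed as `alias: {field: type, ...}`.

```python
import subprocess
import sys


def capture_child_stdout(code):
    """Run `code` in a separate Python process and return everything it printed."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise if the child exits with a non-zero status
    )
    return result.stdout


def field_names(schema_text):
    """Extract field names from a flat 'alias: {name: type, ...}' string (assumed format)."""
    inner = schema_text[schema_text.index("{") + 1 : schema_text.rindex("}")]
    return [part.split(":")[0].strip() for part in inner.split(",") if part.strip()]


# Stand-in for a child script that would call describe(); the schema is hypothetical.
child_code = "print('data: {name: chararray, age: int}')"
schema_text = capture_child_stdout(child_code)
names = field_names(schema_text)
```

Nested schemas (bags or tuples inside the outer braces) would need a real parser; the comma split here only handles the flat case.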

Re: Is Programming Pig book outdated?

2012-11-16 Thread Mohammad Tariq
Agree with Mr. Jagat. Regards, Mohammad Tariq On Fri, Nov 16, 2012 at 3:26 PM, Jagat Singh jagatsi...@gmail.com wrote: In open source community no book can be ever latest , so we have to live by this :) I would suggest you to start from this book and see the latest documentation on

Re: Is Programming Pig book outdated?

2012-11-16 Thread Robert Yerex
It is a bit dated but an excellent resource for learning Pig. We give each new data engineer a copy! Probably the biggest change from my point of view is the use of JSONStorage() now built in at 0.10 so one does not need to wrangle with a custom loader. When I started a couple years back, the only

Re: Accessing tuple field names from within a python udf

2012-11-16 Thread Jonathan Coveney
In the Java interface, there is a getInputSchema() method. You could make this available on the Python side of things. This would be a useful addition. 2012/11/16 Martin Goodson mar...@qubitproducts.com Unfortunately I've realised that boundscript.describe doesn't return a string. It returns

Re: PigStorage

2012-11-16 Thread Dmitriy Ryaboy
That sounds reasonable, I've run into the same problem. Do you mind submitting a patch? On Fri, Nov 16, 2012 at 12:48 PM, pablomar pablo.daniel.marti...@gmail.com wrote: hi all, I'm using Pig 0.9.2 (Apache Pig version 0.9.2-cdh4.0.1, precisely) I got a case today on which I needed to clean up

Re: problem filtering null values with pig

2012-11-16 Thread Arian Pasquali
Just for the record, I'm posting the solution to my problem here. Thank you for your help. In the end the problem seems to be with the JsonLoader I was using. I don't know exactly why, but it seems to have a bug with my strings. I finally changed my code to use