Re: How to CONCAT multiple expressions

2012-07-11 Thread Scott Foster
StringConcat should work:

    key = StringConcat((chararray)product.$1, (chararray)$7, (chararray)$6);

scott.

On Tue, Jul 10, 2012 at 6:27 PM, Russell Jurney wrote:
> I really need to fix this in pig 0.11
>
> Russell Jurney
> twitter.com/rjurney
> russell.jur...@gmail.com
> datasyndrome.com
>
> On Ju

Re: How can I set the mapper number for pig script?

2012-06-26 Thread Scott Foster
You are right that if you have a CPU-intensive mapper, having more mappers will help. As suggested, you can reduce the block size of the files you are processing and disable split combination, and you'll end up with more mappers in your job. One correction to my previous email, the
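Here is a rough Python sketch of why a smaller block size yields more mappers. It assumes split combination is disabled and that each splittable input file gets one map task per HDFS block. The function name `num_map_tasks` is hypothetical, and the arithmetic ignores finer details of how Hadoop sizes the last split of a file.

```python
# Rough estimate of map tasks with split combination disabled.
# Assumption: one map task per HDFS block of each splittable input file.

MiB = 1024 * 1024


def num_map_tasks(file_sizes, block_size):
    """Return the estimated number of map tasks for the given input files.

    file_sizes: sizes of the input files in bytes
    block_size: HDFS block size in bytes
    """
    total = 0
    for size in file_sizes:
        # Ceiling division: a partially filled last block still needs a mapper.
        total += (size + block_size - 1) // block_size
    return total


if __name__ == "__main__":
    one_gib_file = [1024 * MiB]
    print(num_map_tasks(one_gib_file, 128 * MiB))  # 8 blocks -> 8 mappers
    print(num_map_tasks(one_gib_file, 64 * MiB))   # half the block size -> 16 mappers
```

Halving the block size doubles the number of blocks, and with split combination off, the number of mappers doubles with it.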

Re: How can I set the mapper number for pig script?

2012-06-23 Thread Scott Foster
You can also turn off split combination completely, and then the number of mappers will equal the number of blocks:

    SET pig.noSplitCombination true;

Adding mappers may not make your process run faster, since the time to read the data may be less than the overhead of creating a new JVM for each map t
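The trade-off between read time and per-task JVM overhead can be made concrete with a deliberately simple cost model. This is a toy sketch, not how Hadoop actually schedules work. It assumes every map task pays a fixed startup cost, then reads its equal share of the input at a fixed rate, and that tasks run in waves limited by a fixed number of map slots. The function name and all parameter values are made up for illustration.

```python
import math


def map_phase_seconds(total_mb, num_maps, slots, startup_s, read_mb_per_s):
    """Toy estimate of map-phase wall-clock time.

    total_mb:      total input size in MB
    num_maps:      number of map tasks the input is split into
    slots:         map tasks that can run at the same time
    startup_s:     fixed per-task overhead (e.g. JVM launch), in seconds
    read_mb_per_s: per-task read/processing rate in MB/s
    """
    per_task = startup_s + (total_mb / num_maps) / read_mb_per_s
    # Tasks beyond `slots` have to wait for a free slot, so they run in waves.
    waves = math.ceil(num_maps / slots)
    return waves * per_task


if __name__ == "__main__":
    # Illustrative numbers only: 1000 MB input, 10 slots, 2 s startup, 50 MB/s.
    for maps in (5, 10, 100):
        print(maps, map_phase_seconds(1000, maps, 10, 2.0, 50.0))
```

With these made-up numbers, going from 10 to 100 mappers makes the map phase several times slower, because each extra task pays the startup cost while reading very little data.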

Re: How to define a constant in pig which is evaluated first

2012-05-19 Thread Scott Foster
You could use a shell script to generate the periods and call them using backticks:

    %declare startperiod `stringdatetoepoch 2012/05/17/02 ...`
    %declare endperiod `stringdatetoepoch ...`

scott.

On Sat, May 19, 2012 at 9:56 AM, Mustafi, Priyo wrote:
> Hi All,
> I have some data which has epoch_ti
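The thread doesn't show the `stringdatetoepoch` script itself, so here is a hypothetical Python version of such a helper. It assumes the argument has the form YYYY/MM/DD/HH (as in `2012/05/17/02`) and is a UTC time, and that the script is saved as an executable named `stringdatetoepoch` on the PATH so that Pig's backtick substitution in `%declare` can capture what it prints.

```python
#!/usr/bin/env python3
"""Hypothetical `stringdatetoepoch` helper for Pig's %declare backticks.

Assumption: input looks like YYYY/MM/DD/HH and is a UTC time.
Prints the corresponding Unix epoch time in seconds.
"""
import calendar
import sys
from datetime import datetime


def string_date_to_epoch(text):
    """Convert 'YYYY/MM/DD/HH' (UTC) to seconds since 1970-01-01 00:00 UTC."""
    parsed = datetime.strptime(text, "%Y/%m/%d/%H")
    # calendar.timegm treats the time tuple as UTC (time.mktime would use local time).
    return calendar.timegm(parsed.timetuple())


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Pig substitutes whatever this prints to stdout.
        print(string_date_to_epoch(sys.argv[1]))
    else:
        sys.stderr.write("usage: stringdatetoepoch YYYY/MM/DD/HH\n")
```

Called as `stringdatetoepoch 2012/05/17/02`, this sketch prints 1337220000 (2012-05-17 02:00 UTC in seconds). If the data stores epoch times in milliseconds instead, the value would need to be multiplied by 1000.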

Re: Python UDF "import RE" bug

2012-05-19 Thread Scott Foster
You could install a version of Pig in your home directory and use that instead of the one installed in /usr/lib/pig. Pig itself runs only on the client machine and uses standard interfaces to connect to the cluster and submit MapReduce jobs.

On Thu, May 17, 2012 at 3:23 PM, Saurabh S wrote:
> >

Re: Can no longer do a join

2012-05-05 Thread Scott Foster
I agree with Jagat: you either need to upgrade Hadoop or downgrade Pig. Alternatively, you might try getting Pig 0.8.1, 0.9.2, or 0.10.0 from Apache.

On Fri, May 4, 2012 at 1:11 PM, Nicholas Kolegraff wrote:
> Scott,
> Thanks for the response!
>
> it does not
>

Re: Can no longer do a join

2012-05-04 Thread Scott Foster
This looks like a Hadoop classpath problem, since it can't find the Guava jar file. Does this command return anything?

    pig -x local -secretDebugCmd | sed 's/:/\n/g' | grep guava

scott.

On Thu, May 3, 2012 at 11:56 AM, Nicholas Kolegraff wrote:
> Hi Everyone,
> This doesn't seem to be a *pig* error bu

Re: Python UDF: import re causes error

2012-04-25 Thread Scott Foster
Go to http://jython.org/downloads.html and download Jython version 2.5.2. Follow the installation instructions at http://wiki.python.org/jython/InstallationInstructions to create a 'Standalone' jar, then place the resulting jython.jar file in /usr/lib/pig/lib.

scott.

On Mon, Apr 23, 2012 at 12:00 PM
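For reference, this is the kind of Jython UDF the standalone jar is meant to make work: a hypothetical `extract_domain` function that imports `re`. Pig's Jython engine is expected to supply the `@outputSchema` decorator when the file is registered `using jython`; the small fallback at the top is an assumption added only so the same file can also be exercised with plain Python outside Pig. The code sticks to syntax that Jython 2.5 accepts.

```python
import re

# Fallback for running outside Pig; under Pig's Jython engine the real
# outputSchema decorator should already be defined, so this is skipped.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# scheme://host..., capturing the host up to the first '/', ':', '?' or '#'
DOMAIN_RE = re.compile(r"^[a-z][a-z0-9+.-]*://([^/:?#]+)", re.IGNORECASE)


@outputSchema("domain:chararray")
def extract_domain(url):
    """Return the lowercased host part of a URL, or None if it doesn't match."""
    if url is None:
        return None
    m = DOMAIN_RE.match(url)
    if m:
        return m.group(1).lower()
    return None
```

Registered with `register 'udfs.py' using jython as udf;`, it would be called as `udf.extract_domain(url)`; the file name and the `url` field are placeholders.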

Re: conditional and multiple generate inside foreach?

2011-07-23 Thread Scott Foster
> B = GROUP A BY random;
> C = FOREACH B GENERATE myudf(A);
>
> But I really don't like adding another GROUP BY here.
>
> On Fri, Jul 22, 2011 at 5:23 PM, Scott Foster wrote:
>
>> Hi Dexin,
>> This is the sort of thing I've started using Python UDFs for. See

Re: conditional and multiple generate inside foreach?

2011-07-22 Thread Scott Foster
Hi Dexin,

This is the sort of thing I've started using Python UDFs for. See http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples of how to write the Python code. If your UDF were implemented in Python, you could then do this:

    register 'udfs.py' using jython as udf;
    ...
    B = FOREACH
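Here is a hedged sketch of that approach for the conditional, multiple-row case. The function `classify`, its rules, and the field names are hypothetical. The idea is that a Jython UDF returning a Python list of tuples comes back to Pig as a bag, which FLATTEN turns into zero, one, or several output rows per input row, with no extra GROUP BY. As with other Jython UDFs, `@outputSchema` should come from Pig; the fallback definition is only there so the file also runs under plain Python.

```python
# Fallback so the file can be tested outside Pig; under Pig's Jython engine
# the real outputSchema decorator should already be defined.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("rows:bag{t:tuple(kind:chararray,value:int)}")
def classify(value):
    """Return a list (Pig bag) of (kind, value) tuples for one input value.

    Hypothetical rules: emit ('negative', v) when v < 0 and ('even', v) when
    v is even, so a single input can yield zero, one, or two rows.
    """
    rows = []
    if value is None:
        return rows
    if value < 0:
        rows.append(("negative", value))
    if value % 2 == 0:
        rows.append(("even", value))
    return rows
```

In Pig it might then be used as `rows = FOREACH A GENERATE FLATTEN(udf.classify(value));`, where `A` and `value` stand in for your own relation and field. An input row whose bag comes back empty produces no output row after FLATTEN, which is what gives the conditional behavior.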