Re: Combining small S3 inputs

2014-06-17 Thread John Meagher
Check out https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html. I don't know if there's an S3 version, but this should help. On Tue, Jun 17, 2014 at 4:48 PM, Brian Stempin bstem...@50onred.com wrote: Hi, I was comparing performance of a Hadoop job
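A rough illustration of why a combining input format helps with many small inputs: it packs several small files into one split, so one map task handles many files. The sketch below is a simplified greedy packer. It is not Hadoop's actual CombineFileInputFormat algorithm, which also accounts for node and rack locality. The function name `pack_small_files`, the file names, and the sizes are hypothetical.

```python
def pack_small_files(files, max_split_mb):
    """Greedily group (name, size_mb) pairs into splits of at most max_split_mb.

    Hypothetical helper: files are taken in input order, and a new split is
    started whenever adding the next file would exceed the limit. A single
    file larger than the limit ends up alone in its own split.
    """
    splits = []
    current, current_size = [], 0
    for name, size in files:
        if current and current_size + size > max_split_mb:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Hypothetical input: six small S3 objects, sizes in MB.
files = [("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 50), ("f", 5)]
print(pack_small_files(files, max_split_mb=64))
# [['a', 'b', 'c'], ['d'], ['e', 'f']]  -> 3 map tasks instead of 6
```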

Re: Any way to join two aliases without using CROSS

2014-03-25 Thread John Meagher
Try RANK (http://pig.apache.org/docs/r0.11.0/basic.html#rank): rank each data set, then join on the rank. On Tue, Mar 25, 2014 at 4:03 PM, Christopher Surage csur...@gmail.com wrote: The output I would like to see is (1,2,3,4,5,10,11) (1,2,4,5,7,10,12) (1,5,7,8,9,10,13) On Tue, Mar 25, 2014
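The rank-then-join idea pairs the n-th tuple of one relation with the n-th tuple of the other, which avoids a CROSS. The sketch below mimics it in plain Python: `rank` prepends a 1-based position the way Pig's RANK does when no BY field is given, and `join_on_rank` does an inner join on that column and drops it. Both function names and the sample relations are hypothetical, and the approach assumes both inputs are already in the order you want paired.

```python
def rank(rows):
    """Mimic Pig's RANK without BY: prepend a 1-based position to each tuple."""
    return [(i,) + tuple(row) for i, row in enumerate(rows, start=1)]


def join_on_rank(left, right):
    """Inner-join two ranked relations on the rank column, then drop it."""
    right_by_rank = {r[0]: r[1:] for r in right}
    return [l[1:] + right_by_rank[l[0]] for l in left if l[0] in right_by_rank]


# Hypothetical relations of equal length.
a = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
b = [(10, 11), (10, 12), (10, 13)]
print(join_on_rank(rank(a), rank(b)))
# [(1, 2, 3, 10, 11), (4, 5, 6, 10, 12), (7, 8, 9, 10, 13)]
```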

Re: how to load custom Writable class from sequence file?

2013-09-24 Thread John Meagher
There's a patch that lets any available javax.script language convert any Java object type in the sequence file to Pig types. See https://issues.apache.org/jira/browse/PIG-1777 On Tue, Sep 24, 2013 at 5:22 AM, Dmitriy Ryaboy dvrya...@gmail.com wrote: I assume by

Re: Merging files

2013-07-31 Thread John Meagher
to create? Is it only based on file sizes? On Wed, Jul 31, 2013 at 6:28 AM, John Meagher john.meag...@gmail.com wrote: Here's a great tool for handling exactly that case: https://github.com/edwardcapriolo/filecrush On Wed, Jul 31, 2013 at 2:40 AM, Something Something mailinglist...@gmail.com

Re: foreach in PIG is not working.

2012-07-25 Thread John Meagher
Change using PigStorage(',') to using PigStorage(' '). The delimiter passed into PigStorage does not appear to be correct. On Wed, Jul 25, 2012 at 2:31 PM, yogesh.kuma...@wipro.com wrote: Thanks All :-) yes the file I have uploaded was text file having format (Yogesh 12) (Aashi 13)
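To see why the delimiter matters: PigStorage splits each input line on the delimiter you pass it. If the data is space-separated and you ask for commas, every line comes back as a single field, so a FOREACH that refers to the second field has nothing to work with. The sketch below approximates that splitting with `str.split`. The helper `load_fields` is hypothetical and ignores PigStorage's other behavior such as type casting.

```python
def load_fields(line, delimiter):
    """Split one text line into fields, roughly as PigStorage(delimiter) would."""
    return line.rstrip("\n").split(delimiter)


lines = ["Yogesh 12\n", "Aashi 13\n"]
for delim in (",", " "):
    print(repr(delim), [load_fields(l, delim) for l in lines])
# ',' [['Yogesh 12'], ['Aashi 13']]
# ' ' [['Yogesh', '12'], ['Aashi', '13']]
```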

Re: How can I set the mapper number for pig script?

2012-06-23 Thread John Meagher
Another option is to either reduce the block size of the input data or disable the combine input format and split the data into more files. On Sat, Jun 23, 2012 at 5:58 PM, Yang tedd...@gmail.com wrote: hi Sheng: I had exactly the same problem as you did. right now with hadoop
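Both suggestions work because the number of map tasks roughly follows the number of input splits. The back-of-the-envelope sketch below assumes split size equals block size and that small files are not combined, so each file yields at least one split. It ignores finer details such as the small slack Hadoop allows on a file's last split. The function name and the file sizes are hypothetical.

```python
import math


def estimated_map_tasks(file_sizes_mb, block_size_mb):
    """Rough map-task count: one per block-sized chunk, at least one per file.

    Assumes split size == block size and no combining of small files.
    """
    return sum(max(1, math.ceil(size / block_size_mb)) for size in file_sizes_mb)


files = [1000, 300]  # hypothetical input sizes in MB
print(estimated_map_tasks(files, 128))  # 8 + 3 = 11
print(estimated_map_tasks(files, 64))   # 16 + 5 = 21
```

Halving the block size roughly doubles the number of map tasks. Splitting the same data into more files has a similar effect once combining is turned off.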

Re: ERROR 1070: Could not resolve sum using imports

2012-05-17 Thread John Meagher
UDF names are case-sensitive. Use SUM rather than sum and it will work. On Thu, May 17, 2012 at 11:24 AM, John Morrison john.h.morri...@gmail.com wrote: Hi, I am new to Pig and am unable to use pig builtin functions (please see details below). Is this a CLASSPATH issue? Any ideas on how to resolve?

Re: Pig unit tests minus Java

2012-02-21 Thread John Meagher
You can use JUnit for system tests like that, but it ends up being a mess. You would need a JUnit test that starts Hadoop and any other server pieces you need; then you can use Selenium (http://seleniumhq.org/) for the browser side of the test. On Tue, Feb 21, 2012 at 05:23, Dmitriy Ryaboy