Check out
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html.
I don't know if there's an S3 version, but this should help.
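To illustrate the idea behind combining many small files into fewer splits, here is a minimal plain-Java sketch. It assumes simple greedy packing by file size in input order; the helper `groupBySize` is hypothetical and is not Hadoop's actual algorithm (the real CombineFileInputFormat also takes node and rack locality into account).

```java
import java.util.ArrayList;
import java.util.List;

public class CombineSketch {
    // Hypothetical helper: packs file sizes (in bytes) into groups whose total
    // does not exceed maxSplitSize, in input order. A single file larger than
    // maxSplitSize ends up in a group of its own.
    static List<List<Long>> groupBySize(List<Long> fileSizes, long maxSplitSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentTotal = 0;
        for (long size : fileSizes) {
            // Close the current group if adding this file would overflow it.
            if (!current.isEmpty() && currentTotal + size > maxSplitSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(size);
            currentTotal += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Ten 10 MB files with a 64 MB cap: one group of 6 files, one of 4.
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sizes.add(10 * mb);
        }
        List<List<Long>> groups = groupBySize(sizes, 64 * mb);
        System.out.println(groups.size() + " splits");
    }
}
```

With ten 10 MB files and a 64 MB cap, this yields 2 splits instead of 10, which is the effect that reduces the number of map tasks.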
On Tue, Jun 17, 2014 at 4:48 PM, Brian Stempin bstem...@50onred.com wrote:
Hi,
I was comparing performance of a Hadoop job
Try this: http://pig.apache.org/docs/r0.11.0/basic.html#rank
Rank each data set then join on the rank.
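The "rank, then join on the rank" approach can be sketched in plain Java as below. This only mimics the semantics under stated assumptions: each row gets a 1-based rank in input order (similar in spirit to RANK without a BY clause), and rows with the same rank are concatenated. The input data is hypothetical, since the original datasets are not shown in the thread. In Pig itself the rank is added as an extra field, which you would normally project away after the JOIN.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RankJoinSketch {
    // Assign 1-based ranks in input order.
    static Map<Long, List<Integer>> rank(List<List<Integer>> rows) {
        Map<Long, List<Integer>> ranked = new LinkedHashMap<>();
        long r = 1;
        for (List<Integer> row : rows) {
            ranked.put(r++, row);
        }
        return ranked;
    }

    // Inner join two ranked relations on rank, concatenating matching rows
    // and dropping the rank itself from the output.
    static List<List<Integer>> joinOnRank(Map<Long, List<Integer>> left,
                                          Map<Long, List<Integer>> right) {
        List<List<Integer>> out = new ArrayList<>();
        for (Map.Entry<Long, List<Integer>> e : left.entrySet()) {
            List<Integer> match = right.get(e.getKey());
            if (match != null) {
                List<Integer> joined = new ArrayList<>(e.getValue());
                joined.addAll(match);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical inputs; the thread's real input data is not shown.
        List<List<Integer>> a = List.of(List.of(1, 2, 3), List.of(1, 2, 4));
        List<List<Integer>> b = List.of(List.of(4, 5), List.of(5, 7));
        System.out.println(joinOnRank(rank(a), rank(b)));
    }
}
```

Because this is an inner join, a row without a partner at the same rank in the other dataset is dropped.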
On Tue, Mar 25, 2014 at 4:03 PM, Christopher Surage csur...@gmail.com wrote:
The output I would like to see is
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)
On Tue, Mar 25, 2014
There's a patch available to allow using any available javax.script
language to do the conversion from any Java object type in the
sequence file to pig types. See
https://issues.apache.org/jira/browse/PIG-1777
On Tue, Sep 24, 2013 at 5:22 AM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
I assume by
to create? Is it only based on file
sizes?
On Wed, Jul 31, 2013 at 6:28 AM, John Meagher john.meag...@gmail.com wrote:
Here's a great tool for handling exactly that case:
https://github.com/edwardcapriolo/filecrush
On Wed, Jul 31, 2013 at 2:40 AM, Something Something
mailinglist...@gmail.com
The delimiter passed into PigStorage does not appear to be correct. Your file is space-delimited, so change:
using PigStorage(',')
to:
using PigStorage(' ')
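Here is a small plain-Java sketch of why the delimiter matters. It assumes each line looks like `Yogesh 12` (if the parentheses are literally in the file, they would stay attached to the first and last fields). The helper `splitFields` is hypothetical and only approximates how a delimited loader breaks a line into fields; it is not PigStorage's code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimiterSketch {
    // Hypothetical helper: splits one line on a single-character delimiter.
    // Pattern.quote makes the delimiter literal; the -1 limit keeps trailing
    // empty fields instead of dropping them.
    static String[] splitFields(String line, char delimiter) {
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    public static void main(String[] args) {
        String line = "Yogesh 12";
        // With ',' there is no match, so the whole line becomes one field.
        System.out.println(Arrays.toString(splitFields(line, ',')));
        // With ' ' the name and the number become two separate fields.
        System.out.println(Arrays.toString(splitFields(line, ' ')));
    }
}
```

With `','`, the name and number are loaded as a single field, so a schema such as `(name, age)` would see no value for the second field.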
On Wed, Jul 25, 2012 at 2:31 PM, yogesh.kuma...@wipro.com wrote:
Thanks all :-)
Yes, the file I uploaded was a text file in this format:
(Yogesh 12)
(Aashi 13)
Another option is to either reduce the block size of the input data
or disable the combine input format and split the data into more
files.
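To see why a smaller block size gives more map tasks, here is a back-of-the-envelope sketch. It assumes a simplified model of one split per block, i.e. ceil(fileSize / blockSize); real Hadoop split computation also applies minimum/maximum split sizes and a small slop factor, and Pig may combine small splits, so the numbers are approximate. `estimateSplits` is a hypothetical helper.

```java
public class SplitCountSketch {
    // Simplified estimate: one split per block, i.e. ceil(fileSize / blockSize).
    // This ignores min/max split settings and split combination.
    static long estimateSplits(long fileSizeBytes, long blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        // Integer ceiling division without floating point.
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 1024 * mb; // a 1 GB input file
        System.out.println(estimateSplits(file, 128 * mb)); // 8
        System.out.println(estimateSplits(file, 32 * mb));  // 32
    }
}
```

Under this model, cutting the block size from 128 MB to 32 MB quadruples the number of splits, and with it the potential parallelism.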
On Sat, Jun 23, 2012 at 5:58 PM, Yang tedd...@gmail.com wrote:
Hi Sheng:
I had exactly the same problem as you did.
Right now with Hadoop
The built-in UDF names are case-sensitive. Use SUM (uppercase) and it will work.
On Thu, May 17, 2012 at 11:24 AM, John Morrison
john.h.morri...@gmail.com wrote:
Hi,
I am new to Pig and am unable to use Pig built-in functions (please see
details below).
Is this a CLASSPATH issue?
Any ideas on how to resolve?
You can use JUnit for system tests like that, but it ends up being a
mess. You would need a JUnit test that starts Hadoop and any other
server pieces you need; then you can use Selenium
(http://seleniumhq.org/) for the browser side of the test.
On Tue, Feb 21, 2012 at 05:23, Dmitriy Ryaboy