odeler. Can anyone point me to a tutorial for
> getting up to speed modeling data in the Hadoop environment?
>
>
>
> Thanks,
>
> Chris
>
>
>
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
>>>>>> On Wed, Jun 24, 2015 at 12:05 PM, Ravikant Dindokar <ravikant.i...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Hadoop user,
>>>>>>>
>>>>>>> I want to use Hadoop to perform operations on graph data.
>>>>>>> I have two files:
>>>>>>>
>>>>>>> 1. Edge list file
>>>>>>> This file contains one line for each edge in the graph.
>>>>>>> sample:
>>>>>>> 1 2 (here 1 is the source node and 2 is the sink node of the edge)
>>>>>>> 1 5
>>>>>>> 2 3
>>>>>>> 4 2
>>>>>>> 4 3
>>>>>>> 5 6
>>>>>>> 5 4
>>>>>>> 5 7
>>>>>>> 7 8
>>>>>>> 8 9
>>>>>>> 8 10
>>>>>>>
>>>>>>> 2. Partition file:
>>>>>>> This file contains one line for each vertex. Each line has
>>>>>>> two values: the first number is the vertex id and the second is the partition id.
>>>>>>> sample:
>>>>>>> 2 1
>>>>>>> 3 1
>>>>>>> 4 1
>>>>>>> 5 2
>>>>>>> 6 2
>>>>>>> 7 2
>>>>>>> 8 1
>>>>>>> 9 1
>>>>>>> 10 1
>>>>>>>
>>>>>>>
>>>>>>> The edge list file is 32 GB, while the partition file is
>>>>>>> 10 GB.
>>>>>>> (The size is so large that map/reduce can read only the partition file. I
>>>>>>> have a 20-node cluster with 24 GB of memory per node.)
>>>>>>>
>>>>>>> My aim is to get all vertices (along with their adjacency lists)
>>>>>>> that have the same partition id into one reducer, so that I can perform
>>>>>>> further analytics on a given partition in the reducer.
>>>>>>>
>>>>>>> Is there any way in Hadoop to join these two files in the mapper
>>>>>>> so that I can map based on the partition id?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ravikant
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harshit Mathur
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Harshit Mathur
>>>
>>
>>
>
>
> --
> Harshit Mathur
>
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
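One common MapReduce answer to Ravikant's question is a reduce-side join on vertex id, followed by a second job that re-keys each joined vertex by its partition id. The Python below is a minimal in-memory sketch of that two-job flow, not Hadoop API code: `run_mapreduce`, the `"edges"`/`"partitions"` tags and the other function names are hypothetical, and it assumes each input line holds two whitespace-separated integers, as in the samples.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # a reduce task also sees its keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: reduce-side join on vertex id. Records are (source_file, line) pairs,
# so the mapper knows which file a line came from (hypothetical tagging scheme).
def join_mapper(record):
    source, line = record
    a, b = (int(x) for x in line.split())
    if source == "edges":
        yield a, ("E", b)  # source vertex -> one outgoing edge
    else:
        yield a, ("P", b)  # vertex -> its partition id


def join_reducer(vertex, values):
    partition = None
    adjacency = []
    for tag, value in values:
        if tag == "P":
            partition = value
        else:
            adjacency.append(value)
    if partition is not None:  # vertices missing from the partition file are dropped
        yield partition, (vertex, sorted(adjacency))


# Job 2: re-key by partition id so a single reduce call sees a whole partition.
def partition_mapper(record):
    partition, vertex_and_adjacency = record
    yield partition, vertex_and_adjacency


def partition_reducer(partition, vertices):
    yield partition, sorted(vertices)


if __name__ == "__main__":
    edges = ["1 2", "1 5", "2 3", "4 2", "4 3", "5 6", "5 4", "5 7", "7 8", "8 9", "8 10"]
    partitions = ["2 1", "3 1", "4 1", "5 2", "6 2", "7 2", "8 1", "9 1", "10 1"]
    tagged = [("edges", l) for l in edges] + [("partitions", l) for l in partitions]
    joined = run_mapreduce(tagged, join_mapper, join_reducer)
    for partition, vertices in run_mapreduce(joined, partition_mapper, partition_reducer):
        print(partition, vertices)
```

In a real Java job the tag would more typically come from giving each input path its own mapper (for example via `MultipleInputs`). The second shuffle is what lands every vertex of a partition on one reducer; a map-side join would skip the first shuffle but needs the partition file in every task's memory, which at 10 GB looks impractical on 24 GB nodes.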
n idea, that will be great.
>
> Thanks
> Krish
>
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
On Fri, Oct 17, 2014 at 11:06 AM, Adaryl "Bob" Wakefield, MBA <adaryl.wakefi...@hotmail.com> wrote:
>
>> Does anybody have any performance figures on how Spark stacks up
>> against Tez? If you don’t have figures, does anybody have an opinion? Spark
>> seems so popular but I’m not really seeing why.
>> B.
>>
>
>
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
"hello world"}
> {"author":"foo234", "text": "hello this world"}
>
> So I want to do a word count on the text part.
> I understand that in the mapper I just have to parse this data as JSON and
> extract "text", and the rest of the code is the same, but I am trying to
> switch from Python to Java Hadoop.
> How do I do this?
> Thanks
>
--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
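The only JSON-specific work is in the mapper: parse each line, pull out "text", and split it; the rest is ordinary word count. A minimal Python sketch of that logic using only the standard library, run locally as map, sort, reduce, and assuming one JSON object per line with a "text" key as in the sample records. In a Java `Mapper` the same two steps (parse, then tokenize) would go in `map()`, with whichever JSON library you pick.

```python
import json
from itertools import groupby


def mapper(lines):
    """Emit (word, 1) for every word in the "text" field of each JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for word in record.get("text", "").split():
            yield word, 1


def reducer(sorted_pairs):
    """Sum the counts per word; expects pairs sorted by word, as after the shuffle."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    sample = [
        '{"author": "a1", "text": "hello world"}',
        '{"author": "foo234", "text": "hello this world"}',
    ]
    # sorted() plays the role of Hadoop's shuffle/sort between map and reduce
    for word, total in reducer(sorted(mapper(sample))):
        print(f"{word}\t{total}")
```

In an actual Hadoop Streaming job the mapper would read `sys.stdin` and print `word\t1` lines, and the reducer would read the sorted lines back; the JSON handling stays the same.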
AM, Russell Jurney <russell.jur...@gmail.com> wrote:
>> >>
>> >>
>> >>
http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
AccumuloStorage for Pig comes with Accumulo. The easiest way would be to try it.
Russell Jurney http://datasyndrome.com
On Mar 4, 2013, at 5:30 AM, Aji Janis wrote:
Hello,
I
Hadoop streaming can do this, and there's been some discussion in the past,
but it's not a core use case. Check the list archives.
Russell Jurney http://datasyndrome.com
On Jan 17, 2013, at 9:25 AM, Jeremy Lewi wrote:
I don't think running hadoop on a GPU cluster is a commo
Hourly consultants may prefer MapReduce. Everyone else should be using Pig,
Hive, Cascading, etc.
Russell Jurney twitter.com/rjurney
On Nov 7, 2012, at 8:08 PM, yogesh dhari wrote:
Thanks Bejoy Sir,
I am always grateful to you for your help.
Please explain these words in simple language with
You just made my year. Let me know how I can make it better (off list).
Russell Jurney twitter.com/rjurney
On Oct 29, 2012, at 2:17 PM, "Daniel Käfer" wrote:
> Thank you, that book is exactly what i'm looking for.
>
> Regards
> Daniel Käfer
>
> On Saturday, the
Russell Jurney http://datasyndrome.com
On Oct 25, 2012, at 12:24 PM, "Daniel Käfer" wrote:
> Hello all,
>
> I'm looking for a reference architecture for Hadoop. The only result I
> found is the Lambda architecture from Nathan Marz[0].
>
> By architecture I mean
I define one of these in the book Agile Data, from O'Reilly. I express
opinions on all matters you query us about. But you don't have to take
my word for it...
It's a reading rainbow!
Jordi!
Russell Jurney http://datasyndrome.com
On Oct 27, 2012, at 1:09 AM, "Daniel
>> Date: Thursday, October 11, 2012 12:36 PM
>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
My own clusters are too temporary and virtual for me to notice. I haven't
thought of clock speed as having mattered in a long time, so I'm curious
what kind of use cases might benefit from faster cores. Is there a category
of workload where this sweet spot for faster cores occurs?
Russ
Anyone got data on this? This is interesting, and somewhat counter-intuitive.
Russell Jurney http://datasyndrome.com
On Oct 11, 2012, at 10:47 AM, Jay Vyas wrote:
> Presumably, if you have a reasonable number of cores - speeding the cores up
> will be better than forking a task into s
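Jay's point can be framed with Amdahl's law: if only a fraction p of a task parallelizes, spreading it over n workers gives a speedup of 1 / ((1 - p) + p/n), while a core that is r times faster speeds up the whole task by r. A small Python sketch of that comparison; it assumes idealized CPU-bound work that scales linearly with clock speed, ignores task start-up and shuffle overhead (which in practice penalize the many-workers option), and the function names are hypothetical.

```python
import math


def amdahl_speedup(parallel_fraction, n_workers):
    """Speedup when only parallel_fraction of the work is spread over n_workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def breakeven_workers(parallel_fraction, clock_ratio):
    """Workers needed to match one core that is clock_ratio times faster
    (idealized: CPU-bound work that scales linearly with clock speed)."""
    serial_fraction = 1.0 - parallel_fraction
    gap = 1.0 / clock_ratio - serial_fraction
    if gap <= 0:
        # The serial part alone caps the speedup at 1/serial_fraction <= clock_ratio.
        return math.inf
    return parallel_fraction / gap


if __name__ == "__main__":
    for p in (0.5, 0.95):
        print(p, amdahl_speedup(p, 8), breakeven_workers(p, 2.0))
```

For example, with p = 0.5 no number of workers ever matches a core twice as fast, whereas with p = 0.95 three workers already beat it. Jobs with a large serial or poorly parallelizable share are one plausible category where faster cores win.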
r messes.
>>
>> As to the ninjas... sorry that sugar high or even caffeine high can be
>> deadly.
>>
>> Definitely not a good mix. Gluten-free foods with simple chicken and fish
>> work best.
>>
>>
>> On Sep 7, 2012, at 12:10 AM, Russell Jurney
With the pastries, I feel like you're calling me fat. And that they're a
distraction for the Ninjas.
Russell Jurney http://datasyndrome.com
On Sep 6, 2012, at 10:05 PM, sathyavageeswaran
wrote:
Yah that would be great!
*From:* Fabio Pitzolu [mailto:fabio.pitz...@gr-ci.com]
HR is giving us crap over our use of pirates for business development.
Russell Jurney http://datasyndrome.com
On Sep 6, 2012, at 6:02 AM, Michael Segel wrote:
Why can't we use our Ninjas?
They are sitting on the bench.
On Sep 6, 2012, at 7:52 AM, Russell Jurney wrote:
Also there
forward them along, as we already have a sizable bill outstanding
(aforementioned copy and reply fees as well as two days back-retainer for a
total of $6,000 US) and billing is hounding me for collection. Please don't
make us use ninjas.
Russell Jurney http://datasyndrome.com
On Sep 5, 2012,
-one-avroizing-the-enron-emails/
http://hortonworks.com/blog/the-data-lifecycle-part-two-mining-avros-with-pig-consuming-data-with-hive/
Russell Jurney http://datasyndrome.com
On Aug 23, 2012, at 8:58 AM, rajesh bathala wrote:
> Hi Friends,
>
> I am new to Hadoop. Can you please let us
s. Order of processing is important insofar as related messages
> need to be processed in sequence; hence, today all related messages go to the
> same queue and are processed by the same queue consumer.
> >>
> >> The idea would be to replace the use of MQ with some kind of reliab
uling
jobs, look at Oozie and Azkaban.
Russell Jurney http://datasyndrome.com
On Aug 19, 2012, at 9:47 AM, Robert Nicholson
wrote:
> We have an application or a series of applications that listen to incoming
> feeds; they then distribute this data in XML form to a number of queues.
>
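In MapReduce terms, "all related messages go to the same consumer, in order" corresponds to partitioning by a correlation key and then sorting by a sequence number within each key (the secondary-sort pattern). A minimal in-memory Python sketch, assuming each message carries a hypothetical correlation key and sequence number; it is not the Hadoop API. Hadoop's default `HashPartitioner` does the equivalent routing with the key's `hashCode()`.

```python
from collections import defaultdict
from zlib import crc32


def partition_for(key, num_reducers):
    """Deterministically map a correlation key to a reducer index."""
    # crc32 rather than hash(): Python's str hash is randomized per process,
    # and every mapper must agree on where a key goes.
    return crc32(key.encode("utf-8")) % num_reducers


def route_and_order(messages, num_reducers):
    """Group (correlation_key, sequence_no, payload) messages per reducer and key,
    ordered by sequence number within each key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, seq, payload in messages:
        reducers[partition_for(key, num_reducers)][key].append((seq, payload))
    for groups in reducers:
        for batch in groups.values():
            batch.sort(key=lambda item: item[0])  # "secondary sort" on sequence_no
    return reducers


if __name__ == "__main__":
    feed = [("acct-1", 2, "b"), ("acct-2", 1, "x"), ("acct-1", 1, "a"), ("acct-1", 3, "c")]
    for index, groups in enumerate(route_and_order(feed, num_reducers=2)):
        for key, batch in groups.items():
            print(index, key, [payload for _, payload in batch])
```

Different keys still spread across reducers for throughput; ordering is only guaranteed within a key, which matches the "same queue, same consumer" guarantee described above.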