Re: Why my tests shows Yarn is worse than MRv1 for terasort?

2013-06-18 Thread Michel Segel
Sam, I think your cluster is too small for any meaningful conclusions to be made. Sent from a remote device. Please excuse any typos... Mike Segel On Jun 18, 2013, at 3:58 AM, sam liu wrote: > Hi Harsh, > > Thanks for your detailed response! Now, the efficiency of my Yarn cluster > improved

Re: recovery accidently deleted pig script

2013-06-13 Thread Michel Segel
Well if the script was sitting on the cluster... Then it would be a Hadoop question. "How do you recover a file that was deleted on HDFS?" Which is an interesting question... But the OP said it wasn't on HDFS, and to your point... One can only say sorry dude, bummer, rewrite it. Sorry you're hav

Re: 600s timeout during copy phase of job

2013-05-13 Thread Michel Segel
That doesn't make sense... Try introducing a combiner step. Sent from a remote device. Please excuse any typos... Mike Segel On May 13, 2013, at 3:30 AM, shashwat shriparv wrote: > > On Mon, May 13, 2013 at 11:35 AM, David Parks wrote: >> (I’ve got 8 reducers, 1-per-core, 25 i > > Reduc
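A rough illustration of what a combiner step buys you, in plain Python rather than Hadoop code. The function names and the count-style records are assumptions made for this sketch. It only shows that pre-aggregating each mapper's output shrinks the number of records that have to be copied to the reducers. Whether that resolves the timeout in this thread is not something the message establishes. A combiner is only safe when the reduce operation is associative and commutative, as a sum is.

```python
from collections import defaultdict


def combine(map_output):
    """Pre-aggregate one mapper's (key, count) pairs before the shuffle."""
    totals = defaultdict(int)
    for key, count in map_output:
        totals[key] += count
    return sorted(totals.items())


def shuffled_records(mapper_outputs, use_combiner):
    """Return the records that would be copied to the reducers."""
    records = []
    for output in mapper_outputs:
        records.extend(combine(output) if use_combiner else output)
    return records


def reduce_counts(records):
    """Sum the counts per key, as a summing reducer would."""
    return dict(combine(records))


# Hypothetical output of two mappers (assumption for illustration).
mapper_outputs = [
    [("a", 1), ("a", 1), ("b", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]
without = shuffled_records(mapper_outputs, use_combiner=False)
with_combiner = shuffled_records(mapper_outputs, use_combiner=True)
```

The final result is the same either way. Only the volume moved during the copy phase changes.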

Re: Using FairScheduler to limit # of tasks

2013-05-13 Thread Michel Segel
Using fair scheduler or capacity scheduler, you are creating a queue that is being applied to the cluster. Having said that, you can limit who uses the special queue as well as specify the queue at the start of your job as a command line option. HTH Sent from a remote device. Please excuse an

Re: Hardware Selection for Hadoop

2013-05-06 Thread Michel Segel
8 physical cores is so 2009 - 2010 :-) Intel now offers a chip w 10 physical cores on a die. You are better off thinking of 4-8 GB per physical core. It depends on what you want to do, and what you think you may want to do... It also depends on the price points of the hardware. Memory, drives,
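As a back-of-the-envelope check on the 4-8 GB per physical core rule of thumb, here is a small Python sketch. The function name and the dual-socket example are assumptions for illustration, not figures from the thread.

```python
def memory_range_gb(sockets, cores_per_socket, low_gb_per_core=4, high_gb_per_core=8):
    """Return the (low, high) RAM range in GB implied by a per-core rule of thumb."""
    physical_cores = sockets * cores_per_socket
    return physical_cores * low_gb_per_core, physical_cores * high_gb_per_core


# Hypothetical node: two sockets with 10 physical cores each.
low, high = memory_range_gb(sockets=2, cores_per_socket=10)
```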

Re: M/R job to a cluster?

2013-04-29 Thread Michel Segel
This is one of the reasons we set up edge nodes in the cluster. This is a node where Hadoop is loaded yet none of the Hadoop services are running. This allows jobs to automatically pick up the right Hadoop configuration from the node and point to the right cluster. The edge nodes are used for

Re: Best Hadoop dev environment [WAS: RE: Few noob MR questions]

2013-04-14 Thread Michel Segel
I tend to use a real cluster so that I can test at a reasonable fraction of scale. I've seen some instances where code that ran 'okay' in a VM failed to perform adequately at scale. Sent from a remote device. Please excuse any typos... Mike Segel On Apr 14, 2013, at 2:19 AM, Jens Scheidtmann

Re: How can I record some position of context in Reduce()?

2013-04-12 Thread Michel Segel
t in SQL form > > select * from table1, table2 where table1.attr < table2.attr > > it is also called theta join where theta can be <, >, <=,>=,!= > > > > On Wed, Apr 10, 2013 at 9:35 PM, Michel Segel > wrote: >> Not sure what is meant by a

Re: How can I record some position of context in Reduce()?

2013-04-10 Thread Michel Segel
> Regards, > Vikas > > > > On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel > wrote: >> Can you show an example of your join? >> All joins are an equality in that the key has to match. >> Whether its a one to one , one to many, or many to many remains to b

Re: How can I record some position of context in Reduce()?

2013-04-10 Thread Michel Segel
Can you show an example of your join? All joins are an equality in that the key has to match. Whether it's a one to one, one to many, or many to many remains to be seen. Sent from a remote device. Please excuse any typos... Mike Segel On Apr 9, 2013, at 10:35 AM, Effyroth Gu wrote: > Only equ

Re: Group names for custom Counters

2013-03-22 Thread Michel Segel
Just a suggestion, look at dynamic counters... For the group, just create a group name and you are done. Sent from a remote device. Please excuse any typos... Mike Segel On Mar 22, 2013, at 11:17 AM, Tony Burton wrote: > Hi list, > > I'm using Hadoop 1.0.3 and creating some custom Counters i

Re: S3N copy creating recursive folders

2013-03-06 Thread Michel Segel
Have you tried using distcp? Sent from a remote device. Please excuse any typos... Mike Segel On Mar 5, 2013, at 8:37 AM, Subroto wrote: > Hi, > > Its not because there are too many recursive folders in S3 bucket; in-fact > there is no recursive folder in the source. > If I list the S3 bucke

Re: Transpose

2013-03-06 Thread Michel Segel
Sandy, Remember KISS. Don't try to read it in as anything but just a text line. It's really a 3x3 matrix in what looks to be grouped by columns. Your output will drop the initial key, and you then parse the lines and then output it. Without further explanation, it looks like each tuple is uniq

Re: Execution handover in map/reduce pipeline

2013-03-06 Thread Michel Segel
RTFM? Yes you can do this. See Oozie. When you have a cryptic name, you get a cryptic answer. Sent from a remote device. Please excuse any typos... Mike Segel On Mar 5, 2013, at 5:35 PM, Public Network Services wrote: > Hi... > > I have an application that processes large amounts of propr

Re: Transpose

2013-03-05 Thread Michel Segel
Yes you can. You read in the row in each iteration of Mapper.map() Text input. You then output 3 times to the collector one for each row of the matrix. Spin, sort, and reduce as needed. Sent from a remote device. Please excuse any typos... Mike Segel On Mar 5, 2013, at 9:11 AM, Mix Nin wrote:
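The map/sort/reduce flow described in this reply can be sketched in plain Python, without Hadoop. This is a simulation under stated assumptions, not the poster's code: the whitespace-delimited input, the function names, and the choice to key each value by its column index are all illustrative. The `shuffle` function stands in for the framework's shuffle and sort.

```python
from collections import defaultdict


def map_row(row_index, line):
    """Emit one (column_index, (row_index, value)) pair per field of a text row."""
    for col_index, value in enumerate(line.split()):
        yield col_index, (row_index, value)


def shuffle(pairs):
    """Group mapper output by key and sort by key, like the shuffle/sort phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_column(values):
    """Order values by original row index to build one row of the transpose."""
    return " ".join(value for _, value in sorted(values))


def transpose(lines):
    """Run the simulated map, shuffle, and reduce steps over the input lines."""
    pairs = [pair for i, line in enumerate(lines) for pair in map_row(i, line)]
    return [reduce_column(values) for _, values in shuffle(pairs)]


result = transpose(["1 2 3", "4 5 6", "7 8 9"])
```

For a 3x3 input, each call to `map_row` emits three pairs, matching the "output 3 times" step in the reply.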

Re: How to take Whole Database From RDBMS to HDFS Instead of Table/Table

2013-02-27 Thread Michel Segel
I wouldn't use sqoop if you are taking everything. Simpler to write your own java/jdbc program that writes its output to HDFS. Just saying... Sent from a remote device. Please excuse any typos... Mike Segel On Feb 27, 2013, at 5:15 AM, samir das mohapatra wrote: > thanks all. > > > > On W

Re: One NameNode keeps Rolling Edit Log on HDFS Federation

2013-02-24 Thread Michel Segel
I think part of the confusion stems from the fact that federation of name nodes only splits the very large cluster into smaller portions of the same cluster. If you lose a federated name node, you only lose a portion of the cluster not the whole thing. So now instead of one SPOF, you have two S

Re: Hadoop efficient resource isolation

2013-02-21 Thread Michel Segel
Not sure what the question is... Have you looked at either the fair scheduler or better yet capacity scheduler? Sent from a remote device. Please excuse any typos... Mike Segel On Feb 21, 2013, at 5:16 AM, Dhanasekaran Anbalagan wrote: > Hi Guys, > > It's possible isolation job submission f

Re: Delivery Status Notification (Failure)

2013-02-14 Thread Michel Segel
I'm confused... Why is this not a general how to on Hive? Is there something special about the CDH distro? IMHO questions like these aren't distro specific, are they? -Mike Sent from a remote device. Please excuse any typos... Mike Segel On Feb 12, 2013, at 9:42 PM, Arun C Murthy wrote: > Pl

Re: [OT] MapR m3

2013-02-11 Thread Michel Segel
Depends on the question. Everything above MapRFS is pretty much the same. Why be a hater? Sent from a remote device. Please excuse any typos... Mike Segel On Feb 11, 2013, at 6:52 AM, Alexander Alten-Lorenz wrote: > Please refer to a mapr mailinglist, thats a generic Apache Hadoop Users > m

Re: Maximum Storage size in a Single datanode

2013-01-30 Thread Michel Segel
Can you say Centos? :-) Sent from a remote device. Please excuse any typos... Mike Segel On Jan 30, 2013, at 4:21 AM, Jean-Marc Spaggiari wrote: > Hi, > > Also, think about the memory you will need in your DataNode to serve > all this data... I'm not sure there is any server which can take t

Re: hadoop namenode recovery

2013-01-17 Thread Michel Segel
MapR was the first vendor to remove the NN as a SPOF. They did this w their 1.0 release when it first came out. The downside is that their release is proprietary and very different in terms of the underlying architecture from Apache based releases. Hortonworks relies on VMware as a key piece of

Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer

2012-12-28 Thread Michel Segel
Sounds like someone is cheating on a test... Sent from a remote device. Please excuse any typos... Mike Segel On Dec 28, 2012, at 3:10 PM, Ted Dunning wrote: > Answer B sounds pathologically bad to me. > > A or C are the only viable options. > > Neither B nor D work. B fails because it woul

Re: Multiuser setup on Hive

2012-11-22 Thread Michel Segel
User 2 has the permission to delete database2 because he created it. Did the OP mean that user1 can delete it? If so there are permissions that would prevent that. Sent from a remote device. Please excuse any typos... Mike Segel On Nov 22, 2012, at 2:41 AM, Alexander Alten-Lorenz wrote: > Y

Re: Legal Matter

2012-09-09 Thread Michel Segel
I don't know where the pirates came from, but you need to send pastries to HR so that you can send the ninjas as long as you tell HR that they are doing an international gig otherwise they will complain about OSHA. Whatever you do, don't get legal involved. They will drag this out and by the tim

Re: Legal Matter

2012-09-09 Thread Michel Segel
You're missing something... ;-) Sent from a remote device. Please excuse any typos... Mike Segel On Sep 7, 2012, at 8:01 PM, Deepak Kapoor wrote: > What does all this have to do with Hadoop? Or have I missed something here. > > On Sat, Sep 8, 2012 at 10:59 AM, Lance Norskog wrote: > Even wor

Re: SNN

2012-09-04 Thread Michel Segel
Which distro? Saw this happen, way back when with a Cloudera release. Check your config files too... Sent from a remote device. Please excuse any typos... Mike Segel On Sep 4, 2012, at 3:22 AM, surfer wrote: > Hi > > When I start my cluster (with start-dfs.sh), secondary namenodes are > c

Re: datanode startup before hostname is resovable

2012-08-08 Thread Michel Segel
So you're running a pseudo cluster... Take out the boot up starting of the cluster and start the cluster manually. Even w DHCP, you shouldn't always get a new ip address because your lease shouldn't expire that quickly... Manually start Hadoop... Sent from a remote device. Please excuse any t