Sam,
I think your cluster is too small for any meaningful conclusions to be made.
Sent from a remote device. Please excuse any typos...
Mike Segel
On Jun 18, 2013, at 3:58 AM, sam liu wrote:
> Hi Harsh,
>
> Thanks for your detailed response! Now, the efficiency of my Yarn cluster
> improved
Well if the script was sitting on the cluster... Then it would be a Hadoop
question.
How do you recover a file that was deleted on HDFS?
Which is an interesting question...
But the OP said it wasn't on HDFS, and to your point... One can only say sorry
dude, bummer, rewrite it.
Sorry you're hav
That doesn't make sense...
Try introducing a combiner step.
Mike Segel
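A minimal sketch of what the combiner suggestion above buys you, in plain Python rather than Hadoop code. The word-count job, the function names, and the two input splits are all assumptions made up for illustration; the original thread does not say what the job computes. The point is only that a combiner pre-aggregates each map task's output, so fewer records reach the reducers. It is only safe when the reduce operation is associative and commutative (a sum, as here).

```python
from collections import Counter

def map_records(lines):
    """Hypothetical mapper: emit (word, 1) for every word in the split."""
    for line in lines:
        for word in line.split():
            yield word, 1

def combine(pairs):
    """Combiner: sum the counts locally, per map task, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reduce_all(partials):
    """Reducer: sum whatever arrives from every map task."""
    totals = Counter()
    for pairs in partials:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)

# Two made-up input splits, one per map task.
splits = [["a b a", "a c"], ["b b a"]]

without_combiner = [list(map_records(s)) for s in splits]
with_combiner = [combine(map_records(s)) for s in splits]

shuffled_without = sum(len(p) for p in without_combiner)
shuffled_with = sum(len(p) for p in with_combiner)
```

Both paths give the same totals from reduce_all(); only the number of records crossing the shuffle changes.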
On May 13, 2013, at 3:30 AM, shashwat shriparv
wrote:
>
> On Mon, May 13, 2013 at 11:35 AM, David Parks wrote:
>> (I’ve got 8 reducers, 1-per-core, 25 i
>
> Reduc
Using fair scheduler or capacity scheduler, you are creating a queue that is
being applied to the cluster.
Having said that, you can limit who uses the special queue as well as specify
the queue at the start of your job as a command line option.
HTH
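A rough sketch of the command line option mentioned above, written as a small Python helper that only builds the argument list. The jar, class, queue, and path names are hypothetical. The property is mapred.job.queue.name on Hadoop 1.x and mapreduce.job.queuename on Hadoop 2.x; the -D generic option is only picked up if the job's driver goes through ToolRunner / GenericOptionsParser, which is assumed here.

```python
def hadoop_job_command(jar, main_class, queue, args, hadoop2=False):
    """Build a 'hadoop jar' command that submits the job to a named queue.

    Assumes the driver class parses generic options (ToolRunner), otherwise
    the -D setting is ignored.
    """
    prop = "mapreduce.job.queuename" if hadoop2 else "mapred.job.queue.name"
    return ["hadoop", "jar", jar, main_class, "-D", f"{prop}={queue}", *args]

# Hypothetical names throughout; nothing here comes from the original thread.
cmd = hadoop_job_command("myjob.jar", "com.example.MyJob", "special",
                         ["/input", "/output"])
print(" ".join(cmd))
```

Who is allowed to submit to that queue is still controlled on the cluster side, in the scheduler's configuration.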
8 physical cores is so 2009 - 2010 :-)
Intel now offers a chip w 10 physical cores on a die.
You are better off thinking of 4-8 GB per physical core.
It depends on what you want to do, and what you think you may want to do...
It also depends on the price points of the hardware. Memory, drives,
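As a quick worked example of the 4-8 GB per physical core rule of thumb above (the function name is made up, and it only multiplies the numbers quoted in the message; it says nothing about drives or price):

```python
def memory_range_gb(physical_cores, low_gb_per_core=4, high_gb_per_core=8):
    """Return the (low, high) RAM range in GB for a node with this many physical cores."""
    return physical_cores * low_gb_per_core, physical_cores * high_gb_per_core

# 8 physical cores versus a 10-core chip, taken as the core count of one node.
eight_core = memory_range_gb(8)
ten_core = memory_range_gb(10)
```

So an 8-core node lands at 32-64 GB and a 10-core node at 40-80 GB under that rule.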
This is one of the reasons we set up edge nodes in the cluster. This is a node
where Hadoop is loaded yet none of the Hadoop services are running. This
allows jobs to automatically pick up the right Hadoop configuration from the
node and point to the right cluster.
The edge nodes are used for
I tend to use a real cluster so that I can test at a reasonable fraction of
scale.
I've seen some instances where code that ran 'okay' in a VM failed to perform
adequately at scale.
Mike Segel
On Apr 14, 2013, at 2:19 AM, Jens Scheidtmann
t in SQL form
>
> select * from table1, table2 where table1.attr < table2.attr
>
> it is also called a theta join, where theta can be <, >, <=, >=, !=
>
>
>
> On Wed, Apr 10, 2013 at 9:35 PM, Michel Segel
> wrote:
>> Not sure what is meant by a
> Regards,
> Vikas
>
>
>
> On Wed, Apr 10, 2013 at 4:22 PM, Michel Segel
> wrote:
>> Can you show an example of your join?
>> All joins are an equality in that the key has to match.
>> Whether it's a one-to-one, one-to-many, or many-to-many remains to b
Can you show an example of your join?
All joins are an equality in that the key has to match.
Whether it's a one-to-one, one-to-many, or many-to-many remains to be seen.
Mike Segel
On Apr 9, 2013, at 10:35 AM, Effyroth Gu wrote:
> Only equ
Just a suggestion, look at dynamic counters...
For the group, just create a group name and you are done.
Mike Segel
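A small sketch of the dynamic counter idea above. In the Java API a dynamic counter is just a group name plus a counter name, both plain strings (for example Reporter.incrCounter(group, counter, amount) in the old API, or context.getCounter(group, name) in the new one). To keep the example runnable without a cluster, this uses the Hadoop Streaming form instead, where a task bumps a counter by writing a reporter:counter:<group>,<counter>,<amount> line to stderr. The group and counter names below are made up.

```python
import sys

def counter_line(group, counter, amount=1):
    """Format the stderr line Hadoop Streaming interprets as a counter update."""
    return f"reporter:counter:{group},{counter},{amount}"

def increment(group, counter, amount=1, stream=sys.stderr):
    """Bump a dynamic counter; the group springs into existence on first use."""
    stream.write(counter_line(group, counter, amount) + "\n")

# Hypothetical group and counter names, created on the fly.
increment("MyGroup", "MALFORMED_RECORDS")
increment("MyGroup", "BYTES_SKIPPED", 128)
```

No enum is needed; picking a group name is all it takes to create the group.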
On Mar 22, 2013, at 11:17 AM, Tony Burton wrote:
> Hi list,
>
> I'm using Hadoop 1.0.3 and creating some custom Counters i
Have you tried using distcp?
Mike Segel
On Mar 5, 2013, at 8:37 AM, Subroto wrote:
> Hi,
>
> It's not because there are too many recursive folders in S3 bucket; in fact
> there is no recursive folder in the source.
> If I list the S3 bucke
Sandy,
Remember KISS.
Don't try to read it in as anything but just a text line.
It's really a 3x3 matrix that looks to be grouped by columns.
Your output will drop the initial key, and you then parse the lines and then
output it.
Without further explanation, it looks like each tuple is uniq
RTFM?
Yes you can do this. See Oozie.
When you have a cryptic name, you get a cryptic answer.
Mike Segel
On Mar 5, 2013, at 5:35 PM, Public Network Services
wrote:
> Hi...
>
> I have an application that processes large amounts of propr
Yes you can.
You read in one row, as Text input, in each iteration of Mapper.map().
You then output 3 times to the collector, once for each row of the matrix.
Spin, sort, and reduce as needed.
Mike Segel
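The steps above can be sketched in plain Python, simulating the map, sort, and reduce phases in memory. The input layout is an assumption, since the original question is not shown here: each text line is taken to be a column index followed by the three values of that column. The mapper emits one record per value, keyed by the row it belongs to; the reducer puts each row's cells back in column order.

```python
from collections import defaultdict

def map_line(line):
    """Mapper: one text line in, three (row, (col, value)) records out."""
    col, *values = line.split()  # assumed layout: "<col> <v0> <v1> <v2>"
    for row, value in enumerate(values):
        yield row, (int(col), value)

def reduce_row(cells):
    """Reducer: order one row's cells by column index and join them."""
    return " ".join(value for _, value in sorted(cells))

def run(lines):
    """Stand-in for the shuffle: group mapper output by key, then reduce in key order."""
    groups = defaultdict(list)
    for line in lines:
        for row, cell in map_line(line):
            groups[row].append(cell)
    return [reduce_row(groups[row]) for row in sorted(groups)]

# A made-up 3x3 matrix stored one column per line.
rows = run(["0 a d g", "1 b e h", "2 c f i"])
```

On a real cluster the grouping and sorting in run() is what the framework does between map and reduce.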
On Mar 5, 2013, at 9:11 AM, Mix Nin wrote:
I wouldn't use sqoop if you are taking everything.
Simpler to write your own Java/JDBC program that writes its output to HDFS.
Just saying...
Mike Segel
On Feb 27, 2013, at 5:15 AM, samir das mohapatra
wrote:
> thanks all.
>
>
>
> On W
I think part of the confusion stems from the fact that federation of name nodes
only splits a very large cluster into smaller portions of the same cluster.
If you lose a federated name node, you only lose a portion of the cluster, not
the whole thing. So now instead of one SPOF, you have two S
Not sure what the question is... Have you looked at either the fair scheduler
or better yet capacity scheduler?
Mike Segel
On Feb 21, 2013, at 5:16 AM, Dhanasekaran Anbalagan wrote:
> Hi Guys,
>
> It's possible isolation job submission f
I'm confused...
Why is this not a general how-to on Hive?
Is there something special about the CDH distro?
IMHO questions like these aren't distro specific, are they?
-Mike
Mike Segel
On Feb 12, 2013, at 9:42 PM, Arun C Murthy wrote:
> Pl
Depends on the question. Everything above MapRFS is pretty much the same.
Why be a hater?
Mike Segel
On Feb 11, 2013, at 6:52 AM, Alexander Alten-Lorenz wrote:
> Please refer to a MapR mailing list, that's a generic Apache Hadoop Users
> m
Can you say CentOS?
:-)
Mike Segel
On Jan 30, 2013, at 4:21 AM, Jean-Marc Spaggiari
wrote:
> Hi,
>
> Also, think about the memory you will need in your DataNode to serve
> all this data... I'm not sure there is any server which can take t
MapR was the first vendor to remove the NN as a SPOF.
They did this w their 1.0 release when it first came out. The downside is that
their release is proprietary and very different in terms of the underlying
architecture from Apache based releases.
Hortonworks relies on VMware as a key piece of
Sounds like someone is cheating on a test...
Mike Segel
On Dec 28, 2012, at 3:10 PM, Ted Dunning wrote:
> Answer B sounds pathologically bad to me.
>
> A or C are the only viable options.
>
> Neither B nor D work. B fails because it woul
User 2 has the permission to delete database2 because he created it.
Did the OP mean that user1 can delete it? If so there are permissions that
would prevent that.
Mike Segel
On Nov 22, 2012, at 2:41 AM, Alexander Alten-Lorenz wrote:
> Y
I don't know where the pirates came from, but you need to send pastries to HR
so that you can send the ninjas as long as you tell HR that they are doing an
international gig otherwise they will complain about OSHA.
Whatever you do, don't get legal involved. They will drag this out and by the
tim
You're missing something... ;-)
Mike Segel
On Sep 7, 2012, at 8:01 PM, Deepak Kapoor wrote:
> What does all this have to do with Hadoop? Or have I missed something here.
>
> On Sat, Sep 8, 2012 at 10:59 AM, Lance Norskog wrote:
> Even wor
Which distro?
Saw this happen way back when, with a Cloudera release.
Check your config files too...
Mike Segel
On Sep 4, 2012, at 3:22 AM, surfer wrote:
> Hi
>
> When I start my cluster (with start-dfs.sh), secondary namenodes are
> c
So you're running a pseudo cluster...
Take the cluster startup out of the boot sequence and start the cluster manually.
Even w DHCP, you shouldn't always get a new IP address because your lease
shouldn't expire that quickly...
Manually start Hadoop...