> One thing I want to clarify: you can use multiple reducers to sort
> the data globally and then cat all the parts to get the top n records. The
> data in all parts is then globally in order.
> Then you may find the problem is much easier.
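For illustration, a driver for such a globally sorted job could look roughly
like this - a sketch only, written against the org.apache.hadoop.mapreduce API
(0.20-based releases ship similar classes under org.apache.hadoop.mapred.lib),
and assuming the input is a SequenceFile of Text keys and values; the class
name, reducer count and sampler parameters are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class GlobalSortDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "global sort");
    job.setJarByClass(GlobalSortDriver.class);
    // Identity mapper/reducer: the shuffle does the sorting, and the
    // total-order partitioner gives every reducer a disjoint key range,
    // so part-00000, part-00001, ... concatenated are globally ordered.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setNumReduceTasks(4);                        // any number of reducers
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Sample the input keys to pick the reducer range boundaries.
    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<Text, Text>(0.01, 10000, 10);
    InputSampler.writePartitionFile(job, sampler);
    job.setPartitionerClass(TotalOrderPartitioner.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Once the job finishes, concatenating the part files in order gives one globally
sorted stream, so the top n records can be read straight off one end of it.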
>
> On 2013-2-2 at 3:18 PM, "praveenesh kumar"
ub.com/4696443
> https://github.com/linkedin/datafu
>
>
> On Fri, Feb 1, 2013 at 11:17 PM, praveenesh kumar wrote:
>
>> Actually what I am trying to find is the top n% of the whole data.
>> This n could be very large if my data is large.
>>
>> Assuming I have uniform rows
> * The size of the input dataset
> * How many mappers you have
> * Do input splits correlate with the sorting criterion for top N?
>
> Depending on the answers, very different strategies will be optimal.
>
>
>
> On Fri, Feb 1, 2013 at 9:05 PM, praveenesh kumar wrote:
>
>> I am looking f
I am looking for a better solution for this.
One way to do this would be to find the top N values from each mapper and
then find the top N out of those in a single reducer. I am afraid that
this won't work effectively if my N is larger than the number of values in
my input split (or mapper input).
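A rough sketch of that first approach (an illustration only, assuming
tab-separated text records whose first field is the numeric value to rank on;
N, the field layout and the class name are placeholders):

import java.io.IOException;
import java.util.TreeMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TopNMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
  private static final int N = 100;                 // assumed value of N
  private final TreeMap<Long, Text> topN = new TreeMap<Long, Text>();

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    // Assumed record layout: the score is the first tab-separated field.
    long score = Long.parseLong(value.toString().split("\t")[0]);
    topN.put(score, new Text(value));               // ties on score overwrite
    if (topN.size() > N) {
      topN.remove(topN.firstKey());                 // drop the smallest
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Emit only this mapper's local top N candidates.
    for (Text record : topN.values()) {
      context.write(NullWritable.get(), record);
    }
  }
}

The single reducer then repeats the same TreeMap trick over all the mappers'
candidates to produce the final top N.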
The other way is
Is there a way to know the total shuffle time of a map-reduce job - I mean
some command or output that can tell me that?
I want to measure the total map, total shuffle and total reduce time for my MR
job -- how can I achieve that? I am using hadoop 0.20.205
Regards,
Praveenesh
@Harsh ---
I was wondering... although it doesn't make much sense --- if a person
wants to store files only on HDFS (something like a backup), considering
the above hardware scenario --- no MR processing --- in that case, it should
be possible to have a file with a size of more than 20 GB stored
I don't know whether this will work or not, but you can give it a shot (I
am assuming you have 8 nodes in your hadoop cluster):
1. Mount the 1 TB hard disk on one of the DNs.
2. Put the data onto HDFS. I think once it's on HDFS, it will automatically
get distributed.
Regards,
Praveenesh
On Thu, Jun 14, 20
Check your Datanode logs, or do "hadoop fsck /" or "hadoop dfsadmin
-report" to get more details about your HDFS.
It seems like a DN is down.
Regards,
Praveenesh
On Tue, Jun 5, 2012 at 12:13 AM, wrote:
> Hi Sean,
>
> It seems your HDFS has not started properly. Go through your HDFS
> web console to
You can control your map outputs based on any condition you want. I have
done that - it worked for me.
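For example (an illustrative sketch only, with a placeholder condition - not
your actual logic), a mapper that emits a record only when a condition holds
looks roughly like this:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FilteringMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    if (line.contains("ERROR")) {          // placeholder condition
      context.write(new Text("error"), value);
    }
    // Records that fail the condition are simply never written out.
  }
}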
It could be a problem in your code that it's not working for you.
Can you please share your map code, or cross-check whether your conditions
are correct?
Regards,
Praveenesh
On Mon, Jun 4, 2012 at 5:5
e :
> PriviledgedActionException
>
> /etc/hosts
>
> 127.0.0.1 localhost.localdomain localhost
> ::1 localhost6.localdomain6 localhost6
> 192.168.0.10 hadoop00
> 192.168.0.11 hadoop01
> 192.168.0.12 hadoop02
>
> -
At a high level, I just wanted to know whether these hardware specs make
sense.
Regards,
Praveenesh
On Mon, Jun 4, 2012 at 5:46 PM, Nitin Pawar wrote:
> if you tell us the purpose of this cluster, then it would be helpful to
> tell exactly how good it is
>
> On Mon, Jun 4, 2012 at 3:57 PM
, 2012 at 5:42 PM, wrote:
> /etc/hosts
>
> 127.0.0.1 localhost.localdomain localhost
> ::1 localhost6.localdomain6 localhost6
> 192.168.0.10 hadoop00
> 192.168.0.11 hadoop01
> 192.168.0.12 hadoop02
>
> -Original Message-----
> From: pr
same problem on my real cluster. Will try to explicitly configure
> starting IP for this SNN.
>
> -----Original Message-
> From: praveenesh kumar [mailto:praveen...@gmail.com]
> Sent: lunes, 04 de junio de 2012 14:02
> To: common-user@hadoop.apache.org
> Subject: Re:
MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down SecondaryNameNode at hadoop01/192.168.0.11
> ************************************************************/
>
> -Original Message-
> From: praveenesh kumar [mailto:praveen...@gmail.com]
> Sent: lunes, 04 de junio de 2012 13:15
> T
I am not sure what the exact issue could be, but when configuring a secondary
NN for a NN, you need to tell your SNN where the actual NN resides.
Try adding dfs.http.address in hdfs-site.xml on your secondary namenode
machine, with its value pointing at the NN.
The port should be the one your NN URL opens on - meaning your
Hello all,
I am looking to build a 5-node hadoop cluster with the following
configuration per machine:
1. Intel Xeon E5-2609 (2.40GHz/4-core)
2. 32 GB RAM (8GB 1Rx4 PC3)
3. 5 x 900GB 6G SAS 10K hard disk ( total 4.5 TB storage/machine)
4. Ethernet 1GbE connection
I would like the ex
RHive uses the Hive Thrift server to connect with Hive. You can execute Hive
queries, get the results back into R data frames, and then play around with
them using R libraries. It's a pretty interesting project, given that you have
Hive set up on top of hadoop.
Regards,
Praveenesh
On Thu, Apr 26, 2012 at 1:
; yarn.scheduler.capacity.minimum-allocation-mb
> # Set max container size to 1024M (max given to NM) by setting
> yarn.scheduler.capacity.maximum-allocation-mb
>
> Arun
>
> On Apr 18, 2012, at 8:00 PM, praveenesh kumar wrote:
>
> > Hi,
> >
> > Sweet.. Ca
ld be a good value to use for RAM if available (1.0 will do
> too, if you make sure to tweak your configs to not use too much heap
> memory). Single processor should do fine for testing purposes.
>
> On Tue, Apr 17, 2012 at 8:51 PM, praveenesh kumar
> wrote:
> > I am looking to te
I am looking to test hadoop 0.23 or the CDH4 beta on my local VM. I am looking
to execute the sample example code on the new architecture and play around with
the containers/resource managers.
Is there any prerequisite on default memory/CPU/core settings I need to
keep in mind before setting up the VM?
Reg
for
> proxyuser groups, as the property name states are GROUPS, not USERS.
>
> thxs.
>
> Alejandro
>
> On Mon, Apr 2, 2012 at 2:27 PM, praveenesh kumar wrote:
>
> > How can I specify multiple users /groups for proxy user setting ?
> > Can I give comma separated
the proxyuser
> (hosts/groups) settings. You have to use explicit hosts/groups.
>
> Thxs.
>
> Alejandro
> PS: please follow up this thread in the oozie-us...@incubator.apache.org
>
> On Mon, Apr 2, 2012 at 2:15 PM, praveenesh kumar wrote:
>
> > Hi all,
> >
&g
I have a 10-node cluster (around 24 CPUs, 48 GB RAM, 1 TB HDD and a 10 Gb
ethernet connection per node).
After triggering any MR job, it takes around 3-5 seconds to launch (I mean
the time until I can see any MR job completion % on the screen).
I know internally it is trying to launch the job, initialize mappers, load
ror.
Can anyone help me in debugging this issue ?
Thanks,
Praveenesh
On Tue, Feb 28, 2012 at 1:12 PM, praveenesh kumar wrote:
> Hi all,
>
> I am trying to use hadoop eclipse plugin on my windows machine to connect
> to the my remote hadoop cluster. I am currently using putty to login
Hi all,
I am trying to use the hadoop eclipse plugin on my windows machine to connect
to my remote hadoop cluster. I am currently using putty to log in to the
cluster. So SSH is enabled and my windows machine is able to reach my
hadoop cluster.
I am using hadoop 0.20.205, hadoop-eclipse plugin
Okay, I figured it out. I needed to put the hadoop-eclipse-plugin.jar file in
the $RAD_INSTALLED_DIR/features directory. Please comment if you feel I am
doing something wrong.
Thanks,
Praveenesh
On Mon, Feb 27, 2012 at 11:31 AM, praveenesh kumar wrote:
> Is there a way to make IBM RAD 8.0 work w
Is there a way to make IBM RAD 8.0 work with hadoop-eclipse plugin ?
I tried putting hadoop-eclipse-plugin.jar in the eclipse/plugins folder, but
couldn't see any hadoop map-reduce perspective. I know there are limited
options for using hadoop-eclipse plugins. But did anyone try running the
above 2 co
If I am correct:
For setting mappers/node --- mapred.tasktracker.map.tasks.maximum
For setting reducers/node --- mapred.tasktracker.reduce.tasks.maximum
For setting mappers/job --- mapred.map.tasks (applies across the whole cluster, not per node)
For setting reducers/job --- mapred.reduce.tasks (same)
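For the per-job values, a small sketch using the old org.apache.hadoop.mapred
API (the per-node tasktracker maximums live in mapred-site.xml instead and
take effect only after a tasktracker restart); the class name is a placeholder:

import org.apache.hadoop.mapred.JobConf;

public class TaskCountExample {
  public static JobConf configure() {
    JobConf conf = new JobConf(TaskCountExample.class);
    conf.setNumMapTasks(20);     // mapred.map.tasks - a hint; the input splits decide the real count
    conf.setNumReduceTasks(10);  // mapred.reduce.tasks
    return conf;
  }
}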
You can
You can probably use hadoop fs -chmod as suggested
above. You can provide r/w permissions just as you would for general unix
files.
Can you please share your experiences on this thing ?
Thanks,
Praveenesh
On Wed, Feb 22, 2012 at 4:37 PM, Ben Smithers
wrote:
> Hi Shreya,
>
> A permissions guide
for your cluster size (default is 10
> replicas for all MR job submit data), or bit rot of existing blocks on
> HDDs around the cluster, etc. -- You can mostly spot the pattern of
> files causing it by running the fsck and obtaining the listing.
>
> On Mon, Feb 20, 2012 at 11:43
Hi,
I am suddenly seeing some under-replicated blocks on my cluster. Although
it's not causing any problems, it seems like a few blocks are not
replicated properly.
Number of Under-Replicated Blocks : 147
Is this okay behavior for hadoop? If not, how can I know which are the files
with under
Guys,
Is there any regression API/tool developed on top of hadoop (apart
from Mahout)?
Thanks,
Praveenesh
You can also use the R-hadoop package, which allows you to run R statistical
algorithms on hadoop.
Thanks,
Praveenesh
On Fri, Feb 3, 2012 at 10:54 PM, Harsh J wrote:
> You may want to check out Apache Mahout: http://mahout.apache.org
>
> On Fri, Feb 3, 2012 at 10:31 PM, Fabio Pitzolu
> wrote:
> > Hello e
ng: $HADOOP_HOME is deprecated is always there, whether the variable
> is set or not. Why?
> Because hadoop-config is sourced in all scripts, and all it does is
> set HADOOP_PREFIX as HADOOP_HOME. I think this can be reported as a bug.
>
> -P
>
>
> On Wed, Feb 1, 2012 a
easier way may be
> an identity job with sequence-file input format and text output
> format.
>
> On Wed, Feb 1, 2012 at 3:28 PM, praveenesh kumar
> wrote:
> > I am running SimpleKmeansClustering sample code from mahout in action.
> How
> > can I convert sequence file wr
I am running the SimpleKmeansClustering sample code from Mahout in Action. How
can I convert a sequence file written using SequenceFile.Writer into a plain
HDFS file so that I can read it properly? I know mahout has the seqdumper tool
to read it, but I want to create a normal text file rather than a sequence file
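One option besides seqdumper (and besides the identity-job idea mentioned
above) is to read the file directly with SequenceFile.Reader and write plain
text yourself. A minimal sketch, assuming the key and value classes are on the
classpath and their toString() output is what you want; paths are placeholders:

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqToText {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path in = new Path(args[0]);              // the sequence file to dump
    Path out = new Path(args[1]);             // plain-text destination on HDFS
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
    BufferedWriter writer =
        new BufferedWriter(new OutputStreamWriter(fs.create(out)));
    while (reader.next(key, value)) {
      // toString() is what seqdumper prints as well.
      writer.write(key.toString() + "\t" + value.toString());
      writer.newLine();
    }
    writer.close();
    reader.close();
  }
}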
Can anyone please eyeball the config parameters as defined below and share
their thoughts on this ?
Thanks,
Praveenesh
On Mon, Jan 30, 2012 at 6:20 PM, praveenesh kumar wrote:
> Hey guys,
>
> Just wanted to ask, are there any sort of best practices to be followed
> for hado
Have you configured your hostname and localhost with your IP in the /etc/hosts
file?
Thanks,
Praveenesh
On Tue, Jan 31, 2012 at 3:18 AM, anil gupta wrote:
> Hi All,
>
> I am using hadoop-0.20.2 and doing a fresh installation of a distributed
> Hadoop cluster along with Hbase.I am having virtualized
Hey guys,
Just wanted to ask, are there any sort of best practices to be followed for
hadoop shuffling improvements ?
I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24 cores/CPUs
and 48 GB RAM.
I have set the following parameters :
fs.inmemory.size.mb=2000
io.sort.mb=2000
io.sort
ed.task.timeout} of Reporter to your desired value.
>
> Good Luck.
>
>
> On 01/30/2012 04:14 PM, praveenesh kumar wrote:
>
>> Yeah, I am aware of that, but it needs you to explicitly monitor the job
>> and
>> look for the job id and then run the hadoop job -kill command.
>&
e, it would get killed automatically
Thanks,
Praveenesh
On Mon, Jan 30, 2012 at 12:38 PM, Prashant Kommireddi
wrote:
> You might want to take a look at the kill command : "hadoop job -kill
> ".
>
> Prashant
>
> On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar wrote:
Is there any way through which we can kill hadoop jobs that are taking too
long to execute?
What I want to achieve is - if some job has been running for more than
"_some_predefined_timeout_limit", it should be killed automatically.
Is it possible to achieve this through shell scripts or any other way?
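One rough option (besides mapred.task.timeout, which only times out individual
tasks, not whole jobs): a small watchdog built on the old JobClient API and run
from cron or a loop. A sketch only - the threshold and error handling are
placeholders:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class JobWatchdog {
  public static void main(String[] args) throws Exception {
    long maxMillis = 60L * 60L * 1000L;               // assumed 1-hour limit
    JobClient client = new JobClient(new JobConf());  // reads cluster config from the classpath
    for (JobStatus status : client.jobsToComplete()) {
      long runtime = System.currentTimeMillis() - status.getStartTime();
      if (runtime > maxMillis) {
        RunningJob job = client.getJob(status.getJobID());
        if (job != null) {
          job.killJob();                              // same effect as "hadoop job -kill <id>"
        }
      }
    }
  }
}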
pers do not
> like being engrossed with hassles that hadoop streaming can bring.
>
> -P
>
> P.S. I am not endorsing anyone. It's just my view.
>
> On Sun, Jan 29, 2012 at 12:54 PM, praveenesh kumar wrote:
>
> > Does anyone has done any work with "R" + Hado
Has anyone done any work with "R" + Hadoop?
I know there are some flavors of R+Hadoop available, such as "rmr", "rhdfs",
"RHIPE" and "R-hive".
But as far as I know, submitting jobs using Hadoop Streaming is the best way
available right now. Am I right?
Any info on R on Hadoop ?
Thanks,
Praveen
>
> On Wed, Jan 25, 2012 at 8:49 PM, praveenesh kumar
> wrote:
> > Then in that case, will I be using group name tag in allocations file,
> like
> > this inside each pool ?
> >
> > < group name="ABC">
> >6
> >
> >
>
e identifier to be the poolnameproperty. Would this work for
> you instead?
>
> On Wed, Jan 25, 2012 at 8:00 PM, praveenesh kumar
> wrote:
> > Also, with the above mentioned method, my problem is I am having one
> > pool/user (thats obviously not a good way of configuri
h
On Wed, Jan 25, 2012 at 7:55 PM, praveenesh kumar wrote:
> I am looking for a solution where we can do it permanently without
> specifying these things inside jobs.
> I want to keep these things hidden from the end-user.
> The end-user would just write pig scripts and all the jobs
t; Then you can provide per-poolname config overrides via the "pool"
> element config described in
>
> http://hadoop.apache.org/common/docs/current/fair_scheduler.html#Allocation+File+%28fair-scheduler.xml%29
>
> On Wed, Jan 25, 2012 at 7:01 PM, praveenesh kumar
> wrote:
&g
etions,
> instead of default 5%. This helps your MR performance overall, if you
> run multiple jobs at a time, as the reduce slots aren't wasted.
>
> On Wed, Jan 25, 2012 at 3:34 PM, praveenesh kumar
> wrote:
> > Hey,
> >
> > Can anyone explain me what is reduce
our pool name while
> running the job. By default, mapred.fairscheduler.poolnameproperty is set to
> user.name (each job run by a user is allocated to his named pool) and you
> can also change this property to group.name.
>
> Srinivas --
>
> Also, you can set
>
> On Wed, Jan
Understanding Fair Schedulers better:
Can we create multiple pools in the Fair Scheduler? I guess yes - please
correct me if I am wrong.
Suppose I have 2 pools in my fair-scheduler.xml
1. Hadoop-users : Min map : 10, Max map : 50, Min Reduce : 10, Max Reduce :
50
2. Admin-users: Min map : 20, Max map : 80, Min Re
@hadoophive
Can you explain what you mean by "balance the cluster"?
Thanks,
Praveenesh
On Wed, Jan 25, 2012 at 4:29 PM, hadoop hive wrote:
> I faced the same issue, but after some time, when I balanced the cluster, the
> jobs started running fine.
>
> On Wed, Jan 25, 2012 at 3:3
Hey,
Can anyone explain to me what the "reduce > copy" phase in the reducer section is?
The (K, List(V)) is passed to the reducer. Does "reduce > copy" represent
copying of the (K, List(V)) to the reducer from all the mappers?
I am monitoring my jobs on the cluster using the Jobtracker URL.
I am seeing for most of my
Hey guys,
How can I configure HDFS so that internally I can set permissions on the
data?
I know there is a parameter called dfs.permissions that needs to be true,
otherwise permissions won't work.
Actually I had set it to true previously, so that any user could use the HDFS
data to run jobs on it.
Now
eird that all the missing blocks were that of
> the outputs of your M/R jobs? The NameNode should have been distributing
> them evenly across the hard drives of your cluster. If the output of the
> jobs is set to replication factor = 2, then the output should have been
> replicated
Hi everyone,
Any ideas on how to tackle this kind of situation.
Thanks,
Praveenesh
On Tue, Jan 17, 2012 at 1:02 PM, praveenesh kumar wrote:
> I have a replication factor of 2, because of the reason that I can not
> afford 3 replicas on my cluster.
> fsck output was saying block replica
y refers to the fsimage or edits getting corrupted).
>
> Did your files not have adequate replication that they could not withstand
> the loss of one DN's disk? What exactly did fsck output? Did all block
> replicas go missing for your files?
>
> On 17-Jan-2012, at 12:08 PM
Hi guys,
I just faced a weird situation in which one of the hard disks on a DN went
down.
Because of that, when I restarted the namenode, some of the blocks went missing
and it was saying my namenode is CORRUPT and in safe mode, which doesn't allow
you to add or delete any files on HDFS.
I know we can cl
.
Please guide me - why is it happening like this?
Thanks,
Praveenesh
On Wed, Jan 11, 2012 at 7:32 PM, praveenesh kumar wrote:
> Its running,.
> I am running jobs on hadoop. they are running fine,
>
> Thanks,
> Praveenesh
>
>
> On Wed, Jan 11, 2012 at 7:20 PM, hadoop hive wrot
It's running.
I am running jobs on hadoop; they are running fine.
Thanks,
Praveenesh
On Wed, Jan 11, 2012 at 7:20 PM, hadoop hive wrote:
> your job tracker is not running
>
> On Wed, Jan 11, 2012 at 7:08 PM, praveenesh kumar wrote:
>
> > Jobtracker webUI suddenly st
The Jobtracker web UI suddenly stopped showing. It was working fine before.
What could be the issue? Can anyone guide me on how I can recover my web UI?
Thanks,
Praveenesh
Hi,
The masters file in $HADOOP_HOME/conf tells you where exactly the
SecondaryNamenode daemon should run. Please correct me if I am wrong.
My doubt is: should all datanodes know where the secondary namenode is
running, or should only the namenode know where the secondary namenode is
running?
The re
Hey Guys,
Do I need to format the namenode again if I am changing some HDFS
configurations like blocksize, checksum, compression codec etc or is there
any other way to enforce these new changes in the present cluster setup ?
Thanks,
Praveenesh
Hi,
How can I allow multiple users to submit jobs in hadoop 0.20.205 ?
Thanks,
Praveenesh
Hi,
I am using Hive 0.7.1 on hadoop 0.20.205
While running hive, it's giving me the following error:
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.hadoop.security.UserGroupInformation.login(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;
, 2011 at 11:54 PM, praveenesh kumar
> wrote:
> > I set up proxy, Now I am getting the following error :
> >
> > root@lxe9700 [/usr/local/hadoop/pig/new/trunk] $ --> ant
> jar-withouthadoop
> > -verbose
> > Apache Ant version 1.6.5 compiled on June 5 2007
> &g
me: 0 seconds
On Fri, Dec 30, 2011 at 11:11 AM, praveenesh kumar wrote:
> When I am pinging its saying "Unknown host."..
> Is there any kind of proxy setting we need to do, when building from ant ?
>
> Thanks,
> Praveenesh
>
>
>
> On Fri, Dec 30, 2011 at 11:0
.jar
> to see if your server can connect to that URL.
> If not you have some kind of connection issue with outgoing requests.
>
> --Joey
>
> On Thu, Dec 29, 2011 at 11:28 PM, praveenesh kumar
> wrote:
> > Hi everyone,
> > I am trying to build Pig from SVN trunk on ha
Hi everyone,
I am trying to build Pig from SVN trunk on hadoop 0.20.205.
While doing that, I am getting the following error. Any idea why it's
happening?
Thanks,
Praveenesh
root@lxe [/usr/local/hadoop/pig/new/trunk] $ --> ant jar-withouthadoop
-verbose
Apache Ant version 1.6.5 compiled on June
oey
>
>
>
> On Dec 29, 2011, at 2:49, praveenesh kumar wrote:.
> > Guys,
> >
> > Did someone try this thing ?
> >
> > Thanks
> >
> > On Tue, Dec 27, 2011 at 4:36 PM, praveenesh kumar wrote:
> >
> >> Hey guys,
> >>
&
Guys,
Did someone try this thing ?
Thanks
On Tue, Dec 27, 2011 at 4:36 PM, praveenesh kumar wrote:
> Hey guys,
>
> How we can make hadoop as multiuser ?
>
> One way to think as whatever group we currently assigned to use hadoop,
> add users to same group and cha
Hey guys,
How can we make hadoop multiuser?
One way to think about it: whatever group we currently assigned to use hadoop,
add users to the same group and change permissions on hadoop.tmp.dir,
mapred.system.dir, dfs.data.dir, and so on.
I was playing with hadoop 0.20.205 and I observed we can't change
Hey people,
I have a plain text file. I want to parse it using M/R line by line. When I
say line, I mean a plain text line that ends with a DOT.
Can I use M/R to do this kind of job? I know that if I have to do it like this,
I have to write my own InputFormat.
Can someone guide me or share their ex
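If writing a full custom InputFormat feels like too much: newer Hadoop
releases let TextInputFormat split records on an arbitrary delimiter via the
textinputformat.record.delimiter property (check whether your version honors
it; older 0.20-based releases may not). A sketch with a trivial mapper; paths
and names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DotRecordsDriver {
  // Each map() call receives everything up to the next "." as one record.
  public static class SentenceMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text sentence, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(new Text(sentence.toString().trim()), new LongWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("textinputformat.record.delimiter", ".");   // records end at a DOT
    Job job = new Job(conf, "dot-terminated records");
    job.setJarByClass(DotRecordsDriver.class);
    job.setMapperClass(SentenceMapper.class);
    job.setNumReduceTasks(0);                            // map-only job
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}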
tarball start/stop scripts, putting in the
hostname for SNN in the conf/masters list is sufficient to get it
auto-started there.
>
> On 27-Dec-2011, at 11:36 AM, praveenesh kumar wrote:
>
>> Thanks..But, my 1st question is still unanswered.
>> I have a 8 DN/TT machines and 1 NN m
rg/common/docs/current/hdfs_user_guide.html#Secondary+NameNode
>> You can configure secondary node IP in masters file, start-dfs.sh itself
>> will start the SNN automatically as it starts DN and NN as well.
>>
>> also you can see
>> http://www.cloudera.com/blog/2009
Hey people,
How can we set up another machine in the cluster as the Secondary Namenode
in hadoop 0.20.205?
Can a DN also act as the SNN? Any pros and cons of having this configuration?
Thanks,
Praveenesh
;
> Why not just do the simple thing and make all of your DNs the same?
>
> Sent from my iPhone
>
> On Dec 23, 2011, at 6:51 AM, "praveenesh kumar" wrote:
>
>> When installing hadoop on slave machines, do we have to install hadoop
>> at same locations on each
When installing hadoop on slave machines, do we have to install hadoop
at the same location on each machine?
Can we have the hadoop installation at different locations on different
machines in the same cluster?
If yes, what things do we have to take care of in that case?
Thanks,
Praveenesh
Hello people,
So I am trying to install hadoop 0.20.205 on 2 machines.
Individually, I am able to run hadoop on each machine.
Now when I configure one machine as the slave and the other as the master,
and try to start hadoop, it's not even able to execute hadoop
commands on the slave machine.
I am getting
Okay so I have one question in mind.
Suppose I have a replication factor of 3 on my cluster of some N
nodes, where N > 3, and there is a data block B1 that exists on some 3
datanodes --> DD1, DD2, DD3.
I want to run some mapper function on this block. My JT will
communicate with the NN to know where
Hey Guys,
So I have a very naive question in my mind regarding Hadoop cluster nodes:
more cores or more nodes - shall I spend money on going from 2-core to 4-core
machines, or spend money on buying more nodes with fewer cores, e.g. say 2
machines of 2 cores each?
Thanks,
Praveenesh
while executing this line:
/usr/local/hadoop/hive/release-0.7.1/jdbc/build.xml:51: Compile failed; see
the compiler error output for details.
Total time: 29 minutes 46 seconds
Thanks,
Praveenesh
On Fri, Dec 9, 2011 at 2:08 PM, praveenesh kumar wrote:
> Did anyone tried HIVE on Hadoop 0.20.
Has anyone tried HIVE on Hadoop 0.20.205?
I am trying to build HIVE from svn, but I am seeing it download
hadoop-0.20.3-CDH3-SNAPSHOT.tar.gz and hadoop-0.20.1.tar.gz.
I am trying to do ant -Dhadoop.version="0.20.205" package, but the build is
failing.
Any ideas or suggestions on what I may be
>
> - Alex
>
> On Wed, Dec 7, 2011 at 11:37 AM, praveenesh kumar wrote:
>
> > How to avoid "Warning: $HADOOP_HOME is deprecated" messages on hadoop
> > 0.20.205 ?
> >
> > I tried adding *export HADOOP_HOME_WARN_SUPPRESS=" " *in had
How to avoid "Warning: $HADOOP_HOME is deprecated" messages on hadoop
0.20.205 ?
I tried adding export HADOOP_HOME_WARN_SUPPRESS=" " in hadoop-env.sh on the
Namenode.
But it's still coming. Am I doing the right thing?
Thanks,
Praveenesh
version onwards.
> ____
> From: praveenesh kumar [praveen...@gmail.com]
> Sent: Wednesday, December 07, 2011 12:40 PM
> To: common-user@hadoop.apache.org
> Subject: HDFS Backup nodes
>
> Does hadoop 0.20.205 supports configuring HDFS backup nodes ?
>
> Thanks,
> Praveenesh
>
Does hadoop 0.20.205 support configuring HDFS backup nodes?
Thanks,
Praveenesh
Hi all,
Can anyone guide me on how to automate the hadoop installation/configuration
process?
I want to install hadoop on 10-20 nodes, which may even grow to 50-100
nodes.
I know we can use some configuration tools like puppet or shell scripts.
Has anyone done it?
How can we do hadoop installati
or Do I have to apply some hadoop patch for this ?
Thanks,
Praveenesh
Hi everyone,
So I have this blade server with 4x500 GB hard disks.
I want to use all these hard disks for hadoop HDFS.
How can I achieve this target ?
If I install hadoop on 1 hard disk and use the other hard disks as normal
partitions, e.g. -
/dev/sda1, -- HDD 1 -- Primary partition -- Linux + Hadoop
gmail wrote:
> Commenting out the 127.0.0.1 line in /etc/hosts is not working. If I format the
> namenode then this line is automatically added.
> Any other solution?
>
> On 16 October 2011 19:13, praveenesh kumar wrote:
>
> > try commenting 127.0.0.1 localhost line in your /etc/
Try commenting out the 127.0.0.1 localhost line in your /etc/hosts, then restart
the cluster and try again.
Thanks,
Praveenesh
On Sun, Oct 16, 2011 at 2:00 PM, Humayun gmail wrote:
> we are using hadoop on virtual box. when it is a single node then it works
> fine for big dataset larger than the
Hi all,
Any idea when hadoop 0.20.205 is officially going to be released?
Is Hadoop-0.20.205 rc2 stable enough to put into production?
I am using hadoop-0.20-append with hbase 0.90.3 now and want to switch to 205,
but I am looking for some valuable suggestions/recommendations.
Thanks,
Praveenesh
doop cluster-> add "ub16" entry
> in /etc/hosts on where the task running.
> On 10/5/2011 12:15 PM, praveenesh kumar wrote:
> > I am trying to use distcp to copy a file from one HDFS to another.
> >
> > But while copying I am getting the following exception :
I am trying to use distcp to copy a file from one HDFS to another.
But while copying I am getting the following exception :
hadoop distcp hdfs://ub13:54310/user/hadoop/weblog
hdfs://ub16:54310/user/hadoop/weblog
11/10/05 10:41:01 INFO mapred.JobClient: Task Id :
attempt_201110031447_0005_m_0
Hi,
I want to know whether we can use SAN storage for a Hadoop cluster setup.
If yes, what are the best practices?
Is it a good way to go, considering the fact that "the underlying power of Hadoop
is co-locating the processing power (CPU) with the data storage and thus it
must be local storage to be effe
't see the difference, it's a pure vmware
> stuff.
> Obviously, it's not something you can do for production nor performance
> analysis.
>
> Cheers,
>
> N.
>
> On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar wrote:
>
> > Hi,
> >
> >
Hi,
Suppose I have 10 windows machines and 10 individual VM
instances running on these machines independently. Can I use these VM
instances to communicate with each other so that I can make a hadoop cluster
out of those VM instances?
Did anyone try that?
I know we can set up
Hey,
I have this code written using mahout. I am able to run the code from
eclipse.
How can I run the code written with mahout from the command line?
My question is: do I have to make a jar file and run it as hadoop jar
jarfilename.jar class,
or shall I run it using a simple java command?
Can anyone solve
:-)
>
>
> Regards,
> Uma
> - Original Message -
> From: praveenesh kumar
> Date: Thursday, September 22, 2011 10:42 am
> Subject: Re: Can we replace namenode machine with some other machine ?
> To: common-user@hadoop.apache.org
>
> > If I just change configurat