Re: how to find top N values using map-reduce ?

2013-02-02 Thread praveenesh kumar
ne thing i want to clarify that you can use multi-reducers to sort > the data globally and then cat all the parts to get the top n records. The > data in all parts are globally in order. > Then you may find the problem is much easier. > > 在 2013-2-2 下午3:18,"praveenesh kumar&

Re: how to find top N values using map-reduce ?

2013-02-01 Thread praveenesh kumar
ub.com/4696443 > https://github.com/linkedin/datafu > > > On Fri, Feb 1, 2013 at 11:17 PM, praveenesh kumar wrote: > >> Actually what I am trying to find to top n% of the whole data. >> This n could be very large if my data is large. >> >> Assuming I have uniform rows

Re: how to find top N values using map-reduce ?

2013-02-01 Thread praveenesh kumar
e input dataset > * How many mappers you have > * Do input splits correlate with the sorting criterion for top N? > > Depending on the answers, very different strategies will be optimal. > > > > On Fri, Feb 1, 2013 at 9:05 PM, praveenesh kumar wrote: > >> I am looking f

how to find top N values using map-reduce ?

2013-02-01 Thread praveenesh kumar
I am looking for a better solution for this. One way to do this would be to find the top N values from each mapper and then find the top N out of those in a single reducer. I am afraid that this won't work effectively if my N is larger than the number of values in my input split (or mapper input). The other way is
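A minimal sketch of that first approach (per-mapper top N in memory, all candidates funnelled to a single reducer), assuming one numeric value per input line; class names and N are illustrative, and a TreeSet collapses duplicate values, so swap in a priority queue if duplicates matter:

  // imports: java.io.IOException, java.util.TreeSet,
  //          org.apache.hadoop.io.*, org.apache.hadoop.mapreduce.{Mapper, Reducer}
  public class TopNMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
    private static final int N = 10;                  // assumed N, small relative to the split
    private final TreeSet<Long> top = new TreeSet<Long>();

    protected void map(LongWritable key, Text value, Context ctx) {
      top.add(Long.parseLong(value.toString().trim()));
      if (top.size() > N) top.pollFirst();            // drop the current smallest
    }

    protected void cleanup(Context ctx) throws IOException, InterruptedException {
      for (Long v : top) ctx.write(NullWritable.get(), new LongWritable(v));  // emit local top N
    }
  }

  // A single reducer repeats the same trick over every mapper's candidates.
  public class TopNReducer extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
    private static final int N = 10;
    protected void reduce(NullWritable key, Iterable<LongWritable> values, Context ctx)
        throws IOException, InterruptedException {
      TreeSet<Long> top = new TreeSet<Long>();
      for (LongWritable v : values) {
        top.add(v.get());
        if (top.size() > N) top.pollFirst();
      }
      for (Long v : top.descendingSet()) ctx.write(NullWritable.get(), new LongWritable(v));
    }
  }

Run it with job.setNumReduceTasks(1); this only scales while N (or N% of the data) stays small enough for one reducer, which is exactly the concern raised above.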

Measuring Shuffle time for MR job

2012-08-26 Thread praveenesh kumar
Is there a way to know the total shuffle time of a map-reduce job - I mean some command or output that can tell that ? I want to measure total map, total shuffle and total reduce time for my MR job -- how can I achieve it ? I am using hadoop 0.20.205 Regards, Praveenesh
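A hedged pointer: on 0.20.x the per-job history file records timing for every phase, and the bundled history viewer summarizes it, so something like the following should print average/best/worst map, shuffle, and reduce times once the job has finished (the argument is the job's output directory, shown here with a made-up path):

  hadoop job -history /user/praveenesh/weblog-output
  hadoop job -history all /user/praveenesh/weblog-output   # full per-task detail

If I remember right, the JobTracker web UI exposes the same analysis for completed jobs.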

Re: Dealing with low space cluster

2012-06-14 Thread praveenesh kumar
@Harsh --- I was wondering...although it doesn't make much/any sense --- if a person wants to store the files only on HDFS (something like a backup) consider the above hardware scenario --- no MR processing, In that case, it should be possible to have a file with a size more than 20 GB to be store

Re: Dealing with low space cluster

2012-06-14 Thread praveenesh kumar
I don't know whether this will work or not, but you can give it a shot (I am assuming you have an 8-node hadoop cluster). 1. Mount the 1 TB hard disk on one of the DNs. 2. Put the data onto HDFS. I think once it's on HDFS it will automatically get distributed. Regards, Praveenesh On Thu, Jun 14, 20

Re: Trying to put 16gb file onto hdfs

2012-06-04 Thread praveenesh kumar
Check your Datanode logs.. or do "hadoop fsck /" or "hadoop dfsadmin -report" to get more details about your HDFS. Seems like DN is down. Regards, Praveenesh On Tue, Jun 5, 2012 at 12:13 AM, wrote: > Hi Sean, > > It seems your HDFS has not properly started. Go through your HDFS > webconsole to
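For reference, a few read-only commands that show cluster and file health (safe to run any time; the path in the last one is just an example):

  hadoop dfsadmin -report                              # live/dead datanodes, per-node capacity and usage
  hadoop fsck /                                        # overall namespace health, missing/corrupt blocks
  hadoop fsck /user/sean -files -blocks -locations     # drill into one path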

Re: What happens when I do not output anything from my mapper

2012-06-04 Thread praveenesh kumar
You can control your map outputs based on any condition you want. I have done that - it worked for me. It may be a problem in your code that it's not working for you. Can you please share your map code or cross-check whether your conditions are correct ? Regards, Praveenesh On Mon, Jun 4, 2012 at 5:5
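A minimal sketch of that kind of conditional map output, with a made-up condition and class name; emitting nothing from map() for a record is perfectly legal:

  // imports: java.io.IOException, org.apache.hadoop.io.*, org.apache.hadoop.mapreduce.Mapper
  public class FilterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String line = value.toString();
      if (line.contains("ERROR")) {       // assumed condition -- replace with your own check
        ctx.write(new Text(line), ONE);   // output only when the condition holds
      }
      // no else branch: records failing the check simply produce no map output
    }
  }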

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
e : > PriviledgedActionException > > /etc/hosts > > 127.0.0.1 localhost.localdomain localhost > ::1 localhost6.localdomain6 localhost6 > 192.168.0.10 hadoop00 > 192.168.0.11 hadoop01 > 192.168.0.12 hadoop02 > > -

Re: Hadoop cluster hardware configuration

2012-06-04 Thread praveenesh kumar
a high level just wanted to know does these hardware specs make sense ? Regards, Praveenesh On Mon, Jun 4, 2012 at 5:46 PM, Nitin Pawar wrote: > if you tell us the purpose of this cluster, then it would be helpful to > tell exactly how good it is > > On Mon, Jun 4, 2012 at 3:57 PM

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
, 2012 at 5:42 PM, wrote: > /etc/hosts > > 127.0.0.1 localhost.localdomain localhost > ::1 localhost6.localdomain6 localhost6 > 192.168.0.10 hadoop00 > 192.168.0.11 hadoop01 > 192.168.0.12 hadoop02 > > -Original Message----- > From: pr

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
same problem on my real cluster. Will try to explicitly configure > starting IP for this SNN. > > -----Original Message- > From: praveenesh kumar [mailto:praveen...@gmail.com] > Sent: lunes, 04 de junio de 2012 14:02 > To: common-user@hadoop.apache.org > Subject: Re:

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
MSG: > / > SHUTDOWN_MSG: Shutting down SecondaryNameNode at hadoop01/192.168.0.11 > / > > -Original Message- > From: praveenesh kumar [mailto:praveen...@gmail.com] > Sent: lunes, 04 de junio de 2012 13:15 > T

Re: SecondaryNameNode not connecting to NameNode : PriviledgedActionException

2012-06-04 Thread praveenesh kumar
I am not sure what the exact issue could be, but when pointing a secondary NN at the NN, you need to tell your SNN where the actual NN resides. Try adding dfs.http.address in hdfs-site.xml on your secondary namenode machine, with its value pointing at the NN. The port should be the one on which your NN url is opening - means your
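A sketch of what that property could look like in hdfs-site.xml on the SNN machine; the hostname is taken from the /etc/hosts shown earlier in this thread and is only an example, and 50070 is the default NN HTTP port, so match whatever your NN web UI actually uses:

  <property>
    <name>dfs.http.address</name>
    <value>hadoop00:50070</value>   <!-- must point at the NameNode's HTTP address -->
  </property>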

Hadoop cluster hardware configuration

2012-06-04 Thread praveenesh kumar
Hello all, I am looking forward to build a 5 node hadoop cluster with the following configurations per machine. -- 1. Intel Xeon E5-2609 (2.40GHz/4-core) 2. 32 GB RAM (8GB 1Rx4 PC3) 3. 5 x 900GB 6G SAS 10K hard disk ( total 4.5 TB storage/machine) 4. Ethernet 1GbE connection I would like the ex

Re: Text Analysis

2012-04-26 Thread praveenesh kumar
Rhive uses Hive Thrift server to connect with Hive. You can execute hive queries and get results back into R data frames. and then play around with it using R libraries. Its pretty interesting project, given that you have Hive setup on top of hadoop. Regards, Praveenesh On Thu, Apr 26, 2012 at 1:

Re: Pre-requisites for hadoop 0.23/CDH4

2012-04-19 Thread praveenesh kumar
; yarn.scheduler.capacity.minimum-allocation-mb > # Set max container size to 1024M (max given to NM) by setting > yarn.scheduler.capacity.maximum-allocation-mb > > Arun > > On Apr 18, 2012, at 8:00 PM, praveenesh kumar wrote: > > > Hi, > > > > Sweet.. Ca

Re: Pre-requisites for hadoop 0.23/CDH4

2012-04-18 Thread praveenesh kumar
ld be a good value to use for RAM if available (1.0 will do > too, if you make sure to tweak your configs to not use too much heap > memory). Single processor should do fine for testing purposes. > > On Tue, Apr 17, 2012 at 8:51 PM, praveenesh kumar > wrote: > > I am looking to te

Pre-requisites for hadoop 0.23/CDH4

2012-04-17 Thread praveenesh kumar
I am looking to test hadoop 0.23 or CDH4 beta on my local VM. I am looking to execute the sample example codes in new architecture, play around with the containers/resource managers. Is there any pre-requisite on default memory/CPU/core settings I need to keep in mind before setting up the VM. Reg

Re: How can I configure oozie to submit different workflows from different users ?

2012-04-02 Thread praveenesh kumar
for > proxyuser groups, as the property name states are GROUPS, not USERS. > > thxs. > > Alejandro > > On Mon, Apr 2, 2012 at 2:27 PM, praveenesh kumar >wrote: > > > How can I specify multiple users /groups for proxy user setting ? > > Can I give comma separated

Re: How can I configure oozie to submit different workflows from different users ?

2012-04-02 Thread praveenesh kumar
the proxyuser > (hosts/groups) settings. You have to use explicit hosts/groups. > > Thxs. > > Alejandro > PS: please follow up this thread in the oozie-us...@incubator.apache.org > > On Mon, Apr 2, 2012 at 2:15 PM, praveenesh kumar >wrote: > > > Hi all, > > &g

MR job launching is slower

2012-03-20 Thread praveenesh kumar
I have a 10 node cluster (around 24 CPUs, 48 GB RAM, 1 TB HDD, 10 GB ethernet connection per node). After triggering any MR job, it's taking around 3-5 seconds to launch (I mean the time until I can see any MR job completion % on the screen). I know internally it's trying to launch the job, initialize mappers, load

Re: Need help on hadoop eclipse plugin

2012-02-28 Thread praveenesh kumar
ror. Can anyone help me in debugging this issue ? Thanks, Praveenesh On Tue, Feb 28, 2012 at 1:12 PM, praveenesh kumar wrote: > Hi all, > > I am trying to use hadoop eclipse plugin on my windows machine to connect > to the my remote hadoop cluster. I am currently using putty to login

Need help on hadoop eclipse plugin

2012-02-27 Thread praveenesh kumar
Hi all, I am trying to use the hadoop eclipse plugin on my windows machine to connect to my remote hadoop cluster. I am currently using putty to log in to the cluster, so ssh is enabled and my windows machine can reach my hadoop cluster. I am using hadoop 0.20.205, hadoop-eclipse plugin

Re: Hadoop eclipse plugin on IBM RAD 8.0

2012-02-26 Thread praveenesh kumar
Okay, I figured it out. I need to put the hadoop-eclipse plugin.jar file in $RAD_INSTALLED_DIR/features directory. Please comment if you feel I am doing something wrong. Thanks, Praveenesh On Mon, Feb 27, 2012 at 11:31 AM, praveenesh kumar wrote: > Is there a way to make IBM RAD 8.0 work w

Hadoop eclipse plugin on IBM RAD 8.0

2012-02-26 Thread praveenesh kumar
Is there a way to make IBM RAD 8.0 work with the hadoop-eclipse plugin ? I tried putting hadoop-eclipse-plugin.jar in the eclipse/plugins folder, but couldn't see any hadoop map-reduce perspective. I know there are limited options for using hadoop-eclipse plugins. But did anyone try running the above 2 co

Re: mapred.map.tasks and mapred.reduce.tasks parameter meaning

2012-02-22 Thread praveenesh kumar
If I am correct : For setting mappers/node --- mapred.tasktracker.map.tasks.maximum For setting reducers/node --- mapred.tasktracker.reduce.tasks.maximum For setting mappers/job mapred.map.tasks (applicable for whole cluster) For setting reducers/job mapred.reduce.tasks(same) You can
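A sketch of where those knobs live (mapred-site.xml; the numbers are only examples, the per-node maximums need a tasktracker restart, and mapred.map.tasks is only a hint to the framework):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>            <!-- map slots per node -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>            <!-- reduce slots per node -->
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>16</value>           <!-- per-job; can also be set with -D on the command line -->
  </property>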

Re: Security at file level in Hadoop

2012-02-22 Thread praveenesh kumar
You can probably use hadoop fs - chmod as suggested above. You can provide r/w permissions as you provide for general unix files. Can you please share your experiences on this thing ? Thanks, Praveenesh On Wed, Feb 22, 2012 at 4:37 PM, Ben Smithers wrote: > Hi Shreya, > > A permissions guide
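A couple of concrete examples of the chmod/chown route (paths, user, and group are made up; dfs.permissions must be true for any of this to be enforced):

  hadoop fs -chmod 640 /user/shreya/private/report.txt       # rw for owner, read for group
  hadoop fs -chmod -R 750 /user/shreya/private               # recursive on a directory
  hadoop fs -chown -R shreya:analysts /user/shreya/private   # change owner and group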

Re: Number of Under-Replicated Blocks ?

2012-02-19 Thread praveenesh kumar
for your cluster size (default is 10 > replicas for all MR job submit data), or bit rot of existing blocks on > HDDs around the cluster, etc. -- You can mostly spot the pattern of > files causing it by running the fsck and obtaining the listing. > > On Mon, Feb 20, 2012 at 11:43

Number of Under-Replicated Blocks ?

2012-02-19 Thread praveenesh kumar
Hi, I am suddenly seeing some under-replicated blocks on my cluster. Although it's not causing any problems, it seems that a few blocks are not replicated properly. Number of Under-Replicated Blocks : 147 Is this normal behavior for hadoop? If not, how can I know which files have under

Regression on Hadoop ?

2012-02-08 Thread praveenesh kumar
Guys, Is there any regression API/tool that is developed on top of hadoop *(APART from mahout) *? Thanks, Praveenesh

Re: Retail receipt analysis

2012-02-03 Thread praveenesh kumar
You can also use R-hadoop package that allows you to run R statistical algos on hadoop. Thanks, Praveenesh On Fri, Feb 3, 2012 at 10:54 PM, Harsh J wrote: > You may want to check out Apache Mahout: http://mahout.apache.org > > On Fri, Feb 3, 2012 at 10:31 PM, Fabio Pitzolu > wrote: > > Hello e

Re: Why $HADOOP_PREFIX ?

2012-02-01 Thread praveenesh kumar
ng: $HADOOP_HOME is deprecated is always there. whether the variable > is set or not. Why? > Because the hadoop-config is sourced in all scripts. And all it does is > sets HADOOP_PREFIX as HADOOP_HOME. I think this can be reported as a bug. > > -P > > > On Wed, Feb 1, 2012 a

Re: How to convert sequence file into normal text file

2012-02-01 Thread praveenesh kumar
easier way may be > an identity job with sequence-file input format and text output > format. > > On Wed, Feb 1, 2012 at 3:28 PM, praveenesh kumar > wrote: > > I am running SimpleKmeansClustering sample code from mahout in action. > How > > can I convert sequence file wr
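A minimal sketch of that identity job (new-API driver, no reducers, default identity Mapper); the class name and paths are illustrative, and the output is only as readable as the writables' toString() methods:

  // imports: org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.Path,
  //          org.apache.hadoop.mapreduce.Job,
  //          org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, SequenceFileInputFormat},
  //          org.apache.hadoop.mapreduce.lib.output.{FileOutputFormat, TextOutputFormat}
  public class SeqToText {
    public static void main(String[] args) throws Exception {
      Job job = new Job(new Configuration(), "seqfile-to-text");
      job.setJarByClass(SeqToText.class);
      job.setInputFormatClass(SequenceFileInputFormat.class);  // read the sequence file as-is
      job.setOutputFormatClass(TextOutputFormat.class);        // write key TAB value as plain text
      job.setNumReduceTasks(0);                                // map-only identity pass
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }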

How to convert sequence file into normal text file

2012-02-01 Thread praveenesh kumar
I am running the SimpleKmeansClustering sample code from Mahout in Action. How can I convert a sequence file written using SequenceFile.Writer into a plain HDFS file so that I can read it properly? I know mahout has the seqdumper tool to read it, but I want to create a normal text file rather than a sequence file

Re: Best practices for hadoop shuffling/tuning ?

2012-01-31 Thread praveenesh kumar
Can anyone please eyeball the config parameters as defined below and share their thoughts on this ? Thanks, Praveenesh On Mon, Jan 30, 2012 at 6:20 PM, praveenesh kumar wrote: > Hey guys, > > Just wanted to ask, are there any sort of best practices to be followed > for hado

Re: Namenode service not running on the Configured IP address

2012-01-30 Thread praveenesh kumar
Have you configured your hostname and localhost with your IP in /etc/hosts file. Thanks, Praveenesh On Tue, Jan 31, 2012 at 3:18 AM, anil gupta wrote: > Hi All, > > I am using hadoop-0.20.2 and doing a fresh installation of a distributed > Hadoop cluster along with Hbase.I am having virtualized

Best practices for hadoop shuffling/tuning ?

2012-01-30 Thread praveenesh kumar
Hey guys, Just wanted to ask, are there any sort of best practices to be followed for hadoop shuffling improvements ? I am running Hadoop 0.20.205 on 8 nodes cluster.Each node is 24 cores/CPUs with 48 GB RAM. I have set the following parameters : fs.inmemory.size.mb=2000 io.sort.mb=2000 io.sort
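For anyone reading later, a sketch of where the usual shuffle-related knobs sit in mapred-site.xml; the values are illustrative, not recommendations, and io.sort.mb has to fit inside the task heap set by mapred.child.java.opts:

  <property><name>io.sort.mb</name><value>512</value></property>
  <property><name>io.sort.factor</name><value>100</value></property>
  <property><name>mapred.reduce.parallel.copies</name><value>20</value></property>
  <property><name>mapred.compress.map.output</name><value>true</value></property>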

Re: Killing hadoop jobs automatically

2012-01-30 Thread praveenesh kumar
ed.task.timeout} of Reporter to your desired value. > > Good Luck. > > > On 01/30/2012 04:14 PM, praveenesh kumar wrote: > >> Yeah, I am aware of that, but it needs you to explicity monitor the job >> and >> look for jobid and then hadoop job -kill command. >&

Re: Killing hadoop jobs automatically

2012-01-29 Thread praveenesh kumar
e, it would get killed automatically Thanks, Praveenesh On Mon, Jan 30, 2012 at 12:38 PM, Prashant Kommireddi wrote: > You might want to take a look at the kill command : "hadoop job -kill > ". > > Prashant > > On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar >wrote:

Killing hadoop jobs automatically

2012-01-29 Thread praveenesh kumar
Is there any way through which we can kill hadoop jobs that are taking too long to execute ? What I want to achieve is - if some job is running for more than "_some_predefined_timeout_limit", it should be killed automatically. Is it possible to achieve this through shell scripts or any other way ?
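One way is a cron-able wrapper around hadoop job -list / -kill; a hedged sketch (the column layout of -list differs slightly between versions, so check that JobId and StartTime really are the 1st and 3rd columns on your build before trusting it):

  #!/bin/sh
  MAX_SECS=3600                      # assumed timeout: one hour
  NOW=$(date +%s)
  hadoop job -list 2>/dev/null | grep '^job_' | while read jobid state start rest; do
    elapsed=$(( NOW - start / 1000 ))        # StartTime is reported in milliseconds
    if [ "$elapsed" -gt "$MAX_SECS" ]; then
      echo "killing $jobid after ${elapsed}s"
      hadoop job -kill "$jobid"
    fi
  done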

Re: Any info on R+Hadoop

2012-01-29 Thread praveenesh kumar
pers do not > like being engrossed with hassles that hadoop streaming can bring. > > -P > > P.S. I am not endorsing anyone. It's just my view. > > On Sun, Jan 29, 2012 at 12:54 PM, praveenesh kumar >wrote: > > > Does anyone has done any work with "R" + Hado

Any info on R+Hadoop

2012-01-28 Thread praveenesh kumar
Has anyone done any work with "R" + Hadoop ? I know there are some flavors of R+Hadoop available, such as "rmr", "rhdfs", "RHIPE", "R-hive". But as far as I know, submitting jobs using Hadoop Streaming is the best way available right now. Am I right ? Any info on R on Hadoop ? Thanks, Praveen

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
> > On Wed, Jan 25, 2012 at 8:49 PM, praveenesh kumar > wrote: > > Then in that case, will I be using group name tag in allocations file, > like > > this inside each pool ? > > > > < group name="ABC"> > >6 > > > > >

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
e identifier to be the poolnameproperty. Would this work for > you instead? > > On Wed, Jan 25, 2012 at 8:00 PM, praveenesh kumar > wrote: > > Also, with the above mentioned method, my problem is I am having one > > pool/user (thats obviously not a good way of configuri

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
h On Wed, Jan 25, 2012 at 7:55 PM, praveenesh kumar wrote: > I am looking for the solution where we can do it permanently without > specify these things inside jobs. > I want to keep these things hidden from the end-user. > End-user would just write pig scripts and all the jobs

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
t; Then you can provide per-poolname config overrides via the "pool" > element config described in > > http://hadoop.apache.org/common/docs/current/fair_scheduler.html#Allocation+File+%28fair-scheduler.xml%29 > > On Wed, Jan 25, 2012 at 7:01 PM, praveenesh kumar > wrote: &g

Re: Reduce > copy at 0.00 MB/s

2012-01-25 Thread praveenesh kumar
etions, > instead of default 5%. This helps your MR performance overall, if you > run multiple jobs at a time, as the reduce slots aren't wasted. > > On Wed, Jan 25, 2012 at 3:34 PM, praveenesh kumar > wrote: > > Hey, > > > > Can anyone explain me what is reduce

Re: Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
our pool name while > running the job. By default, mapred.faircheduler.poolnameproperty set to > user.name ( each job run by user is allocated to his named pool ) and you > can also change this property to group.name. > > Srinivas -- > > Also, you can set > > On Wed, Jan
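In mapred-site.xml that routing property looks roughly like this (pool.name is an arbitrary property name you invent; group.name is the other common choice mentioned above):

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>pool.name</value>   <!-- jobs then pick a pool with -Dpool.name=admin-users -->
  </property>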

Understanding fair schedulers

2012-01-25 Thread praveenesh kumar
Trying to understand the Fair Scheduler better. Can we create multiple pools in the Fair Scheduler? I guess yes. Please correct me. Suppose I have 2 pools in my fair-scheduler.xml 1. Hadoop-users : Min map : 10, Max map : 50, Min Reduce : 10, Max Reduce : 50 2. Admin-users: Min map : 20, Max map : 80, Min Re
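For reference, a sketch of an allocation file matching the two pools described above; the pool names and numbers mirror the question, minMaps/minReduces are guaranteed shares, and the max caps are only honoured on fair-scheduler builds that support them, so drop those elements if yours doesn't:

  <?xml version="1.0"?>
  <allocations>
    <pool name="hadoop-users">
      <minMaps>10</minMaps>
      <minReduces>10</minReduces>
      <maxMaps>50</maxMaps>
      <maxReduces>50</maxReduces>
    </pool>
    <pool name="admin-users">
      <minMaps>20</minMaps>
      <minReduces>20</minReduces>
      <maxMaps>80</maxMaps>
      <maxReduces>80</maxReduces>
    </pool>
  </allocations>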

Re: Reduce > copy at 0.00 MB/s

2012-01-25 Thread praveenesh kumar
@hadoophive Can you explain more by "balance the cluster" ? Thanks, Praveenesh On Wed, Jan 25, 2012 at 4:29 PM, hadoop hive wrote: > i face the same issue but after sumtime when i balanced the cluster the > jobs started running fine, > > On Wed, Jan 25, 2012 at 3:3

Reduce > copy at 0.00 MB/s

2012-01-25 Thread praveenesh kumar
Hey, Can anyone explain to me what the reduce > copy phase in the reducer section is ? The (K, List(V)) is passed to the reducer. Does reduce > copy represent copying of the K, List(V) to the reducer from all the mappers ? I am monitoring my jobs on the cluster using the Jobtracker url. I am seeing for most of my

How to set file permissions internally on hadoop

2012-01-22 Thread praveenesh kumar
Hey guys, How can I configure HDFS so that internally I can set permissions on the data. I know there is a parameter called dfs.permissions that needs to be true, otherwise permissions won't work. Actually I had set it true previously, so that any user can use the HDFS data to run jobs on it. Now
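A sketch of the two halves of this: turning enforcement on in hdfs-site.xml, then locking directories down per user (the path, user, and group below are made up):

  <property>
    <name>dfs.permissions</name>
    <value>true</value>
  </property>

  hadoop fs -chown -R alice:analysts /data/private
  hadoop fs -chmod -R 750 /data/private    # owner full, group read/list, others nothing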

Re: Best practices to recover from Corrupt Namenode

2012-01-20 Thread praveenesh kumar
eird that all the missing blocks were that of > the outputs of your M/R jobs? The NameNode should have been distributing > them evenly across the hard drives of your cluster. If the output of the > jobs is set to replication factor = 2, then the output should have been > replicated

Re: Best practices to recover from Corrupt Namenode

2012-01-18 Thread praveenesh kumar
Hi everyone, Any ideas on how to tackle this kind of situation. Thanks, Praveenesh On Tue, Jan 17, 2012 at 1:02 PM, praveenesh kumar wrote: > I have a replication factor of 2, because of the reason that I can not > afford 3 replicas on my cluster. > fsck output was saying block replica

Re: Best practices to recover from Corrupt Namenode

2012-01-16 Thread praveenesh kumar
y refers to the fsimage or edits getting corrupted). > > Did your files not have adequate replication that they could not withstand > the loss of one DN's disk? What exactly did fsck output? Did all block > replicas go missing for your files? > > On 17-Jan-2012, at 12:08 PM

Best practices to recover from Corrupt Namenode

2012-01-16 Thread praveenesh kumar
Hi guys, I just faced a weird situation in which one of my hard disks on a DN went down. Because of that, when I restarted the namenode, some of the blocks went missing and it was saying my namenode is CORRUPT and in safe mode, which doesn't allow you to add or delete any files on HDFS. I know, we can cl
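For whoever hits this next, the usual sequence of commands (real, but to be used with care) is roughly:

  hadoop dfsadmin -safemode get               # confirm the NN is stuck in safe mode
  hadoop fsck / -files -blocks -locations     # identify the files with missing blocks
  hadoop dfsadmin -safemode leave             # force the NN out of safe mode
  hadoop fsck / -delete                       # LAST RESORT: drops files whose blocks are gone

The -delete step permanently removes those files from the namespace, so it only makes sense when the data can be regenerated (e.g. job output).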

Re: JobTracker webUI stopped showing suddenly

2012-01-11 Thread praveenesh kumar
. Please guide me.. why its happening like this ? Thanks, Praveenesh On Wed, Jan 11, 2012 at 7:32 PM, praveenesh kumar wrote: > Its running,. > I am running jobs on hadoop. they are running fine, > > Thanks, > Praveenesh > > > On Wed, Jan 11, 2012 at 7:20 PM, hadoop hive wrot

Re: JobTracker webUI stopped showing suddenly

2012-01-11 Thread praveenesh kumar
Its running,. I am running jobs on hadoop. they are running fine, Thanks, Praveenesh On Wed, Jan 11, 2012 at 7:20 PM, hadoop hive wrote: > your job tracker is not running > > On Wed, Jan 11, 2012 at 7:08 PM, praveenesh kumar >wrote: > > > Jobtracker webUI suddenly st

JobTracker webUI stopped showing suddenly

2012-01-11 Thread praveenesh kumar
Jobtracker webUI suddenly stopped showing. It was working fine before. What could be the issue ? Can anyone guide me how can I recover my WebUI ? Thanks, Praveenesh

Question on Secondary Namenode

2012-01-08 Thread praveenesh kumar
Hi, the masters file in $HADOOP_HOME/conf tells you where exactly the SecondaryNamenode daemon should run. Please correct me if I am wrong. My doubt is: should all Datanodes know where the secondary namenode is running, or should only the namenode know where the secondary namenode is running ? The re

Does tuning require re-formatting the Namenode ?

2012-01-08 Thread praveenesh kumar
Hey Guys, Do I need to format the namenode again if I am changing some HDFS configurations like blocksize, checksum, compression codec etc or is there any other way to enforce these new changes in the present cluster setup ? Thanks, Praveenesh

Allowing multiple users to submit jobs in hadoop 0.20.205 ?

2012-01-02 Thread praveenesh kumar
Hi, How can I allow multiple users to submit jobs in hadoop 0.20.205 ? Thanks, Praveenesh

Hive starting error

2011-12-30 Thread praveenesh kumar
Hi, I am using Hive 0.7.1 on hadoop 0.20.205 While running hive. its giving me following error : Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.login(Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/security/UserGroupInformation;

Re: Unable to build pig from Trunk

2011-12-29 Thread praveenesh kumar
, 2011 at 11:54 PM, praveenesh kumar > wrote: > > I set up proxy, Now I am getting the following error : > > > > root@lxe9700 [/usr/local/hadoop/pig/new/trunk] $ --> ant > jar-withouthadoop > > -verbose > > Apache Ant version 1.6.5 compiled on June 5 2007 > &g

Re: Unable to build pig from Trunk

2011-12-29 Thread praveenesh kumar
me: 0 seconds On Fri, Dec 30, 2011 at 11:11 AM, praveenesh kumar wrote: > When I am pinging its saying "Unknown host.".. > Is there any kind of proxy setting we need to do, when building from ant ? > > Thanks, > Praveenesh > > > > On Fri, Dec 30, 2011 at 11:0

Re: Unable to build pig from Trunk

2011-12-29 Thread praveenesh kumar
.jar > to see if your server can connect to that URL. > If not you have some kind of connection issue with outgoing requests. > > --Joey > > On Thu, Dec 29, 2011 at 11:28 PM, praveenesh kumar > wrote: > > Hi everyone, > > I am trying to build Pig from SVN trunk on ha

Unable to build pig from Trunk

2011-12-29 Thread praveenesh kumar
Hi everyone, I am trying to build Pig from SVN trunk on hadoop 0.20.205. While doing that, I am getting the following error : Any idea why its happening ? Thanks, Praveenesh root@lxe [/usr/local/hadoop/pig/new/trunk] $ --> ant jar-withouthadoop -verbose Apache Ant version 1.6.5 compiled on June

Re: Multi user Hadoop 0.20.205 ?

2011-12-29 Thread praveenesh kumar
oey > > > > On Dec 29, 2011, at 2:49, praveenesh kumar wrote:. > > Guys, > > > > Did someone try this thing ? > > > > Thanks > > > > On Tue, Dec 27, 2011 at 4:36 PM, praveenesh kumar >wrote: > > > >> Hey guys, > >> &

Re: Multi user Hadoop 0.20.205 ?

2011-12-28 Thread praveenesh kumar
Guys, Did someone try this thing ? Thanks On Tue, Dec 27, 2011 at 4:36 PM, praveenesh kumar wrote: > Hey guys, > > How we can make hadoop as multiuser ? > > One way to think as whatever group we currently assigned to use hadoop, > add users to same group and cha

Multi user Hadoop 0.20.205 ?

2011-12-27 Thread praveenesh kumar
Hey guys, How can we make hadoop multiuser ? One way I can think of: take whatever group we currently assigned to use hadoop, add users to that same group, and change permissions on hadoop.tmp.dir, mapred.system.dir, dfs.data.dir, and so on. I was playing with hadoop 0.20.205 and I observed we can't change

Custom input format for parsing text files

2011-12-27 Thread praveenesh kumar
Hey people, I have a plain text file. I want to parse it using M/R line by line, where a "line" means a plain text line that ends with a DOT. Can I use M/R to do this kind of job? I know that if I have to do it like this, I have to write my own InputFormat. Can someone guide me or share their ex
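A hedged sketch of such an InputFormat (new API): it disables splitting so no record straddles a split, assumes roughly-ASCII text, and treats everything up to the next DOT as one record. Class names are illustrative and it reads byte-by-byte, so treat it as a starting point rather than production code:

  // imports: java.io.IOException, org.apache.hadoop.fs.*, org.apache.hadoop.io.*,
  //          org.apache.hadoop.mapreduce.*, org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}
  public class SentenceInputFormat extends FileInputFormat<LongWritable, Text> {

    protected boolean isSplitable(JobContext ctx, Path file) { return false; }   // one reader per file

    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext ctx) {
      return new SentenceRecordReader();
    }

    public static class SentenceRecordReader extends RecordReader<LongWritable, Text> {
      private FSDataInputStream in;
      private long pos, end;
      private final LongWritable key = new LongWritable();
      private final Text value = new Text();

      public void initialize(InputSplit genericSplit, TaskAttemptContext ctx) throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        Path path = split.getPath();
        in = path.getFileSystem(ctx.getConfiguration()).open(path);
        pos = 0;
        end = split.getLength();              // whole file, since splitting is off
      }

      public boolean nextKeyValue() throws IOException {
        StringBuilder sb = new StringBuilder();
        key.set(pos);
        int b;
        while (pos < end && (b = in.read()) != -1) {
          pos++;
          if (b == '.') {                     // DOT terminates a record
            String s = sb.toString().trim();
            if (s.length() > 0) { value.set(s); return true; }
            sb.setLength(0);                  // skip empty "sentences", keep reading
            key.set(pos);
          } else {
            sb.append((char) b);
          }
        }
        String tail = sb.toString().trim();   // trailing text after the last DOT, if any
        if (tail.length() > 0) { value.set(tail); return true; }
        return false;
      }

      public LongWritable getCurrentKey() { return key; }
      public Text getCurrentValue() { return value; }
      public float getProgress() { return end == 0 ? 1.0f : (float) pos / end; }
      public void close() throws IOException { if (in != null) in.close(); }
    }
  }

Wire it in with job.setInputFormatClass(SentenceInputFormat.class); the mapper then receives one DOT-terminated chunk of text per call.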

Re: Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread praveenesh kumar
tarball start/stop scripts, putting in the hostname for SNN in the conf/masters list is sufficient to get it auto-started there. > > On 27-Dec-2011, at 11:36 AM, praveenesh kumar wrote: > >> Thanks..But, my 1st question is still unanswered. >> I have a 8 DN/TT machines and 1 NN m

Re: Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread praveenesh kumar
rg/common/docs/current/hdfs_user_guide.html#Secondary+NameNode >> You can configure secondary node IP in masters file, start-dfs.sh itself >> will start the SNN automatically as it starts DN and NN as well. >> >> also you can see >> http://www.cloudera.com/blog/2009

Secondary Namenode on hadoop 0.20.205 ?

2011-12-26 Thread praveenesh kumar
Hey people, How can we set up another machine in the cluster as the Secondary Namenode in hadoop 0.20.205 ? Can a DN also act as the SNN, and what are the pros and cons of that configuration ? Thanks, Praveenesh
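For the tarball scripts the short version (hostname is an example) is that conf/masters on the node where you run start-dfs.sh decides where the SNN daemon gets started, and that host can also be a DN:

  echo "hadoop01" > $HADOOP_HOME/conf/masters   # start-dfs.sh will launch the SNN there

The SNN machine itself then needs dfs.http.address pointing at the NN, as discussed in the PriviledgedActionException thread above.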

Re: Does hadoop installations need to be at same locations in cluster ?

2011-12-23 Thread praveenesh kumar
; > Why not just do the simple think and make all of your DN the same? > > Sent from my iPhone > > On Dec 23, 2011, at 6:51 AM, "praveenesh kumar" wrote: > >> When installing hadoop on slave machines, do we have to install hadoop >> at same locations on each

Does hadoop installations need to be at same locations in cluster ?

2011-12-23 Thread praveenesh kumar
When installing hadoop on slave machines, do we have to install hadoop at the same location on each machine ? Can we have the hadoop installation at a different location on different machines in the same cluster ? If yes, what things do we have to take care of in that case? Thanks, Praveenesh

Weird problem in installing hadoop on 2 machines.

2011-12-23 Thread praveenesh kumar
Hello people, So I am trying to install hadoop 0.20.205 on 2 machines. Individually I am able to run hadoop on each machine. Now when I configure one machine as slave and the other as master, and try to start hadoop, it's not able to even execute hadoop-run commands on the slave machine. I am getting

How Jobtracker choose DataNodes to run TaskTracker ?

2011-12-15 Thread praveenesh kumar
Okay so I have one question in mind. Suppose I have a replication factor of 3 on my cluster of some N nodes, where N>3 and there is a data block B1 that exists on some 3 Data nodes --> DD1, DD2, DD3. I want to run some Mapper function on this block.. My JT will communicate with NN, to know where

More cores Vs More Nodes ?

2011-12-12 Thread praveenesh kumar
Hey Guys, So I have a very naive question in my mind regarding Hadoop cluster nodes: more cores or more nodes? Shall I spend money on going from 2-core to 4-core machines, or spend it on buying more nodes with fewer cores, e.g. 2 machines of 2 cores each? Thanks, Praveenesh

Re: Hive on hadoop 0.20.205

2011-12-09 Thread praveenesh kumar
while executing this line: /usr/local/hadoop/hive/release-0.7.1/jdbc/build.xml:51: Compile failed; see the compiler error output for details. Total time: 29 minutes 46 seconds Thanks, Praveenesh On Fri, Dec 9, 2011 at 2:08 PM, praveenesh kumar wrote: > Did anyone tried HIVE on Hadoop 0.20.

Hive on hadoop 0.20.205

2011-12-09 Thread praveenesh kumar
Did anyone try HIVE on Hadoop 0.20.205? I am trying to build HIVE from svn, but I am seeing it download hadoop-0.20.3-CDH3-SNAPSHOT.tar.gz and hadoop-0.20.1.tar.gz. I am trying to do ant -Dhadoop.version="0.20.205" package, but the build is failing. Any ideas or suggestions on what I may be

Re: "Warning: $HADOOP_HOME is deprecated"

2011-12-07 Thread praveenesh kumar
> > - Alex > > On Wed, Dec 7, 2011 at 11:37 AM, praveenesh kumar >wrote: > > > How to avoid "Warning: $HADOOP_HOME is deprecated" messages on hadoop > > 0.20.205 ? > > > > I tried adding *export HADOOP_HOME_WARN_SUPPRESS=" " *in had

"Warning: $HADOOP_HOME is deprecated"

2011-12-07 Thread praveenesh kumar
How to avoid "Warning: $HADOOP_HOME is deprecated" messages on hadoop 0.20.205 ? I tried adding *export HADOOP_HOME_WARN_SUPPRESS=" " *in hadoop-env.sh on Namenode. But its still coming. Am I doing the right thing ? Thanks, Praveenesh

Re: HDFS Backup nodes

2011-12-07 Thread praveenesh kumar
version onwards. > ____ > From: praveenesh kumar [praveen...@gmail.com] > Sent: Wednesday, December 07, 2011 12:40 PM > To: common-user@hadoop.apache.org > Subject: HDFS Backup nodes > > Does hadoop 0.20.205 supports configuring HDFS backup nodes ? > > Thanks, > Praveenesh >

HDFS Backup nodes

2011-12-06 Thread praveenesh kumar
Does hadoop 0.20.205 supports configuring HDFS backup nodes ? Thanks, Praveenesh

Automate Hadoop installation

2011-12-05 Thread praveenesh kumar
Hi all, Can anyone guide me on how to automate the hadoop installation/configuration process? I want to install hadoop on 10-20 nodes, which may later grow to 50-100 nodes. I know we can use some configuration tools like puppet or shell scripts. Has anyone done it ? How can we do hadoop installati

Are Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2011-12-02 Thread praveenesh kumar
or Do I have to apply some hadoop patch for this ? Thanks, Praveenesh

Utilizing multiple hard disks for hadoop HDFS ?

2011-12-01 Thread praveenesh kumar
Hi everyone, So I have this blade server with 4x500 GB hard disks. I want to use all these hard disks for hadoop HDFS. How can I achieve this target ? If I install hadoop on 1 hard disk and use other hard disk as normal partitions eg. - /dev/sda1, -- HDD 1 -- Primary partition -- Linux + Hadoop
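A sketch of the usual approach once each disk is mounted as its own filesystem: list one directory per disk, comma-separated, and HDFS round-robins blocks across them (the mount points below are made up; the same idea applies to mapred.local.dir for intermediate data):

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/hdfs/data,/disk2/hdfs/data,/disk3/hdfs/data,/disk4/hdfs/data</value>
  </property>

  <!-- mapred-site.xml -->
  <property>
    <name>mapred.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local,/disk4/mapred/local</value>
  </property>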

Re: Too much fetch failure

2011-10-16 Thread praveenesh kumar
gmail wrote: > commenting the line 127.0.0.1 in /etc/hosts is not working. if i format the > namenode then automatically this line is added. > any other solution? > > On 16 October 2011 19:13, praveenesh kumar wrote: > > > try commenting 127.0.0.1 localhost line in your /etc/

Re: Too much fetch failure

2011-10-16 Thread praveenesh kumar
try commenting out the 127.0.0.1 localhost line in your /etc/hosts, then restart the cluster and try again. Thanks, Praveenesh On Sun, Oct 16, 2011 at 2:00 PM, Humayun gmail wrote: > we are using hadoop on virtual box. when it is a single node then it works > fine for big dataset larger than the

Hadoop 0.20.205

2011-10-16 Thread praveenesh kumar
Hi all, Any idea when hadoop 0.20.205 is officially going to be released ? Is Hadoop-0.20.205 rc2 stable enough to put into production ? I am using hadoop-0.20-append now with hbase 0.90.3 and want to switch to 205, but I am looking for some valuable suggestions/recommendations. Thanks, Praveenesh

Re: Error using hadoop distcp

2011-10-05 Thread praveenesh kumar
doop cluster-> add "ub16" entry > in /etc/hosts on where the task running. > On 10/5/2011 12:15 PM, praveenesh kumar wrote: > > I am trying to use distcp to copy a file from one HDFS to another. > > > > But while copying I am getting the following exception :

Error using hadoop distcp

2011-10-04 Thread praveenesh kumar
I am trying to use distcp to copy a file from one HDFS to another. But while copying I am getting the following exception : hadoop distcp hdfs://ub13:54310/user/hadoop/weblog hdfs://ub16:54310/user/hadoop/weblog 11/10/05 10:41:01 INFO mapred.JobClient: Task Id : attempt_201110031447_0005_m_0

Is SAN storage is a good option for Hadoop ?

2011-09-28 Thread praveenesh kumar
Hi, I want to know whether we can use SAN storage for a Hadoop cluster setup ? If yes, what are the best practices ? Is it a good way to go, considering the fact that "the underlying power of Hadoop is co-locating the processing power (CPU) with the data storage and thus it must be local storage to be effe

Re: hadoop question using VMWARE

2011-09-28 Thread praveenesh kumar
't see the difference, it's a pure vmware > stuff. > Obviously, it's not something you can do for production nor performance > analysis. > > Cheers, > > N. > > On Wed, Sep 28, 2011 at 8:38 AM, praveenesh kumar >wrote: > > > Hi, > > > >

hadoop question using VMWARE

2011-09-27 Thread praveenesh kumar
Hi, Suppose I have 10 windows machines, each independently running an individual VM instance. Can these VM instances communicate with each other so that I can make a hadoop cluster out of them? Did anyone try that ? I know we can setup

How to run java code using Mahout from commandline ?

2011-09-23 Thread praveenesh kumar
Hey, I have this code written using mahout, and I am able to run it from eclipse. How can I run the code written with mahout from the command line ? My question is: do I have to make a jar file and run it as hadoop jar jarfilename.jar class, or shall I run it using a simple java command ? Can anyone solve
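The jar route is the usual one; a sketch with made-up names (the jar needs Mahout and its dependencies available, e.g. bundled in a lib/ directory inside the jar or added to HADOOP_CLASSPATH):

  jar cvf myjob.jar -C classes/ .
  hadoop jar myjob.jar com.example.MyMahoutDriver /input /output

Plain java only works if you put the hadoop and mahout jars plus the cluster's conf directory on the classpath yourself, so hadoop jar is less fiddly.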

Re: Can we replace namenode machine with some other machine ?

2011-09-21 Thread praveenesh kumar
:-) > > > Regards, > Uma > - Original Message - > From: praveenesh kumar > Date: Thursday, September 22, 2011 10:42 am > Subject: Re: Can we replace namenode machine with some other machine ? > To: common-user@hadoop.apache.org > > > If I just change configurat
