Re: where is the old hadoop documentation for v0.22.0 and below?

2014-07-30 Thread Jane Wayne
/hadoop/common/ On Tue, Jul 29, 2014 at 1:36 AM, Jane Wayne jane.wayne2...@gmail.com wrote: where can i get the old hadoop documentation (e.g. cluster setup, xml configuration params) for hadoop v0.22.0 and below? i downloaded the source and binary files but could not find

where is the old hadoop documentation for v0.22.0 and below?

2014-07-28 Thread Jane Wayne
where can i get the old hadoop documentation (e.g. cluster setup, xml configuration params) for hadoop v0.22.0 and below? i downloaded the source and binary files but could not find the documentation as part of the archive file. on the home page at http://hadoop.apache.org/, i only see

slave datanodes are not starting, hadoop v2.4.1

2014-07-27 Thread Jane Wayne
i am following the instructions to setup a multi-node cluster at http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/ClusterSetup.html . my problem is that when i run the script to start up the slave datanodes, no slave datanode is started (more on this later). i have two

Re: slave datanodes are not starting, hadoop v2.4.1

2014-07-27 Thread Jane Wayne
this is the correct way to start slave datanode daemons (NOTICE THE PLURAL DAEMONS). $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode On Sun, Jul 27, 2014 at 3:11 AM, Jane Wayne jane.wayne2...@gmail.com wrote: i am following the instructions to setup
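
For context: hadoop-daemons.sh (plural) loops over every host listed in the slaves file and starts the daemon there over ssh, while hadoop-daemon.sh (singular) only acts on the local machine. A minimal sketch, assuming two hypothetical worker hostnames:

    # $HADOOP_CONF_DIR/slaves -- one worker hostname per line
    slave1
    slave2

    # starts a datanode on every host named in the slaves file
    $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode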

openjdk warning, the vm will try to fix the stack guard

2014-03-06 Thread Jane Wayne
hi, i have hadoop v2.3.0 installed on CentOS 6.5 64-bit. OpenJDK 64-bit v1.7 is my java version. when i attempt to start hadoop, i keep seeing this message below. OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0 which might
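
This warning usually means the native library was built with an executable-stack flag that the VM then patches at load time; the VM fixes the stack guard itself and continues, so it is harmless. A hedged sketch of a commonly suggested cleanup (assuming the execstack tool from the prelink package is available; this is not from the thread itself):

    # clear the executable-stack flag on hadoop's native library
    sudo execstack -c /usr/local/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0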

are the job and task tracker monitor webpages gone now in hadoop v2.3.0

2014-03-06 Thread Jane Wayne
i recently made the switch from hadoop 0.20.x to hadoop 2.3.0 (yes, big leap). i was wondering if there is a way to view my jobs now via a web UI? i used to be able to do this by accessing the following URL http://hadoop-cluster:50030/jobtracker.jsp however, there is no more job tracker

Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0

2014-03-06 Thread Jane Wayne
after it is done with its work. Once it is done, you can go look at it in the MapReduce specific JobHistoryServer. +Vinod On Mar 6, 2014, at 1:11 PM, Jane Wayne jane.wayne2...@gmail.com wrote: i recently made the switch from hadoop 0.20.x to hadoop 2.3.0 (yes, big leap). i was wondering

Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0

2014-03-06 Thread Jane Wayne
ok, the reason hadoop jobs were not showing up was that i did not enable mapreduce to run as a yarn application. On Thu, Mar 6, 2014 at 11:45 PM, Jane Wayne jane.wayne2...@gmail.com wrote: when i go to the job history server http://hadoop-cluster:19888/jobhistory i see no map
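
For reference, the setting being alluded to lives in mapred-site.xml; without it, jobs run with the local runner and never reach the ResourceManager or JobHistoryServer UIs. A minimal sketch:

    <!-- mapred-site.xml: run MapReduce jobs on YARN -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>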

hdfs permission is still being checked after being disabled

2014-03-06 Thread Jane Wayne
i am using hadoop v2.3.0. in my hdfs-site.xml, i have the following property set. <property> <name>dfs.permissions.enabled</name> <value>false</value> </property> however, when i try to run a hadoop job, i see the following AccessControlException.
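
For reference, the property as it would appear in hdfs-site.xml. Note that the NameNode must be restarted before a changed value takes effect, which is one common reason the check appears to still run:

    <!-- hdfs-site.xml: disable HDFS permission checking -->
    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>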

Re: hadoop v0.23.9, namenode -format command results in Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode

2013-08-12 Thread Jane Wayne
, Harsh J ha...@cloudera.com wrote: I don't think you ought to be using HADOOP_HOME anymore. Try unset HADOOP_HOME and then export HADOOP_PREFIX=/opt/hadoop and retry the NN command. On Sun, Aug 11, 2013 at 8:50 AM, Jane Wayne jane.wayne2...@gmail.com wrote: hi, i have downloaded
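
Harsh's suggestion as a runnable sketch (the /opt/hadoop path comes from the thread; adjust to the actual install location):

    unset HADOOP_HOME
    export HADOOP_PREFIX=/opt/hadoop
    # re-run the format with the prefix-based layout
    $HADOOP_PREFIX/bin/hdfs namenode -format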

hadoop v0.23.9, namenode -format command results in Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode

2013-08-10 Thread Jane Wayne
hi, i have downloaded and untarred hadoop v0.23.9. i am trying to set up a single node instance to learn this version of hadoop. also, i am following, as best as i can, the instructions at http://hadoop.apache.org/docs/r0.23.9/hadoop-project-dist/hadoop-common/SingleCluster.html . when i attempt

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Jane Wayne
vote would be the Name node. ;-) HTH -Mike On May 16, 2013, at 10:34 AM, Niels Basjes ni...@basjes.nl wrote: If you make sure that everything uses NTP then this becomes an irrelevant distinction. On Thu, May 16, 2013 at 4:01 PM, Jane Wayne jane.wayne2...@gmail.com wrote: yes

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Jane Wayne
"You are searching for a solution in the Hadoop API (where this does not exist)" thanks, that's all i needed to know. cheers. On Fri, May 17, 2013 at 9:17 AM, Niels Basjes ni...@basjes.nl wrote: Hi, i have another computer (which i have referred to as a server, since it is running

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Jane Wayne
"if NTP is correctly used" that's the key statement. in several of our clusters, the NTP setup is kludgy. note that the professionals administering the cluster are different from us, the engineers. so, there's a lot of red tape to go through to get anything fixed, trivial or not. we have noticed that

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-17 Thread Jane Wayne
and please remember, i stated that although the hadoop cluster uses NTP, the server (the machine that is not a part of the hadoop cluster) cannot be assumed to be using NTP (and in fact, doesn't). On Fri, May 17, 2013 at 10:10 AM, Jane Wayne jane.wayne2...@gmail.com wrote: if NTP is correctly used

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-16 Thread Jane Wayne
information: http://stackoverflow.com/questions/833768/java-code-for-getting-current-time If you have a client that is not under NTP then that should be the way to fix your issue. Once you have that getting the current time is easy. Niels Basjes On Tue, May 14, 2013 at 5:46 PM, Jane Wayne

how to get the time of a hadoop cluster, v0.20.2

2013-05-14 Thread Jane Wayne
hi all, is there a way to get the current time of a hadoop cluster via the api? in particular, getting the time from the namenode or jobtracker would suffice. i looked at JobClient but didn't see anything helpful.
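
Since the thread concludes that no such API call exists, one workaround is a trivial map-only job whose mapper reports the clock of whichever node runs it. A hedged sketch against the newer mapreduce API (class name and types are illustrative, not from the thread):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // emits the local system clock of the task's node; with a single small
    // input split, this approximates "the cluster's time"
    public class ClockMapper
        extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context ctx)
          throws IOException, InterruptedException {
        ctx.write(NullWritable.get(), new LongWritable(System.currentTimeMillis()));
      }
    }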

Re: how to get the time of a hadoop cluster, v0.20.2

2013-05-14 Thread Jane Wayne
with the server). On Tue, May 14, 2013 at 11:38 AM, Niels Basjes ni...@basjes.nl wrote: If you have all nodes using NTP then you can simply use the native Java SPI to get the current system time. On Tue, May 14, 2013 at 4:41 PM, Jane Wayne jane.wayne2...@gmail.com wrote: hi all, is there a way to get

Re: how to resolve conflicts with jar dependencies

2013-03-13 Thread Jane Wayne
AM, Jane Wayne jane.wayne2...@gmail.com wrote: hi, i need to know how to resolve conflicts with jar dependencies. * first, my job requires Jackson JSON-processor v1.9.11. * second, the hadoop cluster has Jackson JSON-processor v1.5.2. the jars are installed in $HADOOP_HOME/lib. according
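
Two usual remedies, hedged since the thread is truncated: ask the framework to put the job's jars ahead of the cluster's on the task classpath, or shade/relocate the conflicting package into the job jar. The flag below was introduced around the 1.x line (mapreduce.job.user.classpath.first is the 2.x spelling); the jar and class names are illustrative, and the driver is assumed to use ToolRunner so -D properties are picked up:

    # prefer the job's jars over the copies in $HADOOP_HOME/lib
    hadoop jar myjob.jar my.pkg.Main -Dmapreduce.user.classpath.first=true input output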

Re: reg hadoop on AWS

2012-10-05 Thread Jane Wayne
i'm on windows using AWS EMR/EC2. i use the ruby client to manipulate AWS EMR. 1. spawn an EMR cluster. this should return a jobflow id (jobflow-id). ruby elastic-mapreduce --create --name j-med --alive --num-instances 10 --instance-type c1.medium 2. run a job. you need to describe the job

Re: Cumulative value using mapreduce

2012-10-05 Thread Jane Wayne
there's probably a million ways to do it, but it seems like it can be done, per your question. off the top of my head, you'd probably want to do the cumulative sum in the reducer. if you're savvy, maybe even make the reducer reusable as a combiner (looks like this problem might have an associative

Re: Cumulative value using mapreduce

2012-10-05 Thread Jane Wayne
will have to come up with a composite key). when the data comes into the reducer, just keep a running count and emit each time. On Fri, Oct 5, 2012 at 11:21 AM, Jane Wayne jane.wayne2...@gmail.com wrote: there's probably a million ways to do it, but it seems like it can be done, per your question
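
A sketch of the running-total reducer described above (type choices illustrative). Note this form keeps state across keys, so it only makes sense with a single reducer and, unlike the combiner idea floated earlier in the thread, is not safe to reuse as a combiner:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // with one reducer, keys arrive sorted and the emitted value is the
    // cumulative sum over everything seen so far
    public class CumulativeSumReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {
      private long runningTotal = 0;
      @Override
      protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
          throws IOException, InterruptedException {
        for (LongWritable v : values) {
          runningTotal += v.get();
        }
        ctx.write(key, new LongWritable(runningTotal));
      }
    }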

Re: strategies to share information between mapreduce tasks

2012-09-26 Thread Jane Wayne
it but whether it is relevant or not to you will depend on your context. Regards Bertrand On Wed, Sep 26, 2012 at 5:36 PM, Jane Wayne jane.wayne2...@gmail.com wrote: hi, i know that some algorithms cannot be parallelized and adapted to the mapreduce paradigm. however, i have noticed that in most cases

Re: strategies to share information between mapreduce tasks

2012-09-26 Thread Jane Wayne
http://research.google.com/pubs/pub36632.html (dremel) is from 2010. Regards Bertrand On Wed, Sep 26, 2012 at 8:18 PM, Jane Wayne jane.wayne2...@gmail.com wrote: jay, thanks. i just needed a sanity check. i hope and expect that one day, hadoop will mature towards supporting a shared-something

Re: strategies to share information between mapreduce tasks

2012-09-26 Thread Jane Wayne
jay, thanks. i just needed a sanity check. i hope and expect that one day, hadoop will mature towards supporting a shared-something approach. the web service call is not a bad idea at all. that way, we can abstract what that ultimate data store really is. i'm just a little surprised that we are

Re: Hadoop on physical Machines compared to Amazon Ec2 / virtual machines

2012-06-01 Thread Jane Wayne
Sandeep, How are you guys moving 100 TB into the AWS cloud? Are you using S3 or EBS? If you are using S3, it does not work like HDFS. Although data is replicated (I believe within an availability zone) in S3, it is not the same as HDFS replication. You lose the data locality optimization feature

how do i view the local file system output of a mapper on cygwin + windows?

2012-04-05 Thread Jane Wayne
i am currently testing my map reduce job on Windows + Cygwin + Hadoop v0.20.205. for some strange reason, the list of values (i.e. Iterable<T> values) going into the reducer looks all wrong. i have tracked the map reduce process with logging statements (i.e. logged the input to the map, logged the

Re: how do i view the local file system output of a mapper on cygwin + windows?

2012-04-05 Thread Jane Wayne
it = values.iterator(); Value a = it.next(); Value b = it.next(); } the variables, a and b of type Value, will be the same object instance! i suppose this behavior of the iterator is to optimize iterating so as to avoid the new operator. On Thu, Apr 5, 2012 at 4:55 PM, Jane Wayne jane.wayne2
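
The underlying behavior: the framework deserializes every value into one reused object, so holding the reference returned by it.next() just holds that same instance. Anything buffered past the current iteration must be copied. A hedged sketch using Text values (the thread's Value type is a placeholder):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class BufferingReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context ctx)
          throws IOException, InterruptedException {
        List<Text> buffered = new ArrayList<Text>();
        for (Text v : values) {
          buffered.add(new Text(v)); // copy; adding v itself would buffer one reused object
        }
        // buffered.get(0), buffered.get(1), ... are now distinct instances
      }
    }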

Re: how to fine tuning my map reduce job that is generating a lot of intermediate key-value pairs (a lot of I/O operations)

2012-04-04 Thread Jane Wayne
serge, i specify 15 instances, but only 14 end up being data/task nodes. 1 instance is reserved as the name node (job tracker). On Wed, Apr 4, 2012 at 1:17 PM, Serge Blazhievsky serge.blazhiyevs...@nice.com wrote: How many datanodes do you use for your job? On 4/3/12 8:11 PM, Jane Wayne

how to fine tuning my map reduce job that is generating a lot of intermediate key-value pairs (a lot of I/O operations)

2012-04-03 Thread Jane Wayne
i have a map reduce job that is generating a lot of intermediate key-value pairs. for example, when i am 1/3 complete with my map phase, i may have generated over 130,000,000 output records (which is about 9 gigabytes). to get to the 1/3 complete mark is very fast (less than 10 minutes), but at

Re: how to fine tuning my map reduce job that is generating a lot of intermediate key-value pairs (a lot of I/O operations)

2012-04-03 Thread Jane Wayne
to 512Mb - increase map task heap size to 2GB. If the task still stalls, try providing less input to each mapper. Regards Bejoy KS On Tue, Apr 3, 2012 at 2:08 PM, Jane Wayne jane.wayne2...@gmail.com wrote: i have a map reduce job that is generating a lot of intermediate key-value pairs
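
Reconstructing that advice as 0.20-era configuration (the property names changed in later releases; the values are the ones the reply suggests):

    <!-- mapred-site.xml: larger sort buffer and task heap -->
    <property>
      <name>io.sort.mb</name>
      <value>512</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx2048m</value>
    </property>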

Re: what is the code for WritableComparator.readVInt and WritableUtils.decodeVIntSize doing?

2012-03-31 Thread Jane Wayne
the vint bytes to get the length of the following byte array. So when you call the compareBytes method you need to pass in where the actual bytes start (s1 + vIntLen) and how many bytes to compare (vint) On Mar 31, 2012 12:38 AM, Jane Wayne jane.wayne2...@gmail.com wrote: in tom white's book

Re: how to unit test my RawComparator

2012-03-31 Thread Jane Wayne
a reference to a byte array, not making a copy of the array. Chris On Sat, Mar 31, 2012 at 12:23 AM, Jane Wayne jane.wayne2...@gmail.com wrote: i have a RawComparator that i would like to unit test (using mockito and mrunit testing packages). i want to test the method, public int
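
Chris's point suggests a simple test pattern: serialize each key to raw bytes exactly as the framework would, then hand those byte ranges to the comparator. A hedged sketch (MyRawComparator is a hypothetical stand-in for the class under test):

    import static org.junit.Assert.assertTrue;
    import java.io.IOException;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.Text;
    import org.junit.Test;

    public class MyRawComparatorTest {
      @Test
      public void comparesSerializedKeys() throws IOException {
        DataOutputBuffer left = new DataOutputBuffer();
        new Text("alpha").write(left);   // serialize key 1 to raw bytes
        DataOutputBuffer right = new DataOutputBuffer();
        new Text("beta").write(right);   // serialize key 2 to raw bytes

        MyRawComparator comparator = new MyRawComparator();
        int cmp = comparator.compare(left.getData(), 0, left.getLength(),
                                     right.getData(), 0, right.getLength());
        assertTrue("alpha should sort before beta", cmp < 0);
      }
    }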

what is the code for WritableComparator.readVInt and WritableUtils.decodeVIntSize doing?

2012-03-30 Thread Jane Wayne
in tom white's book, Hadoop, The Definitive Guide, in the second edition, on page 99, he shows how to compare the raw bytes of a key with Text fields. he shows an example like the following. int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1); int firstL2 =
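
Completing the fragment as a sketch for a key whose first field is a Text: decodeVIntSize(b1[s1]) is the width of the variable-length length header itself, readVInt(b1, s1) is the number of string bytes that follow, so their sum is the total span of the first field:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.io.WritableUtils;

    public class FirstFieldComparator extends WritableComparator {
      protected FirstFieldComparator() {
        super(Text.class);
      }
      @Override
      public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        try {
          // header width + payload length = bytes occupied by the first Text
          int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
          int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
          return compareBytes(b1, s1, firstL1, b2, s2, firstL2);
        } catch (IOException e) {
          throw new IllegalArgumentException(e);
        }
      }
    }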

how can i increase the number of mappers?

2012-03-21 Thread Jane Wayne
i have a matrix that i am performing operations on. it is 10,000 rows by 5,000 columns. the total size of the file is just under 30 MB. my HDFS block size is set to 64 MB. from what i understand, the number of mappers is roughly equal to the number of HDFS blocks used in the input. i.e. if my

Re: how can i increase the number of mappers?

2012-03-21 Thread Jane Wayne
, Anil Gupta anilgupt...@gmail.com wrote: Have a look at NLineInputFormat class in Hadoop. That class will solve your purpose. Best Regards, Anil On Mar 20, 2012, at 11:07 PM, Jane Wayne jane.wayne2...@gmail.com wrote: i have a matrix that i am performing operations on. it is 10,000 rows
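
Anil's suggestion as a sketch (the new-API NLineInputFormat; job is an existing org.apache.hadoop.mapreduce.Job, and the line count is illustrative):

    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    // every N lines of input become one split, hence one mapper,
    // regardless of the 64 MB block size
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 500); // 10,000 rows -> ~20 mappers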

Re: is implementing WritableComparable and setting Job.setSortComparatorClass(...) redundant?

2012-03-20 Thread Jane Wayne
]); int n2 = WritableUtils.decodeVIntSize(b2[s2]); return compareBytes(b1, s1+n1, l1-n1, b2, s2+n2, l2-n2); } } static { // register this comparator WritableComparator.define(Text.class, new Comparator()); } Chris On Tue, Mar 20, 2012 at 2:47 AM, Jane Wayne

Re: Partition classes, how to pass in background information

2012-03-14 Thread Jane Wayne
On Mar 14, 2012 2:31 AM, Jane Wayne jane.wayne2...@gmail.com wrote: i am using the new org.apache.hadoop.mapreduce.Partitioner class. however, i need to pass it some background information. how can i do this? in the old, org.apache.hadoop.mapred.Partitioner class (now deprecated
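
In the new API, the framework instantiates the partitioner through ReflectionUtils, which calls setConf() on anything implementing Configurable; that is the usual channel for background information. A hedged sketch (the configuration key and types are illustrative):

    import org.apache.hadoop.conf.Configurable;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class BackgroundInfoPartitioner extends Partitioner<Text, Text>
        implements Configurable {
      private Configuration conf;
      private int regions;

      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;
        // background information passed in from the driver via the job conf
        this.regions = conf.getInt("my.partitioner.regions", 1);
      }

      @Override
      public Configuration getConf() {
        return conf;
      }

      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % Math.min(regions, numPartitions);
      }
    }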

Re: does hadoop always respect setNumReduceTasks?

2012-03-14 Thread Jane Wayne
/3686 I think the Partitioner class guarantees that you will have multiple reducers. On Thu, Mar 8, 2012 at 6:30 PM, Jane Wayne jane.wayne2...@gmail.com wrote: i am wondering if hadoop always respect Job.setNumReduceTasks(int)? as i am emitting items from the mapper, i expect/desire only 1

does hadoop always respect setNumReduceTasks?

2012-03-08 Thread Jane Wayne
i am wondering if hadoop always respect Job.setNumReduceTasks(int)? as i am emitting items from the mapper, i expect/desire only 1 reducer to get these items because i want to assign each key of the key-value input pair a unique integer id. if i had 1 reducer, i can just keep a local counter

Re: can i specify no shuffle and/or no sort in the reducer and no disk space left IOException when there is DFS space remaining

2012-03-07 Thread Jane Wayne
the reducer at all. Yeah the map output will go to HDFS directly. It's called a map-only job. Jie On Thursday, March 8, 2012, Jane Wayne wrote: Jie, so if i set the number of reduce tasks to 0, do i need to specify the reducer (or should i set it null)? if i don't specify the reducer
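
The map-only setup in code (job is an existing org.apache.hadoop.mapreduce.Job; no reducer class is set at all):

    // zero reduce tasks: no shuffle, no sort; each mapper's output is
    // written directly to the job's output path
    job.setNumReduceTasks(0);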

Re: why does my mapper class reads my input file twice?

2012-03-06 Thread Jane Wayne
by you. Fix the input path line to use a different config: Path input = new Path(conf.get("input.path")); And run the job as: hadoop jar dummy-0.1.jar dummy.MyJob -Dinput.path=data/dummy.txt -Dmapred.output.dir=result On Tue, Mar 6, 2012 at 9:03 AM, Jane Wayne jane.wayne2...@gmail.com wrote

Re: is there anyway to detect the file size as am i writing a sequence file?

2012-03-06 Thread Jane Wayne
of the file). -Joey On Tue, Mar 6, 2012 at 9:53 AM, Jane Wayne jane.wayne2...@gmail.com wrote: hi, i am writing a little util class to recurse into a directory and add all *.txt files into a sequence file (key is the file name, value is the content of the corresponding text file). as i am
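
Joey's pointer, sketched out: SequenceFile.Writer exposes getLength(), the writer's current position in the underlying stream, which serves as an approximate running byte count (buffered, not-yet-flushed data may lag slightly):

    import java.io.IOException;
    import org.apache.hadoop.io.SequenceFile;

    // roll to a new sequence file once the current one crosses a threshold
    static boolean shouldRoll(SequenceFile.Writer writer, long maxBytes)
        throws IOException {
      return writer.getLength() >= maxBytes;
    }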