Re: Doubt in DoubleWritable

2015-11-23 Thread unmesha sreeveni
Please try this: for (DoubleArrayWritable avalue : values) { Writable[] value = avalue.get(); // DoubleWritable[] value = new DoubleWritable[6]; // for(int k=0;k<6;k++){ // value[k] = new DoubleWritable(wvalue[k]); // } //parse accordingly if (Double.parseDouble(value[1].toString()) != 0) {
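For reference, a minimal sketch of the pattern suggested above, assuming DoubleArrayWritable is the usual user-defined ArrayWritable subclass for DoubleWritable elements (the class name and the element index come from the snippet; everything else is an assumption):

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.Writable;

    // Assumed definition: the usual ArrayWritable subclass used to ship arrays of doubles.
    public class DoubleArrayWritable extends ArrayWritable {
        public DoubleArrayWritable() {
            super(DoubleWritable.class);
        }
    }

Inside the reducer the elements can then be cast directly instead of being round-tripped through strings:

    // inside reduce(..., Iterable<DoubleArrayWritable> values, Context context)
    for (DoubleArrayWritable avalue : values) {
        Writable[] value = avalue.get();                    // elements are DoubleWritable
        double second = ((DoubleWritable) value[1]).get();  // same check as the snippet, without parseDouble
        if (second != 0) {
            // ... process the record ...
        }
    }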

Re: Doubt Regarding QJM protocol - example 2.10.6 of Quorum-Journal Design document

2014-09-28 Thread Ulul
Hi A developer should answer that, but a quick look at an edits file with od suggests that records are not fixed length. So maybe the likelihood of the situation you suggest is so low that there is no need to check more than file size Ulul On 28/09/2014 11:17, Giridhar Addepalli wrote: Hi

RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Radhe Radhe
Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop From: john.meag...@gmail.com To: user@hadoop.apache.org Also, Source Compatibility means ONLY a recompile is needed. No code changes should

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Zhijie Shen
Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop From: john.meag...@gmail.com To: user@hadoop.apache.org Also, Source Compatibility means ONLY a recompile is needed. No code changes should

RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Radhe Radhe
2014 13:03:53 -0700 Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop From: zs...@hortonworks.com To: user@hadoop.apache.org 1. If you have the binaries that were compiled against MRv1 mapred libs, it should just work

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Zhijie Shen
file is execute it. -RR -- Date: Tue, 15 Apr 2014 13:03:53 -0700 Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop From: zs...@hortonworks.com To: user@hadoop.apache.org 1. If you

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Also, Source Compatibility means ONLY a recompile is needed. No code changes should be needed. On Mon, Apr 14, 2014 at 10:37 AM, John Meagher john.meag...@gmail.com wrote: Source Compatibility = you need to recompile and use the new version as part of the compilation. Binary Compatibility

Re: Doubt

2014-03-19 Thread Jay Vyas
Certainly it is, and quite common, especially if you have some high-performance machines: they can run as mapreduce slaves and also double as mongo hosts. The problem would of course be that when running mapreduce jobs you might have very slow network bandwidth at times, and if your front end

Re: Doubt

2014-03-19 Thread sri harsha
Thanks Jay and Praveen, I want to use both separately; I don't want to use MongoDB in place of HBase. On Wed, Mar 19, 2014 at 9:25 PM, Jay Vyas jayunit...@gmail.com wrote: Certainly it is, and quite common especially if you have some high performance machines: they can run as mapreduce

Re: Doubt

2014-03-19 Thread praveenesh kumar
Why not? It's just a matter of installing 2 different packages. Depending on what you want to use it for, you need to take care of a few things, but as far as installation is concerned, it should be easily doable. Regards Prav On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com

Re: doubt

2014-01-19 Thread Justin Black
I've installed a hadoop single node cluster on a VirtualBox machine running ubuntu 12.04LTS (64-bit) with 512MB RAM and 8GB HD. I haven't seen any errors in my testing yet. Is 1GB RAM required? Will I run into issues when I expand the cluster? On Sat, Jan 18, 2014 at 11:24 PM, Alexander

Re: doubt

2014-01-18 Thread Alexander Pivovarov
It's enough. Hadoop uses only 1GB RAM by default. On Sat, Jan 18, 2014 at 10:11 PM, sri harsha rsharsh...@gmail.com wrote: Hi, I want to install a 4 node cluster on 64-bit Linux. Is 4GB RAM and 500HD enough for this, or shall I need to expand? Please suggest about my query. thanx -- amiable

Re: Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Harsh J
The answer (a) is correct, in general. On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Hi, Which of the following is correct w.r.t mapper. (a) It accepts a single key-value pair as input and can emit any number of key-value pairs as

Re: Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Mahesh Balija
Hi Rams, A mapper will accept a single key-value pair as input and can emit 0 or more key-value pairs based on what you want to do in the mapper function (I mean based on your business logic in the mapper function). But the framework will actually aggregate the list of values
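As a hedged illustration of that answer (class and field names here are made up, not from the thread), a new-API mapper that emits zero pairs for an empty line and one pair per token otherwise:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // One input key-value pair in, zero or more key-value pairs out.
    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return;                        // emit nothing for this input pair
            }
            for (String token : line.split("\\s+")) {
                word.set(token);
                context.write(word, ONE);      // emit one pair per token
            }
        }
    }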

Re: doubt about reduce tasks and block writes

2012-08-26 Thread Raj Vishwanathan
Message - From: Harsh J ha...@cloudera.com To: common-user@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com Cc: Sent: Saturday, August 25, 2012 4:02 AM Subject: Re: doubt about reduce tasks and block writes Raj's almost right. In times of high load or space fillup on a local DN

Re: doubt about reduce tasks and block writes

2012-08-25 Thread Marc Sturlese
Thanks, Raj you got exactly my point. I wanted to confirm this assumption as I was guessing if a shared HDFS cluster with MR and Hbase like this would make sense: http://old.nabble.com/HBase-User-f34655.html -- View this message in context:

Re: doubt about reduce tasks and block writes

2012-08-25 Thread Harsh J
Raj's almost right. In times of high load or space fillup on a local DN, the NameNode may decide to instead pick a non-local DN for replica-writing. In this way, the Node A may get a copy 0 of a replica from a task. This is per the default block placement policy. P.s. Note that HDFS hardly makes

Re: doubt about reduce tasks and block writes

2012-08-24 Thread Minh Duc Nguyen
Marc, see my inline comments. On Fri, Aug 24, 2012 at 4:09 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I have a doubt about reduce tasks and block writes. Does a reduce task always first write to HDFS on the node where it is placed? (and then these blocks would be

Re: doubt about reduce tasks and block writes

2012-08-24 Thread Bertrand Dechoux
Assuming that node A only contains replicas, there is no guarantee that its data would never be read. First, you might lose a replica; the copy inside node A could be used to create the missing replica again. Second, data locality is best effort. If all the map slots are occupied except one on

Re: doubt about reduce tasks and block writes

2012-08-24 Thread Raj Vishwanathan
But since node A has no TT running, it will not run map or reduce tasks. When the reducer node writes the output file, the first block will be written on the local node and never on node A. So, to answer the question, node A will contain copies of blocks of all output files. It won't contain the
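A small sketch for checking this on a real cluster (the output path is a hypothetical command-line argument): it prints which hosts hold each block of a file via the public FileSystem API.

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockHosts {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path(args[0]);                     // e.g. a reducer output file
            FileStatus status = fs.getFileStatus(p);
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (int i = 0; i < blocks.length; i++) {
                // lists the datanodes holding replicas of block i
                System.out.println("block " + i + " on " + Arrays.toString(blocks[i].getHosts()));
            }
        }
    }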

Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
Hi Manoj, Reply inline. On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu manoj...@gmail.com wrote: Hi All, Normal Hadoop job submission process involves: Checking the input and output specifications of the job. Computing the InputSplits for the job. Setup the requisite accounting information

Re: doubt on Hadoop job submission process

2012-08-13 Thread Manoj Babu
Hi Harsh, Thanks for your reply. Consider that from my main program I am doing many activities (reading/writing/updating, non-Hadoop activities) before invoking JobClient.runJob(conf); is there any way to separate the process flow programmatically instead of going for a workflow engine? Cheers! Manoj.

Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
Sure, you may separate the logic as you want it to be, but just ensure the configuration object has a proper setJar or setJarByClass done on it before you submit the job. On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu manoj...@gmail.com wrote: Hi Harsh, Thanks for your reply. Consider from my
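A minimal sketch of what that looks like with the old mapred API used in this thread; MyDriver, MyMapper and MyReducer are placeholders, and the non-Hadoop work mentioned earlier simply runs before the job is configured:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyDriver {                              // placeholder driver class
        public static void main(String[] args) throws Exception {
            // ... reading/writing/updating (non-Hadoop activities) can happen here first ...

            JobConf conf = new JobConf();
            conf.setJobName("my-job");
            conf.setJarByClass(MyDriver.class);          // or conf.setJar(...); needed so the cluster can load your classes
            conf.setMapperClass(MyMapper.class);         // placeholder old-API mapper
            conf.setReducerClass(MyReducer.class);       // placeholder old-API reducer
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);                      // submits the job and waits for completion
        }
    }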

Re: Doubt from the book Definitive Guide

2012-04-05 Thread Mohit Anchlia
On Wed, Apr 4, 2012 at 10:02 PM, Prashant Kommireddi prash1...@gmail.com wrote: Hi Mohit, What would be the advantage? Reducers in most cases read data from all the mappers. In the case where mappers were to write to HDFS, a reducer would still require to read data from other datanodes across

Re: Doubt from the book Definitive Guide

2012-04-05 Thread Jean-Daniel Cryans
On Thu, Apr 5, 2012 at 7:03 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Only advantage I was thinking of was that in some cases reducers might be able to take advantage of data locality and avoid multiple HTTP calls, no? Data is anyways written, so last merged file could go on HDFS instead

Re: Doubt from the book Definitive Guide

2012-04-04 Thread Prashant Kommireddi
Answers inline. On Wed, Apr 4, 2012 at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am going through the chapter How mapreduce works and have some confusion: 1) Below description of Mapper says that reducers get the output file using HTTP call. But the description under The Reduce

Re: Doubt from the book Definitive Guide

2012-04-04 Thread Harsh J
Hi Mohit, On Thu, Apr 5, 2012 at 5:26 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am going through the chapter How mapreduce works and have some confusion: 1) Below description of Mapper says that reducers get the output file using HTTP call. But the description under The Reduce Side

Re: Doubt from the book Definitive Guide

2012-04-04 Thread Mohit Anchlia
On Wed, Apr 4, 2012 at 8:42 PM, Harsh J ha...@cloudera.com wrote: Hi Mohit, On Thu, Apr 5, 2012 at 5:26 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am going through the chapter How mapreduce works and have some confusion: 1) Below description of Mapper says that reducers get the

Re: Doubt from the book Definitive Guide

2012-04-04 Thread Prashant Kommireddi
Hi Mohit, What would be the advantage? Reducers in most cases read data from all the mappers. In the case where mappers were to write to HDFS, a reducer would still require to read data from other datanodes across the cluster. Prashant On Apr 4, 2012, at 9:55 PM, Mohit Anchlia

Re: [Doubt]: Submission of Mapreduce from outside Hadoop Cluster

2011-07-01 Thread Harsh J
Narayanan, On Fri, Jul 1, 2011 at 11:28 AM, Narayanan K knarayana...@gmail.com wrote: Hi all, We are basically working on a research project and I require some help regarding this. Always glad to see research work being done! What're you working on? :) How do I submit a mapreduce job from

Re: [Doubt]: Submission of Mapreduce from outside Hadoop Cluster

2011-07-01 Thread Harsh J
Narayanan, On Fri, Jul 1, 2011 at 12:57 PM, Narayanan K knarayana...@gmail.com wrote: So the report will be run from a different machine outside the cluster. So we need a way to pass on the parameters to the hadoop cluster (master) and initiate a mapreduce job dynamically. Similarly the output

Re: [Doubt]: Submission of Mapreduce from outside Hadoop Cluster

2011-07-01 Thread Yaozhen Pan
Narayanan, Regarding the client installation, you should make sure that the client and server use the same version of hadoop for submitting jobs and transferring data. If you use a different user on the client than the one that runs the hadoop job, configure the hadoop ugi property (sorry, I forget the exact name). On 2011 7 1
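A rough sketch of the client-side setup being described, using the pre-2.x property names current at the time of this thread; the class name, hostnames, ports and paths are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteClient {                          // hypothetical client-side helper
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the cluster; the addresses below are placeholders.
            conf.set("fs.default.name", "hdfs://namenode.example.com:8020");   // fs.defaultFS in 2.x+
            conf.set("mapred.job.tracker", "jobtracker.example.com:8021");     // MRv1 JobTracker address

            // Jobs can be configured and submitted against this conf from outside the
            // cluster (with a matching Hadoop version), and output read back over HDFS:
            FileSystem fs = FileSystem.get(conf);
            fs.copyToLocalFile(new Path(args[0]), new Path(args[1]));           // HDFS output -> local report
        }
    }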

RE: Doubt: Regarding running Hadoop on a cluster with shared disk.

2010-05-05 Thread Michael Segel
Udaya, You can use non-local disk on your hadoop cloud, however it will have sub-optimal performance, and you will have to tune accordingly. If it's a shared drive on all of your nodes, you need to create different directories for each machine. Suppose your shared drive is /foo then you

Re: Doubt: Using PBS to run mapreduce jobs.

2010-05-04 Thread Craig Macdonald
HOD supports a PBS environment, namely Torque. Torque is the vastly improved fork of OpenPBS. You may be able to get HOD working on OpenPBS, or better still persuade your cluster admins to upgrade to a more recent version of Torque (e.g. at least 2.1.x) Craig On 22/07/28164 20:59, Udaya

Re: Doubt: Using PBS to run mapreduce jobs.

2010-05-04 Thread Udaya Lakshmi
Thank you Craig. My cluster has got Torque. Can you please point me to something which has a detailed explanation about using HOD on Torque? On Tue, May 4, 2010 at 10:17 PM, Craig Macdonald cra...@dcs.gla.ac.uk wrote: HOD supports a PBS environment, namely Torque. Torque is the vastly improved

Re: Doubt: Using PBS to run mapreduce jobs.

2010-05-04 Thread Peeyush Bishnoi
Udaya, The following link will help you with HOD on Torque. http://hadoop.apache.org/common/docs/r0.20.0/hod_user_guide.html Thanks, --- Peeyush On Tue, 2010-05-04 at 22:49 +0530, Udaya Lakshmi wrote: Thank you Craig. My cluster has got Torque. Can you please point me something which will have

Re: Doubt: Using PBS to run mapreduce jobs.

2010-05-04 Thread Allen Wittenauer
On May 4, 2010, at 7:46 AM, Udaya Lakshmi wrote: Hi, I am given an account on a cluster which uses OpenPBS as the cluster management software. The only way I can run a job is by submitting it to OpenPBS. How to run mapreduce programs on it? Is there any possible work around? Take a look

Re: Doubt: Using PBS to run mapreduce jobs.

2010-05-04 Thread Udaya Lakshmi
Thank you. Udaya. On Wed, May 5, 2010 at 12:23 AM, Allen Wittenauer awittena...@linkedin.com wrote: On May 4, 2010, at 7:46 AM, Udaya Lakshmi wrote: Hi, I am given an account on a cluster which uses OpenPBS as the cluster management software. The only way I can run a job is by

Re: Doubt about SequenceFile.Writer

2010-02-07 Thread Jeff Zhang
A SequenceFile is not a text file, so you cannot see its content by invoking the unix command cat. But you can get the text content by using the hadoop command: hadoop fs -text src On Sun, Feb 7, 2010 at 8:51 AM, Andiana Squazo Ringa andriana.ri...@gmail.com wrote: Hi, I have written to a
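For completeness, a small sketch that does programmatically roughly what hadoop fs -text does for a SequenceFile; the key and value classes are discovered from the file header, and the path is a command-line argument:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class DumpSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(args[0]), conf);
            try {
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                while (reader.next(key, value)) {
                    System.out.println(key + "\t" + value);   // prints each record as key<TAB>value
                }
            } finally {
                reader.close();
            }
        }
    }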

Re: Doubt about SequenceFile.Writer

2010-02-07 Thread Ravi
Thanks a lot Jeff. Ringa. On Sun, Feb 7, 2010 at 10:30 PM, Jeff Zhang zjf...@gmail.com wrote: A SequenceFile is not a text file, so you cannot see its content by invoking the unix command cat. But you can get the text content by using the hadoop command: hadoop fs -text src On Sun, Feb 7,

Re: Re: Re: Re: Doubt in Hadoop

2009-11-29 Thread aa225
Hi, Actually, I just made the change suggested by Aaron and my code worked. But I still would like to know why the setJarByClass() method has to be called when the Main class and the Map and Reduce classes are in the same package? Thank You Abhishek Agrawal SUNY- Buffalo

Re: Re: Doubt in Hadoop

2009-11-27 Thread Aaron Kimball
When you set up the Job object, do you call job.setJarByClass(Map.class)? That will tell Hadoop which jar file to ship with the job and to use for classloading in your code. - Aaron On Thu, Nov 26, 2009 at 11:56 PM, aa...@buffalo.edu wrote: Hi, I am running the job from command line. The

Re: Doubt in Hadoop

2009-11-26 Thread Jeff Zhang
Do you run the map reduce job from the command line or an IDE? In map reduce mode, you should put the jar containing the map and reduce classes in your classpath. Jeff Zhang On Fri, Nov 27, 2009 at 2:19 PM, aa...@buffalo.edu wrote: Hello Everybody, I have a doubt in Hadoop and was

Re: Re: Doubt in Hadoop

2009-11-26 Thread aa225
Hi, I am running the job from command line. The job runs fine in the local mode but something happens when I try to run the job in the distributed mode. Abhishek Agrawal SUNY- Buffalo (716-435-7122) On Fri 11/27/09 2:31 AM , Jeff Zhang zjf...@gmail.com sent: Do you run the map reduce job

Re: Doubt in reducer

2009-08-27 Thread Vladimir Klimontovich
But the reducer can do some preparation during the map process. It can distribute map output across the nodes that will work as reducers. Copying and sorting map output is also a time-consuming process (maybe more time-consuming than the reduce itself). For example, a piece of a job run log on a 40-node cluster could be

Re: Doubt in HBase

2009-08-21 Thread Ryan Rawson
Well, the inputs to those reducers would be the empty set; they wouldn't have anything to do and their output would be nil as well. If you are doing something like this, and your operation is commutative, consider using a combiner so that you don't shuffle as much data. A large amount of
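A hedged example of the combiner suggestion (the types and class name are assumptions, not from the thread): a sum reducer can double as a combiner because addition is commutative and associative, so partial sums computed on the map side give the same final result while shuffling far less data.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Safe to use as both combiner and reducer: summing is order-insensitive.
    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();               // partial sums on the map side, final sum on the reduce side
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // In the driver:
    //   job.setCombinerClass(SumReducer.class);   // pre-aggregates map output before the shuffle
    //   job.setReducerClass(SumReducer.class);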

Re: Doubt in HBase

2009-08-21 Thread Ryan Rawson
hey, Yes the hadoop system attempts to assign map tasks so that data is local, but why would you be worried about this for 5 values? The max value size in hbase is Integer.MAX_VALUE, so it's not like you have much data to shuffle. Once your blobs are ~64mb or so, it might make more sense to use HDFS

Re: Doubt in HBase

2009-08-21 Thread bharath vissapragada
Thanks Ryan, I was just explaining with an example.. I have TBs of data to work with. I just wanted to know whether the scheduler TRIES to assign the reduce phase to keep the data local (i.e., TRYING to assign it to the machine with the greater number of key values). I was just explaining it with

Re: Doubt in HBase

2009-08-21 Thread Jonathan Gray
Ryan, In older versions of HBase, when we did not attempt any data locality, we had a few users running jobs that became network i/o bound. It wasn't a latency issue, it was a bandwidth issue. That's actually when/why an attempt at better data locality for HBase MR was made in the first

Re: Doubt in HBase

2009-08-21 Thread bharath vissapragada
JG, can you please elaborate on the last statement by giving an example or some kind of scenario in which it can take place where MR jobs involve a huge amount of data? Thanks. On Fri, Aug 21, 2009 at 11:24 PM, Jonathan Gray jl...@streamy.com wrote: Ryan, In older versions of HBase,

Re: Doubt in HBase

2009-08-21 Thread Jonathan Gray
I really couldn't be specific. The more data that has to be moved across the wire, the more network i/o. For example, if you have very large values, and a very large table, and you have that as the input to your MR, you could potentially be network i/o bound. It should be very easy to test

Re: Doubt in HBase

2009-08-21 Thread bharath vissapragada
JG, In one of your above replies you said that data locality was not considered in older versions of HBase. Is there any development on the same in 0.20 RC1/2 or 0.19.x? If not, can you tell me where that patch is available so that I can test my programs? Thanks in advance On Sat,

Re: Doubt in HBase

2009-08-20 Thread Amandeep Khurana
On Thu, Aug 20, 2009 at 9:42 AM, john smith js1987.sm...@gmail.com wrote: Hi all, I have one small doubt. Kindly answer it even if it sounds silly. No questions are silly.. Don't worry. I am using MapReduce with HBase in distributed mode. I have a table which spans across 5 region

Re: Doubt in HBase

2009-08-20 Thread Jonathan Gray
What Amandeep said. Also, one clarification for you. You mentioned the reduce task moving map output across regionservers. Remember, HBase is just a MapReduce input source or output sink. The sort/shuffle/reduce is a part of Hadoop MapReduce and has nothing to do with HBase directly. It

Re: Doubt in HBase

2009-08-20 Thread bharath vissapragada
Amandeep, Gray and Purtell, thanks for your replies.. I have found them very useful. You said to increase the number of reduce tasks. Suppose the number of reduce tasks is more than the number of distinct map output keys, some of the reduce processes may go to waste? Is that the case? Also I have

Re: Doubt in HBase

2009-08-20 Thread john smith
Thanks for all your replies, guys. As bharath said, what is the case when the number of reducers becomes more than the number of distinct map output keys? On Fri, Aug 21, 2009 at 9:39 AM, bharath vissapragada bharathvissapragada1...@gmail.com wrote: Amandeep, Gray and Purtell, thanks for your

Re: Doubt regarding mem cache.

2009-08-12 Thread Erik Holstad
Hi Rakhi! On Wed, Aug 12, 2009 at 11:49 AM, Rakhi Khatwani rkhatw...@gmail.com wrote: Hi, I am not very clear as to how the mem cache thing works. MemCache was a name that was used and caused some confusion about what its purpose is. It has now been renamed to MemStore and is

Re: Doubt regarding Replication Factor

2009-08-12 Thread Konstantin Shvachko
You can try it: start a 3 node cluster and create a file with replication 5. The answer is that each data-node can store only one replica of a block. So in your case you will get an exception on close() saying the file cannot be fully replicated. Thanks, --Konstantin Rakhi Khatwani wrote: Hi,
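A minimal sketch of that experiment (the path and payload are placeholders); the write asks for 5 replicas, which a 3-node cluster cannot satisfy since each datanode stores at most one replica of a block:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/tmp/replication-test");          // placeholder path

            // Request 5 replicas for this file's blocks.
            FSDataOutputStream out =
                    fs.create(p, true, 4096, (short) 5, fs.getDefaultBlockSize());
            out.writeUTF("hello");
            out.close();   // per the reply above, this is where under-replication surfaces on a 3-node cluster

            // Raising replication on an existing file instead:
            // fs.setReplication(p, (short) 5);
        }
    }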

Re: Doubt regarding Replication Factor

2009-08-12 Thread Tarandeep Singh
A similar question: if in an N-node cluster a file's replication is set to N (replicate on each node) and later a node goes down, will HDFS throw an exception since the file's replication has gone below the specified number? Thanks, Tarandeep On Wed, Aug 12, 2009 at 12:11 PM,

Re: Doubt in implementing TableReduce Interface

2009-07-28 Thread Ninad Raut
The method looks fine. Put some logging inside the reduce method to trace the inputs to the reduce. Here's an example... change IntWritable to Text in your case... static class ReadTableReduce2 extends MapReduceBase implements TableReduce<Text, IntWritable> { SortedMap<Text, Text> buzz = new

Re: Doubt regarding permissions

2009-04-13 Thread Tsz Wo (Nicholas), Sze
Hi Amar, I have just tried it; everything worked as expected. I guess user A in your experiment was a superuser so that he could read anything. Nicholas Sze /// permission testing // drwx-wx-wx - nicholas supergroup 0 2009-04-13 10:55

Re: Doubt in MultiFileWordCount.java

2008-09-29 Thread Arun C Murthy
On Sep 29, 2008, at 3:11 AM, Geethajini C wrote: Hi everyone, In the example MultiFileWordCount.java (hadoop-0.17.0), what happens when the statement JobClient.runJob(job); is executed? What methods will be called in sequence? This might help:

Re: Doubt in RegExpRowFilter and RowFilters in general

2008-02-11 Thread stack
Have you tried enabling DEBUG-level logging? Filters have lots of logging around state changes. It might help figure out this issue. You might need to add extra logging around line #2401 in HStore. (I just spent some time trying to bend my head around what's going on. Filters are run at the Store

Re: Doubt in RegExpRowFilter and RowFilters in general

2008-02-11 Thread David Alves
Hi again, In my previous example I seem to have misplaced a new keyword (new myvalue1.getBytes() where it should have been myvalue1.getBytes()). On another note, my program hangs when I supply my own filter to the scanner (I suppose it's clear that the nodes don't know my class, so there should be