Please try this:
for (DoubleArrayWritable avalue : values) {
    Writable[] value = avalue.get();
    // DoubleWritable[] value = new DoubleWritable[6];
    // for (int k = 0; k < 6; k++) {
    //     value[k] = new DoubleWritable(wvalue[k]);
    // }
    // parse accordingly
    if (Double.parseDouble(value[1].toString()) != 0) {
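For context, DoubleArrayWritable is not a type that ships with Hadoop; the usual way to define it (an assumption here, since the poster's class isn't shown) is as a thin ArrayWritable subclass:

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Writable;

// Hypothetical definition of the custom writable used above.
public class DoubleArrayWritable extends ArrayWritable {
    public DoubleArrayWritable() {
        super(DoubleWritable.class);           // element type, needed for deserialization
    }
    public DoubleArrayWritable(DoubleWritable[] values) {
        super(DoubleWritable.class, values);   // wrap an existing array
    }
}

With a definition like that, avalue.get() returns a Writable[] whose elements can be cast to DoubleWritable, which is why the loop above can read them back via toString() or a cast.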
Hi
A developer should answer that, but a quick look at an edit file with od
suggests that records are not fixed length. So maybe the likelihood of
the situation you suggest is so low that there is no need to check more
than the file size.
Ulul
On 28/09/2014 11:17, Giridhar Addepalli wrote:
Hi
Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with
old *mapred* APIs and new *mapreduce* APIs in Hadoop
From: john.meag...@gmail.com
To: user@hadoop.apache.org
Also, Source Compatibility also means ONLY a recompile is needed.
No code changes should be needed.
Date: Tue, 15 Apr 2014 13:03:53 -0700
Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old
*mapred* APIs and new *mapreduce* APIs in Hadoop
From: zs...@hortonworks.com
To: user@hadoop.apache.org
1. If you have the binaries that were compiled against MRv1 mapred libs, it
should just work
file is execute it.
-RR
--
Also, Source Compatibility also means ONLY a recompile is needed.
No code changes should be needed.
On Mon, Apr 14, 2014 at 10:37 AM, John Meagher john.meag...@gmail.com wrote:
Source Compatibility = you need to recompile and use the new version
as part of the compilation
Binary Compatibility = existing binaries built against the old API run as-is; no recompile is needed.
Certainly it is, and quite common, especially if you have some high
performance machines: they can run as mapreduce slaves and also double as
mongo hosts. The problem would of course be that when running mapreduce
jobs you might have very slow network bandwidth at times, and if your front
end
thanks jay and praveen,
i want to use both separately; i don't want to use mongodb in the place of
hbase
On Wed, Mar 19, 2014 at 9:25 PM, Jay Vyas jayunit...@gmail.com wrote:
Certainly it is, and quite common, especially if you have some high
performance machines: they can run as mapreduce
Why not? It's just a matter of installing 2 different packages.
It depends on what you want to use it for; you need to take care of a few
things, but as far as installation is concerned, it should be easily doable.
Regards
Prav
On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com
I've installed a hadoop single node cluster on a VirtualBox machine running
ubuntu 12.04LTS (64-bit) with 512MB RAM and 8GB HD. I haven't seen any
errors in my testing yet. Is 1GB RAM required? Will I run into issues when
I expand the cluster?
On Sat, Jan 18, 2014 at 11:24 PM, Alexander
It's enough. Hadoop uses only 1GB RAM by default.
On Sat, Jan 18, 2014 at 10:11 PM, sri harsha rsharsh...@gmail.com wrote:
Hi ,
i want to install a 4 node cluster on 64-bit LINUX. Is 4GB RAM and a 500GB HD
enough for this, or do i need to expand?
please suggest about my query.
thanks
--
amiable
The answer (a) is correct, in general.
On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan
ramasubramanian.naraya...@gmail.com wrote:
Hi,
Which of the following is correct w.r.t. the mapper?
(a) It accepts a single key-value pair as input and can emit any number of
key-value pairs as output.
Hi Rams,
A mapper will accept a single key-value pair as input and can emit
0 or more key-value pairs based on what you want to do in the mapper function
(I mean based on your business logic in the mapper function).
But the framework will then aggregate the list of values emitted for each key
before handing them to the reducer.
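As a small illustration of the 0-or-more behaviour, a sketch against the current mapreduce API (the class name TokenMapper and the tokenizing logic are made up for the example):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // One input pair comes in; a blank line emits nothing, a line with
        // N tokens emits N output pairs.
        for (String token : line.toString().split("\\s+")) {
            if (token.isEmpty()) continue;
            word.set(token);
            context.write(word, ONE);
        }
    }
}

The grouping of those emitted pairs by key happens in the framework's shuffle, which is the aggregation referred to above.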
----- Original Message -----
From: Harsh J ha...@cloudera.com
To: common-user@hadoop.apache.org; Raj Vishwanathan rajv...@yahoo.com
Cc:
Sent: Saturday, August 25, 2012 4:02 AM
Subject: Re: doubt about reduce tasks and block writes
Thanks, Raj, you got exactly my point. I wanted to confirm this assumption as
I was wondering whether a shared HDFS cluster with MR and HBase like this would
make sense:
http://old.nabble.com/HBase-User-f34655.html
Raj's almost right. In times of high load or space fill-up on a local
DN, the NameNode may decide to instead pick a non-local DN for
replica-writing. In this way, Node A may get copy 0 of a
replica from a task. This is per the default block placement policy.
P.S. Note that HDFS hardly makes
Marc, see my inline comments.
On Fri, Aug 24, 2012 at 4:09 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
Hey there,
I have a doubt about reduce tasks and block writes. Does a reduce task always
write first to HDFS on the node where it is placed? (and then these
blocks would be
Assuming that node A only contains replicas, there is no guarantee that its
data would never be read.
First, you might lose a replica. The copy inside node A could be used
to create the missing replica again.
Second, data locality is best effort. If all the map slots are occupied
except one on
But since node A has no TT running, it will not run map or reduce tasks. When
the reducer node writes the output file, the first block will be written on the
local node and never on node A.
So, to answer the question, node A will contain copies of blocks of all output
files. It won't contain the
Hi Manoj,
Reply inline.
On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu manoj...@gmail.com wrote:
Hi All,
The normal Hadoop job submission process involves:
Checking the input and output specifications of the job.
Computing the InputSplits for the job.
Setting up the requisite accounting information
Hi Harsh,
Thanks for your reply.
Consider that from my main program I am doing so
many activities (reading/writing/updating, non-Hadoop activities) before
invoking JobClient.runJob(conf);
Is there any way to separate the process flow programmatically instead of
going for a workflow engine?
Cheers!
Manoj.
Sure, you may separate the logic as you want it to be, but just ensure
the configuration object has a proper setJar or setJarByClass done on
it before you submit the job.
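For illustration, a minimal old-API driver along those lines (a sketch only; it uses the stock IdentityMapper/IdentityReducer so it is self-contained, and the comments mark where the non-Hadoop work would go):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MyJob {
    public static void main(String[] args) throws Exception {
        // ... non-Hadoop pre-processing (reads/writes/updates) can happen here ...

        JobConf conf = new JobConf(MyJob.class);    // same effect as setJarByClass(MyJob.class)
        conf.setJobName("my-job");
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class); // key/value types of the default TextInputFormat
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);                     // blocks until the job completes

        // ... non-Hadoop post-processing can happen here ...
    }
}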
On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu manoj...@gmail.com wrote:
Hi Harsh,
Thanks for your reply.
Consider from my
On Wed, Apr 4, 2012 at 10:02 PM, Prashant Kommireddi prash1...@gmail.com wrote:
Hi Mohit,
What would be the advantage? Reducers in most cases read data from all
the mappers. In the case where mappers were to write to HDFS, a
reducer would still require to read data from other datanodes across
On Thu, Apr 5, 2012 at 7:03 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
The only advantage I was thinking of was that in some cases reducers might be
able to take advantage of data locality and avoid multiple HTTP calls, no?
Data is written anyway, so the last merged file could go on HDFS instead
Answers inline.
On Wed, Apr 4, 2012 at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am going through the chapter How mapreduce works and have some
confusion:
1) Below description of Mapper says that reducers get the output file using
HTTP call. But the description under The Reduce
Hi Mohit,
On Thu, Apr 5, 2012 at 5:26 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am going through the chapter How mapreduce works and have some
confusion:
1) Below description of Mapper says that reducers get the output file using
HTTP call. But the description under The Reduce Side
On Wed, Apr 4, 2012 at 8:42 PM, Harsh J ha...@cloudera.com wrote:
Hi Mohit,
On Thu, Apr 5, 2012 at 5:26 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I am going through the chapter How mapreduce works and have some
confusion:
1) Below description of Mapper says that reducers get the
Hi Mohit,
What would be the advantage? Reducers in most cases read data from all
the mappers. In the case where mappers were to write to HDFS, a
reducer would still require to read data from other datanodes across
the cluster.
Prashant
On Apr 4, 2012, at 9:55 PM, Mohit Anchlia
Narayanan,
On Fri, Jul 1, 2011 at 11:28 AM, Narayanan K knarayana...@gmail.com wrote:
Hi all,
We are basically working on a research project and I require some help
regarding this.
Always glad to see research work being done! What're you working on? :)
How do I submit a mapreduce job from
Narayanan,
On Fri, Jul 1, 2011 at 12:57 PM, Narayanan K knarayana...@gmail.com wrote:
So the report will be run from a different machine outside the cluster. So
we need a way to pass on the parameters to the hadoop cluster (master) and
initiate a mapreduce job dynamically. Similarly the output
Narayanan,
Regarding the client installation, you should make sure that the client and
server use the same version of hadoop for submitting jobs and transferring data.
If you use a different user on the client than the one that runs the hadoop job,
configure the hadoop ugi property (sorry, I forget the exact name).
On 1 Jul 2011
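For the remote-submission part of the question, a minimal client-side sketch (assuming a Hadoop 1.x-era cluster; the host names and the fs.default.name / mapred.job.tracker values are placeholders for your cluster's actual addresses):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the remote cluster instead of the local defaults.
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        conf.set("mapred.job.tracker", "jobtracker.example.com:9001");

        // Any job submitted with this Configuration (via JobClient or Job)
        // now goes to that cluster, and HDFS access works the same way:
        FileSystem fs = FileSystem.get(conf);
        System.out.println("output exists? " + fs.exists(new Path("/reports/output")));
    }
}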
Udaya,
You can use a non-local disk on your hadoop cloud; however, it will have
sub-optimal performance, and you will have to tune accordingly.
If it's a shared drive on all of your nodes, you need to create different
directories for each machine.
Suppose your shared drive is /foo; then you
HOD supports a PBS environment, namely Torque. Torque is the vastly
improved fork of OpenPBS. You may be able to get HOD working on OpenPBS,
or better still persuade your cluster admins to upgrade to a more recent
version of Torque (e.g. at least 2.1.x)
Craig
On 22/07/28164 20:59, Udaya
Thank you Craig. My cluster has got Torque. Can you please point me to
something which has a detailed explanation of using HOD on Torque?
On Tue, May 4, 2010 at 10:17 PM, Craig Macdonald cra...@dcs.gla.ac.uk wrote:
HOD supports a PBS environment, namely Torque. Torque is the vastly
improved
Udaya,
The following link will help you with HOD on Torque.
http://hadoop.apache.org/common/docs/r0.20.0/hod_user_guide.html
Thanks,
---
Peeyush
On Tue, 2010-05-04 at 22:49 +0530, Udaya Lakshmi wrote:
Thank you Craig. My cluster has got Torque. Can you please point me
something which will have
On May 4, 2010, at 7:46 AM, Udaya Lakshmi wrote:
Hi,
I am given an account on a cluster which uses OpenPBS as the cluster
management software. The only way I can run a job is by submitting it to
OpenPBS. How do I run mapreduce programs on it? Is there any possible
workaround?
Take a look
Thank you.
Udaya.
On Wed, May 5, 2010 at 12:23 AM, Allen Wittenauer
awittena...@linkedin.com wrote:
On May 4, 2010, at 7:46 AM, Udaya Lakshmi wrote:
Hi,
I am given an account on a cluster which uses OpenPBS as the cluster
management software. The only way I can run a job is by
A SequenceFile is not a text file, so you cannot see its content by
invoking the unix command cat.
But you can get the text content by using the hadoop command: hadoop fs -text
src
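If you need to read it programmatically rather than from the shell, a minimal sketch (assuming Text keys and values; substitute whatever Writable types the file was actually written with):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class DumpSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(args[0]), conf);
        try {
            Text key = new Text();    // must match the file's key class
            Text value = new Text();  // must match the file's value class
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}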
On Sun, Feb 7, 2010 at 8:51 AM, Andiana Squazo Ringa
andriana.ri...@gmail.com wrote:
Hi,
I have written to a
Thanks a lot Jeff.
Ringa.
On Sun, Feb 7, 2010 at 10:30 PM, Jeff Zhang zjf...@gmail.com wrote:
A SequenceFile is not a text file, so you cannot see its content by
invoking the unix command cat.
But you can get the text content by using the hadoop command: hadoop fs -text
src
On Sun, Feb 7,
Hi,
Actually, I just made the change suggested by Aaron and my code worked. But I
still would like to know why the setJarByClass() method has to be called
when the Main class and the Map and Reduce classes are in the same package?
Thank You
Abhishek Agrawal
SUNY- Buffalo
When you set up the Job object, do you call job.setJarByClass(Map.class)?
That will tell Hadoop which jar file to ship with the job and to use for
classloading in your code.
- Aaron
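For reference, a minimal new-API driver showing that call in context (a sketch; TokenMapper is the hypothetical mapper sketched earlier in this digest, and IntSumReducer is the stock Hadoop reducer):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        // Tells Hadoop which jar to ship to the cluster and to use for task
        // classloading; without it, tasks can fail with ClassNotFoundException
        // even though everything is in the same package locally.
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);      // hypothetical mapper from the earlier sketch
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}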
On Thu, Nov 26, 2009 at 11:56 PM, aa...@buffalo.edu wrote:
Hi,
I am running the job from command line. The
Do you run the map reduce job from the command line or an IDE? In map reduce mode,
you should put the jar containing the map and reduce classes in your classpath
Jeff Zhang
On Fri, Nov 27, 2009 at 2:19 PM, aa...@buffalo.edu wrote:
Hello Everybody,
I have a doubt in Hadoop and was
Hi,
I am running the job from command line. The job runs fine in the local mode
but something happens when I try to run the job in the distributed mode.
Abhishek Agrawal
SUNY- Buffalo
(716-435-7122)
On Fri 11/27/09 2:31 AM , Jeff Zhang zjf...@gmail.com sent:
Do you run the map reduce job
But the reducer can do some preparation during the map process. It can
distribute map output across the nodes that will work as reducers.
Copying and sorting map output is also a time-consuming process (maybe
more consuming than the reduce itself). For example, a piece of a job run log
on a 40-node cluster
could be
Well, the inputs to those reducers would be the empty set; they
wouldn't have anything to do and their output would be nil as well.
If you are doing something like this, and your operation is
commutative, consider using a combiner so that you don't shuffle as
much data. A large amount of
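Wiring in a combiner is one extra call on the job; a hedged sketch (reusing the stock IntSumReducer, which is safe as a combiner because partial sums can be summed again):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombinerConfig {
    // Runs a local, per-map-task reduce pass before the shuffle, cutting the
    // amount of intermediate data moved across the network. Only do this when
    // the operation is commutative and associative (sums, counts, max/min);
    // naive averages, for example, would come out wrong.
    static void configure(Job job) {
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
    }
}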
hey,
Yes, the hadoop system attempts to assign map tasks so they are data-local, but
why would you be worried about this for 5 values? The max value size
in hbase is Integer.MAX_VALUE, so it's not like you have much data to
shuffle. Once your blobs are ~64 MB or so, it might make more sense to
use HDFS
Thanks Ryan
I was just explaining with an example. I have TBs of data to work
with. I just wanted to know whether the scheduler TRIES to assign the reduce
phase so as to keep the data local (i.e., TRYING to assign it to the machine
with the greater number of key values).
I was just explaining it with
Ryan,
In older versions of HBase, when we did not attempt any data locality,
we had a few users running jobs that became network i/o bound. It
wasn't a latency issue; it was a bandwidth issue.
That's actually when/why an attempt at better data locality for HBase MR
was made in the first place.
JG
Can you please elaborate on the last statement by giving an
example, or some kind of scenario in which it can take place, where MR jobs
involve huge amounts of data.
Thanks.
On Fri, Aug 21, 2009 at 11:24 PM, Jonathan Gray jl...@streamy.com wrote:
Ryan,
In older versions of HBase,
I really couldn't be specific.
The more data that has to be moved across the wire, the more network i/o.
For example, if you have very large values, and a very large table, and
you have that as the input to your MR. You could potentially be network
i/o bound.
It should be very easy to test
JG,
In one of your above replies, you said that data locality was not
considered in older versions of HBase. Is there any development on the
same in 0.20 RC1/2 or 0.19.x? If not, can you tell me where that patch is
available so that I can test my programs?
Thanks in advance
On Sat,
On Thu, Aug 20, 2009 at 9:42 AM, john smith js1987.sm...@gmail.com wrote:
Hi all ,
I have one small doubt. Kindly answer it even if it sounds silly.
No questions are silly.. Don't worry
I am using Map Reduce in HBase in distributed mode. I have a table which
spans across 5 region
What Amandeep said.
Also, one clarification for you. You mentioned the reduce task moving
map output across regionservers. Remember, HBase is just a MapReduce
input source or output sink. The sort/shuffle/reduce is a part of
Hadoop MapReduce and has nothing to do with HBase directly. It
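To make that concrete, a sketch of HBase acting purely as a MapReduce input source (assuming the org.apache.hadoop.hbase.mapreduce helpers from a reasonably recent HBase; the table name is a placeholder and IdentityTableMapper is the stock pass-through mapper):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseSourceJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "scan-my-table");
        job.setJarByClass(HBaseSourceJob.class);

        // HBase only supplies the splits (one per region) and the records;
        // the sort/shuffle/reduce that follows is plain Hadoop MapReduce.
        Scan scan = new Scan();
        TableMapReduceUtil.initTableMapperJob(
                "my_table",                    // placeholder table name
                scan,
                IdentityTableMapper.class,
                ImmutableBytesWritable.class,  // map output key
                Result.class,                  // map output value
                job);

        job.setNumReduceTasks(0);                          // map-only scan for this sketch
        job.setOutputFormatClass(NullOutputFormat.class);  // discard the output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}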
Amandeep, Gray and Purtell, thanks for your replies. I have found them
very useful.
You said to increase the number of reduce tasks. Suppose the number of
reduce tasks is more than the number of distinct map output keys; would some of
the reduce processes go to waste? Is that the case?
Also I have
Thanks for all your replies guys. As bharath said, what is the case when the
number of reducers becomes more than the number of distinct map output keys?
On Fri, Aug 21, 2009 at 9:39 AM, bharath vissapragada
bharathvissapragada1...@gmail.com wrote:
Aamandeep , Gray and Purtell thanks for your
Hi Rakhi!
On Wed, Aug 12, 2009 at 11:49 AM, Rakhi Khatwani rkhatw...@gmail.com wrote:
Hi,
I am not very clear as to how the mem cache thing works.
MemCache was a name that was used and caused some confusion about what its
purpose is.
It has now been renamed to MemStore and is
You can try it: start a 3 node cluster and create a file with replication 5.
The answer is that each data-node can store only one replica of a block.
So in your case you will get an exception on close() saying the file cannot
be fully replicated.
Thanks,
--Konstantin
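A hedged sketch of trying exactly that experiment from the Java API (the path and replication factor are arbitrary; the failure described above would surface at close()):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/replication-test");
        short replication = 5;                     // more replicas than the 3 datanodes
        FSDataOutputStream out = fs.create(file, true,
                conf.getInt("io.file.buffer.size", 4096),
                replication,
                fs.getDefaultBlockSize());
        out.writeBytes("some test data\n");
        out.close();                               // this is where under-replication shows up

        // Replication of an existing file can also be changed after the fact:
        fs.setReplication(file, (short) 5);
    }
}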
Rakhi Khatwani wrote:
Hi,
A similar question-
If in an N node cluster, a file's replication is set to N (replicate on each
node) and later if a node goes down, will HDFS throw an exception since the
file's replication has gone down below the specified number ?
Thanks,
Tarandeep
On Wed, Aug 12, 2009 at 12:11 PM,
the method looks fine. Put some logging inside the reduce method to trace
the inputs to the reduce. Here's an example... change IntWritable to Text
in your case...
static class ReadTableReduce2 extends MapReduceBase implements
        TableReduce<Text, IntWritable> {
    SortedMap<Text, Text> buzz = new
Hi Amar,
I have just tried it. Everything worked as expected. I guess user A in your
experiment was a superuser so that he could read anything.
Nicholas Sze
/// permission testing //
drwx-wx-wx - nicholas supergroup 0 2009-04-13 10:55
On Sep 29, 2008, at 3:11 AM, Geethajini C wrote:
Hi everyone,
In the example MultiFileWordCount.java
(hadoop-0.17.0), what happens when the statement
JobClient.runJob(job); is executed? What methods will be
called in sequence?
This might help:
Have you tried enabling DEBUG-level logging? Filters have lots of
logging around state changes. It might help figure out this issue. You might
need to add extra logging around line #2401 in HStore.
(I just spent some time trying to bend my head around what's going on.
Filters are run at the Store
Hi Again
In my previous example I seem to have misplaced a new keyword (new
myvalue1.getBytes() where it should have been myvalue1.getBytes()).
On another note, my program hangs when I supply my own filter to the
scanner (I suppose it's clear that the nodes don't know my class, so
there should be