Thanks a lot Wellington and Bejoy for your inputs; I will try out this API and
the sequence file approach.
On Wed, Oct 5, 2011 at 1:17 AM, Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:
> Yes, Sadak,
>
> With this API, you'll copy your files into Hadoop HDFS just as you do
> when writing to an OutputStream. The data will then be replicated in
> your cluster's HDFS.
Yes, Sadak,
With this API, you'll copy your files into Hadoop HDFS just as you do
when writing to an OutputStream. The data will then be replicated in
your cluster's HDFS.
Cheers.
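For illustration, a minimal sketch of that OutputStream-style write (the local
and HDFS paths here are hypothetical); HDFS replicates the written blocks
automatically according to dfs.replication:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);      // handle to the cluster's file system
        InputStream in = new BufferedInputStream(
                new FileInputStream("/tmp/local-file.txt"));  // hypothetical local path
        FSDataOutputStream out =
                fs.create(new Path("/user/sadak/file.txt"));  // hypothetical HDFS path
        IOUtils.copyBytes(in, out, 4096, true);    // copies the stream and closes both ends
    }
}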
2011/10/4 visioner sadak :
> Hey, thanks Wellington. Just a thought: will my data be replicated as well?
> I thought the mapper…
Yes, Sadak. The API will do the splitting for you; there's no need for MR for that.
It'd be better to keep the file sizes at least the same as an HDFS block size.
A sequence file is definitely a good choice. If you are looking to process and
then archive the input, look into HAR (Hadoop Archives) as well.
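To make the sequence-file suggestion concrete, a sketch that packs a directory
of small local files into one sequence file as (filename, contents) records;
all paths and the directory name are hypothetical:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path("/user/sadak/packed.seq"),  // hypothetical output path
                Text.class, BytesWritable.class);
        for (File f : new File("/tmp/small-files").listFiles()) {  // hypothetical local dir
            byte[] buf = new byte[(int) f.length()];
            DataInputStream in = new DataInputStream(new FileInputStream(f));
            in.readFully(buf);   // slurp the whole small file
            in.close();
            writer.append(new Text(f.getName()), new BytesWritable(buf));
        }
        writer.close();
    }
}

For the HAR route, archives are built from the command line, along the lines
of (paths hypothetical):

hadoop archive -archiveName files.har -p /user/sadak input /user/sadak/archive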
Hi Sadak
You really don't need to fire a MapReduce job to copy files from
a local file system to HDFS. You can do it in two easy ways.
*Using the Linux CLI* - the most convenient and handy option if you are
going with a shell script:
hadoop fs -copyFromLocal <local-src> <hdfs-dest>
*Using the Java API*
// load the configuration…
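A minimal sketch of that Java API route (the local and HDFS paths are
hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyFromLocal {
    public static void main(String[] args) throws Exception {
        // load the configuration (picks up core-site.xml from the classpath)
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // equivalent of 'hadoop fs -copyFromLocal'
        fs.copyFromLocalFile(new Path("/tmp/input.txt"),          // hypothetical local path
                             new Path("/user/sadak/input.txt"));  // hypothetical HDFS path
        fs.close();
    }
}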
Hey, thanks Wellington. Just a thought: will my data be replicated as well?
I thought the mapper does the job of breaking data into pieces and
distributing them, and the reducer does the joining and combining while
fetching the data back; that's why I was confused about using MR. Can I use
this API for uploading a large number of files?
Hey Sadak,
you don't need to write an MR job for that. You can make your Java
program use the Hadoop Java API for that. You would need to use FileSystem
(http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html)
and Path
(http://hadoop.apache.org/common/docs/current/api/in
Hello guys,
I would like to know how to do file uploads to HDFS using
Java. Does it have to be done using MapReduce? What if I have a large number of
small files: should I use a sequence file along with MapReduce? It would be
great if you could provide some information...
Hi,
We have a class hierarchy for the output value of both the mapper and the
reducer: a parent (abstract class), child1, child2, ...
We have a mapper class whose
output value class is specified as the parent class; the map function
will emit either child1 or child2 depending on the logic…
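To make the setup concrete, a minimal sketch of the hierarchy being described
(all class names are hypothetical):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

// the parent (abstract class) used as the declared output value type
abstract class Parent implements Writable {
}

class Child1 extends Parent {
    public void write(DataOutput out) throws IOException { /* serialize fields */ }
    public void readFields(DataInput in) throws IOException { /* deserialize fields */ }
}

class Child2 extends Parent {
    public void write(DataOutput out) throws IOException { /* serialize fields */ }
    public void readFields(DataInput in) throws IOException { /* deserialize fields */ }
}

// the mapper declares Parent as its output value class;
// map() emits either a Child1 or a Child2 depending on the logic
class PolymorphicMapper extends Mapper<LongWritable, Text, Text, Parent> {
}

Note that on the reduce side the framework instantiates the declared value
class in order to call readFields, which is exactly where a polymorphic
hierarchy like this tends to get tricky.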
Yes. The reason I pointed it out to him was that it seems like he's trying
to do something with Hadoop for which MapReduce may not be the right
execution model. YARN/MRv2 gives you the ability to try other
execution models. As you pointed out, it may require some extra
development, but it is more flexible.
Joey,
Is YARN just a synonym for MRv2? And if so, he would still have to create a
custom application master for his job type, right?
Matt
-Original Message-
From: Joey Echeverria [mailto:j...@cloudera.com]
Sent: Tuesday, October 04, 2011 11:06 AM
To: mapreduce-user@hadoop.apache.org
Subj
As long as your reduce task can kick off the MR job asynchronously, it
shouldn't be too much of an issue, but it could very quickly result in a
deadlock otherwise. If you set this up as two stages, 1) kick off the
recursive MR and 2) analyze the final result set, then it should work, but off…
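For the asynchronous part, a minimal sketch (the job name and setup are
hypothetical); Job.submit() returns immediately, whereas waitForCompletion()
blocks:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AsyncSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "recursive-step");  // hypothetical job name
        job.setJarByClass(AsyncSubmit.class);
        // ... set mapper, reducer, and input/output paths here ...
        job.submit();                 // returns immediately; the job runs on the cluster
        while (!job.isComplete()) {   // poll instead of blocking in waitForCompletion()
            Thread.sleep(5000);
        }
    }
}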
You may want to check out Yarn, coming in Hadoop 0.23:
https://issues.apache.org/jira/browse/MAPREDUCE-279
-Joey
On Tue, Oct 4, 2011 at 11:45 AM, Yaron Gonen wrote:
> Hi,
> Hadoop tasks are always stacked to form a linear user-managed workflow (a
> reduce step cannot start before all previous mappers have stopped, etc.)…
Hi,
Hadoop tasks are always stacked to form a linear user-managed workflow (a
reduce step cannot start before all previous mappers have stopped, etc.). This
may be problematic in recursive tasks: for example, in a BFS we will not get
any output until the longest branch has been reached.
In order to so…
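A common driver-loop workaround for this kind of recursion is to resubmit a
job per iteration until a counter reports convergence; a sketch (counter
names and paths are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BfsDriver {
    public static void main(String[] args) throws Exception {
        int iteration = 0;
        long pending = 1;          // nodes still waiting to be expanded
        while (pending > 0) {      // one MR job per BFS frontier
            Configuration conf = new Configuration();
            Job job = new Job(conf, "bfs-" + iteration);
            job.setJarByClass(BfsDriver.class);
            // ... set mapper/reducer and key/value classes here ...
            FileInputFormat.addInputPath(job, new Path("/bfs/iter-" + iteration));
            FileOutputFormat.setOutputPath(job, new Path("/bfs/iter-" + (iteration + 1)));
            job.waitForCompletion(true);
            // the reducer increments this counter for every node left to expand
            pending = job.getCounters().findCounter("bfs", "PENDING_NODES").getValue();
            iteration++;
        }
    }
}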