Hi
I am also working on a join using MapReduce.
I think that instead of finding the position of the table in RawKeyValueIterator,
we can modify the context.write method to always write the key as the table name
or id.
Then we don't need to find the position; we can get the Key and Value from
"reducerContext"
before calling reducer.run().
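The tagging idea above (writing the table name into the key so the reduce side knows each value's source) can be sketched outside Hadoop. Below is a minimal Python sketch with plain functions standing in for the Mapper and Reducer; all names here are hypothetical, not the actual Hadoop API:

```python
from itertools import groupby

def map_record(table, key, value):
    # Tag the join key with its source table, mimicking a map-side
    # context.write((key, table), value) as suggested above.
    return ((key, table), value)

def reduce_join(tagged_records):
    # Shuffle stand-in: sort by join key, then by table tag, and
    # group all tagged values that share a join key.
    tagged_records.sort()
    joined = []
    for join_key, group in groupby(tagged_records, key=lambda kv: kv[0][0]):
        rows = [(table, value) for ((_, table), value) in group]
        joined.append((join_key, rows))
    return joined

records = [
    map_record("users", 1, "alice"),
    map_record("orders", 1, "book"),
    map_record("users", 2, "bob"),
]
print(reduce_join(records))
```

In a real job the composite (key, table) pair would need a custom WritableComparable plus a grouping comparator that compares only the join key, so that both tables' rows for one key reach the same reduce() call.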
I am researching a Hadoop solution for an existing application that requires a
directory structure full of data for processing.
To make the Hadoop solution work I need to deploy the data directory to each DN
when the job is executed. I know this isn't new and is commonly done with the
Distributed Cache.
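One way to do that from the command line, assuming the job's driver goes through ToolRunner/GenericOptionsParser: ship the directory as an archive with the generic -archives option (jar name and paths below are placeholders):

```shell
hadoop jar myjob.jar com.example.MyJob \
    -archives hdfs:///user/me/data-dir.tar.gz#data \
    input output
```

The archive is localized once per node and unpacked under the symlink `data` in each task's working directory.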
Have you checked the firewall on the namenode?
If you are running Ubuntu and the namenode port is 8020, the command is
-> ufw allow 8020
Thanks and Regards,
Rishi Yadav
InfoObjects Inc || http://www.infoobjects.com *(Big Data Solutions)*
On Mon, Apr 8, 2013 at 6:57 PM, Azuryy Yu wrote:
> can you use command
Harsh, thanks for the quick reply. While I am a Hadoop newbie, I find I am
explaining Hadoop install, config, and job processing to newer newbies. Thus the
desire and need for more details.
John
> From: ha...@cloudera.com
> Date: Tue, 9 Apr 2013 09:16:49 +0530
> Subject: Re: mr default=local?
> To: use
Hi,
This directory is used as part of the 'DistributedCache' feature. (
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
There is a configuration key "local.cache.size" which controls the amount
of data stored under DistributedCache. The default limit is 10GB. However,
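For example, to lower that cap, a property like this on each TaskTracker's mapred-site.xml (value in bytes; ~2 GB shown here as a sketch, not tuned advice):

```xml
<property>
  <name>local.cache.size</name>
  <value>2147483648</value>
</property>
```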
Really, thanks.
But the returned URL is wrong, and localhost is the real URL, as I tested
successfully with curl using "localhost".
Can anybody help me translate the curl to Python httplib?
curl -i -X PUT -T
"http://:/webhdfs/v1/?op=CREATE"
I test it using python httplib, and receive the righ
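A sketch of that two-step upload with Python's standard-library HTTP client (shown with Python 3's http.client, the successor of httplib; host, port, and path are placeholders): WebHDFS answers the first namenode PUT with a 307 whose Location header names a datanode, and the file body goes in a second PUT to that URL.

```python
import http.client
from urllib.parse import urlsplit

def split_location(location):
    """Split a redirect URL into (host, port, path-with-query)."""
    parts = urlsplit(location)
    path = parts.path + ("?" + parts.query if parts.query else "")
    return parts.hostname, parts.port, path

def webhdfs_create(namenode_host, namenode_port, hdfs_path, data):
    # Step 1: PUT to the namenode with no body; expect 307 + Location.
    conn = http.client.HTTPConnection(namenode_host, namenode_port)
    conn.request("PUT", "/webhdfs/v1%s?op=CREATE" % hdfs_path)
    resp = conn.getresponse()
    location = resp.getheader("Location")
    resp.read()
    conn.close()
    # Step 2: PUT the actual bytes to the datanode URL from Location.
    host, port, path = split_location(location)
    conn = http.client.HTTPConnection(host, port)
    conn.request("PUT", path, body=data)
    return conn.getresponse().status  # WebHDFS returns 201 on success

# e.g. webhdfs_create("namenode", 50070, "/user/me/f.txt", b"hello")
```

If the Location header carries an unreachable hostname (as reported above), the same split lets you substitute a reachable host while keeping the datanode's path and query string.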
Hey John,
Sorta unclear on what is prompting this question (to answer it more
specifically) but my response below:
On Tue, Apr 9, 2013 at 9:05 AM, John Meza wrote:
> Hadoop's modes are Standalone, Pseudo-Distributed, and Fully
> Distributed. It is configured for Pseudo- and Fully
Hadoop's modes are Standalone, Pseudo-Distributed, and Fully
Distributed. It is configured for Pseudo- and Fully Distributed via
configuration files, but defaults to Standalone otherwise (correct?).
Question about the -defaulting- mechanism: Does it get the -default-
configurat
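For the 1.x MapReduce side, the switch is the mapred.job.tracker key: it defaults to "local" (jobs run in-process, Standalone-style), and pointing it at a JobTracker address turns on distributed execution — a sketch for mapred-site.xml, host and port as placeholders:

```xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
```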
Hey Mark,
The Gzip codec creates the extension .gz, not .deflate (which is the
DeflateCodec). You may want to re-check your settings.
Impala questions are best resolved at its current user and developer
community at https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user.
Impala does currently s
can you use command "jps" on your localhost to see if there is NameNode
process running?
On Tue, Apr 9, 2013 at 2:27 AM, Bjorn Jonsson wrote:
> Yes, the namenode port is not open for your cluster. I had this problem
> to. First, log into your namenode and do netstat -nap to see what ports are
>
Impala can work with compressed files, but as compressed sequence files, not
files compressed directly.
On Tue, Apr 9, 2013 at 7:48 AM, Mark wrote:
> Trying to determine what the best format to use for storing daily logs. We
> recently switch from snappy (.snappy) to gzip (.deflate) but I'm wondering
> if ther
Trying to determine the best format to use for storing daily logs. We
recently switched from snappy (.snappy) to gzip (.deflate), but I'm wondering if
there is something better. Our main clients for these daily logs are Pig and
Hive using an external table. We were thinking about testing out i
I am new to Hadoop and Maven. I would like to compile Hadoop from
source and install it. I am following the instructions from
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
So far, I have managed to download the Hadoop source code, and from the source
dire
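For what it's worth, building the distribution from a current source tree is usually done with the dist profile (see BUILDING.txt in the source root; the flags below are the commonly documented ones):

```shell
mvn package -Pdist -DskipTests -Dtar
```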
Yes, the namenode port is not open for your cluster. I had this problem too.
First, log into your namenode and do netstat -nap to see what ports are
listening. You can do service --status-all to see if the namenode service
is running. Basically you need Hadoop to bind to the correct IP (an
external
Hi,
I am using the Hadoop packaged within HBase 0.94.1; it is Hadoop 1.0.3.
Some MapReduce jobs run on my server. After some time, I found that
my folder /tmp/hadoop-root/mapred/local/archive is 14 GB in size.
How do I configure this and limit the size? I do not want to waste my sp
Hi All,
I have set up a single-node cluster (release hadoop-1.0.4). The following is the
configuration used -
core-site.xml :-
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
masters:-
  localhost
slaves:-
  localhost
I am able to successfully format the Namenode and perform file system
operation
Hi all,
I'm new to Hadoop and am posting my first message on this list. I have
downloaded and installed the hadoop_1.1.1-1_x86_64.deb distro and have a
couple of issues which are blocking me from progressing.
I'm working through the 'Hadoop - The Definitive Guide' book and am trying
to set up a t
That's nice! Thank you very much.
Now I am trying to get Flume to work. It should collect all the files (also the
log files from the TaskTracker).
Best Regards,
Christian.
2013/3/28 Arun C Murthy
> Use 'rumen', it's part of Hadoop.
>
> On Mar 19, 2013, at 3:56 AM, Christian Schneider wrote:
>
> Hi
On your first call, Hadoop will return a URL pointing to a datanode in the
Location header of the 307 response. On your second call, you have to use that
URL instead of constructing your own. You can see the specific documentation
here:
http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE
R