Re: How can I record some position of context in Reduce()?

2013-04-08 Thread Vikas Jadhav
Hi, I am also working on joins using MapReduce. I think that instead of finding the position of the table in RawKeyValueIterator, we can modify the context.write method to always write the key as the table name or id. Then we don't need to find the position; we can get the Key and Value from "reducerContext" before calling reducer.ru
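A minimal sketch of the same tag-the-key idea, expressed with Hadoop Streaming rather than by modifying context.write (the input file names, field layouts, and "A"/"B" table tags below are all hypothetical):

    #!/usr/bin/env python
    # mapper.py: tag each record with its source table so the reducer can
    # tell the two sides apart from the value itself.
    import os
    import sys

    # Hadoop Streaming (1.x) exposes the current input file as map_input_file.
    table = "A" if "orders" in os.environ.get("map_input_file", "") else "B"
    for line in sys.stdin:
        key, rest = line.rstrip("\n").split("\t", 1)
        print("%s\t%s\t%s" % (key, table, rest))

    #!/usr/bin/env python
    # reducer.py: rows for one join key arrive contiguously after the shuffle;
    # buffer side A, cross it with side B, and emit the joined records.
    import sys

    current, a_rows, b_rows = None, [], []

    def flush(key, a_rows, b_rows):
        for a in a_rows:
            for b in b_rows:
                print("%s\t%s\t%s" % (key, a, b))

    for line in sys.stdin:
        key, table, rest = line.rstrip("\n").split("\t", 2)
        if key != current:
            if current is not None:
                flush(current, a_rows, b_rows)
            current, a_rows, b_rows = key, [], []
        (a_rows if table == "A" else b_rows).append(rest)

    if current is not None:
        flush(current, a_rows, b_rows)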

Distributed cache: how big is too big?

2013-04-08 Thread John Meza
I am researching a Hadoop solution for an existing application that requires a directory structure full of data for processing. To make the Hadoop solution work I need to deploy the data directory to each DN when the job is executed. I know this isn't new and is commonly done with a Distributed Cach
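For reference, a minimal sketch of that pattern with a streaming job: pack the directory into an archive, ship it through the distributed cache, and read it from the task's working directory (the archive name, #refdata symlink fragment, and file paths here are hypothetical). Size-wise, the per-node localization cost and the local.cache.size limit (10GB by default, per the "How to configure mapreduce archive size?" thread in this digest) are the usual constraints.

    #!/usr/bin/env python
    # mapper.py: reads reference data shipped via the distributed cache.
    # Hypothetical submission; the #refdata fragment becomes a symlink in
    # the task's working directory where the archive is unpacked:
    #   hadoop jar hadoop-streaming.jar \
    #     -archives hdfs:///cache/refdata.tgz#refdata \
    #     -mapper mapper.py -input in -output out
    import sys

    lookup = {}
    with open("refdata/lookup.txt") as f:
        for line in f:
            k, v = line.rstrip("\n").split("\t", 1)
            lookup[k] = v

    for line in sys.stdin:
        key = line.rstrip("\n").split("\t", 1)[0]
        print("%s\t%s" % (key, lookup.get(key, "MISSING")))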

Re: Problem accessing HDFS from a remote machine

2013-04-08 Thread Rishi Yadav
Have you checked the firewall on the namenode? If you are running Ubuntu and the namenode port is 8020, the command is -> ufw allow 8020 Thanks and Regards, Rishi Yadav InfoObjects Inc || http://www.infoobjects.com *(Big Data Solutions)* On Mon, Apr 8, 2013 at 6:57 PM, Azuryy Yu wrote: > can you use command

RE: mr default=local?

2013-04-08 Thread John Meza
Harsh, thanks for the quick reply. While I am a Hadoop newbie, I find I am explaining Hadoop install, config, and job processing to newer newbies; thus the desire and need for more details. John > From: ha...@cloudera.com > Date: Tue, 9 Apr 2013 09:16:49 +0530 > Subject: Re: mr default=local? > To: use

Re: How to configure mapreduce archive size?

2013-04-08 Thread Hemanth Yamijala
Hi, This directory is used as part of the 'DistributedCache' feature. ( http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However,
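For reference, a sketch of the override (typically placed in mapred-site.xml on Hadoop 1.x; the 2GB value is an arbitrary example, in bytes):

    <property>
      <name>local.cache.size</name>
      <!-- bytes; the built-in default is 10737418240 (10GB) -->
      <value>2147483648</value>
    </property>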

Re: RES: I want to call HDFS REST api to upload a file using httplib.

2013-04-08 Thread ??????PHP
Really, thanks. But the returned URL is wrong, and "localhost" is the real URL, as I tested successfully with curl using "localhost". Can anybody help me translate the curl to Python httplib? curl -i -X PUT -T "http://:/webhdfs/v1/?op=CREATE" I tested it using Python httplib, and received the righ
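A minimal httplib translation of the two-step WebHDFS CREATE described in the RES: reply (Python 2; the namenode host, port, and file paths are placeholders to substitute):

    import httplib
    import urlparse

    # Step 1: ask the namenode where to write. Send no body here; WebHDFS
    # answers 307 with the datanode URL in the Location header.
    conn = httplib.HTTPConnection("namenode-host", 50070)
    conn.request("PUT", "/webhdfs/v1/tmp/test.txt?op=CREATE&overwrite=true")
    resp = conn.getresponse()
    location = resp.getheader("Location")
    resp.read()
    conn.close()

    # Step 2: PUT the file body to the URL from Location -- not to a URL
    # you construct yourself (the pitfall discussed in this thread).
    u = urlparse.urlsplit(location)
    conn2 = httplib.HTTPConnection(u.netloc)
    with open("test.txt", "rb") as f:
        conn2.request("PUT", u.path + "?" + u.query, f.read())
    resp2 = conn2.getresponse()
    print(resp2.status)  # 201 on success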

Re: mr default=local?

2013-04-08 Thread Harsh J
Hey John, sorta unclear on what is prompting this question (to answer it more specifically), but my response is below: On Tue, Apr 9, 2013 at 9:05 AM, John Meza wrote: > Hadoop's modes are Standalone, Pseudo-Distributed, and Fully > Distributed. It is configured for Pseudo and Fully

mr default=local?

2013-04-08 Thread John Meza
Hadoop's modes are Standalone, Pseudo-Distributed, and Fully Distributed. It is configured for Pseudo- or Fully Distributed via configuration files, but defaults to Standalone otherwise (correct?). Question about the -defaulting- mechanism: does it get the -default- configurat
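For context on the defaulting mechanism: the built-in default for mapred.job.tracker is the literal string "local", so with no override in mapred-site.xml jobs run in-process via the LocalJobRunner (standalone). A sketch of the pseudo-distributed override (Hadoop 1.x property names):

    <!-- mapred-site.xml: without this property, mapred.job.tracker
         falls back to "local" and jobs run in the LocalJobRunner. -->
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:54311</value>
    </property>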

Re: Best format to use

2013-04-08 Thread Harsh J
Hey Mark, the Gzip codec creates the extension .gz, not .deflate (which is the DeflateCodec). You may want to re-check your settings. Impala questions are best resolved at its current user and developer community at https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user. Impala does currently s

Re: Problem accessing HDFS from a remote machine

2013-04-08 Thread Azuryy Yu
Can you use the command "jps" on your localhost to see if there is a NameNode process running? On Tue, Apr 9, 2013 at 2:27 AM, Bjorn Jonsson wrote: > Yes, the namenode port is not open for your cluster. I had this problem > too. First, log into your namenode and do netstat -nap to see what ports are >

Re: Best format to use

2013-04-08 Thread Azuryy Yu
Impala can work with compressed data, but as compressed sequence files, not directly compressed files. On Tue, Apr 9, 2013 at 7:48 AM, Mark wrote: > Trying to determine the best format to use for storing daily logs. We > recently switched from snappy (.snappy) to gzip (.deflate) but I'm wondering > if ther

Best format to use

2013-04-08 Thread Mark
Trying to determine the best format to use for storing daily logs. We recently switched from snappy (.snappy) to gzip (.deflate), but I'm wondering if there is something better? Our main clients for these daily logs are Pig and Hive using an external table. We were thinking about testing out i
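A side note on telling the two outputs apart locally: Hadoop's DeflateCodec (.deflate) writes zlib-format data, while GzipCodec (.gz) writes ordinary gzip files. A quick Python check (the part file names are hypothetical):

    import gzip
    import zlib

    # .deflate output (DeflateCodec) is a zlib stream:
    with open("part-00000.deflate", "rb") as f:
        text = zlib.decompress(f.read())

    # .gz output (GzipCodec) is an ordinary gzip file:
    text = gzip.open("part-00000.gz", "rb").read()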

how to install hadoop 2.0.3 in standalone mode

2013-04-08 Thread jim jimm
I am new to Hadoop and Maven. I would like to compile Hadoop from source and install it. I am following the instructions from http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html So far, I have managed to download the Hadoop source code, and from the source dire

Re: Problem accessing HDFS from a remote machine

2013-04-08 Thread Bjorn Jonsson
Yes, the namenode port is not open for your cluster. I had this problem too. First, log into your namenode and do netstat -nap to see what ports are listening. You can do service --status-all to see if the namenode service is running. Basically you need Hadoop to bind to the correct ip (an external

How to configure mapreduce archive size?

2013-04-08 Thread Xia_Yang
Hi, I am using the Hadoop that is packaged within hbase-0.94.1; it is Hadoop 1.0.3. There are some mapreduce jobs running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive has grown to 14G. How do I configure this and limit the size? I do not want to waste my sp

Problem accessing HDFS from a remote machine

2013-04-08 Thread Saurabh Jain
Hi All, I have set up a single node cluster (release hadoop-1.0.4). Following is the configuration used - core-site.xml :- fs.default.name hdfs://localhost:54310 masters :- localhost slaves :- localhost I am able to successfully format the Namenode and perform file system operation
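Two notes on this thread. First, with fs.default.name set to hdfs://localhost:54310, the NameNode may be listening only on the loopback interface, so remote clients cannot reach it even with the firewall open (Bjorn's point above about binding to the correct ip). Second, a quick reachability test from the remote machine (the host name is a placeholder; 54310 is the port from the config above):

    import socket

    # Does the NameNode RPC port accept connections from this machine?
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(5)
    try:
        s.connect(("namenode-host", 54310))
        print("port reachable")
    except socket.error as e:
        print("cannot connect: %s" % e)
    finally:
        s.close()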

Question about RPM Hadoop distro user and keys (re-posting with subject this time)

2013-04-08 Thread Edd Grant
Hi all, I'm new to Hadoop and am posting my first message on this list. I have downloaded and installed the hadoop_1.1.1-1_x86_64.deb distro and have a couple of issues which are blocking me from progressing. I'm working through the 'Hadoop - The Definitive Guide' book and am trying to set up a t

Re: Parsing the JobTracker Job Logs

2013-04-08 Thread Christian Schneider
That's nice! Thank you very much. Now I'm trying to get Flume to work. It should collect all the files (also the log files from the Task Tracker). Best Regards, Christian. 2013/3/28 Arun C Murthy > Use 'rumen', it's part of Hadoop. > > On Mar 19, 2013, at 3:56 AM, Christian Schneider wrote: > > Hi

RES: I want to call HDFS REST api to upload a file using httplib.

2013-04-08 Thread MARCOS MEDRADO RUBINELLI
On your first call, Hadoop will return a URL pointing to a datanode in the Location header of the 307 response. On your second call, you have to use that URL instead of constructing your own. You can see the specific documentation here: http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#CREATE R

[no subject]

2013-04-08 Thread Edd Grant
Hi all, I'm new to Hadoop and am posting my first message on this list. I have downloaded and installed the hadoop_1.1.1-1_x86_64.deb distro and have a couple of issues which are blocking me from progressing. I'm working through the 'Hadoop - The Definitive Guide' book and am trying to set up a t