Re: Sqoop issue related to Hadoop

2013-08-28 Thread bejoy . hadoop
Hi Raj, The easiest approach to pull out a task log is via the JT web UI. Go to the JT web UI and drill down into the Sqoop job; you'll get a list of failed/killed tasks, and your failed task should be in there. Clicking on that task gives you the logs for it. Regards, Bejoy KS

Re: New hadoop 1.2 single node installation giving problems

2013-07-23 Thread bejoy . hadoop
Hi Ashish, In your hdfs-site.xml, everything goes inside the <configuration> tag; within it you need <property> tags, and inside each <property> tag you can have <name>, <value> and <description> tags. Regards, Bejoy KS
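For illustration, a minimal hdfs-site.xml with that structure might look like the following (the dfs.replication property is just a common single-node example, not necessarily the one Ashish was setting):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Block replication factor; 1 is usual for a single-node install.</description>
  </property>
</configuration>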

Re:

2013-05-30 Thread bejoy . hadoop
Job, You need to set it in every Hive session/CLI client. This property is a job-level one and is used to indicate which pool/queue a job should be submitted to. Regards, Bejoy KS

Re: Hadoop Installation Mappers setting

2013-05-23 Thread bejoy . hadoop
When you run MapReduce tasks you need CPU cycles to do the processing, not just memory. So ideally, based on the processor type (hyper-threaded or not), compute the available cores, and then size it as roughly one core per task slot. Regards, Bejoy KS

Re: Hadoop Installation Mappers setting

2013-05-23 Thread bejoy . hadoop
Hi, I assume the question is on how many slots. It depends on: the child/task JVM size and the available memory, and the available number of cores. Your available memory for tasks is the total memory minus the memory used by the OS and other services running on your box; other services include non-Hadoop services.
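As a rough sketch of where those numbers end up: on a TaskTracker with, say, 8 cores and about 1 GB per child JVM, the slot counts and child heap would be set in mapred-site.xml along these lines (the values are illustrative, not a recommendation):

<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
    <description>Map slots on this TaskTracker.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
    <description>Reduce slots on this TaskTracker.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
    <description>Heap for each child/task JVM.</description>
  </property>
</configuration>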

Re: Viewing snappy compressed files

2013-05-21 Thread bejoy . hadoop
If you have the Snappy codec in io.compression.codecs, then you can easily decompress the data straight out of HDFS with a simple command: hadoop fs -text <hdfs file path>. Regards, Bejoy KS
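For reference, the relevant core-site.xml entry lists the Snappy codec class among the installed codecs, roughly like this (class names as shipped with stock Hadoop; adjust the list to your install):

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>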

Re: Basic Doubt in Hadoop

2013-04-17 Thread bejoy . hadoop
You are correct: map outputs are stored in the local file system (LFS), not in HDFS. Regards, Bejoy KS

Re: How to balance reduce job

2013-04-16 Thread bejoy . hadoop
Yes, that is a valid point. The partitioner might do a non-uniform distribution and the reducers can be unevenly loaded. But this doesn't change the number of reducers or their distribution across nodes. The underlying issue, as I understand it, is that his reduce tasks are scheduled on just a few nodes. Regards, Bejoy KS
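To make the partitioner's role concrete, here is a minimal custom Partitioner for the old mapred API (a sketch only, not the poster's actual job code): a skewed getPartition() overloads some reducers, but it never changes how many reducers run or where they are scheduled.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Decides which reducer each (key, value) pair goes to.
public class HashModPartitioner implements Partitioner<Text, IntWritable> {

    @Override
    public void configure(JobConf job) {
        // No configuration needed for this simple example.
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Default-style hash partitioning: spreads keys across all reducers.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}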

Re: How to balance reduce job

2013-04-16 Thread bejoy . hadoop
Uniform data distribution across HDFS is one of the factors that ensures map tasks are uniformly distributed across nodes. But reduce tasks don't depend on data distribution; their scheduling is purely based on slot availability. Regards, Bejoy KS

Re: How to balance reduce job

2013-04-16 Thread bejoy . hadoop
Hi Rauljin, A few things to check here: What is the number of reduce slots in each TaskTracker? What is the number of reduce tasks for your job? Based on the availability of slots, the reduce tasks are scheduled on TTs. You can do the following: set the number of reduce tasks to 8 or more, and play with the reduce slot count per TaskTracker.
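A minimal driver sketch showing where the reduce-task count is set (identity mapper/reducer, paths taken from the command line; for ToolRunner-based jobs the same thing can be done with -D mapred.reduce.tasks=8 on the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal identity MapReduce job whose only point is the reduce-task count.
public class BalanceReducersExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "balance-reducers-example");
        job.setJarByClass(BalanceReducersExample.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        // More reduce tasks than heavily loaded nodes lets the scheduler
        // spread the reduce work across more TaskTrackers' reduce slots.
        job.setNumReduceTasks(8);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}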

Re: Basic Doubt in Hadoop

2013-04-16 Thread bejoy . hadoop
The data is in HDFS in the case of the WordCount MR sample. In HDFS, you have the metadata in the NameNode and the actual data as blocks replicated across DataNodes. In the case of the reducer: if a reducer is running on a particular node, then one replica of the blocks it writes ends up on that same node (if there is no space …

Re: How to configure mapreduce archive size?

2013-04-16 Thread bejoy . hadoop
Also, you need to change the value for 'local.cache.size' in core-site.xml, not in core-default.xml. If you need to override any property in the config files, do it in *-site.xml, not in *-default.xml. Regards, Bejoy KS
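For example, capping the distributed cache at roughly 5 GB per node would be an entry like this in core-site.xml (the value is in bytes; 5 GB here is only an illustration):

<property>
  <name>local.cache.size</name>
  <value>5368709120</value>
  <description>Upper limit, in bytes, on the size of the task distributed cache on each node (5 GB here).</description>
</property>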

Re: How to configure mapreduce archive size?

2013-04-16 Thread bejoy . hadoop
You can get the job.xml for each job from the JT web UI: click on the job, and on that job's page you'll find a link to it. Regards, Bejoy KS

Re: VM reuse!

2013-04-16 Thread bejoy . hadoop
Hi Rahul, AFAIK there is no guarantee that one task would be on N1 and another on N2; both can be on N1 as well. The JT has no notion of JVM reuse and doesn't consider it for task scheduling. Regards, Bejoy KS

Re: Question regarding hadoop jar command usage

2013-03-13 Thread bejoy . hadoop
Hi, Any node can submit the job to the JobTracker, which distributes the jar to the TaskTrackers, and the individual tasks are executed on nodes across the cluster. So MR tasks always run across the cluster. Regards, Bejoy KS

Re: adding space on existing datanode ?

2013-02-25 Thread bejoy . hadoop
Hi Brice, By adding a new storage location to dfs.data.dir you are not incrementing the replication factor; you are giving one more location where that DataNode can store blocks. No new DataNode is added. A new DataNode would come live only if you tweak your configs and start a new DataNode daemon.
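For reference, adding a second disk to an existing DataNode is just a comma-separated dfs.data.dir entry in hdfs-site.xml (the paths below are hypothetical):

<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
  <description>Comma-separated list of local directories where this DataNode stores block data.</description>
</property>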

Re: Newbie Debuggin Question

2013-02-21 Thread bejoy . hadoop
Hi Sai, The location you are seeing should be the mapred.local.dir. From my understanding, the files in the distributed cache are available in that location while the job is running and are cleaned up at the end of it. Regards, Bejoy KS

Re: ISSUE :Hadoop with HANA using sqoop

2013-02-21 Thread bejoy . hadoop
Hi Samir, Looks like there is some syntax issue with the SQL query generated internally. Can you try doing a Sqoop import by specifying the query yourself with the --query option? Regards, Bejoy KS
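For illustration, a free-form query import would look roughly like the following. The JDBC URL, driver class, credentials and split column are placeholders for a HANA setup (only the table name comes from this thread), and with --query Sqoop requires the literal $CONDITIONS token plus a --split-by column when more than one mapper is used:

# Hypothetical HANA connection details; adjust the driver/URL to your environment.
sqoop import \
  --connect "jdbc:sap://hana-host:30015" \
  --driver com.sap.db.jdbc.Driver \
  --username myuser -P \
  --query 'SELECT t.* FROM hgopalan.hana_training AS t WHERE $CONDITIONS' \
  --split-by t.id \
  --target-dir /user/myuser/hana_training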

Re: ISSUE :Hadoop with HANA using sqoop

2013-02-20 Thread bejoy . hadoop
Hi Sameer, The query "SELECT t.* FROM hgopalan.hana_training AS t WHERE 1=0" is first executed by Sqoop to fetch the metadata. The actual data fetch happens as part of individual queries from each task, each of which is a sub-query of the whole input query. Regards, Bejoy KS

Re: Submitting MapReduce job from remote server using JobClient

2013-01-24 Thread bejoy . hadoop
Hi Amit, Apart from the Hadoop jars, do you have the same config files ($HADOOP_HOME/conf) on your analytics server that are in the cluster? If you have only the default config files on the analytics server, then your MR job would be running locally and not on the cluster. Regards, Bejoy KS
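The two properties that usually make the difference between local and cluster execution in Hadoop 1.x are shown below; the host names and ports are placeholders, and these would go in core-site.xml and mapred-site.xml respectively on the submitting machine:

<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:8020</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:8021</value>
</property>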

Re: Where do/should .jar files live?

2013-01-22 Thread bejoy . hadoop
Hi Chris, In larger clusters it is better to have an edge/client node where all the user jars reside, and you trigger your MR jobs from there. A client/edge node is a server with the Hadoop jars and conf but hosting no daemons. In smaller clusters, one DN might act as the client node and you can execute your jobs from it.

Re: Hadoop Cluster

2013-01-22 Thread bejoy . hadoop
Hi Savitha, HA is a new feature in Hadoop, introduced in the Hadoop 2.x releases, so it is an addition on top of a Hadoop cluster. Ganglia is one of the widely used tools for monitoring the cluster in detail. At a basic HDFS and MapReduce level, the JobTracker and NameNode web UIs give you a good overview.

Re: modifying existing wordcount example

2013-01-16 Thread bejoy . hadoop
Hi Jamal, You can use the Distributed Cache only if the file to be distributed is small. MapReduce usually deals with larger datasets, so you should expect the output file to get larger. In a simple, straightforward manner, you can get the second dataset processed and then merge the first output with the second.

Re: hadoop namenode recovery

2013-01-14 Thread bejoy . hadoop
Hi Panshul, The SecondaryNameNode is better described as a checkpoint node. At periodic intervals it merges the edit log from the NN with the fsimage to prevent the edit log from growing too large; this is its main functionality. At any point the SNN would have the latest fsimage but not the up-to-date edit log.

Re: Map Failure reading .gz (gzip) files

2013-01-14 Thread bejoy . hadoop
Hi Terry, When the file is unzipped versus gzipped, what is the number of map tasks running in each case? If the file is large, I assume the following is the case: gz is not a splittable compression codec, so the whole file would be processed by a single mapper, and this might be causing the job to fail.

Re: probably very stupid question

2013-01-14 Thread bejoy . hadoop
Hi Jamal, I believe a reduce-side join is what you are looking for. You can use MultipleInputs to achieve a reduce-side join for this: http://kickstarthadoop.blogspot.com/2011/09/joins-with-plain-map-reduce.html Regards, Bejoy KS
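A minimal sketch of that pattern with the old mapred API (the comma-separated input format and the join column are assumptions for illustration; the linked blog post walks through a fuller version): each input directory gets its own mapper that tags records with their source, and the reducer combines all tagged records sharing a key.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleInputs;

// Reduce-side join of two comma-separated inputs on their first column.
public class ReduceSideJoin {

    public static class SourceAMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
            String[] parts = line.toString().split(",", 2);
            String rest = parts.length > 1 ? parts[1] : "";
            out.collect(new Text(parts[0]), new Text("A:" + rest)); // tag with source A
        }
    }

    public static class SourceBMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
            String[] parts = line.toString().split(",", 2);
            String rest = parts.length > 1 ? parts[1] : "";
            out.collect(new Text(parts[0]), new Text("B:" + rest)); // tag with source B
        }
    }

    public static class JoinReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
            // All tagged records for this key arrive together; concatenate them here,
            // or pair the A-side and B-side records as the real join logic requires.
            StringBuilder joined = new StringBuilder();
            while (values.hasNext()) {
                joined.append(values.next().toString()).append('|');
            }
            out.collect(key, new Text(joined.toString()));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(ReduceSideJoin.class);
        conf.setJobName("reduce-side-join");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setReducerClass(JoinReducer.class);
        // Each input directory gets its own mapper so records can be tagged by source.
        MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, SourceAMapper.class);
        MultipleInputs.addInputPath(conf, new Path(args[1]), TextInputFormat.class, SourceBMapper.class);
        FileOutputFormat.setOutputPath(conf, new Path(args[2]));
        JobClient.runJob(conf);
    }
}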

Re: hadoop namenode recovery

2013-01-14 Thread bejoy . hadoop
Hi Panshul, Usually, for reliability, multiple dfs.name.dir locations are configured, of which one is a remote location such as an NFS mount. That way, even if the NN machine crashes entirely, you still have the fsimage and edit log on the NFS mount, and these can be used to reconstruct the NameNode.
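For reference, that setup is just a comma-separated dfs.name.dir in hdfs-site.xml with one local path and one NFS-mounted path (the paths here are hypothetical):

<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/dfs/name,/mnt/nfs/hadoop/dfs/name</value>
  <description>The NameNode writes its fsimage and edit log to every listed directory.</description>
</property>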

Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.

2013-01-10 Thread bejoy . hadoop
Hi, To add on to Harsh's comments: you don't have to change the task timeout. In your map/reduce code, you can increment a counter or report status at intervals so that there is communication from the task and hence it won't hit the task timeout. Every map and reduce task runs on …
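A sketch of what that looks like inside a mapper with the new mapreduce API (the counter group and name are made up for the example):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper that does long-running work per record but keeps the framework
// informed, so the task is not killed for being silent too long.
public class HeartbeatMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... some expensive per-record work would happen here ...

        // Any of these counts as liveness to the framework:
        context.getCounter("app", "records-processed").increment(1); // custom counter
        context.progress();                                          // plain heartbeat
        context.setStatus("processed offset " + key.get());          // optional status text

        context.write(new Text(value.toString()), new LongWritable(1));
    }
}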

Re: Writing a sequence file

2013-01-04 Thread bejoy . hadoop
Hi Peter, Did you ensure you are using SequenceFileOutputFormat from the right package? Based on the API you are using, mapred or mapreduce, you need to use the OutputFormat from the corresponding package. Regards, Bejoy KS
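Concretely, the class exists in both packages; a small helper like the one below (a sketch, assuming the new mapreduce API for the rest of the job) shows the pairing the reply is pointing at:

import org.apache.hadoop.mapreduce.Job;
// New-API class; the old-API equivalent is org.apache.hadoop.mapred.SequenceFileOutputFormat.
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SequenceOutputSetup {
    // Call this only from a new-API (org.apache.hadoop.mapreduce) driver;
    // mixing it into an old-API (org.apache.hadoop.mapred) JobConf job is
    // exactly the package mismatch the reply warns about.
    public static void useSequenceOutput(Job job) {
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
    }
}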

Re: more reduce tasks

2013-01-03 Thread bejoy . hadoop
Hi Chen, You do have an option in Hadoop to achieve this if you want the merged file in the LFS. 1) Run your job with n reducers, and you'll have n files in the output dir. 2) Issue a hadoop fs -getmerge command to get the files in the output dir merged into a single file in the LFS.
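A sketch of step 2, assuming hypothetical paths (the job's output directory in HDFS and a destination file on the local file system):

# Concatenate every part-* file under the HDFS output dir into one local file.
hadoop fs -getmerge /user/chen/job-output /tmp/job-output-merged.txt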

Re: Reg: No space left on device Exception

2012-12-07 Thread bejoy . hadoop
Hi Manoj, Go to the JT web UI and browse to the failed tasks. Identify which task threw the space-related error, then ssh to that node and check its disk space; some partitions might have become 100% full. Regards, Bejoy KS

Re: Problem using distributed cache

2012-12-07 Thread bejoy . hadoop
Hi Peter, Can you try the following in your code: 1. Make the driver class implement the Tool interface. 2. Use getConf() rather than creating a new conf instance. The DC should work with the above-mentioned modifications to the code.
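A minimal sketch of that shape (ToolRunner plus Tool so that generic options such as -files and -D are honored, and the injected configuration is reused via getConf(); the mapper/reducer wiring and the cached-file path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Driver that implements Tool and reuses the Configuration handed to it by
// ToolRunner instead of building a fresh one, so distributed cache entries
// and generic options are not silently lost.
public class DCFriendlyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();          // reuse, do not new Configuration()
        DistributedCache.addCacheFile(new Path(args[0]).toUri(), conf); // file to distribute
        Job job = new Job(conf, "dc-friendly-job");
        job.setJarByClass(DCFriendlyDriver.class);
        // ... set mapper/reducer and key/value classes here ...
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new DCFriendlyDriver(), args));
    }
}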