Re: All nodes are not used

2016-08-09 Thread Mahesh Balija
Hi Madhav, The behaviour sounds normal to me. If the block size is 128 MB, there could be ~24 mappers (i.e., containers used). You cannot use the entire cluster, as the blocks may reside only on the nodes being used. You should not try using the entire cluster resources for the following reason
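The arithmetic behind the "~24 mappers" estimate can be sketched as below. This is only an illustration: the 3 GB input size is an assumed figure chosen to reproduce the number in the message (the thread snippet does not state it), and the class and method names are hypothetical. The real split count also depends on the InputFormat and on how many files make up the input.

```java
// Hypothetical helper, not part of Hadoop: estimates one map task per block.
class SplitCountSketch {

    // Ceiling division: a partially filled last block still needs its own mapper.
    static long splitCount(long inputBytes, long blockBytes) {
        if (inputBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long blockBytes = 128L * 1024 * 1024;        // 128 MB block size, as in the message
        long inputBytes = 3L * 1024 * 1024 * 1024;   // ASSUMED input size of 3 GB
        System.out.println(splitCount(inputBytes, blockBytes));
    }
}
```

With these assumed numbers the estimate is 24 map tasks, so only the nodes that can host those 24 containers are kept busy.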

Re: reducer gets values with empty attributes

2013-04-30 Thread Mahesh Balija
Hi Alex, Can you please attach your code and the sample input data? Best, Mahesh Balija, Calsoft Labs. On Tue, Apr 30, 2013 at 2:29 AM, alx...@aim.com wrote: Hello, I am trying to write a MapReduce program in hadoop-1.0.4 using the mapred libs. I have a map function which gets

Re: namenode memory test

2013-04-24 Thread Mahesh Balija
Can you manually go into the directory configured for hadoop.tmp.dir in core-site.xml and do an ls -l to find the disk usage details? It will have fsimage, edits, fstime, and VERSION. Or use basic commands like hadoop fs -du or hadoop fsck. On Wed, Apr 24, 2013 at 7:56 AM, 自己 zx4866...@163.com

Re: Hadoop sampler related query!

2013-04-24 Thread Mahesh Balija
of the whole program. Best, Mahesh Balija, Calsoft Labs. On Wed, Apr 24, 2013 at 12:37 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Thanks for the response Mahesh. I thought of this, but do not know why this limitation exists. While sampling to pick up certain records and run our logic over

Re: Hadoop sampler related query!

2013-04-23 Thread Mahesh Balija
based on the Mapper outkey type. Best, Mahesh Balija, CalsoftLabs. On Tue, Apr 23, 2013 at 4:12 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: + mapred dev On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Hi, I have a question related

Re: Need help optimizing reducer

2013-03-05 Thread Mahesh Balija
be faster up to 66%. In order to speed up your program, you may either have to use more reducers or make your reducer code as optimized as possible. Best, Mahesh Balija, Calsoft Labs. On Tue, Mar 5, 2013 at 1:27 AM, Austin Chungath austi...@gmail.com wrote: Hi all, I have 1 reducer

Re: Running terasort with 1 map task

2013-02-26 Thread Mahesh Balija
Does passing dfs.block.size=134217728 resolve your issue, or did something else fix your problem? On Tue, Feb 26, 2013 at 6:04 PM, Arindam Choudhury arindamchoudhu...@gmail.com wrote: sorry my bad, it solved On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury

Re: WordPairCount Mapreduce question.

2013-02-25 Thread Mahesh Balija
the keys are sorted; because of this implementation, the records are read from the stream directly and sorted without the need to deserialize them into objects. Best, Mahesh Balija, CalsoftLabs. On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai saigr...@yahoo.in wrote: Thanks Mahesh for your help
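The idea of sorting records without deserializing them can be illustrated with a small sketch. It uses plain Java only, and the class and method names are hypothetical; Hadoop's own raw comparators are not reproduced here. It assumes keys serialized as UTF-8 bytes, for which unsigned byte-by-byte order matches code-point order of the original strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of comparing serialized keys directly.
class RawKeySortSketch {

    // Compare two serialized keys byte by byte, treating each byte as unsigned.
    // No String (or other key object) is created during the comparison.
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal common prefix: the shorter key sorts first.
        return a.length - b.length;
    }

    // Serializes the keys, sorts the raw bytes, and decodes only for display.
    static List<String> sortSerialized(List<String> keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String k : keys) {
            raw.add(k.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort(RawKeySortSketch::compareRaw);
        List<String> out = new ArrayList<>();
        for (byte[] r : raw) {
            out.add(new String(r, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```

The decoding step in sortSerialized exists only so the result is readable; the sort itself works entirely on bytes.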

Re: WordPairCount Mapreduce question.

2013-02-23 Thread Mahesh Balija
Please check the in-line answers... On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai saigr...@yahoo.in wrote: Hello I have a question about how Mapreduce sorting works internally with multiple columns. Below are my classes using 2 columns in an input file given below. 1st question: About the

Re: Regarding Hadoop

2013-02-14 Thread Mahesh Balija
in the Hadoop ecosystem, including Mahout, Hive, Pig, etc., have their own applications. One important note is that Hadoop runs on commodity hardware. Best, Mahesh Balija, Calsoft Labs. On Fri, Feb 15, 2013 at 12:08 PM, SrinivasaRao Kongar ksrinu...@gmail.com wrote: Hi sir, What is Hadoop technology

Re: number input files to mapreduce job

2013-02-12 Thread Mahesh Balija
Hi Vikas, You can get the FileSystem instance by calling FileSystem.get(Configuration); Once you get the FileSystem instance you can use FileSystem.listStatus(InputPath); to get the FileStatus instances. Best, Mahesh Balija, Calsoft Labs. On Tue, Feb 12, 2013

Re: fresher in hadoop

2013-02-10 Thread Mahesh Balija
The best way is to first learn the concepts thoroughly, and then if you like you can also contribute to Hadoop projects. After that, it is probably better to find some BigData-based projects. Best, Mahesh Balija, CalsoftLabs. On Mon, Feb 11, 2013 at 10:32 AM, Monkey2Code monkey2c...@gmail.com wrote

Re: SequenceFileOutputFormat - Wrong Key Class

2013-02-03 Thread Mahesh Balija
as key, value. You should get to know through the API documentation. So make sure that you are using the right key-value pairs. Thanks, Mahesh Balija, CalsoftLabs. On Fri, Feb 1, 2013 at 10:41 PM, Anbarasan Murthy anbu992...@hotmail.com wrote: I am getting the following Exception message when I

Re: SequenceFileOutputFormat - Custom Type Key Value

2013-02-01 Thread Mahesh Balija
instances based on how you are defining the MR job. Best, Mahesh Balija, CalsoftLabs. On Fri, Feb 1, 2013 at 6:37 PM, Anbarasan Murthy anbarasa...@hcl.com wrote: By default SequenceFileOutputFormat expects the Input – LongWritable Output – Text I would like to know how

Re: mappers-node relationship

2013-01-25 Thread Mahesh Balija
and mapred.tasktracker.reduce.tasks.maximum. Also they run in parallel. Best, Mahesh Balija, CalsoftLabs. On Fri, Jan 25, 2013 at 1:16 PM, jamal sasha jamalsha...@gmail.com wrote: Hi. A very very lame question. Does the number of mappers depend on the number of nodes I have? How I imagine map-reduce

Re: Copy files from remote folder to HDFS

2013-01-24 Thread Mahesh Balija
is a data collection and aggregation framework and NOT a file transfer tool and may NOT be a good choice when you actually want to copy the files as-is onto your cluster (NOT 100% sure as I am also working on that). Thanks, Mahesh Balija, CalsoftLabs. On Fri, Jan 25, 2013 at 6:39 AM, Panshul Whisper

Re: How to Backup HDFS data ?

2013-01-24 Thread Mahesh Balija
Hi Steve, On top of Harsh's answer, other than Backup there is a feature called Snapshot offered by some third-party vendors like MapR. Though it's not really a backup, it is a point to which you can revert at any time. Best, Mahesh Balija, CalsoftLabs

Re: How to copy log files from remote windows machine to Hadoop cluster

2013-01-20 Thread Mahesh Balija
Hi Mirko, Thanks for your reply. It works for me as well. Now I was able to mount the folder on the master node and configured Flume such that it can either poll for logs in real time or even for periodic retrieval. Thanks, Mahesh Balija. Calsoft Labs. On Thu, Jan 17, 2013

How to copy log files from remote windows machine to Hadoop cluster

2013-01-17 Thread Mahesh Balija
Hi, My log files are generated and saved on a Windows machine. Now I have to move those remote files to the Hadoop cluster (HDFS) in either a synchronous or asynchronous way. I have gone through Flume (various source types), but it was not helpful. Please suggest whether there

Re: How to copy log files from remote windows machine to Hadoop cluster

2013-01-17 Thread Mahesh Balija
Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija balijamahesh@gmail.com wrote: I have studied Flume but I didn't find anything useful in my case. My requirement is there is a directory on a Windows machine, in which the files

Re: Hadoop execution sequence

2013-01-15 Thread Mahesh Balija
client is responsible for processing individual file in order. Best, Mahesh Balija, Calsoft Labs. On Tue, Jan 15, 2013 at 7:55 PM, Panshul Whisper ouchwhis...@gmail.com wrote: Hello, I was wondering if hadoop performs the map reduce operations on the data while maintaining the order or sequence

Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.

2013-01-10 Thread Mahesh Balija
cause these kinds of issues based on the operation you do in your reducer. Can you put some logs in your reducer and try to trace out what is happening? Best, Mahesh Balija, Calsoft Labs. On Fri, Jan 11, 2013 at 8:53 AM, yaotian yaot...@gmail.com wrote: I have 1 hadoop master which name

Re: How to interpret the progress meter?

2013-01-10 Thread Mahesh Balija
Hi Smith, In my experience, the actual processing usually occurs in the first 40% to around 70%; the remainder is devoted to writing/flushing the data to the output files, which may take more time. Best, Mahesh Balija, Calsoft Labs. On Fri, Jan 11, 2013 at 9:32 AM, Roy Smith r

Re: Differences between 'mapped' and 'mapreduce' packages

2013-01-07 Thread Mahesh Balija
changes in the 0.20 API; the mapred package may still exist for backward compatibility. There are a few classes which exist in the 0.19 API and are not supported in the 0.20.* versions. Best, Mahesh Balija, Calsoft Labs. On Mon, Jan 7, 2013 at 11:44 PM, Oleg Zhurakousky oleg.zhurakou...@gmail.com

Re: Binary Search in map reduce

2013-01-07 Thread Mahesh Balija
say 1 - the graph and 2 - changes and value will be the actual value. Now the only thing left for you is to append your changes to the actual key and emit the final result. Best, Mahesh Balija, Calsoft Labs. On Tue, Jan 8, 2013 at 5:47 AM, jamal sasha jamalsha...@gmail.com wrote

Re: Best practices for working with files in Hadoop MR jobs

2012-12-11 Thread Mahesh Balija
. Best, Mahesh Balija, CalSoft Labs. On Tue, Dec 11, 2012 at 11:29 AM, Ivan Ryndin iryn...@gmail.com wrote: Hi all, I have following question: What are the best practices working with files in Hadoop? I need to process a lot of log files, that arrive to Hadoop every minute. And I have multiple

Re: Query about Speculative Execution

2012-12-06 Thread Mahesh Balija
of the fast running ones or early completing task. Best, Mahesh Balija, Calsoft Labs. On Thu, Dec 6, 2012 at 8:27 PM, Ajay Srivastava ajay.srivast...@guavus.com wrote: Hi, What is the behavior of jobTracker if speculative execution is off and a task on data node is running extremely slow

Re: understanding performance

2012-12-04 Thread Mahesh Balija
in the cluster. This can be one possibility why there are fluctuations in your job performance. Best, Mahesh Balija, Calsoft Labs. On Mon, Dec 3, 2012 at 8:57 PM, Cogan, Peter (Peter) peter.co...@alcatel-lucent.com wrote: Hi there, I've been doing some performance testing with hadoop

Re: Input splits for sequence file input

2012-12-02 Thread Mahesh Balija
and generates key-value pairs. InputFormat also handles records that may be split on the FileSplit boundary (i.e., different blocks). Please check this link for more information, http://wiki.apache.org/hadoop/HadoopMapReduce Best, Mahesh Balija, Calsoft Labs. On Mon, Dec 3, 2012

Re: Trouble with Word Count example

2012-11-29 Thread Mahesh Balija
Hi Sandeep, For me everything seems to be alright. Can you tell us how you are running this job? Best, Mahesh.B. Calsoft Labs. On Thu, Nov 29, 2012 at 9:01 PM, Sandeep Jangra sandeepjan...@gmail.com wrote: Hello everyone, Like most others I am also running into some

Re: discrepancy du in dfs are fs

2012-11-28 Thread Mahesh Balija
Hi Chris, Can you try the following on your local machine: du -b myfile.txt, and compare this with hadoop fs -du myfile.txt? Best, Mahesh Balija, Calsoft Labs. On Wed, Nov 28, 2012 at 7:43 PM, listenbru...@gmx.net wrote: Hi all, I wonder wy

Re: Get JobInProgress given jobId

2012-11-28 Thread Mahesh Balija
Hi Pedro, You can get the JobInProgress instance from JobTracker. JobInProgress getJob(JobID jobid); Best, Mahesh Balija, Calsoft Labs. On Wed, Nov 28, 2012 at 10:41 PM, Pedro Sá da Costa psdc1...@gmail.com wrote: I'm building a Java class and given a JobID, how can

Re: Failed to call hadoop API

2012-11-27 Thread Mahesh Balija
(). If this does not work for you, please tell what you are trying to do. Thanks, Mahesh Balija, Calsoft Labs. On Tue, Nov 27, 2012 at 5:37 PM, GHui ugi...@gmail.com wrote: I call the sentence JobID id = new JobID() of hadoop API with JNI. But when my program run to this sentence, it exits. And no errors

Re: advice

2012-11-27 Thread Mahesh Balija
basics of HDFS, MapReduce architectures, and then concepts like combiners, partitioner, recordreader, inputformats, outputformats etc Best, Mahesh Balija, Calsoft Labs.

Re: MapReduce APIs

2012-11-26 Thread Mahesh Balija
Hi AK, I don't really understand what is stopping you from using the job.getConfiguration() method to pass the configuration instance to DistributedCache.addCacheFile(URI, job.getConfiguration()). The only thing you need to do is pass the URI and configuration object (getting it from

Re: Moving files

2012-11-25 Thread Mahesh Balija
path. Best, Mahesh Balija, Calsoft Labs. On Sun, Nov 25, 2012 at 8:04 AM, David Parks davidpark...@yahoo.com wrote: I want to move a file in HDFS after a job using the Java API, I'm trying this command but I always get false (could not rename): Path from = new Path(hdfs://localhost

Re: Best practice for storage of data that changes

2012-11-25 Thread Mahesh Balija
files directly then you have to use any commercial Hadoop packages like MapR which supports updating the HDFS files. Best, Mahesh Balija, Calsoft Labs. On Sun, Nov 25, 2012 at 9:40 AM, bharath vissapragada bharathvissapragada1...@gmail.com wrote: Hi Jeff, Please look at [1] . You can store your

Re: Social media data

2012-11-16 Thread Mahesh Balija
Hi Prabhu, For Twitter there are different types for obtaining feeds like gardenhose and FireHose etc. Some may be free and some are paid, like that you can look for other social media options. Best, Mahesh Balija, Calsoft Labs. On Thu, Nov 15, 2012 at 11:35 PM

Re: Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Mahesh Balija
associated with a given key and sends the key and list of values to the reducer function. Best, Mahesh Balija. On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Hi, Which of the following is correct w.r.t mapper. (a) It accepts a single key-value
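The grouping step described in this reply, where all values for a key are collected and handed to the reducer together, can be sketched in plain Java. This is an in-memory illustration only, with hypothetical class and method names; in Hadoop itself the reducer receives an Iterable of values rather than a materialized list, and the grouping happens during the shuffle and sort phase across the cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the shuffle/group step.
class ShuffleGroupingSketch {

    // Convenience factory for one mapper output record (key, value).
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Collects every value emitted for a key into one list, with keys in sorted order.
    static SortedMap<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> mapOutput) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> record : mapOutput) {
            grouped.computeIfAbsent(record.getKey(), k -> new ArrayList<>()).add(record.getValue());
        }
        return grouped;
    }

    // Example reducer function: called once per key with all of that key's values.
    static int sumReduce(String key, List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}
```

For mapper output (b, 2), (a, 1), (a, 3), groupByKey yields a with [1, 3] and b with [2], and the reducer function is then invoked once per key.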

Hbase DeleteAll is not working

2012-05-13 Thread Mahesh Balija
Hi, I am trying to delete whole rows from HBase in my production cluster in two ways: 1) I have written a MapReduce program to remove the many rows which satisfy a certain condition. The key is the HBase row key only, and the value is Delete. I am