Re: R environment with Hadoop

2013-04-10 Thread Mahesh Balija
Mahout is an alternative to R, in case you are NOT aware of it. Thanks, Mahesh Balija, CalsoftLabs. On Thu, Apr 11, 2013 at 12:25 AM, Ted Yu wrote: > There is RHadoop. > > Maybe there are other platforms. > > > On Wed, Apr 10, 2013 at 11:49 AM, Shah, Rahul1 wrote: > >>

Re: Hadoop sampler related query!

2013-04-23 Thread Mahesh Balija
based on the Mapper outkey type. Best, Mahesh Balija, CalsoftLabs. On Tue, Apr 23, 2013 at 4:12 PM, Rahul Bhattacharjee < rahul.rec@gmail.com> wrote: > + mapred dev > > > On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: >

Re: namenode memory test

2013-04-24 Thread Mahesh Balija
Can you manually go into the directory configured for hadoop.tmp.dir in core-site.xml and run ls -l to find the disk usage details? It will contain fsimage, edits, fstime, and VERSION. Or use the basic commands, like hadoop fs -du and hadoop fsck. On Wed, Apr 24, 2013 at 7:56 AM, 自己 wrote: > Hi, I would

Re: Hadoop sampler related query!

2013-04-24 Thread Mahesh Balija
whole program. Best, Mahesh Balija, Calsoft Labs. On Wed, Apr 24, 2013 at 12:37 PM, Rahul Bhattacharjee < rahul.rec@gmail.com> wrote: > Thanks for the response Mahesh. I thought of this , but do not know why is > this limitation. > > While sampling to pick up certain records

Re: Writing data from HDFS to Tape

2013-04-25 Thread Mahesh Balija
Can you do the following, hadoop fs -copyToLocal Best, Mahesh Balija, CalsoftLabs. On Wed, Apr 24, 2013 at 12:12 PM, G, Prashanthi wrote: > I want to load my HDFS data directly to a tape or external storage > device. > > Please let me know if there is any wa

Re: reducer gets values with empty attributes

2013-04-29 Thread Mahesh Balija
Hi Alex, Can you please attach your code and the sample input data? Best, Mahesh Balija, Calsoft Labs. On Tue, Apr 30, 2013 at 2:29 AM, wrote: > > Hello, > > I try to write mapreduce program in hadoop -1.0.4. using mapred libs. I have > a map function which ge

Re: Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Mahesh Balija
associated with a given key and sends the key and List of values to the reducer function. Best, Mahesh Balija. On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan < ramasubramanian.naraya...@gmail.com> wrote: > Hi, > > Which of the following is correct w.r.t mapper. > > (a) It
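The grouping step described in this reply can be illustrated with a small plain-Python simulation. This is only a sketch and does not use the Hadoop API: the function name `group_by_key` and the sample pairs are made up for illustration, and a word-count style job is assumed.

```python
from collections import defaultdict


def group_by_key(mapper_output):
    """Collect all values emitted for each key, the way the shuffle phase
    does before handing (key, list of values) to the reducer."""
    grouped = defaultdict(list)
    for key, value in mapper_output:
        grouped[key].append(value)
    # A reducer sees its keys in sorted order; mimic that here.
    return sorted(grouped.items())


# Hypothetical mapper output for a word-count style job.
pairs = [("hadoop", 1), ("pig", 1), ("hadoop", 1), ("hive", 1), ("hadoop", 1)]

for key, values in group_by_key(pairs):
    # The "reducer" here simply sums the list of values it receives per key.
    print(key, values, sum(values))
```

The mapper emits one pair at a time; it is the framework, not the mapper, that builds the per-key list of values.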

Re: Social media data

2012-11-16 Thread Mahesh Balija
Hi Prabhu, For Twitter there are different feed types, such as "gardenhose" and "FireHose". Some may be free and some are paid; you can look for similar options for other social media. Best, Mahesh Balija, Calsoft Labs. On Thu, Nov

Re: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to java.lang.String

2012-11-20 Thread Mahesh Balija
/LongWritable and value will be Text, so when the framework tries to pass that LongWritable to your mapper, it throws the ClassCastException at runtime. Best, Mahesh Balija, Calsoft Labs. On Tue, Nov 20, 2012 at 12:41 AM, Harsh J wrote: > Hi, > > 1. Map/Reduce in 1.x.

Re: debugging hadoop streaming programs (first code)

2012-11-20 Thread Mahesh Balija
am NOT sure about Python, but one suggestion: can you run your Python code (map unit and reduce unit) locally on your input data and see whether your logic has any issues? Best, Mahesh Balija, Calsoft Labs. On Tue, Nov 20, 2012 at 6:50 AM, jamal sasha wrote: > > > > Hi, > This
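One way to act on this suggestion is to mimic the streaming pipeline (`mapper | sort | reducer`) in a single process. The sketch below assumes a word-count job, which the thread does not confirm; `map_line`, `reduce_sorted`, and `run_locally` are hypothetical names, and real streaming scripts would read stdin and write tab-separated lines instead of passing tuples.

```python
from itertools import groupby


def map_line(line):
    """Map unit: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_sorted(pairs):
    """Reduce unit: sum the counts per key; expects pairs sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_locally(lines):
    """Mimic `cat input | mapper | sort | reducer` without a cluster."""
    mapped = [kv for line in lines for kv in map_line(line)]
    return list(reduce_sorted(sorted(mapped)))


# Hypothetical two-line input file.
sample = ["Hadoop streaming test", "hadoop test"]
print(run_locally(sample))
```

If the logic is wrong, it fails here in seconds, before any job is submitted to the cluster.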

Re: Moving files

2012-11-25 Thread Mahesh Balija
delete the current path. Best, Mahesh Balija, Calsoft Labs. On Sun, Nov 25, 2012 at 8:04 AM, David Parks wrote: > I want to move a file in HDFS after a job using the Java API, I'm trying > this command but I always get false (could not rename): > > Path from = new > Path(&qu

Re: Best practice for storage of data that changes

2012-11-25 Thread Mahesh Balija
g the HDFS files directly then you have to use a commercial Hadoop package like MapR, which supports updating HDFS files. Best, Mahesh Balija, Calsoft Labs. On Sun, Nov 25, 2012 at 9:40 AM, bharath vissapragada < bharathvissapragada1...@gmail.com> wrote: > Hi Jeff, > > Ple

Re: MapReduce APIs

2012-11-26 Thread Mahesh Balija
Hi AK, I don't really understand what is stopping you from using the job.getConfiguration() method to pass the configuration instance to DistributedCache.addCacheFile(URI, job.getConfiguration()). The only thing you need to do is pass the URI and configuration object (getting it from o

Re: Failed to call hadoop API

2012-11-27 Thread Mahesh Balija
this does not work for you, please tell us what you are trying to do. Thanks, Mahesh Balija, Calsoft Labs. On Tue, Nov 27, 2012 at 5:37 PM, GHui wrote: > > I call the sentence "JobID id = new JobID()" of hadoop API with JNI. But > when my program run to this sentence, it exits. And

Re: ClassNotFoundException: org.jdom2.JDOMException

2012-11-27 Thread Mahesh Balija
all nodes in your cluster. Best, Mahesh Balija, Calsoft Labs. On Tue, Nov 27, 2012 at 6:49 PM, dyuti a wrote: > Hi Bharath, > yes i have added all those jars. > > Thanks, > dti > > On Tue, Nov 27, 2012 at 6:35 PM, bharath vissapragada < > bharathvissapragada1..

Re: advice

2012-11-27 Thread Mahesh Balija
r being so vague. > -> It's better to start by learning the basics of the HDFS and MapReduce architectures, and then concepts like combiners, partitioners, RecordReaders, InputFormats, OutputFormats, etc. Best, Mahesh Balija, Calsoft Labs.

Re: discrepancy du in dfs are fs

2012-11-28 Thread Mahesh Balija
Hi Chris, Can you try the following on your local machine: du -b myfile.txt, and compare this with hadoop fs -du myfile.txt? Best, Mahesh Balija, Calsoft Labs. On Wed, Nov 28, 2012 at 7:43 PM, wrote: > > Hi all, > > I wonder wy there is

Re: Get JobInProgress given jobId

2012-11-28 Thread Mahesh Balija
Hi Pedro, You can get the JobInProgress instance from JobTracker. JobInProgress getJob(JobID jobid); Best, Mahesh Balija, Calsoft Labs. On Wed, Nov 28, 2012 at 10:41 PM, Pedro Sá da Costa wrote: > I'm building a Java class and given a JobID, how can I

Re: discrepancy du in dfs are fs

2012-11-29 Thread Mahesh Balija
HDFS data is compressed/sequence data. Best, Mahesh Balija, Calsoft Labs. On Thu, Nov 29, 2012 at 8:48 PM, Kartashov, Andy wrote: > I also show some discrepancy Sqoop'ing data from MySQL. Both MySQL > "select count(*) from.." and "sqoop -eval -query "select count(

Re: Trouble with Word Count example

2012-11-29 Thread Mahesh Balija
Hi Sandeep, To me everything seems to be all right. Can you tell us how you are running this job? Best, Mahesh.B. Calsoft Labs. On Thu, Nov 29, 2012 at 9:01 PM, Sandeep Jangra wrote: > Hello everyone, > > Like most others I am also running into some problems while running

Re: Trouble with Word Count example

2012-11-29 Thread Mahesh Balija
. Also you can try your luck by running the JOB in old and new versions. Best, Mahesh Balija, Calsoft Labs. On Fri, Nov 30, 2012 at 2:16 AM, Sandeep Jangra wrote: > Hi Harsh, > > I tried putting the generic option first, but it throws exception file > not found. >

Re: Input splits for sequence file input

2012-12-02 Thread Mahesh Balija
generates key-value pairs. InputFormat also handles records that may be split on the FileSplit boundary (i.e., different blocks). Please check this link for more information, http://wiki.apache.org/hadoop/HadoopMapReduce Best, Mahesh Balija, Calsoft Labs. On Mon, Dec 3, 2012

Re: understanding performance

2012-12-04 Thread Mahesh Balija
cluster. This can be one possibility why there are fluctuations in your job performance. Best, Mahesh Balija, Calsoft Labs. On Mon, Dec 3, 2012 at 8:57 PM, Cogan, Peter (Peter) < peter.co...@alcatel-lucent.com> wrote: > Hi there, > > I've been doing some performance

Re: Query about Speculative Execution

2012-12-06 Thread Mahesh Balija
ut of the fast running one or the early completing task. Best, Mahesh Balija, Calsoft Labs. On Thu, Dec 6, 2012 at 8:27 PM, Ajay Srivastava wrote: > Hi, > > What is the behavior of jobTracker if speculative execution is off and a > task on data node is running extremely slow? > Will t

Re: Best practices for working with files in Hadoop MR jobs

2012-12-11 Thread Mahesh Balija
t one. Best, Mahesh Balija, CalSoft Labs. On Tue, Dec 11, 2012 at 11:29 AM, Ivan Ryndin wrote: > Hi all, > > I have following question: > What are the best practices working with files in Hadoop? > > I need to process a lot of log files, that arrive to Hadoop every minute. > An

Re: Differences between 'mapred' and 'mapreduce' packages

2013-01-07 Thread Mahesh Balija
changes in the 0.20 API; maybe for backward compatibility the mapred package is still in existence. There are a few classes which exist in the 0.19 API and are not supported in the 0.20.* versions. Best, Mahesh Balija, Calsoft Labs. On Mon, Jan 7, 2013 at 11:44 PM, Oleg Zhurakousky < oleg.zhura

Re: Binary Search in map reduce

2013-01-07 Thread Mahesh Balija
ll be some constant say 1 -> the graph and 2 -> changes and value will be the actual value. Now the only thing left for you is to append your changes to the actual key and emit the final result. Best, Mahesh Balija, Calsoft Labs. On Tue, Jan 8, 2013 at 5:47 AM, jamal sasha wrote: >

Re: I am running MapReduce on a 30G data on 1master/2 slave, but failed.

2013-01-10 Thread Mahesh Balija
cause these kinds of issues based on the operation you do in your reducer. Can you put some logs in your reducer and try to trace out what is happening? Best, Mahesh Balija, Calsoft Labs. On Fri, Jan 11, 2013 at 8:53 AM, yaotian wrote: > I have 1 hadoop master which name node locates an

Re: How to interpret the progress meter?

2013-01-10 Thread Mahesh Balija
Hi Smith, In my experience, the actual processing usually occurs during the first 40% to around 70%; the remainder is devoted to writing/flushing the data to the output files, which may take more time. Best, Mahesh Balija, Calsoft Labs. On Fri, Jan 11, 2013 at 9:32 AM, Roy Smith

Re: Hadoop execution sequence

2013-01-15 Thread Mahesh Balija
client is responsible for processing individual file in order. Best, Mahesh Balija, Calsoft Labs. On Tue, Jan 15, 2013 at 7:55 PM, Panshul Whisper wrote: > Hello, > > I was wondering if hadoop performs the map reduce operations on the data > in maintaining he order or sequence of d

How to copy log files from remote windows machine to Hadoop cluster

2013-01-17 Thread Mahesh Balija
Hi, My log files are generated and saved on a Windows machine. Now I have to move those remote files to the Hadoop cluster (HDFS), either synchronously or asynchronously. I have gone through Flume (various source types) but it was not helpful. Please suggest whether there I

Re: How to copy log files from remote windows machine to Hadoop cluster

2013-01-17 Thread Mahesh Balija
wrote: > Give Flume (http://flume.apache.org/) a chance to collect your data. > > Mirko > > > > 2013/1/17 sirenfei > >> ftp auto upload? >> >> >> 2013/1/17 Mahesh Balija : >> > the Hadoop cluster (HDFS) either in synchronous or asynchronou >> > >

Re: How to copy log files from remote windows machine to Hadoop cluster

2013-01-17 Thread Mahesh Balija
atever you have tried?? > > Warm Regards, > Tariq > https://mtariq.jux.com/ > cloudfront.blogspot.com > > > On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija > wrote: > >> I have studied Flume but I didn't find any thing useful in my case. >> My requirement

Re: How to copy log files from remote windows machine to Hadoop cluster

2013-01-20 Thread Mahesh Balija
Hi Mirko, Thanks for your reply. It works for me as well. I am now able to mount the folder on the master node and have configured Flume so that it can either poll for logs in real time or retrieve them periodically. Thanks, Mahesh Balija. Calsoft Labs. On Thu, Jan 17, 2013

Re: Copy files from remote folder to HDFS

2013-01-24 Thread Mahesh Balija
data collection and aggregation framework and NOT a file transfer tool and may NOT be a good choice when you actually want to copy the files as-is onto your cluster (NOT 100% sure as I am also working on that). Thanks, Mahesh Balija, CalsoftLabs. On Fri, Jan 25, 2013 at 6:39 AM, Panshul Whisper

Re: How to Backup HDFS data ?

2013-01-24 Thread Mahesh Balija
Hi Steve, On top of Harsh's answer: other than backup, there is a feature called Snapshot offered by some third-party vendors like MapR. Though it's not really a backup, it is a point to which you can revert at any time. Best, Mahesh Balija, CalsoftLabs

Re: mappers-node relationship

2013-01-25 Thread Mahesh Balija
mapred.tasktracker.reduce.tasks.maximum. Also they run in parallel. Best, Mahesh Balija, CalsoftLabs. On Fri, Jan 25, 2013 at 1:16 PM, jamal sasha wrote: > Hi. > A very very lame question. > Does numbers of mapper depends on the number of nodes I have? > How I imagine map-reduce is
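As a rough illustration of how a per-node slot limit such as mapred.tasktracker.reduce.tasks.maximum bounds parallelism, the arithmetic can be sketched as below. The cluster size, slot count, and task count are invented for the example, `reduce_waves` is a hypothetical helper, and the model ignores scheduling details such as failures and speculative execution.

```python
import math


def reduce_waves(num_nodes, reduce_slots_per_node, num_reduce_tasks):
    """Return (concurrent reduce slots, waves needed to run all reduce tasks)."""
    total_slots = num_nodes * reduce_slots_per_node
    waves = math.ceil(num_reduce_tasks / total_slots)
    return total_slots, waves


# Hypothetical cluster: 4 TaskTracker nodes, 2 reduce slots each, 10 reduce tasks.
print(reduce_waves(4, 2, 10))
```

With these assumed numbers, 8 reduce tasks can run in parallel, so 10 tasks need two waves.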

Re: SequenceFileOutputFormat - Custom Type Key & Value

2013-02-01 Thread Mahesh Balija
instances based on how you are defining the MR job. Best, Mahesh Balija, CalsoftLabs. On Fri, Feb 1, 2013 at 6:37 PM, Anbarasan Murthy wrote: > By default SequenceFileOutputFormat expects the > > Input – LongWritable > > Output – Text > > I wo

Re: SequenceFileOutputFormat - Wrong Key Class

2013-02-03 Thread Mahesh Balija
key, value. You should get to know these through the API documentation. So make sure that you are using the right key/value pairs. Thanks, Mahesh Balija, CalsoftLabs. On Fri, Feb 1, 2013 at 10:41 PM, Anbarasan Murthy wrote: > I am getting the following Exception message when i try to output T

Re: fresher in hadoop

2013-02-10 Thread Mahesh Balija
The best way is to first learn the concepts thoroughly, and then if you like you can also contribute to Hadoop projects. After that, it is probably better to find some Big Data based projects. Best, Mahesh Balija, CalsoftLabs. On Mon, Feb 11, 2013 at 10:32 AM, Monkey2Code wrote: > Hi am fresher

Re: number input files to mapreduce job

2013-02-12 Thread Mahesh Balija
Hi Vikas, You can get the FileSystem instance by calling FileSystem.get(Configuration); Once you get the FileSystem instance you can use FileSystem.listStatus(InputPath); to get the fileStatus instances. Best, Mahesh Balija, Calsoft Labs. On Tue, Feb 12, 2013

Re: Regarding Hadoop

2013-02-14 Thread Mahesh Balija
ks in the Hadoop ecosystem, including Mahout, Hive, Pig, etc., have their own applications. One important note is that Hadoop runs on commodity hardware. Best, Mahesh Balija, Calsoft Labs. On Fri, Feb 15, 2013 at 12:08 PM, SrinivasaRao Kongar wrote: > > Hi sir, > > What is Hadoop te

Re: How To Load Data Between two cluster

2013-02-22 Thread Mahesh Balija
versa. I am NOT sure whether this is the optimal solution; you can probably check for other approaches. Case 2: After case 1 you can build Hive tables on the HDFS (Cluster2). Best, Mahesh Balija, CalsoftLabs. On Fri, Feb 22, 2013 at 12:07 PM, samir das mohapatra

Re: WordPairCount Mapreduce question.

2013-02-23 Thread Mahesh Balija
Please check the in-line answers... On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai wrote: > > Hello > > I have a question about how Mapreduce sorting works internally with > multiple columns. > > Below r my classes using 2 columns in an input file given below. > > 1st question: About the method hashCo

Re: WordPairCount Mapreduce question.

2013-02-25 Thread Mahesh Balija
keys are sorted; because of this implementation, the records are read from the stream directly and sorted without the need to deserialize them into objects. Best, Mahesh Balija, CalsoftLabs. On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai wrote: > Thanks Mahesh for your help. > > Wondering
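The idea of sorting serialized keys without deserializing them can be shown with a plain-Python analogy. This is not Hadoop's comparator code: it assumes keys are non-negative integers encoded as fixed-width big-endian bytes, and `serialize` and `deserialize` are hypothetical helpers. Signed or variable-length encodings would need extra handling.

```python
import struct


def serialize(n):
    """Encode a non-negative int as 4 big-endian bytes."""
    return struct.pack(">I", n)


def deserialize(raw):
    """Decode 4 big-endian bytes back into an int."""
    return struct.unpack(">I", raw)[0]


keys = [300, 7, 65536, 42]
raw_keys = [serialize(k) for k in keys]

# Sort the serialized form directly. Bytes compare lexicographically, which
# matches numeric order for fixed-width big-endian unsigned integers.
raw_sorted = sorted(raw_keys)

# Deserialize only to display the result; the sort itself never did.
print([deserialize(r) for r in raw_sorted])
```

Comparing raw bytes avoids creating an object per record during the sort, which is the saving the reply refers to.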

Re: Running terasort with 1 map task

2013-02-26 Thread Mahesh Balija
Does passing dfs.block.size=134217728 resolve your issue? Or did something else fix your problem? On Tue, Feb 26, 2013 at 6:04 PM, Arindam Choudhury < arindamchoudhu...@gmail.com> wrote: > sorry my bad, it solved > > > On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury < > arindamchoudhu

Re: mapper combiner and partitioner for particular dataset

2013-03-05 Thread Mahesh Balija
different cases. Harsh, please correct me if I am wrong. Best, Mahesh Balija, Calsoft Labs. On Mon, Mar 4, 2013 at 8:32 PM, Vikas Jadhav wrote: > Thank You for reply > > Can u please elaborate because i am not getting wht does following means > in programming enviornment > > > y

Re: Hadoop file system

2013-03-05 Thread Mahesh Balija
. Best, Mahesh Balija, Calsoft Labs. On Tue, Mar 5, 2013 at 10:43 AM, AMARNATH, Balachandar < balachandar.amarn...@airbus.com> wrote: > > Hi, > > I am new to hdfs. In my java application, I need to perform ‘similar > operation’ over large number of files. I would like to

Re: Need help optimizing reducer

2013-03-05 Thread Mahesh Balija
ght be faster up to 66%. To speed up your program you may either need more reducers or need to make your reducer code as optimized as possible. Best, Mahesh Balija, Calsoft Labs. On Tue, Mar 5, 2013 at 1:27 AM, Austin Chungath wrote: > Hi all, > > I have 1 reduce

Re: All nodes are not used

2016-08-09 Thread Mahesh Balija
Hi Madhav, The behaviour sounds normal to me. If the block size is 128 MB there could possibly be ~24 mappers (i.e., containers used). You cannot use the entire cluster, as the blocks may reside only on the nodes being used. You should not try to use the entire cluster's resources for the following reason: The
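The "~24 mappers" estimate follows from dividing the input size by the block size. The sketch below assumes a single 3 GB input file, a size that is not stated in the visible part of the thread and is chosen only because it yields 24 blocks of 128 MB; `estimate_mappers` is a hypothetical helper. The real count also depends on the InputFormat, the number of files, and split-size settings.

```python
import math


def estimate_mappers(input_bytes, block_bytes=128 * 1024 * 1024):
    """Estimate map tasks as one per block-sized split of a single file."""
    return math.ceil(input_bytes / block_bytes)


# Assumed input size: 3 GB (hypothetical, not taken from the thread).
print(estimate_mappers(3 * 1024**3))
```

Under that assumption the job has 24 splits, so at most 24 map containers can be busy regardless of how many nodes the cluster has.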