Hi Madhav,
This behaviour sounds normal to me.
If the block size is 128 MB, there could be ~24 mappers (i.e.,
containers used).
You cannot use the entire cluster, as the blocks may reside only on the
nodes being used.
You should not try to use the entire cluster's resources, for the following reason
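The mapper-count arithmetic above (number of mappers is roughly the number of input blocks, i.e. the input size divided by the block size, rounded up) can be sketched as follows; the ~3 GB input size is an assumption for illustration, since the thread does not state it:

```java
public class MapperEstimate {
    // Number of map tasks is roughly the number of input blocks:
    // ceil(inputBytes / blockBytes), done with integer arithmetic.
    static long numMappers(long inputBytes, long blockBytes) {
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;       // 128 MB block size
        long inputSize = 3L * 1024 * 1024 * 1024;  // ~3 GB input (assumed)
        System.out.println(numMappers(inputSize, blockSize)); // prints 24
    }
}
```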
Hi Alex,
Can you please attach your code and the sample input data?
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Apr 30, 2013 at 2:29 AM, alx...@aim.com wrote:
Hello,
I am trying to write a MapReduce program in Hadoop 1.0.4 using the mapred
libs. I have a map function which gets
Can you manually go into the directory configured for hadoop.tmp.dir under
core-site.xml and do an ls -l to find the disk usage details? It will contain
fsimage, edits, fstime, and VERSION.
Or use the basic commands, like:
hadoop fs -du
hadoop fsck
On Wed, Apr 24, 2013 at 7:56 AM, 自己 zx4866...@163.com
of
the whole program.
Best,
Mahesh Balija,
Calsoft Labs.
On Wed, Apr 24, 2013 at 12:37 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Thanks for the response, Mahesh. I thought of this, but do not know why
this limitation exists.
While sampling to pick up certain records and run our logic over
based on the Mapper outkey type.
Best,
Mahesh Balija,
CalsoftLabs.
On Tue, Apr 23, 2013 at 4:12 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
+ mapred dev
On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Hi,
I have a question related
be faster by up to 66%. In order to speed up your program, you may either
have to use a larger number of reducers or make your reducer code as
optimized as possible.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Mar 5, 2013 at 1:27 AM, Austin Chungath austi...@gmail.com wrote:
Hi all,
I have 1 reducer
Does passing dfs.block.size=134217728 resolve your issue? Or did
something else fix your problem?
On Tue, Feb 26, 2013 at 6:04 PM, Arindam Choudhury
arindamchoudhu...@gmail.com wrote:
Sorry, my bad; that solved it.
On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury
the keys are sorted; because of this
implementation, the records are read directly from the stream and sorted
without the need to deserialize them into objects.
Best,
Mahesh Balija,
CalsoftLabs.
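A minimal stdlib simulation of that idea, comparing serialized keys byte by byte in the spirit of Hadoop's raw comparators (this is an illustration with a hypothetical RawKeyCompare class, not the Hadoop API):

```java
import java.util.Arrays;

public class RawKeyCompare {
    // Compare two serialized keys byte by byte (unsigned), the way Hadoop's
    // raw comparators sort records without deserializing them into objects.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;   // unsigned byte values
            if (x != y) return x - y;
        }
        return a.length - b.length;                  // shorter key sorts first
    }

    public static void main(String[] args) {
        byte[][] keys = { "banana".getBytes(), "apple".getBytes(), "app".getBytes() };
        Arrays.sort(keys, RawKeyCompare::compareBytes);
        for (byte[] k : keys) System.out.println(new String(k)); // app, apple, banana
    }
}
```

For text keys, byte-wise comparison of the UTF-8 encoding happens to match the natural string ordering, which is why this shortcut is safe there.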
On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai saigr...@yahoo.in wrote:
Thanks Mahesh for your help
Please check the in-line answers...
On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai saigr...@yahoo.in wrote:
Hello
I have a question about how Mapreduce sorting works internally with
multiple columns.
Below are my classes, using 2 columns from the input file given below.
1st question: About the
in the Hadoop ecosystem includes Mahout, Hive,
Pig, etc., each of which has its own applications.
One important note is that Hadoop runs on commodity hardware.
Best,
Mahesh Balija,
Calsoft Labs.
On Fri, Feb 15, 2013 at 12:08 PM, SrinivasaRao Kongar
ksrinu...@gmail.comwrote:
Hi sir,
What is Hadoop technology
Hi Vikas,
You can get the FileSystem instance by calling
FileSystem.get(Configuration);
Once you get the FileSystem instance you can use
FileSystem.listStatus(InputPath); to get the fileStatus instances.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Feb 12, 2013
The best way is to first learn the concepts thoroughly, and then if you like
you can also contribute to Hadoop projects.
After that, it is probably better to find some BigData-based projects.
Best,
Mahesh Balija,
CalsoftLabs.
On Mon, Feb 11, 2013 at 10:32 AM, Monkey2Code monkey2c...@gmail.com wrote
as key and value. You can find this out through the API
documentation.
So make sure that you are using the right key-value pairs.
Thanks,
Mahesh Balija,
CalsoftLabs.
On Fri, Feb 1, 2013 at 10:41 PM, Anbarasan Murthy anbu992...@hotmail.comwrote:
I am getting the following exception message when I
instances based on how you are defining the MR job.
Best,
Mahesh Balija,
CalsoftLabs.
On Fri, Feb 1, 2013 at 6:37 PM, Anbarasan Murthy anbarasa...@hcl.comwrote:
By default SequenceFileOutputFormat expects the
Input – LongWritable
Output – Text
I would like to know how
and
mapred.tasktracker.reduce.tasks.maximum.
Also they run in parallel.
Best,
Mahesh Balija,
CalsoftLabs.
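Those two properties live in mapred-site.xml on each TaskTracker (Hadoop 1.x); the slot counts below are illustrative values only, to be tuned to the node's cores and memory:

```xml
<!-- mapred-site.xml (Hadoop 1.x); slot counts here are illustrative -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```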
On Fri, Jan 25, 2013 at 1:16 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi.
A very, very lame question:
does the number of mappers depend on the number of nodes I have?
How I imagine map-reduce
is a
data collection and aggregation framework and NOT a file transfer tool, and
may NOT be a good choice when you actually want to copy files as-is
onto your cluster (NOT 100% sure, as I am also working on that).
Thanks,
Mahesh Balija,
CalsoftLabs.
On Fri, Jan 25, 2013 at 6:39 AM, Panshul Whisper
Hi Steve,
On top of Harsh's answer: other than backup, there is a feature
called Snapshot, offered by some third-party vendors like MapR.
Though it's not really a backup, it is a point to which you
can revert at any time.
Best,
Mahesh Balija,
CalsoftLabs
Hi Mirko,
Thanks for your reply. It works for me as well.
Now I was able to mount the folder on the master node and
configure Flume so that it can either poll for logs in real time or
retrieve them periodically.
Thanks,
Mahesh Balija.
Calsoft Labs.
On Thu, Jan 17, 2013
Hi,
My log files are generated and saved on a Windows machine.
Now I have to move those remote files to the Hadoop cluster (HDFS),
either in a synchronous or an asynchronous way.
I have gone through Flume (various source types), but it was not helpful.
Please suggest whether there
Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija balijamahesh@gmail.com
wrote:
I have studied Flume, but I didn't find anything useful for my case.
My requirement is that there is a directory on a Windows machine, in which the
files
client is responsible for processing individual files in order.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Jan 15, 2013 at 7:55 PM, Panshul Whisper ouchwhis...@gmail.comwrote:
Hello,
I was wondering if Hadoop performs the MapReduce operations on the data
while maintaining the order or sequence
cause these kinds of issues, depending on the operation you
do in your reducer.
Can you put some logging in your reducer and try to trace
what is happening?
Best,
Mahesh Balija,
Calsoft Labs.
On Fri, Jan 11, 2013 at 8:53 AM, yaotian yaot...@gmail.com wrote:
I have 1 hadoop master which name
Hi Smith,
In my experience, the actual processing usually occurs during the first
40% to around 70%; the remainder is devoted to writing/flushing the data
to the output files, which may take more time.
Best,
Mahesh Balija,
Calsoft Labs.
On Fri, Jan 11, 2013 at 9:32 AM, Roy Smith r
changes in the 0.20 API; maybe for backward
compatibility the mapred package is still in existence.
There are a few classes which exist in the 0.19 API that are not
supported in the 0.20.* version.
Best,
Mahesh Balija,
Calsoft Labs.
On Mon, Jan 7, 2013 at 11:44 PM, Oleg Zhurakousky
oleg.zhurakou...@gmail.com
say 1 - the graph and 2 - changes and value will be the actual value.
Now the only thing left for you is to append your changes to the
actual key and emit the final result.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Jan 8, 2013 at 5:47 AM, jamal sasha jamalsha...@gmail.com wrote
.
Best,
Mahesh Balija,
CalSoft Labs.
On Tue, Dec 11, 2012 at 11:29 AM, Ivan Ryndin iryn...@gmail.com wrote:
Hi all,
I have the following question:
what are the best practices for working with files in Hadoop?
I need to process a lot of log files that arrive in Hadoop every minute.
And I have multiple
of the fast-running ones
or the early-completing task.
Best,
Mahesh Balija,
Calsoft Labs.
On Thu, Dec 6, 2012 at 8:27 PM, Ajay Srivastava
ajay.srivast...@guavus.comwrote:
Hi,
What is the behavior of the JobTracker if speculative execution is off and a
task on a data node is running extremely slowly
in the cluster.
This can be one possible reason why there are fluctuations in your
job performance.
Best,
Mahesh Balija,
Calsoft Labs.
On Mon, Dec 3, 2012 at 8:57 PM, Cogan, Peter (Peter)
peter.co...@alcatel-lucent.com wrote:
Hi there,
I've been doing some performance testing with hadoop
and generates key-value pairs.
The InputFormat also handles records that may be split on a
FileSplit boundary (i.e., across different blocks).
Please check this link for more information,
http://wiki.apache.org/hadoop/HadoopMapReduce
Best,
Mahesh Balija,
Calsoft Labs.
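That boundary-handling rule can be simulated with plain strings: a split that starts mid-record skips forward to the next newline, while the previous split reads past its end to finish the record it started. This is a sketch of the idea only (a hypothetical SplitLineReader class), not Hadoop's actual LineRecordReader:

```java
public class SplitLineReader {
    // Simulate how a line-oriented reader handles a record crossing a split
    // boundary: a split starting mid-line skips that partial line (the
    // previous split owns it), and reads whole lines until it passes 'end'.
    static String readSplit(String data, int start, int end) {
        int begin = start;
        if (start > 0 && data.charAt(start - 1) != '\n') {
            int nl = data.indexOf('\n', start);
            begin = (nl == -1) ? data.length() : nl + 1; // skip partial first line
        }
        StringBuilder out = new StringBuilder();
        int pos = begin;
        while (pos < end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int stop = (nl == -1) ? data.length() : nl + 1;
            out.append(data, pos, stop);   // last line may extend beyond 'end'
            pos = stop;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String data = "aaa\nbbbb\ncc\n";
        // Split [0,6) reads through the end of "bbbb\n"; split [6,12)
        // starts mid-"bbbb" and skips forward to "cc\n". No record is
        // duplicated or lost across the boundary.
        System.out.println(readSplit(data, 0, 6));   // aaa\nbbbb\n
        System.out.println(readSplit(data, 6, 12));  // cc\n
    }
}
```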
On Mon, Dec 3, 2012
Hi Sandeep,
For me everything seems to be alright.
Can you tell us how are you running this job?
Best,
Mahesh.B.
Calsoft Labs.
On Thu, Nov 29, 2012 at 9:01 PM, Sandeep Jangra sandeepjan...@gmail.comwrote:
Hello everyone,
Like most others I am also running into some
Hi Chris,
Can you try the following in your local machine,
du -b myfile.txt
and compare this with the hadoop fs -du myfile.txt.
Best,
Mahesh Balija,
Calsoft Labs.
On Wed, Nov 28, 2012 at 7:43 PM, listenbru...@gmx.net wrote:
Hi all,
I wonder why
Hi Pedro,
You can get the JobInProgress instance from JobTracker.
JobInProgress getJob(JobID jobid);
Best,
Mahesh Balija,
Calsoft Labs.
On Wed, Nov 28, 2012 at 10:41 PM, Pedro Sá da Costa psdc1...@gmail.comwrote:
I'm building a Java class and given a JobID, how can
().
If this does not work for you, please tell us what you are
trying to do.
Thanks,
Mahesh Balija,
Calsoft Labs.
On Tue, Nov 27, 2012 at 5:37 PM, GHui ugi...@gmail.com wrote:
I call the statement JobID id = new JobID() of the Hadoop API via JNI. But
when my program runs to this statement, it exits, and no errors
basics of HDFS and MapReduce architectures, and
then concepts like combiners, partitioners, RecordReaders, InputFormats,
OutputFormats, etc.
Best,
Mahesh Balija,
Calsoft Labs.
Hi AK,
I don't really understand what is stopping you from using the
job.getConfiguration() method to pass the configuration instance to
DistributedCache.addCacheFile(URI, job.getConfiguration()).
The only thing you need to do is pass the URI and the configuration
object (getting it from
path.
Best,
Mahesh Balija,
Calsoft Labs.
On Sun, Nov 25, 2012 at 8:04 AM, David Parks davidpark...@yahoo.com wrote:
I want to move a file in HDFS after a job using the Java API. I'm trying
this call, but I always get false (could not rename):
Path from = new
Path(hdfs://localhost
files directly,
then you have to use a commercial Hadoop package like MapR, which
supports updating HDFS files.
Best,
Mahesh Balija,
Calsoft Labs.
On Sun, Nov 25, 2012 at 9:40 AM, bharath vissapragada
bharathvissapragada1...@gmail.com wrote:
Hi Jeff,
Please look at [1] . You can store your
Hi Prabhu,
For Twitter there are different types for obtaining feeds
like gardenhose and FireHose etc.
Some may be free and some are paid, like that you can look
for other social media options.
Best,
Mahesh Balija,
Calsoft Labs.
On Thu, Nov 15, 2012 at 11:35 PM
associated with a given key and sends the key and List of values to the
reducer function.
Best,
Mahesh Balija.
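The shuffle/grouping step described above can be simulated with the standard library alone (a hypothetical ShuffleGroup class, not the Hadoop framework): collect the map output pairs so each key appears once with the list of all its values, which is exactly what the reducer receives.

```java
import java.util.*;

public class ShuffleGroup {
    // Simulate the shuffle phase: group map-output (key, value) pairs so the
    // reducer sees each key once with the list of all its values.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // keys arrive sorted
        for (Map.Entry<String, Integer> kv : mapOutput) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> out = List.of(
            Map.entry("cat", 1), Map.entry("dog", 1), Map.entry("cat", 1));
        System.out.println(group(out)); // {cat=[1, 1], dog=[1]}
    }
}
```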
On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan
ramasubramanian.naraya...@gmail.com wrote:
Hi,
Which of the following is correct w.r.t. the mapper?
(a) It accepts a single key-value
Hi,
I am trying to delete whole rows from HBase in my production
cluster in two ways:
1) I have written a MapReduce program to remove many rows which
satisfy a certain condition. To do that,
the key is the HBase row key only, and the value is a
Delete; I am