Hi,
The default DFS block size is 64 MB. Does this mean that if I put a file smaller than 64
MB on HDFS, it will not be divided any further?
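(For context, a minimal sketch of the arithmetic, assuming the usual rule that a file smaller than one block still occupies exactly one block; the function name `num_blocks` is illustrative, not a Hadoop API.)

```python
import math

def num_blocks(file_size, block_size=64 * 1024 * 1024):
    # A file smaller than the block size is stored as a single block;
    # larger files are split into ceil(size / block_size) blocks.
    return max(1, math.ceil(file_size / block_size))

print(num_blocks(10 * 1024 * 1024))   # 1  (a 10 MB file is not split)
print(num_blocks(200 * 1024 * 1024))  # 4
```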
I have lots and lots of XMLs and I would like to process them directly.
Currently I am converting them to SequenceFiles (10 XMLs per SequenceFile)
and the puttin
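(A rough sketch of the batching described above, 10 small files per container, as plain Python rather than the Hadoop SequenceFile API; the `batches` helper and file names are hypothetical.)

```python
def batches(files, size=10):
    # Group small files into fixed-size batches, mirroring the
    # "10 XMLs per SequenceFile" packing used to avoid HDFS's
    # small-file overhead.
    for i in range(0, len(files), size):
        yield files[i:i + size]

xmls = [f"doc{i}.xml" for i in range(25)]
print([len(b) for b in batches(xmls)])  # [10, 10, 5]
```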
Hi,
I am implementing the MapRunnable interface to create the Map jobs.
I have large data set for processing. (Data size is around 10 GB).
I have 1 master and 10 slaves cluster.
When I run my program, Hadoop processes the data successfully.
After processing, I am collecting all data (all are files
Hi,
I am trying to use Hadoop for Lucene index creation. I have to create multiple
indexes based on the contents of the files (i.e. if the author is "hrishikesh",
the document should be added to an index for "hrishikesh"; there has to be a
separate index for every author). For this, I am keeping multiple IndexWri
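(A minimal sketch of the "one index per author" pattern, assuming writers are cached per key and opened lazily; in a real Hadoop/Lucene job the writer would be a Lucene IndexWriter, but here a plain list stands in for the index so the shape of the pattern is visible.)

```python
class PerAuthorIndexes:
    def __init__(self):
        # author -> list standing in for an IndexWriter
        self.writers = {}

    def add(self, author, doc):
        # open (create) the writer lazily on first use, then route
        # the document to that author's index
        self.writers.setdefault(author, []).append(doc)

    def close(self):
        # in Lucene, every cached IndexWriter must be closed explicitly;
        # here we just report how many docs each index received
        return {a: len(docs) for a, docs in self.writers.items()}

idx = PerAuthorIndexes()
idx.add("hrishikesh", "doc1")
idx.add("hrishikesh", "doc2")
idx.add("other", "doc3")
print(idx.close())  # {'hrishikesh': 2, 'other': 1}
```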
Hi,
If I have one cluster with around 20 machines, can I submit different MR jobs
from different machines in the cluster? Are there any precautions to be taken? (I
want to start Nutch crawl as one job and Katta indexing as another job)
--Hrishi
Hi,
Is there any relationship between how many map and reduce tasks I am
running per node and the capacity (RAM, CPU) of my NameNode?
i.e. if I want to run more map and reduce tasks per node, should the
NameNode have more RAM?
Similarly, should NameNode capacity be driven