Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
Hello, we're testing an application on 8 nodes, where each node has 20 GB of local storage available. What we are trying to achieve is to process more than 20 GB of data on this cluster. Is there a way to distribute the data across the cluster? There is also one shared NFS storage disk with 1 TB of space.

Re: Dealing with low space cluster

2012-06-14 Thread praveenesh kumar
I don't know whether this will work or not, but you can give it a shot (I am assuming you have 8 nodes in your Hadoop cluster): 1. Mount the 1 TB hard disk on one of the DataNodes. 2. Put the data into HDFS. I think once it is on HDFS, it will automatically get distributed. Regards, Praveenesh
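A minimal sketch of Praveenesh's suggestion, assuming the extra disk shows up as /dev/sdb1 and is mounted at /mnt/bigdisk on one DataNode (device name, mount point, and HDFS paths are illustrative, not from the thread):

    # On the chosen DataNode: mount the 1 TB drive
    sudo mkdir -p /mnt/bigdisk
    sudo mount /dev/sdb1 /mnt/bigdisk

    # Copy the local data into HDFS; the NameNode places the file's
    # blocks (and their replicas) across all DataNodes, so the data
    # is distributed automatically as it is written.
    hadoop fs -put /mnt/bigdisk/input-data /user/ondrej/input-data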

Re: Dealing with low space cluster

2012-06-14 Thread Harsh J
Ondřej, If by processing you mean trying to write out (as map outputs) more than 20 GB of data per map task, that may not be possible, as the outputs need to be materialized on disk and disk space is the constraint there. Or did I not understand you correctly (in thinking you are asking about MapReduce)?

Re: Dealing with low space cluster

2012-06-14 Thread praveenesh kumar
@Harsh --- I was wondering (although it may not make much sense): if a person wants to store files on HDFS only, something like a backup, with the above hardware scenario and no MR processing, should it be possible to store a file larger than 20 GB?

Re: Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
Hello, you're right. That's exactly what I meant, and your answer is exactly what I thought. I was just wondering if Hadoop could distribute the data to other nodes' local storage when a node's own local space is full. Thanks

Re: Dealing with low space cluster

2012-06-14 Thread Harsh J
Praveenesh, Yes, you are absolutely right: you can indeed store a file of more than 20 GB on such a cluster (and have it replicated properly), because HDFS chunks writes into smaller fixed-size blocks.
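To make the arithmetic behind this concrete (node counts taken from this thread; the 64 MB block size and replication factor 3 are the usual defaults of that era, stated here as assumptions):

    # 8 nodes x 20 GB local storage = 160 GB of raw HDFS capacity.
    # A 20 GB file is split into 20480 MB / 64 MB = 320 blocks, so no
    # single node ever needs 20 GB of contiguous space for it.
    # At replication 3 the file occupies 3 x 20 GB = 60 GB in total,
    # which still fits in the 160 GB cluster.

    # Writing a large file with an explicit replication factor:
    hadoop fs -D dfs.replication=3 -put backup.tar /user/ondrej/backup.tar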

Re: Dealing with low space cluster

2012-06-14 Thread Harsh J
Ondřej, That isn't currently possible with the local storage FS. Your 1 TB NFS mount can help, but I suspect it may become a slow-down point if all nodes use it in parallel. Perhaps mount it only on 3-4 machines (or fewer), instead of all of them, to avoid that?
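One way to apply Harsh's suggestion would be to add the NFS mount as a second data directory on just those few DataNodes. A sketch assuming Hadoop 1.x property names and an NFS mount at /mnt/nfs (both assumptions):

    # In conf/hdfs-site.xml on the 3-4 nodes that mount the share:
    #   <property>
    #     <name>dfs.data.dir</name>
    #     <value>/var/hadoop/data,/mnt/nfs/hadoop/data</value>
    #   </property>
    # Then restart the DataNode so it picks up the new directory:
    bin/hadoop-daemon.sh stop datanode
    bin/hadoop-daemon.sh start datanode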

Re: Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
Thanks, I'll try. One more question: I've got a few more nodes which can be added to the cluster, but how do I do that? If I understand it correctly (according to Hadoop's wiki pages): 1. On the master node, edit the slaves file and add the IP addresses of the new nodes (everything clear). 2. Log in to each newly added node, set up its configuration, and start the daemons. Is anything more needed, such as refreshing the node lists?

Re: Dealing with low space cluster

2012-06-14 Thread Harsh J
Hi, If you aren't using access lists (include/exclude files), just place the conf files on the new nodes (same as on the other slaves, or tweaked where necessary) and start the daemons. They will join automatically and you will see them in the live nodes list immediately. You do not need to run the refresh commands when not using the include/exclude lists.
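A sketch of the steps Harsh describes for bringing a new slave online, assuming a Hadoop 1.x install at /opt/hadoop and passwordless SSH from the master (both assumptions):

    # 1. On the master: list the new node so the cluster-wide
    #    start/stop scripts know about it (IP is illustrative).
    echo "10.0.0.19" >> /opt/hadoop/conf/slaves

    # 2. On the new node: copy the conf/ used by the other slaves,
    #    then start the daemons; the node registers itself and shows
    #    up in the live nodes list.
    scp -r master:/opt/hadoop/conf /opt/hadoop/
    /opt/hadoop/bin/hadoop-daemon.sh start datanode
    /opt/hadoop/bin/hadoop-daemon.sh start tasktracker

    # 'hadoop dfsadmin -refreshNodes' is only needed when include/
    # exclude lists (dfs.hosts / dfs.hosts.exclude) are in use.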