Hello,
I have a Hadoop cluster of 5 nodes with a total of 130 GB of available HDFS
space, with replication set to 5.
I have a file of 115 GB, which needs to be copied to HDFS and processed.
Do I need any more HDFS space to perform all the processing without
running into problems?
If the replication factor is 5, you will need at least 5x the size of the
file. So this is not going to be enough.
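To make the arithmetic concrete, here is a minimal sketch in plain Python (the numbers are the ones from this thread) of how replication multiplies raw storage needs:

```python
def required_raw_space_gb(file_size_gb, replication_factor):
    """Raw HDFS capacity consumed: every block is stored replication_factor times."""
    return file_size_gb * replication_factor

# Numbers from the thread: a 115 GB file with replication set to 5
needed = required_raw_space_gb(115, 5)   # 575 GB of raw capacity
available = 130                          # total available HDFS space in GB

print(f"needed={needed} GB, available={available} GB, fits={needed <= available}")
```

Note also that with replication 5 on a 5-node cluster, every node ends up holding a copy of every block, so each individual node's disk would need to hold the whole file as well.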
On Thursday, January 10, 2013, Panshul Whisper wrote:
> Hello,
> I have a hadoop cluster of 5 nodes with a total of available HDFS space
> 130 GB with replication set to 5.
> I have a
If the file is a txt file, you could get a good compression ratio. If you
change the replication to 3, the file will fit. But I am not sure what your
use case is or what you want to achieve by putting this data there. Any
transformation on this data and you would need more space to save the
transformed data.
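As a rough sketch of the suggestion above (the 4x compression ratio here is an assumption for illustration only; the actual ratio depends on the data and the codec):

```python
def raw_space_gb(file_size_gb, replication, compression_ratio=1.0):
    """Raw HDFS capacity consumed after compression and replication.
    compression_ratio is original_size / compressed_size (1.0 = no compression)."""
    return file_size_gb / compression_ratio * replication

available = 130  # GB of HDFS space, from the thread

# Uncompressed at replication 3: still does not fit in 130 GB
print(raw_space_gb(115, 3))            # 345.0 GB

# With an assumed 4x compression ratio (e.g. gzipped text) at replication 3
print(raw_space_gb(115, 3, 4.0))       # 86.25 GB, which fits
```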
Thank you for the response.
Actually it is not a single file; I have JSON files that amount to 115 GB.
These JSON files need to be processed and loaded into HBase tables
on the same cluster for later processing. Not considering the disk space
required for the HBase storage, if I reduce the
finish elementary school first. (plus, minus operations at least)
On Thu, Jan 10, 2013 at 7:23 PM, Panshul Whisper ouchwhis...@gmail.com wrote:
> Thank you for the response.
> Actually it is not a single file, I have JSON files that amount to 115 GB,
> these JSON files need to be processed and
115 GB * 5 = 575 GB is the minimum you need; keep in mind that is the bare
minimum, and you will have other disk space needs too...
∞
Shashwat Shriparv
On Fri, Jan 11, 2013 at 11:19 AM, Alexander Pivovarov apivova...@gmail.com wrote:
> finish elementary school first. (plus, minus operations at least)
> On Thu, Jan