To second Mirko: HDFS isn't concerned with content or formats.  That would be 
analogous to asking for specific content to end up on specific disk sectors in 
an ordinary file.  If you want to partition data by content, use MapReduce, 
Pig, Hive, etc. to segregate the data into files, perhaps naming the files to 
indicate the key split; a sketch of that follows.
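
Something like the following would do it, using MultipleOutputs from the new 
MapReduce API.  This is only an untested sketch: the class name, the 
one-integer-per-line input format, and the even/odd rule are my assumptions 
about your data, not anything fixed.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Map-only job: route each integer record to an output file whose name
// reflects the split (files come out as even-m-00000, odd-m-00000, ...).
public class ParitySplitMapper
    extends Mapper<LongWritable, Text, NullWritable, IntWritable> {

  private MultipleOutputs<NullWritable, IntWritable> out;

  @Override
  protected void setup(Context context) {
    out = new MultipleOutputs<NullWritable, IntWritable>(context);
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    int n = Integer.parseInt(line.toString().trim());
    // The third argument is the base name of the target output file.
    out.write(NullWritable.get(), new IntWritable(n),
        (n % 2 == 0) ? "even" : "odd");
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    out.close(); // flush the side outputs
  }
}

Run it with zero reducers (job.setNumReduceTasks(0)) and each mapper writes 
its "even" and "odd" files directly.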

But this rather raises the question: why?  MapReduce has built-in support for 
partitioning data on the fly as it leaves the mappers, via its Partitioner, so 
you don't really need to do anything special.  Is that too slow for your needs?
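
The even/odd case could be handled with a custom Partitioner along these lines 
(again just a sketch; it assumes IntWritable map output keys, which is a guess 
at your schema):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// With job.setNumReduceTasks(2), partition 0 collects the even keys and
// partition 1 the odd keys, so part-r-00000 and part-r-00001 each end up
// holding one class of numbers.
public class ParityPartitioner extends Partitioner<IntWritable, Text> {
  @Override
  public int getPartition(IntWritable key, Text value, int numPartitions) {
    // Math.abs guards against negative keys; the trailing modulo keeps the
    // result in range if the job is run with a different reducer count.
    return (Math.abs(key.get()) % 2) % numPartitions;
  }
}

Wire it in with job.setPartitionerClass(ParityPartitioner.class).  The point 
is that this happens during the shuffle, with no extra pass over the data.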

john

From: Mirko Kämpf [mailto:mirko.kae...@gmail.com]
Sent: Sunday, May 11, 2014 2:54 PM
To: user@hadoop.apache.org
Subject: Re: partition file by content based through HDFS

Hi,

HDFS blocks are not "content aware". A separation like the one you request 
could be done with a few lines of Hive or Pig code; you would then have 
multiple files, which can also be organized into partitions. Such partitions, 
however, sit at a different abstraction level: not on HDFS blocks, but within 
Hive tables.

Best wishes,
Mirko

2014-05-11 14:41 GMT+01:00 Karim Awara <karim.aw...@kaust.edu.sa>:
Hi,
When a user uploads a file from the local disk to HDFS, can I make it 
partition the file into blocks based on its content?  Meaning, if I have a 
file with one integer column, can I say I want a given HDFS block to contain 
only even numbers?



--
Best Regards,
Karim Ahmed Awara

