Re: Small File to HDFS

Ted Yu Fri, 04 Sep 2015 09:04:42 -0700

bq. Then use hbase

+1
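The thread below asks for "random update access" by key and raises concurrent updates to the same key. Here is a toy in-memory sketch of what that means. `VersionedStore` and `check_and_put` are hypothetical names. They are loosely modelled on the check-and-put idea found in HBase, but this is not the HBase client API. It also simplifies things by comparing a per-key version counter rather than a cell value.

```python
import threading
from typing import Dict, Optional, Tuple


class VersionedStore:
    """Toy key -> value store with a version counter per key.

    Hypothetical stand-in for a NoSQL table; this is NOT the HBase API.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, int]] = {}
        self._lock = threading.Lock()  # makes each operation atomic

    def get(self, key: str) -> Tuple[Optional[bytes], int]:
        """Return (value, version); (None, 0) if the key has never been written."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def put(self, key: str, value: bytes) -> int:
        """Unconditional overwrite: the last writer wins. Returns the new version."""
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            self._data[key] = (value, version + 1)
            return version + 1

    def check_and_put(self, key: str, expected_version: int, value: bytes) -> bool:
        """Overwrite only if the key is still at the version the caller read."""
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            if version != expected_version:
                return False  # someone else updated it first; caller should re-read
            self._data[key] = (value, version + 1)
            return True


store = VersionedStore()
store.put("event-42", b"v1")
_, seen_by_a = store.get("event-42")  # two clients read the same version
_, seen_by_b = store.get("event-42")
ok_a = store.check_and_put("event-42", seen_by_a, b"from A")
ok_b = store.check_and_put("event-42", seen_by_b, b"from B")
print(ok_a, ok_b)
```

With a plain `put`, the second writer would silently overwrite the first. The version check lets the second writer detect the conflict, re-read, and decide whether to retry.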


On Fri, Sep 4, 2015 at 9:00 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> Then use HBase or similar. You originally wrote that it was just for storage.
>
> On Fri, 4 Sep 2015 at 16:30, Tao Lu <taolu2...@gmail.com> wrote:
>
>> Basically, they need NoSQL-like random update access.
>>
>> On Fri, Sep 4, 2015 at 9:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> What about concurrent access (read/update) to the small file with the same
>>> key?
>>>
>>> That can get a bit tricky.
>>>
>>> On Thu, Sep 3, 2015 at 2:47 PM, Jörn Franke <jornfra...@gmail.com>
>>> wrote:
>>>
>>>> Well, it is the same as in normal HDFS: deleting the file and putting a new
>>>> one with the same name works.
>>>>
>>>> On Thu, 3 Sep 2015 at 21:18, <nib...@free.fr> wrote:
>>>>
>>>>> A HAR archive seems like a good idea, but one last question to make sure I
>>>>> make the best choice:
>>>>> - Is it possible to overwrite (remove/replace) a file inside the HAR?
>>>>> Basically, the names of my small files will be the keys of my records,
>>>>> and sometimes I will need to replace the content of a file with new
>>>>> content (remove/replace).
>>>>>
>>>>> Thanks a lot,
>>>>> Nicolas
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Jörn Franke" <jornfra...@gmail.com>
>>>>> To: nib...@free.fr
>>>>> Cc: user@spark.apache.org
>>>>> Sent: Thursday, 3 September 2015 19:29:42
>>>>> Subject: Re: Small File to HDFS
>>>>>
>>>>> HAR is transparent and has hardly any performance overhead. You may decide
>>>>> not to compress, or to use a fast compression algorithm such as Snappy
>>>>> (recommended).
>>>>>
>>>>> On Thu, 3 Sep 2015 at 16:17, <nib...@free.fr> wrote:
>>>>>
>>>>> My main question about using HAR is: is it possible to use Pig on it, and
>>>>> what about performance?
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Jörn Franke" <jornfra...@gmail.com>
>>>>> To: nib...@free.fr, user@spark.apache.org
>>>>> Sent: Thursday, 3 September 2015 15:54:42
>>>>> Subject: Re: Small File to HDFS
>>>>>
>>>>> Store them as a Hadoop Archive (HAR).
>>>>>
>>>>> On Wed, 2 Sep 2015 at 18:07, <nib...@free.fr> wrote:
>>>>>
>>>>> Hello,
>>>>> I am currently using Spark Streaming to collect small messages
>>>>> (events), each under 50 KB. The volume is high (several million per day),
>>>>> and I have to store those messages in HDFS.
>>>>> I understand that storing small files can be problematic in HDFS. How
>>>>> can I manage it?
>>>>>
>>>>> Thanks,
>>>>> Nicolas
>>>>>
>>>
>>
>>
>> --
>> ------------------------------------------------
>> Thanks!
>> Tao
>>
>
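Here is a rough back-of-envelope estimate of why millions of small files hurt HDFS. A commonly quoted rule of thumb is that each file, directory and block occupies about 150 bytes of NameNode heap. The sketch assumes that figure, one block per file, a hypothetical 3 million messages per day, and one year of retention. None of these numbers come from the thread.

```python
# Assumption: ~150 bytes of NameNode heap per namespace object, a widely
# quoted rule of thumb (each file and each block is one object).
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(files: int, blocks_per_file: int = 1) -> int:
    """Rough NameNode heap needed to track `files` files (directories ignored)."""
    objects = files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# Hypothetical volume: 3 million messages per day, one file per message,
# retained for one year.
files_per_day = 3_000_000
per_day = namenode_heap_bytes(files_per_day)
per_year = namenode_heap_bytes(files_per_day * 365)
print(f"one file per message: {per_day / 1e9:.2f} GB/day, {per_year / 1e9:.1f} GB/year")

# Same messages packed into ~128 MB container files (128 MB / 50 KB ~= 2560 each).
packed_files_per_day = -(-files_per_day // 2560)  # ceiling division
packed_per_year = namenode_heap_bytes(packed_files_per_day * 365)
print(f"packed: {packed_per_year / 1e6:.0f} MB/year")
```

Under these assumptions, packing cuts the NameNode heap needed by roughly three orders of magnitude. Real figures depend on the Hadoop version, block size and directory layout.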

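HAR and similar container formats (Hadoop SequenceFiles, for example) work around the small-files problem by packing many small files into a few large ones, plus an index for random reads. The sketch below shows that idea in plain Python with a made-up in-memory layout. `pack`, `read` and `replace` are hypothetical helpers, not Hadoop APIs. The Hadoop Archives documentation describes archives as immutable, so `replace` here builds a new container rather than editing the old one in place.

```python
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # key -> (offset, length)


def pack(records: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Write all records back to back into one blob and build an index.

    The layout is made up for this sketch; it is not the HAR on-disk format.
    """
    blob = bytearray()
    index: Index = {}
    for key, payload in records.items():
        index[key] = (len(blob), len(payload))
        blob += payload
    return bytes(blob), index


def read(blob: bytes, index: Index, key: str) -> bytes:
    """Random read of one member via the index, without scanning the blob."""
    offset, length = index[key]
    return blob[offset:offset + length]


def replace(blob: bytes, index: Index, key: str, payload: bytes) -> Tuple[bytes, Index]:
    """Treat the container as immutable: build a new one with `key` updated."""
    records = {k: read(blob, index, k) for k in index}
    records[key] = payload
    return pack(records)


blob, index = pack({"event-1": b"abc", "event-2": b"hello"})
print(read(blob, index, "event-2"))
new_blob, new_index = replace(blob, index, "event-1", b"new content")
print(read(new_blob, new_index, "event-1"), read(blob, index, "event-1"))
```

The original container is left untouched. Each update costs a rewrite of the whole container. That is tolerable for occasional replacements, but not for frequent random updates by key.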