Basically, they need NoSQL-like random update access.
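
To make "random update access" concrete, here is a minimal sketch of the access pattern. It is only an illustration: `KeyValueStore` is a hypothetical in-memory stand-in, not any real NoSQL client API and not something proposed elsewhere in this thread. The point is that a record is replaced by a single put on its key, with a lock serializing concurrent readers and writers.

```python
import threading


class KeyValueStore:
    """Toy in-memory stand-in for a key-addressed store (hypothetical)."""

    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()  # serializes concurrent reads and updates

    def put(self, key, value):
        # Insert or overwrite in place: no separate delete-then-recreate step.
        with self._lock:
            self._records[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._records.get(key, default)


store = KeyValueStore()
store.put("event-0001", b"first payload")
store.put("event-0001", b"replacement payload")  # random update by key
latest = store.get("event-0001")
```

Here the record name plays the role that the small file's name plays in the question quoted below.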




On Fri, Sep 4, 2015 at 9:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> What about concurrent access (read/update) to a small file with the same
> key?
>
> That can get a bit tricky.
>
> On Thu, Sep 3, 2015 at 2:47 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Well, it is the same as in normal HDFS: deleting the file and putting a new
>> one with the same name works.
>>
>> On Thu, Sep 3, 2015 at 9:18 PM, <nib...@free.fr> wrote:
>>
>>> A HAR archive seems like a good idea, but one last question to be sure I
>>> make the best choice:
>>> - Is it possible to overwrite (remove/replace) a file inside the HAR?
>>> Basically, the names of my small files will be the keys of my records,
>>> and sometimes I will need to replace the content of a file with new content
>>> (remove/replace).
>>>
>>>
>>> Thanks a lot
>>> Nicolas
>>>
>>> ----- Original Message -----
>>> From: "Jörn Franke" <jornfra...@gmail.com>
>>> To: nib...@free.fr
>>> Cc: user@spark.apache.org
>>> Sent: Thursday, September 3, 2015 19:29:42
>>> Subject: Re: Small File to HDFS
>>>
>>>
>>>
>>> HAR is transparent and adds hardly any performance overhead. You may decide
>>> not to compress, or to use a fast compression algorithm such as Snappy
>>> (recommended).
>>>
>>>
>>>
>>> On Thu, Sep 3, 2015 at 4:17 PM, < nib...@free.fr > wrote:
>>>
>>>
>>> My main question about using HAR is: is it possible to use Pig on it, and
>>> what about performance?
>>>
>>> ----- Original Message -----
>>> From: "Jörn Franke" < jornfra...@gmail.com >
>>> To: nib...@free.fr , user@spark.apache.org
>>> Sent: Thursday, September 3, 2015 15:54:42
>>> Subject: Re: Small File to HDFS
>>>
>>>
>>>
>>>
>>> Store them as a Hadoop archive (HAR).
>>>
>>>
>>> On Wed, Sep 2, 2015 at 6:07 PM, < nib...@free.fr > wrote:
>>>
>>>
>>> Hello,
>>> I am currently using Spark Streaming to collect small messages (events).
>>> Each is under 50 KB, the volume is high (several million per day), and I
>>> have to store those messages in HDFS.
>>> I understand that storing small files can be problematic in HDFS. How
>>> can I manage it?
>>>
>>> Thanks
>>> Nicolas
>>>
>>>
>>>
>


-- 
------------------------------------------------
Thanks!
Tao
