Then use HBase or something similar. You originally wrote that it was just for storing.
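For random update access by key, the idea above can be sketched with the HBase shell. This is a sketch only, assuming a running HBase cluster; the table name, column family, and row key are hypothetical:

```shell
# Sketch only: assumes a running HBase cluster on this machine's classpath.
# Table name 'events', column family 'd', and the row key are hypothetical.
hbase shell <<'EOF'
create 'events', 'd'
put 'events', 'record-key-42', 'd:body', 'initial content'
# A put with the same row key overwrites the value -- this is the random update.
put 'events', 'record-key-42', 'd:body', 'replaced content'
get 'events', 'record-key-42'
EOF
```

The key point is that HBase makes replace-by-key a single `put`, with no file delete/re-upload cycle.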

On Fri, Sep 4, 2015 at 16:30, Tao Lu <taolu2...@gmail.com> wrote:

> Basically they need NoSQL-like random update access.
>
> On Fri, Sep 4, 2015 at 9:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> What about concurrent access (read / update) to a small file with the
>> same key?
>>
>> That can get a bit tricky.
>>
>> On Thu, Sep 3, 2015 at 2:47 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Well, it is the same as in normal HDFS: deleting the file and putting a
>>> new one with the same name works.
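The delete-then-put replacement described above can be sketched with the HDFS CLI. This is a sketch only, assuming a running HDFS cluster; the paths and file name are hypothetical:

```shell
# Sketch only: assumes a running HDFS cluster; paths are hypothetical.
# Replace the stored file for key "record-key-42" with new content.
hdfs dfs -rm /data/events/record-key-42
hdfs dfs -put ./record-key-42 /data/events/record-key-42

# Or in one step: the -f flag makes put overwrite an existing destination.
hdfs dfs -put -f ./record-key-42 /data/events/record-key-42
```

Note this applies to plain HDFS paths; files packed inside an existing HAR cannot be swapped in place, so a changed file means re-creating the archive.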
>>>
>>> On Thu, Sep 3, 2015 at 21:18, <nib...@free.fr> wrote:
>>>
>>>> A HAR archive seems a good idea, but one last question to be sure I
>>>> make the best choice:
>>>> - Is it possible to overwrite (remove/replace) a file inside the HAR?
>>>> Basically the names of my small files will be the keys of my records,
>>>> and sometimes I will need to replace the content of a file with new
>>>> content (remove/replace).
>>>>
>>>>
>>>> Tks a lot
>>>> Nicolas
>>>>
>>>> ----- Original Message -----
>>>> From: "Jörn Franke" <jornfra...@gmail.com>
>>>> To: nib...@free.fr
>>>> Cc: user@spark.apache.org
>>>> Sent: Thursday, September 3, 2015 19:29:42
>>>> Subject: Re: Small File to HDFS
>>>>
>>>>
>>>>
>>>> HAR is transparent and adds hardly any performance overhead. You may
>>>> decide not to compress, or to use a fast compression algorithm such as
>>>> Snappy (recommended).
>>>>
>>>>
>>>>
>>>> Le jeu. 3 sept. 2015 à 16:17, < nib...@free.fr > a écrit :
>>>>
>>>>
>>>> My main question in the case of HAR usage is: is it possible to use
>>>> Pig on it, and what about performance?
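Pig can read archived files through the `har://` filesystem scheme, so a plain LOAD works. A sketch only, assuming Hadoop and Pig are installed, a HAR already exists, and that the archive path and relation names are hypothetical:

```shell
# Sketch only: assumes Hadoop + Pig are installed and an events.har archive
# already exists; the archive path and field layout are hypothetical.
pig <<'EOF'
-- Load the files packed inside the HAR via the har:// scheme.
events = LOAD 'har:///user/nicolas/events.har/events' USING PigStorage();
grouped = GROUP events ALL;
counted = FOREACH grouped GENERATE COUNT(events);
DUMP counted;
EOF
```

Reads go through an extra index lookup inside the archive, but the data is not unpacked, so the overhead is small.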
>>>>
>>>> ----- Original Message -----
>>>> From: "Jörn Franke" <jornfra...@gmail.com>
>>>> To: nib...@free.fr, user@spark.apache.org
>>>> Sent: Thursday, September 3, 2015 15:54:42
>>>> Subject: Re: Small File to HDFS
>>>>
>>>>
>>>> Store them as a Hadoop archive (HAR).
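Creating such an archive can be sketched with the `hadoop archive` tool, which packs a directory of small files into one HAR via a MapReduce job. A sketch only, assuming a running Hadoop cluster; the source and destination paths are hypothetical:

```shell
# Sketch only: assumes a running Hadoop cluster; paths are hypothetical.
# Pack everything under /data/events into a single HAR named events.har,
# written to /user/nicolas (this launches a MapReduce job).
hadoop archive -archiveName events.har -p /data/events /user/nicolas

# The archived files are then addressable through the har:// scheme:
hdfs dfs -ls har:///user/nicolas/events.har
```

This turns millions of small NameNode entries into a handful of large files plus an index, which is the usual remedy for the HDFS small-files problem.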
>>>>
>>>>
>>>> On Wed, Sep 2, 2015 at 18:07, <nib...@free.fr> wrote:
>>>>
>>>>
>>>> Hello,
>>>> I am currently using Spark Streaming to collect small messages
>>>> (events), each under 50 KB in size, at high volume (several million per
>>>> day), and I have to store those messages in HDFS.
>>>> I understand that storing small files can be problematic in HDFS; how
>>>> can I manage it?
>>>>
>>>> Tks
>>>> Nicolas
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>
>>>>
>>
>
>
> --
> ------------------------------------------------
> Thanks!
> Tao
>
