You might consider storing them in one big HDFS file and appending new
messages to it as they arrive.

For instance:
one message -> compress (zip) it -> append it to the HDFS file as one line
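
To make the "one line per message" idea concrete, here is a rough sketch in Python. It is only an illustration under some assumptions: compressed bytes can themselves contain newline characters, so I assume a base64 step after compression to keep each record on a single line. The function names (encode_record, decode_record, append_message, read_messages) are made up for this example, and `stream` stands in for whatever append-capable binary stream you get from your HDFS client; the HDFS wiring itself is not shown.

```python
import base64
import io
import zlib


def encode_record(message: bytes) -> bytes:
    """Compress one message and return it as a single newline-terminated line."""
    # Base64 output never contains b"\n", so one record is exactly one line.
    return base64.b64encode(zlib.compress(message)) + b"\n"


def decode_record(line: bytes) -> bytes:
    """Invert encode_record for one line read back from the file."""
    return zlib.decompress(base64.b64decode(line.rstrip(b"\n")))


def append_message(stream, message: bytes) -> None:
    """Append one encoded record to a binary stream opened for appending.

    `stream` is an assumption: any object with a write(bytes) method.
    """
    stream.write(encode_record(message))


def read_messages(stream):
    """Yield the original messages from a stream of encoded lines."""
    for line in stream:
        if line.strip():  # skip blank lines, if any
            yield decode_record(line)


if __name__ == "__main__":
    # In-memory buffer used as a stand-in for the big HDFS file.
    buf = io.BytesIO()
    append_message(buf, b'{"event": "click", "id": 1}')
    append_message(buf, b"a message\nwith an embedded newline")
    buf.seek(0)
    for original in read_messages(buf):
        print(original)
```

Base64 adds roughly one third to the compressed size, so for very small messages the gain from compression may be limited; a length-prefixed binary record format would avoid that overhead at the cost of no longer being line-oriented.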

On Wed, Sep 2, 2015 at 12:43 PM, <nib...@free.fr> wrote:

> Hi,
> I already store them in MongoDB in parallel for operational access and
> don't want to add another database to the loop.
> Is that the only solution?
>
> Tks
> Nicolas
>
> ----- Original Message -----
> From: "Ted Yu" <yuzhih...@gmail.com>
> To: nib...@free.fr
> Cc: "user" <user@spark.apache.org>
> Sent: Wednesday, September 2, 2015 18:34:17
> Subject: Re: Small File to HDFS
>
>
> Instead of storing those messages in HDFS, have you considered storing
> them in a key-value store (e.g. HBase)?
>
>
> Cheers
>
>
> On Wed, Sep 2, 2015 at 9:07 AM, < nib...@free.fr > wrote:
>
>
> Hello,
> I am currently using Spark Streaming to collect small messages (events).
> Each message is under 50 KB, the volume is high (several million per day),
> and I have to store those messages in HDFS.
> I understand that storing small files can be problematic in HDFS. How can
> I manage this?
>
> Tks
> Nicolas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
>
>


-- 
------------------------------------------------
Thanks!
Tao
