You might consider storing them in one big HDFS file and appending new messages to it.
For instance: one message -> zip it -> append it to the HDFS file as one record.

On Wed, Sep 2, 2015 at 12:43 PM, <nib...@free.fr> wrote:
> Hi,
> I already store them in MongoDB in parallel for operational access and
> don't want to add another database in the loop.
> Is it the only solution?
>
> Tks
> Nicolas
>
> ----- Original Message -----
> From: "Ted Yu" <yuzhih...@gmail.com>
> To: nib...@free.fr
> Cc: "user" <user@spark.apache.org>
> Sent: Wednesday, September 2, 2015 18:34:17
> Subject: Re: Small File to HDFS
>
> Instead of storing those messages in HDFS, have you considered storing
> them in a key-value store (e.g. HBase)?
>
> Cheers
>
> On Wed, Sep 2, 2015 at 9:07 AM, <nib...@free.fr> wrote:
>
> Hello,
> I am currently using Spark Streaming to collect small messages (events),
> size being <50 KB; volume is high (several million per day), and I have to
> store those messages in HDFS.
> I understood that storing small files can be problematic in HDFS. How can
> I manage it?
>
> Tks
> Nicolas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org

--
------------------------------------------------
Thanks!
Tao
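To make the "zip each message, append it to one growing file" suggestion concrete, here is a minimal local sketch of the pattern in Python. It relies on the fact that concatenated gzip members form a valid gzip stream, so each message can be compressed and appended independently while readers still see one decompressed byte sequence. This is an illustration against the local filesystem only; on actual HDFS the append would go through the Hadoop `FileSystem.append()` API or `hdfs dfs -appendToFile`, and the file/function names below are made up for the example.

```python
import gzip
import io


def append_message(path: str, message: bytes) -> None:
    """Compress one message and append it to the file as a standalone
    gzip member. Newline-terminated messages keep record boundaries
    recoverable after decompression."""
    with open(path, "ab") as f:
        f.write(gzip.compress(message))


def read_messages(path: str) -> bytes:
    """Read the whole file back. GzipFile transparently decompresses
    a sequence of concatenated gzip members into one byte stream."""
    with open(path, "rb") as f:
        return gzip.GzipFile(fileobj=io.BytesIO(f.read())).read()
```

Note that decompression yields a single byte stream, so if the individual events must be split apart again, each appended message should carry its own delimiter (e.g. a trailing newline, as the docstring suggests).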