Then use HBase or similar. You originally wrote it was just for storing.

On Fri, Sep 4, 2015 at 4:30 PM, Tao Lu <taolu2...@gmail.com> wrote:
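The HBase suggestion can be sketched from the HBase shell. This is a minimal illustration only, assuming a running HBase cluster with the `hbase` CLI on the PATH; the table name `events`, column family `d`, and rowkeys are hypothetical examples, not anything from the thread. The point is that a `put` to an existing rowkey overwrites the stored value, which gives the random remove/replace-by-key semantics discussed in this thread.

```shell
# Sketch only: requires a running HBase cluster; table/column names are hypothetical.
hbase shell <<'EOF'
create 'events', 'd'
# Store a small message under its key (the rowkey)
put 'events', 'msg-key-001', 'd:payload', 'original content'
# Random update: a second put to the same rowkey replaces the current value
put 'events', 'msg-key-001', 'd:payload', 'new content'
# Read back the latest value for the key
get 'events', 'msg-key-001'
EOF
```

By default `get` returns only the newest cell version, so readers see the replaced content without any explicit delete step.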
Basically they need NoSQL-style random update access.

On Fri, Sep 4, 2015 at 9:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:

What about concurrent access (read / update) to the small file with the same key? That can get a bit tricky.

On Thu, Sep 3, 2015 at 2:47 PM, Jörn Franke <jornfra...@gmail.com> wrote:

Well, it is the same as in normal HDFS: deleting the file and putting a new one with the same name works.

On Thu, Sep 3, 2015 at 9:18 PM, <nib...@free.fr> wrote:

A HAR archive seems a good idea, but one last question to be sure I make the best choice: is it possible to override (remove/replace) a file inside the HAR? Basically, the names of my small files will be the keys of my records, and sometimes I will need to replace the content of a file with new content (remove/replace).

Thanks a lot,
Nicolas

----- Original Message -----
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr
Cc: user@spark.apache.org
Sent: Thursday, September 3, 2015 19:29:42
Subject: Re: Small File to HDFS

HAR is transparent and adds hardly any performance overhead. You may decide not to compress, or to use a fast compression algorithm such as Snappy (recommended).

On Thu, Sep 3, 2015 at 4:17 PM, <nib...@free.fr> wrote:

My main question in the case of HAR usage is: is it possible to use Pig on it, and what about performance?

----- Original Message -----
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr, user@spark.apache.org
Sent: Thursday, September 3, 2015 15:54:42
Subject: Re: Small File to HDFS

Store them as a Hadoop archive (HAR).
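For reference, creating and inspecting a HAR looks roughly like the following; all paths are hypothetical examples. One caveat worth noting against the remove/replace question in this thread: as far as the standard Hadoop Archives tool goes, a HAR is immutable once written, so a file inside it cannot be deleted or replaced in place. The delete-and-put pattern applies to plain HDFS files; "updating" a file that lives in a HAR effectively means rebuilding the archive.

```shell
# Sketch only: requires a Hadoop cluster; all paths are hypothetical.
# Pack the directory /user/nicolas/events into events.har under /user/nicolas/archived
hadoop archive -archiveName events.har -p /user/nicolas events /user/nicolas/archived

# List the archived files through the har:// filesystem (transparent to MapReduce/Pig)
hdfs dfs -ls har:///user/nicolas/archived/events.har

# Remove/replace by key works only on plain HDFS files, outside a HAR:
hdfs dfs -rm /user/nicolas/events/msg-key-001
hdfs dfs -put msg-key-001 /user/nicolas/events/msg-key-001
```

A common compromise is to keep recent, still-mutable files as plain HDFS files and archive them into a HAR only once they are cold.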
On Wed, Sep 2, 2015 at 6:07 PM, <nib...@free.fr> wrote:

Hello,
I am currently using Spark Streaming to collect small messages (events), each under 50 KB, at high volume (several million per day), and I have to store those messages in HDFS.
I understand that storing many small files can be problematic in HDFS. How can I manage this?

Thanks,
Nicolas

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

--
Thanks!
Tao
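One simple way to mitigate the small-files problem raised in the original question is to periodically consolidate a directory of small files into one larger HDFS file. This is a rough sketch with standard HDFS shell commands, assuming hypothetical paths; note that it collapses the files and so loses the file-name-as-key property discussed in the thread, which is why HAR or HBase fit that particular requirement better.

```shell
# Sketch only: requires a Hadoop cluster; paths are hypothetical.
# Merge all small files for one day into a single local file...
hadoop fs -getmerge /user/nicolas/events/2015-09-02 merged-2015-09-02
# ...push the consolidated file back to HDFS as one large file...
hdfs dfs -put merged-2015-09-02 /user/nicolas/events-merged/merged-2015-09-02
# ...and optionally remove the original small files afterwards.
hdfs dfs -rm -r /user/nicolas/events/2015-09-02
```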