Hello,

Yes, but:
- In the Java API I can't find an API to create an HDFS archive.
- As soon as I receive a message (with a messageID) I need to replace the old existing file with the new one (the file name being the messageID). Is that possible with an archive?
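On the two points above: there is no dedicated "create archive" method on `FileSystem`, but the `hadoop archive` tool is exposed as the class `org.apache.hadoop.tools.HadoopArchives`, which can be driven from Java through `ToolRunner`. Overwriting a file keyed by messageID is simpler: `FileSystem.create(path, true)` replaces any existing file. A minimal sketch, assuming Hadoop 2.x; all paths and the archive name are hypothetical examples:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.HadoopArchives;
import org.apache.hadoop.util.ToolRunner;

public class ArchiveAndReplace {

    // Create a HAR programmatically by driving the same Tool that backs
    // the `hadoop archive` CLI. Paths here are hypothetical examples.
    static void createArchive(Configuration conf) throws Exception {
        String[] args = {
            "-archiveName", "events.har",   // archive name (must end in .har)
            "-p", "/data/events",           // parent dir of the sources
            "2015-09-27",                   // source dir(s), relative to -p
            "/data/archives"                // destination directory
        };
        int rc = ToolRunner.run(conf, new HadoopArchives(conf), args);
        if (rc != 0) throw new IllegalStateException("har creation failed: " + rc);
    }

    // Replace the file named after a messageID: create(path, true)
    // overwrites an existing file of the same name.
    static void writeMessage(FileSystem fs, String messageId, byte[] body)
            throws Exception {
        Path p = new Path("/data/events/current", messageId);
        try (FSDataOutputStream out = fs.create(p, true /* overwrite */)) {
            out.write(body);
        }
    }
}
```

Note that files inside an existing HAR are immutable, so a single entry cannot be replaced in place; the archive would have to be rebuilt, or the "live" files kept on plain HDFS with only closed days archived.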
Tks
Nicolas

----- Original Message -----
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr, "user" <user@spark.apache.org>
Sent: Monday, September 28, 2015, 23:53:56
Subject: Re: HDFS small file generation problem

Use hadoop archive

On Sun, Sep 27, 2015 at 15:36, <nib...@free.fr> wrote:

Hello,
I'm still investigating the small file generation problem caused by my Spark Streaming jobs. My Spark Streaming jobs receive a lot of small events (avg 10 KB), and I have to store them in HDFS so that Pig jobs can process them on demand. The problem is that I generate a lot of small files in HDFS (several million), and that can be problematic.
I looked into using HBase or archive files, but in the end I don't want to go that way. So, what about this solution:
- Spark Streaming generates several million small files in HDFS on the fly
- Each night I merge them into one big daily file
- I run my Pig jobs on this big file
Another question I have:
- Is it possible to append to a big (daily) file by adding my events on the fly?

Tks a lot
Nicolas

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
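For the nightly-merge idea quoted above, Hadoop 2.x ships `FileUtil.copyMerge`, which concatenates every file under a source directory into one destination file; and `FileSystem.append` answers the on-the-fly append question, though append support must be enabled on the cluster and only one writer can hold the file open at a time. A sketch under those assumptions; the paths are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class NightlyMerge {

    // Merge one day's small files into a single big file, then delete
    // the sources. copyMerge concatenates all files under src into dst.
    static void mergeDay(Configuration conf, String day) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path src = new Path("/data/events/" + day);         // many small files
        Path dst = new Path("/data/daily/" + day + ".dat"); // one big file
        FileUtil.copyMerge(fs, src, fs, dst,
                true /* delete sources */, conf, null /* no separator */);
    }

    // Appending events to an existing daily file is possible, but it
    // serializes the streaming output through a single open writer.
    static void appendEvent(FileSystem fs, Path daily, String event)
            throws Exception {
        try (FSDataOutputStream out = fs.append(daily)) {
            out.write((event + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

A common compromise is to let the streaming job write small files and run the merge as a nightly batch (as proposed above), since a single appender would bottleneck a parallel streaming job.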