Basically, they need NoSQL-like random update access.

On Fri, Sep 4, 2015 at 9:56 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> What about concurrent access (read / update) to the small file with the
> same key?
>
> That can get a bit tricky.
>
> On Thu, Sep 3, 2015 at 2:47 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Well, it is the same as in normal HDFS: deleting the file and putting a
>> new one with the same name works.
>>
>> On Thu, Sep 3, 2015 at 21:18, <nib...@free.fr> wrote:
>>
>>> A HAR archive seems a good idea, but just one last question to be sure
>>> I make the best choice:
>>> - Is it possible to overwrite (remove/replace) a file inside the HAR?
>>> Basically the names of my small files will be the keys of my records,
>>> and sometimes I will need to replace the content of a file with new
>>> content (remove/replace).
>>>
>>>
>>> Thanks a lot
>>> Nicolas
>>>
>>> ----- Original Message -----
>>> From: "Jörn Franke" <jornfra...@gmail.com>
>>> To: nib...@free.fr
>>> Cc: user@spark.apache.org
>>> Sent: Thursday, September 3, 2015 19:29:42
>>> Subject: Re: Small File to HDFS
>>>
>>>
>>>
>>> HAR is transparent and has hardly any performance overhead. You may
>>> decide not to compress, or to use a fast compression algorithm such as
>>> Snappy (recommended).
>>>
>>>
>>>
>>> On Thu, Sep 3, 2015 at 16:17, <nib...@free.fr> wrote:
>>>
>>>
>>> My main question in case of HAR usage is: is it possible to use Pig on
>>> it, and what about performance?
>>>
>>> ----- Original Message -----
>>> From: "Jörn Franke" <jornfra...@gmail.com>
>>> To: nib...@free.fr, user@spark.apache.org
>>> Sent: Thursday, September 3, 2015 15:54:42
>>> Subject: Re: Small File to HDFS
>>>
>>>
>>>
>>>
>>> Store them as a Hadoop archive (HAR).
>>>
>>>
>>> On Wed, Sep 2, 2015 at 18:07, <nib...@free.fr> wrote:
>>>
>>>
>>> Hello,
>>> I am currently using Spark Streaming to collect small messages (events);
>>> each is <50 KB, the volume is high (several million per day), and I have
>>> to store those messages in HDFS.
>>> I understand that storing small files can be problematic in HDFS. How
>>> can I manage it?
>>>
>>> Thanks
>>> Nicolas
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>

--
------------------------------------------------
Thanks!
Tao