Yes, the most recent version, or you can use Phoenix on top of HBase. I recommend trying out both and seeing which one is the most suitable.
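For illustration, the Phoenix route could look roughly like this (the table, its columns and the ZooKeeper quorum below are made-up assumptions, not a tested setup); an UPSERT replaces the row for an existing key, which matches the "keep only the latest message per product ID" requirement:

    import java.sql.DriverManager

    // Illustrative sketch only: connection string and schema are assumptions.
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    val stmt = conn.createStatement()
    stmt.executeUpdate(
      "CREATE TABLE IF NOT EXISTS product_snapshot (product_id VARCHAR PRIMARY KEY, payload VARCHAR)")

    // UPSERT inserts a new row or overwrites the existing row with the same key
    val ps = conn.prepareStatement("UPSERT INTO product_snapshot VALUES (?, ?)")
    ps.setString(1, "p-123")           // product / message ID
    ps.setString(2, "<10 KB message>") // latest payload for that ID
    ps.executeUpdate()
    conn.commit()                      // Phoenix connections do not auto-commit by default
    conn.close()

The Hive alternative is discussed in the quoted thread below.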
On Sat, Oct 3, 2015 at 13:13, nibiau <nib...@free.fr> wrote:

Hello,
Thanks. If I understand correctly, Hive can be usable in my context?

Nicolas

Sent from my Samsung mobile device


Jörn Franke <jornfra...@gmail.com> wrote:

If you use transactional tables in Hive together with insert, update and delete, then it does the "concatenate" for you automatically at regular intervals. Currently this works only with tables in ORC format (stored as ORC).


On Sat, Oct 3, 2015 at 11:45, <nib...@free.fr> wrote:

Hello,
So, is Hive a solution for my need:
- I receive small messages (10 KB) identified by an ID (a product ID, for example).
- Each message I receive is the latest picture of my product ID, so I basically just want to store the latest picture of each product inside HDFS in order to process batches on it later.

If I use Hive, I suppose I have to INSERT and UPDATE records and periodically CONCATENATE.
After a CONCATENATE, I suppose the records are still updatable.

Thanks for confirming whether it can be a solution for my use case, or any other idea.

Thanks a lot!
Nicolas


----- Original Message -----
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr, "Brett Antonides" <banto...@gmail.com>
Cc: user@spark.apache.org
Sent: Saturday, October 3, 2015 11:17:51
Subject: Re: HDFS small file generation problem

You can update data in Hive if you use the ORC format.


On Sat, Oct 3, 2015 at 10:42, <nib...@free.fr> wrote:

Hello,
Finally, Hive is not a solution as I cannot update the data.
And for an archive file I think it would be the same issue.
Any other solutions?

Nicolas


----- Original Message -----
From: nib...@free.fr
To: "Brett Antonides" <banto...@gmail.com>
Cc: user@spark.apache.org
Sent: Friday, October 2, 2015 18:37:22
Subject: Re: HDFS small file generation problem

Ok thanks, but can I also update data instead of inserting data?


----- Original Message -----
From: "Brett Antonides" <banto...@gmail.com>
To: user@spark.apache.org
Sent: Friday, October 2, 2015 18:18:18
Subject: Re: HDFS small file generation problem

I had a very similar problem and solved it with Hive and ORC files using the Spark SQLContext:
* Create a table in Hive stored as an ORC file (I recommend using partitioning too).
* Use SQLContext.sql to insert data into the table.
* Use SQLContext.sql to periodically run ALTER TABLE...CONCATENATE to merge your many small files into larger files optimized for your HDFS block size.
* Since the CONCATENATE command operates on files in place, it is transparent to any downstream processing.

Cheers,
Brett
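For reference, those steps might look roughly like this with Spark 1.x (the table, column and partition names are made up, a Hive-aware context and an existing SparkContext sc are assumed, and the one-row DataFrame stands in for a real streaming micro-batch):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)   // Hive-aware SQLContext; sc is an existing SparkContext
    import hiveContext.implicits._

    // 1. Partitioned table stored as ORC
    hiveContext.sql(
      """CREATE TABLE IF NOT EXISTS events (product_id STRING, payload STRING)
        |PARTITIONED BY (day STRING)
        |STORED AS ORC""".stripMargin)

    // 2. Insert data; the one-row DataFrame is a stand-in for a streaming micro-batch
    val batchDF = Seq(("p-123", "<10 KB message>")).toDF("product_id", "payload")
    batchDF.registerTempTable("incoming")
    hiveContext.sql(
      "INSERT INTO TABLE events PARTITION (day = '2015-10-03') SELECT product_id, payload FROM incoming")

    // 3. Periodically merge the partition's many small ORC files in place
    hiveContext.sql("ALTER TABLE events PARTITION (day = '2015-10-03') CONCATENATE")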
On Fri, Oct 2, 2015 at 3:48 PM, <nib...@free.fr> wrote:

Hello,
Yes, but:
- In the Java API I don't find an API to create an HDFS archive.
- As soon as I receive a message (with a messageID) I need to replace the old existing file with the new one (the name of the file being the messageID); is that possible with an archive?

Thanks
Nicolas


----- Original Message -----
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr, "user" <user@spark.apache.org>
Sent: Monday, September 28, 2015 23:53:56
Subject: Re: HDFS small file generation problem

Use hadoop archive.


On Sun, Sep 27, 2015 at 15:36, <nib...@free.fr> wrote:

Hello,
I'm still investigating the small file generation problem created by my Spark Streaming jobs.
Indeed, my Spark Streaming jobs receive a lot of small events (about 10 KB on average), and I have to store them inside HDFS in order to process them with PIG jobs on demand.
The problem is that I generate a lot of small files in HDFS (several million), and that can be problematic.
I investigated using HBase or an archive file, but I don't want to do that in the end.
So, what about this solution:
- Spark Streaming generates several million small files in HDFS on the fly.
- Each night I merge them into a big daily file.
- I launch my PIG jobs on this big file.

Another question I have:
- Is it possible to append to a big (daily) file by adding my events on the fly?

Thanks a lot
Nicolas
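As an illustration, the nightly merge could be sketched roughly like this (the paths and the target number of output files are made up, and an existing SparkContext sc is assumed):

    // Read the day's many small event files and rewrite them as a few large ones.
    val day = "2015-09-27"
    val events = sc.textFile(s"hdfs:///events/raw/$day/*")   // millions of ~10 KB files
    events.coalesce(8)                                        // aim for a handful of block-sized files
          .saveAsTextFile(s"hdfs:///events/daily/$day")       // consolidated copy for the PIG jobs

HDFS does expose FileSystem.append for the "append on the fly" idea, but a per-day rewrite like the above is simpler to reason about.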