Hive was originally not designed for updates, because it was purely warehouse-focused; the most recent versions can do updates, deletes, etc. in a transactional way. However, you may also use HBase with Phoenix for that, depending on your other functional and non-functional requirements.
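For illustration, a transactional table in Hive 0.14+ might look like the sketch below; the table, columns, and values are made up for the example, and the exact configuration keys depend on your Hive version:

    -- Requires ACID to be enabled, e.g. hive.support.concurrency=true and
    -- hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
    -- the automatic background merging additionally needs the compactor
    -- (hive.compactor.initiator.on=true, hive.compactor.worker.threads>0).
    -- Transactional tables must be bucketed and stored as ORC.
    CREATE TABLE products (
      product_id STRING,
      payload    STRING,
      updated_at TIMESTAMP
    )
    CLUSTERED BY (product_id) INTO 16 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true');

    -- Replace the stored "picture" of a product in place.
    UPDATE products
    SET payload = '...new payload...',
        updated_at = CAST('2015-10-03 16:48:00' AS TIMESTAMP)
    WHERE product_id = 'P12345';

    -- Deletes work the same way on transactional tables.
    DELETE FROM products WHERE product_id = 'P67890';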
On Sat, Oct 3, 2015 at 16:48, <nib...@free.fr> wrote:

> Thanks a lot. Why did you say "the most recent version"?
>
> ----- Original Message -----
> From: "Jörn Franke" <jornfra...@gmail.com>
> To: "nibiau" <nib...@free.fr>
> Cc: banto...@gmail.com, user@spark.apache.org
> Sent: Saturday, October 3, 2015 13:56:43
> Subject: Re: RE : Re: HDFS small file generation problem
>
> Yes, the most recent version, or you can use Phoenix on top of HBase. I
> recommend trying out both and seeing which one is the most suitable.
>
> On Sat, Oct 3, 2015 at 13:13, nibiau <nib...@free.fr> wrote:
>
> Hello,
> Thanks. If I understand correctly, Hive can be usable in my context?
>
> Nicolas
>
> Sent from my Samsung mobile device
>
> Jörn Franke <jornfra...@gmail.com> wrote:
>
> If you use transactional tables in Hive together with insert, update and
> delete, then it does the "concatenate" for you automatically at regular
> intervals. Currently this works only with tables in the ORC format
> (stored as ORC).
>
> On Sat, Oct 3, 2015 at 11:45, <nib...@free.fr> wrote:
>
> Hello,
> So, is Hive a solution for my need?
> - I receive small messages (10KB) identified by an ID (a product ID, for
>   example).
> - Each message I receive is the latest picture of my product ID, so I
>   basically just want to store the latest product pictures inside HDFS
>   in order to run batch processing on them later.
>
> If I use Hive, I suppose I have to INSERT and UPDATE records and
> periodically CONCATENATE. After a CONCATENATE, I suppose the records are
> still updatable.
>
> Please confirm whether it can be a solution for my use case, or suggest
> any other idea.
>
> Thanks a lot!
> Nicolas
>
> ----- Original Message -----
> From: "Jörn Franke" <jornfra...@gmail.com>
> To: nib...@free.fr, "Brett Antonides" <banto...@gmail.com>
> Cc: user@spark.apache.org
> Sent: Saturday, October 3, 2015 11:17:51
> Subject: Re: HDFS small file generation problem
>
> You can update data in Hive if you use the ORC format.
>
> On Sat, Oct 3, 2015 at 10:42, <nib...@free.fr> wrote:
>
> Hello,
> Finally, Hive is not a solution, as I cannot update the data. And for an
> archive file I think it would be the same issue. Any other solutions?
>
> Nicolas
>
> ----- Original Message -----
> From: nib...@free.fr
> To: "Brett Antonides" <banto...@gmail.com>
> Cc: user@spark.apache.org
> Sent: Friday, October 2, 2015 18:37:22
> Subject: Re: HDFS small file generation problem
>
> OK, thanks, but can I also update data instead of inserting data?
>
> ----- Original Message -----
> From: "Brett Antonides" <banto...@gmail.com>
> To: user@spark.apache.org
> Sent: Friday, October 2, 2015 18:18:18
> Subject: Re: HDFS small file generation problem
>
> I had a very similar problem and solved it with Hive and ORC files using
> the Spark SQLContext:
> * Create a table in Hive stored as an ORC file (I recommend using
>   partitioning too).
> * Use SQLContext.sql to insert data into the table.
> * Use SQLContext.sql to periodically run ALTER TABLE...CONCATENATE to
>   merge your many small files into larger files optimized for your HDFS
>   block size.
> * Since the CONCATENATE command operates on files in place, it is
>   transparent to any downstream processing.
>
> Cheers,
> Brett
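A minimal sketch of Brett's recipe as the HiveQL that would be passed through SQLContext.sql; the events table, its columns, and the incoming_batch source are assumptions for the example:

    -- One-time setup: a partitioned ORC table for the incoming events.
    CREATE TABLE events (
      message_id STRING,
      body       STRING
    )
    PARTITIONED BY (dt STRING)
    STORED AS ORC;

    -- From the streaming job, per micro-batch (this is what produces the
    -- many small files); incoming_batch is a hypothetical staging table
    -- registered from the stream.
    INSERT INTO TABLE events PARTITION (dt = '2015-10-03')
    SELECT message_id, body FROM incoming_batch;

    -- Periodically, e.g. nightly: merge the small ORC files in place,
    -- transparently to readers.
    ALTER TABLE events PARTITION (dt = '2015-10-03') CONCATENATE;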
> On Fri, Oct 2, 2015 at 3:48 PM, <nib...@free.fr> wrote:
>
> Hello,
> Yes, but:
> - In the Java API I don't find an API to create an HDFS archive.
> - As soon as I receive a message (with a messageID) I need to replace the
>   old existing file with the new one (the name of the file being the
>   messageID); is that possible with an archive?
>
> Thanks
> Nicolas
>
> ----- Original Message -----
> From: "Jörn Franke" <jornfra...@gmail.com>
> To: nib...@free.fr, "user" <user@spark.apache.org>
> Sent: Monday, September 28, 2015 23:53:56
> Subject: Re: HDFS small file generation problem
>
> Use hadoop archive.
>
> On Sun, Sep 27, 2015 at 15:36, <nib...@free.fr> wrote:
>
> Hello,
> I'm still investigating the small-file generation problem caused by my
> Spark Streaming jobs. Indeed, my Spark Streaming jobs receive a lot of
> small events (avg 10KB), and I have to store them inside HDFS in order to
> process them with Pig jobs on demand. The problem is that I generate a
> lot of small files in HDFS (several million), and that can be
> problematic. I investigated using HBase or an archive file, but in the
> end I don't want to do that. So, what about this solution:
> - Spark Streaming generates, on the fly, several million small files in
>   HDFS.
> - Each night I merge them into a big daily file.
> - I launch my Pig jobs on this big file.
>
> Another question I have:
> - Is it possible to append to a big (daily) file by adding my events on
>   the fly?
>
> Thanks a lot
> Nicolas
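For the nightly merge step described above, one hypothetical approach (a sketch only, with raw_events, daily_events, and all column names invented for the example) is to rebuild a daily partition from the small files while keeping only the latest record per product ID, which also covers the "last picture per ID" requirement:

    -- Nightly compaction: rewrite one day's worth of small files as a
    -- single large partition, keeping only the most recent event per ID.
    INSERT OVERWRITE TABLE daily_events PARTITION (dt = '2015-09-27')
    SELECT product_id, body, event_ts
    FROM (
      SELECT product_id, body, event_ts,
             ROW_NUMBER() OVER (PARTITION BY product_id
                                ORDER BY event_ts DESC) AS rn
      FROM raw_events
      WHERE dt = '2015-09-27'
    ) latest
    WHERE rn = 1;

The Pig jobs would then read only daily_events, whose file count is bounded by the number of reducers that ran the INSERT OVERWRITE, and re-running the statement with newer data effectively updates the stored records.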