Sent: Friday 2 October 2015 18:37:22
Subject: Re: HDFS small file generation problem
Ok thanks, but can I also update data instead of insert data ?
- Original Message -
From: "Brett Antonides" <banto...@gmail.com>
To: user@spark.apache.org
Sent: Friday 2 October 2015 18:18:18
Subject: Re: HDFS small file generation problem
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr, "Brett Antonides" <banto...@gmail.com>
Cc: user@spark.apache.org
Sent: Saturday 3 October 2015 11:17:51
Subject: Re: HDFS small file generation problem
You can update data in hive if you use the orc format
On Sat 3 Oct 2015 at 10:42, < nib...@free.fr > wrote:
Thanks a lot !
> Nicolas
>
>
> - Original Message -
> De: "Jörn Franke" <jornfra...@gmail.com>
> To: nib...@free.fr, "Brett Antonides" <banto...@gmail.com>
> Cc: user@spark.apache.org
> Sent: Saturday 3 October 2015 11:17:51
> Subject: Re: HDFS small file generation problem
>
> You can update data in hive if you use the orc format
>
> On Sat 3 Oct 2015 at 10:42, < nib...@free.fr > wrote:
> Nicolas
>
> - Original Message -
> From: nib...@free.fr
> To: "Brett Antonides" <banto...@gmail.com>
> Cc: user@spark.apache.org
> Sent: Friday 2 October 2015 18:37:22
> Subject: Re: HDFS small file generation problem
>
> Ok thanks, but can I also update data instead of insert data ?
Cc: user@spark.apache.org
Sent: Saturday 3 October 2015 11:17:51
Subject: Re: HDFS small file generation problem
You can update data in hive if you use the orc format
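Jörn's advice maps to Hive's ACID support (Hive 0.14 and later): UPDATE and DELETE only work on ORC tables that are bucketed and flagged transactional. A minimal sketch of the statements involved -- the table and column names are hypothetical, and they need a running Hive (via the Hive CLI or beeline) to execute:

```python
# Sketch of the Hive-side setup that enables updates (assumptions: Hive 0.14+
# with ACID transactions enabled on the server; "events", "id", "payload" are
# hypothetical names). UPDATE requires an ORC table that is bucketed and has
# transactional=true; a plain ORC table remains append-only.
create_table = """
CREATE TABLE events (
  id      STRING,
  payload STRING
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true')
"""

# With that table in place, rows can be updated instead of only inserted:
update_stmt = "UPDATE events SET payload = 'new-value' WHERE id = '42'"

# Shown only as text here, since both statements need a live Hive metastore.
print(create_table)
print(update_stmt)
```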
On Sat 3 Oct 2015 at 10:42, < nib...@free.fr > wrote:
Hello,
Finally Hive is not a solution as I cannot update the data.
Thanks a lot, why did you say "the most recent version" ?
- Original Message -
From: "Jörn Franke" <jornfra...@gmail.com>
To: "nibiau" <nib...@free.fr>
Cc: banto...@gmail.com, user@spark.apache.org
Sent: Saturday 3 October 2015 13:56:43
Subject: Re: RE : Re: HDFS small file generation problem
Yes, the most recent version, or you can use Phoenix on top of HBase. I
recommend trying out both and seeing which one is the most suitable.
From: "Jörn Franke" <jornfra...@gmail.com>
To: nib...@free.fr, "user" <user@spark.apache.org>
Sent: Monday 28 September 2015 23:53:56
Subject: Re: HDFS small file generation problem
Use hadoop archive
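For reference, `hadoop archive` packs many small HDFS files into a single HAR file, which relieves NameNode memory pressure while the contents stay readable through the `har://` filesystem. A minimal sketch of the invocation -- the paths are hypothetical, and it assumes a Hadoop client on the PATH:

```python
# Sketch of the "hadoop archive" invocation (assumptions: a Hadoop client is
# installed; /user/nib/... paths are hypothetical). The resulting events.har
# can then be listed with: hdfs dfs -ls har:///user/nib/archived/events.har
import subprocess

def har_command(archive_name, parent_dir, dest_dir):
    # Usage: hadoop archive -archiveName NAME -p <parent> <dest>
    # (with no explicit <src>, everything under <parent> is archived)
    return ["hadoop", "archive", "-archiveName", archive_name,
            "-p", parent_dir, dest_dir]

cmd = har_command("events.har", "/user/nib/events", "/user/nib/archived")
# subprocess.run(cmd, check=True)  # needs a live cluster, so left commented out
print(" ".join(cmd))
```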
On Sun 27 Sep 2015 at 15:36, < nib...@free.fr > wrote:
- Original Message -
From: "Brett Antonides" <banto...@gmail.com>
To: user@spark.apache.org
Sent: Friday 2 October 2015 18:18:18
Subject: Re: HDFS small file generation problem
I had a very similar problem
On Sun 27 Sep 2015 at 15:36, < nib...@free.fr > wrote:
> Hello,
> I'm still investigating the small file generation problem caused by my
> Spark Streaming jobs. Indeed, my Spark Streaming jobs receive a lot of small
> events (avg 10kb), and I have to store them
I would suggest not writing small files to HDFS; rather, you can hold them
in memory, maybe off heap, and then flush them to HDFS using another
job, similar to https://github.com/ptgoetz/storm-hdfs (not sure if Spark
already has something like it)
On Sun, Sep 27, 2015 at 11:36 PM,
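The hold-in-memory-then-flush idea above can be sketched as follows -- a local stand-in in which a plain directory plays the role of HDFS; a real job would write through an HDFS client, and the threshold would be closer to an HDFS block size:

```python
# Sketch of in-memory buffering (assumption: a local temp directory stands in
# for HDFS). Small events accumulate in memory and are flushed as one large
# file per batch, so the filesystem sees a few big files, not thousands of
# ~10kb ones.
import os
import tempfile

class EventBuffer:
    def __init__(self, out_dir, flush_bytes=64 * 1024 * 1024):
        self.out_dir = out_dir
        self.flush_bytes = flush_bytes   # flush once this many bytes are held
        self.events = []
        self.size = 0
        self.batch = 0

    def add(self, event: bytes):
        self.events.append(event)
        self.size += len(event)
        if self.size >= self.flush_bytes:
            self.flush()

    def flush(self):
        if not self.events:
            return None
        path = os.path.join(self.out_dir, f"batch-{self.batch:05d}.bin")
        with open(path, "wb") as f:
            f.write(b"".join(self.events))   # one combined file per batch
        self.events, self.size = [], 0
        self.batch += 1
        return path

out = tempfile.mkdtemp()
buf = EventBuffer(out, flush_bytes=50_000)
for _ in range(20):
    buf.add(b"x" * 10_000)       # twenty ~10kb events
buf.flush()                      # flush any remainder
print(len(os.listdir(out)))      # 4 files instead of 20
```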
You could try a couple of things:
a) use Kafka for stream processing, store current incoming events and Spark
streaming job output in Kafka rather than on HDFS, and dual write to HDFS too
(in a micro-batched mode), i.e. every x minutes. Kafka is more suited to
processing lots of small events/
b)
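Option (a) can be sketched roughly like this -- the Kafka producer and the HDFS write are stubbed with plain callables (a real setup would use an actual Kafka producer and an HDFS client), and a fake clock drives the micro-batch interval:

```python
# Sketch of the micro-batched dual write (assumptions: kafka_send and
# hdfs_write are stand-ins for a Kafka producer and an HDFS writer). Every
# event goes to Kafka immediately; every `interval` seconds the accumulated
# batch is written out as one combined "HDFS" file.
import time

class DualWriter:
    def __init__(self, kafka_send, hdfs_write, interval=300, now=time.monotonic):
        self.kafka_send = kafka_send    # low-latency path, per event
        self.hdfs_write = hdfs_write    # batched path, one write per interval
        self.interval = interval
        self.now = now
        self.batch = []
        self.last_flush = now()

    def on_event(self, event):
        self.kafka_send(event)
        self.batch.append(event)
        if self.now() - self.last_flush >= self.interval:
            self.hdfs_write(self.batch)  # one big file instead of many small
            self.batch = []
            self.last_flush = self.now()

sent, files = [], []
clock = iter(range(100))                 # fake clock: one "second" per call
w = DualWriter(sent.append, files.append, interval=5, now=lambda: next(clock))
for e in range(10):
    w.on_event(e)
print(len(sent), len(files))             # prints: 10 2
```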