Streaming : WAL ignored

2018-05-24 Thread Walid Lezzar
Hi, I have a Spark Streaming application running on YARN that consumes from a JMS source. I have checkpointing and the WAL enabled to ensure zero data loss. However, when I suddenly kill my application and restart it, sometimes it recovers the data from the WAL but sometimes it doesn't! In a…
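A minimal sketch of the settings usually involved in this scenario, assuming a receiver-based source such as a JMS receiver (the exact property names below are the standard Spark Streaming ones; the paths are placeholders):

```
# spark-defaults.conf (sketch): zero-data-loss related settings
spark.streaming.receiver.writeAheadLog.enable              true
# Recommended when the WAL directory lives on S3-like storage:
spark.streaming.receiver.writeAheadLog.closeFileAfterWrite true
```

One common cause of the symptom described above: the WAL and checkpoint are only replayed if the driver is recreated via `StreamingContext.getOrCreate(checkpointDir, createFn)`; if the restarted application builds a fresh `StreamingContext` directly, the existing checkpoint (and with it the WAL recovery) is silently ignored.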

Re: How to read the schema of a partitioned dataframe without listing all the partitions ?

2018-04-27 Thread Walid Lezzar
…bal Temporary View. Temporary views in Spark SQL are session-scoped and will disappear if the session that creates it terminates. If you want to have a temporary view that is shared among all sessions and kept alive until the Spark application terminates, you can create a global tem…

How to read the schema of a partitioned dataframe without listing all the partitions ?

2018-04-27 Thread Walid LEZZAR
Hi, I have a parquet on S3 partitioned by day. I have 2 years of data (about 1000 partitions). With Spark, when I just want to know the schema of this parquet without even asking for a single row of data, Spark tries to list all the partitions and the nested partitions of the parquet, which mak…
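One common workaround, sketched below under the assumption of Spark 2.x and a hypothetical layout like `s3://bucket/table/day=2018-01-01/`: infer the schema from a single leaf partition (only that directory is listed), then pass it explicitly when reading the full table so that full schema inference over all partitions is skipped.

```scala
// Sketch: infer the schema from one partition instead of the whole table.
// "bucket/table" and the day value are placeholders, not from the original thread.
val oneDay = spark.read.parquet("s3://bucket/table/day=2018-01-01")
val schema = oneDay.schema  // note: the partition column `day` is not in this schema

// Supplying the schema up front avoids schema inference across ~1000 partitions:
val full = spark.read.schema(schema).parquet("s3://bucket/table")
```

Caveat: this only removes the listing cost of schema *inference*; an actual scan of the full table still requires partition discovery.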

Re: Spark Metrics : Why is the Sink class declared private[spark] ?

2016-04-02 Thread Walid Lezzar
On Sat, Apr 2, 2016 at 6:48 AM, Walid Lezzar wrote: Hi, I looked into the Spark code at how Spark reports metrics using the MetricsSystem class. I've seen that the Spark MetricsSystem class, when instantiated, parses the metrics.properties…

Spark Metrics : Why is the Sink class declared private[spark] ?

2016-04-01 Thread Walid Lezzar
Hi, I looked into the Spark code at how Spark reports metrics using the MetricsSystem class. I've seen that the Spark MetricsSystem class, when instantiated, parses the metrics.properties file, tries to find the sink class names and loads them dynamically. It would be great to implement my own sink…
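Since sinks are loaded reflectively from metrics.properties, the usual workaround for the `private[spark]` visibility is to declare the custom sink inside the `org.apache.spark.metrics.sink` package in your own jar (a common trick, not an official API). A sketch of the configuration, where `MyCustomSink` is a hypothetical class name:

```
# metrics.properties (sketch): Spark instantiates sink classes by reflection.
# MyCustomSink is a placeholder; it must live in org.apache.spark.metrics.sink
# in your own jar to see the private[spark] Sink trait.
*.sink.mysink.class=org.apache.spark.metrics.sink.MyCustomSink
*.sink.mysink.period=10
*.sink.mysink.unit=seconds
```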

Constantly increasing Spark streaming heap memory

2016-02-20 Thread Walid LEZZAR
Hi, I'm running a Spark Streaming job that pulls data from Kafka (using the direct approach method, without a receiver) and pushes it into Elasticsearch. The job is running fine but I was surprised once I opened jconsole to monitor it: I noticed that the heap memory is constantly increasing until t…
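A steadily climbing heap in jconsole is often just the JVM deferring collection rather than a leak: if a full GC (triggerable from jconsole) brings usage back down, the growth is benign. One way to check without guessing, sketched as a config fragment (standard JVM flags, applied through Spark's extra-options settings):

```
# spark-defaults.conf (sketch): surface GC behaviour to tell a real leak apart
# from normal saw-tooth heap growth that a collection would reclaim.
spark.driver.extraJavaOptions    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
```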

[Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Walid LEZZAR
Hi, we have been using Spark Streaming for a little while now. Until now, we were running our Spark Streaming jobs on Spark 1.5.1 and it was working well. Yesterday, we upgraded to Spark 1.6.0 without any changes in the code. But our streaming jobs are not working any more. We are getting an "Abs…
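A `java.lang.AbstractMethodError` at runtime usually indicates binary incompatibility: the jar was compiled against one Spark version but runs on another. Rebuilding the job against the cluster's Spark version typically resolves it. A build sketch (sbt, with version numbers taken from the thread):

```scala
// build.sbt (sketch): compile against the same Spark version the cluster runs,
// and mark it "provided" so the cluster's own jars are used at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided"
)
```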