Re: Spark and Accumulo Delegation tokens

2018-03-22 Thread Saisai Shao
I think you can build your own Accumulo credential provider, similar to HadoopDelegationTokenProvider, outside of Spark; Spark already provides an interface, "ServiceCredentialProvider", for users to plug in a customized credential provider. Thanks Jerry 2018-03-23 14:29 GMT+08:00 Jorge Machado : > Hi G
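
A minimal sketch of such a provider, assuming the Spark 2.x YARN "ServiceCredentialProvider" interface; the Accumulo token-fetching step is left as a placeholder, since the exact call depends on your Accumulo client version:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.Credentials
    import org.apache.spark.SparkConf
    import org.apache.spark.deploy.yarn.security.ServiceCredentialProvider

    // Register via META-INF/services/
    //   org.apache.spark.deploy.yarn.security.ServiceCredentialProvider
    class AccumuloCredentialProvider extends ServiceCredentialProvider {

      override def serviceName: String = "accumulo"

      override def credentialsRequired(hadoopConf: Configuration): Boolean = true

      override def obtainCredentials(
          hadoopConf: Configuration,
          sparkConf: SparkConf,
          creds: Credentials): Option[Long] = {
        // Placeholder: obtain an Accumulo delegation token for the current
        // Kerberos user, wrap it in a Hadoop Token, and add it to `creds`.
        // Return the next renewal time in ms, or None if no renewal is needed.
        None
      }
    }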

Spark and Accumulo Delegation tokens

2018-03-22 Thread Jorge Machado
Hi Guys, I’m in the middle of writing a Spark DataSource connector for Apache Spark to connect to Accumulo tablets. Because we have Kerberos it gets a little tricky, since Spark only handles the delegation tokens for HBase, Hive and HDFS. Would a PR for an implementation of HadoopDelegati

Re: Structured Streaming Spark 2.3 Query

2018-03-22 Thread Bowden, Chris
Use a streaming query listener that tracks repetitive progress events for the same batch id. If x amount of time has elapsed given repetitive progress events for the same batch id, the source is not providing new offsets and stream execution is not scheduling new micro batches. See also: spark.
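
A sketch of that idea, assuming a configurable idle timeout; in production you may prefer to only set a flag in the listener and stop the query from the main thread:

    import java.util.concurrent.atomic.AtomicLong
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    // Stops the query when the same batchId keeps reporting progress
    // (i.e. no new offsets) for longer than idleTimeoutMs.
    class IdleStreamStopper(spark: SparkSession, idleTimeoutMs: Long)
        extends StreamingQueryListener {

      private val lastBatchId = new AtomicLong(-1L)
      private val idleSince = new AtomicLong(Long.MaxValue)

      override def onQueryStarted(event: QueryStartedEvent): Unit = ()
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

      override def onQueryProgress(event: QueryProgressEvent): Unit = {
        val now = System.currentTimeMillis()
        val batchId = event.progress.batchId
        if (batchId != lastBatchId.getAndSet(batchId)) {
          idleSince.set(now) // new micro-batch: data is still flowing
        } else if (now - idleSince.get() > idleTimeoutMs) {
          spark.streams.get(event.progress.id).stop() // idle too long
        }
      }
    }

    // spark.streams.addListener(new IdleStreamStopper(spark, 2 * 60 * 1000L))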

Structured Streaming Spark 2.3 Query

2018-03-22 Thread Aakash Basu
Hi, What is the way to stop a Spark Streaming job if there is no data inflow for an arbitrary amount of time (e.g., 2 mins)? Thanks, Aakash.

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Fawze Abujaber
Hi Shmuel, Did you compile the code against the right branch for Spark 1.6? I tested it and it looks to be working, and now I'm testing the branch more widely. Please use the branch for Spark 1.6. On Fri, Mar 23, 2018 at 12:43 AM, Shmuel Blitz wrote: > Hi Rohit, > > Thanks for sharing this great

Apache Spark Structured Streaming - Kafka Streaming - Option to ignore checkpoint

2018-03-22 Thread M Singh
Hi: I am working on a real-time application using Spark Structured Streaming (v 2.2.1). The application reads data from Kafka and if there is a failure, I would like to ignore the checkpoint. Is there any configuration to just read from the last Kafka offset after a failure and ignore any offset che

Re: Apache Spark Structured Streaming - Kafka Consumer cannot fetch records for offset exception

2018-03-22 Thread Tathagata Das
Structured Streaming AUTOMATICALLY saves the offsets in a checkpoint directory that you provide. And when you start the query again with the same directory it will just pick up where it left off. https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failur
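
A sketch of the two knobs involved (topic, paths, and broker address are placeholders; `spark` is an existing SparkSession). The checkpointLocation option is what enables recovery; to deliberately ignore old offsets, as asked above, point the query at a fresh checkpoint directory, since "startingOffsets" only applies when no checkpoint exists yet:

    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder topic
      .option("startingOffsets", "latest") // only used for a brand-new query
      .load()
      .writeStream
      .format("parquet")
      .option("path", "/data/out")                         // placeholder
      .option("checkpointLocation", "/checkpoints/events") // resume point
      .start()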

Apache Spark Structured Streaming - Kafka Consumer cannot fetch records for offset exception

2018-03-22 Thread M Singh
Hi: I am working with Spark (2.2.1) and Kafka (0.10) on AWS EMR, and for the last few days, after running the application for 30-60 minutes, I get the exception from the Kafka consumer included below. The structured streaming application is processing 1 minute's worth of data from the Kafka topic. So I've tried

Re: [Structured Streaming] Application Updates in Production

2018-03-22 Thread Tathagata Das
Yes indeed, we don't directly support schema migration of state as of now. However, depending on what stateful operator you are using, you can work around it. For example, if you are using mapGroupsWithState / flatMapGroupsWithState, you can explicitly convert your state to avro-encoded bytes a
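
A sketch of that workaround, with a trivial string (de)serialization standing in for the Avro encoding (`events` is assumed to be a Dataset[Event] and `spark.implicits._` in scope):

    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

    case class Event(userId: String, value: Double)
    case class SessionState(count: Long)

    // Stand-ins for Avro (de)serialization; an Avro schema can then evolve
    // independently of any case-class schema baked into the checkpoint.
    def encode(s: SessionState): Array[Byte] = s.count.toString.getBytes("UTF-8")
    def decode(b: Array[Byte]): SessionState =
      SessionState(new String(b, "UTF-8").toLong)

    val counts = events
      .groupByKey(_.userId)
      .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
        (userId: String, rows: Iterator[Event], state: GroupState[Array[Byte]]) =>
          val prev = state.getOption.map(decode).getOrElse(SessionState(0L))
          val next = SessionState(prev.count + rows.size)
          state.update(encode(next)) // bytes, not the case class, are stored
          (userId, next.count)
      }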

java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to Case class

2018-03-22 Thread Yong Zhang
I am trying to research a custom Aggregator implementation, and following the example in the Spark sample code here: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala But I cannot use it in the agg function, and g
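
For reference, a condensed sketch of that example's pattern; the usual cause of the GenericRowWithSchema cast error is invoking the Aggregator through the untyped DataFrame agg() (where it receives Row objects) instead of through a typed Dataset via .as[...] and .toColumn:

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator

    case class Employee(name: String, salary: Long)
    case class Average(var sum: Long, var count: Long)

    object MyAverage extends Aggregator[Employee, Average, Double] {
      def zero: Average = Average(0L, 0L)
      def reduce(b: Average, e: Employee): Average = {
        b.sum += e.salary; b.count += 1; b
      }
      def merge(b1: Average, b2: Average): Average = {
        b1.sum += b2.sum; b1.count += b2.count; b1
      }
      def finish(r: Average): Double = r.sum.toDouble / r.count
      def bufferEncoder: Encoder[Average] = Encoders.product
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

    // val ds = spark.read.json("employees.json").as[Employee] // typed, not Row
    // val avg = ds.select(MyAverage.toColumn.name("average_salary"))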

Transaction Example for Spark Streaming in Spark 2.2

2018-03-22 Thread KhajaAsmath Mohammed
Hi Cody, I am trying to implement the exactly-once semantics and also to store the offsets in a database. The question I have is how to use Hive instead of traditional datastores: the write to Hive will succeed even if there is an issue with saving the offsets into the DB. Could you please corr
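
For context, a sketch of the pattern being discussed (the `db` transaction helper and `transform` function are hypothetical): results and offsets are committed in one transaction, so neither can succeed without the other. A plain Hive table write is not transactional, so writing results to Hive while saving offsets elsewhere reopens exactly the failure window this pattern closes:

    import org.apache.spark.streaming.kafka010.HasOffsetRanges

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      val results = rdd.map(transform).collect() // hypothetical transform

      db.withTransaction { tx =>        // hypothetical JDBC transaction helper
        tx.insertResults(results)       // write the data...
        tx.upsertOffsets(offsetRanges)  // ...and the offsets, atomically
      }
    }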

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Shmuel Blitz
Hi Rohit, Thanks for sharing this great tool. I tried running a Spark job with the tool, but it failed with an *IncompatibleClassChangeError* exception. I have opened an issue on GitHub (https://github.com/qubole/sparklens/issues/1). Shmuel On Thu, Mar 22, 2018 at 5:05 PM, Shmuel Blitz wrote:

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Shmuel Blitz
Thanks. We will give this a try and report back. Shmuel On Thu, Mar 22, 2018 at 4:22 PM, Rohit Karlupia wrote: > Thanks everyone! > Please share how it works and how it doesn't. Both help. > > Fawaze, just made few changes to make this work with spark 1.6. Can you > please try building from br

Re: Is there a mutable dataframe spark structured streaming 2.3.0?

2018-03-22 Thread kant kodali
Thanks all! On Thu, Mar 22, 2018 at 2:08 AM, Jorge Machado wrote: > DataFrames are not mutable. > > Jorge Machado > > > On 22 Mar 2018, at 10:07, Aakash Basu wrote: > > Hey, > > I faced the same issue a couple of days back, kindly go through the mail > chain with "*Multiple Kafka Spark Streamin

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-22 Thread Rohit Karlupia
Thanks everyone! Please share how it works and how it doesn't. Both help. Fawaze, just made a few changes to make this work with Spark 1.6. Can you please try building from branch *spark_1.6*? thanks, rohitk On Thu, Mar 22, 2018 at 10:18 AM, Fawze Abujaber wrote: > It's super amazing i see

Re: Need config params while doing rdd.foreach or map

2018-03-22 Thread ayan guha
The Spark context runs in the driver, whereas the func inside foreach runs in the executors. You can pass the param into the func so it is available in the executors. On Thu, 22 Mar 2018 at 8:18 pm, Kamalanathan Venkatesan < kamalanatha...@in.ey.com> wrote: > Hello All, > > > > I have custom parameter say for examp
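
A minimal sketch of that advice (the conf key and `process` function are just examples): read the value on the driver and let the closure capture it as a plain String, instead of touching the SparkContext on executors:

    // Driver side: read the value out of the conf once.
    val fileName = sc.getConf.get("spark.myapp.inputFileName")

    rdd.foreach { record =>
      // `fileName` is a serializable String captured by the closure;
      // the SparkContext itself is never shipped to executors.
      process(record, fileName) // hypothetical per-record function
    }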

Need config params while doing rdd.foreach or map

2018-03-22 Thread Kamalanathan Venkatesan
Hello All, I have a custom parameter, say for example a file name, added to the conf of the Spark context, e.g. SparkConf.set(INPUT_FILE_NAME, fileName). I need this value inside a foreach performed on an RDD, but when I access the Spark context inside the foreach, I receive a "spark context is null" exception!

Re: Is there a mutable dataframe spark structured streaming 2.3.0?

2018-03-22 Thread Jorge Machado
DataFrames are not mutable. Jorge Machado > On 22 Mar 2018, at 10:07, Aakash Basu wrote: > > Hey, > > I faced the same issue a couple of days back, kindly go through the mail > chain with "Multiple Kafka Spark Streaming Dataframe Join query" as subject, > TD and Chris have cleared my doubts

Re: Is there a mutable dataframe spark structured streaming 2.3.0?

2018-03-22 Thread Aakash Basu
Hey, I faced the same issue a couple of days back; kindly go through the mail chain with "*Multiple Kafka Spark Streaming Dataframe Join query*" as the subject. TD and Chris have cleared my doubts; it would help you too. Thanks, Aakash. On Thu, Mar 22, 2018 at 7:50 AM, kant kodali wrote: > Hi All,

Re: Spark Druid Ingestion

2018-03-22 Thread nayan sharma
Hey Jorge, Thanks for responding. Can you elaborate on the user permission part? HDFS or local? As of now, the hdfs path -> hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_libs__8247917347016008883.zip already has complete access for the yarn user

Spark Druid Ingestion

2018-03-22 Thread nayan sharma
Hi All, Druid uses Hadoop MapReduce to ingest batch data, but I am trying Spark for ingesting data into Druid, taking reference from https://github.com/metamx/druid-spark-batch. But we are stuck at the following error. Application Log: --> 2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0] org.apach

Re: Spark Druid Ingestion

2018-03-22 Thread Jorge Machado
Seems to me like a permissions problem! Can you check your user / folder permissions? Jorge Machado > On 22 Mar 2018, at 08:21, nayan sharma wrote: > > Hi All, > As druid uses Hadoop MapReduce to ingest batch data but I am trying spark for > ingesting data into druid taking reference from