Re: how to change temp directory when spark write data ?

2018-12-05 Thread Sandip Mehta
Try the spark.local.dir property. On Wed, Dec 5, 2018 at 1:42 PM JF Chen wrote: > I have two Spark apps writing data to one directory. I notice they share > one temp directory, and the Spark app that finishes writing first will clear the temp > directory, and the slower one may throw "No lease on *** File does n
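A minimal sketch of the suggested setting, with an assumed app name and scratch path: give each application its own spark.local.dir so one job's cleanup cannot touch the other's scratch files. Note that on YARN or standalone the cluster manager's own local-dir environment settings take precedence over this property.

  import org.apache.spark.sql.SparkSession

  // Hypothetical app name and path; the property must be set before the context is created.
  val spark = SparkSession.builder()
    .appName("writer-app-1")
    .config("spark.local.dir", "/data/tmp/spark-app-1")   // per-application temp directory
    .getOrCreate()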

Re: [Structured Streaming] Reuse computation result

2018-02-01 Thread Sandip Mehta
You can use the persist() or cache() operation on the DataFrame. On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng wrote: > Hi all, > > I have a scenario like this: > > val df = dataframe.map().filter() > // agg 1 > val query1 = df.sum.writeStream.start > // agg 2 > val query2 = df.count.writeStream.start >
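A minimal sketch of the persist/cache pattern on a batch Dataset (illustrative data and names; the streaming case from the question is not shown): the shared intermediate result is persisted so both downstream aggregations reuse it instead of recomputing the map/filter chain.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.sum
  import org.apache.spark.storage.StorageLevel

  val spark = SparkSession.builder().appName("reuse-sketch").master("local[*]").getOrCreate()
  import spark.implicits._

  val df = Seq(1, 2, 3, 4, 5).toDF("value").filter($"value" > 1)
  df.persist(StorageLevel.MEMORY_AND_DISK)

  val total = df.agg(sum("value")).collect()   // first aggregation reuses the cached data
  val rows  = df.count()                       // second aggregation reuses the cached data

  df.unpersist()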

Stateful Aggregation Using flatMapGroupsWithState

2017-12-16 Thread Sandip Mehta
Hi All, I am getting the following error message while applying *flatMapGroupsWithState.* *Exception in thread "main" org.apache.spark.sql.AnalysisException: flatMapGroupsWithState in update mode is not supported with aggregation on a streaming DataFrame/Dataset;;* Following is what I am trying to d
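A minimal sketch of one arrangement the analyzer accepts (event/state types and the rate source are illustrative, not the original job): flatMapGroupsWithState in Append mode, with no aggregation applied on top of the operator.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

  case class Event(key: String, value: Long)
  case class KeyCount(key: String, count: Long)

  val spark = SparkSession.builder().appName("fmgws-sketch").master("local[*]").getOrCreate()
  import spark.implicits._

  val events = spark.readStream
    .format("rate").option("rowsPerSecond", "5").load()
    .selectExpr("CAST(value % 10 AS STRING) AS key", "value")
    .as[Event]

  val counts = events
    .groupByKey(_.key)
    .flatMapGroupsWithState[Long, KeyCount](
      OutputMode.Append, GroupStateTimeout.NoTimeout) {
        (key: String, rows: Iterator[Event], state: GroupState[Long]) =>
          val newCount = state.getOption.getOrElse(0L) + rows.size
          state.update(newCount)                 // keep a running count per key
          Iterator(KeyCount(key, newCount))      // emit the current count downstream
    }

  val query = counts.writeStream
    .outputMode(OutputMode.Append)
    .format("console")
    .start()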

Re: Row Encoder For DataSet

2017-12-10 Thread Sandip Mehta
. Regards Sandeep On Fri, Dec 8, 2017 at 12:47 PM Georg Heiler wrote: > You are looking for a UDAF. > Sandip Mehta wrote on Fri., Dec. 8, 2017 at > 06:20: > >> Hi, >> >> I want to group on certain columns and then for every group want to >> apply a custom U
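A minimal sketch of a custom UDAF using the era-appropriate Spark 2.x UserDefinedAggregateFunction API (column and field names are assumed; it simply sums one field per group):

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
  import org.apache.spark.sql.types._

  class SumVal1 extends UserDefinedAggregateFunction {
    def inputSchema: StructType = StructType(StructField("val1", IntegerType) :: Nil)
    def bufferSchema: StructType = StructType(StructField("total", LongType) :: Nil)
    def dataType: DataType = LongType
    def deterministic: Boolean = true
    def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0L
    def update(buffer: MutableAggregationBuffer, input: Row): Unit =
      if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getInt(0)
    def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
      buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    def evaluate(buffer: Row): Long = buffer.getLong(0)
  }

  // Hypothetical usage, assuming a "key" struct column and a nested "metrics.val1" field:
  // df.groupBy(df("key")).agg(new SumVal1()(df("metrics.val1")))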

Re: Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
s this you want ? > > On Fri, Dec 8, 2017 at 11:51 AM, Sandip Mehta > wrote: > >> Hi, >> >> During my aggregation I end up having following schema. >> >> Row(Row(val1,val2), Row(val1,val2,val3...)) >> >> val values = Seq( >>

Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
Hi, During my aggregation I end up having the following schema. Row(Row(val1,val2), Row(val1,val2,val3...)) val values = Seq( (Row(10, 11), Row(10, 2, 11)), (Row(10, 11), Row(10, 2, 11)), (Row(20, 11), Row(10, 2, 11)) ) The 1st tuple is used to group the relevant records for aggregation.
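A minimal sketch of the Spark 2.x-era RowEncoder approach (field names such as "key" and "metrics" are assumed): build a Dataset[Row] with the nested Row(Row(...), Row(...)) shape by supplying an explicit schema and encoder, then group on the first struct.

  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.catalyst.encoders.RowEncoder
  import org.apache.spark.sql.types._

  val spark = SparkSession.builder().appName("row-encoder-sketch").master("local[*]").getOrCreate()

  val schema = StructType(Seq(
    StructField("key", StructType(Seq(
      StructField("val1", IntegerType), StructField("val2", IntegerType)))),
    StructField("metrics", StructType(Seq(
      StructField("val1", IntegerType), StructField("val2", IntegerType),
      StructField("val3", IntegerType))))))

  val values = Seq(
    Row(Row(10, 11), Row(10, 2, 11)),
    Row(Row(10, 11), Row(10, 2, 11)),
    Row(Row(20, 11), Row(10, 2, 11)))

  val ds = spark.createDataset(values)(RowEncoder(schema))   // Dataset[Row] with nested structs
  ds.groupBy("key").count().show()                           // group on the first (key) struct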

Number Of Jobs In Spark Streaming

2016-03-04 Thread Sandip Mehta
Hi All, Is it fair to say that the number of jobs in a given Spark Streaming application is equal to the number of actions in the application? Regards Sandeep
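A minimal batch sketch of the action-to-job relationship (illustrative data): each action below triggers its own job, visible as separate jobs in the Spark UI; in a streaming application the same holds roughly per micro-batch, and some actions can launch more than one job.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("jobs-sketch").master("local[*]").getOrCreate()
  val rdd = spark.sparkContext.parallelize(1 to 100).map(_ * 2)

  val n   = rdd.count()    // action 1 -> job 1
  val top = rdd.take(5)    // action 2 -> job 2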

Spark SQL - Reading HCatalog Table

2015-12-03 Thread Sandip Mehta
Hi All, I have a table created in Hive and stored/read using HCatalog. The table is in ORC format. I want to read this table in Spark SQL and join it with RDDs. How can I connect to HCatalog and get data from Spark SQL? SM
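A minimal sketch under the assumption that the HCatalog-managed table is registered in the Hive metastore (database, table, and column names are hypothetical): a Spark 1.x HiveContext can then read the ORC data directly and join it with a DataFrame built from an RDD.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  val sc = new SparkContext(new SparkConf().setAppName("hcatalog-sketch"))
  val hiveContext = new HiveContext(sc)
  import hiveContext.implicits._

  val orcDf = hiveContext.sql("SELECT id, amount FROM mydb.orc_table")     // hypothetical table
  val rddDf = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "label")  // RDD converted to DataFrame

  orcDf.join(rddDf, "id").show()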

Re: Calculating Timeseries Aggregation

2015-11-19 Thread Sandip Mehta
set of Spark jobs that pulled the raw data from Cassandra, > computed the aggregations and more complex metrics, and wrote it back to the > relevant Cassandra tables. These jobs ran periodically every few minutes. > > Regards, > Sanket > > On Thu, Nov 19, 2015 at 8:09 AM, San

Re: Calculating Timeseries Aggregation

2015-11-18 Thread Sandip Mehta
d third approaches are more > efficient. > > TD > > > On Wed, Nov 18, 2015 at 7:15 AM, Sandip Mehta <mailto:sandip.mehta@gmail.com>> wrote: > TD thank you for your reply. > > I agree on data store requirement. I am using HBase as an underlying store. > &g

Re: Calculating Timeseries Aggregation

2015-11-18 Thread Sandip Mehta
tems. Like a database, or a key-value store. Spark Streaming would > just aggregate and push the necessary data to the data store. > > TD > > On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <mailto:sandip.mehta@gmail.com>> wrote: > Hi, > > I am working on req

Calculating Timeseries Aggregation

2015-11-14 Thread Sandip Mehta
Hi, I am working on a requirement for calculating real-time metrics and am building a prototype on Spark Streaming. I need to build aggregates at the seconds, minutes, hours and day level. I am not sure whether I should calculate all these aggregates as different windowed functions on the input DStream or sh
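A minimal sketch of the direction suggested in the thread (socket source and key/value layout are assumed): aggregate at the finest granularity with one windowed reduce and write those rows to the external store, then roll the coarser levels (hour/day) up from the stored rows rather than running one window per granularity.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("timeseries-sketch").setMaster("local[2]")
  val ssc = new StreamingContext(conf, Seconds(10))

  val metrics = ssc.socketTextStream("localhost", 9999)
    .map(_.split(","))                            // assumed record layout: "metricName,value"
    .map(fields => (fields(0), fields(1).toDouble))

  val perMinute = metrics.reduceByKeyAndWindow(_ + _, Minutes(1), Seconds(10))
  perMinute.print()                               // in practice: write to HBase/Cassandra here

  ssc.start()
  ssc.awaitTermination()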

Re: Checkpointing an InputDStream from Kafka

2015-11-07 Thread Sandip Mehta
I believe you’ll have to use another way of creating the StreamingContext, by passing a create function to the getOrCreate function. private def setupSparkContext(): StreamingContext = { val streamingSparkContext = { val sparkConf = new SparkConf().setAppName(config.appName).setMaster(config.master)
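A minimal sketch of that getOrCreate pattern (checkpoint path, app name and batch interval are assumed): the whole DStream graph, including the Kafka input, is defined inside the create function so it can be restored from the checkpoint on restart.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val checkpointDir = "/tmp/app-checkpoint"   // hypothetical path

  def createContext(): StreamingContext = {
    val sparkConf = new SparkConf().setAppName("checkpointed-app").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // define the Kafka input DStream and the processing graph here, inside the
    // create function, so they are rebuilt from the checkpoint after a restart
    ssc
  }

  val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
  ssc.start()
  ssc.awaitTermination()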

Re: [Spark Streaming] Design Patterns forEachRDD

2015-10-21 Thread Sandip Mehta
Does this help ? final JavaHBaseContext hbaseContext = new JavaHBaseContext(javaSparkContext, conf); customerModels.foreachRDD(new Function, Void>() { private static final long serialVersionUID = 1L; @Override public Void call(JavaRDD currentRDD) throws Exception { J
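A minimal Scala sketch of the general foreachRDD pattern the Java snippet above follows (socket source assumed, the HBase/external writer left abstract): per-batch work goes through foreachRDD, and connections are created inside foreachPartition so they are not serialized from the driver.

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val ssc = new StreamingContext(
    new SparkConf().setAppName("foreachrdd-sketch").setMaster("local[2]"), Seconds(5))

  val lines = ssc.socketTextStream("localhost", 9999)

  lines.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // open a connection (HBase, JDBC, ...) here, once per partition
      records.foreach(record => println(record))   // placeholder for the real write
      // close the connection here
    }
  }

  ssc.start()
  ssc.awaitTermination()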

Re: Call Site - Spark Context

2015-10-01 Thread Sandip Mehta
re in the user’s code an RDD is > created. > --- > Robin East > Spark GraphX in Action Michael Malak and Robin East > Manning Publications Co. > http://www.manning.com/books/spark-graphx-in-action > <http://www.manning.com/books/spark-graphx-
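A minimal sketch illustrating the call site (names are illustrative): it is the user-code location Spark records when an RDD is created, shown in the UI and in toDebugString; setCallSite overrides the label for subsequently created RDDs.

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("callsite-sketch").setMaster("local[*]"))

  val a = sc.parallelize(1 to 10)
  println(a.toDebugString)                     // includes the call site of parallelize

  sc.setCallSite("loading reference data")     // custom short form for the next RDDs
  val b = sc.parallelize(1 to 10)
  println(b.toDebugString)
  sc.clearCallSite()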

Call Site - Spark Context

2015-10-01 Thread Sandip Mehta
Hi, I wanted to understand: what is the purpose of the Call Site in the Spark Context? Regards SM