Look at Spark jobserver's namedRDDs, which are supposed to be thread-safe...
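(A rough illustration of the idea only: the trait and method names below follow spark-jobserver's NamedRddSupport as commonly documented, but treat the exact signatures as assumptions and check them against your jobserver version.)

```scala
// Illustrative sketch ONLY: assumes spark-jobserver's NamedRddSupport trait
// and its namedRdds helper; exact signatures may differ by version.
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

object SharedRddJob extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Cache an RDD under a name; other jobs submitted to the SAME shared
    // context can look it up by name instead of recomputing it.
    val shared = namedRdds.getOrElseCreate("shared-data", sc.parallelize(1 to 100))
    shared.sum()
  }
}
```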

2017-04-24 16:01 GMT+02:00 Hemanth Gudela <hemanth.gud...@qvantel.com>:

> Hello Gene,
>
>
>
> Thanks, but Alluxio did not solve my Spark streaming use case because my
> source parquet files in Alluxio in-memory are not "appended" but are
> periodically being "overwritten" due to the nature of the business need.
>
> Spark jobs fail when trying to read parquet files while another job is
> writing parquet files in Alluxio.
>
>
>
> Could you suggest a way to synchronize parquet reads and writes in Alluxio
> in-memory? i.e., when one Spark job is writing a dataframe as a parquet file
> in Alluxio in-memory, the other Spark jobs trying to read must wait until
> the write is finished.
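(One common way to get that effect without explicit locks, sketched here with hypothetical paths and a hypothetical Alluxio master address, and assuming `df` and `spark` from a spark-shell session: write each refresh to a temporary directory and swap it into place with a filesystem rename, so readers only ever open a fully written copy. An assumption-laden sketch, not a verified recipe.)

```scala
// Sketch: write-then-rename so readers never open a half-written parquet dir.
// The paths and the alluxio://master:19998 address are placeholders.
import org.apache.hadoop.fs.{FileSystem, Path}

val tmp  = new Path("alluxio://master:19998/data/sales_parquet_tmp")
val live = new Path("alluxio://master:19998/data/sales_parquet")

df.write.mode("overwrite").parquet(tmp.toString)  // full rewrite to a side dir

val fs = FileSystem.get(tmp.toUri, spark.sparkContext.hadoopConfiguration)
fs.delete(live, true)   // note: a small window remains between delete and rename
fs.rename(tmp, live)    // readers now see only the complete new copy
```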
>
>
>
> Thanks,
>
> Hemanth
>
>
>
> *From: *Gene Pang <gene.p...@gmail.com>
> *Date: *Monday, 24 April 2017 at 16.41
> *To: *vincent gromakowski <vincent.gromakow...@gmail.com>
> *Cc: *Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org"
> <user@spark.apache.org>, Felix Cheung <felixcheun...@hotmail.com>
>
> *Subject: *Re: Spark SQL - Global Temporary View is not behaving as
> expected
>
>
>
> As Vincent mentioned, Alluxio helps with sharing data across different
> Spark contexts. This blog post about Spark dataframes and Alluxio
> discusses that use case
> <https://alluxio.com/blog/effective-spark-dataframes-with-alluxio>.
>
>
>
> Thanks,
>
> Gene
>
>
>
> On Sat, Apr 22, 2017 at 2:14 AM, vincent gromakowski <
> vincent.gromakow...@gmail.com> wrote:
>
> Look at Alluxio or Spark jobserver for sharing data across drivers
>
>
>
> On 22 Apr 2017 at 10:24 AM, "Hemanth Gudela" <hemanth.gud...@qvantel.com>
> wrote:
>
> Thanks for your reply.
>
>
>
> Creating a table is an option, but such an approach slows down reads &
> writes for the real-time analytics streaming use case that I'm currently
> working on.
>
> If the global temporary view were accessible across sessions/Spark
> contexts, that would have simplified my use case a lot.
>
>
>
> But yeah, thanks for explaining the behavior of global temporary view, now
> it's clear :)
>
>
>
> -Hemanth
>
>
>
> *From: *Felix Cheung <felixcheun...@hotmail.com>
> *Date: *Saturday, 22 April 2017 at 11.05
> *To: *Hemanth Gudela <hemanth.gud...@qvantel.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> *Subject: *Re: Spark SQL - Global Temporary View is not behaving as
> expected
>
>
>
> Cross-session in this context means multiple Spark sessions from the same
> Spark context. Since you are running two shells, you have two different
> Spark contexts.
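(To make that concrete, a minimal spark-shell sketch: two sessions created from the same context do share global temp views, which is the "cross-session" the docs mean.)

```scala
// In ONE spark-shell: a global temp view is visible to another session
// of the SAME SparkContext, via the context-wide global_temp database.
spark.sql("SELECT 1 AS col1").createGlobalTempView("gView1")

val spark2 = spark.newSession()   // new SparkSession, same SparkContext
spark2.sql("SELECT * FROM global_temp.gView1").show()

// A second spark-shell is a separate JVM with its own SparkContext,
// so global_temp.gView1 is not defined there.
```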
>
>
>
> Do you have to use a temp view? Could you create a table instead?
>
>
>
> _____________________________
> From: Hemanth Gudela <hemanth.gud...@qvantel.com>
> Sent: Saturday, April 22, 2017 12:57 AM
> Subject: Spark SQL - Global Temporary View is not behaving as expected
> To: <user@spark.apache.org>
>
>
> Hi,
>
>
>
> According to documentation
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#global-temporary-view>,
> global temporary views are cross-session accessible.
>
>
>
> But when I try to query a global temporary view from another spark shell
> like this:
>
> *Instance 1 of spark-shell*
>
> ----------------------------------
>
> scala> spark.sql("select 1 as col1").createGlobalTempView("gView1")
>
>
>
> *Instance 2 of spark-shell *(while Instance 1 of spark-shell is still
> alive)
>
> ---------------------------------
>
> scala> spark.sql("select * from global_temp.gView1").show()
>
> org.apache.spark.sql.AnalysisException: Table or view not found:
> `global_temp`.`gView1`
>
> 'Project [*]
>
> +- 'UnresolvedRelation `global_temp`.`gView1`
>
>
>
> I am expecting that the global temporary view created in shell 1 should be
> accessible in shell 2, but it isn't!
>
> Please correct me if I am missing something here.
>
>
>
> Thanks (in advance),
>
> Hemanth
>
>
>
>
>
