Hi there,
Read your question and I do believe you are on the right path. But it could be
worth checking whether you are able to connect to the S3 bucket from your
worker nodes.
I did read that you are able to do it from your machine, but since the write
happens at the worker end, it might be worth checking there as well.
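A minimal sketch of such a check, assuming the s3a connector is configured and
using a placeholder bucket path: run a tiny read through Spark itself, so the
requests are issued from the executors rather than from your machine.

import org.apache.spark.sql.SparkSession;

public class S3WorkerCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3-worker-check")
                .getOrCreate();
        // The actual S3 reads run on the executors, so a failure here
        // points at missing credentials or connectivity on the worker
        // nodes rather than on the machine you submit from.
        long count = spark.read().text("s3a://your-bucket/some-prefix/").count();
        System.out.println("Rows read from S3: " + count);
        spark.stop();
    }
}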
I hope you are using the StreamingQuery object that Structured Streaming
returns from start(), right?
The returned object contains a lot of information about each query, and
tracking the state of that object should be helpful.
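Something like this, as a sketch (the input/output paths are placeholders):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class StreamingStatusCheck {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("streaming-status-check")
                .getOrCreate();
        Dataset<Row> lines = spark.readStream().text("s3a://your-bucket/input/");
        StreamingQuery query = lines.writeStream()
                .format("parquet")
                .option("path", "s3a://your-bucket/output/")
                .option("checkpointLocation", "s3a://your-bucket/checkpoints/")
                .start();
        System.out.println(query.status());       // current state of the query
        System.out.println(query.lastProgress()); // last micro-batch metrics (null before the first batch)
        query.awaitTermination();                 // re-throws any error hit by the sink
    }
}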
Hope this helps. If not, can you please share more details with examples?
Best,
A
--
If I understand correctly, you need to create a UDF (if you are using Java).
Implement the appropriate UDF interface, e.g. UDF1, UDF2, etc., depending on
the number of arguments, and keep this static list as a member variable in
your class.
You can then use this UDF as a filter on your stream directly.
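As a minimal sketch for the one-argument case (the allowed values here are
hypothetical):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.spark.sql.api.java.UDF1;

public class InListUdf implements UDF1<String, Boolean> {
    // The static list lives in the class, as described above.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("a", "b", "c"));

    @Override
    public Boolean call(String value) {
        return ALLOWED.contains(value);
    }
}

// Register it once, then apply it as a filter on the stream:
// spark.udf().register("inList", new InListUdf(), DataTypes.BooleanType);
// streamDf.filter(callUDF("inList", col("someColumn")));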
Oh that's easy ... just add this to the above statement for each duplicate
column:

.drop(rightDF.col("x")).drop(rightDF.col("y"))
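Putting it together, a sketch assuming leftDF and rightDF both carry the join
keys "x" and "y":

import org.apache.spark.sql.DataFrame;

public class JoinDedup {
    // Join on both keys, then drop the right-hand duplicates so the
    // result keeps a single "x" and a single "y" (the left-hand ones).
    public static DataFrame joinAndDropDuplicates(DataFrame leftDF, DataFrame rightDF) {
        return leftDF
                .join(rightDF,
                      leftDF.col("x").equalTo(rightDF.col("x"))
                          .and(leftDF.col("y").equalTo(rightDF.col("y"))),
                      "left_outer")
                .drop(rightDF.col("x"))
                .drop(rightDF.col("y"));
    }
}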
thanks!
--
Sen,
did you try this:
DataFrame joinedDf_intersect = leftDf.select("x", "y", "z")
    .join(rightDf,
          leftDf.col("x").equalTo(rightDf.col("x"))
              .and(leftDf.col("y").equalTo(rightDf.col("y"))),
          "left_outer");
Hope that helps.
On Mon, Feb 22, 2016 at 12:22 PM, praneshvyas [via Apache Spark User List] wrote:
Did you get any resolution for this?
--
--
Hi there,
I have saved my records in Parquet format and am using Spark 1.5. But when
I try to fetch the columns, it throws the exception:
java.lang.ClassCastException: java.lang.Long cannot be cast to
org.apache.spark.unsafe.types.UTF8String
This field was saved as a String while writing the Parquet file.
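A minimal way to check what type the Parquet footer actually recorded, using
the Spark 1.5 SQLContext API (the path is a placeholder):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class ParquetSchemaCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("parquet-schema-check");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // printSchema() shows the type recorded in the Parquet footer;
        // if the column shows up as "long" here even though it was meant
        // to be written as a string, that mismatch would explain the
        // ClassCastException above.
        DataFrame df = sqlContext.read().parquet("/path/to/records.parquet");
        df.printSchema();
        sc.stop();
    }
}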