[Spark][Security] UGI credentials lost between driver and executor in yarn mode

2018-03-20 Thread Ajith shetty
Hi all, I see that UGI credentials (e.g. sparkCookie) shared from driver to executor are being lost on the driver side in YARN mode. Below is the analysis on start of the Thrift server. Step 1: SparkSubmit creates the submit env, which does a loginUserFromKeytab "main@1" prio=5 tid=0x1 nid=NA runnable java.lang.

"Spark.jars not adding jars to classpath"

2018-03-20 Thread Ankit Agrahari
I am trying to add my custom jar to a Spark job using the "spark.jars" property. Although I can see in the logs that the jar is getting added, when I check the jars that are added to the classpath, I don't find it. Below are the options that I have also tried. 1) spark.jars 2) spark.driver.extraLibr

Re: Spark.ml roadmap 2.3.0 and beyond

2018-03-20 Thread Stephen Boesch
awesome thanks Joseph 2018-03-20 14:51 GMT-07:00 Joseph Bradley: > The promised roadmap JIRA: https://issues.apache.org/jira/browse/SPARK-23758 > Note it doesn't have much explicitly listed yet, but committers can add items as they agree to shepherd them. (Committers, make sure to check

Re: Spark.ml roadmap 2.3.0 and beyond

2018-03-20 Thread Joseph Bradley
The promised roadmap JIRA: https://issues.apache.org/jira/browse/SPARK-23758 Note it doesn't have much explicitly listed yet, but committers can add items as they agree to shepherd them. (Committers, make sure to check what you're currently listed as shepherding!) The links for searching can be

Re: pyspark DataFrameWriter ignores customized settings?

2018-03-20 Thread Ryan Blue
To clarify what's going on here: dfs.blocksize and dfs.block.size set the HDFS block size (the spark.hadoop. prefix adds this to the Hadoop configuration). The Parquet "block size" is more accurately called the "row group size", but is set using the unfortunately-named property parquet.block.size.
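
The distinction drawn above between the HDFS block size and the Parquet row group size can be sketched as a pair of Spark configuration entries. This is a minimal illustration, not taken from the thread: the helper name block_size_settings and the byte values are hypothetical, and it assumes both properties are passed through the spark.hadoop. prefix so that they land in the Hadoop configuration.

```python
# Sketch only: the helper name and the sizes are illustrative assumptions,
# not values recommended in the thread.
MB = 1024 * 1024


def block_size_settings(hdfs_block_bytes, parquet_row_group_bytes):
    """Return Spark conf entries for the two distinct "block size" settings."""
    return {
        # HDFS block size: how the file is split into blocks on the filesystem.
        "spark.hadoop.dfs.blocksize": str(hdfs_block_bytes),
        # Parquet "block size": really the row group size inside the file.
        "spark.hadoop.parquet.block.size": str(parquet_row_group_bytes),
    }


settings = block_size_settings(256 * MB, 128 * MB)
for key, value in sorted(settings.items()):
    print(f"{key}={value}")
```

With PySpark, entries like these would typically be applied with SparkSession.builder.config(key, value) before the session is created; whether the writer honours them in a given setup is exactly what the thread is discussing.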

Re: Beginner searching for guidance with Jira and issues

2018-03-20 Thread Joseph Torres
Hi! I can't speak for the other tasks, but SPARK-23444 I'd expect to be pretty complicated. It's not obvious what the right strategy is, and there's a bunch of minor stuff that needs to be cleaned up (e.g. tasks shouldn't print cancellation warnings when cancellation is expected). If you're inter

Beginner searching for guidance with Jira and issues

2018-03-20 Thread Efim Poberezkin
Good time of day, I'd like to contribute to Spark development, but find it difficult to get into the process. I'm somewhat overwhelmed by Spark's Jira as it's hard for me to figure out the complexity of tasks and choose an appropriate one. I've surfed Jira for some time and have selected a few i