ShuffledHashJoin's selection criteria

2018-04-23 Thread Jacek Laskowski
!RowOrdering.isOrderable(leftKeys) => How is !RowOrdering.isOrderable(leftKeys) possible in the second case? I must be missing something...again :( Please help. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spar
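One way to read the question, as a rough sketch and not the JoinSelection source: assume both ShuffledHashJoin cases carry guards of the shape `<build-side conditions> || !RowOrdering.isOrderable(leftKeys)`. Since `&&` binds tighter than `||` in Scala, the non-orderable disjunct alone would make the first guard true, so the second case would never see non-orderable keys. The names `pickCase`, `buildRightOk`, and `buildLeftOk` are hypothetical.

```scala
// Hypothetical model of two pattern-match guards that are tried in order.
// Assumption: each guard has the shape "<build-side conditions> || !orderable".
def pickCase(buildRightOk: Boolean, buildLeftOk: Boolean, orderable: Boolean): String =
  if (buildRightOk || !orderable) "first case"       // guard of the first case
  else if (buildLeftOk || !orderable) "second case"  // guard of the second case
  else "fall through"

// With non-orderable keys the first guard is true whatever the build-side flags are.
val nonOrderable =
  for (r <- Seq(true, false); l <- Seq(true, false))
    yield pickCase(r, l, orderable = false)
```

Under that assumption, the second case is reached only when the keys are orderable and its build-side conditions hold.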

[SQL] Number of buckets in metrics of FileSourceScanExec?

2018-04-20 Thread Jacek Laskowski
Hi, With bucketing support enabled by default in 2.3, I think that the number of buckets should be included in the metrics of FileSourceScanExec. WDYT? Shall I report an enhancement in JIRA?

Re: Difficulties building spark-master with sbt

2018-02-08 Thread Jacek Laskowski
Hi, s,sbt ./build/sbt,./build/sbt In other words, don't execute sbt with ./build/sbt, but ./build/sbt itself (you don't even have to install sbt to build spark as it's included in the repo and the script uses it internally)

[SQL] Why CatalogImpl.refreshTable considers views special (vs SessionCatalog.refreshTable)?

2018-02-03 Thread Jacek Laskowski
/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L483 [2] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L750-L754

Are InterpretedProjection and InterpretedMutableProjection of any use at all?

2018-02-03 Thread Jacek Laskowski
Hi, I've run across InterpretedProjection and InterpretedMutableProjection, which seem to be of no use, esp. InterpretedMutableProjection. What's their purpose in Spark SQL? Why aren't they marked as @deprecated?

Re: data source v2 online meetup

2018-02-02 Thread Jacek Laskowski
Hi Reynold, That in general is a very good idea to get the community engaged (even if most people would just listen / hide in the dark like myself). I know of no other open source project at ASF or elsewhere where such an initiative was even tried. Kudos for the idea!

[SQL] Tests for ExtractFiltersAndInnerJoins.flattenJoin

2018-01-30 Thread Jacek Laskowski
the plans. I'm wondering if I should file a task in JIRA for this or just send a pull request? I'd appreciate some guidance. [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala#L167

Nondeterministic Catalyst expressions -- trait and property?!

2018-01-29 Thread Jacek Laskowski
the trait)? Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski Mastering Spark SQL https://bit.ly/mastering-spark-sql Spark Structured Streaming https://bit.ly/spark-structured-streaming Mastering Kafka Streams https://bit.ly/mastering-kafka-streams Follow me at https://twitter.com/jaceklaskowski

Re: Why Dataset.hint uses logicalPlan (= analyzed not planWithBarrier)?

2018-01-26 Thread Jacek Laskowski
ble identifier is resolvable). That would help in understanding that part of Spark SQL a little better (i.e. writing a unit test with logical rules and such). Should I file an issue in JIRA for this? Any suggestions on how to do it the right way?

Why Dataset.hint uses logicalPlan (= analyzed not planWithBarrier)?

2018-01-25 Thread Jacek Laskowski

Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-01-21 Thread Jacek Laskowski
Hi, http://spark.apache.org/developer-tools.html#nightly-builds reads: > Spark nightly packages are available at: > Latest master build: https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest but the URL gives 404. Is this intended?

DDLUtils.isDatasourceTable vs HiveExternalCatalog.isDatasourceTable

2018-01-17 Thread Jacek Laskowski
%93#L1393

Re: Whole-stage codegen and SparkPlan.newPredicate

2018-01-16 Thread Jacek Laskowski
Thanks for looking into it, Kazuaki!

Re: Inner join with the table itself

2018-01-15 Thread Jacek Laskowski
Hi Michael, -dev +user What's the query? How do you "fool spark"?

Remove or rename? What does ResolvedDataSourceSuite test?

2018-01-13 Thread Jacek Laskowski
/ResolvedDataSourceSuite.scala

Re: Why some queries use logical.stats while others analyzed.stats?

2018-01-06 Thread Jacek Laskowski
rk/sql/catalyst/plans/logical/basicLogicalOperators.scala#L895

Why some queries use logical.stats while others analyzed.stats?

2018-01-04 Thread Jacek Laskowski
catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.default(SizeInBytesOnlyStatsPlanVisitor.scala:27)
// analyzed logical plan works fine
scala> names.queryExecution.analyzed.stats
res23: org.apache.spark.sql.catalyst.plans.logical.Statistics = Statistics(sizeInBytes=48.0 B, hints=none)

FileSystem.getContentSummary for total size stats in DetermineTableStats VS CommandUtils?

2018-01-02 Thread Jacek Laskowski
r/sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala#L66-L73 [2] https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala?utf8=%E2%9C%93#L126

Whole-stage codegen and SparkPlan.newPredicate

2017-12-30 Thread Jacek Laskowski
fun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:816) ... Is this a bug or does it work as intended? Why? [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala?utf8=%E2%9C%93#L386

Re: [01/51] [partial] spark-website git commit: 2.2.1 generated doc

2017-12-17 Thread Jacek Laskowski
Hi Sean, What does "Not all the pieces are released yet" mean if you don't mind me asking? 2.2.1 has already been announced, hasn't it? [1] [1] http://spark.apache.org/news/spark-2-2-1-released.html

Re: [01/51] [partial] spark-website git commit: 2.2.1 generated doc

2017-12-17 Thread Jacek Laskowski
, but not http://spark.apache.org/docs/latest :(

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-12 Thread Jacek Laskowski
because --> "Disable generate codegen since it fails my workload." - Wished he included the workload to showcase the issue :( Looks like there are a bunch of wise people already on it so I'll just listen...

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-11 Thread Jacek Laskowski
in whole-stage codegen it can extend CodegenSupport trait and enable accessing GenericInternalRow by turning supportCodegen flag off. I can understand how badly that can read, but without help from Spark SQL devs that's all I can figure out myself. Any help appreciated.

GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-10 Thread Jacek Laskowski
/scala/org/apache/spark/sql/execution/GenerateExec.scala#L58-L64 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L125

Re: RDD[internalRow] -> DataSet

2017-12-09 Thread Jacek Laskowski
Hi Satyajit, That's exactly what Dataset.rdd does --> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala?utf8=%E2%9C%93#L2916-L2921

Re: BUILD FAILURE due to...not found: value AnalysisBarrier in spark-catalyst_2.11?

2017-12-09 Thread Jacek Laskowski
/sql/catalyst/plans/logical/basicLogicalOperators.scala?utf8=%E2%9C%93#L890

BUILD FAILURE due to...not found: value AnalysisBarrier in spark-catalyst_2.11?

2017-12-08 Thread Jacek Laskowski
child [error] ^ [error] 8 errors found [error] Compile failed at Dec 8, 2017 5:58:10 PM [8.170s] [INFO]

Deprecating UserDefinedGenerator logical operator?

2017-12-08 Thread Jacek Laskowski
che/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2092

Re: private methods in mllib

2017-12-01 Thread Jacek Laskowski
Hi Sahm, Unless I'm mistaken [1], org.apache.spark.mllib has been put on hold and is considered @deprecated these days. That'd explain why "so many things made private". [1] https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/package.scala#L21

Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-17 Thread Jacek Laskowski
ScanExec does (and so does BroadcastExchangeExec, but that's not a data source so may have different reasons). [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L31-L32

Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-16 Thread Jacek Laskowski
xec. Could anyone explain it in more detail? I'd appreciate it.

[SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?

2017-11-15 Thread Jacek Laskowski
/localhost:4040/SQL/execution/?id=0 shows no metrics for LocalTableScan. Is this intended?

[SS] Why does StreamingQueryManager.notifyQueryTermination use id and runId (not just id)?

2017-10-27 Thread Jacek Laskowski
/streaming/StreamingQueryManager.scala#L335

Re: Structured Streaming and Hive

2017-09-30 Thread Jacek Laskowski
Hi, I'm guessing it's a timing issue. When you started the query, batch 0 either had no rows to save or had not started yet (it runs on a separate thread), so spark.sql ran once and saved nothing. You should use a foreach writer to save results to Hive instead. Jacek

Re: Welcoming Tejas Patil as a Spark committer

2017-09-30 Thread Jacek Laskowski
Hi, Oh, yeah. Seen Tejas here and there in the commits. Well deserved. Jacek On 29 Sep 2017 9:58 pm, "Matei Zaharia" wrote: Hi all, The Spark PMC recently added Tejas Patil as a committer on the project. Tejas has been contributing across several areas of Spark for a

Re: A little Scala 2.12 help

2017-09-19 Thread Jacek Laskowski
Hi, Nice catch, Sean! Learnt this today. They did say you could learn a lot with Spark! :)

Re: [SS] Bug in StreamExecution? currentBatchId and getBatchDescriptionString for web UI

2017-09-10 Thread Jacek Laskowski
Hi, Please disregard my finding. It does not seem to be a bug, just a bit of "dead code": "init" will never be displayed in the web UI since the minimum batch id can only ever be 0, and so getBatchDescriptionString could be a little "improved". Sorry for the noise.
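The "dead code" observation can be illustrated with a small stand-in. This is a sketch under assumptions, not the StreamExecution source: it assumes the description falls back to "init" only for a negative batch id, and the helper name `batchDescription` is hypothetical.

```scala
// Hypothetical stand-in for the batch part of the description shown in the web UI.
// Assumption: "init" is used only when the batch id is negative.
def batchDescription(currentBatchId: Long): String =
  if (currentBatchId < 0) "init" else currentBatchId.toString

// If the smallest batch id that ever reaches the web UI is 0,
// the "init" branch is never taken for any displayed batch.
val shown = (0L to 3L).map(batchDescription)
```

If that assumption holds, dropping the "init" branch would not change what the web UI displays.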

[SS] Bug in StreamExecution? currentBatchId and getBatchDescriptionString for web UI

2017-09-09 Thread Jacek Laskowski
05b0ad1a504e0d6213cf9d331#diff-6532dd3b63bdab0364fbcf2303e290e4R294

Re: [SS] Writing a test for a possible bug in StateStoreSaveExec with Append output mode?

2017-09-04 Thread Jacek Laskowski
the state for the key? Example's coming up.

[SS] Writing a test for a possible bug in StateStoreSaveExec with Append output mode?

2017-09-03 Thread Jacek Laskowski
close to a test and that I could use? Thanks for any help you may offer!

[SS] New numSavedStates metric for StateStoreRestoreExec for saved state?

2017-09-01 Thread Jacek Laskowski
://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L206

Fwd: [jira] [Commented] (SPARK-21728) Allow SparkSubmit to use logging

2017-08-30 Thread Jacek Laskowski
(but it was at least 2 days ago) :( I'm using the master at https://github.com/apache/spark/commit/fba9cc8466dccdcd1f6f372ea7962e7ae9e09be1.

[SS] Collapsing EventTimeWatermark logical operators?

2017-08-12 Thread Jacek Laskowski
timestamp#773,value#774L]

[SS] watermark, eventTime and "StreamExecution: Streaming query made progress"

2017-08-11 Thread Jacek Laskowski
ommit" : 22 }, "eventTime" : { "avg" : "2017-08-11T07:04:23.782Z", "max" : "2017-08-11T07:04:28.282Z", "min" : "2017-08-11T07:04:19.282Z", "watermark" : "2017-08-11T07:04:08.282Z" }, [1] h

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-08 Thread Jacek Laskowski
Hi, Congrats!! Looks like Sean is gonna be less busy these days ;-) Jacek On 7 Aug 2017 5:53 p.m., "Matei Zaharia" wrote: > Hi everyone, > > The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as > committers. Join me in congratulating both of them and

Fwd: spark git commit: [SPARK-21472][SQL] Introduce ArrowColumnVector as a reader for Arrow vectors.

2017-07-21 Thread Jacek Laskowski
. SUCCESS [01:41 min] [INFO] Spark Project SQL .. FAILURE [02:14 min] Is it only me, or do others suffer from it too?

Re: 2.2.0 under Unreleased Versions in JIRA?

2017-07-16 Thread Jacek Laskowski
Confirmed. Thanks a lot, Sean.

2.2.0 under Unreleased Versions in JIRA?

2017-07-16 Thread Jacek Laskowski
Hi, Just noticed that the 2.2.0 label is under Unreleased Versions in JIRA. Since it's out, I think only 2.2.1 and 2.3.0 are valid. Correct?

Why does Spark SQL use custom spark.sql.execution.id local property not SparkContext.setJobGroup?

2017-06-21 Thread Jacek Laskowski
/apache/spark/sql/execution/SQLExecution.scala#L63 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L265

New metrics for WindowExec with number of partitions and frames?

2017-05-26 Thread Jacek Laskowski
Hi, Currently WindowExec gives no metrics in the web UI's Details for Query page. What do you think about adding the number of partitions and frames? That could certainly be super useful, but I am unsure if that's the kind of metrics Spark SQL shows in the details.

Which one preferred -- Dataset.ofRows vs SparkSession.baseRelationToDataFrame?

2017-05-15 Thread Jacek Laskowski
t looks so similar to the others [3] [3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2940-L2942

Re: [KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-04 Thread Jacek Laskowski
https://issues.apache.org/jira/browse/SPARK-20597 I'm going to send a PR soon.

[KafkaSourceProvider] Why topic option and column without reverting to path as the least priority?

2017-05-01 Thread Jacek Laskowski
/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L145 [2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala#L163

GROUPING SETS as Dataset operator? Ordinals support?

2017-04-25 Thread Jacek Laskowski
in groupBy and orderBy, but doesn't seem supported in GROUPING SETS. What do you think about adding the features to Spark SQL?

Why separate SessionStateBuilder? (it's BaseSessionStateBuilder)

2017-04-23 Thread Jacek Laskowski

Catalyst: unary or binary expressions that are not UnaryExpressions or BinaryExpressions? Why?

2017-03-29 Thread Jacek Laskowski
! p.s. Just a side note: since Unevaluable is an Expression, why not extend Unevaluable directly? I can understand why "extends Expression with Unevaluable" could be very valuable, but I wish I could hear what the main motivation behind it was. Thanks doubled!

[SQL] Registering custom Rule[LogicalPlan] using extendedResolutionRules by overriding SparkSession, SessionState, and Analyzer only?

2017-03-23 Thread Jacek Laskowski
hub.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L107 [3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala

Re: Should we consider a Spark 2.1.1 release?

2017-03-19 Thread Jacek Laskowski
eyeballs, the fewer the mistakes. If we make very fine/minor releases often, we should be able to attract more people who spend their time on testing/verification, which eventually contributes to a higher quality of Spark.

Re: Should we consider a Spark 2.1.1 release?

2017-03-19 Thread Jacek Laskowski
+1 More smaller and more frequent releases (so major releases get even more quality). Jacek On 13 Mar 2017 8:07 p.m., "Holden Karau" wrote: > Hi Spark Devs, > > Spark 2.1 has been out since end of December >

Dynamic Allocation in Core vs YARN -- getDynamicAllocationInitialExecutors vs getInitialTargetExecutorNumber

2017-02-09 Thread Jacek Laskowski
/YarnSparkHadoopUtil.scala#L270 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L2516

Re: Google Summer of Code 2017 is coming

2017-02-03 Thread Jacek Laskowski
Thanks Sean. You've again been very helpful to put the right tone to the matters. I stand corrected and have no interest in GSoC anymore.

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-02-03 Thread Jacek Laskowski
understanding of `spark.memory.offHeap.enabled` being `false` is that it does not disable the off-heap memory used by Java NIO for buffers in shuffling, RPC, etc., so the memory is always (?) more than you request for mx using executor-memory.

Re: Remove support for Hadoop 2.5 and earlier?

2017-02-03 Thread Jacek Laskowski
Hi Sean, Given that 3.0.0 is coming, removing the unused versions would be a huge benefit from a maintenance point of view. I'd support removing support for 2.5 and earlier. Speaking of Hadoop support, is anyone considering 3.0.0 support? Can't find any JIRA for this.

Fwd: Google Summer of Code 2017 is coming

2017-02-03 Thread Jacek Laskowski
Hi, Is this something Spark is considering? Would be nice to mark issues as GSoC in JIRA and solicit feedback. What do you think?

Re: Typo on spark.apache.org? "cyclic data flow"

2017-01-28 Thread Jacek Laskowski
Hi Nicholas, Interesting. Just this past Monday I was introducing Spark and ran into it, but thought it was my poor English skills :-) Thanks for spotting it! (I also think that the entire welcome page begs for a facelift - it's from pre-2.0 days) Jacek

Re: Why two makeOffers in CoarseGrainedSchedulerBackend? Duplication?

2017-01-26 Thread Jacek Laskowski
Hi Imran, Ok, that makes sense for performance reasons. Thanks for bearing with me and explaining that code with so much patience. Appreciated!

Re: Why two makeOffers in CoarseGrainedSchedulerBackend? Duplication?

2017-01-26 Thread Jacek Laskowski
hand, since no one has considered it a small duplication it could be perfectly fine (it did make the code a bit less obvious to me).

Why two makeOffers in CoarseGrainedSchedulerBackend? Duplication?

2017-01-26 Thread Jacek Laskowski
/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L211 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L229

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Jacek Laskowski
Wow! At long last. Congrats Burak and Holden! p.s. I was a bit worried that the process of accepting new committers is as hard as passing Sean's sanity checks for PRs, but given this it's so much easier it seems :D

[YARN] $ and $$ in prepareCommand to resolve environment in ExecutorRunnable?

2017-01-24 Thread Jacek Laskowski
rc/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L210

clientMode in RpcEnv.create in Spark on YARN vs general case (driver vs executors)?

2017-01-18 Thread Jacek Laskowski
eploy/yarn/ApplicationMaster.scala#L434 [3] https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L254

Re: RpcEnv(Factory) is no longer pluggable? spark.rpc is gone, isn't it?

2017-01-18 Thread Jacek Laskowski
On Wed, Jan 18, 2017 at 8:57 AM, Jacek Laskowski <ja...@japila.pl> wrote: > p.s. How to know when the deprecation was introduced? The last change > is for executor blacklisting so git blame does not show what I want :( > Any ideas? Figured that out myself! $ git log --topo-orde

RpcEnv(Factory) is no longer pluggable? spark.rpc is gone, isn't it?

2017-01-18 Thread Jacek Laskowski
rg/apache/spark/SparkConf.scala#L641 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/RpcEnv.scala#L32

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
Hi Sean, Can you elaborate on "it's actually used by Spark"? Where exactly? I'd like to be corrected. What about the scaladoc? Since the method's a public API, I think it should be fixed, shouldn't it?

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
epending on it (unless we go through a > deprecation process for it). > > Regards, > Mridul > > > On Sat, Jan 14, 2017 at 2:02 AM, Jacek Laskowski <ja...@japila.pl> wrote: > > Hi, > > > > Just noticed that TaskContext#getPartitionId [1] is not used an

What about removing TaskContext#getPartitionId?

2017-01-14 Thread Jacek Laskowski
ala/org/apache/spark/TaskContext.scala#L41 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ForeachSink.scala#L50

scala.MatchError: scala.collection.immutable.Range.Inclusive from catalyst.ScalaReflection.serializerFor?

2017-01-09 Thread Jacek Laskowski
ion.

protected val mapStatuses is ConcurrentHashMap in both MapOutputTrackerMaster and MapOutputTrackerWorker?

2017-01-08 Thread Jacek Laskowski
/apache/spark/blob/master/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L84

Re: Quick request: prolific PR openers, review your open PRs

2017-01-08 Thread Jacek Laskowski
+1 What an excellent way to offload some of your chores! I've so much to learn from you, Sean! (Now since Sean seems to have a bit more time I'm gonna send a few PRs hoping he spares some time to find merits in them :))

Re: Why ShuffleMapTask has transient locs and preferredLocs?!

2017-01-04 Thread Jacek Laskowski
a lot. On to digging deeper...

Re: What is mainly different from a UDT and a spark internal type that ExpressionEncoder recognized?

2017-01-03 Thread Jacek Laskowski
Thanks Herman for the explanation. I silently assume that the other points were ok since you did not object? Correct?

Re: What is mainly different from a UDT and a spark internal type that ExpressionEncoder recognized?

2017-01-03 Thread Jacek Laskowski
for sharing your notes! Gonna merge yours with mine! Thanks.

Why ShuffleMapTask has transient locs and preferredLocs?!

2017-01-03 Thread Jacek Laskowski
(and BlockManagerMaster on the driver) to track the shuffle locations (MapStatuses)? Is my understanding correct? What am I missing? (I'm exploring shuffle system currently and would appreciate comments a lot!) Thanks!

Re: [ANNOUNCE] Announcing Apache Spark 2.1.0

2016-12-29 Thread Jacek Laskowski
Hi Yan, I was surprised when I first noticed that rxin had stepped back and a new release manager had stepped in. Congrats on your first ANNOUNCE! I can only expect even more great stuff coming in to Spark from the dev team after Reynold spared some time. Can't wait to read the changes...

Why is spark.shuffle.sort.bypassMergeThreshold 200?

2016-12-28 Thread Jacek Laskowski
> spark.range(5).groupByKey(_ % 5).count.rdd.getNumPartitions res3: Int = 200 I'd appreciate any guidance to get the gist of this seemingly magic number. Thanks! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-sp

Why ShuffleManager.registerShuffle takes shuffleId since ShuffleDependency has it too?

2016-12-28 Thread Jacek Laskowski
/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala#L35 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Jacek Laskowski
Thanks a LOT, Michael! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Mon, Dec 26, 2016 at 10:04 PM, Michael Gummelt <mgumm...@mesosphere.io>

Re: Mesos Spark Fine Grained Execution - CPU count

2016-12-26 Thread Jacek Laskowski
Hi Michael, That caught my attention... Could you please elaborate on "elastically grow and shrink CPU usage" and how it really works under the covers? It seems that CPU usage is just a "label" for an executor on Mesos. Where's this in the code? Pozdrawiam, Jacek

Use of BroadcastFactory interface (after SPARK-12588 Remove HTTPBroadcast)

2016-11-27 Thread Jacek Laskowski
(and hence Broadcast) in. WDYT? [1] https://issues.apache.org/jira/browse/SPARK-12588 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/BroadcastFactory.scala#L25-L30 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering

Re: Analyzing and reusing cached Datasets

2016-11-20 Thread Jacek Laskowski
it anyway to hunt down the "issue")? 2. Defining an override for sameResult in Range (as LocalRelation and other logical operators)? Somehow I feel Spark could do better. Please guide (and help me get better at this low-level infra of Spark SQL). Thanks! Pozdrawiam, Jacek Laskowski

Analyzing and reusing cached Datasets

2016-11-19 Thread Jacek Laskowski
nge (0, 1, step=1, splits=Some(8)) == Physical Plan == *Project [id#26L, id#26L AS new#29L] +- *Range (0, 1, step=1, splits=Some(8)) Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at ht

On the use of catalyst.dsl package and deserialize vs CatalystSerde.deserialize

2016-11-13 Thread Jacek Laskowski
/object.scala#L32 [4] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2498 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com

withExpr private method duplication in Column and functions objects?

2016-11-11 Thread Jacek Laskowski
/org/apache/spark/sql/Column.scala#L152 [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L60 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me

[info] Warning: Unknown ScalaCheck args provided: -oDF

2016-10-29 Thread Jacek Laskowski
Hi, Just noticed the messages from the recent build of my pull request in Jenkins: [info] Warning: Unknown ScalaCheck args provided: -oDF I think we should fix it, right? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering

Redundant method in SparkUI and entire SparkUITab?

2016-10-23 Thread Jacek Laskowski
comments to learn Spark better. Thanks. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

DAGScheduler.handleJobCancellation uses jobIdToStageIds for verification while jobIdToActiveJob for lookup?

2016-10-13 Thread Jacek Laskowski
/spark/scheduler/DAGScheduler.scala#L1372 [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1376 Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow

Dynamic allocation / killing executors work? Perhaps it's just web UI?

2016-09-29 Thread Jacek Laskowski
cutors. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

Re: [VOTE] Release Apache Spark 2.0.1 (RC2)

2016-09-29 Thread Jacek Laskowski
that code does not get compiled unless you enable the profile explicitly. I've learnt it's not part of the release, though. Thanks for all the clarifications! I appreciate your patience dealing with my questions a lot! Thanks. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering

Should LeafExpression have children final override (like Nondeterministic)?

2016-09-27 Thread Jacek Laskowski
is that LeafExpression is to mark leaf expressions so children is assumed to be Nil. Should children be final in LeafExpression? Why not? #curious Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https

Re: [VOTE] Release Apache Spark 2.0.1 (RC3)

2016-09-25 Thread Jacek Laskowski
+1 Ship it! Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Sun, Sep 25, 2016 at 12:08 AM, Reynold Xin <r...@databricks.com> wrote: > Please vote on
