Re: Debugging Spark itself in standalone cluster mode

2016-06-30 Thread nirandap
Guys, aren't the TaskScheduler and DAGScheduler residing in the SparkContext? So the debug configs need to be set in the JVM where the SparkContext is running? [1] But yes, I agree: if you really need to check the execution, you need to set those configs in the executors [2] [1]

Re: Logical Plan

2016-06-30 Thread Mich Talebzadeh
I don't think the Spark optimizer supports something like a statement cache, where the plan is cached and bind variables (as in an RDBMS) are used for different values, thus saving the parsing. What you're stating is that the source and tempTable change but the plan itself remains the same. I have not seen this

Re: Logical Plan

2016-06-30 Thread Reynold Xin
drop user@spark and keep only dev@ This would be a great thing to figure out, if you have time. Two things that would be great to try: 1. See how this works on Spark 2.0. 2. If it is slow, try the following: org.apache.spark.sql.catalyst.rules.RuleExecutor.resetTime() // run your query
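The quoted call can be sketched as a spark-shell session. The message is truncated, so the follow-up call shown here, `dumpTimeSpent()`, is an assumption based on the `RuleExecutor` companion object in Spark 2.0, not something stated in the thread:

```scala
import org.apache.spark.sql.catalyst.rules.RuleExecutor

// Clear the per-rule timing counters before the run under test.
RuleExecutor.resetTime()

// ... run the query whose planning time you want to measure,
// e.g. spark.sql("SELECT ...").collect() ...

// Print the cumulative time spent in each analyzer/optimizer rule,
// to see which rules dominate plan creation for a wide (700-column) schema.
println(RuleExecutor.dumpTimeSpent())
```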

Re: Logical Plan

2016-06-30 Thread Reynold Xin
Which version are you using here? If the underlying files change, technically we should go through optimization again. Perhaps the real "fix" is to figure out why logical plan creation is so slow for 700 columns. On Thu, Jun 30, 2016 at 1:58 PM, Darshan Singh wrote: >

Re: Debugging Spark itself in standalone cluster mode

2016-06-30 Thread Reynold Xin
Yes, scheduling is centralized in the driver. For debugging, I think you'd want to set the executor JVM flags, not the worker JVM flags. On Thu, Jun 30, 2016 at 11:36 AM, cbruegg wrote: > Hello everyone, > > I'm a student assistant in research at the University of Paderborn,
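A minimal sketch of the distinction: the debug agent goes into `spark.executor.extraJavaOptions` (and/or `spark.driver.extraJavaOptions` for the scheduling code), not into the worker daemon's JVM options. The ports and `suspend` settings below are illustrative choices, not from the thread:

```shell
# Attach a remote debugger to each executor JVM (not the worker daemon).
# suspend=y makes the executor block until a debugger connects on port 5005.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5006" \
  --class org.example.MyApp myapp.jar
```

Note that with multiple executors per host, a fixed debug port will collide; this sketch assumes one executor per machine.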

Debugging Spark itself in standalone cluster mode

2016-06-30 Thread cbruegg
Hello everyone, I'm a student assistant in research at the University of Paderborn, working on integrating Spark (v1.6.2) with a new network resource management system. I have already taken a deep dive into the source code of spark-core w.r.t. its scheduling systems. We are running a cluster in

Re: branch-2.0 build failure

2016-06-30 Thread Pete Robbins
Ok, thanks. I'll await it appearing. On Thu, 30 Jun 2016 at 14:51 Sean Owen wrote: > TD has literally just merged the fix. > > On Thu, Jun 30, 2016 at 2:37 PM, Pete Robbins wrote: > > Our build on branch-2.0 is failing after the PR for updating kafka to

Re: branch-2.0 build failure

2016-06-30 Thread Sean Owen
TD has literally just merged the fix. On Thu, Jun 30, 2016 at 2:37 PM, Pete Robbins wrote: > Our build on branch-2.0 is failing after the PR for updating kafka to 0.10. > The new kafka pom.xml files are naming the parent version as 2.0.0-SNAPSHOT > but the branch 2.0 poms

branch-2.0 build failure

2016-06-30 Thread Pete Robbins
Our build on branch-2.0 is failing after the PR for updating kafka to 0.10. The new kafka pom.xml files are naming the parent version as 2.0.0-SNAPSHOT but the branch 2.0 poms have been updated to 2.0.1-SNAPSHOT after the rc1 cut. Shouldn't the pom versions remain as 2.0.0-SNAPSHOT until a 2.0.0
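The mismatch described above would look like this in one of the new Kafka module poms (the versions are those named in the message; the parent artifactId is an assumption, and the merged fix is not quoted here, so treat this only as an illustration of the inconsistency):

```xml
<!-- New kafka-0-10 module pom: parent pinned to 2.0.0-SNAPSHOT -->
<parent>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-parent_2.11</artifactId>
  <version>2.0.0-SNAPSHOT</version>
</parent>

<!-- ...but the other branch-2.0 poms had moved to 2.0.1-SNAPSHOT after
     the rc1 cut, so the reactor cannot resolve the parent version. -->
```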

Re: Bitmap Indexing to increase OLAP query performance

2016-06-30 Thread Michael Allman
Hi Nishadi, I have not seen bloom filters in Spark. They are mentioned as part of the Orc file format, but I don't know if Spark uses them: https://orc.apache.org/docs/spec-index.html. Parquet has block-level min/max values, null counts, etc. for leaf columns in its metadata. I don't believe
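As background on what a file-format bloom filter buys a reader (a sketch of the general technique, not Spark or ORC code): the reader probes the filter before scanning a stripe, and a negative answer proves the value is absent, so the stripe can be skipped safely.

```scala
import scala.util.hashing.MurmurHash3

// Minimal bloom filter: k hash positions over an m-bit array.
// False positives are possible, false negatives are not -- exactly
// the property that makes stripe-skipping safe.
class BloomFilter(m: Int, k: Int) {
  private val bits = new java.util.BitSet(m)

  private def positions(value: String): Seq[Int] =
    (0 until k).map { seed =>
      val h = MurmurHash3.stringHash(value, seed)
      ((h % m) + m) % m // map the hash into [0, m)
    }

  def add(value: String): Unit = positions(value).foreach(p => bits.set(p))

  // false => value definitely absent; true => value may be present.
  def mightContain(value: String): Boolean =
    positions(value).forall(p => bits.get(p))
}

val bf = new BloomFilter(m = 1024, k = 3)
Seq("alice", "bob").foreach(bf.add)
assert(bf.mightContain("alice")) // inserted values always pass
// A probe for an absent value like "carol" will almost certainly
// return false, letting the reader skip that stripe entirely.
```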

Re: Spark 2.0 Performance drop

2016-06-30 Thread Maciej BryƄski
I filed two JIRAs. 1) Performance when querying nested columns https://issues.apache.org/jira/browse/SPARK-16320 2) Pyspark performance https://issues.apache.org/jira/browse/SPARK-16321 I found existing JIRAs for: 1) PPD on nested columns https://issues.apache.org/jira/browse/SPARK-5151 2) Drop of support

Re: Bitmap Indexing to increase OLAP query performance

2016-06-30 Thread Nishadi Kirielle
Thank you for the response. Can I please know the reason why bitmap indexes are not appropriate for big data? Rather than using traditional bitmap indexing techniques, we are planning to implement a combination of novel bitmap indexing techniques such as bit-sliced indexes and projection indexes.
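For readers unfamiliar with the term: a bit-sliced index stores one bitmap per bit position of the encoded value, rather than one bitmap per distinct value, so a column of values up to 2^b needs only b bitmaps. A hypothetical sketch of the idea (not from the thread, and not Spark code):

```scala
// Bit-sliced index over small non-negative integers: slices(b) marks the
// rows whose value has bit b set. Only ceil(log2(maxValue + 1)) bitmaps
// are needed, instead of one bitmap per distinct value.
class BitSlicedIndex(values: Array[Int], bitsPerValue: Int) {
  private val slices: Array[java.util.BitSet] =
    Array.fill(bitsPerValue)(new java.util.BitSet(values.length))

  for (row <- values.indices; b <- 0 until bitsPerValue)
    if (((values(row) >> b) & 1) == 1) slices(b).set(row)

  // Reconstruct a row's value from the slices (equality and range
  // predicates are evaluated with bitwise operations on the slices).
  def valueAt(row: Int): Int =
    (0 until bitsPerValue).foldLeft(0) { (acc, b) =>
      if (slices(b).get(row)) acc | (1 << b) else acc
    }
}

val idx = new BitSlicedIndex(Array(5, 3, 7, 0), bitsPerValue = 3)
assert(idx.valueAt(0) == 5)
assert(idx.valueAt(2) == 7)
```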