Right, that makes sense and I understood that.
The thing I'm wondering about (and I think the answer is 'no' at this stage): when the optimizer is running and pushing predicates down, does it take into account indexing and other storage-layer strategies in determining which predicates are processed where?
here it is below, Gary
filtered_df = spark.hiveContext.sql("""
    SELECT
        *
    FROM
        df
    WHERE
        type = 'type'
        AND action = 'action'
        AND audited_changes LIKE '---\ncompany_id:\n- %'
""")
filtered_df.registerTempTable("filtered_df")
you are using HQL to read
Ok, so when Spark is forming queries it's ignorant of the underlying storage layer's indexes.
If there is an index on a table, Spark doesn't take that into account when doing the predicate pushdown in optimization. In that case, why does Spark push two of my conditions (where fieldx = 'action') to the database?
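To illustrate the point being discussed: which predicates get pushed depends on whether the data source can translate a given filter into SQL, not on any index. Below is a toy sketch of that translation step; the function name and tuple encoding are made up for illustration (loosely inspired by the filter-compilation idea in Spark's JDBC source, not its actual code):

```python
# Toy sketch: a JDBC-style source renders each filter it understands as a
# SQL fragment; anything it cannot translate stays behind to be evaluated
# by the engine itself. Indexes play no role in this decision.

def compile_filter(f):
    """Return a SQL WHERE fragment for a supported filter, else None."""
    kind = f[0]
    if kind == "EqualTo":            # e.g. action = 'action'
        _, col, val = f
        return f"{col} = '{val}'"
    if kind == "StringStartsWith":   # e.g. a LIKE with only a trailing %
        _, col, prefix = f
        return f"{col} LIKE '{prefix}%'"
    return None                      # not pushable: evaluated in the engine

filters = [
    ("EqualTo", "type", "type"),
    ("EqualTo", "action", "action"),
    ("StringStartsWith", "audited_changes", "---"),
]
pushed = []
for f in filters:
    sql = compile_filter(f)
    if sql is not None:
        pushed.append(sql)

where = " AND ".join(pushed)
print(where)
```

So whether two or three of the conditions end up in the generated query is a question of filter translatability in that Spark version, not of what the database has indexed.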
remember, your indexes are in the RDBMS, in this case MySQL. When you are reading from that table you have an 'id' column, which I assume is an integer, and you are making parallel threads through the JDBC connection to that table. You can see the threads in MySQL if you query it; you can see multiple threads.
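This is the key point of the exchange: the index decision happens inside the database, not in Spark. A standalone sketch using the stdlib sqlite3 module as a stand-in for MySQL (table and index names are invented): whatever predicate arrives via the pushed-down query, it is the database's own planner that chooses the index.

```python
import sqlite3

# In-memory database standing in for the MySQL side of the JDBC connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audits (id INTEGER PRIMARY KEY, action TEXT)")
conn.execute("CREATE INDEX idx_audits_action ON audits (action)")

# The pushed-down predicate reaches the database as plain SQL; asking the
# database for its plan shows *it* decided to use the index, with no input
# from the client that sent the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM audits WHERE action = 'action'"
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # the plan mentions idx_audits_action
```

The same applies to the parallel JDBC reads: each partition's query is planned independently by MySQL, which is also why the threads show up in its process list.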
If the underlying table(s) have indexes on them, does Spark use those indexes to optimize the query?
I.e., if a table in my JDBC data source (MySQL in this case) had several indexes and my query was filtering on one of the fields with an index, would Spark know to push that predicate to the database?
sorry, what do you mean your JDBC table has an index on it? Where are you reading the data from the table?
I assume you are referring to the "id" column on the table that you are reading through the JDBC connection.
Then you are creating a temp table called "df". That temp table is created in temporary workspace.
I.e.: if my JDBC table has an index on it, will the optimizer consider that when pushing predicates down?
I noticed in a query like this:
df = spark.hiveContext.read.jdbc(
    url=jdbc_url,
    table="schema.table",
    column="id",
    lowerBound=lower_bound_id,
    upperBound=upper_bound_id,
    numPartitions=num_partitions,
)
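For context on what that call does with the column/bounds arguments: Spark splits the id range into one WHERE clause per partition, and each clause becomes its own query and connection, which is the source of the multiple threads visible in MySQL. A rough sketch (not Spark's actual code, and ignoring its edge-case handling) of that partitioning logic:

```python
# Rough sketch of how a partitioned JDBC read turns
# column/lowerBound/upperBound/numPartitions into per-partition predicates.

def partition_predicates(column, lower, upper, num_partitions):
    stride = (upper - lower) // num_partitions
    preds = []
    for i in range(num_partitions):
        lo = lower + i * stride
        if i == 0:
            # First partition also sweeps up NULLs and anything below lower.
            preds.append(f"{column} < {lo + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended so nothing above upper is lost.
            preds.append(f"{column} >= {lo}")
        else:
            preds.append(f"{column} >= {lo} AND {column} < {lo + stride}")
    return preds

for p in partition_predicates("id", 1, 1000, 4):
    print(p)
```

Note the bounds only shape the partitioning; they do not filter rows, and none of this consults the database's indexes.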