Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-10-02 Thread Ilya Kasnacheev
Hello! I suggest that you check those possibilities out: Does performance increase dramatically if you need it on 10% of data, i.e., ~1 million records? Does something change when you have only one client connected? Note that I was running this example on a single node so it should not be hard

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-28 Thread Ray
Actually there's only one row in b. SELECT COUNT(*) FROM b where x = '1'; COUNT(*) 1 1 row selected (0.003 seconds) Maybe the join performance drops dramatically when the data size is more than 10 million or the cluster has a lot of clients connected? My 6 node cluster has 10 clients

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-28 Thread ilya.kasnacheev
Hello! I have indeed tried a use case like yours: 0: jdbc:ignite:thin://127.0.0.1/> create index on b(x,y); No rows affected (9,729 seconds) 0: jdbc:ignite:thin://127.0.0.1/> select count(*) from a; COUNT(*) 1 1 row selected (0,017 seconds) 0: jdbc:ignite:thin://127.0.0.1/> select count(*) from

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-28 Thread Ray
Here's the detailed information for my join test. 0: jdbc:ignite:thin://sap-datanode6/> select * from a; x 1 y 1 A bearbrick 1 row selected (0.002 seconds) 0: jdbc:ignite:thin://sap-datanode6/> select count(*) from b; COUNT(*) 14337959 1 row selected (0.299 seconds) 0:

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-25 Thread vkulichenko
Ray, This sounds suspicious. Please show your configuration and the execution plan for the query. -Val -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-25 Thread Ilya Kasnacheev
Hello! Can you show the index that you are creating here? Regards, -- Ilya Kasnacheev On Tue, 25 Sep 2018 at 8:23, Ray wrote: > Let's say I have two tables I want to join together. > Table a has around 10 million rows and its primary key is x and y. > I have created an index on fields x and y

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-24 Thread Ray
Let's say I have two tables I want to join together. Table a has around 10 million rows and its primary key is x and y. I have created an index on fields x and y for table a. Table b has one row and its primary key is x and y. The primary key for that row in table b has a corresponding row in

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-20 Thread vkulichenko
If the join is indexed and collocated, it can still be pretty fast. Do you have a particular query that is slower with optimization than without? -Val

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-20 Thread Ray
Hi Val, thanks for the reply. I'll try again and let you know if I missed something. By "Ignite is not optimized for join", I mean that Ignite currently only supports nested loop join, which is very inefficient when joining two large tables. Please refer to these two tickets for details.
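
To make the concern about nested loop joins concrete, here is a minimal sketch in plain Java. It is not Ignite code and does not reflect how Ignite executes SQL; the class `JoinSketch`, its methods, and the `int[]{x, y}` row layout are hypothetical names chosen for illustration. It only counts how much work a naive nested loop join does compared with a hash join on the same two-column key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only, not Ignite internals. A row is an int[]{x, y}. */
final class JoinSketch {
    /** Key comparisons (nested loop) or build/probe steps (hash) done by the last join. */
    static long work;

    /** Nested loop join: every outer row is compared with every inner row. */
    static List<int[][]> nestedLoopJoin(List<int[]> outer, List<int[]> inner) {
        work = 0;
        List<int[][]> result = new ArrayList<>();
        for (int[] o : outer) {
            for (int[] i : inner) {
                work++;
                if (o[0] == i[0] && o[1] == i[1]) {
                    result.add(new int[][] {o, i});
                }
            }
        }
        return result;
    }

    /** Hash join: build a map over the inner rows once, then probe it per outer row. */
    static List<int[][]> hashJoin(List<int[]> outer, List<int[]> inner) {
        work = 0;
        Map<Long, List<int[]>> byKey = new HashMap<>();
        for (int[] i : inner) {
            work++;
            byKey.computeIfAbsent(key(i), k -> new ArrayList<>()).add(i);
        }
        List<int[][]> result = new ArrayList<>();
        for (int[] o : outer) {
            work++;
            for (int[] i : byKey.getOrDefault(key(o), Collections.emptyList())) {
                result.add(new int[][] {o, i});
            }
        }
        return result;
    }

    /** Packs the (x, y) pair into a single long so it can be used as a map key. */
    static long key(int[] row) {
        return ((long) row[0] << 32) | (row[1] & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        int n = 1000;
        List<int[]> a = new ArrayList<>();
        List<int[]> b = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            a.add(new int[] {k, k});
            b.add(new int[] {k, k});
        }
        int matches = nestedLoopJoin(a, b).size();
        System.out.println("nested loop: matches=" + matches + " work=" + work);
        matches = hashJoin(a, b).size();
        System.out.println("hash join:   matches=" + matches + " work=" + work);
    }
}
```

With two inputs of n rows each, the nested loop variant performs n * n comparisons while the hash variant performs 2 * n build and probe steps. An index on the join columns changes this picture, since each outer row can then look up its matches instead of scanning the whole inner table.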

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-19 Thread vkulichenko
Ray, Per my understanding, pushdown filters are propagated to Ignite either way; it's not related to the "optimization". Optimization affects joins, groupings, aggregations, etc. So, unless I'm missing something, the behavior you're looking for is achieved by setting

Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-19 Thread aealexsandrov
Hi, I am not sure that it will work, but you can try the following: SparkSession spark = SparkSession.builder().appName("SomeAppName").master("spark://10.0.75.1:7077").config(OPTION_DISABLE_SPARK_SQL_OPTIMIZATION, "false") //or true

Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-18 Thread Ray
Currently, the OPTION_DISABLE_SPARK_SQL_OPTIMIZATION option can only be set at the Spark session level. This means I can have either Ignite optimization or Spark optimization for a given Spark job, but not both. Let's say I want to load data into Spark memory with pushdown filters using Ignite optimization. For example, I