Hi all,

Could someone please explain in which cases it makes sense to use a REPLICATED
cache/table?

I thought I would use REPLICATED for small tables that contain related
information, such as a city table with a few tens of thousands of entries.

And use PARTITIONED for big tables, such as a persons table with millions of
entries.

My hope was that, with the small tables REPLICATED and therefore available on
each node, joins would not need network traffic and queries would be
faster.
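
To make that expectation concrete, here is a toy model in plain Java. It is
not Ignite code and says nothing about how Ignite actually plans or executes
joins. It only counts how many person rows would find their city row on a
different node under a simple modulo placement rule. The class name, the
method names, the placement rule, the node count and the table sizes are all
assumptions made up for illustration.

```java
import java.util.Random;

// Toy model only: NOT Ignite code. All names and numbers are hypothetical.
class JoinTrafficSketch {

    // Node that owns a key under a simple modulo rule (an assumption;
    // a real affinity function is more involved).
    static int ownerNode(long key, int nodes) {
        return (int) Math.floorMod(key, (long) nodes);
    }

    // Counts person rows whose city row lives on a different node.
    // With a replicated city table every node holds every city, so the
    // count is zero by construction.
    static long remoteLookups(long[] personIds, int[] cityIds,
                              int nodes, boolean cityReplicated) {
        if (cityReplicated) {
            return 0;
        }
        long remote = 0;
        for (int i = 0; i < personIds.length; i++) {
            int personNode = ownerNode(personIds[i], nodes);
            int cityNode = ownerNode(cityIds[i], nodes);
            if (personNode != cityNode) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        // Illustrative sizes: 4 nodes, 1 million persons, 30k cities.
        int nodes = 4, persons = 1_000_000, cities = 30_000;
        Random rnd = new Random(42);
        long[] personIds = new long[persons];
        int[] cityIds = new int[persons];
        for (int i = 0; i < persons; i++) {
            personIds[i] = i;
            cityIds[i] = rnd.nextInt(cities); // random current city per person
        }
        System.out.println("city PARTITIONED, remote lookups: "
                + remoteLookups(personIds, cityIds, nodes, false));
        System.out.println("city REPLICATED, remote lookups:  "
                + remoteLookups(personIds, cityIds, nodes, true));
    }
}
```

In this model the REPLICATED case never needs a remote lookup, which is why I
expected the join against the replicated city table to be the faster one.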

But surprisingly, the opposite happened. With one table REPLICATED, the join
was slower than with two PARTITIONED tables.

Am I thinking about this the wrong way?

A small example:

Table 1, persons: big data set = PARTITIONED
"CREATE TABLE IF NOT EXISTS Persons" +
                        "(id BIGINT, " +
                        "firstname varchar(20), " +
                        "lastname varchar(20), " +
                        "age int, " +
                        "currentcityid int, " +
                        "borncityid int, " +
                        "gender varchar(1), " +
                        "PRIMARY KEY(id)) " +
                        "WITH \"template=PARTITIONED, backups=0\"";



Table 2, city: small data set = REPLICATED
"CREATE TABLE IF NOT EXISTS city " +
                            "(id int PRIMARY KEY, " +
                            "name varchar(40), " +
                            "lat double, " +
                            "lng double, " +
                            "country varchar(60), " +
                            "capital varchar(10)) " +
                            "WITH \"template=REPLICATED, backups=0\"";

Query:
SELECT count(*)
FROM "cachePerson".person AS p
LEFT JOIN "cityCache".city AS c ON p.CURRENTCITYID = c.id
WHERE p.age < 50;

This query is faster when both tables are PARTITIONED,
while my logic tells me it should be faster if one is REPLICATED, since that
would avoid network traffic during joins.

Can someone help me understand this?

br
  David






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
