Hi Denis and Stephen, Thanks a lot for the great information and suggestions. I think we are clear on the directions we are heading, but just one more question on the calcite engine Stephen mentioned. From what I read this is replacing the current H2-based engine starting from Ignite 2.13, and from this official document <https://calcite.apache.org/docs/reference.html> from Calcite I can see that the "PARTITION BY" keyword is supported by calcite. However I don't see it on the Ignite DDL document <https://ignite.apache.org/docs/latest/sql-reference/ddl>even for 2.13 which has the calcite engine. Does that mean not all the features from calcite would be enabled/supported by Ignite, or if that is simply missing from the documentation?
Thanks Mike On Wed, Jul 20, 2022 at 10:25 AM Denis Magda <dma...@apache.org> wrote: > Hey folks, let me chime in to clarify the matters. What Wen Bo and > Mengliao are really asking for is a combination of two features > "partitioning + sharding". > > In the relational world (Postgres, MySQL, Oracle) partitioning is used to > split a large table into smaller tables (called partitions). For example, > imagine you have table PizzaOrders and you want the database to keep > DELIVERED orders in one partition and all the others in a different one. > You can easily achieve this by partitioning the primary PizzaOrders table > by the order status into PizzaOrdersDelivered and PizzaOrdersOther. With > those partitions in place, the SQL engine can apply the partition pruning > optimization [1] and, as Megliao highlighted, it comes with data management > benefits [2]. > > This has nothing to do with the partitioning in Ignite. What Ignite does > is sharding - distributing table data across a cluster of nodes. > Distributed databases that support both partitioning and sharding do exist > and usually, they are built on Postgres or MySQL (and have nothing to do > with in-memory computing). For instance, YugabyteDB can partition your > primary PizzaOrders table first into PizzaOrdersDelivered and > PizzaOrdersOther, and then have those partitioned tables sharded > automatically across the cluster. > > As long as Ignite doesn't have the partitioning feature of relational > databases, you have these options: > > 1. Use affinity keys in Ignite as you would use partition keys in > Postgres/MySQL. But remember that all the data that matches an affinity key > will be stored together on a single Ignite node. It might be a capacity > problem if there are way too many records that belong to the affinity key. > 2. Implement the Postgres/MySQL-like partitioning at the application > layer. Create Ignite tables for each logical partition, intercept user > queries and, depending on the value of a partitioning column, place a > record in one of the Ignite tables. Then Ignite will take care of the next > step - sharding. > > > [1] > https://dmagda.hashnode.dev/optimizing-application-queries-with-partition-pruning > [2] > https://dmagda.hashnode.dev/managing-data-placement-with-table-partitioning > > -- > Denis > > On Wed, Jul 20, 2022 at 6:35 AM Stephen Darlington < > stephen.darling...@gridgain.com> wrote: > >> Ignite’s SQL is ANSI 99 compliant. Windowing functions such as PARTITION >> BY came in SQL 2003 (and later). It’s possible that the new Calcite engine ( >> sql-calcite <https://ignite.apache.org/docs/latest/SQL/sql-calcite>) >> supports the keywords, but I have not checked. >> >> >> - While querying we can only scan a small portion of the data to >> improve performance >> >> As you suggested: indexes. >> >> >> - Quickly and safely manage data in one partition in particular. For >> example, in some RDBMS you can build index or compress data for only one >> partition, or delete one partition without locking other partitions being >> updated >> >> In general, traditional databases increase performance by grouping >> related stuff together. Ignite increases performance by distributing the >> data across multiple machines, which allows tasks to be parallelised. A >> different architecture results in different solutions. >> >> >> - Partitioning on multiple columns >> >> That’s an affinity key. But as I noted previously, you don’t want to do >> that if you only have three distinct values. >> >> Regards, >> Stephen >> >> On 18 Jul 2022, at 16:53, Mengliao(Mike) Wang <mengliaow...@geotab.com> >> wrote: >> >> Hi Stephen, >> >> What we are looking for is the table partition with SQL in particular, >> instead of the data partition people mostly refer to in Ignite which is >> more from the infrastructure perspective. A.k.a the "PARTITION BY" keyword >> in traditional RDBMS. In the Ignite official document ( >> https://ignite.apache.org/docs/latest/SQL/schemas) we didn't see >> anything like that, so not sure if there is anything in Ignite that could >> achieve these: >> >> - While querying we can only scan a small portion of the data to >> improve performance >> - Quickly and safely manage data in one partition in particular. For >> example, in some RDBMS you can build index or compress data for only one >> partition, or delete one partition without locking other partitions being >> updated >> - Partitioning on multiple columns >> >> >> Thanks >> Mike >> >> On Thu, Jul 14, 2022 at 9:05 AM Stephen Darlington < >> stephen.darling...@gridgain.com> wrote: >> >>> As you say, partitions in Ignite are about the distribution of data. You >>> can group together related data using affinity keys, but if you only have >>> three distinct values that would be a really bad idea. You can’t change the >>> number of partitions after a table has been created. >>> >>> Either of your other solutions would work but, to be honest, I’m not >>> completely sure what problem you’re trying to solve. >>> >>> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <wenb...@geotab.com> wrote: >>> >>> Hi, >>> >>> The traditional RDBMS has the concept of partitioning a table into >>> different chunks, but that isn't really partitioning data to different >>> nodes as described in the Ignite document. Our team is trying to partition >>> a table based on the values of one column and query data based on these >>> values. For example, there are 3 different values in our partitioned >>> column, A, B and C, and we want to get all data that belong to C and don't >>> want to read anything that belong to A and B. >>> >>> We have a few ideas on doing this as indicated below: >>> >>> - Create separate tables for A, B and C >>> - Use index for the partitioned column >>> - Use affinity key for the partitioned column (this is more related >>> to if the data are on the same node) >>> >>> I am curious if the above 3 approaches are valid or if there is another >>> way to do this? Is it possible to do the ALTER command in the RDBMS to add >>> partitions? Thanks. >>> >>> Regards, >>> Bill >>> >>> >>> >>