Hey folks, let me chime in to clarify matters. What Wen Bo and Mengliao are really asking for is a combination of two features: partitioning and sharding.

In the relational world (Postgres, MySQL, Oracle), partitioning is used to split a large table into smaller tables (called partitions). For example, imagine you have a PizzaOrders table and you want the database to keep DELIVERED orders in one partition and all the others in a different one. You can easily achieve this by partitioning the primary PizzaOrders table by the order status into PizzaOrdersDelivered and PizzaOrdersOther. With those partitions in place, the SQL engine can apply the partition pruning optimization [1] and, as Mengliao highlighted, partitioning also comes with data management benefits [2].

This has nothing to do with partitioning in Ignite. What Ignite does is sharding: distributing table data across a cluster of nodes.

Distributed databases that support both partitioning and sharding do exist, and they are usually built on Postgres or MySQL (and have nothing to do with in-memory computing). For instance, YugabyteDB can first partition your primary PizzaOrders table into PizzaOrdersDelivered and PizzaOrdersOther, and then have those partitions sharded automatically across the cluster.

As long as Ignite doesn't have the partitioning feature of relational databases, you have these options:

1. Use affinity keys in Ignite as you would use partition keys in Postgres/MySQL. But remember that all the data that matches an affinity key value will be stored together on a single Ignite node. That can become a capacity problem if too many records share the same affinity key value.

2. Implement Postgres/MySQL-like partitioning at the application layer. Create an Ignite table for each logical partition, intercept user queries and, depending on the value of the partitioning column, place a record in one of the Ignite tables. Then Ignite will take care of the next step: sharding.
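
To make option 2 a bit more concrete, here is a minimal sketch of what the application-layer routing could look like. It is only an illustration under a few assumptions: PizzaOrderRouter is a hypothetical helper (not an Ignite API), the id/status/total columns are made up, and the two tables are assumed to already exist as regular Ignite tables. The application asks the helper which table a record belongs to and then runs the resulting SQL through whatever Ignite client it already uses.

```java
import java.util.Locale;

// Hypothetical helper, not part of the Ignite API. It picks the Ignite table
// that acts as the logical partition for a given order status.
final class PizzaOrderRouter {
    // Table names follow the PizzaOrders example; each one is assumed to be
    // an ordinary Ignite table that Ignite shards across the cluster.
    static final String DELIVERED_TABLE = "PizzaOrdersDelivered";
    static final String OTHER_TABLE = "PizzaOrdersOther";

    private PizzaOrderRouter() {
    }

    // Maps the partitioning column value (the order status) to a table name.
    static String tableFor(String status) {
        if (status == null) {
            throw new IllegalArgumentException("status must not be null");
        }
        String normalized = status.trim().toUpperCase(Locale.ROOT);
        return "DELIVERED".equals(normalized) ? DELIVERED_TABLE : OTHER_TABLE;
    }

    // Builds a parameterized INSERT for the table that owns this status.
    // The column list (id, status, total) is an assumption for illustration.
    static String insertSql(String status) {
        return "INSERT INTO " + tableFor(status)
            + " (id, status, total) VALUES (?, ?, ?)";
    }

    // Builds a SELECT that touches only one logical partition, which is the
    // application-level counterpart of partition pruning.
    static String selectByStatusSql(String status) {
        return "SELECT id, status, total FROM " + tableFor(status)
            + " WHERE status = ?";
    }
}
```

With this in place, PizzaOrderRouter.selectByStatusSql("DELIVERED") reads only from PizzaOrdersDelivered. Note that a status change (say, from an in-progress status to DELIVERED) means the application has to move the record between tables itself, which a relational database with native partitioning would do for you.
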

[1] https://dmagda.hashnode.dev/optimizing-application-queries-with-partition-pruning
[2] https://dmagda.hashnode.dev/managing-data-placement-with-table-partitioning

--
Denis

On Wed, Jul 20, 2022 at 6:35 AM Stephen Darlington <[email protected]> wrote:

> Ignite’s SQL is ANSI 99 compliant. Windowing functions such as PARTITION
> BY came in SQL 2003 (and later). It’s possible that the new Calcite engine
> (sql-calcite <https://ignite.apache.org/docs/latest/SQL/sql-calcite>)
> supports the keywords, but I have not checked.
>
>    - While querying we can only scan a small portion of the data to
>    improve performance
>
> As you suggested: indexes.
>
>    - Quickly and safely manage data in one partition in particular. For
>    example, in some RDBMS you can build index or compress data for only one
>    partition, or delete one partition without locking other partitions being
>    updated
>
> In general, traditional databases increase performance by grouping related
> stuff together. Ignite increases performance by distributing the data
> across multiple machines, which allows tasks to be parallelised. A
> different architecture results in different solutions.
>
>    - Partitioning on multiple columns
>
> That’s an affinity key. But as I noted previously, you don’t want to do
> that if you only have three distinct values.
>
> Regards,
> Stephen
>
> On 18 Jul 2022, at 16:53, Mengliao(Mike) Wang <[email protected]> wrote:
>
> Hi Stephen,
>
> What we are looking for is the table partition with SQL in particular,
> instead of the data partition people mostly refer to in Ignite which is
> more from the infrastructure perspective. A.k.a the "PARTITION BY" keyword
> in traditional RDBMS. In the Ignite official document (
> https://ignite.apache.org/docs/latest/SQL/schemas) we didn't see anything
> like that, so not sure if there is anything in Ignite that could achieve
> these:
>
>    - While querying we can only scan a small portion of the data to
>    improve performance
>    - Quickly and safely manage data in one partition in particular. For
>    example, in some RDBMS you can build index or compress data for only one
>    partition, or delete one partition without locking other partitions being
>    updated
>    - Partitioning on multiple columns
>
> Thanks
> Mike
>
> On Thu, Jul 14, 2022 at 9:05 AM Stephen Darlington <[email protected]> wrote:
>
>> As you say, partitions in Ignite are about the distribution of data. You
>> can group together related data using affinity keys, but if you only have
>> three distinct values that would be a really bad idea. You can’t change the
>> number of partitions after a table has been created.
>>
>> Either of your other solutions would work but, to be honest, I’m not
>> completely sure what problem you’re trying to solve.
>>
>> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <[email protected]> wrote:
>>
>> Hi,
>>
>> The traditional RDBMS has the concept of partitioning a table into
>> different chunks, but that isn't really partitioning data to different
>> nodes as described in the Ignite document. Our team is trying to partition
>> a table based on the values of one column and query data based on these
>> values. For example, there are 3 different values in our partitioned
>> column, A, B and C, and we want to get all data that belong to C and don't
>> want to read anything that belong to A and B.
>>
>> We have a few ideas on doing this as indicated below:
>>
>>    - Create separate tables for A, B and C
>>    - Use index for the partitioned column
>>    - Use affinity key for the partitioned column (this is more related
>>    to if the data are on the same node)
>>
>> I am curious if the above 3 approaches are valid or if there is another
>> way to do this? Is it possible to do the ALTER command in the RDBMS to add
>> partitions? Thanks.
>>
>> Regards,
>> Bill
