Hey folks, let me chime in to clarify matters. What Wen Bo and Mengliao
are really asking for is a combination of two features: "partitioning +
sharding".

In the relational world (Postgres, MySQL, Oracle) partitioning is used to
split a large table into smaller tables (called partitions). For example,
imagine you have table PizzaOrders and you want the database to keep
DELIVERED orders in one partition and all the others in a different one.
You can easily achieve this by partitioning the primary PizzaOrders table
by the order status into PizzaOrdersDelivered and PizzaOrdersOther. With
those partitions in place, the SQL engine can apply the partition pruning
optimization [1] and, as Mengliao highlighted, it comes with data management
benefits [2].
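
To make this concrete, here is a rough sketch of list partitioning in
Postgres (the schema and column names are made up for this example; note
that Postgres requires the partition key, status here, to be part of the
primary key):

CREATE TABLE PizzaOrders (
    id      bigint,
    status  text NOT NULL,
    total   numeric,
    PRIMARY KEY (id, status)   -- the partition key must be part of the PK
) PARTITION BY LIST (status);

-- One partition for delivered orders, a default partition for the rest
CREATE TABLE PizzaOrdersDelivered PARTITION OF PizzaOrders
    FOR VALUES IN ('DELIVERED');
CREATE TABLE PizzaOrdersOther PARTITION OF PizzaOrders DEFAULT;

-- A query that filters on the status scans only the matching partition
-- thanks to partition pruning [1]:
SELECT * FROM PizzaOrders WHERE status = 'DELIVERED';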

This has nothing to do with the partitioning in Ignite. What Ignite does is
sharding - distributing table data across a cluster of nodes. Distributed
databases that support both partitioning and sharding do exist, and they are
usually built on Postgres or MySQL (and have nothing to do with in-memory
computing). For instance, YugabyteDB can partition your primary PizzaOrders
table first into PizzaOrdersDelivered and PizzaOrdersOther, and then have
those partitioned tables sharded automatically across the cluster.

Until Ignite supports the partitioning feature of relational
databases, you have these options:

   1. Use affinity keys in Ignite as you would use partition keys in
   Postgres/MySQL. But remember that all the data sharing an affinity key
   value will be stored together on a single Ignite node. This can become a
   capacity problem if too many records share the same affinity key value
   (see option 1 in the sketch after this list).
   2. Implement Postgres/MySQL-like partitioning at the application
   layer. Create an Ignite table for each logical partition, intercept user
   queries and, depending on the value of the partitioning column, route each
   record to one of those tables. Ignite then takes care of the next
   step - sharding (see option 2 in the same sketch).
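
Here is a rough sketch of both options in Ignite SQL (the schema and names
are illustrative, not a recommendation; note that in Ignite the affinity
key column has to be part of the primary key):

-- Option 1: affinity key. All rows with the same status are stored on the
-- same node, so a query filtering on status touches a single node - but
-- that node also holds every row for that status (the capacity concern
-- mentioned above).
CREATE TABLE PizzaOrders (
    id      BIGINT,
    status  VARCHAR,
    total   DECIMAL,
    PRIMARY KEY (id, status)
) WITH "template=partitioned,backups=1,affinity_key=status";

-- Option 2: application-level partitioning. One Ignite table per logical
-- partition; the application routes each record to the right table based
-- on its status, and Ignite shards each table across the cluster.
CREATE TABLE PizzaOrdersDelivered (
    id      BIGINT PRIMARY KEY,
    total   DECIMAL
) WITH "template=partitioned,backups=1";

CREATE TABLE PizzaOrdersOther (
    id      BIGINT,
    status  VARCHAR,
    total   DECIMAL,
    PRIMARY KEY (id)
) WITH "template=partitioned,backups=1";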


[1] https://dmagda.hashnode.dev/optimizing-application-queries-with-partition-pruning
[2] https://dmagda.hashnode.dev/managing-data-placement-with-table-partitioning

--
Denis

On Wed, Jul 20, 2022 at 6:35 AM Stephen Darlington <
[email protected]> wrote:

> Ignite’s SQL is ANSI 99 compliant. Windowing functions such as PARTITION
> BY came in SQL 2003 (and later). It’s possible that the new Calcite engine (
> sql-calcite <https://ignite.apache.org/docs/latest/SQL/sql-calcite>)
> supports the keywords, but I have not checked.
>
>
>    - While querying we can only scan a small portion of the data to
>    improve performance
>
> As you suggested: indexes.
>
>
>    - Quickly and safely manage data in one partition in particular. For
>    example, in some RDBMS you can build index or compress data for only one
>    partition, or delete one partition without locking other partitions being
>    updated
>
> In general, traditional databases increase performance by grouping related
> stuff together. Ignite increases performance by distributing the data
> across multiple machines, which allows tasks to be parallelised. A
> different architecture results in different solutions.
>
>
>    - Partitioning on multiple columns
>
> That’s an affinity key. But as I noted previously, you don’t want to do
> that if you only have three distinct values.
>
> Regards,
> Stephen
>
> On 18 Jul 2022, at 16:53, Mengliao(Mike) Wang <[email protected]>
> wrote:
>
> Hi Stephen,
>
> What we are looking for is the table partition with SQL in particular,
> instead of the data partition people mostly refer to in Ignite which is
> more from the infrastructure perspective. A.k.a the "PARTITION BY" keyword
> in traditional RDBMS. In the Ignite official document (
> https://ignite.apache.org/docs/latest/SQL/schemas) we didn't see anything
> like that, so not sure if there is anything in Ignite that could achieve
> these:
>
>    - While querying we can only scan a small portion of the data to
>    improve performance
>    - Quickly and safely manage data in one partition in particular. For
>    example, in some RDBMS you can build index or compress data for only one
>    partition, or delete one partition without locking other partitions being
>    updated
>    - Partitioning on multiple columns
>
>
> Thanks
> Mike
>
> On Thu, Jul 14, 2022 at 9:05 AM Stephen Darlington <
> [email protected]> wrote:
>
>> As you say, partitions in Ignite are about the distribution of data. You
>> can group together related data using affinity keys, but if you only have
>> three distinct values that would be a really bad idea. You can’t change the
>> number of partitions after a table has been created.
>>
>> Either of your other solutions would work but, to be honest, I’m not
>> completely sure what problem you’re trying to solve.
>>
>> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <[email protected]> wrote:
>>
>> Hi,
>>
>> The traditional RDBMS has the concept of partitioning a table into
>> different chunks, but that isn't really partitioning data to different
>> nodes as described in the Ignite document. Our team is trying to partition
>> a table based on the values of one column and query data based on these
>> values. For example, there are 3 different values in our partitioned
>> column, A, B and C, and we want to get all data that belong to C and don't
>> want to read anything that belong to A and B.
>>
>> We have a few ideas on doing this as indicated below:
>>
>>    - Create separate tables for A, B and C
>>    - Use index for the partitioned column
>>    - Use affinity key for the partitioned column (this is more related
>>    to if the data are on the same node)
>>
>> I am curious if the above 3 approaches are valid or if there is another
>> way to do this? Is it possible to do the ALTER command in the RDBMS to add
>> partitions? Thanks.
>>
>> Regards,
>> Bill
>>
>>
>>
>
