Hi Denis and Stephen,

Thanks a lot for the great information and suggestions. I think we are
clear on the directions we are heading, but just one more question on the
calcite engine Stephen mentioned. From what I read this is replacing the
current H2-based engine starting from Ignite 2.13, and from this official
document <https://calcite.apache.org/docs/reference.html> from Calcite I
can see that the "PARTITION BY" keyword is supported by calcite. However I
don't see it on the Ignite DDL document
<https://ignite.apache.org/docs/latest/sql-reference/ddl>even for 2.13
which has the calcite engine. Does that mean not all the features from
calcite would be enabled/supported by Ignite, or if that is simply missing
from the documentation?

Thanks
Mike


On Wed, Jul 20, 2022 at 10:25 AM Denis Magda <dma...@apache.org> wrote:

> Hey folks, let me chime in to clarify the matters. What Wen Bo and
> Mengliao are really asking for is a combination of two features
> "partitioning + sharding".
>
> In the relational world (Postgres, MySQL, Oracle) partitioning is used to
> split a large table into smaller tables (called partitions). For example,
> imagine you have table PizzaOrders and you want the database to keep
> DELIVERED orders in one partition and all the others in a different one.
> You can easily achieve this by partitioning the primary PizzaOrders table
> by the order status into PizzaOrdersDelivered and PizzaOrdersOther. With
> those partitions in place, the SQL engine can apply the partition pruning
> optimization [1] and, as Megliao highlighted, it comes with data management
> benefits [2].
>
> This has nothing to do with the partitioning in Ignite. What Ignite does
> is sharding - distributing table data across a cluster of nodes.
> Distributed databases that support both partitioning and sharding do exist
> and usually, they are built on Postgres or MySQL (and have nothing to do
> with in-memory computing). For instance, YugabyteDB can partition your
> primary PizzaOrders table first into PizzaOrdersDelivered and
> PizzaOrdersOther, and then have those partitioned tables sharded
> automatically across the cluster.
>
> As long as Ignite doesn't have the partitioning feature of relational
> databases, you have these options:
>
>    1. Use affinity keys in Ignite as you would use partition keys in
>    Postgres/MySQL. But remember that all the data that matches an affinity key
>    will be stored together on a single Ignite node. It might be a capacity
>    problem if there are way too many records that belong to the affinity key.
>    2. Implement the Postgres/MySQL-like partitioning at the application
>    layer. Create Ignite tables for each logical partition, intercept user
>    queries and, depending on the value of a partitioning column, place a
>    record in one of the Ignite tables. Then Ignite will take care of the next
>    step - sharding.
>
>
> [1]
> https://dmagda.hashnode.dev/optimizing-application-queries-with-partition-pruning
> [2]
> https://dmagda.hashnode.dev/managing-data-placement-with-table-partitioning
>
> --
> Denis
>
> On Wed, Jul 20, 2022 at 6:35 AM Stephen Darlington <
> stephen.darling...@gridgain.com> wrote:
>
>> Ignite’s SQL is ANSI 99 compliant. Windowing functions such as PARTITION
>> BY came in SQL 2003 (and later). It’s possible that the new Calcite engine (
>> sql-calcite <https://ignite.apache.org/docs/latest/SQL/sql-calcite>)
>> supports the keywords, but I have not checked.
>>
>>
>>    - While querying we can only scan a small portion of the data to
>>    improve performance
>>
>> As you suggested: indexes.
>>
>>
>>    - Quickly and safely manage data in one partition in particular. For
>>    example, in some RDBMS you can build index or compress data for only one
>>    partition, or delete one partition without locking other partitions being
>>    updated
>>
>> In general, traditional databases increase performance by grouping
>> related stuff together. Ignite increases performance by distributing the
>> data across multiple machines, which allows tasks to be parallelised. A
>> different architecture results in different solutions.
>>
>>
>>    - Partitioning on multiple columns
>>
>> That’s an affinity key. But as I noted previously, you don’t want to do
>> that if you only have three distinct values.
>>
>> Regards,
>> Stephen
>>
>> On 18 Jul 2022, at 16:53, Mengliao(Mike) Wang <mengliaow...@geotab.com>
>> wrote:
>>
>> Hi Stephen,
>>
>> What we are looking for is the table partition with SQL in particular,
>> instead of the data partition people mostly refer to in Ignite which is
>> more from the infrastructure perspective. A.k.a the "PARTITION BY" keyword
>> in traditional RDBMS. In the Ignite official document (
>> https://ignite.apache.org/docs/latest/SQL/schemas) we didn't see
>> anything like that, so not sure if there is anything in Ignite that could
>> achieve these:
>>
>>    - While querying we can only scan a small portion of the data to
>>    improve performance
>>    - Quickly and safely manage data in one partition in particular. For
>>    example, in some RDBMS you can build index or compress data for only one
>>    partition, or delete one partition without locking other partitions being
>>    updated
>>    - Partitioning on multiple columns
>>
>>
>> Thanks
>> Mike
>>
>> On Thu, Jul 14, 2022 at 9:05 AM Stephen Darlington <
>> stephen.darling...@gridgain.com> wrote:
>>
>>> As you say, partitions in Ignite are about the distribution of data. You
>>> can group together related data using affinity keys, but if you only have
>>> three distinct values that would be a really bad idea. You can’t change the
>>> number of partitions after a table has been created.
>>>
>>> Either of your other solutions would work but, to be honest, I’m not
>>> completely sure what problem you’re trying to solve.
>>>
>>> On 13 Jul 2022, at 19:24, Wen Bo (Bill) Li <wenb...@geotab.com> wrote:
>>>
>>> Hi,
>>>
>>> The traditional RDBMS has the concept of partitioning a table into
>>> different chunks, but that isn't really partitioning data to different
>>> nodes as described in the Ignite document. Our team is trying to partition
>>> a table based on the values of one column and query data based on these
>>> values. For example, there are 3 different values in our partitioned
>>> column, A, B and C, and we want to get all data that belong to C and don't
>>> want to read anything that belong to A and B.
>>>
>>> We have a few ideas on doing this as indicated below:
>>>
>>>    - Create separate tables for A, B and C
>>>    - Use index for the partitioned column
>>>    - Use affinity key for the partitioned column (this is more related
>>>    to if the data are on the same node)
>>>
>>> I am curious if the above 3 approaches are valid or if there is another
>>> way to do this? Is it possible to do the ALTER command in the RDBMS to add
>>> partitions? Thanks.
>>>
>>> Regards,
>>> Bill
>>>
>>>
>>>
>>

Reply via email to