Hello Ilya,

There are parallel streams inserting data for all the countries into
different nodes (and caches), and there are parallel queries against the
distributed database for different countries that aggregate the data, in
some cases insert results back, and in other cases return results to the
caller. Yes, for a given query only one or two caches might get hit. But
if the volume of data for a given city is too big, the query might hit
multiple caches; hence my question: how do I keep these caches as close
to each other as possible?

What would be some of the ways to minimize the network hops? How can I
keep data with the same affinity as close together as possible, preferably
on the same physical node or on neighboring nodes (but across multiple
Ignite nodes and caches)?
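
For reference, here is a rough sketch of the kind of colocation I have in
mind; the class and field names (SaleKey, CityKey, countryId) are just
placeholders for my actual model:

import org.apache.ignite.cache.affinity.AffinityKeyMapped;

// Hypothetical key for the sales cache: entries are partitioned by
// countryId, so all sales for one country map to the same partition.
class SaleKey {
    private long saleId;

    @AffinityKeyMapped
    private String countryId;

    SaleKey(long saleId, String countryId) {
        this.saleId = saleId;
        this.countryId = countryId;
    }
}

// Hypothetical key for the city cache, using the same affinity field so
// that a country's city rows land on the same node as that country's sales.
class CityKey {
    private String cityId;

    @AffinityKeyMapped
    private String countryId;

    CityKey(String cityId, String countryId) {
        this.cityId = cityId;
        this.countryId = countryId;
    }
}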

Thanks, and I am sorry for dragging this on.


On Fri, Aug 23, 2019 at 5:19 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
wrote:

> Hello!
>
> I don't think that partitioning by country or city is a good idea, since
> this distribution will be very uneven.
>
> You can have different ways of minimizing network hops while keeping the
> distributed nature of your database. A database isn't really distributed
> when, for a given city's query, only one node takes all the load and the
> rest sit idle.
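>
> One standard option is collocated compute: send the closure to the node
> that owns the key instead of pulling data across the network. A rough
> sketch, with a made-up cache name and key:
>
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.Ignition;
>
> public class CollocatedAggregation {
>     public static void main(String[] args) {
>         Ignite ignite = Ignition.start();
>
>         // The closure runs on whichever node owns the partition for key
>         // "FR", so it aggregates local data instead of pulling it over
>         // the wire.
>         ignite.compute().affinityRun("SalesCache", "FR", () -> {
>             IgniteCache<Object, Object> sales =
>                 Ignition.localIgnite().cache("SalesCache");
>             // ... iterate over local entries of SalesCache and aggregate ...
>         });
>     }
> }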
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Fri, 23 Aug 2019 at 13:15, narges saleh <snarges...@gmail.com>:
>
>> Hello Ilya,
>>  I agree with you that partitioning based on month was a bad example,
>> because most partitions would be idle. Country or customer is a better
>> example of my case: there are a limited number of them, but they are
>> disproportionate in size and they are always active. Let's take the
>> country example. I need to search and aggregate the volume of sales in
>> each city and by country, and I have a couple of hundred countries.
>> Let me ask a basic question. If my queries/aggregations are based on
>> cities and countries, do I need to partition based on countries (or even
>> cities)? I want to avoid network hops for my searches and aggregations
>> as much as possible (I do not want to slow writes either, but I am aware
>> of the trade-off between reads/writes and replication/partitioning).
>> What do I define my affinity key on, and what do I partition on?
>>
>> Thanks again for your help.
>>
>> On Fri, Aug 23, 2019 at 4:03 AM Ilya Kasnacheev <
>> ilya.kasnach...@gmail.com> wrote:
>>
>>> Hello!
>>>
>>> Partitioning based on, let's say, user id is usually fair, because
>>> there are usually hundreds of thousands of users and none of them owns
>>> a disproportionate amount of data.
>>>
>>> Partitioning by month is especially bad, since in a given month all
>>> partitions will be basically idle save for one, and there will be a lot
>>> of contention.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> Thu, 22 Aug 2019 at 19:31, narges saleh <snarges...@gmail.com>:
>>>
>>>> I am not sure you can find real-world examples where caches can be
>>>> evenly partitioned if the partitioning factor is an affinity key. I am
>>>> comparing this with partitioning in relational databases, say
>>>> partitioning based on month of the year. I definitely don't have 100s
>>>> of departments, but I do have 10s of departments, and departments are
>>>> very disproportionate in size.
>>>> As for the rebalancing case, pods will be added to the system as the
>>>> volume increases, so I'd assume that would prompt Ignite to rebalance.
>>>>
>>>> On Thu, Aug 22, 2019 at 11:00 AM Ilya Kasnacheev <
>>>> ilya.kasnach...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> 1) No. Ignite only rebalances data when nodes are joining or leaving
>>>>> the cluster.
>>>>> 2) Ignite's affinity is not really well suited to such detailed manual
>>>>> assignment. It is assumed that your cache has a large number of
>>>>> partitions (e.g. 1024) and that data is distributed evenly across all
>>>>> partitions. Having department as the affinity key is suboptimal because
>>>>> there are not many departments and they usually vary in size. That's
>>>>> the kind of distribution that you want to avoid.
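>>>>>
>>>>> As a rough sketch (cache and class names are made up), a cache with
>>>>> the default rendezvous affinity function and an explicit partition
>>>>> count would look like this, with a fine-grained field such as user id
>>>>> as the affinity key rather than department:
>>>>>
>>>>> import org.apache.ignite.cache.affinity.AffinityKeyMapped;
>>>>> import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
>>>>> import org.apache.ignite.configuration.CacheConfiguration;
>>>>>
>>>>> public class OrderCacheConfig {
>>>>>     // Many distinct user ids keep the 1024 partitions roughly even,
>>>>>     // unlike a handful of departments of very different sizes.
>>>>>     public static class OrderKey {
>>>>>         long orderId;
>>>>>
>>>>>         @AffinityKeyMapped
>>>>>         long userId;
>>>>>     }
>>>>>
>>>>>     public static CacheConfiguration<OrderKey, Object> ordersCache() {
>>>>>         CacheConfiguration<OrderKey, Object> cfg =
>>>>>             new CacheConfiguration<>("orders");
>>>>>         // Rendezvous affinity with 1024 partitions (excludeNeighbors = false).
>>>>>         cfg.setAffinity(new RendezvousAffinityFunction(false, 1024));
>>>>>         return cfg;
>>>>>     }
>>>>> }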
>>>>>
>>>>> Regards,
>>>>> --
>>>>> Ilya Kasnacheev
>>>>>
>>>>>
>>>>> Thu, 22 Aug 2019 at 18:37, narges saleh <snarges...@gmail.com>:
>>>>>
>>>>>> Thanks, Ilya, for the replies.
>>>>>> 1) Doesn't Ignite rebalance the nodes if there are additional nodes
>>>>>> available and the data doesn't fit the current Ignite node's cache?
>>>>>> Consider a scenario where I have 100 pods on a physical node, assuming
>>>>>> pod = Ignite node.
>>>>>> 2) I am not sure what you mean by confining half of a cache to one
>>>>>> cluster group and the other half to another. If my affinity key is
>>>>>> department id, why can't I have department A in a partitioned cache,
>>>>>> with one partition on one node in cluster A and the other partition on
>>>>>> another node in another cluster?
>>>>>>
>>>>>> I might be misunderstanding the whole thing, and I'd appreciate
>>>>>> clarification.
>>>>>>
>>>>>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <
>>>>>> ilya.kasnach...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> 1) When there is an overflow, either page eviction kicks in or, if
>>>>>>> it is disabled, you get an IgniteOOM, after which the node is no
>>>>>>> longer usable. Please avoid overflowing any data region, since there
>>>>>>> is no graceful handling currently.
>>>>>>> 2) I don't think so. You can't easily confine half of a cache's data
>>>>>>> to one cluster group and the other half to another group.
>>>>>>>
>>>>>>> Such scenarios are not recommended. We expect all partitions to have
>>>>>>> the same amount of data, not a few gargantuan partitions that don't
>>>>>>> fit on a single node.
>>>>>>>
>>>>>>> Regards,
>>>>>>> --
>>>>>>> Ilya Kasnacheev
>>>>>>>
>>>>>>>
>>>>>>> Tue, 20 Aug 2019 at 06:29, narges saleh <snarges...@gmail.com>:
>>>>>>>
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>> I'd appreciate your answers to my questions.
>>>>>>>>
>>>>>>>> 1) Assuming I use an affinity key across 4 caches and they all end
>>>>>>>> up on the same Ignite node: what happens when there is an overflow?
>>>>>>>> Does the overflow data end up on a newly joined node? How do I keep
>>>>>>>> the related data from all the caches close to each other when the
>>>>>>>> volume exceeds a single node?
>>>>>>>>
>>>>>>>> 2) Is there a concept of cluster affinity, meaning a cluster group
>>>>>>>> defined based on some affinity key? For example, if I have two
>>>>>>>> departments A and B, can I have a cluster group for department A and
>>>>>>>> another for department B?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Narges
>>>>>>>>
>>>>>>>
