Hello!

It's impossible to answer this question without going into the specifics of
your use case, which I don't have. Perhaps you have a concrete case to share?

Regards,
-- 
Ilya Kasnacheev


Fri, Aug 23, 2019 at 16:22, narges saleh <snarges...@gmail.com>:

> Hello Ilya,
>
> There are parallel streams inserting data for all the countries into
> different nodes (and caches), and there are parallel queries against the
> distributed database for different countries, aggregating the data, in
> some cases inserting data back, and in others returning results. Yes, for
> a given query, only one or two caches might get hit. But if the volume of
> data for a given city is too big, the query might hit multiple caches;
> hence my question: how do I keep these caches as close as possible to
> each other?
>
> What would be some of the ways to minimize the network hops? How can I
> keep data with the same affinity as close as possible, preferably on the
> same physical node or on neighboring nodes (but across multiple Ignite
> nodes and caches)?
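>
> To illustrate what I mean by "the same affinity" across caches, here is a
> minimal sketch in Java (the key classes and the countryId field are made
> up for illustration): with the same affinity field and the default
> affinity function, entries from both caches land on the same node.
>
> import org.apache.ignite.cache.affinity.AffinityKeyMapped;
>
> // Hypothetical key for the sales cache: all entries with the same
> // countryId hash to the same partition, hence the same node.
> class SaleKey {
>     long saleId;
>     @AffinityKeyMapped
>     String countryId;
> }
>
> // Hypothetical key for the aggregates cache, using the SAME affinity
> // field, so a country's sales and its aggregates are co-located.
> class AggregateKey {
>     long aggregateId;
>     @AffinityKeyMapped
>     String countryId;
> }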
>
> Thanks, and I'm sorry for dragging this out.
>
>
> On Fri, Aug 23, 2019 at 5:19 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> wrote:
>
>> Hello!
>>
>> I don't think that partitioning by country or city is a good idea, since
>> the distribution will be very uneven.
>>
>> There are different ways of minimizing network hops while keeping the
>> distributed nature of your database. A database isn't really distributed
>> when, for a given city query, one node takes all the load and the rest
>> sit idle.
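>>
>> One such way (a minimal sketch; the cache name and key are made up) is
>> to send the computation to the node that owns the key, instead of
>> pulling the data across the network:
>>
>> import org.apache.ignite.Ignite;
>> import org.apache.ignite.Ignition;
>>
>> Ignite ignite = Ignition.ignite();
>>
>> // Runs the closure on whichever node owns the partition for this key,
>> // so the aggregation reads local data and only the result travels.
>> ignite.compute().affinityRun("salesCache", "LONDON", () -> {
>>     // ... aggregate the city's data locally here ...
>> });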
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> Fri, Aug 23, 2019 at 13:15, narges saleh <snarges...@gmail.com>:
>>
>>> Hello Ilya,
>>>  I agree with you that partitioning based on month was a bad example,
>>> because most partitions will be idle. Country or customer are better
>>> examples of my case. There is a limited number of them, but they are
>>> disproportionate in size and always active. Let's take the country
>>> example. I need to search and aggregate the volume of sales in each city
>>> and by country. I have a couple of hundred countries.
>>> Let me ask a basic question. If my queries/aggregations are based on
>>> cities and countries, do I need to partition based on countries (or even
>>> cities)? I want to avoid network hops for my searches and aggregations
>>> as much as possible (I do not want to slow writes either, but I am aware
>>> of the trade-off between reads/writes and replication/partitioning).
>>> What do I define my affinity key on, and what do I partition on?
>>>
>>> Thanks again for your help.
>>>
>>> On Fri, Aug 23, 2019 at 4:03 AM Ilya Kasnacheev <
>>> ilya.kasnach...@gmail.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> Partitioning based on, say, user ID is usually fair, because there are
>>>> usually hundreds of thousands of users and none of them owns a
>>>> disproportionate amount of data.
>>>>
>>>> Partitioning by month is especially bad, since in any given month all
>>>> partitions will be basically idle save for one, and there will be a
>>>> lot of contention on that one.
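>>>>
>>>> If you want to check how your keys actually spread (a sketch; the
>>>> cache name, the key, and the 'ignite' instance are assumptions), you
>>>> can ask the affinity for the partition and owning node of any key:
>>>>
>>>> import org.apache.ignite.cache.affinity.Affinity;
>>>>
>>>> Affinity<Object> aff = ignite.affinity("salesCache");
>>>>
>>>> // Which partition a key hashes to and which node owns it today.
>>>> // Skewed keys (e.g. the current month) pile into one partition.
>>>> int part = aff.partition("2019-08");
>>>> System.out.println(part + " -> " + aff.mapKeyToNode("2019-08"));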
>>>>
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>>
>>>>
>>>> Thu, Aug 22, 2019 at 19:31, narges saleh <snarges...@gmail.com>:
>>>>
>>>>> I am not sure you can find real-world examples where caches can be
>>>>> evenly partitioned if the partitioning factor is an affinity key. I am
>>>>> comparing with the partitioning case in relational databases, say
>>>>> partitioning based on the month of the year. I definitely don't have
>>>>> 100s of departments, but I do have 10s of departments, and the
>>>>> departments are very disproportionate in size.
>>>>> As for the rebalancing case, pods will be added to the system as the
>>>>> volume increases, so I'd assume that would prompt Ignite to rebalance.
>>>>>
>>>>> On Thu, Aug 22, 2019 at 11:00 AM Ilya Kasnacheev <
>>>>> ilya.kasnach...@gmail.com> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> 1) No. Ignite only rebalances data when nodes join or leave the
>>>>>> cluster.
>>>>>> 2) Ignite's affinity is not really well suited to such fine-grained
>>>>>> manual assignment. It is assumed that your cache has a large number
>>>>>> of partitions (e.g. 1024) and that data is distributed evenly across
>>>>>> all of them. Having department as the affinity key is suboptimal
>>>>>> because there are not many departments and they usually vary in size.
>>>>>> That's exactly the kind of distribution you want to avoid.
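>>>>>>
>>>>>> For reference, the partition count is set on the cache's affinity
>>>>>> function; a minimal sketch (the cache name is made up, and 1024 is
>>>>>> already the default):
>>>>>>
>>>>>> import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
>>>>>> import org.apache.ignite.configuration.CacheConfiguration;
>>>>>>
>>>>>> CacheConfiguration<Object, Object> cfg =
>>>>>>     new CacheConfiguration<>("salesCache");
>>>>>>
>>>>>> // 1024 partitions; data spreads evenly only if the affinity key
>>>>>> // itself is high-cardinality and even (user ID, not department).
>>>>>> cfg.setAffinity(new RendezvousAffinityFunction(false, 1024));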
>>>>>>
>>>>>> Regards,
>>>>>> --
>>>>>> Ilya Kasnacheev
>>>>>>
>>>>>>
>>>>>> Thu, Aug 22, 2019 at 18:37, narges saleh <snarges...@gmail.com>:
>>>>>>
>>>>>>> Thanks, Ilya, for the replies.
>>>>>>> 1) Doesn't Ignite rebalance the nodes if there are additional nodes
>>>>>>> available and the data doesn't fit the cache's current Ignite node?
>>>>>>> Consider a scenario where I have 100 pods on a physical node,
>>>>>>> assuming pod = Ignite node.
>>>>>>> 2) I am not sure what you mean by confining half of a cache to one
>>>>>>> cluster and the other half to another node. If my affinity key is
>>>>>>> department ID, why can't I have department A on a partitioned cache,
>>>>>>> one partition on one node in cluster A, and the other partition on
>>>>>>> another node in another cluster?
>>>>>>>
>>>>>>> I might be misunderstanding the whole thing, and I'd appreciate
>>>>>>> clarification.
>>>>>>>
>>>>>>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <
>>>>>>> ilya.kasnach...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> 1) When there is an overflow, either page eviction kicks in or, if
>>>>>>>> it is disabled, you get an IgniteOOM, after which the node is no
>>>>>>>> longer usable. Please avoid overflowing any data regions, since
>>>>>>>> there is currently no graceful handling (see the configuration
>>>>>>>> sketch below).
>>>>>>>> 2) I don't think so. You can't easily confine half of a cache's
>>>>>>>> data to one cluster group and the other half to another group.
>>>>>>>>
>>>>>>>> Such scenarios are not recommended. We expect all partitions to
>>>>>>>> hold about the same amount of data, not that there are a few
>>>>>>>> gargantuan partitions that don't fit on a single node.
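>>>>>>>>
>>>>>>>> A minimal sketch of enabling page eviction on a data region (the
>>>>>>>> region name and size are made up; this applies to purely in-memory
>>>>>>>> regions):
>>>>>>>>
>>>>>>>> import org.apache.ignite.configuration.DataPageEvictionMode;
>>>>>>>> import org.apache.ignite.configuration.DataRegionConfiguration;
>>>>>>>> import org.apache.ignite.configuration.DataStorageConfiguration;
>>>>>>>> import org.apache.ignite.configuration.IgniteConfiguration;
>>>>>>>>
>>>>>>>> DataRegionConfiguration region = new DataRegionConfiguration()
>>>>>>>>     .setName("salesRegion")              // hypothetical region
>>>>>>>>     .setMaxSize(4L * 1024 * 1024 * 1024) // 4 GB cap
>>>>>>>>     // Evict cold pages when the region fills up instead of
>>>>>>>>     // failing with an Ignite out-of-memory error.
>>>>>>>>     .setPageEvictionMode(DataPageEvictionMode.RANDOM_2_LRU);
>>>>>>>>
>>>>>>>> IgniteConfiguration cfg = new IgniteConfiguration()
>>>>>>>>     .setDataStorageConfiguration(new DataStorageConfiguration()
>>>>>>>>         .setDataRegionConfigurations(region));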
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
>>>>>>>>
>>>>>>>>
>>>>>>>> Tue, Aug 20, 2019 at 06:29, narges saleh <snarges...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hello All,
>>>>>>>>>
>>>>>>>>> I'd appreciate your answers to my questions.
>>>>>>>>>
>>>>>>>>> 1) Assuming I use an affinity key across 4 caches and they all end
>>>>>>>>> up on the same Ignite node: what happens when there is an overflow?
>>>>>>>>> Does the overflow data end up on a newly joined node? How do I keep
>>>>>>>>> the related data from all the caches close together when the volume
>>>>>>>>> exceeds a single node?
>>>>>>>>>
>>>>>>>>> 2) Is there a concept of cluster affinity, meaning a cluster group
>>>>>>>>> defined based on some affinity key? For example, if I have two
>>>>>>>>> departments A and B, can I have a cluster group for department A
>>>>>>>>> and another for department B?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Narges
>>>>>>>>>
>>>>>>>>
