Hello! It's impossible to answer this question without the specifics of your use case, which I don't have. Perhaps you have an example you can share?
Regards,
--
Ilya Kasnacheev

On Fri, Aug 23, 2019 at 16:22, narges saleh <snarges...@gmail.com> wrote:

> Hello Ilya,
>
> There are parallel streams inserting data for all the countries into
> different nodes (and caches), and there are parallel queries against the
> distributed database for different countries, aggregating the data, in
> some cases inserting data back, and in others returning results. Yes, for
> a given query, only one or two caches might get hit. But if the volume of
> data for a given city is too big, the query might hit multiple caches;
> hence my question. How do I keep these caches as close as possible to
> each other?
>
> What would be some of the ways to minimize the network hops? How can I
> keep data with the same affinity as close as possible to each other,
> preferably on the same physical node or neighboring nodes (but across
> multiple Ignite nodes and caches)?
>
> Thanks, and I am sorry for dragging this on.
>
> On Fri, Aug 23, 2019 at 5:19 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> I don't think that partitioning by country or city is a good idea, since
>> this distribution will be very uneven.
>>
>> There are different ways of minimizing network hops while keeping the
>> distributed nature of your database. A database isn't really distributed
>> when, for a given city's query, only one node takes all the load and the
>> rest sit idle.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>> On Fri, Aug 23, 2019 at 13:15, narges saleh <snarges...@gmail.com> wrote:
>>
>>> Hello Ilya,
>>>
>>> I agree with you that partitioning based on month was a bad example,
>>> because most partitions will be idle. Country or customer is a better
>>> example of my case. There are a limited number of them, but they are
>>> disproportionate in size and always active. Let's take the country
>>> example: I need to search and aggregate the volume of sales in each city
>>> and by country, and I have a couple of hundred countries.
>>>
>>> Let me ask a basic question.
>>> If my queries/aggregations are based on cities and countries, do I need
>>> to partition based on countries (or even cities)? I want to avoid
>>> network hops for my searches and aggregations as much as possible (I do
>>> not want to slow down writes either, but I am aware of the trade-off
>>> between reads/writes and replication/partitioning). What do I define my
>>> affinity key on, and what do I partition on?
>>>
>>> Thanks again for your help.
>>>
>>> On Fri, Aug 23, 2019 at 4:03 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> Partitioning based on, say, user id is usually fair, because there are
>>>> usually 100,000s of users and none of them owns a disproportionate
>>>> amount of data.
>>>>
>>>> Partitioning by month is especially bad, since in a given month all
>>>> partitions will be basically idle save for one, and there would be a
>>>> lot of contention.
>>>>
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>>
>>>> On Thu, Aug 22, 2019 at 19:31, narges saleh <snarges...@gmail.com> wrote:
>>>>
>>>>> I am not sure you can find real-world examples where caches can be
>>>>> evenly partitioned if the partitioning factor is an affinity key. I
>>>>> am comparing this with partitioning in relational databases, say
>>>>> partitioning based on the month of the year. I definitely don't have
>>>>> 100s of departments, but I do have 10s of departments, and the
>>>>> departments are very disproportionate in size.
>>>>>
>>>>> As for the rebalancing case, pods will be added to the system as the
>>>>> volume increases, so I'd assume that would prompt Ignite to rebalance.
>>>>>
>>>>> On Thu, Aug 22, 2019 at 11:00 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> 1) No. Ignite only rebalances data when nodes are joining or leaving
>>>>>> the cluster.
>>>>>>
>>>>>> 2) Ignite's affinity is not really well suited to such detailed
>>>>>> manual assignment. It is assumed that your cache has a large number
>>>>>> of partitions (e.g. 1024) and that data is distributed evenly
>>>>>> between all partitions. Having department as the affinity key is
>>>>>> suboptimal because there are not many departments and they usually
>>>>>> vary in size. That's the kind of distribution that you want to avoid.
>>>>>>
>>>>>> Regards,
>>>>>> --
>>>>>> Ilya Kasnacheev
>>>>>>
>>>>>> On Thu, Aug 22, 2019 at 18:37, narges saleh <snarges...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks, Ilya, for the replies.
>>>>>>>
>>>>>>> 1) Doesn't Ignite rebalance the nodes if there are additional nodes
>>>>>>> available and the data doesn't fit the current Ignite node?
>>>>>>> Consider a scenario where I have 100 pods on a physical node,
>>>>>>> assuming pod = Ignite node.
>>>>>>>
>>>>>>> 2) I am not sure what you mean by confining half of a cache to one
>>>>>>> cluster and another half to another node. If my affinity key is
>>>>>>> department id, why can't I have department A on a partitioned
>>>>>>> cache, with one partition on one node in cluster A and the other
>>>>>>> partition on another node in another cluster?
>>>>>>>
>>>>>>> I might be misunderstanding the whole thing, and I'd appreciate
>>>>>>> clarification.
>>>>>>>
>>>>>>> On Thu, Aug 22, 2019 at 6:52 AM Ilya Kasnacheev <ilya.kasnach...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> 1) When there is an overflow, either page eviction kicks in or, if
>>>>>>>> it is disabled, you get an IgniteOOM, after which the node is no
>>>>>>>> longer usable. Please avoid overflowing any data regions, since
>>>>>>>> there's no graceful handling currently.
>>>>>>>>
>>>>>>>> 2) I don't think so. You can't easily confine half of a cache's
>>>>>>>> data to one cluster group and the other half to another group.
>>>>>>>>
>>>>>>>> Such scenarios are not recommended. We expect that all partitions
>>>>>>>> have the same amount of data, not that there are a few gargantuan
>>>>>>>> partitions that don't fit on a single node.
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
>>>>>>>>
>>>>>>>> On Tue, Aug 20, 2019 at 06:29, narges saleh <snarges...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello All,
>>>>>>>>>
>>>>>>>>> I'd appreciate your answers to my questions.
>>>>>>>>>
>>>>>>>>> 1) Assuming I use an affinity key among 4 caches and they all end
>>>>>>>>> up on the same Ignite node: what happens when there is an
>>>>>>>>> overflow? Does the overflow data end up on a joined node? How do
>>>>>>>>> I keep the related data from all the caches close to each other
>>>>>>>>> when the volume of data exceeds a single node?
>>>>>>>>>
>>>>>>>>> 2) Is there a concept of cluster affinity, meaning having a
>>>>>>>>> cluster group defined based on some affinity key? For example, if
>>>>>>>>> I have two departments A and B, can I have a cluster group for
>>>>>>>>> department A and another for department B?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Narges
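[Editor's note] The cardinality argument in the thread (a few hundred countries vs. 100,000s of user ids as affinity keys) can be sketched in plain Java. This is a simplified stand-in for an affinity function using hash-modulo, not Ignite's actual RendezvousAffinityFunction; the class and key names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class PartitionSkew {
    static final int PARTS = 1024; // Ignite's default partition count

    // Simplified affinity function: the key's hash modulo the partition count.
    static int partition(Object key) {
        return Math.abs(key.hashCode() % PARTS);
    }

    public static void main(String[] args) {
        // ~200 country keys can occupy at most 200 of 1024 partitions,
        // so 800+ partitions (and the nodes owning them) stay idle.
        Set<Integer> byCountry = new HashSet<>();
        for (int c = 0; c < 200; c++)
            byCountry.add(partition("country-" + c));

        // 100,000s of user-id keys spread across all partitions.
        Set<Integer> byUser = new HashSet<>();
        for (int u = 0; u < 100_000; u++)
            byUser.add(partition(u));

        System.out.println("partitions used by country keys: " + byCountry.size());
        System.out.println("partitions used by user-id keys:  " + byUser.size());
    }
}
```

This is why Ilya advises against country or department as the affinity key: the key's cardinality caps how many partitions can ever hold data.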
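[Editor's note] The colocation question ("how do I keep these caches as close as possible to each other?") comes down to every cache deriving its partition from the same affinity field. A minimal model, assuming a modulo-based partition-to-node mapping (real Ignite uses rendezvous hashing and also places backup copies, which this sketch ignores):

```java
public class Colocation {
    static final int PARTS = 1024;
    static final int NODES = 4;

    // The same affinity function is shared by every cache:
    // the country decides the partition, the partition decides the node.
    static int partitionFor(String countryId) {
        return Math.abs(countryId.hashCode() % PARTS);
    }

    static int nodeFor(int partition) {
        return partition % NODES;
    }

    public static void main(String[] args) {
        // Entries from two different caches (say "sales" and "cities") keyed
        // by the same country land on the same node, so a per-country
        // join/aggregation needs no network hop between the two caches.
        String country = "FR";
        int salesNode  = nodeFor(partitionFor(country));
        int citiesNode = nodeFor(partitionFor(country));
        System.out.println("sales on node " + salesNode
                + ", cities on node " + citiesNode);
    }
}
```

In Ignite itself this is expressed by giving each cache's key class the same `@AffinityKeyMapped` field (or `CacheKeyConfiguration`), so colocation holds across caches without any manual placement.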
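[Editor's note] On the overflow question: Ilya's advice is to never let a data region fill up, which means sizing the region generously or enabling page eviction. A hedged configuration sketch using Ignite's `DataRegionConfiguration` API; the region name and sizes are invented for illustration, and this fragment assumes `ignite-core` on the classpath (verify the eviction mode against your Ignite version):

```java
import org.apache.ignite.configuration.DataPageEvictionMode;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RegionConfigSketch {
    static IgniteConfiguration configure() {
        DataRegionConfiguration region = new DataRegionConfiguration();
        region.setName("salesRegion");              // hypothetical region name
        region.setInitialSize(256L * 1024 * 1024);  // 256 MB to start
        region.setMaxSize(2L * 1024 * 1024 * 1024); // 2 GB hard cap
        // Evict cold pages instead of hitting IgniteOOM when the cap is reached.
        region.setPageEvictionMode(DataPageEvictionMode.RANDOM_2_LRU);

        return new IgniteConfiguration().setDataStorageConfiguration(
            new DataStorageConfiguration().setDefaultDataRegionConfiguration(region));
    }
}
```

Note that evicted pages are simply dropped unless native persistence or a third-party store backs the cache, so eviction is a safety valve, not a substitute for capacity planning.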