>>> Ferenc Wágner <wf...@niif.hu> wrote on 18.04.2017 at 18:46 in message
<87tw5l64v0....@lant.ki.iif.hu>:
> Ken Gaillot <kgail...@redhat.com> writes:
>
>> On 04/13/2017 11:11 AM, Ferenc Wágner wrote:
>>
>>> I encountered several (old) statements on various forums along the lines
>>> of: "the CIB is not a transactional database and shouldn't be used as
>>> one" or "resource parameters should only uniquely identify a resource,
>>> not configure it" and "the CIB was not designed to be a configuration
>>> database but people still use it that way". Sorry if I misquote these,
>>> I go by my memories now, I failed to dig up the links by a quick try.
>>>
>>> Well, I've been feeling guilty in the above offenses for years, but it
>>> worked out pretty well that way which helped to suppress these warnings
>>> in the back of my head. Still, I'm curious: what's the reason for these
>>> warnings, what are the dangers of "abusing" the CIB this way?
>>> /var/lib/pacemaker/cib/cib.xml is 336 kB with 6 nodes and 155 resources
>>> configured. Old Pacemaker versions required tuning PCMK_ipc_buffer to
>>> handle this, but even the default is big enough nowadays (128 kB after
>>> compression, I guess).
>>>
>>> Am I walking on thin ice? What should I look out for?
>>
>> That's a good question. Certainly, there is some configuration
>> information in most resource definitions, so it's more a matter of degree.
>>
>> The main concerns I can think of are:
>>
>> 1. Size: Increasing the CIB size increases the I/O, CPU and networking
>> overhead of the cluster (and if it crosses the compression threshold,
>> significantly). It also marginally increases the time it takes the
>> policy engine to calculate a new state, which slows recovery.
>
> Thanks for the input, Ken! Is this what you mean?
>
> cib: info: crm_compress_string: Compressed 1028972 bytes into 69095 (ratio 14:1) in 138ms
>
> At the same time /var/lib/pacemaker/cib/cib.xml is 336K, and
I wonder why the CIB is transferred as a whole all the time: since the
configuration changes rarely, it should not have to be sent every time.
Even when a change occurs, only the affected element (e.g. a single
resource) should need to be transferred. The same goes for the status.

>
> # cibadmin -Q --scope resources | wc -c
> 330951
> # cibadmin -Q --scope status | wc -c
> 732820

On a smaller scale I have 55759 bytes of resources vs. 111181 bytes of
status.

As mentioned in another thread, one reason for the large size is the IDs
used to describe an element. For example, in resource "prm_foobar" an
attribute named "iflabel" has the ID
"prm_foobar-instance_attributes-iflabel". Considering that the XML element
is (at least) inside a <primitive>, <instance_attributes>, <nvpair>, I
wonder whether it's really necessary to map the whole path into the ID.

Similarly for the status: a significant portion is consumed by
transition-key and transition-magic values, which seem "over-unique". For
example, consider these:
"158:49:0:69e31903-245d-4265-b732-760ddd369df2" and
"0:0;158:49:0:69e31903-245d-4265-b732-760ddd369df2". So they add extra
information to a UUID (universally unique ID, 128 bits), which is overkill.
Is it just to add extra semantics? A UUID alone would be more than enough.
Even a 64-bit ID would be enough IMHO. (Note that a GUID in Microsoft's
sense is the same thing as a 128-bit UUID.)

>
> Even though I consume about 2 kB per resource, the status section
> weighs 2.2 times the resources section. Which means shrinking the
> resource size wouldn't change the full size significantly.

Another big saving would be to replace the XML elements with a tokenized
representation (back when RAM was scarce, even BASIC interpreters did
that). As no one edits the CIB directly, that wouldn't affect any user (if
cibadmin did the conversions, for example).
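
To put numbers on the ID and transition-key overhead mentioned above, here
is a minimal Python sketch that sums the bytes spent on those attribute
values. The embedded XML is a made-up miniature fragment: only the IDs and
transition strings come from this thread, while the resource type, the
"value" attribute and the operation ID are hypothetical. On a real cluster
one would feed it the output of "cibadmin -Q" instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature CIB fragment. The nvpair ID and the transition
# strings are the ones quoted in this thread; everything else is made up.
SAMPLE = """\
<cib>
  <configuration>
    <resources>
      <primitive id="prm_foobar" class="ocf" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="prm_foobar-instance_attributes">
          <nvpair id="prm_foobar-instance_attributes-iflabel" name="iflabel" value="foo"/>
        </instance_attributes>
      </primitive>
    </resources>
  </configuration>
  <status>
    <lrm_rsc_op id="prm_foobar_monitor_0"
                transition-key="158:49:0:69e31903-245d-4265-b732-760ddd369df2"
                transition-magic="0:0;158:49:0:69e31903-245d-4265-b732-760ddd369df2"/>
  </status>
</cib>
"""

def attribute_bytes(xml_text, names=("id", "transition-key", "transition-magic")):
    """Sum the UTF-8 size of the values of the given attributes."""
    totals = Counter()
    for element in ET.fromstring(xml_text).iter():
        for name in names:
            value = element.get(name)
            if value is not None:
                totals[name] += len(value.encode("utf-8"))
    return totals

totals = attribute_bytes(SAMPLE)
total_size = len(SAMPLE.encode("utf-8"))
for name, size in totals.items():
    share = 100 * size / total_size
    print(f"{name}: {size} bytes ({share:.1f}% of {total_size} bytes)")
```

This counts only the attribute values, not the attribute names or quoting,
so it slightly underestimates the real cost.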
>
> At the same time, we should probably monitor the trends of the cluster
> messaging health as we expand it (with nodes and resources). What would
> be some useful indicators to graph?

Turnaround time, I guess: the longer the messages, the longer the
processing (compressing/decompressing) and transfer times.

>
>> 2. Consistency: Clusters can become partitioned. If changes are made on
>> one or more partitions during the separation, the changes won't be
>> reflected on all nodes until the partition heals, at which time the
>> cluster will reconcile them, potentially losing one side's changes.

If only one side of a partitioned cluster is allowed to make (valid)
changes, that isn't really a problem. Maybe not everything is working as
smoothly as it should.

>
> Ah, that's a very good point, which I neglected totally: even inquorate
> partitions can have configuration changes. Thanks for bringing this up!
> I wonder if there's any practical workaround for that.
>
>> I suppose this isn't qualitatively different from using a separate
>> configuration file, but those tend to be more static, and failure to
>> modify all copies would be more obvious when doing them individually
>> rather than issuing a single cluster command.
>
> From a different angle: if a node is off, you can't modify its
> configuration file. So you need an independent mechanism to do what the
> CIB synchronization does anyway, or a shared file system with its added
> complexity. On the other hand, one needn't guess how Pacemaker
> reconciles the conflicting resource configuration changes. Indeed, how
> does it?
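
As far as I understand the Pacemaker documentation, the copy with the
highest (admin_epoch, epoch, num_updates) tuple wins and replaces the
others, which would explain how one side's changes can get lost. A rough
Python sketch of that comparison follows; the two XML snippets and their
numbers are hypothetical, and this only illustrates the ordering, not
Pacemaker's actual code.

```python
import xml.etree.ElementTree as ET

def cib_version(cib_xml):
    """Return (admin_epoch, epoch, num_updates) from a <cib> root element."""
    root = ET.fromstring(cib_xml)
    return tuple(int(root.get(name, "0"))
                 for name in ("admin_epoch", "epoch", "num_updates"))

# Hypothetical CIB copies held by two partitions after a split.
side_a = '<cib admin_epoch="0" epoch="120" num_updates="3"/>'
side_b = '<cib admin_epoch="0" epoch="118" num_updates="41"/>'

# Tuples compare field by field, so a higher epoch beats any num_updates.
winner = max((side_a, side_b), key=cib_version)
print("winning version:", cib_version(winner))
```

In this sketch side_a wins because its epoch is higher, even though side_b
has seen more updates; whatever was changed only on side_b would be
discarded.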
> --
> Thanks,
> Feri
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org