On Fri, Dec 13, 2013 at 12:43 AM, Kiran Ayyagari <[email protected]> wrote:

> On Mon, Dec 9, 2013 at 8:28 PM, Kiran Ayyagari <[email protected]> wrote:
>
>> On Mon, Dec 9, 2013 at 5:29 PM, Jörg Maaß <[email protected]> wrote:
>>
>>> Dear all,
>>>
>>> thanks for your kind and quick help so far. However, there are still
>>> questions open from our side, and your help in answering them is greatly
>>> appreciated. We currently have the following setup:
>>> 2 * Dell PowerEdge R815/II with 64 GB Memory and 2 * AMD Opteron 6172
>>> with 12 Cores, 512k Cache and 2.1 GHz, 4x600 GB SAS
>>> ApacheDS 2.0 M15
>>> Oracle JVM 1.7.0-25
>>> Memory parameters for the ApacheDS JVM: -Xms2048m -Xmx2048m
>>> 30 nbthreads to handle the LDAP requests
>>> Indexes on all attributes that are searched (created BEFORE the data was
>>> loaded, size 10000)
>>> Ca. 20000 objects in the central partition
>>> Cache size for the partition: 10000
>>> Do you have any indication as to how many parallel BIND sessions and
>>> searches we can support with such a setup? Is there a performance
>>> whitepaper of any sort for ApacheDS (or similar information)? If so: where
>>> can we find it?
>>>
>>> Additionally, we are experiencing regular trouble with replication in
>>> our Multi Master Replication setup: If any of the servers is offline for a
>>> longer period of time, it will crash when booting up again and will delete
>>> all information in the partition that is being replicated. Just this
>>> weekend, the havoc was even greater: the partition was totally destroyed
>>> AND the server that was online all the time refused to start ApacheDS
>>> again, because the partition data on disk was destroyed.
>>>
>> How long was the server offline? Before I guess at what caused this, I
>> would like to reproduce the issue in my lab; I will then let you know the
>> reason or a fix to avoid it. I would appreciate it if you could share any
>> logs related to this error.
>>
> I have found a problem that might be the cause of this issue.
> It is due to sending entries to the client (i.e., a slave or another
> master peer) in a random order.
> I will commit a fix by the weekend and let you know.
>
This issue is fixed in trunk (the "weekend" took a long time to come
due to the complexity involved in handling this issue). I would
appreciate it if you could test it and let us know. Thank you.

>>> Of course, we followed the published documentation when setting up the
>>> servers and have unique replication IDs for our servers, etc. The system
>>> times are synchronized via NTP.
>>>
>>> We were able to restore the data from backup, but this clearly
>>> disqualifies ApacheDS as a production ready system.
>>>
>>> Can you give us any hints as to avoid such a catastrophic scenario in
>>> the future?
>>>
>> It is premature to suggest anything at this moment. I will get back to
>> you as soon as I have an answer (max a day).
>>
>>> Your help is greatly appreciated.
>>>
>>> Kind regards
>>>
>>> Jörg Maaß
>>> --
>>> T: +49 6027 409219
>>> M: +49 178 5352364
>>> F: +49 6027 409220
>>> W: http://www.eacg.de
>>> ----
>>> EACG GmbH
>>> Enterprise Architecture Consulting Group
>>> OpernTurm 16.OG, Bockenheimer Landstraße 2-4, 60306 Frankfurt am Main
>>> Handelsregister Frankfurt am Main HRB 84852
>>> Geschäftsführer: Jan Thielscher

--
Kiran Ayyagari
http://keydap.com
