P.S. dmesg doesn't show any hardware problems (bad blocks, segfaults and so on).
P.P.S. I think I migrated 0.10.1 -> 1.0.1 without database replication, so it may be my fault.
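The duplicate counts quoted below come from uniq, which only collapses adjacent identical lines, so rows that share an id but differ elsewhere would be missed. A rough sketch of counting repeated ids directly, assuming the _all_docs response has been parsed as JSON (findDuplicateIds and sample are made-up names for illustration, not anything from CouchDB):

```javascript
// Count how many times each document id appears in a parsed
// _all_docs response and return only the ids seen more than once.
function findDuplicateIds(allDocs) {
  const counts = new Map();
  for (const row of allDocs.rows) {
    counts.set(row.id, (counts.get(row.id) || 0) + 1);
  }
  const duplicates = {};
  for (const [id, n] of counts) {
    if (n > 1) {
      duplicates[id] = n;
    }
  }
  return duplicates;
}

// Hypothetical sample data shaped like an _all_docs response.
const sample = {
  total_rows: 3,
  offset: 0,
  rows: [
    { id: "a", key: "a", value: { rev: "1-x" } },
    { id: "a", key: "a", value: { rev: "1-x" } },
    { id: "b", key: "b", value: { rev: "1-y" } },
  ],
};
console.log(findDuplicateIds(sample)); // prints { a: 2 }
```

With the saved file from the session below, the input could be loaded in Node.js with JSON.parse(fs.readFileSync("all_docs", "utf8")).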
2010/10/7 Alexey Loshkarev <[email protected]>:
> I think this is database file corruption. Querying _all_docs returns
> a lot of duplicates (about 3,000 duplicates in a ~350,000-document
> database).
>
> [12:17:48 r...@node2 (~)]# curl http://localhost:5984/exhaust/_all_docs > all_docs
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 37.7M    0 37.7M    0     0  1210k      0 --:--:--  0:00:31 --:--:--   943k
> [12:18:23 r...@node2 (~)]# wc -l all_docs
> 325102 all_docs
> [12:18:27 r...@node2 (~)]# uniq all_docs |wc -l
> 322924
>
> Node1 has duplicates too, but far fewer:
> [12:18:48 r...@node1 (~)]# curl http://localhost:5984/exhaust/_all_docs > all_docs
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 38.6M    0 38.6M    0     0   693k      0 --:--:--  0:00:57 --:--:-- 55809
> [12:19:57 r...@node1 (~)]# wc -l all_docs
> 332714 all_docs
> [12:20:54 r...@node1 (~)]# uniq all_docs |wc -l
> 332523
>
> 2010/10/7 Alexey Loshkarev <[email protected]>:
>> I can't say what specifically it may be, so let's dive into the
>> history of this database.
>>
>> At first (5-6 weeks ago) there was only the node2 server with
>> CouchDB 0.10.1. It hosted a testing database, which went through a
>> lot of structural changes, view updates and so on.
>> Then it became production and started working OK.
>> Then we realized we needed a backup, ideally an online backup (as we
>> have CouchDB, we can do this).
>> So the node1 server appeared, running CouchDB 1.0.1. I replicated
>> node2 to node1, then started continuous replication node1 -> node2
>> and node2 -> node1. All clients worked with node2 only. Everything
>> worked fine for about a month.
>> A few days ago we were at peak load, so I wanted to use node1 and
>> node2 simultaneously. This was done with round-robin DNS (host db
>> returns 2 different IPs: node1's IP and node2's IP).
>> Everything worked fine for about 5 minutes, then I got the first
>> conflict (the view queues/all returned two identical documents: one
>> the current version, the second a conflicted revision, a document
>> with field _conflict="....."). The document ID was q_tsentr.
>> As I don't have a conflict resolver yet, I resolved the conflict
>> manually by deleting the conflicted revision. I also disabled
>> round-robin and moved all load to node2 to avoid conflicts for a
>> while, so that I could write a conflict resolver.
>>
>> It worked OK (node1 and node2 in mutual replication, active load on
>> node2) until yesterday.
>> Yesterday an operator called me to say he had duplicate data in the
>> program. At that point queues/all returned 1 duplicated document,
>> the same one as a few days before (id = q_tsentr). One row contained
>> the current document version, the other row contained an old
>> revision with the field _conflicted_revision="some old revision".
>>
>> I tried to delete this revision but without success. GET for
>> q_tsentr?rev="some old revision" returns a valid document. DELETE
>> q_tsentr?rev="some old revision" gives me a 409 error.
>>
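For spotting documents that still carry conflicting revisions, a map function along these lines is a common approach. This is only a sketch: the name conflictsMap, the rows array and the local emit stub are made up so the snippet can run outside CouchDB, and it assumes CouchDB exposes the _conflicts field to map functions for conflicted documents. In a design document only the body of conflictsMap would be stored as the map function.

```javascript
// Stand-in for the emit() that the CouchDB view server provides;
// here it just collects rows so the sketch runs under Node.js.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Emit one row per conflicted document: the key is the document id,
// the value lists the winning revision followed by the conflicting ones.
function conflictsMap(doc) {
  if (doc._conflicts) {
    emit(doc._id, [doc._rev].concat(doc._conflicts));
  }
}

// Hypothetical documents: one with a conflict, one without.
conflictsMap({ _id: "q_example", _rev: "3-aaa", _conflicts: ["2-bbb"] });
conflictsMap({ _id: "q_other", _rev: "1-ccc" });
console.log(rows);
```

Only the first document produces a row, so querying such a view would return nothing once every conflict has been resolved.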
>> Here are the log files (node2):
>>
>> [Wed, 06 Oct 2010 12:17:19 GMT] [info] [<0.7239.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:17:30 GMT] [info] [<0.7245.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:17:35 GMT] [info] [<0.7287.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:17:43 GMT] [info] [<0.7345.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:18:02 GMT] [info] [<0.7864.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 409
>> [Wed, 06 Oct 2010 12:18:29 GMT] [info] [<0.8331.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:18:39 GMT] [info] [<0.8363.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 409
>> [Wed, 06 Oct 2010 12:38:19 GMT] [info] [<0.16765.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:40:40 GMT] [info] [<0.17337.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:40:45 GMT] [info] [<0.17344.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 404
>>
>> Logs at node1:
>>
>> [Wed, 06 Oct 2010 12:17:46 GMT] [info] [<0.25979.462>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:17:56 GMT] [info] [<0.26002.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:21:25 GMT] [info] [<0.27133.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=all 404
>> [Wed, 06 Oct 2010 12:21:49 GMT] [info] [<0.27179.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?revs=true 404
>> [Wed, 06 Oct 2010 12:24:41 GMT] [info] [<0.28959.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?revs=true 404
>> [Wed, 06 Oct 2010 12:38:07 GMT] [info] [<0.10362.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?revs=all 404
>> [Wed, 06 Oct 2010 12:38:23 GMT] [info] [<0.10534.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:40:25 GMT] [info] [<0.12014.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>> [Wed, 06 Oct 2010 12:40:33 GMT] [info] [<0.12109.463>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 404
>>
>> So I deleted this document and created a new one (id - q_tsentr2).
>> It worked fine for about an hour.
>>
>> Node2 had the undeletable duplicate, so I moved all clients to node1.
>> There was no such problem there; the view response was correct.
>>
>> Then I tried to recover the database on node2. I stopped CouchDB,
>> deleted the view index files and started CouchDB again. Then I pinged
>> all views to recreate the indexes.
>> At the end of this procedure, I saw duplicates of identical rows (see
>> the first letter in this thread). Node1 has no such problems, so I
>> stopped replication, left the load on node1 and came to cry on this
>> mailing list.
>>
>>
>> 2010/10/6 Paul Davis <[email protected]>:
>>> It was noted on IRC that I should give a bit more explanation.
>>>
>>> With the information that you've provided there are two possible
>>> explanations. Either your client code is not doing what you expect or
>>> you've triggered a really crazy bug in the view indexer that caused it
>>> to reindex a database without invalidating a view and not removing
>>> keys for docs when it reindexed.
>>>
>>> Given that no one has reported anything remotely like this and I can't
>>> immediately see a code path that would violate so many behaviours in
>>> the view updater, I'm leaning towards this being an issue in the
>>> client code.
>>>
>>> If there was something specific that changed since the view worked,
>>> that might illuminate what could cause this sort of behaviour if it is
>>> indeed a bug in CouchDB.
>>>
>>> HTH,
>>> Paul Davis
>>>
>>> On Wed, Oct 6, 2010 at 12:24 PM, Alexey Loshkarev <[email protected]> wrote:
>>>> I have this view function (map only, without reduce):
>>>>
>>>> function(doc) {
>>>>   if (doc.type == "queue") {
>>>>     emit(doc.ordering, doc.drivers);
>>>>   }
>>>> }
>>>>
>>>> It worked perfectly until yesterday, but today it started returning
>>>> duplicates.
>>>> Example:
>>>> $ curl http://node2:5984/exhaust/_design/queues/_view/all
>>>>
>>>> {"total_rows":46,"offset":0,"rows":[
>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>> ......
>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>> ........
>>>> {"id":"q_otstoj","key":11,"value":["d_gavrilenko_aleksandr","d_klishnev_sergej"]}
>>>> ]}
>>>>
>>>>
>>>> I tried restarting the server, recreating the view (removing the
>>>> view index file), and compacting the view and the database. None of
>>>> this helps; it still returns duplicates.
>>>> What happened? How can I avoid it in the future?
>>>>
>>>> --
>>>> ----------------
>>>> Best regards
>>>> Alexey Loshkarev
>>>> mailto:[email protected]
>>>>
>>>
>>
>>
>>
>> --
>> ----------------
>> Best regards
>> Alexey Loshkarev
>> mailto:[email protected]
>>
>
>
>
> --
> ----------------
> Best regards
> Alexey Loshkarev
> mailto:[email protected]
>

--
----------------
Best regards
Alexey Loshkarev
mailto:[email protected]
