I have since enabled the compaction daemon, but I am reasonably certain the problem arose before any compaction. Automatic compaction was not enabled at the time and I certainly didn't compact manually.

I found some issues against older versions of CouchDB dealing with documents missing from the ID index for undetermined reasons. What I was seeing was consistent with the descriptions there.

I have deleted and recreated the database and all is well since.

I have a copy of the problematic database but haven't had time to investigate further, family and holidays having intervened. I began looking at the CouchDB code, but I am not familiar with Erlang or the code generally, so I have not made much progress yet. I hope, eventually, to examine the database to determine more certainly where the documents are present and where they are missing, and how the changes feed and the responses to document GETs are generated from the database file content. This will take some time. It may be that I will never know how it got into the state it is in.

Thanks for the examples, Robert. They confirm my expectations. What I was seeing that is inconsistent with them is that the latest (deleted) revision was also 'missing', whereas in your examples the latest (deleted) revision is still available after deletion and compaction. This is important because I am not using DELETE to delete: I am using PUT and/or bulk update with _deleted: true and saving some other data (deletion datetime, user deleting, etc.), and I need to be able to read these after deletion and compaction.
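
To make the deletion style described above concrete, here is a minimal sketch of the kind of body I mean. The helper name tombstone_body and the field names deleted_at and deleted_by are hypothetical, chosen only for illustration; the host, database and document names in the commented curl call are borrowed from Robert's examples and are likewise assumptions. The helper does no JSON escaping, so it only suits simple values.

```shell
# Hypothetical helper: build a deletion body that keeps some audit fields.
# deleted_at and deleted_by are illustrative names, not CouchDB fields.
# No JSON escaping is done, so the arguments must not contain quotes
# or backslashes.
tombstone_body() {
  # $1 = current _rev, $2 = deletion time (ISO 8601), $3 = user name
  printf '{"_rev":"%s","_deleted":true,"deleted_at":"%s","deleted_by":"%s"}' \
    "$1" "$2" "$3"
}

# Possible use against a server (assumed host, database and document):
# curl -X PUT 'http://localhost:5984/db1/doc2' \
#      -H 'Content-Type: application/json' \
#      -d "$(tombstone_body 1-7c881dd71ff21b0cee8c541a5463ac56 2017-01-05T12:00:00Z ian)"
```

After such a PUT, a GET with the new rev is what I would expect to return the extra fields along with _deleted.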

I'll look at this further next week. In the meantime, thank you both for considering the issue.

Regards,
Ian

On 5/01/2017 12:05, Robert Samuel Newson wrote:
just reconfirmed;

"missing" is for non-leaf revisions _after_ compaction has removed them, so,
yeah, I think Dale is right that you've compacted without realising it.
Perhaps you have the compaction daemon enabled?

create a new document;

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2' -XPUT -d '{"hello":true}'
{"ok":true,"id":"doc2","rev":"1-7c881dd71ff21b0cee8c541a5463ac56"}

Then fetch it without rev;

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2'
{"_id":"doc2","_rev":"1-7c881dd71ff21b0cee8c541a5463ac56","hello":true}

And with explicit rev;

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2?rev=1-7c881dd71ff21b0cee8c541a5463ac56'
{"_id":"doc2","_rev":"1-7c881dd71ff21b0cee8c541a5463ac56","hello":true}


Now I delete it;

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2?rev=1-7c881dd71ff21b0cee8c541a5463ac56' -XDELETE
{"ok":true,"id":"doc2","rev":"2-5875293a8441836659ff7f5bcbc56df9"}

Fetch it without rev, and it's deleted (as expected)

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2'
{"error":"not_found","reason":"deleted"}

Fetch the old revision explicitly and it's still there;

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2?rev=1-7c881dd71ff21b0cee8c541a5463ac56'
{"_id":"doc2","_rev":"1-7c881dd71ff21b0cee8c541a5463ac56","hello":true}

Fetch the new version explicitly and we see the body;

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2?rev=2-5875293a8441836659ff7f5bcbc56df9'
{"_id":"doc2","_rev":"2-5875293a8441836659ff7f5bcbc56df9","_deleted":true}

Now we compact;

➜  ~ curl 'foo:bar@localhost:15984/db1/_compact' -XPOST -Hcontent-type:application/json
{"ok":true}

Fetch the old revision explicitly and it's now missing (expected)

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2?rev=1-7c881dd71ff21b0cee8c541a5463ac56'
{"error":"not_found","reason":"missing"}

Fetch the current revision explicitly and it's still there (also expected)

➜  ~ curl 'foo:bar@localhost:15984/db1/doc2?rev=2-5875293a8441836659ff7f5bcbc56df9'
{"_id":"doc2","_rev":"2-5875293a8441836659ff7f5bcbc56df9","_deleted":true}
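
For scripting a check like this, the two 404 bodies seen in the transcript can be told apart by their "reason" field. The following is only a rough sketch: explain_404 is a hypothetical helper name, and it pattern-matches the compact JSON exactly as printed above rather than parsing it properly.

```shell
# Hypothetical helper: describe a CouchDB 404 body by its "reason" field.
# It matches the compact JSON text as shown in the transcript; it is not
# a real JSON parser.
explain_404() {
  case "$1" in
    *'"reason":"deleted"'*)
      echo "winning revision is a deletion" ;;
    *'"reason":"missing"'*)
      echo "requested revision is not stored" ;;
    *)
      echo "unrecognised response" ;;
  esac
}

# Example:
# explain_404 '{"error":"not_found","reason":"missing"}'
```

In the walkthrough above, the first case is what a fetch without rev returns after the delete, and the second is what the old revision returns once compaction has run.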


B.

On 4 Jan 2017, at 22:58, Robert Samuel Newson <rnew...@apache.org> wrote:

ah, right, thanks dale.

to be "missing" it has to have been removed by the compactor, right? it's 
"deleted" until then.

B.


On 4 Jan 2017, at 09:06, Dale Harvey <d...@arandomurl.com> wrote:

If you specify the revision and that revision is available it should return
even if the winning revision is deleted

$ curl -X GET http://127.0.0.1:5984/test/docid
{"error":"not_found","reason":"deleted"}
$ curl -X GET http://127.0.0.1:5984/test/docid?rev=1-967a00dff5e02add41819138abb3284d
{"_id":"docid","_rev":"1-967a00dff5e02add41819138abb3284d"}

The first thing I would check for is a race: put a delay between seeing
the revision and fetching it, though I would be surprised if there was a
bug there. Next I would check for auto compaction, as that would trigger
that behaviour
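
One way to try the delay idea is a small retry wrapper, sketched below. retry_with_delay is a hypothetical helper, and the URL and $REV in the commented call are placeholders rather than values from this thread. If the fetch starts succeeding after a pause, a race becomes more plausible; if it keeps failing, compaction is the likelier cause.

```shell
# Hypothetical helper: run a command up to N times, pausing between tries.
retry_with_delay() {
  # $1 = number of attempts, $2 = seconds to sleep between attempts,
  # remaining arguments = the command to run
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command succeeds
    "$@" && return 0
    # Sleep only if another attempt will follow
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Possible use (placeholder URL; curl -f makes an HTTP error such as 404
# count as a failure, -s silences progress output):
# retry_with_delay 3 1 curl -sf "http://127.0.0.1:5984/test/docid?rev=$REV"
```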

Cheers
Dale


On 3 January 2017 at 23:51, Robert Samuel Newson <rnew...@apache.org> wrote:

hm, a GET /dbname/docid where the document is deleted should always return
a 404, can you show a counterexample?

B.

On 27 Dec 2016, at 05:02, Ian Goodacre <ian.gooda...@xtra.co.nz> wrote:

I haven't compacted the database. When I get the document, I am
specifying the revision that appears in the output from _changes. This
output includes many deleted documents. Most of these I am able to retrieve
in this way but there are a few exceptions for which I get the 404 error.
Those for which I get the 404 error appear only once in the output from
_changes, so I assume the revision is the latest. In any case, something
makes these few documents different from the other deleted documents. I
don't yet know what has caused them to be different, which is what I would
like to know.

After compacting a database, is it normal for revisions of documents to
appear in the output of _changes despite them having been removed from the
database and unavailable to get, even with the revision specified? Other
than a continuous changes feed, in which a document may appear repeatedly,
my observation (not very thorough but I haven't yet seen an exception) has
been that each document appears only once and the latest/winning revision
appears. It seems odd that an unavailable revision would appear.

On 26/12/2016 13:35, Robert Newson wrote:
Deleted docs return 404 when fetched, that's normal. If you're fetching
an older revision than the latest, it will also be missing if you've
compacted the database.

On 24 Dec 2016, at 17:32, Ian Goodacre <ian.gooda...@xtra.co.nz> wrote:

Hi all,

I am running CouchDB 1.6.1 on Linux.

I have a database that has many deleted documents and I am able to
retrieve most of these but there are a few that I am unable to retrieve.
When I attempt to retrieve these, I get 404 with error 'not_found' and
reason 'missing'.

I would like to understand why these few documents are different - why
am I unable to retrieve these deleted documents?

For example, _changes response includes:

  {
    "deleted": true,
    "changes": [
      {
        "rev": "2-338d783957e141566caf3662cc0726bb"
      }
    ],
    "id": "assets_0000245",
    "seq": 2355
  },


When I attempt to retrieve this with:

http://localhost:5984/dcm_assets_tp/assets_0000245?rev=2-338d783957e141566caf3662cc0726bb

I get a 404 response.

I am expecting to get the deleted document, even if it only contains
_id, _rev and _deleted.

Also, I don't understand the response to

curl --noproxy '*' -X GET 'http://localhost:5984/dcm_assets_tp/assets_0000245?open_revs=all'

which is

--a341c8902ae323bd6ea7d938bc0c2ac5--

And I get the same in response to

http://localhost:5984/dcm_assets_tp/assets_0000245?revs=true&open_revs=all

But, if I add -H 'Accept: application/json' then I get an empty array ([]):

curl --noproxy '*' -X GET -H 'Accept: application/json' 'http://localhost:5984/dcm_assets_tp/assets_0000245?open_revs=all'

I must be misunderstanding something (or a lot of things). Any help
would be appreciated.

Regards,
Ian



