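AFAIK, CouchDB views cannot be parameterized: the map function runs when
the index is built, not at query time, so it never sees the request
parameters. That is both their strength and their weakness.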
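
In practice this means the client has to normalize the key itself before
it goes into the query string. A rough sketch of that, nothing more:
toy_stem is a made-up stand-in for whatever stemmer the map function
really uses, and the view URL is hypothetical.

```python
import json
from urllib.parse import urlencode


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: lowercase, strip a few suffixes."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def view_url(base, term):
    """Build a view query URL whose key was stemmed on the client side."""
    # View keys are JSON values, so JSON-encode the key before URL-encoding it.
    query = urlencode({"key": json.dumps(toy_stem(term))})
    return f"{base}?{query}"


if __name__ == "__main__":
    # Hypothetical database, design document and view names.
    print(view_url("http://localhost:5984/mydb/_design/fts/_view/terms", "Indexing"))
```

The catch is that toy_stem here and the stemmer in the map function must
produce identical output, otherwise lookups silently miss.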

I'd go with:
1. (virtual step) add a field "last_indexed" to docs, holding the
timestamp of the last indexing
2. create a view with the map function
   function(doc) { emit(doc.last_indexed ? doc.last_indexed : null, null); }
3. run a background service which
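3.1. queries that view with limit=100 in the default ascending order
(null keys, i.e. never-indexed docs, collate first, followed by the
oldest timestamps; descending=true would return the most recently
indexed docs instead)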
3.2. runs xapian indexer against returned docs
3.3. sets doc.last_indexed = datetime.now
3.4. tries to bulk-save those 100 docs back into couch
3.5. if saving some doc fails, that's OK: it means someone else changed
the document in the meantime, and the failed docs will still appear at
the top of the next query
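
Step 3 could look roughly like the sketch below. It is not a finished
service, and it rests on a few assumptions: the database URL and view
name are made up, index_doc is a hook you supply (e.g. a wrapper around
your Xapian indexer), and the _bulk_docs response is taken to be a list
with one row per doc, where rejected docs carry an "error" field. It
queries in ascending order because null keys collate before timestamps.

```python
import json
import time
import urllib.request

COUCH_URL = "http://localhost:5984/mydb"          # hypothetical database URL
VIEW_PATH = "/_design/fts/_view/by_last_indexed"  # hypothetical view name


def http_json(url, payload=None):
    """GET the URL (or POST when payload is given) and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def stamp_docs(docs, now):
    """Step 3.3: return copies of the docs with last_indexed set to now."""
    return [dict(doc, last_indexed=now) for doc in docs]


def failed_ids(bulk_result):
    """Step 3.5: ids of docs that the bulk save rejected (e.g. conflicts)."""
    return [row["id"] for row in bulk_result if "error" in row]


def run_once(index_doc, fetch=http_json, now=None):
    """One pass over steps 3.1 to 3.5; returns the ids that failed to save."""
    now = time.time() if now is None else now
    # 3.1: ascending order, so never-indexed (null key) docs come first.
    view = fetch(COUCH_URL + VIEW_PATH + "?limit=100&include_docs=true")
    docs = [row["doc"] for row in view["rows"]]
    if not docs:
        return []
    for doc in docs:  # 3.2: hand every doc to the indexer hook
        index_doc(doc)
    # 3.3 + 3.4: stamp the docs and save them in one bulk request.
    result = fetch(COUCH_URL + "/_bulk_docs", {"docs": stamp_docs(docs, now)})
    return failed_ids(result)  # 3.5: these are simply picked up again later
```

Calling run_once in a loop with a short sleep between passes gives you
the background service; the fetch argument is only there so the HTTP
layer can be swapped out for testing.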

This way your Xapian index and the real documents are always somewhat
out of sync, but that should be acceptable in most cases.

Of course, you can replace Xapian with Lucene, Sphinx, something
custom, or any other FTS engine. The idea is just to separate fulltext
indexing from the database, and Xapian has nice support for that: it
works with arbitrary "documents" (sets of fields), just like couch,
while e.g. Sphinx is built mainly around SQL data sources.

On Fri, May 29, 2009 at 2:49 PM, Peter Maas <pfmm...@gmail.com> wrote:
> Hi Sergey,
>
> Yes, I know there are other projects which might better fit my requirements.
> That is not the point. I just wanted to see what the 'edges' of couchdb are.
> 'Sanitizing' input parameters is one of my 'blind spots' at the moment; the
> stemming is just an example where I'd like to use it.
>
>
> On May 29, 2009, at 12:20 , Sergey Shepelev wrote:
>
>> Just use Xapian.
>>
>> On Fri, May 29, 2009 at 1:47 PM, Peter Maas <pfmm...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to write a very basic fulltext search facility and managed to
>>> get something working:
>>>
>>> http://log4p.com/2009/05/28/simple-fulltext-analysis-in-couchdb/
>>>
>>> Currently I sanitize the source text, remove stopwords (English, that is)
>>> and emit each term (with the number of occurrences). Quite useful (for me)
>>> already.
>>>
>>> The next step would be to add stemming, not too hard either (I have a
>>> working prototype already). This does however present me with a new
>>> problem. I'd like to stem the provided keys in the REST parameters using
>>> the same stemmer used by the mapping code. Is there a way to process the
>>> parameters passed to a REST view within CouchDB? Or would I need to
>>> duplicate (port) the stemmer in the clients (which are written in
>>> various languages)?
>>>
>>> kind regards,
>>>
>>> Peter
>>>
>
>
