Re: couchdb-lucene indexing issues

Robert Newson Mon, 05 Sep 2011 02:39:06 -0700

The analyzer setting is a top-level item of the index definition, as documented in the README here:

https://github.com/rnewson/couchdb-lucene
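
As a rough sketch of what "top-level" means here, the snippet below builds the design document with "analyzer" sitting beside "index" rather than inside the options passed to ret.add(). The index function is a shortened version of the one quoted below, `_rev` is left out, and the exact option names are worth checking against the README before relying on this.

```javascript
// Shortened index function, adapted from the one quoted in this thread,
// with the per-field 'analyzer' options removed. `Document` is supplied by
// couchdb-lucene at indexing time; it is never called in this script.
function indexFn(doc) {
  var ret = new Document();
  if (doc.type == 'CSAsset' && doc.deleted != true) {
    for (var i in doc.metadata) {
      ret.add(doc.metadata[i].value, { field: doc.metadata[i].key.toLowerCase() });
      ret.add(doc.metadata[i].value);
    }
    return ret;
  }
  return null;
}

// The design document: "analyzer" is a sibling of "index".
// `_rev` is omitted here; supply the current one when updating the document.
var designDoc = {
  _id: '_design/foo',
  fulltext: {
    by_metadata: {
      analyzer: 'simple',
      index: indexFn.toString() // the index function is stored as a string
    }
  }
};

console.log(JSON.stringify(designDoc, null, 2));
```

The printed JSON is what would be saved as the design document; the index needs to be rebuilt before the new analyzer shows up in Luke.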


B.

On 5 September 2011 10:14, Rory Franklin <[email protected]> wrote:
> I've modified my original index in CouchDB to be the following, but I'm not
> having any joy getting things broken up into tokens:
>
>
> {
>  "_id": "_design/foo",
>  "_rev": "19-da99913ce4cdd421903d0d48f9a40cc3",
>  "fulltext": {
> "by_metadata": {
>  "index": "function(doc) {
> var ret=new Document();
> if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
> for (var i in doc.metadata) {
> if(doc.metadata[i]['key'] == 'Title') {
> ret.add(doc.metadata[i]['value'].toLowerCase(), {'field':'sort_title', 
> 'store':'yes', 'index' : 'not_analyzed'});
> }
> ret.add(doc.metadata[i]['value'],{ 'field' : 
> doc.metadata[i]['key'].toLowerCase(), 'analyzer' : 'simple' });
> ret.add(doc.metadata[i]['value'], { 'analyzer' : 'simple' });
> }
> for (var i in doc.partitions) {
> ret.add(doc.partitions[i].partition_id,{'field':'partition'}); 
> ret.add(doc.partitions[i].partition_id);
> }
> ret.add(doc['created_at'], {'field':'sort_created_at', 'store':'yes', 'index' 
> : 'not_analyzed'});
> return ret;
> } else {
> return null;
> }
> }"
>  }
>  }
> }
>
> I've opened the index in Luke; going to the Documents tab and doing
> "reconstruct & edit" on a particular document shows that the fields aren't
> being split into separate tokens.
>
>
> --
>
> Rory
>
> On Saturday, 3 September 2011 at 17:12, Robert Newson wrote:
>
>> " For instance, searching for the term "wonderland" should return back
>> a document where there is a field with the value
>> "some_wonderland_example" but it doesn't."
>>
>> It shouldn't and doesn't. :)
>>
>> 'some_wonderland_example' is a single token when tokenized by the
>> default StandardAnalyzer. If instead you specify "analyzer":"simple",
>> you will find that it is 3 tokens, and your search should work.
>>
>> B.
>>
>> On 3 September 2011 16:06, Rory Franklin <[email protected]> wrote:
>> > I'm using couchdb-lucene to index a list of fields (user defined) in a 
>> > document using the following design document:
>> >
>> > {
>> > "_id": "_design/foo",
>> > "_rev": "16-dcd0d39369c35b3d74ceef13a388826f",
>> > "fulltext": {
>> > "by_metadata": {
>> > "index": "function(doc) {
>> > var ret=new Document();
>> > if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
>> > for (var i in doc.metadata) {
>> > if(doc.metadata[i]['key'] == 'Title') {
>> > ret.add(doc.metadata[i]['value'].toLowerCase(), {'field':'sort_title', 
>> > 'store':'yes', 'index' : 'not_analyzed'});
>> > }
>> > ret.add(doc.metadata[i]['value'],{'field':doc.metadata[i]['key'].toLowerCase()
>> >  });
>> > ret.add(doc.metadata[i]['value']);
>> > }
>> > for (var i in doc.partitions) {
>> > ret.add(doc.partitions[i].partition_id,{'field':'partition'}); 
>> > ret.add(doc.partitions[i].partition_id);
>> > }
>> > ret.add(doc['created_at'], {'field':'sort_created_at', 'store':'yes', 
>> > 'index' : 'not_analyzed'});
>> > return ret;
>> > } else {
>> > return null;
>> > }
>> > }"
>> > }
>> > }
>> > }
>> >
>> >
>> >
>> > (I've formatted the definition so that it's not all on one line for 
>> > readability here)
>> >
>> > However, when using the by_metadata view it doesn't appear to be breaking 
>> > the values up when there are underscores. For instance, searching for the 
>> > term "wonderland" should return back a document where there is a field 
>> > with the value "some_wonderland_example" but it doesn't. It returns the 
>> > document if I search for the full term.
>> >
>> > I'm just wondering whether I'm defining the index incorrectly? (of course, 
>> > feel free to point out if I'm doing anything else glaringly obviously 
>> > wrong too!)
>> >
>> >
>> >
>> > Rory
>
>
