The analyzer setting is a top-level item of the index definition (a sibling of "index"), not an option passed to ret.add(), as documented in the README here: https://github.com/rnewson/couchdb-lucene
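A minimal sketch of Rory's design document with the analyzer moved to the index level. It assumes the per-index "analyzer" key described in the couchdb-lucene README. It is written as a small Node.js script so the index function stays readable, and `Function.prototype.toString()` produces the string that CouchDB stores. `buildDesignDoc` is a hypothetical helper, and `_rev` is left out because CouchDB assigns it.

```javascript
// Sketch: Rory's design document with "analyzer" moved next to "index".
// Assumption: couchdb-lucene reads "analyzer" at the index level (per its README).

// The index function stays in plain ES3 style (var, no arrow functions)
// because couchdb-lucene evaluates it in its own JavaScript engine.
// `Document` is supplied by couchdb-lucene at index time; it is never called here.
const indexFn = function (doc) {
  var ret = new Document();
  if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
    for (var i in doc.metadata) {
      if (doc.metadata[i]['key'] == 'Title') {
        ret.add(doc.metadata[i]['value'].toLowerCase(),
                {'field': 'sort_title', 'store': 'yes', 'index': 'not_analyzed'});
      }
      // No per-call analyzer option here; the index-level analyzer applies.
      ret.add(doc.metadata[i]['value'], {'field': doc.metadata[i]['key'].toLowerCase()});
      ret.add(doc.metadata[i]['value']);
    }
    for (var j in doc.partitions) {
      ret.add(doc.partitions[j].partition_id, {'field': 'partition'});
      ret.add(doc.partitions[j].partition_id);
    }
    ret.add(doc['created_at'],
            {'field': 'sort_created_at', 'store': 'yes', 'index': 'not_analyzed'});
    return ret;
  }
  return null;
};

// Hypothetical helper that assembles the design document.
function buildDesignDoc() {
  return {
    _id: '_design/foo',
    fulltext: {
      by_metadata: {
        analyzer: 'simple',        // top-level: applies to every analyzed field
        index: indexFn.toString()  // couchdb-lucene expects the function as a string
      }
    }
  };
}

const designDoc = buildDesignDoc();
console.log(JSON.stringify(designDoc, null, 2));
```

When updating an existing design document, CouchDB also needs its current `_rev` in the request body.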
B.

On 5 September 2011 10:14, Rory Franklin <[email protected]> wrote:
> I've modified my original index in CouchDB to be the following, but not
> having any joy with things being broken up into tokens:
>
> {
>   "_id": "_design/foo",
>   "_rev": "19-da99913ce4cdd421903d0d48f9a40cc3",
>   "fulltext": {
>     "by_metadata": {
>       "index": "function(doc) {
>         var ret=new Document();
>         if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
>           for (var i in doc.metadata) {
>             if(doc.metadata[i]['key'] == 'Title') {
>               ret.add(doc.metadata[i]['value'].toLowerCase(), {'field':'sort_title', 'store':'yes', 'index' : 'not_analyzed'});
>             }
>             ret.add(doc.metadata[i]['value'],{ 'field' : doc.metadata[i]['key'].toLowerCase(), 'analyzer' : 'simple' });
>             ret.add(doc.metadata[i]['value'], { 'analyzer' : 'simple' });
>           }
>           for (var i in doc.partitions) {
>             ret.add(doc.partitions[i].partition_id,{'field':'partition'});
>             ret.add(doc.partitions[i].partition_id);
>           }
>           ret.add(doc['created_at'], {'field':'sort_created_at', 'store':'yes', 'index' : 'not_analyzed'});
>           return ret;
>         } else {
>           return null;
>         }
>       }"
>     }
>   }
> }
>
> I've opened the index up in Luke and going to the Documents tab and doing
> reconstruct & edit on a particular document shows that the fields aren't
> being split up into separate tokens.
>
> --
> Rory
>
> On Saturday, 3 September 2011 at 17:12, Robert Newson wrote:
>
>> " For instance, searching for the term "wonderland" should return back
>> a document where there is a field with the value
>> "some_wonderland_example" but it doesn't."
>>
>> It shouldn't and doesn't. :)
>>
>> 'some_wonderland_example' is a single token when tokenized by the
>> default StandardAnalyzer. If instead you specify "analyzer":"simple",
>> you will find that it is 3 tokens, and your search should work.
>>
>> B.
>>
>> On 3 September 2011 16:06, Rory Franklin <[email protected]> wrote:
>> > I'm using couchdb-lucene to index a list of fields (user defined) in a
>> > document using the following design document:
>> >
>> > {
>> >   "_id": "_design/foo",
>> >   "_rev": "16-dcd0d39369c35b3d74ceef13a388826f",
>> >   "fulltext": {
>> >     "by_metadata": {
>> >       "index": "function(doc) {
>> >         var ret=new Document();
>> >         if (doc['type'] == 'CSAsset' && doc['deleted'] != true) {
>> >           for (var i in doc.metadata) {
>> >             if(doc.metadata[i]['key'] == 'Title') {
>> >               ret.add(doc.metadata[i]['value'].toLowerCase(), {'field':'sort_title', 'store':'yes', 'index' : 'not_analyzed'});
>> >             }
>> >             ret.add(doc.metadata[i]['value'],{'field':doc.metadata[i]['key'].toLowerCase()});
>> >             ret.add(doc.metadata[i]['value']);
>> >           }
>> >           for (var i in doc.partitions) {
>> >             ret.add(doc.partitions[i].partition_id,{'field':'partition'});
>> >             ret.add(doc.partitions[i].partition_id);
>> >           }
>> >           ret.add(doc['created_at'], {'field':'sort_created_at', 'store':'yes', 'index' : 'not_analyzed'});
>> >           return ret;
>> >         } else {
>> >           return null;
>> >         }
>> >       }"
>> >     }
>> >   }
>> > }
>> >
>> > (I've formatted the definition so that it's not all on one line for
>> > readability here)
>> >
>> > However, when using the by_metadata view it doesn't appear to be breaking
>> > the values up when there are underscores. For instance, searching for the
>> > term "wonderland" should return back a document where there is a field
>> > with the value "some_wonderland_example" but it doesn't. It returns the
>> > document if I search for the full term.
>> >
>> > I'm just wondering whether I'm defining the index incorrectly? (of course,
>> > feel free to point out if I'm doing anything else glaringly obviously
>> > wrong too!)
>> >
>> > Rory
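Robert's point that "some_wonderland_example" is one token under the default StandardAnalyzer but three under the simple analyzer can be illustrated with a rough JavaScript approximation. Lucene's SimpleAnalyzer splits text at every non-letter character and lowercases the pieces, which is what `simpleTokens` mimics. `standardishTokens` only imitates the one StandardAnalyzer behaviour relevant here (underscores stay inside a token). It is not an implementation of its Unicode word-break rules and ignores its stop-word filtering. Both function names are hypothetical.

```javascript
// Rough approximations of two Lucene analyzers, for illustration only.

// SimpleAnalyzer-like: maximal runs of Unicode letters, lowercased.
// Underscores, digits and punctuation all act as separators.
function simpleTokens(text) {
  return (text.match(/\p{L}+/gu) || []).map((t) => t.toLowerCase());
}

// Crude stand-in for StandardAnalyzer on this input: letters, digits and "_"
// stay together in one token. NOT a UAX #29 word-break implementation.
function standardishTokens(text) {
  return (text.match(/[\p{L}\p{N}_]+/gu) || []).map((t) => t.toLowerCase());
}

const value = 'some_wonderland_example';
console.log(standardishTokens(value)); // [ 'some_wonderland_example' ]
console.log(simpleTokens(value));      // [ 'some', 'wonderland', 'example' ]

// A term query for "wonderland" can only match if that exact token was indexed.
console.log(standardishTokens(value).includes('wonderland')); // false
console.log(simpleTokens(value).includes('wonderland'));      // true
```

The same reasoning explains why searching for the full term worked before the change: the whole underscored string was the only token in the index.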
