I would like to take a look at your PathAnalyzer code. I got this more or less working, but I'd love to see another way to do it - your solution sounds much more robust than mine. Easier to search for specific paths, for sure.

+--------------------------------------------------------+
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
+--------------------------------------------------------+


On Aug 9, 2007, at 10:44 PM, Jonathan Woods wrote:

Maybe there's a different way, in which path-like values like this are
treated explicitly.

I use a similar approach to Matthew at www.colfes.com, where all pages are
generated from Lucene searches according to filters on a couple of
hierarchical categories ('spaces'), i.e. subject and organisational unit.
From that experience, a few things occur to me here:

1.  The structure of any particular category/space is not immediately
derivable from data, so unless we're Google or doing something RDF-like,
it's something you define up front. For this reason, and because it makes
internationalisation easier, I feel you should model this kind of
standing data independently of its representation.

So instead of searching for Departments>Men's Apparel>Jackets, I index (and
search for) a String "/departments/mensapparel/jackets/", and use a simple
standing data mapping to resolve each of the nodes along the path to a
human-readable form when necessary.  In my case, the values for any
particular resource (e.g. a news article) are defined by CMS users from
drop-downs.
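
As a rough illustration of that mapping (not the actual colfes.com code), here is a minimal Python sketch. The LABELS table and the display_path function are hypothetical names made up for this example; the only assumption is that each node id along the path has an entry in a lookup table.

```python
# Hypothetical standing data: node id -> display label.
# In a real system this would come from configuration or a CMS.
LABELS = {
    "departments": "Departments",
    "mensapparel": "Men's Apparel",
    "jackets": "Jackets",
}


def display_path(path, labels=LABELS, sep=">"):
    """Resolve each node of an indexed path to its human-readable label."""
    # Splitting on "/" leaves empty strings at the ends; drop them.
    nodes = [node for node in path.split("/") if node]
    # Fall back to the raw node id when no label is defined.
    return sep.join(labels.get(node, node) for node in nodes)


print(display_path("/departments/mensapparel/jackets/"))
```

Swapping in a different LABELS table per language is what makes the internationalisation case straightforward: the indexed value never changes.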

2.  In my Lucene library, I redundantly indexed paths like
"/departments/mensapparel/jackets/" into successive fragments, together with
the whole path value:

/departments
/departments/mensapparel
/departments/mensapparel/jackets
/departments/mensapparel/jackets/

using my own PathAnalyzer (extends Analyzer, of course) which makes it very fast to query on path fragments: "all goods anywhere in the men's apparel section" -> query on "/departments/mensapparel"; "all goods categorised as
exactly in the men's apparel section" -> query on
"/departments/mensapparel/".
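
The fragment expansion above can be sketched in a few lines of Python. This is not the PathAnalyzer itself (that is a Lucene Analyzer written in Java); path_fragments and matches are hypothetical helpers that only mimic the terms such an analyzer would emit and how an exact-term query would hit them.

```python
def path_fragments(path):
    """Expand a path into its successive prefixes plus the whole path value."""
    nodes = [node for node in path.split("/") if node]
    fragments = []
    current = ""
    for node in nodes:
        current += "/" + node
        fragments.append(current)
    # The whole value keeps its trailing slash, which marks "exactly here".
    if nodes and path.endswith("/"):
        fragments.append(current + "/")
    return fragments


def matches(indexed_path, query_term):
    """True if an exact-term query would hit a document indexed with this path."""
    return query_term in path_fragments(indexed_path)


jacket = "/departments/mensapparel/jackets/"
for fragment in path_fragments(jacket):
    print(fragment)

# Anywhere in the men's apparel section: hit.
print(matches(jacket, "/departments/mensapparel"))
# Categorised as exactly in the men's apparel section: no hit.
print(matches(jacket, "/departments/mensapparel/"))
```

Because every fragment is indexed as its own term, both kinds of query are plain term lookups rather than prefix scans, at the cost of a few extra terms per document.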

I implemented all queries like this as filters, and cached the filter
definitions. I guess Solr's query optimisation and filter caching do all this out of the box, so it may end up being just as fast to use the kind of
PrefixQuery suggested in this thread.

3. However, I can post/attach/donate PathAnalyzer if anyone thinks it might
still be useful. I started off calling it HierarchyValueAnalyzer, then
TreeNodePathAnalyzer, but now that it's PathAnalyzer I can't help thinking
it might have lots of applications....

Jon

-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: 09 August 2007 21:50
To: solr-user@lucene.apache.org
Subject: Re: Best use of wildcard searches

On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python

The same exact query, with... wait..

Wow. I'm making myself look like an idiot.

I swear that these queries didn't work the first time I ran them...

But now "\ " and "?" give the same results, as would be expected,
while " " returns nothing.

I'm sorry for wasting your time, but I do appreciate the help!

lol - these things can happen when you get too many levels of
escaping needed.
Hopefully we can improve the situation in the future to get
rid of the query parser escaping for certain queries such as
prefix and term.
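
To make the two levels of escaping concrete, here is a small Python sketch that builds the query string from the request quoted above. It is a simplification under stated assumptions: escape_spaces is a hypothetical helper that only backslash-escapes spaces, whereas the query parser treats several other characters as special too, and the parameter values are just the ones from the quoted URL.

```python
from urllib.parse import parse_qs, urlencode


def escape_spaces(term):
    """Level 1 (query parser): backslash-escape spaces so the term is not split.

    Simplified on purpose: other query-parser special characters are not handled.
    """
    return term.replace(" ", "\\ ")


prefix = "Apparel>Men's Apparel>Jackets"
q = "department_exact:" + escape_spaces(prefix) + "*"

# Level 2 (HTTP): percent-encode the already escaped query for the URL.
params = [
    ("q", q),
    ("fq", "country_code:US"),
    ("fq", "brand_exact:adidas"),
    ("wt", "python"),
]
query_string = urlencode(params)
print(query_string)

# Round trip: decoding the URL layer gives back the parser-escaped query.
decoded = parse_qs(query_string)
print(decoded["q"][0])
```

Keeping the two steps in separate functions makes it easier to see which layer a stray backslash or percent sign belongs to.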

