I would like to take a look at your PathAnalyzer code. I got this
more or less working, but I'd love to see another way to do it - your
solution sounds much more robust than mine. Easier to search for
specific paths, for sure.
+--------------------------------------------------------+
| Matthew Runo
| Zappos Development
| [EMAIL PROTECTED]
| 702-943-7833
+--------------------------------------------------------+
On Aug 9, 2007, at 10:44 PM, Jonathan Woods wrote:
Maybe there's a different way, in which path-like values like this are
treated explicitly.
I use a similar approach to Matthew at www.colfes.com, where all pages
are generated from Lucene searches according to filters on a couple of
hierarchical categories ('spaces'), i.e. subject and organisational
unit.
From that experience, a few things occur to me here:
1. The structure of any particular category/space is not immediately
derivable from data, so unless we're Google or doing something
RDF-like, it's something you define up front. For this reason, and
because it makes internationalisation easier, I feel you should model
this kind of standing data independently of its representation.
So instead of searching for Departments>Men's Apparel>Jackets, I index
(and search for) a String "/departments/mensapparel/jackets/", and use
a simple standing data mapping to resolve each of the nodes along the
path to a human-readable form when necessary. In my case, the values
for any particular resource (e.g. a news article) are defined by CMS
users from drop-downs.
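
As a rough illustration of the standing data mapping described in
point 1, here is a minimal Python sketch. The LABELS table and the
display_path function are hypothetical names invented for this
example; they are not taken from the colfes.com code.

```python
# Hypothetical standing data: path node -> human-readable label.
# In a real system this would be defined up front (and per language).
LABELS = {
    "departments": "Departments",
    "mensapparel": "Men's Apparel",
    "jackets": "Jackets",
}


def display_path(path, labels=LABELS, sep=">"):
    """Resolve each node of an indexed path to its display label."""
    nodes = [node for node in path.split("/") if node]
    # Fall back to the raw node name when no label is defined.
    return sep.join(labels.get(node, node) for node in nodes)


print(display_path("/departments/mensapparel/jackets/"))
# prints: Departments>Men's Apparel>Jackets
```

Keeping the labels out of the indexed value means a label can be
renamed or translated without reindexing anything.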
2. In my Lucene library, I redundantly indexed paths like
"/departments/mensapparel/jackets/" into successive fragments, together
with the whole path value:
/departments
/departments/mensapparel
/departments/mensapparel/jackets
/departments/mensapparel/jackets/
using my own PathAnalyzer (extends Analyzer, of course) which makes it
very fast to query on path fragments: "all goods anywhere in the men's
apparel section" -> query on "/departments/mensapparel"; "all goods
categorised as exactly in the men's apparel section" -> query on
"/departments/mensapparel/".
I implemented all queries like this as filters, and cached the filter
definitions. I guess Solr's query optimisation and filter caching do
all this out of the box, so it may end up being just as fast to use the
kind of PrefixQuery suggested in this thread.
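
The fragment expansion in point 2 can be sketched in plain Python as
below. This is not the actual PathAnalyzer (which is a Lucene
Analyzer); path_fragments, build_index and the sample documents are
assumptions made up for illustration, and a dict of sets stands in for
the Lucene index.

```python
def path_fragments(path):
    """Return the tokens to index for one path value."""
    nodes = [node for node in path.split("/") if node]
    fragments = []
    current = ""
    for node in nodes:
        current += "/" + node
        fragments.append(current)
    # The whole path value, with its trailing slash, marks the exact node.
    if nodes and path.endswith("/"):
        fragments.append(current + "/")
    return fragments


def build_index(docs):
    """Map each fragment to the set of document ids that contain it."""
    index = {}
    for doc_id, path in docs.items():
        for fragment in path_fragments(path):
            index.setdefault(fragment, set()).add(doc_id)
    return index


# Hypothetical documents, keyed by id.
docs = {
    1: "/departments/mensapparel/jackets/",
    2: "/departments/mensapparel/",
    3: "/departments/shoes/",
}
index = build_index(docs)

# Anywhere in the men's apparel section: a single exact-term lookup.
print(sorted(index["/departments/mensapparel"]))   # prints: [1, 2]
# Categorised as exactly men's apparel: note the trailing slash.
print(sorted(index["/departments/mensapparel/"]))  # prints: [2]
```

Both lookups are exact term matches, which is why no prefix or
wildcard query is needed at search time; the cost is a few extra terms
per document in the index.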
3. However, I can post/attach/donate PathAnalyzer if anyone thinks it
might still be useful. I started off calling it HierarchyValueAnalyzer,
then TreeNodePathAnalyzer, but now that it's PathAnalyzer I can't help
thinking it might have lots of applications....
Jon
-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: 09 August 2007 21:50
To: solr-user@lucene.apache.org
Subject: Re: Best use of wildcard searches
On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python
The same exact query, with... wait..
Wow. I'm making myself look like an idiot.
I swear that these queries didn't work the first time I ran them...
But now "\ " and "?" give the same results, as would be expected,
while " " returns nothing.
I'm sorry for wasting your time, but I do appreciate the help!
lol - these things can happen when you get too many levels of
escaping needed.
Hopefully we can improve the situation in the future to get
rid of the query parser escaping for certain queries such as
prefix and term.
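
To make the two levels of escaping in this thread concrete (query
parser escaping first, then URL encoding), here is a hedged Python
sketch. escape_term and prefix_query are hypothetical helpers, not part
of Solr or any client library, and the set of escaped characters is an
assumption based on the Lucene query parser's documented special
characters plus whitespace.

```python
import re
from urllib.parse import urlencode

# Characters treated as query parser syntax, plus whitespace, which
# would otherwise split the prefix into separate clauses.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|\s])')


def escape_term(text):
    """Level 1: backslash-escape query parser syntax in a raw value."""
    return _SPECIAL.sub(r"\\\1", text)


def prefix_query(field, prefix):
    """Build field:prefix* with the prefix escaped; the * stays raw."""
    return f"{field}:{escape_term(prefix)}*"


q = prefix_query("department_exact", "Apparel>Men's Apparel>Jackets")
print(q)
# prints: department_exact:Apparel>Men's\ Apparel>Jackets*

# Level 2: URL-encode the finished parameters in one place.
query_string = urlencode([
    ("q", q),
    ("fq", "country_code:US"),
    ("fq", "brand_exact:adidas"),
    ("wt", "python"),
])
print(query_string)
```

Doing each level in its own step, and only once, avoids the
hand-assembled mix of "\ ", "?" and "%20" that caused the confusion
above.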