Re: Best use of wildcard searches

Matthew Runo Fri, 10 Aug 2007 10:21:51 -0700

I would like to take a look at your PathAnalyzer code. I got this more or less working, but I'd love to see another way to do it - your solution sounds much more robust than mine. Easier to search for specific paths, for sure.


+--------------------------------------------------------+
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
+--------------------------------------------------------+



On Aug 9, 2007, at 10:44 PM, Jonathan Woods wrote:

Maybe there's a different way, in which path-like values like this are
treated explicitly.
I use a similar approach to Matthew at www.colfes.com, where all pages are
generated from Lucene searches according to filters on a couple of
hierarchical categories ('spaces'), i.e. subject and organisational unit.
From that experience, a few things occur to me here:

1.  The structure of any particular category/space is not immediately
derivable from data, so unless we're Google or doing something RDF-like,
they're something you define up front. For this reason, and because it
makes internationalisation easier, I feel you should model this kind of
standing data independently of its representation.
So instead of searching for Departments>Men's Apparel>Jackets, I index (and
search for) a String "/departments/mensapparel/jackets/", and use a simple
standing data mapping to resolve each of the nodes along the path to a
human-readable form when necessary.  In my case, the values for any
particular resource (e.g. a news article) are defined by CMS users from
drop-downs.
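
To make the standing-data idea concrete, here is a minimal sketch in Python
(not the code used on the site). The LABELS table and the display_path name
are hypothetical, made up for illustration; the point is only that the
indexed value stays a stable path while the human-readable form is looked up
separately.

```python
# Hypothetical standing-data table: stable node id -> display label.
# In a real system this would come from a database or a per-locale resource file.
LABELS = {
    "departments": "Departments",
    "mensapparel": "Men's Apparel",
    "jackets": "Jackets",
}


def display_path(path, labels=LABELS, sep=" > "):
    """Resolve each node of an indexed path to its human-readable label."""
    nodes = [node for node in path.split("/") if node]
    # Fall back to the raw node id when no label is defined for it.
    return sep.join(labels.get(node, node) for node in nodes)


print(display_path("/departments/mensapparel/jackets/"))
```

Swapping in a different LABELS table per language is what makes
internationalisation easier: the indexed paths never change.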

2.  In my Lucene library, I redundantly indexed paths like
"/departments/mensapparel/jackets/" into successive fragments, together with
the whole path value:

/departments
/departments/mensapparel
/departments/mensapparel/jackets
/departments/mensapparel/jackets/
using my own PathAnalyzer (extends Analyzer, of course), which makes it very
fast to query on path fragments: "all goods anywhere in the men's apparel
section" -> query on "/departments/mensapparel"; "all goods categorised as
exactly in the men's apparel section" -> query on
"/departments/mensapparel/".
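
The fragment expansion described above can be sketched as follows. This is
plain Python rather than a Lucene Analyzer, and it is not the actual
PathAnalyzer source; path_fragments and the small in-memory "index" are
assumptions made up for illustration.

```python
def path_fragments(path):
    """Expand '/a/b/c/' into '/a', '/a/b', '/a/b/c' and the full '/a/b/c/'."""
    nodes = [node for node in path.strip("/").split("/") if node]
    fragments = []
    current = ""
    for node in nodes:
        current += "/" + node
        fragments.append(current)       # ancestor-or-self fragment, no trailing slash
    if nodes:
        fragments.append(current + "/")  # whole path value, marks "exactly here"
    return fragments


# Toy stand-in for an index: document id -> set of indexed terms.
docs = {
    "jacket-1": set(path_fragments("/departments/mensapparel/jackets/")),
    "misc-1": set(path_fragments("/departments/mensapparel/")),
}


def search(term):
    """Return ids of documents whose indexed terms contain the exact term."""
    return sorted(doc_id for doc_id, terms in docs.items() if term in terms)


print(search("/departments/mensapparel"))   # anywhere in the section
print(search("/departments/mensapparel/"))  # exactly in the section
```

Because every ancestor is indexed as its own term, both kinds of lookup are
exact term matches, with no wildcard or prefix expansion at query time.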

I implemented all queries like this as filters, and cached the filter
definitions. I guess Solr's query optimisation and filter caching do all
this out of the box, so it may end up being just as fast to use the kind of
PrefixQuery suggested in this thread.

3.  However, I can post/attach/donate PathAnalyzer if anyone thinks it might
still be useful. I started off calling it HierarchyValueAnalyzer, then
TreeNodePathAnalyzer, but now that it's PathAnalyzer I can't help thinking
it might have lots of applications....

Jon
-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: 09 August 2007 21:50
To: solr-user@lucene.apache.org
Subject: Re: Best use of wildcard searches

On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%3EMen's%20Apparel%3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python

The same exact query, with... wait..

Wow. I'm making myself look like an idiot.

I swear that these queries didn't work the first time I ran them...

But now "\ " and "?" give the same results, as would be expected,
while " " returns nothing.

I'm sorry for wasting your time, but I do appreciate the help!
lol - these things can happen when you get too many levels of
escaping needed.
Hopefully we can improve the situation in the future to get
rid of the query parser escaping for certain queries such as
prefix and term.
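
For reference, the two levels of escaping discussed here (query-parser
escaping, then URL encoding) can be sketched in Python as below. The helper
names escape_term and prefix_query are made up for illustration, and the set
of escaped characters is an assumption based on the usual Lucene query-parser
special characters plus whitespace; check it against the parser version in
use.

```python
import re
from urllib.parse import quote

# Assumed set of characters that need a backslash for the query parser,
# plus any whitespace (an unescaped space would split the term).
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\\s])')


def escape_term(text):
    """Level 1: backslash-escape query-parser special characters."""
    return _SPECIAL.sub(r'\\\1', text)


def prefix_query(field, prefix):
    """Build field:prefix* with the prefix escaped but the trailing * left live."""
    return f"{field}:{escape_term(prefix)}*"


q = prefix_query("department_exact", "Apparel>Men's Apparel>Jackets")
print(q)                  # what the query parser should receive
print(quote(q, safe=""))  # Level 2: percent-encoded for the q= URL parameter
```

Keeping the two steps in separate functions makes it easier to see which
level a stray space or backslash was lost at.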
