On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky
bill.che...@learninga-z.com wrote:
Hi,
I understand that, generally speaking, you should use the same analyzer for
querying as was used for indexing. In my code I am using the SnowballAnalyzer
on index creation. However, on the query side I am
Hi Jayendra,
This isn't supported yet. You could implement this by creating a
custom Lucene collector.
This collector could count the unique hits inside a block of docs per
unique facet field value. The
unique facet values could be retrieved from Lucene's FieldCache or doc
values (if you can use
Hi Jayendra,
we use faceting and block join queries on Lucene 3.6 like this:
- Create the FacetsCollector
- For faceting on parent documents, use ToParentBlockJoinQuery; for
faceting on children, ToChildBlockJoinQuery (if needed, add additional
query clauses using a BooleanQuery)
- Use
Thanks Simon,
Unfortunately, I'm using Lucene 3.0.1 and CharTermAttribute doesn't seem to
have been introduced until 3.1.0. Similarly my version of Lucene does not have
a BooleanQuery.addClause(BooleanClause) method. Maybe you meant
BooleanQuery.add(BooleanClause).
In any case, most of what
You can add parsed queries to a BooleanQuery. Would that help in this case?
SnowballAnalyzer sba = whatever();
QueryParser qp = new QueryParser(..., sba);
Query q1 = qp.parse(some snowball string);
Query q2 = qp.parse(some other snowball string);
BooleanQuery bq = new BooleanQuery();
bq.add(q1, BooleanClause.Occur.MUST);
bq.add(q2, BooleanClause.Occur.MUST);
Bill, the simple answer to your original question is that in general you
should apply the same or similar analysis for your query terms as you do
with your indexed data. In your specific case the Query.toString is
generating your unanalyzed terms and then the query parser is performing the
I am trying to (mis)use Lucene a bit like a NoSQL database or, rather, a
persistent map. I am entering 38000 documents at a rate of 1000/s into the
index. Because each added item may actually be an update, I have a
sequence of read/change/write for each of the documents.
All goes well until when
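The read/change/write cycle Harald describes can be sketched without Lucene at all. A minimal illustration (plain Java; `UpsertSketch` and `upsert` are hypothetical names, and a `HashMap` stands in for the index — in Lucene this would be a lookup by key term followed by `IndexWriter.updateDocument`):

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertSketch {
    // Read the current value for a key, change it, and write it back.
    // With Lucene, "write" replaces the old document rather than mutating it.
    static void upsert(Map<String, Integer> index, String key, int delta) {
        Integer existing = index.get(key);                        // read
        int updated = (existing == null ? 0 : existing) + delta;  // change
        index.put(key, updated);                                  // write
    }

    public static void main(String[] args) {
        Map<String, Integer> index = new HashMap<>();
        upsert(index, "doc1", 5);
        upsert(index, "doc1", 3); // second add is really an update
        System.out.println(index.get("doc1")); // 8
    }
}
```

The point of the sketch is only the shape of the loop; the doc-ID pitfalls discussed below arise because, unlike a map, a Lucene reader's document numbers are not stable keys across reopens.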
hey harald,
if you use a possibly different searcher (reader) than you used for
the search you will run into problems with the doc IDs since they
might change during the request. I suggest you use SearcherManager
or NRTManager and carry the searcher reference along when you collect
the stored
Jack,
Thanks. Yeah, I don't know what you mean by term analysis. I googled it but
didn't come up with much. So if that is the preferred way of doing this, a
wiki document would be greatly appreciated.
I notice you did say I should be doing the term analysis first. But is it
wrong to do
Ian,
I gave this method a try, at least the way I understood your suggestion. E.g.
to search for the phrase "cells combine" I built up a string like:
title:cells combine description:cells combine text:cells combine
then I passed that to the queryParser.parse() method (where queryParser is an
Bill
You're getting the snowball stemming either way which I guess is good,
and if you get the same results either way maybe it doesn't matter which
technique you use. I'd be a bit worried about parsing the result of
query.toString() because you aren't guaranteed to get back, in text,
what you put
Bill, the re-parse of Query.toString will work provided that your query
terms are either un-analyzed or their analyzer is idempotent (can be
applied repeatedly without changing the output terms). In your case, you are
doing the former.
The bottom line: 1) if it works for you, great, 2) for
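The idempotence property Jack describes can be illustrated without Lucene: a normalizer is safe to re-apply when f(f(x)) equals f(x). A minimal sketch (plain Java; `IdempotentDemo` and `normalize` are hypothetical stand-ins for an analyzer chain, not Lucene API — note that stemming, unlike the steps below, is not idempotent in general):

```java
import java.util.Locale;

public class IdempotentDemo {
    // Stand-in for an analyzer chain: lowercase + trim + collapse whitespace.
    // Each step is idempotent, so the composition is too.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = "  Cells   COMBINE ";
        String once = normalize(raw);
        String twice = normalize(once);
        System.out.println(once);               // "cells combine"
        System.out.println(once.equals(twice)); // true: safe to re-analyze
    }
}
```

This is exactly why re-parsing Query.toString works for Bill: his query terms come out unanalyzed, so running them through the analyzer again cannot change them.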
Ian/Jack,
Ok, thanks for the help. I certainly don't want to take a cheap way out, hence
my original question about whether this is the right way to do this. Jack, you
say the right way is to do Term analysis before creating the Term. If anybody
has any information on how to accomplish this
Simon gave sample code for analyzing a multi-term string.
Here's some pseudo-code (hasn't been compiled to check it) to analyze a
single term with Lucene 3.6:
public Term analyzeTerm(Analyzer analyzer, String termString){
TokenStream stream = analyzer.tokenStream(field, new StringReader(termString));
you must call reset() before consuming any tokenstream.
On Fri, Aug 3, 2012 at 4:03 PM, Jack Krupansky j...@basetechnology.com wrote:
Simon gave sample code for analyzing a multi-term string.
Here's some pseudo-code (hasn't been compiled to check it) to analyze a
single term with Lucene 3.6:
I still don't see what Bill gains by doing the term analysis himself
rather than letting QueryParser do the hard work, in a portable
non-lucene-version-specific way.
--
Ian.
On Fri, Aug 3, 2012 at 9:39 PM, Robert Muir rcm...@gmail.com wrote:
you must call reset() before consuming any
Thanks for the help everybody. We're using 3.0.1 so I couldn't do exactly what
Simon and Jack suggested. But after some searching around I came up with this
method:
private String analyze(String token) throws Exception {
StringBuffer result = new StringBuffer();
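For readers on 3.0.x: CharTermAttribute only arrived in 3.1; the 3.0 equivalent is TermAttribute and its term() method. The overall shape of such an analyze helper can be sketched self-contained, with the Lucene TokenStream calls replaced by a trivial lowercase/whitespace tokenizer (`AnalyzeSketch` is a hypothetical name; this is not a Snowball stemmer):

```java
public class AnalyzeSketch {
    // Shape of Bill's analyze(): walk the token stream and append each
    // produced term to a buffer, separated by spaces. The tokenizer here
    // is a stand-in; in real code each term would come from the
    // TokenStream's term attribute after incrementToken().
    static String analyze(String token) {
        StringBuilder result = new StringBuilder();
        for (String term : token.toLowerCase(java.util.Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) continue;            // skip empty splits
            if (result.length() > 0) result.append(' ');
            result.append(term);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(analyze("Cells Combine")); // cells combine
    }
}
```

StringBuilder is used instead of StringBuffer since no synchronization is needed here; Bill's StringBuffer works identically.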
What it buys you is not having to convert the whole complex query to
string form, which is not guaranteed to be reparseable for all queries
(e.g., AND or -abc as raw terms would be treated as operators), and then
parsing it, which will just turn around and regenerate the same query structure
(you
Hello Simon,
thanks for the information. I really thought that once a docId is
assigned, it is kept until the document is deleted. The only problem I
would have expected is docIds that no longer refer to a document
because it was deleted in the meantime. But this is clearly not the case
in
Hello Simon,
now that I knew what to search for I found
http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F
So that clearly explains this issue for me.
Many thanks for your help.
Harald
On 04.08.2012 07:38, Harald Kirsch wrote:
Hello Simon,