Re: Analyzer on query question

2012-08-03 Thread Simon Willnauer
On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky bill.che...@learninga-z.com wrote: Hi, I understand that generally speaking you should use the same analyzer at query time as was used at index time. In my code I am using the SnowballAnalyzer on index creation. However, on the query side I am
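A minimal sketch of the point Simon is making, assuming Lucene 3.0.x, an English SnowballAnalyzer, and a placeholder field name: build the query-time QueryParser with the very same analyzer used at index time, so query terms get stemmed like the indexed terms.

import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class SameAnalyzerSketch {
    // Parse user input with the same SnowballAnalyzer that the indexer used.
    public static Query parseWithIndexAnalyzer(String userInput) throws Exception {
        SnowballAnalyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
        QueryParser parser = new QueryParser(Version.LUCENE_30, "title", analyzer);
        return parser.parse(userInput);
    }
}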

Re: ToParentBlockJoinQuery - Faceting on Parent and Child Documents

2012-08-03 Thread Martijn v Groningen
Hi Jayendra, This isn't supported yet. You could implement this by creating a custom Lucene collector. This collector could count the unique hits inside a block of docs per unique facet field value. The unique facet values could be retrieved from Lucene's FieldCache or doc values (if you can use
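A simplified sketch of the kind of custom collector Martijn describes, assuming Lucene 3.x and a single-valued facet field stored on the collected (parent) documents. It counts hits per facet value via the FieldCache; the per-block deduplication he mentions is left out for brevity.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

public class FacetCountingCollector extends Collector {
    private final String facetField;
    private final Map<String, Integer> counts = new HashMap<String, Integer>();
    private String[] values; // per-segment facet values pulled from the FieldCache

    public FacetCountingCollector(String facetField) {
        this.facetField = facetField;
    }

    @Override
    public void setScorer(Scorer scorer) {
        // scores are not needed for counting
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        values = FieldCache.DEFAULT.getStrings(reader, facetField);
    }

    @Override
    public void collect(int doc) {
        String value = values[doc];
        if (value != null) {
            Integer current = counts.get(value);
            counts.put(value, current == null ? 1 : current + 1);
        }
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;
    }

    public Map<String, Integer> getCounts() {
        return counts;
    }
}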

Re: ToParentBlockJoinQuery - Faceting on Parent and Child Documents

2012-08-03 Thread Christoph Kaser
Hi Jayendra, we use faceting and block join queries on Lucene 3.6 like this: - Create the FacetsCollector - For faceting on parent documents, use ToParentBlockJoinQuery; for faceting on children, ToChildBlockJoinQuery (if needed, add additional query clauses using a BooleanQuery) - Use
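A rough sketch of the parent-side setup Christoph outlines, assuming Lucene 3.6 with the join module and a hypothetical type:parent marker field identifying parent documents in each block; the faceting collector itself is passed in, since its construction depends on your facet configuration.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;

public class BlockJoinFacetSketch {
    // Hypothetical schema: every parent document carries type=parent.
    static final Filter PARENTS_FILTER = new CachingWrapperFilter(
            new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));

    // Wrap a child-level query so it matches parent documents, then run any collector
    // (e.g. a FacetsCollector) against the joined query.
    public static void searchParentsWithCollector(IndexSearcher searcher, Query childQuery,
                                                  Collector facetsCollector) throws Exception {
        Query joinQuery = new ToParentBlockJoinQuery(childQuery, PARENTS_FILTER, ScoreMode.Avg);
        // If needed, combine joinQuery with parent-level clauses via a BooleanQuery first.
        searcher.search(joinQuery, facetsCollector);
    }
}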

RE: Analyzer on query question

2012-08-03 Thread Bill Chesky
Thanks Simon, Unfortunately, I'm using Lucene 3.0.1 and CharTermAttribute doesn't seem to have been introduced until 3.1.0. Similarly my version of Lucene does not have a BooleanQuery.addClause(BooleanClause) method. Maybe you meant BooleanQuery.add(BooleanClause). In any case, most of what

Re: Analyzer on query question

2012-08-03 Thread Ian Lea
You can add parsed queries to a BooleanQuery. Would that help in this case? SnowballAnalyzer sba = whatever(); QueryParser qp = new QueryParser(..., sba); Query q1 = qp.parse(some snowball string); Query q2 = qp.parse(some other snowball string); BooleanQuery bq = new BooleanQuery(); bq.add(q1,
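A filled-in version of Ian's snippet, assuming Lucene 3.0.x, an English Snowball stemmer, a "text" default field, and MUST clauses; all of those are placeholders, not something Ian specified.

import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class CombineParsedQueries {
    // Parse each snowball string separately, then combine the parsed queries.
    public static Query build(String first, String second) throws Exception {
        SnowballAnalyzer sba = new SnowballAnalyzer(Version.LUCENE_30, "English");
        QueryParser qp = new QueryParser(Version.LUCENE_30, "text", sba);
        Query q1 = qp.parse(first);
        Query q2 = qp.parse(second);
        BooleanQuery bq = new BooleanQuery();
        bq.add(q1, BooleanClause.Occur.MUST);
        bq.add(q2, BooleanClause.Occur.MUST);
        return bq;
    }
}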

Re: Analyzer on query question

2012-08-03 Thread Jack Krupansky
Bill, the simple answer to your original question is that in general you should apply the same or similar analysis for your query terms as you do with your indexed data. In your specific case the Query.toString is generating your unanalyzed terms and then the query parser is performing the

Problem with near realtime search

2012-08-03 Thread Harald Kirsch
I am trying to (mis)use Lucene a bit like a NoSQL database or, rather, a persistent map. I am adding 38000 documents to the index at a rate of 1000/s. Because each add may actually be an update, I have a read/change/write sequence for each of the documents. All goes well until when

Re: Problem with near realtime search

2012-08-03 Thread Simon Willnauer
Hey Harald, if you use a (possibly) different searcher (reader) than the one you used for the search, you will run into problems with the doc IDs since they might change during the request. I suggest you use SearcherManager or NRTManager and carry over the searcher reference when you collect the stored
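A minimal sketch of Simon's point that a doc ID is only valid against the reader that produced it, assuming a Lucene 3.x NRT reader opened from the IndexWriter; with SearcherManager the same pattern becomes acquire()/release() around each request.

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class NrtLookup {
    // Search and load stored fields with the SAME searcher, so the doc ID stays valid.
    public static Document findFirst(IndexWriter writer, Query query) throws Exception {
        IndexReader reader = IndexReader.open(writer, true); // NRT reader, 3.x API
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(query, 1);
            if (hits.totalHits == 0) {
                return null;
            }
            // Stored-field lookup against the same searcher that returned the hit.
            return searcher.doc(hits.scoreDocs[0].doc);
        } finally {
            reader.close(); // with SearcherManager you would release() instead
        }
    }
}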

RE: Analyzer on query question

2012-08-03 Thread Bill Chesky
Jack, Thanks. Yeah, I don't know what you mean by term analysis. I googled it but didn't come up with much. So if that is the preferred way of doing this, a wiki document would be greatly appreciated. I notice you did say I should be doing the term analysis first. But is it wrong to do

RE: Analyzer on query question

2012-08-03 Thread Bill Chesky
Ian, I gave this method a try, at least the way I understood your suggestion. E.g. to search for the phrase "cells combine" I built up a string like: title:cells combine description:cells combine text:cells combine, then I passed that to the queryParser.parse() method (where queryParser is an
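One alternative to hand-building that string (not what Bill did, just a hedged suggestion) is MultiFieldQueryParser with the same Snowball analyzer, quoting the input so it stays a phrase in every field; the field names mirror Bill's example and the rest is a placeholder.

import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class MultiFieldPhraseSketch {
    public static Query parseAcrossFields(String phrase) throws Exception {
        SnowballAnalyzer sba = new SnowballAnalyzer(Version.LUCENE_30, "English");
        MultiFieldQueryParser parser = new MultiFieldQueryParser(
                Version.LUCENE_30,
                new String[] { "title", "description", "text" },
                sba);
        // Quote the input so e.g. "cells combine" is treated as a phrase in each field.
        return parser.parse("\"" + phrase + "\"");
    }
}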

Re: Analyzer on query question

2012-08-03 Thread Ian Lea
Bill, You're getting the snowball stemming either way, which I guess is good, and if you get the same results either way maybe it doesn't matter which technique you use. I'd be a bit worried about parsing the result of query.toString() because you aren't guaranteed to get back, in text, what you put

Re: Analyzer on query question

2012-08-03 Thread Jack Krupansky
Bill, the re-parse of Query.toString will work provided that your query terms are either un-analyzed or their analyzer is idempotent (can be applied repeatedly without changing the output terms). In your case, you are doing the former. The bottom line: 1) if it works for you, great, 2) for

RE: Analyzer on query question

2012-08-03 Thread Bill Chesky
Ian/Jack, Ok, thanks for the help. I certainly don't want to take a cheap way out, hence my original question about whether this is the right way to do this. Jack, you say the right way is to do Term analysis before creating the Term. If anybody has any information on how to accomplish this

Re: Analyzer on query question

2012-08-03 Thread Jack Krupansky
Simon gave sample code for analyzing a multi-term string. Here's some pseudo-code (hasn't been compiled to check it) to analyze a single term with Lucene 3.6: public Term analyzeTerm(Analyzer analyzer, String termString){ TokenStream stream = analyzer.tokenStream(field, new

Re: Analyzer on query question

2012-08-03 Thread Robert Muir
You must call reset() before consuming any TokenStream. On Fri, Aug 3, 2012 at 4:03 PM, Jack Krupansky j...@basetechnology.com wrote: Simon gave sample code for analyzing a multi-term string. Here's some pseudo-code (hasn't been compiled to check it) to analyze a single term with Lucene 3.6:
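Putting Jack's pseudo-code and Robert's reset() correction together, a hedged sketch for Lucene 3.6 (CharTermAttribute needs 3.1+); the field name and the single-token assumption are placeholders.

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;

public class TermAnalysis {
    public static Term analyzeTerm(Analyzer analyzer, String field, String termString) throws Exception {
        TokenStream stream = analyzer.tokenStream(field, new StringReader(termString));
        CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
        stream.reset(); // required before consuming any TokenStream
        String analyzed = termString; // fall back to the raw term if analysis yields nothing
        if (stream.incrementToken()) {
            analyzed = termAtt.toString();
        }
        stream.end();
        stream.close();
        return new Term(field, analyzed);
    }
}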

Re: Analyzer on query question

2012-08-03 Thread Ian Lea
I still don't see what Bill gains by doing the term analysis himself rather than letting QueryParser do the hard work, in a portable non-lucene-version-specific way. -- Ian. On Fri, Aug 3, 2012 at 9:39 PM, Robert Muir rcm...@gmail.com wrote: you must call reset() before consuming any

RE: Analyzer on query question

2012-08-03 Thread Bill Chesky
Thanks for the help everybody. We're using 3.0.1 so I couldn't do exactly what Simon and Jack suggested. But after some searching around I came up with this method: private String analyze(String token) throws Exception { StringBuffer result = new StringBuffer();
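For reference, a hedged reconstruction of what such an analyze() helper can look like on 3.0.1, where TermAttribute stands in for the later CharTermAttribute; the field name is a placeholder and the exact method Bill posted may differ.

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class LegacyAnalyze {
    private final Analyzer analyzer;

    public LegacyAnalyze(Analyzer analyzer) {
        this.analyzer = analyzer;
    }

    // Run the analyzer over a token and return the space-joined analyzed terms.
    private String analyze(String token) throws Exception {
        StringBuffer result = new StringBuffer();
        TokenStream stream = analyzer.tokenStream("text", new StringReader(token));
        TermAttribute termAtt = stream.addAttribute(TermAttribute.class);
        stream.reset(); // per Robert's advice, reset before consuming
        while (stream.incrementToken()) {
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(termAtt.term());
        }
        stream.end();
        stream.close();
        return result.toString();
    }
}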

Re: Analyzer on query question

2012-08-03 Thread Jack Krupansky
What it buys you is not having to convert the whole complex query to string form, which is not guaranteed to be re-parseable for all queries (e.g., AND or -abc as raw terms would be treated as operators), and then parse it, which just turns around and regenerates the same query structure (you

Re: Problem with near realtime search

2012-08-03 Thread Harald Kirsch
Hello Simon, thanks for the information. I really thought that once a docId is assigned it is kept until the document is deleted. The only problem I would have expected is docIds that no longer refer to a document because it was deleted in the meantime. But this is clearly not the case in

Re: Problem with near realtime search

2012-08-03 Thread Harald Kirsch
Hello Simon, now that I knew what to search for I found http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F So that clearly explains this issue for me. Many thanks for your help. Harald On 04.08.2012 07:38, Harald Kirsch wrote: Hello Simon,