Sample code for blockjoinquery

2012-08-07 Thread Johan Haest
Hi, I've just read the following blog: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html I've been using Lucene.net for the past few months and I really like the functionality.

Re: Small Vocabulary

2012-08-07 Thread Carsten Schnober
On 06.08.2012 20:29, Mike Sokolov wrote: Hi Mike, > There was some interesting work done on optimizing queries including > very common words (stop words) that I think overlaps with your problem. > See this blog post > http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-wo

Re: Small Vocabulary

2012-08-07 Thread Danil ŢORIN
If you do intersection (not join), maybe it makes sense to put everything into one index? Just transform your input like "brown fox" into "ADJ:brown| NOUN:fox|" Write a custom tokenizer, some filters, and that's it. Of course I'm not aware of all the details, so my solution might not be applicable

Re: Small Vocabulary

2012-08-07 Thread Carsten Schnober
On 07.08.2012 10:20, Danil ŢORIN wrote: Hi Danil, > If you do intersection (not join), maybe it make sense to put every > thing into 1 index? Just a note on that: my application performs intersections and joins (unions) on the results, depending on the query. So the index structure has to be r

Re: Small Vocabulary

2012-08-07 Thread Carsten Schnober
Hi Danil, >> Just transform your input like "brown fox" into "ADJ:brown|> payload> NOUN:fox|" > > I understand that this denotes "ADJ" and "NOUN" to be interpreted as the > actual token and "brown" and "fox" as payloads (followed by payload>), right? Sorry for replying to myself, but I've reali

Re: Small Vocabulary

2012-08-07 Thread Danil ŢORIN
I mean "ADJ:brown" as a token and only the as payload, since you probably only use it for some scoring/postprocessing, not the actual matching. You can even write a filter that will emit both tokens "ADJ" and "ADJ:brown" at the same position (so you'll be able to do phrase queries), and still maintain

Re: Small Vocabulary

2012-08-07 Thread Danil ŢORIN
To avoid wildcard queries, you can write a TokenFilter that will create both tokens "ADJ" and "ADJ:brown" at the same position, so you can use your index for both lookups without doing a wildcard query. On Tue, Aug 7, 2012 at 12:31 PM, Carsten Schnober wrote: > Hi Danil, > >>> Just transform your input like
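
The same-position idea suggested above can be sketched roughly as follows. This is only a plain-Python model of a token stream, not Lucene's TokenFilter API: the names Token, expand_tagged_tokens and positions are hypothetical, and the ":" separator is assumed from the "ADJ:brown" example in the thread. A position increment of 0 stands for "same position as the previous token".

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical stand-in for an analyzed token (not a Lucene class)."""
    term: str
    position_increment: int  # 0 = same position as the previous token


def expand_tagged_tokens(terms: Iterable[str], sep: str = ":") -> Iterator[Token]:
    """For each "TAG:word" term, emit the full term and the bare tag at one position."""
    for term in terms:
        # The combined token advances the position by one, as usual.
        yield Token(term, 1)
        tag, found, _word = term.partition(sep)
        if found and tag:
            # The bare tag is stacked on the same position (increment 0),
            # so a lookup on "ADJ" alone needs no wildcard query.
            yield Token(tag, 0)


def positions(stream: Iterable[Token]) -> List[Tuple[int, str]]:
    """Resolve position increments into absolute positions for inspection."""
    pos = -1
    resolved = []
    for token in stream:
        pos += token.position_increment
        resolved.append((pos, token.term))
    return resolved


if __name__ == "__main__":
    for pos, term in positions(expand_tagged_tokens(["ADJ:brown", "NOUN:fox"])):
        print(pos, term)
```

Because both tokens share a position, a phrase lookup such as "ADJ" followed by "NOUN:fox" would still see adjacent positions in this model. Whether that carries over unchanged to a real analyzer chain depends on details not shown in the thread.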

Re: Directory flushing / commit / openIfChanged

2012-08-07 Thread Harald Kirsch
Hello Simon, OK, I'll try this out. Just to be sure: I was after a way to update documents before they are even written to disk, but this seems not to be the Lucene way. From what you propose I understand that this approach tries to keep documents from being written up to the time they need to