Re: Inconsistent Search Speed

2008-02-28 Thread Daniel Noll
On Thursday 28 February 2008 01:52:27 Erick Erickson wrote: > And don't iterate through the Hits object for more than 100 or so hits. > Like Mark said. Really. Really don't ... Is there a good trick for avoiding this? Say you have a situation like this... - User searches - User sees first N h

Re: Rebuilding Document from index?

2008-02-28 Thread Daniel Noll
On Wednesday 27 February 2008 03:33:53 Itamar Syn-Hershko wrote: > I'm still trying to engineer the best possible solution for Lucene with > Hebrew, right now my path is NOT using a stemmer by default, only by > explicit request of the user. MoreLikeThis would only return relevant > results if I wi

Re: When does QueryParser creates PhraseQueries

2008-02-28 Thread Daniel Noll
On Wednesday 27 February 2008 00:50:04 [EMAIL PROTECTED] wrote: > Looks that this is really hard-coded behaviour, and not Analyzer-specific. The whitespace part is coded into QueryParser.jj, yes. So are the quotes and : and other query-specific things. > I want to search for directories with to

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
Thanks Mark. I'll wait for your enhancements to IndexAccessor with the new methods. I use mergeFactor = 100. I've read about the merge factor, and it's hard to balance read and write optimization. What number do you use? Thanks again. -vivek On Thu, Feb 28, 2008 at 7:14 PM, Mark Miller <

Re: DefaultIndexAccessor

2008-02-28 Thread Mark Miller
vivek sar wrote: Mark, Just for my clarification, 1) Would you have indexStop and indexStart methods? If that's the case then I don't have to call close() at all. These new methods would serve as just cleaning up the caches and not closing the thread pool. Yes. This is the approach I agre

Re: How do i get a text summary

2008-02-28 Thread h t
Hi Karl, Where can I find an introduction to the algorithm below? Thanks. "Very simple algorithmic solutions usually involve ranking top sentences by looking at distribution of terms in sentences, paragraphs and the whole document. I implemented something like this a couple of years back that worked fairly w
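
For readers wondering what "ranking sentences by the distribution of terms" can look like, here is a minimal sketch. It is not Karl's implementation, which is not shown in the thread. The class name SentenceRanker, the naive regex sentence splitter, and the scoring rule (average whole-document frequency of a sentence's terms) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: picks the n sentences whose
// terms are, on average, the most frequent in the whole document.
final class SentenceRanker {

    // Lower-case the text and split it on runs of non-word characters.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    static List<String> summarize(String doc, int n) {
        // Naive sentence split: whitespace that follows '.', '!' or '?'.
        String[] sentences = doc.trim().split("(?<=[.!?])\\s+");

        // Term frequencies over the whole document.
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens(doc)) {
            tf.merge(t, 1, Integer::sum);
        }

        // Score of a sentence = mean document frequency of its terms.
        double[] score = new double[sentences.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            List<String> ts = tokens(sentences[i]);
            double sum = 0.0;
            for (String t : ts) {
                sum += tf.getOrDefault(t, 0);
            }
            score[i] = ts.isEmpty() ? 0.0 : sum / ts.size();
            order.add(i);
        }

        // Highest score first; List.sort is stable, so ties keep document order.
        order.sort(Comparator.comparingDouble((Integer i) -> -score[i]));
        int k = Math.max(0, Math.min(n, order.size()));
        List<Integer> top = new ArrayList<>(order.subList(0, k));

        // Emit the chosen sentences in their original order.
        top.sort(Comparator.naturalOrder());
        List<String> result = new ArrayList<>();
        for (int i : top) {
            result.add(sentences[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String doc = "Lucene indexes text. Lucene searches text fast. Cats sleep.";
        System.out.println(summarize(doc, 1));
    }
}
```

Averaging per sentence favours short sentences built from common words. A real summarizer would probably remove stop words and weight terms by something like tf-idf, but the ranking structure stays the same.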

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread h t
Compared with classical VSM, Lucene just ignores the denominator (|Q|*|D|) of the similarity formula, but it adds norm(t,d) and coord(q,d) to calculate the fraction of terms in Query and Doc, so it's a modified implementation of VSM in practice. Do you just want to verify which implementation of VSM in "
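
As a reference point for the |Q|*|D| denominator mentioned above, the following is a minimal sketch of classical cosine similarity over sparse term-weight vectors. It is plain Java and does not use or reproduce Lucene's Similarity API. The class name CosineSimilarity and the map-based vector representation are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: classical vector space model cosine similarity,
// sim(Q, D) = (Q . D) / (|Q| * |D|), over sparse term -> weight maps.
final class CosineSimilarity {

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    static double cosine(Map<String, Double> q, Map<String, Double> d) {
        // Dot product over the terms the two vectors share.
        double dot = 0.0;
        for (Map.Entry<String, Double> e : q.entrySet()) {
            Double w = d.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double nq = norm(q);
        double nd = norm(d);
        // Define the similarity with an all-zero vector as 0.
        if (nq == 0.0 || nd == 0.0) {
            return 0.0;
        }
        return dot / (nq * nd);
    }

    public static void main(String[] args) {
        Map<String, Double> q = new HashMap<>();
        q.put("index", 3.0);
        q.put("search", 4.0);
        Map<String, Double> d = new HashMap<>();
        d.put("index", 3.0);
        System.out.println(cosine(q, d));
    }
}
```

The weights here are whatever the caller supplies (raw counts, tf-idf, and so on). Swapping the weighting scheme while keeping the same cosine formula is one way to express different VSM variants.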

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
Mark, Just for my clarification, 1) Would you have indexStop and indexStart methods? If that's the case then I don't have to call close() at all. These new methods would serve as just cleaning up the caches and not closing the thread pool. I would prefer not to call close() and init() again if

Re: DefaultIndexAccessor

2008-02-28 Thread Mark Miller
I added the Thread Pool recently, so things did probably work before that. I am certainly willing to put the Thread Pool init in the open call instead of the constructor. As for the best method to use, I was thinking of something along the same lines as what you suggest. One of the decisions

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
Mark, Yes, I think that's precisely what is happening. I call accessor.close, which shuts down all the ExecutorServices. I was assuming that accessor.open would re-open them (I think that's how it worked in older versions of your IndexAccessor). Basically, I need a way to stop (or close) all the I

Re: DefaultIndexAccessor

2008-02-28 Thread Mark Miller
Hey vivek, Sorry you ran into this. I believe the problem is that I had just not foreseen the use case of closing and then reopening the Accessor. The only time I ever close the Accessors is when I am shutting down the JVM. What do you do about all of the IndexAccessor requests while it is in

SOC: Lulu, a Lua implementation of Lucene

2008-02-28 Thread Petite Abeille
A proposal for a Lua entry for the "Google Summer of Code" '08: lu·lu (lū'lū) n. Slang. A remarkable person, object, or idea. A very attractive or seductive looking woman. A Lua implementation of Lucene. Skimpy details below: http://svr225.stepx.com:3388/lulu http://lua-users.org/wiki/Goog

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
Mark, Some more information, 1) I run indexwriter every 5 mins 2) After every cycle I check if I need to partition (based on the index size) 3) In the partition interface, a) I first call close on the index accessor (so all the searchers can close before I move tha

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Dharmalingam
You can find those variants of the vector space model in this interesting article: http://ieeexplore.ieee.org/iel1/52/12658/00582976.pdf?tp=&isnumber=&arnumber=582976 You have now confirmed for me that, given the current nature of the Similarity APIs, it will not be easy to realize these variants quickly. Actually

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Grant Ingersoll
FYI: The mailing list handler strips attachments. At any rate, sounds like an interesting project. I don't know how easy it will be for you to implement 7 variants of VSM in Lucene given the nature of the APIs, but if you do, it might be handy to see your changes as a patch. :-) Also not

Re: DefaultIndexAccessor

2008-02-28 Thread vivek sar
Mark, We deployed our indexer (using defaultIndexAccessor) on one of our production sites and are getting this error: Caused by: java.util.concurrent.RejectedExecutionException at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown Source) at java.util.conc
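
The RejectedExecutionException above is what a java.util.concurrent ExecutorService throws, under its default AbortPolicy, when a task is submitted after shutdown(). The sketch below reproduces that behaviour with the JDK alone and shows the fix discussed later in the thread: creating the pool in open() rather than in the constructor. ReopenablePool is a hypothetical stand-in, not the real DefaultIndexAccessor, and its method names are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical stand-in for an accessor that owns a thread pool.
// It is NOT the real DefaultIndexAccessor; all names are illustrative.
final class ReopenablePool {
    private ExecutorService pool;

    // Create the pool in open() rather than in the constructor, so that
    // open() after close() yields a usable pool again.
    synchronized void open() {
        if (pool == null || pool.isShutdown()) {
            pool = Executors.newFixedThreadPool(2);
        }
    }

    // Stop accepting new tasks; tasks already submitted still run.
    synchronized void close() {
        if (pool != null) {
            pool.shutdown();
        }
    }

    // Returns true if the task was accepted, false if it was rejected.
    synchronized boolean trySubmit(Runnable task) {
        if (pool == null) {
            return false;
        }
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            // A shut-down pool rejects new work (default AbortPolicy).
            return false;
        }
    }

    public static void main(String[] args) {
        ReopenablePool p = new ReopenablePool();
        p.open();
        System.out.println(p.trySubmit(() -> { }));
        p.close();
        System.out.println(p.trySubmit(() -> { }));
        p.open();
        System.out.println(p.trySubmit(() -> { }));
        p.close();
    }
}
```

In main, the first submission is accepted, the one after close() is rejected, and the one after the second open() is accepted again. This sketch does not address what should happen to callers that arrive while the accessor is closed.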

Re: Query regarding usage of Lucene - Filtering folders

2008-02-28 Thread Erick Erickson
Sure, but you have to make it happen. The most straight-forward thing I can think of is to index (probably UN_TOKENIZED) the path to the file in a new field when you index the contents. Then you can easily restrict things however you want by including an AND clause with the path fragment you wish

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Dharmalingam
Thanks for your tips. My overall goal is to quickly implement 7 variants of the vector space model using Lucene. You can find these variants in the uploaded file. I am doing all of this for a much broader goal: I am trying to recover traceability links from requirements to source code files. I

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Grant Ingersoll
On Feb 28, 2008, at 9:00 AM, Dharmalingam wrote: Thanks for the reply. Sorry if my explanation is not clear. Yes, you are correct the model is based on Salton's VSM. However, the calculation of the term weight and the doc norm is, in my opinion, different from Lucene. If you look at th

RE: Lucene-Highlight words in a searched docs

2008-02-28 Thread Mitchell, Erica
Hi Ravinder, Check out Highlighter.test in the lucene-2.3.1\contrib\highlighter\src\test\org\apache\lucene\search\highlight\ folder. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: 28 February 2008 11:03 To: java-user@lucene.apache.org Subject: Lucene-Highlight words

Re: How do i get a text summary

2008-02-28 Thread Karl Wettin
[EMAIL PROTECTED] wrote: If you want something from an index it has to be IN the index. So, store a summary field in each document and make sure that field is part of the query. And how could one automatically create such a summary? Taking the first 2 lines of a document does not always make much

How to obtain the freq term vector of a field from a remote index ?

2008-02-28 Thread Ariel
Hi folks: I need to know how to get the term frequency vector of a field from a remote index on another host. I know that the IndexSearcher class exposes getIndexReader().getTermFreqVector(idDoc, fieldName) to get the term frequency vector of a certain field, but I am using RemoteS

Re: Indexing source code files

2008-02-28 Thread Ken Krugler
I am working on some sort of search mechanism to link a requirement (i.e. a query) to source code files (i.e., documents). For that purpose, I indexed the source code files using Lucene. Contrary to traditional natural language search scenario, we search for code files that are relevant to a given

Re: Indexing source code files

2008-02-28 Thread Mathieu Lecarme
Dharmalingam wrote: I am working on some sort of search mechanism to link a requirement (i.e. a query) to source code files (i.e., documents). For that purpose, I indexed the source code files using Lucene. Contrary to traditional natural language search scenario, we search for code files that

Lucene-Highlight words in a searched docs

2008-02-28 Thread Ravinder.Teepiredddy
Hi All, How do we highlight words in searched docs? Please give inputs on "rewritten query as the input for the highlighter, i.e. call rewrite() on the query". Thanks, Ravinder

Indexing source code files

2008-02-28 Thread Dharmalingam
I am working on some sort of search mechanism to link a requirement (i.e. a query) to source code files (i.e., documents). For that purpose, I indexed the source code files using Lucene. Contrary to traditional natural language search scenario, we search for code files that are relevant to a given

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Dharmalingam
Thanks for the reply. Sorry if my explanation is not clear. Yes, you are correct the model is based on Salton's VSM. However, the calculation of the term weight and the doc norm is, in my opinion, different from Lucene. If you look at the table given in http://www.miislita.com/term-vector/term-ve

RE: How do i get a text summary

2008-02-28 Thread Donna L Gresh
I think you may want to look into the Highlighter. It allows you to show the "relevant" bits of the document which contributed to the document being matched to the query. It does a pretty good job. Of course it does not create a "summary" but it does give you a good idea of why the document was

Re: How do i get a text summary

2008-02-28 Thread Mathieu Lecarme
[EMAIL PROTECTED] wrote: If you want something from an index it has to be IN the index. So, store a summary field in each document and make sure that field is part of the query. And how could one automatically create such a summary? Have a look at http://alias-i.com/lingpipe/index.h

RE: How do i get a text summary

2008-02-28 Thread spring
> If you want something from an index it has to be IN the index. So, store a summary field in each document and make sure that field is part of the query. And how could one automatically create such a summary? Taking the first 2 lines of a document does not always make much sense. How does goog

Re: Vector Space Model: New Similarity Implementation Issues

2008-02-28 Thread Grant Ingersoll
Not sure I am understanding what you are asking, but I will give it a shot. See below On Feb 26, 2008, at 3:45 PM, Dharmalingam wrote: Hi List, I am pretty new to Lucene. Certainly, it is very exciting. I need to implement a new Similarity class based on the Term Vector Space Model giv

RE: Query regarding usage of Lucene(Filtering folder)

2008-02-28 Thread Daan de Wit
This sure is possible with Lucene. What you need to do is index the path along with your documents, so you get a field like this: `path: /subfolder/subsubfolder`. Now you can restrict your search to a specific path. Including subfolders in the search can be done by adding a '*' to the path used in