On Thursday 28 February 2008 01:52:27 Erick Erickson wrote:
> And don't iterate through the Hits object for more than 100 or so hits.
> Like Mark said. Really. Really don't ...
Is there a good trick for avoiding this?
Say you have a situation like this...
- User searches
- User sees first N h
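A minimal sketch of the usual workaround for this paging situation, assuming a 2.3-era API; rather than walking a Hits object, ask the searcher for just enough TopDocs to cover the page being shown. The index path, the "title" field and the already-built query are placeholders:

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class PagingSearch {
    // Show one "page" of results without iterating a Hits object.
    public static void showPage(Query query, int page, int pageSize) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical path
        // Ask for just enough hits to cover the requested (zero-based) page.
        TopDocs topDocs = searcher.search(query, null, (page + 1) * pageSize);
        int start = page * pageSize;
        int end = Math.min(topDocs.scoreDocs.length, start + pageSize);
        for (int i = start; i < end; i++) {
            Document d = searcher.doc(topDocs.scoreDocs[i].doc);
            System.out.println(d.get("title")); // "title" is a hypothetical stored field
        }
        searcher.close();
    }
}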
On Wednesday 27 February 2008 03:33:53 Itamar Syn-Hershko wrote:
> I'm still trying to engineer the best possible solution for Lucene with
> Hebrew, right now my path is NOT using a stemmer by default, only by
> explicit request of the user. MoreLikeThis would only return relevant
> results if I wi
On Wednesday 27 February 2008 00:50:04 [EMAIL PROTECTED] wrote:
> Looks like this is really hard-coded behaviour, and not Analyzer-specific.
The whitespace part is coded into QueryParser.jj, yes. So are the quotes
and : and other query-specific things.
> I want to search for directories with to
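If the goal is matching whole directory paths, one workaround (a sketch, assuming the path was indexed untokenized into a hypothetical "path" field) is to skip QueryParser for that clause and build the term query directly, so its whitespace and special-character rules never get involved:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class PathQueries {
    // Exact match on an untokenized "path" field; no parsing happens,
    // so QueryParser's whitespace handling never applies.
    public static Query forPath(String path) {
        return new TermQuery(new Term("path", path));
    }
}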
Thanks Mark. I'll wait for your enhancements in IndexAccessor on the
new methods.
I use mergeFactor = 100. I've read about the merge factor, and it's
hard to balance read and write performance. What number do you use?
Thanks again.
-vivek
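For comparison, a small sketch of the knobs involved (the path and values are only placeholders); Lucene's default mergeFactor is 10, and lower values generally favour search speed while higher values favour indexing throughput:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class WriterTuning {
    public static IndexWriter openWriter() throws Exception {
        // false = append to an existing index at this (hypothetical) path
        IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
        writer.setMergeFactor(10);        // Lucene's default; 100 merges far less often
        writer.setRAMBufferSizeMB(32.0);  // 2.3+: buffer more docs in RAM before flushing
        return writer;
    }
}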
On Thu, Feb 28, 2008 at 7:14 PM, Mark Miller <
vivek sar wrote:
Mark,
Just for my clarification,
1) Would you have indexStop and indexStart methods? If that's the case
then I don't have to call close() at all. These new methods would
serve as just cleaning up the caches and not closing the thread pool.
Yes. This is the approach I agre
Hi Karl,
Where can I find an introduction to the algorithm below? Thanks.
"Very simple algorithmic solutions usually involve ranking top sentences
by looking at the distribution of terms in sentences, paragraphs and the
whole document. I implemented something like this a couple of years back
that worked fairly w
Compared with the classical VSM, Lucene simply ignores the denominator (|Q|*|D|) of
the similarity formula, but it adds norm(t,d) and coord(q,d) to account for the
fraction of terms shared between the query and the document,
so it is a modified implementation of VSM in practice.
Do you just want to verify which implementation of VSM in "
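For reference, the scoring formula documented for DefaultSimilarity in the 2.x javadoc is roughly the one below; it makes the point above explicit, since queryNorm(q) stands in for the 1/|Q| factor and the document-length factor is folded into norm(t,d):

\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \bigl( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \bigr)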
Mark,
Just for my clarification,
1) Would you have indexStop and indexStart methods? If that's the case
then I don't have to call close() at all. These new methods would
serve as just cleaning up the caches and not closing the thread pool.
I would prefer not to call close() and init() again if
I added the Thread Pool recently, so things probably did work before
that. I am certainly willing to put the Thread Pool init in the open
call instead of the constructor.
As for the best method to use, I was thinking of something along the
same lines as what you suggest.
One of the decisions
Mark,
Yes, I think that's precisely what's happening. I call
accessor.close, which shuts down all the ExecutorServices. I was
assuming that accessor.open would re-open them (I think that's how it
worked in older versions of your IndexAccessor).
Basically, I need a way to stop (or close) all the I
Hey vivek,
Sorry you ran into this. I believe the problem is that I had just not
foreseen the use case of closing and then reopening the Accessor. The
only time I ever close the Accessors is when I am shutting down the JVM.
What do you do about all of the IndexAccessor requests while it is in
A proposal for a Lua entry for the "Google Summer of Code" '08:
lu·lu (lū'lū) n. Slang.
A remarkable person, object, or idea.
A very attractive or seductive looking woman.
A Lua implementation of Lucene.
Skimpy details below:
http://svr225.stepx.com:3388/lulu
http://lua-users.org/wiki/Goog
Mark,
Some more information,
1) I run indexwriter every 5 mins
2) After every cycle I check if I need to partition (based on
the index size)
3) In the partition interface,
a) I first call close on the index accessor (so all the
searchers can close before I move tha
You can find those variants of the vector space model in this interesting
article:
http://ieeexplore.ieee.org/iel1/52/12658/00582976.pdf?tp=&isnumber=&arnumber=582976
Now, I have confirmed with you that, given the current nature of the Similarity API,
it will not be easy to quickly realize these variants.
Actually
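To make the difficulty concrete: the only scoring hooks a 2.3-era Similarity subclass exposes are tf, idf, lengthNorm, coord, queryNorm and sloppyFreq, so a variant that fits those hooks is straightforward, while anything else means deeper changes. A sketch of one hypothetical variant (binary term frequency, no length normalisation):

import org.apache.lucene.search.DefaultSimilarity;

// A hypothetical VSM variant expressed through the hooks Similarity exposes.
public class BinaryTfSimilarity extends DefaultSimilarity {
    // Binary term frequency: a term either occurs in the document or it does not.
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
    // No document-length normalisation (note: this is baked into the norms at
    // index time, so changing it requires re-indexing).
    public float lengthNorm(String fieldName, int numTokens) {
        return 1.0f;
    }
}

Such a class would be set with writer.setSimilarity(...) before indexing and searcher.setSimilarity(...) before searching.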
FYI: The mailing list handler strips attachments.
At any rate, sounds like an interesting project. I don't know how
easy it will be for you to implement 7 variants of VSM in Lucene given
the nature of the APIs, but if you do, it might be handy to see your
changes as a patch. :-) Also not
Mark,
We deployed our indexer (using defaultIndexAccessor) on one of the
production sites and are getting this error:
Caused by: java.util.concurrent.RejectedExecutionException
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown Source)
    at java.util.conc
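Independent of IndexAccessor's internals, that exception is what java.util.concurrent throws when a task is submitted to a pool that has already been shut down, and a shut-down pool cannot be restarted. A generic illustration:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class PoolLifecycle {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        pool.shutdown();  // from now on, every submit() is rejected
        try {
            pool.submit(new Runnable() { public void run() { } });
        } catch (RejectedExecutionException e) {
            // The only recovery is to create a fresh pool.
            pool = Executors.newFixedThreadPool(4);
        }
        pool.shutdown();
    }
}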
Sure, but you have to make it happen. The most straightforward thing I
can think of is to index the path to the file (probably UN_TOKENIZED)
in a new field when you index the contents.
Then you can easily restrict things however you want by
including an AND clause with the path fragment you wish
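A sketch of both halves of that suggestion; the field and variable names are only placeholders:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class PathRestriction {
    // Index time: store the path untokenized next to the contents.
    public static Document makeDoc(String path, String contents) {
        Document doc = new Document();
        doc.add(new Field("path", path, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("contents", contents, Field.Store.NO, Field.Index.TOKENIZED));
        return doc;
    }

    // Search time: AND the user's query with a clause on the path field.
    public static Query restrictToPath(Query userQuery, String path) {
        BooleanQuery q = new BooleanQuery();
        q.add(userQuery, BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("path", path)), BooleanClause.Occur.MUST);
        return q;
    }
}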
Thanks for your tips. My overall goal is to quickly implement 7 variants of
the vector space model using Lucene. You can find these variants in the
uploaded file.
I am doing all of this for a much broader goal: I am trying to recover
traceability links from requirements to source code files. I
On Feb 28, 2008, at 9:00 AM, Dharmalingam wrote:
Thanks for the reply. Sorry if my explanation is not clear. Yes, you are
correct: the model is based on Salton's VSM. However, the calculation of the
term weight and the doc norm is, in my opinion, different from Lucene's. If
you look at th
Hi Ravinder
Check out the Highlighter tests in the
lucene-2.3.1\contrib\highlighter\src\test\org\apache\lucene\search\highlight\ folder
[EMAIL PROTECTED] wrote:
If you want something from an index it has to be IN the
index. So, store a
summary field in each document and make sure that field is part of the
query.
And how could one automatically create such a summary?
Taking the first two lines of a document does not always make much sense.
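A sketch of the "store a summary field" half, leaving open how the summary text itself is produced (first paragraph, an external summariser, ...); the field names are placeholders:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class SummaryDoc {
    public static Document makeDoc(String fullText, String summaryText) {
        Document doc = new Document();
        // Full text is searched but not stored.
        doc.add(new Field("contents", fullText, Field.Store.NO, Field.Index.TOKENIZED));
        // The summary is stored so it can be shown with the hit, and indexed
        // so it can be part of the query if desired.
        doc.add(new Field("summary", summaryText, Field.Store.YES, Field.Index.TOKENIZED));
        return doc;
    }
}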
Hi folks:
I need to know how to get the term frequency vector of a field from a remote
index on another host.
I know that the IndexSearcher class has a method, getIndexReader().getTermFreqVector(idDoc,
fieldName), to get the term frequency vector of a certain field, but I
am using RemoteS
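Locally this looks like the sketch below (the field has to have been indexed with term vectors enabled, e.g. Field.TermVector.YES); as far as I can tell the Searchable interface that RemoteSearchable exports over RMI does not expose term vectors, so a custom remote interface running this on the host that holds the index would be needed. The names below are placeholders:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermVectors {
    public static void printVector(String indexPath, int docId, String field) throws Exception {
        IndexReader reader = IndexReader.open(indexPath);
        // Returns null if the field was not indexed with term vectors.
        TermFreqVector vector = reader.getTermFreqVector(docId, field);
        if (vector != null) {
            String[] terms = vector.getTerms();
            int[] freqs = vector.getTermFrequencies();
            for (int i = 0; i < terms.length; i++) {
                System.out.println(terms[i] + " : " + freqs[i]);
            }
        }
        reader.close();
    }
}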
I am working on some sort of search mechanism to link a requirement (i.e. a
query) to source code files (i.e., documents). For that purpose, I indexed
the source code files using Lucene. Contrary to the traditional natural language
search scenario, we search for code files that are relevant to a given
Dharmalingam wrote:
I am working on some sort of search mechanism to link a requirement (i.e. a
query) to source code files (i.e., documents). For that purpose, I indexed
the source code files using Lucene. Contrary to the traditional natural language
search scenario, we search for code files that
Hi All,
How do we highlight words in the searched docs? Please give some input on
"rewritten query as the input for the highlighter, i.e. call rewrite()
on the query".
Thanks,
Ravinder
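A sketch of that idea, roughly following the contrib Highlighter API of the 2.x line (the field name, analyzer choice and stored "contents" field are assumptions): rewriting the query first expands wildcard/prefix/fuzzy queries into the plain terms the highlighter can score.

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class Snippets {
    public static String snippet(IndexSearcher searcher, IndexReader reader,
                                 Query query, int docId) throws Exception {
        // Rewrite so wildcard/prefix/fuzzy queries become concrete terms.
        Query rewritten = query.rewrite(reader);
        Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));

        Analyzer analyzer = new StandardAnalyzer();
        String text = searcher.doc(docId).get("contents"); // field must be stored
        TokenStream tokens = analyzer.tokenStream("contents", new StringReader(text));
        return highlighter.getBestFragments(tokens, text, 3, " ... ");
    }
}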
I am working on some sort of search mechanism to link a requirement (i.e. a
query) to source code files (i.e., documents). For that purpose, I indexed
the source code files using Lucene. Contrary to the traditional natural language
search scenario, we search for code files that are relevant to a given
Thanks for the reply. Sorry if my explanation is not clear. Yes, you are
correct: the model is based on Salton's VSM. However, the calculation of the
term weight and the doc norm is, in my opinion, different from Lucene's. If
you look at the table given in
http://www.miislita.com/term-vector/term-ve
I think you may want to look into the Highlighter. It allows you to show
the "relevant" bits of the document which contributed to the document
being matched to the query. It does a pretty good job. Of course it does
not create a "summary" but it does give you a good idea of why the
document was
[EMAIL PROTECTED] wrote:
If you want something from an index it has to be IN the
index. So, store a
summary field in each document and make sure that field is part of the
query.
And how could one automatically create such a summary?
Have a look at http://alias-i.com/lingpipe/index.h
> If you want something from an index it has to be IN the
> index. So, store a
> summary field in each document and make sure that field is part of the
> query.
And how could one automatically create such a summary?
Taking the first two lines of a document does not always make much sense.
How does goog
Not sure I am understanding what you are asking, but I will give it a
shot. See below
On Feb 26, 2008, at 3:45 PM, Dharmalingam wrote:
Hi List,
I am pretty new to Lucene. Certainly, it is very exciting. I need to
implement a new Similarity class based on the Term Vector Space
Model giv
This sure is possible with Lucene. What you need to do is index the path
along with your documents, so you get a field like this: `path:
/subfolder/subsubfolder`. Now you can restrict your search to a specific
path. Including subfolders in the search can be done by adding a '*' to
the path used in
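A sketch of the restriction described above, equivalent to appending '*' to the path (assuming the path field was indexed untokenized, and userQuery is built elsewhere):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;

public class SubfolderSearch {
    // Restrict a query to one folder and everything below it.
    public static Query underFolder(Query userQuery, String folder) {
        BooleanQuery q = new BooleanQuery();
        q.add(userQuery, BooleanClause.Occur.MUST);
        q.add(new PrefixQuery(new Term("path", folder)), BooleanClause.Occur.MUST);
        return q;
    }
}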