Hello.
I am using Lucene to submit fuzzy queries against an index. I have noticed
that relevant matches are often retrieved, but the scoring is not at all
what I expected.
For example, if my query is "rightches~", a reference to a text file with
the single word "righteous" is returned with a sco
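For context, a minimal sketch of issuing such a fuzzy query programmatically (Lucene 2.x-era API; the index path and field names here are made up):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

public class FuzzyDemo {
    public static void main(String[] args) throws Exception {
        // "rightches~" from QueryParser is equivalent to a FuzzyQuery
        // with the default minimum similarity (0.5 in Lucene 2.x).
        FuzzyQuery query = new FuzzyQuery(new Term("contents", "rightches"), 0.5f);

        IndexSearcher searcher = new IndexSearcher("/path/to/index"); // assumed path
        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
            // Fuzzy matches are down-weighted by their edit-distance
            // similarity, which is why scores can look lower than expected.
            System.out.println(hits.doc(i).get("path") + " score=" + hits.score(i));
        }
        searcher.close();
    }
}
```

Because the similarity factor is folded into the score, a loose match like "righteous" for "rightches" gets a reduced score; `searcher.explain(query, docId)` is the usual way to see exactly how a score was computed.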
I am following all the points which are mentioned in the following link:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
I am having the following issues:
1. For the different queries I submit, I get a Hits object that always
contains 21 documents, but gett
I solved the issue by using:
1. The same Analyzer for indexing and searching.
2. Tokenizing terms during indexing.
Now the issue with the following code is the one I have pasted after the
code; I searched the forum but couldn't find a relevant post:
QueryParser parser = new QueryParser("Title", analyzer);
Que
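The snippet above is cut off; a typical parse-and-search flow with such a parser might look like this (a sketch; the query string and index path are made up):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearchDemo {
    public static void main(String[] args) throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // Use the SAME analyzer class at query time as at index time,
        // otherwise terms will not line up and hits will be missing.
        QueryParser parser = new QueryParser("Title", analyzer);
        Query query = parser.parse("lucene in action"); // example query

        IndexSearcher searcher = new IndexSearcher("/path/to/index"); // assumed path
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " matching documents");
        searcher.close();
    }
}
```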
Would some sort of caching strategy work? How big is your overall
collection?
Also, lately there have been a few threads on TV (term vector)
performance. I don't recall anyone having actively profiled or
examined it for improvements, so perhaps that would be helpful.
Another thought: co
On Apr 10, 2007, at 8:03 PM, Daniel Einspanjer wrote:
The people reviewing this matching process need some way of
determining why a particular match was made other than the overall
score. Was it because the title was a perfect match or was it because
the title wasn't that close, but the direct
Wow, you are right. I never realized that!
- Original Message
From: Daniel Noll <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, April 10, 2007 8:39:28 PM
Subject: Re: How to update index dynamically
Otis Gospodnetic wrote:
> Anson,
>
> That's not your real code, is
Once again, thank you for your help.
>> We don't really know what your problem is. Explaining that rather
>> than the solution you have thought of might render a couple of
>> alternate solutions. Perhaps something could be precalculated and
>> stored in the documents. Perhaps feature selection
Otis Gospodnetic wrote:
Anson,
That's not your real code, is it? Those $ characters in it look incorrect.
Are you sure? $ is legal at the front of a variable in Java. :-)
Daniel
--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia  Ph: +61 2 9280 0699
Web: http
I asked this question on the Solr user list because that is the
current lucene server implementation I'm using, but I didn't get any
feedback there and the problem isn't really Solr specific so I thought
I'd cross post here just in case any non-Solr users might have some
ideas.
Thank you very muc
On 4/10/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote:
Furthermore, syntax like +(-A +B) and -(-A +B) appears to be legal to Luke,
though I have no clue what this even means in simple English.
Let me try:
+(-A +B) -> must match (-A +B) -> must contain B and must not contain A
-(-A +B) -> must not match (-A +B) -> must not contain B without also containing A
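The must/must-not reading above can be checked with a tiny truth-table sketch in plain Java (no Lucene; the helper names are made up, and note that a real Lucene query with only prohibited clauses matches nothing on its own):

```java
// Clause semantics of +(-A +B) and -(-A +B): the inner query (-A +B)
// matches documents that contain B and do not contain A.
public class ClauseDemo {
    // Does a document with the given term flags satisfy (-A +B)?
    static boolean innerMatches(boolean hasA, boolean hasB) {
        return !hasA && hasB; // must not contain A, must contain B
    }

    // +(-A +B): the document must satisfy the inner query.
    static boolean plusOuter(boolean hasA, boolean hasB) {
        return innerMatches(hasA, hasB);
    }

    // -(-A +B): the document must NOT satisfy the inner query
    // (boolean reading only; see the caveat in the lead-in).
    static boolean minusOuter(boolean hasA, boolean hasB) {
        return !innerMatches(hasA, hasB);
    }

    public static void main(String[] args) {
        // A doc containing only B satisfies +(-A +B) ...
        System.out.println(plusOuter(false, true));  // true
        // ... and is excluded by -(-A +B).
        System.out.println(minusOuter(false, true)); // false
    }
}
```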
: The problem is the grouping operator ( ) and how it works with distributed
: operators, I don't quite get what the specific transformation rules are.
you shouldn't think of parens as a grouping operator; you should think of
it as a way to force the explicit creation of a BooleanQuery object.
: The worse solution is to have another duplicated field which is un-tokenized,
: but it is not scalable when we have lots of fields that need to be searchable.
That is really the only solution that exists in Lucene at the moment.
Typically the number of fields people want to sort on isn't that big
I understand what you are trying to say about the problem of sorting a
tokenized field.
The reason why I try to sort a tokenized field is that I need a single
field to be both sortable and searchable at different times. A searchable
field requires tokenizing, while a sortable field requires an un-tokenized
Lucene sorting is intended to sort documents relative to each other.
So it makes no sense to allow sorts on tokenized fields in the
Lucene context. Imagine the separate tokens in a field for doc1 of
a, c and e, and for doc2 b, d and f. Where should doc1 go in
relation to doc2 when sorting on that
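The duplicated, un-tokenized field mentioned elsewhere in the thread is the standard workaround; a minimal sketch (Lucene 2.x-era API; field names here are made up):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class SortableFieldDemo {
    // Build a document whose title is both searchable and sortable.
    static Document makeDoc(String title) {
        Document doc = new Document();
        // Tokenized: used for full-text search.
        doc.add(new Field("title", title,
                          Field.Store.YES, Field.Index.TOKENIZED));
        // Un-tokenized: exactly one term per document, so sorting
        // is well defined; lowercased for case-insensitive ordering.
        doc.add(new Field("title_sort", title.toLowerCase(),
                          Field.Store.NO, Field.Index.UN_TOKENIZED));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = makeDoc("The Quick Brown Fox");
        System.out.println(doc.get("title"));
        // At search time (sketch):
        // searcher.search(query, new Sort(new SortField("title_sort")));
    }
}
```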
My task is to index lots of documents with different fields. Some of the
fields are tokenized and need to be sorted later, when a result set is
requested ordered by a particular field. Unfortunately, Lucene complains
about sorting on a tokenized field.
So is there any way to get around it?
Thank
Steven Parkes points out:
Lucene doesn't use a pure Boolean algebra, so things don't always do
what one might expect and things like De Morgan's law don't hold.
You're exactly on to what I was pondering about. With boolean logic, I
understand the operators inside and out, so something like De
You can find the list in StopAnalyzer.java:
public static final String[] ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with"
};
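As for adding words: StopAnalyzer has a constructor that takes your own word list, so you can extend the defaults. A sketch (Lucene 2.x-era API; the extra words are examples):

```java
import org.apache.lucene.analysis.StopAnalyzer;

public class CustomStopDemo {
    // Concatenate two word lists into one array.
    static String[] merge(String[] a, String[] b) {
        String[] merged = new String[a.length + b.length];
        System.arraycopy(a, 0, merged, 0, a.length);
        System.arraycopy(b, 0, merged, a.length, b.length);
        return merged;
    }

    public static void main(String[] args) {
        // Start from the built-in list and append domain-specific words.
        String[] extra = { "lucene", "apache" }; // example additions
        String[] stopWords = merge(StopAnalyzer.ENGLISH_STOP_WORDS, extra);

        // This analyzer now drops the merged set during tokenization.
        StopAnalyzer analyzer = new StopAnalyzer(stopWords);
        System.out.println("stop words: " + stopWords.length);
    }
}
```

Remember to use the same stop list at both index and query time, or queries containing those words will behave inconsistently.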
Hi,
Where can I find the list of words that StopAnalyzer uses to remove
common English words?
Can I add additional words to the stop list?
Regards,
--
Sai Hari
You do need to be careful with this, because if a writer commits while
you are copying, you can easily get a copy that's unusable (missing
files).
When you instantiate an IndexReader, it actually holds open most files
that it uses which protects them from being deleted. So in theory if
you coul
> > I'm indexing documents, and some of them are provided in several
> > languages. ... Either I create
> > language-specific fields, or I index the translations in different
> > documents, adding the language field.
> >
> > I chose the second solution, because first, the translated docum
Here is one way to do it:
You can read/open an index at any point, even when it's being modified. You
can then open a new FSDirectory pointing to a new directory and add your
original FSDirectory to that new FSDirectory. That will copy the index. Of
course, any new documents you add to the or
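A sketch of that copy-by-adding approach (Lucene 2.x-era API; both paths are made up):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CopyIndexDemo {
    public static void main(String[] args) throws Exception {
        Directory source = FSDirectory.getDirectory("/path/to/live-index"); // assumed
        Directory target = FSDirectory.getDirectory("/path/to/backup");     // assumed

        // Create a fresh index in the target directory and merge the
        // source index into it; the source is only read, never modified.
        IndexWriter writer = new IndexWriter(target, new StandardAnalyzer(), true);
        writer.addIndexes(new Directory[] { source });
        writer.close();
        // Documents added to the source AFTER this point are, of course,
        // not in the copy; re-run or re-sync to pick them up.
    }
}
```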
On 10 Apr 2007, at 17:58, Chen Li wrote:
Interestingly, for some larger files (around 500 KB), only a query term
near the top of the file is searchable; once the term is at the end, or
past some unknown point of the file, I couldn't find it with
SearchFiles.java, which also came with the demo code
On 10 Apr 2007, at 17:48, Sengly Heng wrote:
We don't really know what your problem is. Explaining that rather
than the solution you have thought of might render a couple of
alternate solutions. Perhaps something could be precalculated and
stored in the documents. Perhaps feature selection (reduct
Hello,
I have a scenario, where we need to set up our application, that uses
Lucene (and has on-demand indexing of documents) in Disaster-recovery
site.
The simple files/attachments used by our application can be simply
copied to the DR site just by syncing (manual copying).
Yes, we can also cop
Hello,
I used the demo code (IndexFiles.java) from Lucene to index around 100 text
files.
doc.add(new Field("contents", new FileReader(f)));
Interestingly, for some larger files (around 500 KB), only a query term near
the top of the file is searchable; once the term is at the end or a
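One plausible cause of terms deep in a large file not being found is IndexWriter's field-length truncation: by default, only the first 10,000 terms of a field are indexed and the rest are silently dropped. A sketch of raising that limit (Lucene 2.x-era API; the paths are made up):

```java
import java.io.File;
import java.io.FileReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BigFileIndexDemo {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index", // assumed path
                                             new StandardAnalyzer(), true);
        // Default is 10,000 terms per field; index everything instead.
        writer.setMaxFieldLength(Integer.MAX_VALUE);

        File f = new File("/path/to/large.txt"); // assumed path
        Document doc = new Document();
        doc.add(new Field("contents", new FileReader(f)));
        writer.addDocument(doc);
        writer.close();
    }
}
```

Raising the limit increases memory use while indexing very large documents, which is why the default is conservative.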
Dear Karl,
Thank you for taking your time in my problem.
We don't really know what your problem is. Explaining that rather
than the solution you have thought of might render a couple of
alternate solutions. Perhaps something could be precalculated and
stored in the documents. Perhaps feature
Thanks so much Thomas for your prompt reply.
First of all, you have to make sure that you create the new Fields you
add to a Document with the appropriate constructor. You have to
specify the usage of term vectors (Field.TermVector.YES):
new Field("text", "your text...", Field.Store.YES,
Field.Index.TOKENIZED, Field.TermVector.YES);
On 10 Apr 2007, at 16:58, Sengly Heng wrote:
I wanted to do it this way as well, but I am a bit worried about
computational time, as I have many documents and each document is fairly
large.
I am looking for more solutions.
We don't really know what your problem is. Explaining that rather
than
Hello Sengly
First of all, you have to make sure that you create the new Fields you
add to a Document with the appropriate constructor. You have to
specify the usage of term vectors (Field.TermVector.YES):
new Field("text", "your text...", Field.Store.YES,
Field.Index.TOKENIZED, Field.TermVector.YES);
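Once term vectors are stored, a sketch of aggregating the per-document vectors into one overall frequency map (Lucene 2.x-era API; the "text" field name follows the snippet above):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TotalTermVectorDemo {
    // Add one document's (term, frequency) pairs into the running totals.
    static Map addCounts(Map totals, String[] terms, int[] freqs) {
        for (int t = 0; t < terms.length; t++) {
            Integer old = (Integer) totals.get(terms[t]);
            int prev = (old == null) ? 0 : old.intValue();
            totals.put(terms[t], new Integer(prev + freqs[t]));
        }
        return totals;
    }

    // Sum term frequencies over a set of matching document ids.
    static Map aggregate(IndexReader reader, int[] docIds) throws Exception {
        Map totals = new HashMap(); // term -> summed frequency
        for (int i = 0; i < docIds.length; i++) {
            TermFreqVector tfv = reader.getTermFreqVector(docIds[i], "text");
            if (tfv == null) continue; // field indexed without term vectors
            addCounts(totals, tfv.getTerms(), tfv.getTermFrequencies());
        }
        return totals;
    }
}
```

With many large documents this is O(total terms in all hits), so it is usually worth aggregating only over the top N hits rather than the full result set.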
Hello all,
I would like to extract the term freq vector from the hit results as one
total vector, not per document.
I have searched the mailing list and found that many have talked about this
issue, but I still could not find the right solution. Everyone just
suggested looking at getTermFreqVector
>Hi,
>I didn't get the exact meaning of this sentence; what exactly do you want
>to say?
>Can you please explain it with some more sentences?
I believe he meant that in one place (with no error) you have
"E:/eclipse/310307/objtest/crawl-result/indexes/part-0";
but in the other you have
indexDir
You might get some good pointers by searching the mail archive for
"faceted search", or perhaps just "faceted". I vaguely remember that
the whole notion of sub-dividing result sets into bags of documents
was discussed under that heading, quite an extensive discussion
as I remember, and certainly n
Hello
Please excuse my newbiness, but I need Lucene to do a simple task and
I haven't been able to find out how. I just need to search some text
files for a given string, say "brown fox", and get the filename, which
I found out how to do, but I also need the position in that file (so I can
replace that t
Hi,
I'm indexing documents, and some of them are provided in several languages.
Thanks to this mailing list's participants, I know that I have two choices
to index these multiple instances of documents. Either I create
language-specific fields, or I index the translations in different doc