Hey
Dev Guys
Apologies
Can somebody explain to me
why, for an input word "TA", StopAnalyzer returns [ta]
instead of [TA]?
"TA" ==> [ta] instead of [TA]
"$125.95" ==> [125.95] instead of [$125.95]
Is there something I have been missing?
with r
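The lowercasing question above comes down to StopAnalyzer's tokenizer. Here is a minimal pure-Java sketch of what a letters-only, lowercasing tokenizer (in the spirit of Lucene's LowerCaseTokenizer, but not its actual code) does to these inputs:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a letters-only, lowercasing tokenizer: it keeps maximal runs
// of letters, lowercased; everything else (digits, '$', '.') is treated
// as a token boundary and discarded. This is a simplified model, not
// Lucene's LowerCaseTokenizer source.
public class LowerCaseTokenizerSketch {
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("TA"));      // [ta]
        System.out.println(tokenize("$125.95")); // [] -- letters only
    }
}
```

Under this model, "TA" comes back as [ta] by design; if the original casing or the '$' must survive, the field needs a different analyzer rather than StopAnalyzer.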
Hey
Dev Guys
Apologies
I have a quick problem...
The number of hits on a set of documents indexed using 1.3-final is not
the same as on 1.4-final.
[ The only modification done to the source is that I have upgraded my
CustomAnalyzer based on the StopAnalyzer available in 1.4 ]
Does doing this
I want to search Farsi pages, so I need a way to index Farsi pages. Someone
told me to change the tokenizer. I did this, but it doesn't work.
Of course, if there is an existing one I'd prefer to use that, but I didn't
find any yet.
Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Moving to lucene-user list.
Persian = Farsi? Wha
Hey
Dev Guys
Apologies
Can somebody explain to me how to retrieve all hits available per indexed
document.
To explain in detail:
A physical search on a single document would list 3 places for a certain
word occurrence,
so if I am supposed to retrieve all 3 occurrences
Moving to lucene-user list.
Persian = Farsi? What you would need is a Farsi Analyzer, and Lucene
does not come with one, unfortunately. You'll likely have to write it
yourself, or find an existing one.
Otis
--- shafipour elnaz <[EMAIL PROTECTED]> wrote:
> I want to make it to be compatible wit
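Since Lucene ships no Farsi analyzer, a custom one has to start from a tokenizer that at least leaves Arabic-script text intact. A hedged, pure-Java sketch of such a starting point (the normalization steps named in the comment are assumptions about what a real Farsi analyzer would need, not Lucene API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// A minimal starting point for a custom Farsi tokenizer: split on
// whitespace and strip surrounding ASCII punctuation, leaving
// Arabic-script characters intact. A real analyzer would also normalize
// character variants (e.g. Arabic vs. Farsi forms of YEH and KAF) and
// remove Persian stop words -- those steps are omitted here.
public class FarsiTokenizerSketch {
    public static List<String> tokenize(String text) {
        return Arrays.stream(text.split("\\s+"))
                .map(t -> t.replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("سلام دنیا").size()); // 2 tokens
    }
}
```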
Hey Kevin,
Not sure if you're aware of it, but you can specify the lock dir, so in
your example, both JVMs could use the exact same lock dir, as long as
you invoke the VMs with the same params. You shouldn't be writing the
same index with more than 1 IndexWriter though (not sure if this was
just
As per 1.3 (or was it 1.4?), Lucene migrated to using java.io.tmpdir to
store the locks for the index.
While under most situations this is safe, a lot of application servers
change java.io.tmpdir at runtime.
Tomcat is a good example. Within Tomcat this property is set to
TOMCAT_HOME/temp.
Und
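The fix for Tomcat-style tmpdir shuffling, assuming Lucene 1.4's lock-directory lookup works as described above, is to pin the lock directory explicitly in every JVM that touches the index. A sketch (the property name follows FSDirectory's documented fallback to java.io.tmpdir; the /var/lucene/locks path is just an example):

```java
// Sketch: pinning Lucene's lock directory so that two JVMs agree on
// where lock files live. Lucene 1.4's FSDirectory reads the
// "org.apache.lucene.lockDir" system property, falling back to
// java.io.tmpdir, so setting it to the same path in both VMs -- or
// passing -Dorg.apache.lucene.lockDir=... on the command line -- keeps
// Tomcat's per-instance tmpdir from hiding the locks.
public class LockDirConfig {
    public static String resolveLockDir() {
        return System.getProperty("org.apache.lucene.lockDir",
                System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        System.setProperty("org.apache.lucene.lockDir", "/var/lucene/locks");
        System.out.println(resolveLockDir()); // /var/lucene/locks
    }
}
```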
Hi!
Thanks, the problem was solved by using Lucene 1.4 final.
Regards,
AlexAw
- Original Message -
From: "Zilverline info" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, July 07, 2004 10:32 PM
Subject: Re: upgrade from Lucene 1.3 final to 1.4rc3 problem
Thanks. This works fine. I guess I was missing something. I
would have expected this to be a property of Document.
On Jul 7, 2004, at 8:49 PM, Peter M Cipollone wrote:
Bill,
Check
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/
Hits.html#id(int)
Pete
- Original Mess
Bill,
Check
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Hits.html#id(int)
Pete
- Original Message -
From: "Bill Tschumy" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, July 07, 2004 9:46 PM
Subject: Deleting a Doc found via a Quer
I must be missing something here, but I can't see an easy way to delete
a Document that has been found via searching. The delete() method of
IndexReader takes a docNum. How do I get the docNum corresponding to
the Document in the Hits?
I tried scanning through all the Documents using IndexRea
On Jul 7, 2004, at 6:24 PM, [EMAIL PROTECTED] wrote:
Hi,
Is there any way to do a PhraseQuery with Wildcards?
No.
This very question came up a few days ago. Look at PhrasePrefixQuery -
although this will be a bit of effort to expand the terms matching the
wildcarded term.
I'd like to
search for
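The "bit of effort to expand the terms" mentioned above is the key step: PhrasePrefixQuery wants concrete terms at each position, so the wildcarded slot must be expanded against the index's vocabulary first. A pure-Java sketch of that expansion (the vocabulary list stands in for Lucene's term enumeration):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the term expansion PhrasePrefixQuery needs: enumerate the
// index's terms that match the wildcarded position ("bar*"), then hand
// the phrase query one fixed term ("foo") plus the expanded set.
// The vocabulary here is a stand-in for walking Lucene's TermEnum.
public class PrefixExpansion {
    public static List<String> expand(List<String> vocabulary, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : vocabulary) {
            if (term.startsWith(prefix)) matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> vocab = List.of("bar", "barn", "baroque", "foo");
        System.out.println(expand(vocab, "bar")); // [bar, barn, baroque]
    }
}
```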
Hi Doug:
Thanks for the response!
The solution you proposed is still a derivative of creating a
dummy document stream. Taking the same example, java (5), lucene (6),
VectorTokenStream would create a total of 11 tokens whereas only 2 are
necessary.
Given many documents with many term
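The 11-versus-2 arithmetic above can be made concrete: pushing pre-counted frequencies through the normal token stream means repeating each term freq times. A sketch of that dummy-stream expansion (the class name echoes the VectorTokenStream being discussed but is illustrative, not the actual implementation):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "dummy document stream" approach: to feed pre-counted
// frequencies through the normal indexing path, each term must be
// repeated freq times, so java(5) + lucene(6) becomes 11 tokens even
// though only 2 distinct terms (and 2 frequencies) carry information.
public class VectorTokenStreamSketch {
    public static List<String> expand(Map<String, Integer> termFreqs) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet())
            for (int i = 0; i < e.getValue(); i++)
                tokens.add(e.getKey());
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        freqs.put("java", 5);
        freqs.put("lucene", 6);
        System.out.println(expand(freqs).size()); // 11
    }
}
```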
Hi,
Is there any way to do a PhraseQuery with Wildcards?
I'd like to
search for:
MyField:"foo bar*"
I thought I could cobble something together
using PhraseQuery and Wildcards but I couldn't get this functionality to work
due to my lack of experience with Lucene.
Is there a way to do
John Wang wrote:
While lucene tokenizes the words in the document, it counts the
frequency and figures out the position, we are trying to bypass this
stage: For each document, I have a set of words with a known frequency,
e.g. java (5), lucene (6) etc. (I don't care about the position, so it
ca
Use org.apache.lucene.analysis.PerFieldAnalyzerWrapper
Here is how I use it:
PerFieldAnalyzerWrapper analyzer =
    new org.apache.lucene.analysis.PerFieldAnalyzerWrapper(new MyAnalyzer());
analyzer.addAnalyzer("url", new NullAnalyzer());
try
I have a Lucene Document with a field named Code which is stored
and indexed but not tokenized. The value of the field is ABC5-LB.
The only way I can match the field when searching is by entering
Code:"ABC5-LB" because when I drop the quotes, every Analyzer I've tried
using breaks my
query into C
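The mismatch described above can be sketched without Lucene: the stored term is the single string "ABC5-LB", but a typical analyzer's view of the query text splits on '-' and lowercases, producing terms that don't exist in the index. A simplified model (this mimics, but is not, StandardAnalyzer):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the mismatch: the field was indexed untokenized as the
// single term "ABC5-LB", but a typical analyzer splits query text on '-'
// and lowercases it, so the parsed query looks for terms that don't
// exist. The fix is a TermQuery built directly on the stored value, or a
// per-field analyzer that leaves the value whole.
public class KeywordMismatch {
    public static List<String> analyzerView(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("-"))
            if (!part.isEmpty()) tokens.add(part.toLowerCase());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyzerView("ABC5-LB")); // [abc5, lb]
    }
}
```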
On Jul 7, 2004, at 3:41 PM, [EMAIL PROTECTED] wrote:
Can you recommend an analyzer that doesn't discard '*' or '/'?
WhitespaceAnalyzer :)
Check the wiki AnalysisParalysis page also.
Erik
-
To unsubscribe, e-mail: [EMAIL PRO
Can you recommend an analyzer that doesn't discard '*' or '/'?
--- "Lucene Users List" <[EMAIL PROTECTED]> wrote:
> The first thing you'll want to check is that you are using an Analyzer
> that does not discard that '*' before indexing. StandardAnalyzer, for
> instance, will discard it.
Check o
The first thing you'll want to check is that you are using an Analyzer
that does not discard that '*' before indexing. StandardAnalyzer, for
instance, will discard it. Check one of Erik Hatcher's articles that
includes a tool that helps you see what your Analyzer does with any
given text inpu
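A toy version of such an inspection tool: run the same input through two tokenization rules and compare. The whitespace split keeps '*'; a letters-only split (roughly what analyzers built on letter tokenizers do, though not Lucene's actual code) discards it:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A tiny stand-in for the "see what your Analyzer does" tool: compare a
// whitespace split (which keeps '*') with a letters-only split (which
// discards it) over the same input.
public class AnalyzerPeek {
    public static List<String> whitespace(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    public static List<String> lettersOnly(String text) {
        List<String> out = new ArrayList<>();
        for (String run : text.split("[^A-Za-z]+"))
            if (!run.isEmpty()) out.add(run.toLowerCase());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(whitespace("Hello *foo bar"));  // [Hello, *foo, bar]
        System.out.println(lettersOnly("Hello *foo bar")); // [hello, foo, bar]
    }
}
```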
Hi gurus:
I am trying to be able to control the indexing process.
While lucene tokenizes the words in the document, it counts the
frequency and figures out the position, we are trying to bypass this
stage: For each document, I have a set of words with a known frequency,
e.g. java (5), l
Hi,
I'm trying to search for a term that contains an asterisk.
This
is the field that I indexed:
- new Field("testField", "Hello *foo bar", true,
true, true);
I'm trying to find this document by matching '*foo':
- new
TermQuery(new Term("testField", "*me"));
I've also tried to escap
Doug Cutting wrote:
Julien,
Thanks for the excellent explanation.
I think this thread points to a documentation problem. We should
improve the javadoc for these parameters to make it easier for folks to
In particular, the javadoc for mergeFactor should mention that very
large values (>100) are n
Would it make more sense to use a parameter defining RAM size for the
cache rather than minMergeDocs?
Tuning RAM usage is the real issue here, and controlling this by guessing
the number of docs you can squeeze into RAM is not the most helpful
approach. How about a "setMaxCacheSize(int
megabytes
Julien,
Thanks for the excellent explanation.
I think this thread points to a documentation problem. We should
improve the javadoc for these parameters to make it easier for folks to
In particular, the javadoc for mergeFactor should mention that very
large values (>100) are not recommended, sin
Hey y'all again,
Just wondering why the IndexWriter.addIndexes method calls optimize before and after
it starts merging segments together.
We would like to create an addIndexes method that doesn't optimize and call optimize
on the IndexWriter later.
Roy.
--
Otis,
Okay, got it... however, we weren't creating new document objects... just
grabbing a document through an IndexReader and calling addDocument on another
index. Would that still work with unstored fields? (Well, it's working for us
since we don't have any unstored fields.)
Thanks a lot!
Roy.
O
It is not surprising that you run out of file handles with such a large
mergeFactor.
Before trying more complex strategies involving RAMDirectories and/or
splitting your indexing across several machines, I reckon you should try
simple things like using a low mergeFactor (e.g. 10) combined with a high
A mergeFactor of 5000 is a bad idea. If you want to index faster, try
increasing minMergeDocs instead. If you have lots of memory this can
probably be 5000 or higher.
Also, why do you optimize before you're done? That only slows things.
Perhaps you have to do it because you've set mergeFacto
On Tue, Jul 06, 2004 at 10:44:40PM -0700, Kevin A. Burton wrote:
> I'm trying to burn an index of 14M documents.
>
> I have two problems.
>
> 1. I have to run optimize() every 50k documents or I run out of file
> handles. this takes TIME and of course is linear to the size of the
> index so i
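A rough back-of-the-envelope for why mergeFactor=5000 runs out of file handles: with a non-compound index each segment is several files, and up to mergeFactor segments can accumulate before a merge triggers. The constants below (8 files per segment, the level counts) are illustrative assumptions, not Lucene's exact numbers:

```java
// Back-of-the-envelope sketch of the file-handle pressure being
// discussed: each on-disk segment is roughly filesPerSegment separate
// files, and up to mergeFactor segments accumulate per merge level.
// The constants are illustrative, not Lucene's exact file counts.
public class FileHandleEstimate {
    public static int estimateOpenFiles(int mergeFactor, int levels,
                                        int filesPerSegment) {
        return mergeFactor * levels * filesPerSegment;
    }

    public static void main(String[] args) {
        System.out.println(estimateOpenFiles(10, 3, 8));   // 240: manageable
        System.out.println(estimateOpenFiles(5000, 1, 8)); // 40000: over most OS limits
    }
}
```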
Hey Ype,
Apologies.
I would be more interested in the boost/weight factor in terms of the Query
rather than the Fields.
Please explain with example src.
With regards
Karthik
-Original Message-
From: Ype Kingma [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 12:08 PM
To: [EMAI
[EMAIL PROTECTED] wrote:
A colleague of mine found the fastest way to index was to use a
RAMDirectory, letting it grow to a pre-defined maximum size, then merging
it to a new temporary file-based index to flush it. Repeat this, creating
new directories for all the file-based indexes, then perform a
This is a bug (see the posting 'Lockfile Problem Solved'); upgrade to
1.4-final and you'll be fine.
Alex Aw Seat Kiong wrote:
Hi!
I'm using Lucene 1.3 final currently; all things were working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply overwriting the
lucene-1.4-final.jar to
I think that the only way to resolve this would be to order your keywords
alphabetically to control the result every single time prior to submitting
your search to Lucene. I don't know if Lucene does this, but I'm fairly
sure that sorting the criteria would be a complex matter.
At 09:05 AM 07/
Hi all,
I've managed to add multi-index searching capability to my code. But one
thing that I have noticed is that Lucene is extremely slow in searching.
For example I have been testing with 2 indexes for the past month or so and
searching them returns results in under 250ms and sometimes even
Hi Guys,
Finally I have sorted out the problem of hit scores, thanks to the great
help of Franck.
I have hit another problem with the boolean operators now.
When I search for "Winston and churchill" I get a set of perfectly
acceptable results.
But when I change the order to "churchill and winston", the r
On Mon, 28 Jun 2004 10:04:40 +0200, Julien Nioche
<[EMAIL PROTECTED]> wrote:
> Hello Drew,
>
> I don't think it's in the FAQ.
>
Julien,
Thanks for the advice, and the in-depth exploration of INDEX_INTERVAL
here and on the developer's list. If I have the opportunity to run
similar benchmarks com
Hello Alex.
I had a similar problem when I upgraded to Lucene 1.4 rc3 from
1.3 final.
After a short investigation, I realized that the problem is in the
code of the FSDirectory() constructor below:
private FSDirectory(File path, boolean create) throws IOException {
directory = path;
lockDir = new F
Hi Sergiu,
First of all, if your application is web-based, it's not necessary to
programmatically construct the query based on user input (via
MultiFieldQueryParser). You can use luceneQueryConstructor.js in the Lucene
sandbox.
You can find the documentation here:
http://cvs.apache.org/viewcvs.cgi/*che
A colleague of mine found the fastest way to index was to use a
RAMDirectory, letting it grow to a pre-defined maximum size, then merging
it to a new temporary file-based index to flush it. Repeat this, creating
new directories for all the file-based indexes, then perform
a merge into one index o