Hi all,
As part of my diploma thesis I'm starting to work on an information retrieval
solution for a law and business publisher. Currently I'm trying to define a
flexible and scalable architecture. All the data is present in XML form and
at the moment simply stored on the file system and
Hello Askar,
Which analyzer are you using for indexing and searching? If you use an analyzer
that uses stemming, you might see that change, changing, changed, chan
etc. all get reduced to the same word chan.
In Luke you can test with plugins that show you what tokens are created from
your
Hello,
Did you take a look at Nutch, Hadoop, or Solr? They partially seem to address the
things you describe. About the LSI, I am not sure what has been done in those
projects
Regards Ard
Hi, Please help me.
It's been a month since I started trying Lucene.
My requirements are huge, I have to
Hi,
There is a Lucene-eXist trigger that allows you to do just that. Take
a look at patch
http://sourceforge.net/tracker/index.php?func=detail&aid=1654205&group_id=17691&atid=317691
Then, from exist, you can search either with XQuery or Lucene syntax.
Patrick
Thomas wrote:
My intention is to
But will it be possible to rename a Field inside a Lucene Document?
I know it's not possible to change the value of the Document's Field, but can
we change the field's name?
Any ideas?
I am totally petrified of googling.
I have run into problems with an error that I am trying to access a
deleted document when doing something along the lines below; my brief
question is, what is necessary to avoid seeing deleted documents? Is an
optimize() necessary? Or will a flush() or close() accomplish the same
thing?
All deletes should be removed after an optimize. Otherwise, I think you'll
have to call isDeleted before trying to access the document. numDocs does
not include deletes, but the document() call will retrieve deletes. You
might try using maxDoc() instead of numDocs().
- Mark
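To make the numDocs()/maxDoc() distinction above concrete, here is a rough sketch in plain Java. It is a toy in-memory model, not Lucene code: the class ToyIndex and all of its methods are made up for illustration and only imitate the behaviour described in this thread (deleted slots keep their document number until they are merged away, and fetching a deleted slot fails).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of document numbering with deletions. Hypothetical names, NOT Lucene API.
final class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    // Adds a document and returns its document number.
    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    // Marks a slot as deleted; the slot itself stays in place.
    void delete(int n) {
        if (n < 0 || n >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + n);
        }
        deleted.set(n);
    }

    // One greater than the largest document number, deleted slots included.
    int maxDoc() {
        return docs.size();
    }

    // Number of live (non-deleted) documents.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    boolean isDeleted(int n) {
        return deleted.get(n);
    }

    // Fails on a deleted slot, like the error described in the question.
    String document(int n) {
        if (isDeleted(n)) {
            throw new IllegalArgumentException("attempt to access a deleted document");
        }
        return docs.get(n);
    }

    // Visits every live document: loop to maxDoc(), skipping deleted slots.
    List<String> liveDocuments() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < maxDoc(); i++) {
            if (!isDeleted(i)) {
                live.add(document(i));
            }
        }
        return live;
    }
}
```

In this model, looping up to numDocs() after a delete would both hit the deleted slot and stop short of the last live document, which is why the loop bound is maxDoc() combined with an isDeleted() check.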
You say there's only one document and you added many. The line
IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(),
true);
blows away any existing index data and starts over. If you're calling
this fragment for each document, you'll always have only one doc. Try
changing the
Well, instead of googling, look at the Lucene searchable archive.
It's linked to from the lucene home page. I have no clue whether
what you want is already in there, but there is a wealth of info
there.
Erick
On 7/19/07, miztaken [EMAIL PROTECTED] wrote:
But will it be possible to rename
Thank you, this is what appeared to be the case, but I wanted to check if
there was something simple I wasn't understanding.
All deletes should be removed after an optimize. Otherwise, I think
you'll
have to call isDeleted before trying to access the document. numDocs does
not include
Yes, I realized that. Now I have all the documents in the Index. I'll play
around with Luke to see what can stop stemming.
thanks !
AZ
On 7/19/07, Erick Erickson [EMAIL PROTECTED] wrote:
You say there's only one document and you added many. The line
IndexWriter writer = new
Hi,
The score I am getting in DocCollector is the raw score, which is not necessarily
between 0 and 1.
Where exactly does Lucene calculate the final score? And
what if I want the final score in DocCollector? How can I get it?
Regards.
Bhavin Pandya
Hello All
I am working on a couple of projects that require some search engine
capabilities.
I came across Lucene and I think that it might be a good tool to
incorporate into the project.
I started implementing the software but got some error messages that
prevent me from going further.
I don't think you can when using a HitCollector. If you used a TopDocs instead,
you would have access to the maximum score and could normalize the
scores to between 0 and 1, but I don't know if that suits your needs.
Erick
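For what it's worth, the normalization mentioned above is simple enough to do by hand once the raw scores and the top score are available. A minimal sketch in plain Java, assuming the raw scores have already been copied out of the collector into an array; ScoreNormalizer and normalize are hypothetical names, not part of Lucene. The choice to scale only when the top score exceeds 1.0 is an assumption of this sketch (as far as I know Hits behaves similarly, but check the source for your version).

```java
import java.util.Arrays;

// Sketch only: scales raw scores by the top score. Hypothetical helper, NOT Lucene API.
final class ScoreNormalizer {

    // Returns a scaled copy of rawScores. Scores are divided by the maximum
    // only when that maximum exceeds 1.0; otherwise they are returned unchanged.
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float norm = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {4.0f, 2.0f, 1.0f};
        System.out.println(Arrays.toString(normalize(raw))); // prints [1.0, 0.5, 0.25]
    }
}
```

Note that the result depends on the top score of each individual query, so normalized scores from different queries are not comparable with each other.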
On 7/19/07, Bhavin Pandya [EMAIL PROTECTED] wrote:
Hi,
The score i am getting in
Hi Erick,
Thanks for your prompt reply...
Let me explain what I'm doing.
There is a Lucene query which returns relevant results when I am searching
through the Hits object.
But when I'm using the same query with DocCollector (I want it this way because
I want to remove duplicate records at search time
Hey Erik,
How can I change the default Lucene OR property to AND?
When I tried query.toString(), I got
contents:w contents:chan contents: kim
That's fine, but it's doing OR. How can I make it AND so that it shows:
contents: W Chan Kim ??
thanks a ton !
AZ
On 7/19/07, Erick Erickson [EMAIL
QueryParser.setDefaultOperator
On 7/19/07, Askar Zaidi [EMAIL PROTECTED] wrote:
Hey Erik,
How can I change the default Lucene OR property to AND.
When I tried query.toString(), I got
contents:w contents:chan contents: kim
Thats fine, but its doing OR, how can I make it AND so that it
I think it goes without saying that a semi-complex NFA or DFA is going
to be quite a bit slower than say, breaking on whitespace. Not that I am
against such a warning.
To support my point on writing a custom solution that is more exact
towards your needs:
If you just remove the NUM
You get non relevant results because normally a HitCollector will only
collect documents with scores greater than 0.
Hits normalizes raw scores like this:
if (hitDocs.size() > min) {
min = hitDocs.size();
}
int n = min * 2; // double # retrieved
TopDocs topDocs = (sort ==
19 jul 2007 kl. 15.48 skrev Yom Chouloute:
My time frame at this moment will not allow me to get the full grasp of
that software, so if anybody on that list would like to do some contract
work you can contact me at
Hi Yom,
you can find human resources at this page:
On 7/19/07, Mark Miller [EMAIL PROTECTED] wrote:
I think it goes without saying that a semi-complex NFA or DFA is going
to be quite a bit slower than say, breaking on whitespace. Not that I am
against such a warning.
This is true to those very familiar with the code base and the Tokenizer
The source data for my index is already in standard UTF-8 and available as a
simple byte array. I need to do some simple tokenization of the data (check
for whitespace and special characters that control position increment). What
is the most efficient way to index this data and avoid unnecessary
Hi all, I use the query (+body:12) (+title:12), but I got the error
message below:
java.io.IOException: read past EOF
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:137)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
I need to use getTermFreqVector on a subset of docs that belong to the hits for
a query. I understand I need to pass the docNumber as an argument in this case.
How do I access that?
For example:
doc = hits.doc(0);
TermFreqVector vector = reader.getTermFreqVector(docId, field);
How do I get
19 jul 2007 kl. 22.58 skrev Kevin Chen:
doc = hits.doc(0);
TermFreqVector vector = reader.getTermFreqVector(docId, field);
How do I get docId?
If you use Hits, it is hits.doc()
--
karl
hits.id() should work.
karl wettin wrote:
19 jul 2007 kl. 22.58 skrev Kevin Chen:
doc = hits.doc(0);
TermFreqVector vector = reader.getTermFreqVector(docId, field);
How do I get docId?
If you use Hits, it is hits.doc()
Hopefully someone will be able to give you some further insight into
this. To me, it looks like a corrupted index. If TermVectors were not
stored, at worst you should be seeing a NullPointerException. Has this
index had anything interesting happen to it? Made with an older version
of Lucene,
Hello, everyone:
doc.add(Field.UnStored(subject, subject, true));
The syntax above is for Lucene 1.4.
I need the syntax that does the same work in Lucene 2.0.
Could you help me?
Thank you very much!
I also have this problem...
Field.Text
Field.Keyword
...
I cannot find these methods in the Lucene 2.0 API
-----Original Message-----
From: savageboy [mailto:[EMAIL PROTECTED]
Sent: July 20, 2007 9:46 Good morning, Daniel
To: java-user@lucene.apache.org
Subject: How to open the term vector storage?
Hello,
: I also have this problem...
: Field.Text
: Field.Keyword
: ...
: I cannot find this method in lucene2.0 API
Please see the FAQ "How do I get code written for Lucene 1.4.x to work
with Lucene 2.x?"
http://wiki.apache.org/lucene-java/LuceneFAQ#head-86d479476c63a2579e867b75d4faa9664ef6cf4d