Hi David:
Can you further explain which calls specifically would solve my problem?
Thanks
-John
On Mon, 21 Feb 2005 12:20:15 -0800, David Spencer
<[EMAIL PROTECTED]> wrote:
> John Wang wrote:
>
> > Anyone has any thoughts on this?
>
> Does this help?
>
> ht
Does anyone have any thoughts on this?
Thanks
-John
On Wed, 16 Feb 2005 14:39:52 -0800, John Wang <[EMAIL PROTECTED]> wrote:
> Hi:
>
>Is there way to find out given a hit from a search, find out which
> fields contributed to the hit?
>
> e.g.
>
> If my search
Hi:
Is there a way, given a hit from a search, to find out which
fields contributed to the hit?
e.g.
If my search for:
contents1="brown fox" OR contents2="black bear"
can the document found by this query also carry information on
whether it was found via contents1, contents2, or both?
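A generic workaround, sketched in plain Java without the Lucene API (the clause names and doc IDs below are invented for illustration): run each field clause as its own query, record the doc IDs it matches, and then test membership for every hit of the combined query.

```java
import java.util.*;

public class FieldHitSource {
    // Given the doc-ID sets each field clause matched on its own,
    // report which clauses a particular hit came from.
    static List<String> contributingFields(Map<String, Set<Integer>> clauseHits,
                                           int docId) {
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> e : clauseHits.entrySet()) {
            if (e.getValue().contains(docId)) {
                fields.add(e.getKey());
            }
        }
        Collections.sort(fields);  // deterministic order for display
        return fields;
    }
}
```

The obvious cost is one extra query per clause, so this only pays off for queries with few clauses.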
I think the Google Mini also includes crawling and a server wrapper, so it
is not entirely a 1-to-1 comparison.
Of course, extending Lucene to have those features is not at all
difficult anyway.
-John
On Thu, 27 Jan 2005 16:04:54 -0800 (PST), Xiaohong Yang (Sharon)
<[EMAIL PROTECTED]> wrote:
> Hi,
Hi:
When is Lucene 2.0 scheduled to be released? Is there a javadoc
somewhere so we can check out the new APIs?
Is there a plan to add transaction support to Lucene? This is
something we need, and if we do implement it ourselves, is it too large
a change for a patch?
Thanks
-John
--
Hi Chuck:
Trying to follow up on this thread. Do you know if this feature
will be incorporated in the next Lucene release?
How would someone find out which patches will go into the next release?
Thanks
-John
On Mon, 15 Nov 2004 13:05:36 -0800, Chuck Williams <[EMAIL PROTECTED]> wrot
Thanks guys for the info!
After looking at the patch code I have two problems:
1) The patch implementation doesn't help with performance. It still
reads the data for every field in the document; it just doesn't store all
of them. So this implementation helps if there are memory
restrictions, but not i
Hi:
Is there some way to read only 1 field value from an index given a docID?
From the current API, in order to get a field given a docID, I
would call:
IndexSearcher.document(docID)
which in turn reads in all fields from the disk.
Here is my problem:
After
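One generic way to read a single field without pulling in the whole document is an offset table per field. The sketch below is a plain-Java illustration of the idea only; all names are invented and this is not Lucene's actual file format:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

public class OneFieldStore {
    // docId -> {offset, length} of that document's field bytes in the data file.
    final Map<Integer, long[]> offsets = new HashMap<>();

    // Append each document's single field value and remember where it lives.
    void writeAll(File data, Map<Integer, String> fieldValues) {
        try (RandomAccessFile raf = new RandomAccessFile(data, "rw")) {
            for (Map.Entry<Integer, String> e : fieldValues.entrySet()) {
                byte[] bytes = e.getValue().getBytes(StandardCharsets.UTF_8);
                offsets.put(e.getKey(), new long[] { raf.getFilePointer(), bytes.length });
                raf.write(bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read exactly one field value by seeking, without touching other documents.
    String readOne(File data, int docId) {
        long[] pos = offsets.get(docId);
        try (RandomAccessFile raf = new RandomAccessFile(data, "r")) {
            raf.seek(pos[0]);
            byte[] buf = new byte[(int) pos[1]];
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: a throwaway data file.
    static File tempDataFile() {
        try {
            File f = File.createTempFile("onefield", ".dat");
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```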
unaccounted for. Is that due to
thread scheduling/context switching?
Thanks
-John
On Thu, 6 Jan 2005 10:36:12 -0800, John Wang <[EMAIL PROTECTED]> wrote:
> Is the operation IndexSearcher.search I/O or CPU bound if I am doing
> 100's of searches on the same query?
>
> Thank
Is the operation IndexSearcher.search I/O or CPU bound if I am doing
100's of searches on the same query?
Thanks
-John
On Thu, 06 Jan 2005 10:31:49 -0800, Doug Cutting <[EMAIL PROTECTED]> wrote:
> John Wang wrote:
> > 1 thread: 445 ms.
> > 2 threads: 870
I actually ran a few tests but am seeing similar behavior.
After removing all the possible variations, this is what I used:
1 index; doc count is 15,000.
Using FSDirectory, e.g. new IndexSearcher(String path); by default I
think it uses FSDirectory.
Each thread is doing 100 iterations of search, e
Hi folks:
We are trying to measure the throughput of Lucene in a multi-threaded environment.
This is what we found:
1 thread, search takes 20 ms.
2 threads, search takes 40 ms.
5 threads, search takes 100 ms.
Seems like under a multi-threaded scenario, throughput isn't go
One way is to create a reader from a URL to your file
(assuming the file is hosted somewhere reachable by a URL):
Reader r = new InputStreamReader(url.openStream());
Document doc = new Document();
doc.add(Field.Keyword("url", url.toString()));
doc.add(Field.Text("contents", r));
iw.addDocument(doc);
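For reference, a self-contained sketch of just the Reader-over-URL part in plain Java (it uses a file: URL so it needs no network; note that `openStream()` lives on `java.net.URL`, while `getInputStream()` is a `URLConnection` method). The helper names are invented for the demo:

```java
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class UrlReaderDemo {
    // Read everything behind a URL through a Reader, as the snippet intends.
    static String readAll(URL url) {
        try (Reader r = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helpers: a temp file with known content, and its file: URL.
    static File tempWithContent(String s) {
        try {
            File f = File.createTempFile("urldemo", ".txt");
            f.deleteOnExit();
            Files.write(f.toPath(), s.getBytes(StandardCharsets.UTF_8));
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static URL fileUrl(File f) {
        try {
            return f.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }
}
```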
Hi:
When is Lucene planning on moving toward Java 1.4+?
I see there are some problems caused by the current lock file
implementation, e.g. Bug #32171. The problems would be easily fixed by
using the java.nio.channels.FileLock object.
Thanks
-John
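A minimal sketch of what using the JDK lock would look like (the file name and helper methods here are invented for the demo; this is not a proposed Lucene patch):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class LockDemo {
    // Take an exclusive OS-level lock on a lock file (blocks until held).
    static FileLock acquire(File lockFile) {
        try {
            RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
            return raf.getChannel().lock();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Release the lock and close its channel.
    static void release(FileLock lock) {
        try {
            lock.release();
            lock.channel().close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Demo helper: a throwaway lock file.
    static File tempLockFile() {
        try {
            File f = File.createTempFile("demo", ".lock");
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Unlike a marker file on disk, an OS lock is released automatically when the holding process dies, which is exactly what the stale-lock bug is about.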
'll get access to
> 'internal' methods, of course. If you end up creating this, we could
> stick it in the Sandbox, where we should really create a new section
> for handy command-line tools that manipulate the index.
>
> Otis
>
>
>
>
> --- John Wang
I thought Lucene implements the Boolean model.
-John
On Thu, 9 Dec 2004 00:19:21 +0100, Nicolas Maisonneuve
<[EMAIL PROTECTED]> wrote:
> hi,
> think first of the relevance of the model in this 2 search engine for
> XML document retrieval.
>
> Lucene is classic fulltext search engine using the
Hi folks:
I sent this out a few days ago without a response.
Please help.
Thanks in advance
-John
On Mon, 6 Dec 2004 21:15:00 -0800, John Wang <[EMAIL PROTECTED]> wrote:
> Hi:
>
> Is there a way to finalize delete, e.g. actually remove them from
> the segments
Hi:
Is there a way to finalize deletes, i.e. actually remove them from
the segments and make sure the docIDs are contiguous again?
The only explicit way to do this is by calling
IndexWriter.optimize(). But this call does a lot more (it also merges all
the segments), hence is very expensive. Is t
We've found something interesting about mergeFactors.
We are indexing a million documents in batches of 1000.
We first set the mergeFactor to 1000.
What we found is that at every 10th commit, we see a significant spike in
indexing time.
The reason is that the indexer is trying to merge the segments
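A toy model of that merge policy makes the spikes visible. This is an approximation, not Lucene's actual code: it merges whenever mergeFactor equal-sized segments accumulate, and it uses small numbers (mergeFactor 10, batches of 100) instead of the 1000s above. The cost spikes at every mergeFactor-th commit, with a larger cascade at mergeFactor squared:

```java
import java.util.*;

public class MergeCostSim {
    final int mergeFactor;
    final List<Long> segments = new ArrayList<>();  // doc count per live segment

    MergeCostSim(int mergeFactor) { this.mergeFactor = mergeFactor; }

    // Add one batch as a new segment, then merge whenever mergeFactor
    // equal-sized segments exist. Returns docs rewritten by merges this commit.
    long commit(long batchSize) {
        segments.add(batchSize);
        long cost = 0;
        boolean merged = true;
        while (merged) {
            merged = false;
            Map<Long, Integer> bySize = new HashMap<>();
            for (long s : segments) bySize.merge(s, 1, Integer::sum);
            for (Map.Entry<Long, Integer> e : bySize.entrySet()) {
                if (e.getValue() >= mergeFactor) {
                    long size = e.getKey();
                    int removed = 0;
                    for (Iterator<Long> it = segments.iterator();
                         it.hasNext() && removed < mergeFactor; ) {
                        if (it.next() == size) { it.remove(); removed++; }
                    }
                    segments.add(size * mergeFactor);  // the merged segment
                    cost += size * mergeFactor;        // docs copied during merge
                    merged = true;
                    break;  // sizes changed; recount before merging again
                }
            }
        }
        return cost;
    }
}
```

In this model, commits 1 through 9 cost nothing, commit 10 rewrites 1000 docs, and commit 100 triggers a two-level cascade rewriting 11,000 docs.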
I have also seen this problem.
In the Lucene code, I don't see where the reader specified when
creating a field is closed. That holds on to the file.
I am looking at DocumentWriter.invertDocument().
Thanks
-John
On Mon, 22 Nov 2004 16:21:35 -0600, Chris Lamprecht
<[EMAIL PROTECTED]> wrote:
>
+0100, Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Wednesday 24 November 2004 00:37, John Wang wrote:
>
>
> > Hi:
> >
> >I am trying to index 1M documents, with batches of 500 documents.
> >
> >Each document has an unique text key, which is added
It looks to me like it scans sequentially
> only within a small buffer window (of size
> SegmentTermEnum.indexInterval) and that it uses binary search otherwise.
> See TermInfosReader.getIndexOffset(Term).
>
> Chuck
>
>
>
> > -Original Message-
>
Hi:
I am trying to index 1M documents in batches of 500.
Each document has a unique text key, which is added as
Field.Keyword(name, value).
For each batch of 500, I need to make sure I am not adding a
document with a key that is already in the current index.
To do this
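The in-memory half of such a check can be sketched as below (a plain-Java illustration with invented names; in practice the "already in the index" part would be a term lookup per key against the index rather than a fully in-memory set):

```java
import java.util.*;

public class BatchDedup {
    // Keys already accepted (in a real system, seeded or checked via
    // per-key term lookups against the existing index).
    final Set<String> existingKeys = new HashSet<>();

    // Returns the keys from the batch that are new, and records them
    // so later batches see them too.
    List<String> filterNew(List<String> batchKeys) {
        List<String> fresh = new ArrayList<>();
        for (String key : batchKeys) {
            if (existingKeys.add(key)) {  // add() is false if already present
                fresh.add(key);
            }
        }
        return fresh;
    }
}
```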
In my test, I have 12,900 documents. Each document is small: a few
discrete fields (Keyword type) and 1 Text field containing only 1
sentence.
With both mergeFactor and maxMergeDocs set to 1000:
using RAMDirectory, the indexing job took about 9.2 seconds;
not using RAMDirectory, the indexing job too
Hi folks:
Is there an indexing benchmark somewhere? I see a search
benchmark on the lucene home site.
Thanks
-John
Hi folks:
How does Lucene implement transactions and rollback? E.g., if the
machine crashes (from a power outage etc.) in the middle of a write,
e.g. indexWriter.close()? From examining the code, it seems that there is
a possibility such a crash can cause a corrupted index.
(in segmentInfos, new data
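The usual defense against exactly this failure is to write all new data to fresh files and make one atomic rename the commit point; a crash before the rename leaves the old generation intact. This is a general sketch of the pattern (not a claim about what Lucene 1.x actually does; names are invented):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class AtomicCommit {
    // Publish new content under 'target' via write-temp-then-rename.
    static void commit(Path target, String content) {
        try {
            Path tmp = Files.createTempFile(target.getParent(), "commit", ".tmp");
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // The rename is the commit point: readers see either the old
            // file or the new one, never a half-written file.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE,
                       StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String read(Path target) {
        try {
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: a throwaway directory.
    static Path tempDir() {
        try {
            return Files.createTempDirectory("commitdemo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```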
Hi folks:
My application builds a super-index around the Lucene index,
e.g. it stores some additional information outside of Lucene.
I am using my own locking outside of the Lucene index via the
FileLock object in the JDK 1.4 nio package.
My code does the following:
FileLock lock=nu
Hi:
Maybe this has been asked before.
Is there a plan to support ACL checks on the documents in Lucene?
Say I have a customized ACL check module, e.g.:
boolean ACLCheck(int docID,String user,String password);
And have some sort of framework to plug in something like that.
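Such a framework could be as small as a post-filter over the raw hits. The sketch below uses a simplified version of the proposed signature (the password parameter is dropped and all names are invented):

```java
import java.util.*;

public class AclFilter {
    // The pluggable check the mail proposes, reduced to docId + user.
    interface AclCheck {
        boolean allow(int docId, String user);
    }

    // Post-filter raw hits through the ACL callback before returning them.
    static List<Integer> filterHits(List<Integer> hits, String user, AclCheck check) {
        List<Integer> visible = new ArrayList<>();
        for (int docId : hits) {
            if (check.allow(docId, user)) visible.add(docId);
        }
        return visible;
    }
}
```

A post-filter is the simplest plug-in point, though it can hurt paging: a page of hits may shrink after filtering, so real systems often push the check further down into the query.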
In general, yes.
By splitting up a large index into smaller indices, you are
linearizing the search time.
Furthermore, that allows you to make your search distributable.
-John
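The distribution half boils down to merging per-index results into one global top-k. A plain-Java sketch (invented types; real scores would come from each sub-index's searcher):

```java
import java.util.*;

public class ResultMerger {
    // A scored hit from one sub-index.
    static class Hit {
        final int docId;
        final double score;
        Hit(int docId, double score) { this.docId = docId; this.score = score; }
    }

    // Merge per-index result lists into one global top-k by score.
    static List<Hit> topK(List<List<Hit>> perIndex, int k) {
        PriorityQueue<Hit> best =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score)); // min-heap
        for (List<Hit> hits : perIndex) {
            for (Hit h : hits) {
                best.add(h);
                if (best.size() > k) best.poll();  // drop the current lowest
            }
        }
        List<Hit> out = new ArrayList<>(best);
        out.sort((a, b) -> Double.compare(b.score, a.score));  // best first
        return out;
    }
}
```

One caveat: scores are only directly comparable across sub-indexes if the collection statistics (e.g. document frequencies) are comparable or shared.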
On Wed, 21 Jul 2004 13:00:28 +1000, Anson Lau <[EMAIL PROTECTED]> wrote:
> Hello guys,
>
> What are some general techni
Hi Erik and Grant:
Thanks for the replies; this is certainly encouraging. As
suggested, I will post further such discussions to the dev list.
Thanks
-John
On Tue, 20 Jul 2004 15:37:35 -0400, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> It seems to me the answer to this is not necessari
wrote:
> On Tuesday 20 July 2004 18:12, John Wang wrote:
>
> > They make sure during deployment their "versions"
> > gets loaded before the same classes in the lucene .jar.
>
> I don't see why people cannot just make their own lucene.jar. Just remove
> the
On Tue, 20 Jul 2004 13:40:28 -0400, Erik Hatcher
<[EMAIL PROTECTED]> wrote:
> On Jul 20, 2004, at 12:12 PM, John Wang wrote:
> > There are few things I want to do to be able to customize lucene:
> >
> [...]
> >
> > 3) to be able to customize analyzers to a
EMAIL PROTECTED]> wrote:
> On Tuesday 20 July 2004 17:28, John Wang wrote:
>
> >I have asked to make the Lucene API less restrictive many many many
> > times but got no replies.
>
> I suggest you just change it in your source and see if it works. Then you can
> s
Hi:
I am trying to store some database-like field values in Lucene.
I have my own way of storing field values in a customized format.
I guess my question is whether we can make the Reader/Writer
classes, e.g. the FieldReader, FieldWriter, DocumentReader/Writer classes,
non-final?
I have a
Hi:
I am trying to store certain types of document fields in my own
format by intercepting the indexing process. Along the same lines as the
previous discussions on final modifiers on classes, it would be nice
to be able to extend the FieldWriter, DocumentWriter, etc. classes.
Otis suggested
cessful when it
is able to do what its creator never even dreamt of". And I think
Lucene is certainly capable of that.
Just my two cents.
Thanks
-John
On Tue, 13 Jul 2004 09:12:09 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote:
> John Wang wrote:
> >
Hi:
On the same thought, how about the org.apache.lucene.analysis.Token
class? Can we make it non-final?
I sent out this question 3 different times and still got no responses...
Thanks
-John
On Mon, 12 Jul 2004 18:33:04 -0700, Kevin A. Burton
<[EMAIL PROTECTED]> wrote:
> Doug Cutting wrot
I was running into similar problems with Lucene classes being
final, in my case the Token class. I sent out an email but no one
responded :(
-John
On Sat, 10 Jul 2004 15:50:28 -0700, Kevin A. Burton
<[EMAIL PROTECTED]> wrote:
> I was going to create a new IDField class which just calls super
Thanks Doug. I will do just that.
Just for my education, can you maybe elaborate on using the
"implement an IndexReader that delivers a
synthetic index" approach?
Thanks in advance
-John
On Thu, 08 Jul 2004 10:01:59 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote:
> John Wang
Hi gurus:
Please forgive some more of my ignorant questions :)
The Token class is declared as final.
The tokenizers and the analyzers I am writing produce Token objects
with more information encapsulated than the Token class defined in
Lucene.
So it makes sense to me to be able to derive from
would create a total of 11 Tokens whereas only 2 is
> > neccessary.
> >
> >Given many documents with many terms and frequencies, it would
> > create many extra Token instances.
> >
> > The reason I was looking to derving the Field class is because I
> >
tting the frequency. But
> the class is final...
>
> Any other suggestions?
>
> Thanks
>
> -John
>
> On Wed, 07 Jul 2004 14:20:24 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote:
> > John Wang wrote:
> > > While lucene tokenizes the words in
, Doug Cutting <[EMAIL PROTECTED]> wrote:
> John Wang wrote:
> > While lucene tokenizes the words in the document, it counts the
> > frequency and figures out the position, we are trying to bypass this
> > stage: For each document, I have a set of words with a know
Hi gurus:
I am trying to be able to control the indexing process.
While Lucene tokenizes the words in the document, counts the
frequency, and figures out the positions, we are trying to bypass this
stage: for each document, I have a set of words with a known frequency,
e.g. java (5), l