I think they do a proximity result based on keyword matches. So... If you
search for "lucene" and the document returned has this word at the very
start and the very end of the document, then you will see the two sentences
(sequences of words) surrounding the two keyword matches, one from the st
Xiaohong Yang (Sharon) wrote:
Hi,
I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Does anyone know Google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't hav
Hi
Is it hard to implement a function that displays search-result
excerpts similar to Google's?
Is it just string manipulation, or is there some logic behind it? I
like their excerpts.
Thanks
-
To unsubscribe, e-mail: [EMAI
Jason Polites wrote:
I think everyone agrees that this would be a very neat application of
opensource technology like Lucene... however (opens drawer, pulls out
devil's advocate hat, places on head)... there are several complexities
here not addressed by Lucene (et. al). Not because Lucene isn'
Overall, even if Google mini gives a lot of cool features compared to
a bare-bones Lucene project, what good is it with the 50,000-document
limit? It is useless with that limit. That is just their way of trying
to turn it into another cash cow.
Jian
On Thu, 27 Jan 2005 17:45:03 -0800 (PST), Otis Go
No, the number of occurrences of a term in a Query.
Jonathan
Quoting David Spencer <[EMAIL PROTECTED]>:
> Jonathan Lasko wrote:
>
> > What do I call to get the term frequencies for terms in the Query? I
> > can't seem to find it in the Javadoc...
>
> Do you mean the # of docs that have a ter
Could you work up a self-contained RAMDirectory-using example that
demonstrates this issue?
Erik
On Jan 27, 2005, at 9:10 PM, <[EMAIL PROTECTED]> wrote:
Erik,
I am using the keyword field
doc.add(Field.Keyword("uid", pathRelToArea));
anything else I can check on ?
thanks
atul
PS w
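Erik's request above for a self-contained RAMDirectory example might look something like the sketch below, assuming the Lucene 1.4-era API (Field.Keyword, IndexReader.delete(Term)); the path value is hypothetical:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.RAMDirectory;

public class DeleteByUid {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();

        // Index one document with an untokenized "uid" key field.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Keyword("uid", "xxx/foo-bar.txt"));
        writer.addDocument(doc);
        writer.close();

        // Delete it by exact Term; delete(Term) returns the number deleted.
        IndexReader reader = IndexReader.open(dir);
        int deleted = reader.delete(new Term("uid", "xxx/foo-bar.txt"));
        System.out.println("deleted: " + deleted);
        reader.close();
    }
}
```

If the "uid" field had been analyzed at index time, the exact-Term delete would remove nothing, which is the failure mode discussed in this thread.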
Erik,
I am using the keyword field
doc.add(Field.Keyword("uid", pathRelToArea));
anything else I can check on ?
thanks
atul
PS we worked together for Darden project
>
> From: Erik Hatcher <[EMAIL PROTECTED]>
> Date: 2005/01/27 Thu PM 07:46:40 EST
> To: "Lucene Users List"
> Subjec
I think everyone agrees that this would be a very neat application of
opensource technology like Lucene... however (opens drawer, pulls out
devil's advocate hat, places on head)... there are several complexities here
not addressed by Lucene (et. al). Not because Lucene isn't damn fantastic,
ju
As they say, nothing lasts forever ;)
I like the idea. If a project like this gets going, I think I'd be
interested in helping.
The Google mini looks very well done (they have two demos on the web
page). For $5000, it's probably a very good solution for many
businesses. If the demos are accura
: processes ended. If you're under linux, try running the 'lsof'
: command to see if there are any handles to files marked "(deleted)".
: > Searcher, the old Searcher is closed and nulled, but I
: > still see about twice the amount of memory in use well
: > after the original searcher has been c
500 times the original data? Not true! :)
Otis
--- "Xiaohong Yang (Sharon)" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I agree that Google mini is quite expensive. It might be similar to
> the desktop version in quality. Does anyone know Google's ratio of index
> to text? Is it true that Lucene's i
Have you tried using the multifile index format? Now I wonder if there
is actually a difference in disk space consumed by optimize() when you
use the multifile and compound index formats...
Otis
--- "Kauler, Leto S" <[EMAIL PROTECTED]> wrote:
> Our copy of LIA is "in the mail" ;)
>
> Yes the final t
I discuss this with myself a lot inside my head... :)
Seriously, I agree with Erik. I think this is a business opportunity.
How many people are hating me now and going "shh"? Raise your
hands!
Otis
--- David Spencer <[EMAIL PROTECTED]> wrote:
> This reminds me, has anyone ever discuss
I just ran into a similar issue. When you close an IndexSearcher, it
doesn't necessarily close the underlying IndexReader. It depends
which constructor you used to create the IndexSearcher. See the
constructors javadocs or source for the details.
In my case, we were updating and optimizing the
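The constructor difference described above can be sketched roughly like this, assuming the 1.4-era API: an IndexSearcher built from a path opens and owns its own reader, while one built from an IndexReader leaves closing to the caller (paths are hypothetical):

```java
// Case 1: searcher owns its reader -- close() releases the underlying files.
IndexSearcher ownedSearcher = new IndexSearcher("/path/to/index");
// ... search ...
ownedSearcher.close();  // also closes the internal IndexReader

// Case 2: searcher wraps a caller-supplied reader -- close() does NOT close it.
IndexReader reader = IndexReader.open("/path/to/index");
IndexSearcher wrappingSearcher = new IndexSearcher(reader);
// ... search ...
wrappingSearcher.close();
reader.close();  // must be closed explicitly, or file handles leak
```

Leaked readers are exactly what shows up as "(deleted)" file handles in lsof, as mentioned elsewhere in this thread.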
I've often said that there is a business to be had in packaging up
Lucene (and now Nutch) into a cute little box with user friendly
management software to search your intranet. SearchBlox is already
there (except they don't include the box).
I really hope that an application like SearchBlox/Zi
Thanks for your reply.
I use QueryParser instead of TermQuery.
And it all works well!
Thanks.
Youngho
- Original Message -
From: "mark harwood" <[EMAIL PROTECTED]>
To:
Sent: Thursday, January 27, 2005 7:05 PM
Subject: Re: text highlighting
> >>sometimes the returned String is none.
>
How did you index the "uid" field? Field.Keyword? If not, that may be
the problem in that the field was analyzed. For a key field like this,
it needs to be unanalyzed/untokenized.
Erik
On Jan 27, 2005, at 6:21 PM, <[EMAIL PROTECTED]> wrote:
Hi,
I am trying to delete a document from Lu
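Erik's point above about key fields can be illustrated with a minimal sketch, assuming the Lucene 1.4-era field factories (field names and values are hypothetical):

```java
Document doc = new Document();

// Untokenized: the whole path is indexed as one term, so
// new Term("uid", "xxx/foo-bar.txt") matches it exactly.
doc.add(Field.Keyword("uid", "xxx/foo-bar.txt"));

// Tokenized (wrong for a key field): the analyzer typically splits
// the path into several terms ("xxx", "foo", "bar", "txt"), and no
// single indexed term equals the original string, so an exact
// Term lookup or delete will miss.
doc.add(Field.Text("contents", "xxx/foo-bar.txt"));
```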
I disagree. Most small companies don't have an IT staff capable of implementing
a custom search engine using Lucene for less than $5,000. Nutch might make this
possible, but compared to a plug-in-and-go solution like the Google mini, it
still would probably cost a significant amount of money.
I think Google mini also includes crawling and a server wrapper. So it
is not entirely a 1-to-1 comparison.
Of course, extending Lucene to have those features is not at all
difficult anyway.
-John
On Thu, 27 Jan 2005 16:04:54 -0800 (PST), Xiaohong Yang (Sharon)
<[EMAIL PROTECTED]> wrote:
> Hi,
This reminds me, has anyone ever discussed something similar:
- rackmount server ( or for coolness factor, that mini mac)
- web i/f for config/control
- of course the server would have the following s/w:
-- web server
-- lucene / nutch
Part of the work here I think is having a decent web i/f to co
Our copy of LIA is "in the mail" ;)
Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).
--Leto
> -Original Message-
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
>
> Hello,
>
> Yes, that is how optimize works - copies all existing ind
Hi,
I agree that Google mini is quite expensive. It might be similar to the
desktop version in quality. Does anyone know Google's ratio of index to text? Is
it true that Lucene's index is about 500 times the original text size (not
including image size)? I don't have one installed, so I canno
Hello,
Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.
see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
However, three times the space sounds a bit too much, or I made a
mistake in the book. :)
You sa
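The call under discussion is a one-liner; a sketch assuming the 1.4-era IndexWriter API (path is hypothetical). optimize() merges every segment into one, so old and new segments briefly coexist on disk, which is why extra space is needed during the merge:

```java
// false = open the existing index rather than create a new one
IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
writer.optimize();  // merges all segments into one; old segments are removed afterwards
writer.close();
```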
Hi,
I was searching using google and just found that there was a new
feature called "google mini". Initially I thought it was another free
service for small companies. Then I realized that it costs quite some
money ($4,995) for the hardware and software. (I guess the proprietary
software costs a w
Hi,
I am trying to delete a document from Lucene index using:
Term aTerm = new Term( "uid", path );
aReader.delete( aTerm );
aReader.close();
If the variable path="xxx/foo.txt" then I am able to delete the document.
However, if path variable has "-" in the string, the delete me
Just a quick question: after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?
In our case the optimise grinds the disk, expanding the index into many
files of about 145MB total, before compressing down to
Jonathan Lasko wrote:
What do I call to get the term frequencies for terms in the Query? I
can't seem to find it in the Javadoc...
Do you mean the # of docs that have a term?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
T
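For the document-frequency reading of the question, a sketch using the method David links to (1.4-era API; the path, field, and term are hypothetical):

```java
IndexReader reader = IndexReader.open("/path/to/index");
// Number of documents containing the term -- not its total occurrence count.
int df = reader.docFreq(new Term("contents", "lucene"));
System.out.println("docs containing 'lucene': " + df);
reader.close();
```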
What do I call to get the term frequencies for terms in the Query? I
can't seem to find it in the Javadoc...
Thanks.
Jonathan
Peter Hollas wrote:
Currently we can issue a simple search query and expect a response back
in about 0.2 seconds (~3,000 results) with the Lucene index that we have
built. Lucene gives a much more predictable and faster average query
time than using standard fulltext indexing with mySQL. This ho
Kevin A. Burton wrote:
Is there any way to reduce this footprint? The index is fully
optimized... I'm willing to take a performance hit if necessary. Is
this documented anywhere?
You can increase TermInfosWriter.indexInterval. You'll need to re-write
the .tii file for this to take effect. Th
Hello Karl,
Grab the source code for Lucene in Action, it's got code that parses
and indexes XML with DOM and SAX. You can see the coverage of that
stuff here:
http://lucenebook.com/search?query=indexing+XML+section%3A7*
I haven't used kXML, but I imagine the LIA code should get you going
quickl
That's good to know.
I'm indexing on 11 fields (9 keyword, 2 text). The documents themselves are
between 1K and 2K in size.
Is there a point at which IndexSearcher performance begins to fall off? (in
terms of # of index records?)
Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne,
Hi,
I want to use kXML with Lucene to index XML files. I think it is possible to
dynamically assign Node names as Document fields and Node texts as Text
(after using an Analyser).
I have seen some XML indexing in the Sandbox. Is there anybody here who has done
something with a thin pull parser (perh
Thanks Otis.
- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Sent: Thursday, January 27, 2005 12:11 PM
Subject: Re: Boosting Questions
> Luke,
>
> Boosting is only one of the factors involved in Document/Query scoring.
> Assuming that by appl
Luke,
Boosting is only one of the factors involved in Document/Query scoring.
Assuming that by applying your boosts to Document A or a single field
of Document A increases the total score enough, yes, that Document A
may have the highest score. But just because you boost a single
Document and no
Hi All;
I just want to make sure I have the right idea about boosting.
So if I boost a document (Document A) after I index it (let's say a boost of
2.0), Lucene will now consider this document relatively more important than
other documents in the index with a boost factor less than 2.0. This boost
f
Make sure that the older searcher is not referenced elsewhere; otherwise the
garbage collector should delete it.
Just remember that the garbage collector runs when memory is needed, not
immediately after changing a reference to null.
-----Original Message-----
From: Greg Gershman [mailto:[EMA
"Jerry Jalenak" <[EMAIL PROTECTED]> writes:
> I am in the process of indexing about 1.5 million documents, and have
> started down the path of indexing these by month. Each month has between
> 100,000 and 200,000 documents. From a performance standpoint, is this the
> right approach? This allow
I have an index that is frequently updated. When
indexing is completed, an event triggers a new
Searcher to be opened. When the new Searcher is
opened, incoming searches are redirected to the new
Searcher, the old Searcher is closed and nulled, but I
still see about twice the amount of memory in
I am in the process of indexing about 1.5 million documents, and have
started down the path of indexing these by month. Each month has between
100,000 and 200,000 documents. From a performance standpoint, is this the
right approach? This allows me to use MultiSearcher (or
ParallelMultiSearcher),
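The month-partitioned setup described above can be sketched as follows, assuming the 1.4-era MultiSearcher API (index paths are hypothetical):

```java
// One sub-index per month, searched as a single logical index.
Searchable[] monthly = new Searchable[] {
    new IndexSearcher("/indexes/2004-12"),
    new IndexSearcher("/indexes/2005-01"),
};
MultiSearcher searcher = new MultiSearcher(monthly);
Hits hits = searcher.search(query);  // 'query' built elsewhere, e.g. via QueryParser
searcher.close();                    // closes the sub-searchers too
```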
Nope,
it is very possible. We have an index that holds the search info for
documents, messages in discussion threads, filled in forms etc. etc.
each having their own structure.
cheers,
Aad
Karl Koch wrote:
Hello all,
perhaps not such a sophisticated question:
I would like to have a very divers
Karl,
This is completely fine. You can have documents with different fields
in the same index.
Otis
--- Karl Koch <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> perhaps not such a sophisticated question:
>
> I would like to have a very diverse set of documents in one index.
> Depending
> on th
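What Otis describes can be sketched like this, assuming the 1.4-era API: documents in the same index need not share a schema (field names and the 'dir' Directory are hypothetical):

```java
IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

// A "document" entry with one set of fields...
Document page = new Document();
page.add(Field.Keyword("type", "document"));
page.add(Field.Text("body", "full text of the document"));
writer.addDocument(page);

// ...and a "message" entry with a different set of fields, in the same index.
Document msg = new Document();
msg.add(Field.Keyword("type", "message"));
msg.add(Field.Text("subject", "Re: text highlighting"));
msg.add(Field.Keyword("thread", "12345"));
writer.addDocument(msg);

writer.close();
```

Searches against a field simply skip documents that do not have that field.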
Hello all,
perhaps not such a sophisticated question:
I would like to have a very diverse set of documents in one index. Depending
on the inside of text documents, I would like to put part of the text in
different fields. This means in the searches, when searching a particular
field, some of tho
https://lucenerar.dev.java.net
LuceneRAR is now working on two verified containers: the J2EE 1.4 RI and
Orion. WebSphere testing is underway, with JBoss to follow.
LuceneRAR is a resource adapter for Lucene, allowing J2EE components to
look up an entry in a JNDI tree, using that reference to add
>>sometimes the returned String is none.
>>Is the code analyzer-dependent?
When highlighter.getBestFragments returns nothing,
it is because there was no match found for the query
terms in the TokenStream supplied.
This is nearly always because of Analyzer issues.
Check the post-analysis tokens pr
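A rough sketch of the highlighting call being debugged, assuming the sandbox Highlighter of that era; the field name and the 'text' variable are hypothetical. The key point is to analyze the text for highlighting with the same analyzer used at index time:

```java
// Use the SAME analyzer for highlighting that was used at index time.
Analyzer analyzer = new CJKAnalyzer();
Query query = QueryParser.parse("family", "contents", analyzer);

Highlighter highlighter = new Highlighter(new QueryScorer(query));
TokenStream tokens = analyzer.tokenStream("contents", new StringReader(text));
String fragment = highlighter.getBestFragments(tokens, text, 3, "...");
// An empty result means the query terms never matched the analyzed tokens.
```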
More test results:
if the text contains "... Family ..."
then
the query string "family" works OK.
But if the query string is "Family", then the highlighter returns none.
Thanks.
Youngho
- Original Message -
From: "Youngho Cho" <[EMAIL PROTECTED]>
To: "Lucene Users List"
Cc: "Che Dong" <[EMAIL PRO
Hello,
When I used the code with CJKAnalyzer and searched English text
(because the text is mixed with Korean and English),
sometimes the returned String is none.
Others work well.
Is the code analyzer-dependent?
Thanks.
Youngho
--- Test Code ( Just copy of the Book code ) -
Without looking at the source, my guess is that StandardAnalyzer (and
StandardTokenizer) is the culprit. The StandardAnalyzer grammar (in
StandardTokenizer.jj) is probably defined so "x/y" parses into two
tokens, "x" and "y". "s" is a default stopword (see
StopAnalyzer.ENGLISH_STOP_WORDS), so it
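One way to inspect the post-analysis tokens, as a sketch against the 1.4-era TokenStream API (the field name "f" is arbitrary):

```java
TokenStream stream = new StandardAnalyzer()
        .tokenStream("f", new StringReader("test/s"));
for (Token token = stream.next(); token != null; token = stream.next()) {
    // Likely prints only "test": "test/s" splits on '/' and
    // "s" is in the default English stopword list.
    System.out.println(token.termText());
}
```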
Hi Jason ,
yes, the documentation does mention escaping, but that's only for special
characters used in queries, right?
But I've tried 'escaping' too.
To answer your question, I am sure it is not the HTTP request which is eating it up.
Query query = MultiFieldQueryParser.parse("test/s",