Ben,
You do need to use a separate instance of those 3 classes for each
index, yes. But this is really just something like:
IndexWriter writer = new IndexWriter(indexDir, analyzer, true);
So it's the normal code-writing process; you don't really have to create
anything new, just use the existing Lucene API. As for locking,
Make sure you are not indexing your documents using the compound index
format (default in the newer versions of Lucene). Then you will see
the .frq file. Here is an example from one of Simpy's Lucene indices:
-rw-r--r--  1 simpy  simpy   629073 Feb 26 13:14 _1ao.frq
Otis
--
http://www.si
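A minimal sketch of the setting that controls this, assuming the 1.4-era API where the compound format is toggled on IndexWriter (path and analyzer are illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class NonCompoundIndexing {
    public static void main(String[] args) throws Exception {
        // Illustrative index path; adjust to your environment.
        IndexWriter writer = new IndexWriter("/path/to/index",
                                             new StandardAnalyzer(), true);
        // Turn off the compound (.cfs) format so the per-segment files
        // (.frq, .prx, .tis, ...) remain visible on disk, as in the
        // directory listing above.
        writer.setUseCompoundFile(false);
        // ... add documents here ...
        writer.optimize();
        writer.close();
    }
}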
Use Luke to peek in your index and find out what really got indexed.
You could also try the extreme case and set that max value to the max
Integer.
Otis
--- "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> Hi everyone
>
> I'm having a bizarre problem with a few of the documents here that do
>
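Assuming the "max value" above refers to IndexWriter's maxFieldLength (the per-field token cap; the quoted question is cut off, so this is a guess), a minimal sketch of the extreme case:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MaxFieldLengthDemo {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                                             new StandardAnalyzer(), true);
        // The default is 10,000 tokens per field; anything beyond that is
        // silently dropped. This is the "max Integer" extreme case.
        writer.maxFieldLength = Integer.MAX_VALUE;
        // ... add documents, then close ...
        writer.close();
    }
}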
You are right.
Since there are C++ and now C ports of Lucene, it would be interesting
to integrate them directly with DBs, so that the RDBMS full-text search
under the hood is actually powered by one of the Lucene ports.
Otis
--- David Spencer <[EMAIL PROTECTED]> wrote:
> Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Wouldn't this leave open file handles? I had a problem where there
> were lots of open file handles for deleted index files, because the
> old searchers were not being closed.
>
> On Fri, 18 Feb 2005 13:41:37 -0800 (PST), Otis Gospodnetic
> <[EMAIL PROTECTED]>
Matt,
Erik and I have some code for this in Lucene in Action, but David
Spencer did this since the book was published:
http://www.lucenebook.com/blog/announcements/more_like_this.html
Otis
--- Matt Chaput <[EMAIL PROTECTED]> wrote:
> Is there a simple, efficient way to compute similarity of
Or you could just open a new IndexSearcher, forget the old one, and
have GC collect it when everyone is done with it.
Otis
--- Chris Lamprecht <[EMAIL PROTECTED]> wrote:
> I should have mentioned, the reason for not doing this the obvious,
> simple way (just close the Searcher and reopen it if a
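A minimal sketch of the swap-and-forget approach described above (class and method names are made up for illustration):

import org.apache.lucene.search.IndexSearcher;

public class SearcherHolder {
    private volatile IndexSearcher current;

    public SearcherHolder(String indexDir) throws Exception {
        current = new IndexSearcher(indexDir);
    }

    // In-flight searches keep their own reference, so the old instance
    // is only garbage-collected once everyone is done with it.
    public IndexSearcher getSearcher() {
        return current;
    }

    // After the index changes, open a fresh searcher and simply drop the
    // reference to the old one instead of closing it explicitly.
    public void refresh(String indexDir) throws Exception {
        current = new IndexSearcher(indexDir);
    }
}

The trade-off, raised earlier in this thread, is that the old searcher's files stay open until it is finalized.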
The most obvious answer is that the full-text indexing features of
RDBMS's are not as good (as fast) as Lucene. MySQL, PostgreSQL,
Oracle, MS SQL Server etc. all have full-text indexing/searching
features, but I always hear people complaining about the speed. A
person from a well-known online boo
Hi Paul,
If I understand your setup correctly, it looks like you are running
multiple threads that create an IndexWriter for the same directory. That's
a "no no".
This section (first hit) describes the various concurrency issues with
regard to adds, updates, optimization, and searches:
http://www
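A hedged sketch of the usual fix: one IndexWriter per index directory, shared by all indexing threads (class and names are illustrative, not from Paul's code):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class SharedWriter {
    // A single writer instance for the whole JVM / directory.
    private final IndexWriter writer;

    public SharedWriter(String indexDir) throws Exception {
        writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
    }

    // All threads add through this one instance instead of each opening
    // its own IndexWriter on the same directory.
    public void add(Document doc) throws Exception {
        writer.addDocument(doc);
    }

    public void close() throws Exception {
        writer.close();
    }
}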
Hi,
lucene.apache.org seems to work now.
Here is the query syntax:
http://lucene.apache.org/queryparsersyntax.html
[] is used as [BEGIN-RANGE-STRING TO END-RANGE-STRING]
Otis
--- Jim Lynch <[EMAIL PROTECTED]> wrote:
> First I'm getting a
>
>
> The requested URL could not be retrieved
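A minimal example of the [BEGIN TO END] form going through QueryParser (the "date" field and values are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class RangeSyntaxDemo {
    public static void main(String[] args) throws Exception {
        // Inclusive range over a hypothetical "date" field; curly braces
        // {} would make the range exclusive instead.
        Query q = QueryParser.parse("date:[20040101 TO 20041231]",
                                    "contents", new StandardAnalyzer());
        System.out.println(q);
    }
}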
Hi,
It's been a while since I've used that feature, but I believe they will
always be in the same order, but I seem to recall that they will be in
the reverse order. Whichever way they come, you can always reverse them
if the other order is better for you. The java.util.Collections class has
a number
The QueryParser is analyzing your Field.Keyword (genre field) fields,
because it doesn't know that genre is a Keyword field and should not be
analyzed.
Check section 4.4 here:
http://www.lucenebook.com/search?query=queryparser+keyword
Otis
--- Mike Rose <[EMAIL PROTECTED]> wrote:
> Perhaps
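A hedged sketch of one common workaround: build a TermQuery for the un-analyzed keyword field yourself and combine it with the parsed query (field names and values are illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class KeywordFieldQuery {
    public static void main(String[] args) throws Exception {
        // Free-text part of the query goes through the analyzer as usual.
        Query text = QueryParser.parse("mystery thriller", "contents",
                                       new StandardAnalyzer());
        // The Field.Keyword value is matched exactly, bypassing analysis.
        Query genre = new TermQuery(new Term("genre", "Science Fiction"));

        BooleanQuery combined = new BooleanQuery();
        // Lucene 1.4-style add(query, required, prohibited).
        combined.add(text, true, false);
        combined.add(genre, true, false);
        System.out.println(combined);
    }
}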
Get and try Lucene 1.4.3. One of the older versions had a bug that was
not deleting old index files.
Otis
--- [EMAIL PROTECTED] wrote:
> Hi,
>
> When I run an optimize in our production environment, old index are
> left in the directory and are not deleted.
>
> My understanding is that an
>
Using different analyzers for indexing and searching is not
recommended.
Your numbers are not even in the index because you are using
StandardAnalyzer. Use Luke to look at your index.
Otis
--- Hetan Shah <[EMAIL PROTECTED]> wrote:
> Hello,
>
> How can one search for a document based on the qu
If you are not married to Java:
http://search.cpan.org/~kilinrax/HTML-Strip-1.04/Strip.pm
Otis
--- sergiu gordea <[EMAIL PROTECTED]> wrote:
> Karl Koch wrote:
>
> >I am in control of the html, which means it is well formated HTML. I
> use
> >only HTML files which I have transformed from XML. No
Adam,
Dawid posted some code that lets you use Carrot2 locally with Lucene,
without the componentized pipeline system described on the Carrot2 site.
Otis
--- Adam Saltiel <[EMAIL PROTECTED]> wrote:
> David, Hi,
> Would you be able to comment on coincidentally recent thread " RE: ->
> Grouping Sear
;
> Just wondering:
>
> Is Lucene-in-Action being sold anywhere in Singapore?
>
>
>
> thanks!
>
>
>
> Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Gospodnetić
> sounds like Gospodnetich and Eric is Erik :)
>
> Otis
>
> --- John Haxby wrote
Morus,
that description of 3 sets of index files is what I was imagining, too.
I'll have to test and add to the book errata, it seems.
Thanks for the info,
Otis
--- Morus Walter <[EMAIL PROTECTED]> wrote:
> Otis Gospodnetic writes:
> > Hello,
> >
> > Yes, tha
Edwin,
--- Edwin Tang <[EMAIL PROTECTED]> wrote:
> I have three indices really that I search via ParallelMultiSearcher.
> All three
> are being updated constantly. We would like to be able to perform a
> search on
> the indices and have the results reflect the latest documents
> indexed. However,
I don't think there is a direct way to get the number of (unique) terms
in the index, so yes, I think you'll have to loop through TermEnum and
count.
Otis
--- Jonathan Lasko <[EMAIL PROTECTED]> wrote:
> I'm looking for the total number of unique terms in the index. I see
>
> that I can get a T
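A minimal sketch of that TermEnum loop (the index path is illustrative):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;

public class CountTerms {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        TermEnum terms = reader.terms();
        int count = 0;
        // next() advances to each unique term in the index in turn.
        while (terms.next()) {
            count++;
        }
        terms.close();
        reader.close();
        System.out.println("unique terms: " + count);
    }
}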
500 times the original data? Not true! :)
Otis
--- "Xiaohong Yang (Sharon)" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I agree that Google mini is quite expensive. It might be similar to
> the desktop version in quality. Anyone knows google's ratio of index
> to text? Is it true that Lucene's i
l" ;)
>
> Yes the final three files are: the .cfs (46.8MB), deletable (4
> bytes),
> and segments (29 bytes).
>
> --Leto
>
>
>
> > -Original Message-
> > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
> >
> > Hello,
> >
I discuss this with myself a lot inside my head... :)
Seriously, I agree with Erik. I think this is a business opportunity.
How many people are hating me now and going "shh"? Raise your
hands!
Otis
--- David Spencer <[EMAIL PROTECTED]> wrote:
> This reminds me, has anyone ever discuss
Hello,
Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.
see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
However, three times the space sounds a bit too much, or I made a
mistake in the book. :)
You sa
Hello Karl,
Grab the source code for Lucene in Action, it's got code that parses
and indexes XML with DOM and SAX. You can see the coverage of that
stuff here:
http://lucenebook.com/search?query=indexing+XML+section%3A7*
I haven't used kXML, but I imagine the LIA code should get you going
quickl
Luke,
Boosting is only one of the factors involved in Document/Query scoring.
Assuming that by applying your boosts to Document A or a single field
of Document A increases the total score enough, yes, that Document A
may have the highest score. But just because you boost a single
Document and no
Karl,
This is completely fine. You can have documents with different fields
in the same index.
Otis
--- Karl Koch <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> perhaps not such a sophisticated question:
>
> I would like to have a very diverse set of documents in one index.
> Depending
> on th
Gospodnetić sounds like Gospodnetich and Eric is Erik :)
Otis
--- John Haxby <[EMAIL PROTECTED]> wrote:
> Otis Gospodnetic wrote:
>
> >I contacted both the US and UK Amazon sites and asked them to fix my
> >last name (the last character in my name has a little slash (no
Hi Luke,
That's not hard with RangeQuery (supported by QueryParser), take a look
at this:
http://www.lucenebook.com/search?query=date+range
The grayed-out text has the section name and page number, so you can
quickly locate this stuff in your ebook.
Otis
P.S.
Do you know if Indigo/Chapters has
Publisher -> Amazon information feed seems to be a fairly manual
process, and Amazon takes a while to update book information on their
site, including prices.
I contacted both the US and UK Amazon sites and asked them to fix my
last name (the last character in my name has a little slash (not an
ac
I don't have a document with chinese characters to verify this, but it
looks right, so I'll add your change to SearchFiles.java.
Thanks,
Otis
--- Eric Chow <[EMAIL PROTECTED]> wrote:
> Search not really correct with UTF-8 !!!
>
>
> The following is the search result that I used the SearchFiles
Hello Simeon,
Heterogeneous Documents/indices are OK - check out the second hit:
http://www.lucenebook.com/search?query=heterogenous+different
Otis
--- Simeon Koptelov <[EMAIL PROTECTED]> wrote:
> Hello all. I'm new to lucene and think about using it in my project.
>
> I have prices with dyn
A number of people have tried putting Lucene indices in RDBMS. As far
as I know, all were slower than FSDirectory.
Otis
--- nafise hassani <[EMAIL PROTECTED]> wrote:
> Hi
> I want to know from the performance point of view it
> is better to save lucene indexes in database or use
> them as files
That would be a partial solution. Accents will not be a problem any
more, but if you use an Analyzer that stems tokens, they will not really
be tokenized properly. Searches will probably work, but if you look at
the index you will see that some terms were not analyzed properly. But
it may be suff
Yes, I remember your email about the large number of Terms. If it can
be avoided and you figure out how to do it, I'd love to patch
something. :)
Otis
--- "Kevin A. Burton" <[EMAIL PROTECTED]> wrote:
> Otis Gospodnetic wrote:
>
> >It would be interesting
Hi Ansi,
If you want the print version, I would guess you could order it from
the publisher (http://www.manning.com/hatcher2) or from Amazon and they
will ship it to you in China. The electronic version (a PDF file) is
also available from the above URL.
I'll ask Manning Publications and see whet
There Kevin, that's what I was referring to, the .tii file.
Otis
--- Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Saturday 22 January 2005 01:39, Kevin A. Burton wrote:
> > Kevin A. Burton wrote:
> >
> > > We have one large index right now... its about 60G ... When I
> open it
> > > the Java V
It would be interesting to know _what_exactly_ uses your memory.
Running it under a profiler should tell you that.
The only thing that comes to mind is... can't remember the details now,
but when the index is opened, I believe every 128th term is read into
memory. This, I believe, helps with inde
If you are hosting the code somewhere (e.g. your site, SF, java.net,
etc.), we should link to them from one of the Lucene pages where we
link to related external tools, apps, and such.
Otis
--- "Safarnejad, Ali (AFIS)" <[EMAIL PROTECTED]> wrote:
> I've written a Chinese Analyzer for Lucene that
Free as in orange juice.
Otis
--- "Ranjan K. Baisak" <[EMAIL PROTECTED]> wrote:
> Otis,
> Thanks for your help. Is nutch a freeware tool?
>
> regards,
> Ranjan
> --- Otis Gospodnetic <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Ranjan,
> >
Hello Ashley,
You can read/search while modifying the index, but you have to ensure
only one thread or only one process is modifying an index at any given
time. Both IndexReader and IndexWriter can be used to modify an index.
The former to delete Documents and the latter to add them. You have
t
Hi Ranjan,
It sounds like you should look at and use Nutch:
http://www.nutch.org
Otis
--- "Ranjan K. Baisak" <[EMAIL PROTECTED]> wrote:
> I am planning to move to Lucene but not have much
> knowledge on the same. The search engine which I had
> developed is searching some extranet URLs e.g.
This:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.TooManyClauses.html
?
You can control that limit via
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/BooleanQuery.html#maxClauseCount
Otis
--- Jerry Jalenak <[EMAIL PROTECTED]> wrote:
> OK.
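A minimal sketch of raising the limit before running the offending query (4096 is just an example; in the 1.4 API this is the public static field linked above, later versions expose it via BooleanQuery.setMaxClauseCount()):

import org.apache.lucene.search.BooleanQuery;

public class RaiseClauseLimit {
    public static void main(String[] args) {
        // The default is 1024; queries that expand into more clauses
        // (e.g. broad prefix, wildcard, or range queries) throw
        // BooleanQuery.TooManyClauses.
        BooleanQuery.maxClauseCount = 4096;
        System.out.println("max clauses: " + BooleanQuery.maxClauseCount);
    }
}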
Hi Kevin,
Stemming is an optional operation and is done in the analysis step.
Lucene comes with a Porter stemmer and a Filter that you can use in an
Analyzer:
./src/java/org/apache/lucene/analysis/PorterStemFilter.java
./src/java/org/apache/lucene/analysis/PorterStemmer.java
You can find more a
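A hedged sketch of wiring PorterStemFilter into a custom Analyzer (generic example, not the exact code shipped with Lucene or the book):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;

public class PorterAnalyzer extends Analyzer {
    // Split on non-letters and lowercase, then stem each token.
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new PorterStemFilter(new LowerCaseTokenizer(reader));
    }
}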
No, you can't add documents to an index once you close the IndexWriter.
You can re-open the IndexWriter and add more documents, of course.
Otis
--- Oscar Picasso <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Is it safe to add documents to an IndexWriter that has been closed?
>
> From what I have seen,
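A minimal sketch of closing the writer and re-opening it to keep adding (paths and document contents are illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ReopenWriter {
    public static void main(String[] args) throws Exception {
        String indexDir = "/path/to/index";

        // First session: create the index and add a document.
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Text("contents", "first batch"));
        writer.addDocument(doc);
        writer.close();   // no more adds through this closed instance

        // Later: re-open the same index (create=false) and keep adding.
        writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        Document more = new Document();
        more.add(Field.Text("contents", "second batch"));
        writer.addDocument(more);
        writer.close();
    }
}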
The Wiki has some info about Lucene 2.0, but that is all there is about
2.0.
Regarding transactions - have you tried DbDirectory? I believe that
will provide XA support and it won't require Lucene changes.
Otis
--- John Wang <[EMAIL PROTECTED]> wrote:
> Hi:
>
>When is lucene 2.0 schedule
Hello Chetan,
The code that comes with the Lucene book contains a little framework
for indexing rich-text documents. It sounds like you may be able to
use it as-is, extending it with a parser for Excel files, which we
didn't include in the code (should we include it in the next edition?).
Wh
Going for the segments file like that is not a recommended practice, or
at least not something I'd recommend. The 'segments' file is really
something that a caller should not know anything about. One day
Lucene may choose to rename the segments file or some such, and the
code that uses this trick wi
We've used PDFBox for Lucene in Action:
http://www.lucenebook.com/search?query=PDFBox
If you download the source code for the book, you will get ready-to-use
code for parsing and indexing PDF files, as well as Word, XML, and RTF.
Otis
--- Vlachogiannis Evangelos <[EMAIL PROTECTED]> wrote:
> H
Hello,
Try:
String searchWrd = "kid \"toy\" OR kid \"ball\"";
You'll have to use a WhitespaceAnalyzer with that, though, or a custom
Analyzer that doesn't remove the escape character (\).
Otis
--- Karthik N S <[EMAIL PROTECTED]> wrote:
>
>
> Hi Guys.
>
> Apologies.
>
>
>
>
Eh, that exactly :) When I read my emails in reverse order
--- Chris Lamprecht <[EMAIL PROTECTED]> wrote:
> What about a shutdown hook?
>
> Runtime.getRuntime().addShutdownHook(new Thread() {
> public void run() { /* whatever */ }
> });
>
> see also
> http://www.onjava.com/pub/a/onja
I didn't pay full attention to this thread, but it sounds like somebody
may be interested in RuntimeShutdownHook (or some similar name) as a
place to try to release the locks.
Otis
--- Joseph Ottinger <[EMAIL PROTECTED]> wrote:
> On Tue, 11 Jan 2005, Doug Cutting wrote:
>
> > Joseph Ottinger wr
The best place to look is:
./src/java/org/apache/lucene/analysis/standard/StandardTokenizer.jj
You can see it at:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/src/java/org/apache/lucene/analysis/standard/
Otis
--- Shawn Konopinsky <[EMAIL PROTECTED]> wrote:
> Hey There,
>
> Wondering wher
Hello,
1) The FAQ has been moved to the Wiki, so feel free to stick it in
there.
2) http://www.lucenebook.com/search?query=unlock
Otis
--- Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : I'm getting
> : Lock obtain timed out.
> :
> : I was developing and forgot to close the writer. How do I
Use one index; working with a single index is simpler. Also, once you
pull a Document from Hits object, all Fields are read off of the disk.
There was some discussion about selective Field reading about a week
ago, check the list archives. Also keep in mind Field compression is
now possible (onl
Hello Mariella,
Check out the first hit here:
http://www.lucenebook.com/search?query=sort+tokenize
Otis
--
http://www.simpy.com - save, tag, index, search, and share your links
--- Mariella Di Giacomo <[EMAIL PROTECTED]> wrote:
> Hi ALL,
>
>
> I am using a java class to query an index and ret
Hello,
If you search for India OR Test, you will find both, if you use AND,
you will find none. Lucene can search any text, not just files. It
sounds like you are using Lucene's demo as a real application (not a
good practice). I suggest you take a look at the Resources page on the
Lucene Wiki
Nutch (nutch.org) has a pretty sophisticated infrastructure for
distributed searching, but it doesn't use RemoteSearcher.
Otis
--- Yura Smolsky <[EMAIL PROTECTED]> wrote:
> Hello.
>
> Does anyone know application which based on RemoteSearcher to
> distribute index on many servers?
>
> Yura Smo
The book is $44.95 USD - it's printed on the back cover. Amazon had
the correct price (minus their discount) until recently. They are just
very slow with their site/book info updates, but I'm sure they'll fix
it eventually.
Otis
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Jan 6, 2005,
Hi John,
There is no API for this, but I recall somebody talking about adding
support for this a few months back. I even think that somebody might
have contributed a patch for this. I am not certain about this, but
check the patch queue (link on Lucene site). If there is a patch
there, even if
Hello Bill,
"I feel your pain" ;)
But seriously, there was a QueryParser mess-up in the recent minor
releases. I think this is the first time we've messed up backward
compatibility in the last ~4 years. The Lucene public API is
very 'narrow', and typically very stable. What we did wi
Any index-modifying operation needs to be serialized. Searching is
read-only and can be done in parallel with anything else. See
http://www.lucenebook.com/search?query=concurrent for some hints.
Otis
--- Alex Kiselevski <[EMAIL PROTECTED]> wrote:
>
> Concerning the question about simultaneous
That's the correct place to look and it includes code samples.
Yes, it's a Jar file that you add to the CLASSPATH and use ... hm,
normally programmatically, yes :).
Otis
--- Hetan Shah <[EMAIL PROTECTED]> wrote:
> Has any one used NekoHTML ? If so how do I use it. Is it a stand
> alone
> ja
Hello,
--- mahaveer jain <[EMAIL PROTECTED]> wrote:
> I am looking out to implement sorting in my lucene application. This
> is what my code look like.
>
> I am using StandardAnalyzer() analyzer.
>
> Query query = QueryParser.parse(keyword, "contents", analyzer);
>
> Sort sortCol = new Sort
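The quoted snippet is cut off above; a hedged sketch of how a Sort is typically passed to the searcher (the field name and sort type are assumptions):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SortedSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = QueryParser.parse("lucene", "contents",
                                        new StandardAnalyzer());
        // Sorting requires an indexed, untokenized field; STRING is an
        // assumption here, use INT/FLOAT to match how it was indexed.
        Sort sortCol = new Sort(new SortField("title", SortField.STRING));
        Hits hits = searcher.search(query, sortCol);
        System.out.println(hits.length() + " hits");
        searcher.close();
    }
}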
Replying to lucene-user list.
Yes, term1 AND term2, as well as term1 OR term2 should yield the same
hits.
Otis
--- ABDOU Samir <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Does a query such as give the same hits as for the
> query ?
> Google seems to differentiate the two requests.
>
> Thanks.
Correct.
The self-maintenance you are referring to is Lucene's periodic segment
merging. The frequency of that can be controlled through IndexWriter's
mergeFactor.
Otis
--- aurora <[EMAIL PROTECTED]> wrote:
> > Are not optimized indices causing you any problems (e.g. slow
> searches,
> > high n
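A hedged sketch of adjusting the merge frequency (the value is illustrative; in the 1.4 API mergeFactor is a public field, later versions use setMergeFactor()):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class MergeTuning {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                                             new StandardAnalyzer(), true);
        // Higher values mean fewer, larger merges: faster bulk indexing,
        // but more segment files sitting on disk between merges.
        writer.mergeFactor = 50;
        // ... add documents ...
        writer.close();
    }
}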
WhitespaceAnalyzer will let you have it. It just breaks the input on
whitespace.
Otis
--- Jim <[EMAIL PROTECTED]> wrote:
> I've seen some discussion on this and the answer seems to be "write
> your
> own". Hasn't someone already done that by now that would share? I
> really have to be able to i
Most definitely Jetty. I can't believe you're using Tomcat for Rojo!
;)
Otis
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Wrong list.
>
> Though perhaps you should be using Jetty ;)
>
> Erik
>
>
> On Dec 23, 2004, at 4:17 PM, Kevin A. Burton wrote:
>
> > What in the world is up with
I _think_ you'd be better off doing it all at once, but I wouldn't
trust myself on this and would instead construct a small 3-index set
and test, looking at a) maximal disk usage, b) time, and c) RAM usage.
:)
Otis
--- Ryan Aslett <[EMAIL PROTECTED]> wrote:
>
> Hi there, Im about to embark on
For simpy.com I store the full text of web pages in Lucene, in order to
provide full-text web searches. Nutch (nutch.org) does the same. You
can set the maximal number of tokens you want indexed via IndexWriter.
You can also compress fields in the newest version of Lucene (or maybe
just the one
I suspect Martijn really wants that snippet dynamically generated, with
KWIC, as on the lucenebook.com screen shot. Thus, he can't generate
and store the snippet at index time, and has to construct it at search
time.
Otis
--- Mike Snare <[EMAIL PROTECTED]> wrote:
> > But for the other issue on
If you are not tied to Java, see 'unac' at http://www.senga.org/.
It's old, but if nothing else you could see how it works and rewrite it
in Java. And if you can, you can donate it to Lucene Sandbox.
Otis
--- Peter Pimley <[EMAIL PROTECTED]> wrote:
>
> Hi everyone,
>
> The Question:
> In Java
Martijn, have you seen the Highlighter in the Lucene Sandbox?
If you've stored your text in the Lucene index, there is no need to go
back to DB to pull out the blog, parse it, and highlight it - the
Highlighter in the Sandbox will do this for you.
Otis
--- "M. Smit" <[EMAIL PROTECTED]> wrote:
>
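A hedged sketch of the Sandbox Highlighter usage (the highlight classes live outside the core jar; field name and text are illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class SnippetDemo {
    public static void main(String[] args) throws Exception {
        Query query = QueryParser.parse("lucene highlighting", "contents",
                                        new StandardAnalyzer());
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        // The text can come straight from a stored Lucene field; no
        // round-trip to the database is needed.
        String text = "Stored blog text that mentions Lucene highlighting ...";
        String snippet = highlighter.getBestFragment(new StandardAnalyzer(),
                                                     "contents", text);
        System.out.println(snippet);
    }
}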
Hello,
I think some of these questions may be answered in the jGuru FAQ.
> So my question is would it be an overkill to optimize everyday?
Only if lots of documents are being added/deleted, and you end up with
a lot of index segments.
> Is
> there
> any guideline on how often to optimize? E
You don't need to optimize to simulate an incremental update. You just
have to re-open your index with the IndexSearcher to see newly added
documents.
Otis
--- aurora <[EMAIL PROTECTED]> wrote:
> Thanks for the heads up. I'm using Lucene 1.4.2.
>
> I tried to do optimize() again but it has no
Another possibility is that you are using an older version of Lucene,
which was known to have a bug with similar symptoms. Get the latest
version of Lucene.
You shouldn't really have multiple .cfs files after optimizing your
index. Also, optimize only at the end, if you care about indexing
speed
Alex, I think you want this:
+city:London +city:Amsterdam +address:1_street +address:2_street
Otis
--- Alex Kiselevski <[EMAIL PROTECTED]> wrote:
>
> Thanks Morus
> So if I understand right
> If the seqond query is :
> +city(London) +city(Amsterdam) +address(1_street) +address(2_street)
>
>
When searching for phrases, what's important is the position of each
token/word extracted by the Analyzer.
WhitespaceAnalyzer/LowerCaseFilter don't do anything with the
positional information. There is nothing else in your Analyzer?
In any case, the following should help you see what your Analyz
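The reply is cut off above; a minimal sketch of one way to dump what an Analyzer produces, including position increments (purely illustrative):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class ShowTokens {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new WhitespaceAnalyzer();
        TokenStream stream = analyzer.tokenStream("contents",
                new StringReader("The quick brown fox"));
        Token token;
        // Phrase queries depend on these positions lining up with the
        // positions recorded at indexing time.
        while ((token = stream.next()) != null) {
            System.out.println(token.termText()
                    + " (+" + token.getPositionIncrement() + ")");
        }
        stream.close();
    }
}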
The only place where you have to specify that you are using the
compound index format is on the IndexWriter instance. Nothing needs to be
done at search time on IndexSearcher.
Otis
--- Hetan Shah <[EMAIL PROTECTED]> wrote:
> Thanks Chuck,
>
> I now understand why I see only one file. Another quest
Hello,
As Erik already said - that Analyzer is really there to get people
going quickly and as a 'does pretty good' Analyzer. There is no
Analyzer that will work for everyone, and Analyzers are meant to be
custom-made. It looks like you already got that figured out and have
your own Analyzer.
O
The exact disk space usage depends on the number of fields in the index
and on how many of them store the original text. You should also keep
in mind that the call to IndexWriter's optimize() will result in your
index directory size doubling while the optimization is in progress, so
if you want to
There is one case that I can think of where this 'constant' scoring
would be useful, and I think Chuck already mentioned this 1-2 months
ago. For instance, having such scores would allow one to create alert
applications where queries run by some scheduler would trigger an alert
whenever the score i
Note that this really includes some extra steps.
You don't need a temp index. Add everything to a single index using a
single IndexWriter instance. No need to call addIndexes nor optimize
until the end. Adding Documents to an index takes a constant amount of
time, regardless of the index size, b
ry
> 20,000 documents to flush memory structures to disk.
> There doesn't seem to be an equivalent in Lucene.
>
> -- Homam
>
>
>
>
>
>
> --- Otis Gospodnetic <[EMAIL PROTECTED]>
> wrote:
>
> > Hello,
> >
> > There ar
Hello,
There are a few things you can do:
1) Don't just pull all rows from the DB at once. Do that in batches.
2) If you can get a Reader from your SqlDataReader, consider this:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/document/Field.html#Text(java.lang.String,%20java.io.Read
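A hedged sketch of the Reader-valued field that the truncated link above points at (field names and the source of the Reader are made up):

import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ReaderField {
    public static void main(String[] args) {
        // In practice the Reader would wrap the database column stream
        // (e.g. a CLOB) rather than an in-memory string.
        Reader contents = new StringReader("large row text ...");
        Document doc = new Document();
        // Field.Text(String, Reader): indexed and tokenized but not stored,
        // so the whole value never has to be materialized as one String.
        doc.add(Field.Text("contents", contents));
        doc.add(Field.Keyword("id", "row-42"));
    }
}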
Well, one could always partition an index, distribute pieces of it
horizontally across multiple 'search servers' and use the built-in
RMI-based and Parallel search feature. Nutch uses something similar
for search scaling.
Otis
--- Monsur Hossain <[EMAIL PROTECTED]> wrote:
> > My concern is tha
You can also see 'Books like this' example from here
https://secure.manning.com/catalog/view.php?book=hatcher2&item=source
Otis
--- Bruce Ritchie <[EMAIL PROTECTED]> wrote:
> Christoph,
>
> I'm not entirely certain if this is what you want, but a while back
> David Spencer did code up a 'More L
You can see Flickr-like tag (lookup) system at my Simpy site (
http://www.simpy.com ). It uses Lucene as the backend for lookups, but
still uses a RDBMS as the primary storage.
I find that keeping both the RDBMS and the Lucene indices is a bit of a
pain and error prone, so a _thin_ storage layer with sim
is, how would I go about submitting a patch?
>
> thanks
>
> -John
>
>
> On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
> > Hello John,
> >
> > I believe you didn't get any replies to this. What you are
>
Hello John,
I believe you didn't get any replies to this. What you are describing
cannot be done using the public API, but maaay (no source code on this
machine, so I can't double-check that) be doable if you use some of the
'internal' methods.
I don't have the need for this, but others might, so
into Lucene index directories and removes
* unwanted files. In its more radical mode, this tool can be used to
* remove all non-Lucene index files from a directory. The other
* option is to remove unused Lucene segment files, should the index
* directory get polluted.
*
* TODO: this tool
Hello,
This is probably due to some bad HTML. The application you are using
is just a demo, and uses a JavaCC-based HTML parser, which may not be
resilient to invalid HTML. For Lucene in Action we developed a little
extensible indexing framework, and for HTML indexing we used 2 tools to
handle H
Guru (I thought my first name was OK until now),
Have you tried using boosts for that? You can boost individual
Document Fields when indexing, and/or you can boost individual
Documents, thus giving some more and some less 'weight', which will
have an effect on the final score.
Otis
--- Gu
Ying,
You should follow this finally block advice below. In addition, I
think you can just close the reader, and it will close the underlying
stream (I'm not sure about that, double-check it).
You are not running out of file handles, though. Your JVM is running
out of memory. You can play with
Hello Garrett,
Share some code, it will be easier for others to help you that way.
Obviously, this would be a huge bug if the problem were within Lucene.
Otis
--- Garrett Heaver <[EMAIL PROTECTED]> wrote:
> Can anyone please explain to my why maxDoc returns 0 when Luke shows
> 239,473
> docume
Hello,
You can use BooleanQuery for that.
Otis
--- Ravi <[EMAIL PROTECTED]> wrote:
>
> Hi
> How do you get all documents in lucene where a particular field
> value
> is in a given list of values (like SQL IN). What kind of Query class
> should I use?
>
> Thanks in advance.
> Ravi.
>
>
> -
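A minimal sketch of the SQL-IN style lookup with BooleanQuery (field name and values are illustrative; this uses the 1.4-era add(query, required, prohibited) signature):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class InListQuery {
    public static void main(String[] args) {
        String[] values = { "NY", "CA", "TX" };
        BooleanQuery in = new BooleanQuery();
        for (int i = 0; i < values.length; i++) {
            // Each clause is optional (not required, not prohibited), so a
            // document matches if its "state" field equals any of the values.
            in.add(new TermQuery(new Term("state", values[i])), false, false);
        }
        System.out.println(in);
    }
}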
Both options are good, and which one you choose depends on which one
you feel more comfortable with, I'd say.
The searcher won't see duplicates or missing documents until it is
reopened. So use a separate IndexSearcher for searching, and
reinstantiate it only after you are completely done with ei
if
> the
> field is not there, correct?
> But then is there a point putting an empty value in it, if an
> application will never search for empty values?
>
>
> thanks
>
> -pedja
>
>
> Otis Gospodnetic said the following on 12/8/2004 1:31 AM:
>
>
Leading wildcard character (*) is not allowed if you use QueryParser
that comes with Lucene. Reason: performance. See many discussions
about this on the lucene-user mailing list. Also see the search syntax
document on the Lucene site. What other characters are you having
trouble with?
Otis
--- Sa
There is no need to reindex. However, I also don't quite get what the
problem is :)
Otis
--- Santosh <[EMAIL PROTECTED]> wrote:
> hi,
>
> when I restart the tomcat . the Index is getting corrupted. If I take
> the backup of Index and then restarting tomcat. the Index is not
> working properly.