After some research on addShutdownHook, it seems that Eclipse terminates the
program rather brutally, giving neither finalize() nor the shutdown hook any chance to run.
This is a known bug in Eclipse.
The application I'm writing is a server that keeps a reader and a writer open
at all times. I realized last n
Get the code from SVN to pick up some demo fixes made after 2.0.
Otis
- Original Message
From: John john <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, July 26, 2006 7:08:59 PM
Subject: Web search demo does not work
Hello,
The web search demo does not work in lucene 2.0 because it seems that the file results.jsp uses old methods.
Hello,
The web search demo does not work in lucene 2.0 because it seems that the
file results.jsp uses old methods.
Is there a patch or something to fix this? Or a file that works with lucene
2.0? I'd like to run some tests, and I'm not familiar with JSP.
Thanks
A document per row seems correct to me too.
If search is by msisdn / messageid, and if, as it seems, these are
keywords, not free text that needs to be analyzed, they both should have
Index.UN_TOKENIZED. Also, since no search is to be done by the line content,
the line should have Index.NO.
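A sketch of that scheme in Lucene 2.0 terms (the field values here are placeholders):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // keywords: index as single, unanalyzed terms so exact matches work
    doc.add(new Field("msisdn", msisdn, Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("messageid", messageId, Field.Store.YES, Field.Index.UN_TOKENIZED));
    // line content: stored for retrieval but never searched, so not indexed
    doc.add(new Field("line", lineContent, Field.Store.YES, Field.Index.NO));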
Judging by each query's toString() output below, the boolean query would not work correctly:
qtxt: a foo
[1] Multi Field Query (OR) : (title:a body:a) (title:foo body:foo)
[2] Multi Field Query (AND): +(title:a body:a) +(title:foo body:foo)
[3] Boolean Query : (title:a title:foo) (body:a body:foo)
--> B
Well, I *suppose* you could get the bitset from the pre-existing filter,
copy it to the bitset for your new filter, and play with the bits at the
end. I'm not sure how you get rid of your original filter if you use
CachingWrapperFilter though.
But as "the guys" have pointed out in oth
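Roughly this kind of thing, I mean (untested sketch; the class and the extra-docids idea are mine):

    import java.io.IOException;
    import java.util.BitSet;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.Filter;

    public class AugmentedFilter extends Filter {
        private final Filter base;        // the pre-existing (possibly cached) filter
        private final int[] extraDocIds;  // docids you already know should pass

        public AugmentedFilter(Filter base, int[] extraDocIds) {
            this.base = base;
            this.extraDocIds = extraDocIds;
        }

        public BitSet bits(IndexReader reader) throws IOException {
            // clone so the original filter's cached bits stay untouched
            BitSet bits = (BitSet) base.bits(reader).clone();
            for (int i = 0; i < extraDocIds.length; i++) {
                bits.set(extraDocIds[i]);
            }
            return bits;
        }
    }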
As Miles said, use the DateTools (Lucene) class with DAY resolution.
That'll give you a yyyyMMdd format, which won't blow up your query with a
"TooManyClauses" exception...
Remember that Lucene deals with strings, so you want to store things in an
easily manipulated string format, often one that'
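For instance (sketch; the field name and variables are examples):

    import java.util.Date;
    import org.apache.lucene.document.DateTools;
    import org.apache.lucene.document.Field;

    // day resolution gives e.g. "20060726": range queries over days stay small
    String day = DateTools.dateToString(new Date(millis), DateTools.Resolution.DAY);
    doc.add(new Field("date", day, Field.Store.YES, Field.Index.UN_TOKENIZED));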
I was wondering if there was a nice way to add documents to a cached filter
'manually' as it were.
The reason would be to avoid a complete refresh of the filter, if you
already knew the docids of the extra documents to add.
An example would be if I had a filter based on datetime, which contained
:) I probably did ask...my mind is turning into mush! Hehe
Ok...let me write me an email analyzer.
Thanks!
Michael
Otis Gospodnetic wrote:
You most certainly want to index the whole token, and likely portions of it
(didn't you already ask this a few weeks ago?).
You will want to write your own Analyzer + Tokenizer that's email-address-format-aware.
You most certainly want to index the whole token, and likely portions of it
(didn't you already ask this a few weeks ago?).
You will want to write your own Analyzer + Tokenizer that's
email-address-format-aware and does things like:
emit the whole token
emit the username portion
emit the fully qualified domain
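Something along these lines, perhaps (untested sketch using the Lucene 2.0 TokenStream API; the splitting rules are simplified):

    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;

    public class EmailAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, final Reader reader) {
            return new TokenStream() {
                private String[] parts; // whole address, username, domain
                private int index = 0;

                public Token next() throws IOException {
                    if (parts == null) {
                        StringBuffer sb = new StringBuffer();
                        int c;
                        while ((c = reader.read()) != -1) {
                            sb.append((char) c);
                        }
                        String address = sb.toString().trim().toLowerCase();
                        int at = address.indexOf('@');
                        parts = (at < 0)
                            ? new String[] { address }
                            : new String[] { address,                    // whole token
                                             address.substring(0, at),   // username
                                             address.substring(at + 1) };// domain
                    }
                    if (index >= parts.length) {
                        return null; // end of stream
                    }
                    String text = parts[index++];
                    return new Token(text, 0, text.length());
                }
            };
        }
    }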
karl wettin wrote:
On Wed, 2006-07-26 at 16:33 -0400, Michael J. Prichard wrote:
If I want to search an email address (i.e. [EMAIL PROTECTED]) do I need to
Tokenize that field?
Do you want to match on the full address only, or on parts too?
If A, don't tokenize.
If B, tokenize. And
On Wed, 2006-07-26 at 16:33 -0400, Michael J. Prichard wrote:
> If I want to search an email address (i.e. [EMAIL PROTECTED]) do I need to
> Tokenize that field?
Do you want to match on the full address only, or on parts too?
If A, don't tokenize.
If B, tokenize. And write an analyzer that wil
If I want to search an email address (i.e. [EMAIL PROTECTED]) do I need to
Tokenize that field?
doc.add(new Field("from", (String) itemContent.get("from"),
Field.Store.YES, Field.Index.TOKENIZED));
-OR-
doc.add(new Field("from", (String) itemContent.get("from"),
Field.Store.YES, Field.Index.UN_TOKENIZED));
Hi all,
Is there a way to return, along with a hit, the part of the query that matched
in the corresponding document? I need to return combinations of
documents that together contain the query or a relatively large part of
the query.
Thanks,
Seeta
--
Thanks, Otis. I think the SynonymAnalyzer is the way to go, injecting the
synonyms while removing the stop words.
Andrew
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 26, 2006 3:19 PM
To: java-user@lucene.apache.org
Subject: Re: Newbie synon
Michael J. Prichard wrote:
Miles Barr wrote:
Michael J. Prichard wrote:
I am working on indexing emails and have stored the data as
milliseconds. I was thinking of using a filter w/ my search that
would only return the email in that date range. I am currently
indexing as follows:
doc.a
Thanks for the suggestions!
I am still talking about indexing. The 'split the files' implementation was
just an experiment, as I wasn't able to get the kind of performance I needed
from making rows into Documents. Now back to using rows as Documents:
As Erick and Jeremy suggested, the MaxBufferedDocs
Hi Andrew,
There is nothing built into Lucene for synonyms, but you can grab the code from
Lucene in Action to see how they can be handled (plus:
http://www.lucenebook.com/search?query=synonyms for some context)
Otis
- Original Message
From: "Lee, Andrew J (CA - Toronto)" <[EMAIL PROTE
When I close my application containing index writers the
lock files are left in the temp directory causing a "Lock obtain
timed out" error upon the next restart.
My guess is that you keep a writer open even though there is no activity
involving adding new documents. Unless I have a massive neve
John Haxby wrote:
Suba Suresh wrote:
Anyone know of good free email libraries I can use for lucene
indexing for Windows Outlook Express and Unix emails??
javamail. Not sure how you get hold of the messages from Outlook
Express, but getting hold of the MIME message in most Unix-based
message
Ok. I will try it. I am a little stupid. When you said to go down the POP or
IMAP route, what did you mean? Is that path for Unix/Linux alone?
thanks,
suba suresh.
John Haxby wrote:
Suba Suresh wrote:
Anyone know of good free email libraries I can use for lucene indexing
for Windows Outlook Expre
Suba Suresh wrote:
Anyone know of good free email libraries I can use for lucene indexing
for Windows Outlook Express and Unix emails??
javamail. Not sure how you get hold of the messages from Outlook
Express, but getting hold of the MIME message in most Unix-based message
stores is relatively easy.
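A sketch of the JavaMail side (the path and field use are examples):

    import java.io.FileInputStream;
    import java.util.Properties;
    import javax.mail.Session;
    import javax.mail.internet.MimeMessage;

    // parse one RFC 822 message file, then feed the pieces into Lucene fields
    Session session = Session.getDefaultInstance(new Properties());
    MimeMessage msg = new MimeMessage(session,
            new FileInputStream("/var/mail/archive/message.eml"));
    String subject = msg.getSubject();
    String from = msg.getFrom()[0].toString();
    // msg.getContent() walks the MIME parts for the body text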
nope, haven't had to worry about it yet ...
Erick
It feels to me like your major problem might be file IO with all those
files. There's no need to split the files up first and then index the files.
Just read through the log and index each row. The code fragment you posted
should allow you to get the line back from the "line" field of each
docum
I'm not using a Hits object. I'm using a HitCollector. I was just
curious about whether Field settings could affect search
performance. Any ideas?
Ryan
On Wed, 2006-07-26 at 16:24 +0200, Björn Ekengren wrote:
> When I close my application containing index writers the
> lock files are left in the temp directory causing a "Lock obtain
> timed out" error upon the next restart.
My guess is that you keep a writer open even though there is no activity
involving adding new documents.
two ideas:
1> store a second field that contains the time resolution you need, and sort
by that. You can still search (quickly) by the day-resolution field.
2> If you KNOW that you are indexing the e-mails in time-order, then sorting
by doc_id will preserve the time ordering.
Erick
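A sketch of idea 1> (field names are mine; doc, sent, searcher and query assumed):

    import org.apache.lucene.document.DateTools;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.Sort;

    // at index time: coarse field for searching, fine field for sorting
    doc.add(new Field("day", DateTools.dateToString(sent, DateTools.Resolution.DAY),
                      Field.Store.NO, Field.Index.UN_TOKENIZED));
    doc.add(new Field("sent", DateTools.dateToString(sent, DateTools.Resolution.MINUTE),
                      Field.Store.NO, Field.Index.UN_TOKENIZED));

    // at search time: query the "day" field, order by the finer "sent" field
    Hits hits = searcher.search(query, new Sort("sent"));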
Sorry for my late response. It took us some time to run it again. We
increased the memory heap to 1G as you suggested and it works. The
indexer is not crashing. (We are running into some other problem with a
PowerPoint file. That is for another email.)
The code change with
PDFTextStripper.wri
Are you using a Hits object to iterate over the results? If so, you are
re-executing the query every 100 docs or so under the covers, and if there
are many results, this is very bad.
If this is the case, you want to use a TopDocs or HitCollector to iterate
through the entire result set.
Of cours
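For example (sketch; searcher and query assumed to exist):

    import org.apache.lucene.search.HitCollector;

    // visits every match exactly once; nothing is re-executed behind the scenes
    searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            // called once per matching document, in docid (not score) order
        }
    });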
Anyone know of good free email libraries I can use for lucene indexing
for Windows Outlook Express and Unix emails??
suba suresh.
Just my 0.02, but I think you are correct in creating one document per line
in your database in order to achieve your desired result.
In my experience, there are a few things that you might do differently :
The MaxBufferedDocs parameter has a huge impact on indexing speed. The
default of 10 is v
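Concretely, something like this (the path and value are only examples):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), true);
    writer.setMaxBufferedDocs(1000); // buffer many more docs in RAM before each flush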
Thanks all for the responses. I am very pleasantly surprised at the helpful
responses that I am getting.
Okay, I think I still haven't understood Lucene well. I am sure that I am
not solving the problem the right way. So I am explaining the problem at a
very high level here .. please tell me what
Michael J. Prichard wrote:
I guess the more I think about it, I don't really care about the
minutes in the initial search. All that matters is the date (i.e.
2006-07-25). The only thing I would need the time for would be
sorting, so I need to have that too. Ideas?
Store as much detail as you
Currently, I have field "DOC" which is indexed, but not stored and
not compressed. This is the field that users query. I also have a
field "SYM" which is indexed and stored, but not compressed. For
every document returned in a query, I need its symbol. Can field
types (indexed vs. not i
Miles Barr wrote:
Michael J. Prichard wrote:
I am working on indexing emails and have stored the data as
milliseconds. I was thinking of using a filter w/ my search that
would only return the email in that date range. I am currently
indexing as follows:
doc.add(new Field("date", (String)
Ok, this might have been answered somewhere, but I can't find it so here goes:
When I close my application containing index writers, the lock files are left in
the temp directory, causing a "Lock obtain timed out" error upon the next
restart. It works of course if I remove the locks manually in between.
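For what it's worth, that manual cleanup can also be done in code (sketch; path is an example, and this is only safe when no other process really holds the index):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    Directory dir = FSDirectory.getDirectory("/data/index", false);
    if (IndexReader.isLocked(dir)) {
        IndexReader.unlock(dir); // forcibly clears a stale lock from a crash
    }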
Michael J. Prichard wrote:
I am working on indexing emails and have stored the data as
milliseconds. I was thinking of using a filter w/ my search that
would only return the email in that date range. I am currently
indexing as follows:
doc.add(new Field("date", (String) itemContent.get("da
I am working on indexing emails and have stored the data as
milliseconds. I was thinking of using a filter w/ my search that would
only return the email in that date range. I am currently indexing as
follows:
doc.add(new Field("date", (String) itemContent.get("date").toString(),
Field.Store
I think you're back to Karl's suggestion. Implement a HitCollector and
ignore all hits on a group ID after the first one. You even get the most
relevant article in the group that way ...
Best
Erick
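One way to sketch that (the stored "groupId" field is my assumption; note collect() sees hits in docid order, not score order, so track the best score per group):

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    static Map bestScorePerGroup(final IndexSearcher searcher, Query query)
            throws IOException {
        final Map best = new HashMap(); // groupId -> Float(best score seen)
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                try {
                    // fetching the stored doc per hit is slow; fine as a sketch
                    String group = searcher.doc(doc).get("groupId");
                    Float seen = (Float) best.get(group);
                    if (seen == null || score > seen.floatValue()) {
                        best.put(group, new Float(score));
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e.toString());
                }
            }
        });
        return best;
    }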
headhunter wrote:
I guess the recommended way to implement paging of results is to do your own
query-results caching, right? Or does lucene also do this for me?
The other guys have covered caching of results in a general way, so I
won't go into that.
For a search application I've written I
Hi,
I am working on faceted navigation. This is nothing new but I am
anticipating an index that changes very frequently (every couple of
seconds). After the index has been updated, I need to cache the bit sets
of the facet values so I can do counting during searches later on.
Because I need to get
Unfortunately this is not that easy, because I must be able to retrieve only
one article, and if I index all the content in one document then the whole
document will be retrieved instead of the single article.
Chris Hostetter <[EMAIL PROTECTED]> wrote:
: Then if I search for a word which is pr
Chris Hostetter wrote:
>
> [..]
>
> : In the first case: there is no unnecessary work. Lucene must look at
> : every matching docId in order to determine which docs should be the
> : first 10.
> [..]
>
Yes, you are right. Haven't thought of that :)
'Bout the second thing: You're right too.
The only way you might get the performance you want is to have multiple
IndexWriters writing to different indexes and then add them all at the end.
You would obviously have to handle the multi threading and distribution
of the parts of the log to each writer.
Mike
www.ardentia.com the home of NetSearch
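The merge step would look roughly like this (paths are examples):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    Directory[] parts = {
        FSDirectory.getDirectory("/data/part1", false),
        FSDirectory.getDirectory("/data/part2", false)
    };
    IndexWriter merged = new IndexWriter("/data/merged", new StandardAnalyzer(), true);
    merged.addIndexes(parts); // merges each part index into the final one
    merged.close();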
: I'm still a little worried about doing unnecessary work - this is totally
: different from what I know from working with a DBMS.
What are you describing as "unnecessary work": examining every document
even though you only care about the first 10, or re-executing the search
when you want results 11
Chris,
Thanks for this. I will have to do it the longhand way; we are trying
to create "search marts" containing a smaller index from a much larger
one, so cloning and deleting will not work.
Thanks
Mike
www.ardentia.com the home of NetSearch
-Original Message-
From: Chris Hostetter
Hello Daniel,
thank you for your answer.
I'm still a little worried about doing unnecessary work - this is totally
different from what I know from working with a DBMS.
Johannes
On Wednesday 26 July 2006 08:24, headhunter wrote:
> Is it recommended to do the search again - discarding the uninteresting
> values - because lucene caches the results, or just because lucene is so
> damn fast?
Lucene is fast enough in 99% of the cases. Caching is only done by the
operating system.