Michael Celona wrote:
My index is changing in real time constantly... in this case I guess this
will not work for me any suggestions...
using a singleton pattern for your index searcher makes sense anyway
... I don't think that you change
the index after each search. the computing effor
Karl Koch wrote:
When I switch to Java 1.2, I can also not run it. Also I cannot index
anything. I have no idea why...
Can somebody help me?
I think you are a pioneer in this domain :) . I'm not very familiar with
the lucene source code, but I think it uses the
advantages of java 1.3 and 1.4.
P
Erik Hatcher wrote:
On Feb 8, 2005, at 10:37 AM, sergiu gordea wrote:
Hi Erik,
I'm not changing any functionality. WildcardQuery will still
support leading wildcard characters, QueryParser will still disallow
them. All I'm going to change is the javadoc that makes it sound
like Wil
index our pages.
Best,
Sergiu
-Original Message-
From: sergiu gordea [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 08, 2005 10:38 AM
To: Lucene Users List
Subject: Re: Starts With x and Ends With x Queries
From what I was reading in the mailing list there are more lucene users
Hi Erik,
I'm not changing any functionality. WildcardQuery will still support
leading wildcard characters, QueryParser will still disallow them.
All I'm going to change is the javadoc that makes it sound like
WildcardQuery does not support leading wildcard characters.
Erik
From what I was
Hi Erick,
"In order to prevent extremely slow WildcardQueries, a Wildcard term
must not start with one of the wildcards * or
?."
I don't read that as saying you cannot use an initial wildcard
character, but rather as if you use a leading wildcard character you
risk performance issues. I'm go
Karl Koch wrote:
I apologise in advance, if some of my writing here has been said before.
The last three answers to my question have been suggesting pattern matching
solutions and Swing. Pattern matching was introduced in Java 1.4 and Swing
is something I cannot use since I work with Java 1.1 on a
/HTML-Strip-1.04/Strip.pm
Otis
--- sergiu gordea <[EMAIL PROTECTED]> wrote:
Karl Koch wrote:
I am in control of the html, which means it is well formatted HTML. I
use
only HTML files which I have transformed from XML. No external HTML
(e.g.
the web).
Ar
Karl Koch wrote:
Hello Sergiu,
thank you for your help so far. I appreciate it.
I am working with Java 1.1 which does not include regular expressions.
Why are you using Java 1.1? Are you so limited in resources?
What operating system do you use?
I assume that you just need to index the html files
Kauler, Leto S wrote:
Another very cheap but robust solution, in case you use Linux, is to
have lynx parse your pages:
lynx -dump page.html > page.txt
This will strip out all html and script, style, csimport tags. And you
will have a .txt file ready for indexing.
Best,
Sergiu
We index the c
Karl Koch wrote:
I am in control of the html, which means it is well formatted HTML. I use
only HTML files which I have transformed from XML. No external HTML (e.g.
the web).
Are there any very-short solutions for that?
if you are using only correctly formatted HTML pages and you are in control
of
Karl Koch wrote:
Hi,
yes, but the library you are using is quite big. I was thinking that a 5kB
code could actually do that. That sourceforge project is doing much more
than that but I do not need it.
you need just the htmlparser.jar 200k.
... you know ... the functionality is strongly correcla
Tim Lebedkov (UPK) wrote:
Hi,
is there a way to make QueryParser accept *term?
yes, if you apply a patch to the lucene sources.
Search for "*term search" in lucene archive.
Best,
Sergiu
thank you
--Tim
-
To unsubscribe, e-mail: [E
Hi Karl,
I already submitted a piece of code that removes the html tags.
Search for my previous answer in this thread.
Best,
Sergiu
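For reference, a minimal sketch of a tag stripper of the kind asked about in this thread. It assumes well-formed HTML and avoids regular expressions, so it should not need anything from Java 1.4. The class name SimpleHtmlStripper and its strip method are hypothetical; this is not the code from the earlier message, and it does not decode entities or skip script/style content.

```java
// Hypothetical helper: removes anything between '<' and '>' from well-formed HTML.
final class SimpleHtmlStripper {

    static String strip(String html) {
        StringBuffer out = new StringBuffer();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                // start of a tag: stop copying characters
                inTag = true;
            } else if (c == '>' && inTag) {
                // end of a tag: emit a space so neighbouring words stay separated
                inTag = false;
                out.append(' ');
            } else if (!inTag) {
                out.append(c);
            }
        }
        return out.toString().trim();
    }
}
```

The returned text can then be handed to the indexer as a plain string.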
Karl Koch wrote:
Hello,
I have been following this thread and have another question.
Is there a piece of sourcecode (which is preferably very short and simple
(
Erik Hatcher wrote:
On Feb 1, 2005, at 10:51 AM, Jerry Jalenak wrote:
OK - but I'm dealing with indexing between 1.5 and 2 million
documents, so I
really don't want to 'batch' them up if I can avoid it. And I also
don't
think I can keep an IndexRead open to the index at the same time I
have an
Jingkang Zhang wrote:
>Three HTML parsers (Lucene web application
>demo, CyberNeko HTML Parser, JTidy) are mentioned in
>Lucene FAQ
>1.3.27. Which is the best? Can it filter tags that are
>auto-created by MS-word 'Save As HTML files' function?
>
>
maybe you can try this library...
http://htmlparser
Hi to all lucene developers,
The "read fields selectively" feature would be very useful for me.
Do you plan to include it in the next lucene releases?
I can patch lucene, but I will need to do it each time I upgrade my version,
and probably I would need to run the unit tests, and this is just
I think you should change your plans a little bit and keep in mind that
your goal is to
create a fast search engine, not a fast indexing engine.
When you index a lot of documents, it is possible to create
a lot of segments (if you don't optimize the index),
and the search will be very slow
Gururaja H wrote:
Hi !
Have two applications. Both are supposed
to write Lucene index files and the WebApplication is supposed to read
these index files.
Here are the questions:
1. Can two applications write index files, in the same directory, at the same time ?
if you implement the synchronis
Paul wrote:
Hi,
how would you restrict the search results for a certain user? I'm
indexing all the existing data in my application but there are certain
access levels so some users should see more results than others.
Each lucene document has a field with an internal id and I want to
restrict on
Otis Gospodnetic wrote:
1. yes
2. yes error, meaningful, it depends what you find meaningful :)
3. searcher will still find the document, unless you close it and
reopen it (searcher)
... What about LockException? I tried to index objects in a thread and
to use an IndexSearcher
to search objects,
[EMAIL PROTECTED] wrote:
I am interested in pursuing experienced peoples' understanding as I have half the queue approach developed already.
well, I think that experienced people developed lucene :) they offered us
the possibility to use multithreading and concurrent searching.
Of course .. depen
Luke Shannon wrote:
I like the sound of the Queue approach. I also don't like that I have to
forcefully unlock the index.
Personally I don't like the Queue approach... because I already
implemented multithreading in our application
to improve its performance. In our application indexing is not a
Hetan Shah wrote:
Hello All,
Is it possible to index the documents in an incremental fashion? What I
mean by this is, update the document in the index only if it has
changed since last time it was indexed. This can save considerable
amount of time while indexing.
Sure .. but I'm afraid you have t
patch that Otis suggested. If anybody knows
some way in CVS to avoid this problem, I'd love to hear about it.
I hope cvsignore works. I work with ant tasks and also with Eclipse and
I don't have this kind of problem.
Sergiu
Thanks,
Chuck
> -Original Message-
> From: se
Chuck Williams wrote:
Otis, thanks for looking at this. The stack trace of the exception is
below. I looked at the code. It wants to delete every file in the
index directory, but fails to delete the CVS subdirectory entry
(presumably because it is marked read-only; the specific exception is
swal
en you can refactor your code in the future. So ... choose one
solution and implement the first prototype, and keep in mind that your
information is managed by the database, and lucene is just your search
module.
Sergiu
On Thu, 04 Nov 2004 19:01:53 +0100, Sergiu Gordea
<[EMAIL PROTECTED]>
javier muguruza wrote:
Hi Javier,
I think your optimization should take care of the response time of
search queries. I assume that this is the
variable you need to optimize. It would probably be a good idea to read
the lucene benchmarks first:
http://jakarta.apache.org/lucene/docs/benchmarks
Bill Janssen wrote:
Thanks to Bill Tschumy, who points out that Lucene 1.4.21 *breaks* the
API exported by 1.4 by removing a parameter from
QueryParser.getFieldQuery(). That means that my
NewMultiFieldQueryParser also breaks, since it overrides that method.
To fix, just remove the Analyzer paramet
Daniel Taurat wrote:
Hi,
I have just another stupid parser question:
There seems to be a special handling of the dash sign "-" different from
Lucene 1.2 at least in Lucene 1.4.RC3
StandardAnalyzer.
From the behaviour you describe I think that the dash sign is removed
from the text by the analyz
Morus Walter wrote:
Bill Janssen writes:
Try to see the behavior if you want to have a single term query
just something like: "robust" .. and print out the query string ...
Sure, that works fine. For instance, if you have the three default
fields "title", "authors", and "contents",
Bill Janssen wrote:
Try to see the behavior if you want to have a single term query
just something like: "robust" .. and print out the query string ...
Sure, that works fine. For instance, if you have the three default
fields "title", "authors", and "contents", the one-word search
"robus
Bill Tschumy wrote:
I have a need to search an index for documents that were taken from
particular files in the filesystem.
Each document in the index has a field named "url" that is created using:
doc.add(Field.Text("url", urlStr));
I understand this is both stored and indexed.
My search w
Bill Janssen wrote:
I'm not sure this solution is very robust
Thanks, but I'm pretty sure it *is* robust. Can you please offer a
specific critique? Always happy to learn and improve :-).
Try to see the behavior if you want to have a single term query
just something like: "robust
Bill Janssen wrote:
I'm not sure this solution is very robust
I think I already sent an email with a better code...
Sergiu
Thanks to something Doug said when I first opened this discussion, I
went back and looked at my implementation. He said, "Can't we just do
this in getFieldQuery?". Figu
Genty Jean-Paul wrote:
At 17:05 25/10/2004, you wrote:
of course POI, for open source.
There are some commercial products based on POI also.
for WORD consider textmining.org
for XLS, POI does anything you need
for powerpoint there is one commercial (it's about 1000$), but you
can also find some s
Ben Litchfield wrote:
In order to write software that consumes PDF documents you must agree to a
list of conditions. One of those conditions is that permissions specified
by the author of the PDF document are respected.
PDFBox complies with this statement, if there is software that does not
then t
of course POI, for open source.
There are some commercial products based on POI also.
for WORD consider textmining.org
for XLS, POI does anything you need
for powerpoint there is one commercial (it's about 1000$), but you can
also find some source code in archives.
All the best,
Sergiu
[EMAIL P
[EMAIL PROTECTED] wrote:
Hi Iouli,
If you don't think it is illegal, you can hack the pdfbox code to remove
the protection ...
Sergiu
PDFbox stumbles also with "class java.io.IOException with message: - You
do not have permission to extract text" in case the doc is copy/print
protected.
I tes
Erik Hatcher wrote:
On Oct 21, 2004, at 5:38 AM, sergiu gordea wrote:
Erik Hatcher wrote:
I don't like the idea of users having to know how a field was
indexed though. That seems to defeat the purpose of a
general-purpose QueryParser.
Erik
I agree that, but maybe lucene should pr
Erik Hatcher wrote:
I don't like the idea of users having to know how a field was indexed
though. That seems to defeat the purpose of a general-purpose
QueryParser.
Erik
I agree with that, but maybe lucene should provide some subclasses of
QueryParser that deal with these problems.
I'm just a
Rupinder Singh Mazara wrote:
hi
the basic problem here is that there are data source which contain
a) id, b) text, c) title, d) authors AND e) subject heading
text, title and authors need to be tokenized
the subject heading can be one or more words,
the subject must also be tokenized, otherw
Erik Hatcher wrote:
On Oct 20, 2004, at 9:55 AM, Aviran wrote:
AFAIK if the term "Election 2004" is between quotation marks this
should
work fine.
No, it won't. The Analyzer will analyze it, and the
WhitespaceAnalyzer would split it into two tokens [Election] and [2004].
This is a tricky s
On Sep 8, 2004, at 6:26 AM, sergiu gordea wrote:
I want to discuss a little problem, lucene doesn't suppor
Chris Fraschetti wrote:
absolutely, limiting the user's query is no problem here. I've
currently implemented the lucene javascript to catch a lot of user
queries that could cause issues.. blank queries, ? or * at the
beginning of query, etc etc... but I couldn't think of a way to
prevent the user fro
Daan Hoogland wrote:
Hi all,
I try to create different indices using different Analyzer-classes. I
tried standard, german, russian, and cjk. They all produce exactly the
same index file (md5-wise). There are over 280 pages so I expected at
least some differences.
Take a look in the lucene sou
mahaveer jain wrote:
Hi all,
I have implemented lucene search for my documents and db successfully.
Now my problem is, the index i created is indexing to my local disk, i want the index
to be created with reference to my server.
Right now I index C:/tomcat/webapps/jetspeed/document, but I want to
Fred Toth wrote:
Hi Sergiu,
Thanks for your suggestions. I will try using just the
IndexSearcher(String...)
and see if that makes a difference in the problem. I can confirm that
I am doing a proper close() and that I'm checking for exceptions. Again,
the problem is not with the search function, bu
Hi Fred,
That's right, there are many references to this kind of problem in the
lucene-user list.
These suggestions were already made, but I'll list them once again:
1. One way to use the IndexSearcher is to use your code, but I don't
encourage users to do that
IndexReader reader = n
Hi Fred,
I think that we can help you if you provide us your code, and the
context in which it is used.
we need to see how you open and close the searcher and the reader, and
what operations you are doing on the index.
All the best,
Sergiu
Fred Toth wrote:
Hi,
I have built a nice lucene applicat
Hi Polima,
It seems to me that your query string is not correct ...
(A AND -(B))
AND = "+"
NOT = "-"
In lucene the AND and NOT operators are mapped internally to +/-
(AND and NOT are supported only because they come from natural language)
so ...
A + - (B) makes no sense ...
Sergiu
Polina Litva
ptable in your project, I suggest to try to create a new Analyzer.
I wish you luck,
Sergiu
Regards,
Natarajan.
-Original Message-
From: sergiu gordea [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 14, 2004 7:38 PM
To: Lucene Users List
Subject: Re: Search PharseQuery
Natarajan
Natarajan.T wrote:
Hi,
Thanks for your response.
For example search keyword is like below...
Language "what is java"
Token 1: language
Token 2: what is java(like google)
Regards,
Natarajan.
Lucene works exactly as you describe above with a simple correction ...
The analyzer has a list of s
String queryString = "\"what is java\"";
Query q = QueryParser.parse(queryString, "field", new StandardAnalyzer());
System.out.println(q.toString());
This is enough to get started; consult the Lucene API for more information
Sergiu
Natarajan.T wrote:
Hi,
Thanks for your mail, that link says only th
jar and in deterministic way produce OutOfMemoryError.
That's all.
Jiri.
-Original Message-----
From: sergiu gordea [mailto:[EMAIL PROTECTED]
Sent: Monday, September 13, 2004 5:16 PM
To: Lucene Users List
Subject: Re: OutOfMemory example
I have a few comments regarding your code ...
1.
I have a few comments regarding your code ...
1. Why do you use RamDirectory and not the hard disk?
2. as John said, you should reuse the index instead of creating it each
time in the main function
if (!indexExists(indexFile))
IndexWriter writer = new IndexWriter(directory, new
Sta
.
I reckon there has been a discussion (and solution :-) on how to achieve the
functionality you've been
after:
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1798116
I'm not sure if this would be the same though.
Best regards,
René
Hi all,
I took the code indicated by Rene but I've
René Hackl wrote:
is it a problem if the users search for "coffee OR tea" as a search
string in the case that MultifieldQueryParser is
modified as Bill suggested, and the default operator is set to AND?
No. There's not a problem with the proposed correction to MFQP. MFQP should
work the wa
René Hackl wrote:
Bill,
Thank you for clarifying on that issue. I missed the...
(title:cutting OR author:cutting) AND (title:lucene OR author:lucene)
...
(title:cutting OR title:lucene) AND (author:cutting OR author:lucene)
Note that this would match even if only "lucene" occurred in t
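The first grouping quoted above (one OR-group per term, joined with AND) can be illustrated by building the expanded query string by hand. This is only a sketch: MultiFieldExpander and expand are hypothetical names, not part of Lucene or of the patch discussed here, and the terms are assumed to be plain words with no query syntax that would need escaping.

```java
// Hypothetical helper: expands each term over all fields, then ANDs the groups.
final class MultiFieldExpander {

    static String expand(String[] terms, String[] fields) {
        StringBuffer query = new StringBuffer();
        for (int t = 0; t < terms.length; t++) {
            if (t > 0) {
                query.append(" AND ");
            }
            // one parenthesised OR-group per term
            query.append('(');
            for (int f = 0; f < fields.length; f++) {
                if (f > 0) {
                    query.append(" OR ");
                }
                query.append(fields[f]).append(':').append(terms[t]);
            }
            query.append(')');
        }
        return query.toString();
    }
}
```

Called with the terms "cutting" and "lucene" and the fields "title" and "author", this builds the first of the two query strings quoted above; the resulting string could then be passed to a single-field query parser.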
Hi Bill,
I think that more people are waiting for this patch of MultiFieldQueryParser.
It would be nice if it were included in the next release candidate
All the best,
Sergiu
Bill Janssen wrote:
René,
Thanks for your note.
I'd think that if a user specified a query "cutting lucene", with
The class is at the end of the message.
But I think that a better solution is the one suggested by Rene:
http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=1798116
Wermus Fernando wrote:
Bill,
I don't receive any .java. Could you send it again?
Thanks.
-Original Message-
Hi all,
I want to discuss a little problem, lucene doesn't support *Term like
queries.
I know that this can bring a lot of results in the memory and therefore
it is restricted.
I think that allowing this kind of search and limiting the amount of
returned results would be
a more useful approach
maybe you should encode the html code ...
Patrick Burleson wrote:
Why oh why did you send this to the tomcat lists?
Don't cross post! Especially when the question doesn't even apply to
one of the lists.
Patrick
On Tue, 7 Sep 2004 16:35:35 -0400, hui liu <[EMAIL PROTECTED]> wrote:
Hi,
I have such
Sorry,
I sent this email to transfer my contacts between Mozilla and
Thunderbird email client.
Sergiu
Probably it will be a good idea to provide the stack trace of the error
you get.
It's a little bit hard to guess the error in the code you provided.
Sergiu
xuemei li wrote:
hi, all
I am using lucene to search. It works fine before I put the code into the
doPost of servlet. But after that it will th
I have the same problem. Right now I think it is not possible to do what
you want by using MultifieldQueryParser.
Right now I implemented a query normalization for our product, but I
consider that the best way is to take the source code
and to implement:
Query q = MultiFieldQueryParser.parse(line,fi
you have to delete the documents using IndexReader and write the
Documents using IndexWriter,
both of them place a lock on the index file, so ... you cannot work with
both of them at the same time.
(you get errors when you have an opened IndexWriter and try to delete a
document with an IndexWrit
don't feel that this is the right way to solve the problem.
Sergiu
Aviran wrote:
Why don't you just build a new index in a different location and at the end
add the missing documents from the old index to the new one, and then delete
the old index.
Aviran
-Original Message-----
From
Hi all,
I have a question related to reindexing of documents with lucene.
We want to implement the functionality of rebuilding the lucene index.
That means I want to delete all documents in the index and to add newer
versions.
All information I need to reindex is kept in the database so that I have
a
"AND
group:developers" to the user's query. Then you will not have to merge
results.
-Will
-Original Message-----
From: Sergiu Gordea [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 15, 2004 2:58 AM
To: Lucene Users List
Subject: Re: Searching against Database
Hi,
I have a similar problem. I
Hi again,
I'm thinking of getting the list of IDs from the database and the list of
hits from the Lucene index, and creating a comparator in order to eliminate the
hits that are not permitted from the list.
Which solution do you think is better?
Thanks,
Sergiu
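A rough sketch of the comparison described above, assuming the permitted IDs come back from the database as strings and each hit carries the same ID. PermissionFilter and filter are hypothetical names, and plain string IDs stand in for Lucene's Hits here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps only the hits whose ID the user may see.
final class PermissionFilter {

    static List<String> filter(List<String> hitIds, Set<String> permittedIds) {
        List<String> visible = new ArrayList<String>();
        for (String id : hitIds) {
            // preserve the original (relevance) order of the hits
            if (permittedIds.contains(id)) {
                visible.add(id);
            }
        }
        return visible;
    }
}
```

Using a set for the permitted IDs keeps each lookup cheap, so the cost grows with the number of hits rather than with the number of database rows.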
Sergiu Gordea wrote:
Hi,
I have a similar pr
Hi,
I have a similar problem. I'm working on a web application in which the
users have different permissions.
Not all information stored in the index is public for all users.
The documents in Index are identified by the same ID that the rows
have in database tables.
I can get the IDs of the
NATARAJAN THILLAI wrote:
Hi Sergiu,
I am Natarajan from India and I am currently working on a search engine
project. I saw your article on the net
(http://article.gmane.org/gmane.comp.jakarta.poi.user/4851). It's very
nice and useful to me.
I want to index exe files, so please send me your
"com.con
Daniel Naber wrote:
On Tuesday 06 July 2004 10:09, Sergiu Gordea wrote:
Do we have an alternative solution, reasonably simple for this problem?
No, but are you sure that MultifieldQueryParser does the right thing at all?
If someone searches for +a +b the parser will (currently) build
Hi all,
I have a question,
I have an index with several fields and I have to create conjunctive
queries by default.
So what I'm trying to say is that we develop a project and we provide
search functionality
based on the lucene indexer.
From what I can see, Multifield query parser creates disjunctive q
.
Sergiu
- Original Message -
From: "Sergiu Gordea" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Cc: "POI Users List" <[EMAIL PROTECTED]>
Sent: Friday, June 25, 2004 8:42 AM
Subject: Index MSOf
s
and will a better source code.
Congratulations to all people involved in development of the Jakarta
project and its subprojects,
Sergiu Gordea
Ps: ExeConverteImpl uses an external stand alone application (like
antiword or pdf2txt) to extract the text.
/* @(#) CWK 1.4 07.06.2004
*
* Copy