to get the string value of an inputstream you can copy it into a
ByteArrayOutputStream and build the string from its bytes;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
for (int b; (b = inputstream.read()) != -1; ) baos.write(b);
System.out.println( new String(baos.toByteArray()) );
mvh karl øie
On Friday, Dec 27, 2002, at 07:34 Europe/Oslo
you also want to check out the POI project
http://jakarta.apache.org/poi/index.html
it has office readers that can extract the content as text.
mvh karl øie
On Tuesday, Dec 17, 2002, at 16:00 Europe/Oslo, Diego Gutierrez Alonso
wrote:
Hi, i'd like to index .doc files, but i don't know how
Sorry, my bad! Didn't read this informative post :-)
mvh karl øie
On Thursday, Nov 21, 2002, at 16:35 Europe/Oslo, Otis Gospodnetic wrote:
Look at CHANGES.txt document in CVS - there is some new stuff in
org.apache.lucene.analysis.ru package that you will want to use.
Get the Lucene from the
Hi, i took a look at Andrey Grishin's Russian character problem and found
something strange happening while we tried to debug it. It seems that
he has avoided the usual "querying with different encoding than
indexed" problem as he can dump out correctly encoded Russian at all
points in his applica
I have an index that is compiled each night; it indexes 1.3 GB of XML
data, which results in a 1.4 GB index. The index takes about 11 hours to
build on a dual 700 MHz Xeon processor with 768 MB of RAM. The index
contains 4,388,730 documents and 953,632 terms.
Mvh karl øie
On Thursday, Nov 21
irectory?
the org.apache.lucene.index.IndexReader class contains a delete()
function to delete documents from lucene. But as said before, if your
index is big it's best not to delete the documents just because a
client goes offline; it's better to filter out the hits.
mvh karl øie
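A minimal sketch of the "filter out the hits" idea, written without Lucene so it stands alone. The Hit class, its fields, and the set of offline client names are all hypothetical stand-ins; in a real application the client name would presumably come from a stored field on each document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HitFilter {
    // Hypothetical stand-in for a search hit: the document path and the client hosting it.
    static final class Hit {
        final String path;
        final String client;
        Hit(String path, String client) { this.path = path; this.client = client; }
    }

    // Keeps only hits whose client is not offline; the index itself is left untouched.
    static List<Hit> onlineOnly(List<Hit> hits, Set<String> offlineClients) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (!offlineClients.contains(h.client)) {
                kept.add(h);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit("a.xml", "alice"), new Hit("b.xml", "bob"));
        Set<String> offline = new HashSet<>(Arrays.asList("bob"));
        for (Hit h : onlineOnly(hits, offline)) {
            System.out.println(h.path);
        }
    }
}
```

When the client comes back online, its name is simply removed from the offline set and its documents show up again, with no re-indexing.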
--
To
s you should store them into a byte or char array in
a file or database.
mvh karl øie
On Monday, Nov 18, 2002, at 03:24 Europe/Oslo, Vinay Kakade wrote:
Hi
I am trying to use RAMDirectory to store the input
HTML documents which are used to create index by the
IndexHTML demo program, but I am f
cocoon/components/search/
Crawler implementation:
>
http://cvs.apache.org/viewcvs.cgi/xml-cocoon2/src/java/org/apache/
cocoon/components/crawler/
This impl is indexing XML, but the principle is the same...
mvh karl øie
On Monday, Nov 4, 2002, at 14:29 Europe/Oslo, Friaa Nafaa wrote:
ring = new String(querystring.getBytes("ISO-8859-1"));
...
mvh karl øie
On Sunday, Oct 13, 2002, at 14:15 Europe/Oslo, Chris Davis wrote:
> To Dominator,
>
> Were you able to solve the display problem as well? I am having a
> similar problem with documents that co
if you still have problems, take a look at this note found in the
newest tomcat release... it might help.
mvh karl øie
> ---
> Linux and Sun JDK 1.2.x - 1.3.x:
> ---
>
> Virtual machine crashes can be experienced whe
to re-encode the query in UTF-8/16:
String querystring = argv[0]; ' String querystring =
httprequest.getParameter("query");
querystring = new String(querystring.getBytes("UTF-8"));
...
this fixed my Norwegian/Sami problems...
mvh karl øie
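The re-encoding trick can be sketched in isolation like this. It assumes the query bytes were really UTF-8 but got decoded as ISO-8859-1 somewhere on the way in (a common servlet-container default at the time; that is an assumption, not something the post states). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class QueryReencode {
    // Undoes a wrong ISO-8859-1 decode of bytes that were actually UTF-8.
    static String fixMisdecoded(String misdecoded) {
        // ISO-8859-1 maps every char 0..255 back to the same byte, so this recovers the raw bytes.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "fj\u00f8s"; // "fjøs", written with an escape to avoid source-encoding issues
        // Simulate the damage: UTF-8 bytes read as if they were ISO-8859-1.
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8),
                                       StandardCharsets.ISO_8859_1);
        String repaired = fixMisdecoded(misdecoded);
        System.out.println(repaired.equals(original));
    }
}
```

Naming both charsets explicitly avoids depending on the platform default, which is what makes the one-argument getBytes() and String constructors behave differently from machine to machine.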
On Monday, Oct 7, 2002, at 13:
hen it comes to thread
performance?
there is also a 1.3 jvm from a group called "blackdown" that is free
and optimized for linux. there was some talk in the news about it
being very good at threading... you could try it.. (
http://www.blackdown.org/ )
mvh karl øie
On Wednesday, Oct
Try to run your vm in classic mode "java -classic" to disable the
hotspot features...
mvh karl øie
On Tuesday, Oct 1, 2002, at 18:16 Europe/Oslo, Stas Chetvertkov wrote:
> Hi All,
>
> I am building a search engine based on Lucene. Recently I created a
> test
>
it works :-) when i see this i understand that the term being parsed by
the queryparser is sent through the analyzer as well... thanks!
mvh karl øie
On Thursday, Sep 26, 2002, at 18:44 Europe/Oslo, Doug Cutting wrote:
> karl øie wrote:
>> I have a Lucene Document with a field named
Hm.. a misunderstanding: i don't create the field with the value
"POST?" i create it with "POST". "element:POST?" or "element:POST*" are
the strings i send to the QueryParser for searching.
mvh Karl Øie
On Thursday, Sep 26, 2002, at 14:13 Europe/
ering
"element:POST?" or "element:POST*" in the QueryParser class.
Has anyone here run into this problem?
I am using the 1.2 release version of Lucene.
Mvh Karl Øie
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
thank you, that works! :-) and saves my day!
mvh karl øie
-----Original Message-----
From: Terry Steichen [mailto:[EMAIL PROTECTED]]
Sent: 10 August 2002 18:29
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: Problems understanding RangeQuery...
Hi Karl,
I have discovered that with
Hi, i have a problem with understanding RangeQueries in Lucene-1.2:
I have created an index with posts that have the field W_PUBLISHING_YEAR
which contains the year of publishing. After indexing i loop through
the terms and find that i have the following terms present in the index:
1923,192
your implementation of this as i find this area to
be the only weak point in lucene.
mvh karl øie
ld think
crash/recovery/rollback functionality to benefit lucene greatly.
I have indexes that take 5 days to build, and it's really bad to receive
exceptions during a long index run, and no recovery/rollback functionality.
Mvh Karl Øie
oh, i see. i was misled by the Bean part of the SearchBean... i'm sorry! :-)
Anyhow, if it is not a Stateful SessionBean you are not restricted by EJB
rules and can thus serialize everything you want to disk or db...
mvh karl øie
On Wednesday 03 July 2002 17:20, Otis Gospodnetic wrote
> persistence, there should be no problem storing it in memory to serve
> subsequent requests. I just can't figure out how to modify the SearchBean
> code to do this. It seemed like it would be simple, but try as I might,
> nothing has so far worked.
>
> Regards,
>
>
if the array is of a serializable sort, just store it in a sql table !?!
mvh karl øie
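A rough sketch of what "store it in a sql table" could look like for a serializable array: turn it into a byte[] with standard Java serialization, then bind those bytes to a binary column (for instance with PreparedStatement.setBytes). The class name and the sample sort keys are hypothetical, and the JDBC part is left out so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ArrayBlob {
    // Serializes any Serializable value (arrays included) into a byte[] suitable for a BLOB column.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back whatever object was written by toBytes.
    static Object fromBytes(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] sortKeys = {"1999", "2001", "1987"}; // hypothetical sorting-field array
        byte[] blob = toBytes(sortKeys);              // this is what would go into the table
        String[] restored = (String[]) fromBytes(blob);
        System.out.println(Arrays.toString(restored));
    }
}
```

The same byte[] could just as well be written to a file, which is the "disk or db" choice mentioned in the follow-up.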
On Wednesday 03 July 2002 16:22, Terry Steichen wrote:
> I'm using Peter's SearchBean code to sort search results. It works fine,
> but it creates the sorting field array from scratch with
in the end: is there a reason why lucene doesn't use java interfaces for
eh. interfaces like the Query class?
mvh karl øie
under development
mvh karl øie
On Wednesday 29 May 2002 11:48, Rama Krishna wrote:
> Hi,
>
> I am trying to build a search engine which search in MS Word, excel, ppt
> and adobe pdf. I am not sure whether i can use Lucene for this or not. pl.
> help me out in this regard.
>
>
you better test it, it does not handle Slavic and Ugric characters well, but i
don't know where the problem lies
mvh karl øie
On Tuesday 28 May 2002 10:52, jamin rubio wrote:
> Hello,
>
> I have a newbie question ? Is lucene fully unicode compliant ?
>
> Thanks
even better, remove this standard scare-monger from the bottom of your emails,
(sic) corporate bullshit...
mvh karl øie
please note that the "license" in the email from the symbian employee actually
tries to incriminate you just by replying to him!!!
mvh karl øie
es so it should perform well
anyhow
happy hacking!
mvh karl øie
what language are you trying to use lucene with?
mvh karl øie
On Tuesday 30 April 2002 18:57, Hyong Ko wrote:
> Hello,
>
> I think there's something wrong with the QueryParser.jj file. I downloaded
> lucene-1.2-rc4-src and compiled successfully with JAVA_UNICODE_
memory while indexing and merging, so checking
the system's free memory is easier than trying to calculate memory usage
mvh karl øie
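One way the free-memory check could look, as a sketch only. It uses the standard Runtime methods to estimate how much heap the JVM can still hand out and compares that to a threshold; the 16 MB figure and the class and method names are assumptions for illustration, not values from this thread.

```java
class FlushHeuristic {
    // Assumed safety margin: flush the in-memory batch when less than this is left.
    static final long THRESHOLD_BYTES = 16L * 1024 * 1024;

    // Heap still available: unused part of the current heap plus the room left to grow to the maximum.
    static long usableHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    // True when the remaining heap has dropped below the threshold.
    static boolean shouldFlush(long usableBytes, long thresholdBytes) {
        return usableBytes < thresholdBytes;
    }

    public static void main(String[] args) {
        // In an indexing loop this check would run after each document or batch.
        if (shouldFlush(usableHeap(), THRESHOLD_BYTES)) {
            System.out.println("low on memory: merge the RAM index to disk now");
        } else {
            System.out.println("enough memory: keep indexing into RAM");
        }
    }
}
```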
never experienced a failure while merging a
RAMDir into a FSDir regardless of size, so it's my system's memory that is the
problem
mvh karl øie
On Friday 26 April 2002 15:33, petite_abeille wrote:
> >> Thanks. What's your heuristic to flush the RAMDirectory?
> >
lush the RAMDirectory?
please explain this because i don't understand english that good :-(
mvh karl øie
On Friday 26 April 2002 14:23, petite_abeille wrote:
> > using a RAMDir as a middle man solved my problems...
>
> Thanks. What's your heuristic to flush the RAMDirector
xing large documents
with many fields
using a RAMDir as a middle man solved my problems...
mvh karl øie
On Friday 26 April 2002 13:54, petite_abeille wrote:
> Hello,
>
> I'm starting to wonder how "bullet proof" are Lucene indexes? Do they
> get corrupted easily?
it's actually the IndexReader, not the IndexWriter...
happy hacking!
On Wednesday 24 April 2002 15:27, Tim Tschampel wrote:
> How do you delete a document from the index?
> I see in the FAQ to use IndexWriter.delete(Term), however I don't see
> this in the current API JavaDocs, and don't hav
hm... this looks very interesting! if it is a perl exe you can just copy the
text into a temp file and run the perl exe on that file and redirect the
output to another tmp file. then read the file and use the result in a lucene
keyword.
mvh karl øie
On Wednesday 24 April 2002 13:46, [EMAIL
combined with that you could use an italian stop-word list to run statistics
on a page :-) ?!?
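A small sketch of the stop-word statistics idea: count what fraction of a page's words appear in an Italian stop-word list and treat a high ratio as a hint that the page is Italian. The word list below is a tiny illustrative sample and the class name is hypothetical; any cut-off value would have to be tuned on real pages.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class StopWordStats {
    // Tiny illustrative Italian stop-word sample; a real list would be much longer.
    static final Set<String> ITALIAN = new HashSet<>(Arrays.asList(
            "il", "la", "di", "che", "e", "un", "una", "per", "non", "con", "del", "sono"));

    // Fraction of the words in the text that are found in the stop-word set.
    static double stopWordRatio(String text, Set<String> stopWords) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        int total = 0;
        int hits = 0;
        for (String t : tokens) {
            if (t.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            total++;
            if (stopWords.contains(t)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        System.out.println(stopWordRatio("Il gatto non dorme con la palla", ITALIAN));
        System.out.println(stopWordRatio("The cat sleeps on the table", ITALIAN));
    }
}
```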
On Wednesday 24 April 2002 11:02, [EMAIL PROTECTED] wrote:
> Hi all,
>
> I'm using Jobo for spidering web sites and lucene for indexing. The
> problem is that I'd like spidering only Italian web site
tract the links and process each of these links in the same manner. for
this you will need an html parser..
happy hacking!
mvh karl øie
n for this is that i have never encountered "Too many open files"
when indexing clean text into one large field, but when creating many-many
fields as required by indexing xml i got a "Too many open files" until i had
to use a ram-dir to index document batches into..
mvh karl
thank you! i actually ran into this today when i built an index with crond as
root and found that even though my own user could read the index, lucene couldn't.
:-D
mvh karl øie
On Friday 05 April 2002 15:15, you wrote:
> Hi,
> after some trial with Lucene, I discovered it doesn&
could you put up the source? i would really appreciate it.
mvh karl øie
On Wednesday 03 April 2002 21:27, you wrote:
> I am doing some testing on managing the underlying data in a zip archive
> and found that there is about a 15ms hit to use a zip vs. grabbing directly
> from fi
generate this stream, equally on insert we must accept the stream and
break it up into keys...
is it possible to "intercept" lucene's work at the key-handling point? or
would this require a larger rewrite?
mvh karl øie
On Wednesday 03 April 2002 16:55, you wrote:
> >
indexes i have
experienced that BerkeleyDB runs circles around any SQL database (including
db2 and oracle!!!).
Berkeley has a java-api and a b-tree record type that could be a very good
match for a key-based searchtree, and it's free. take a look at it!
mvh karl øie
(ps: i am not paid by
reads that indexes into its
own separate ramdir, then flushes these ramdirs into each separate fsdir
(hence i have a fsdir for each workerthread), this because you can only write
to a dir by one thread.
in the end this improved my indexing time a lot...
hope some of this can help you!
mvh kar
100 files. This made
me work around both "out of memory" and "too many files" exceptions...
mvh karl øie
-----Original Message-----
From: Paul Friedman [mailto:[EMAIL PROTECTED]]
Sent: 28 February 2002 21:38
To: Lucene Users List
Subject: Re: optimizing index - too many
.org/viewcvs/jakarta-lucene/src/demo/org/apache/lucene/demo
/SearchFiles.java?rev=1.1&content-type=text/vnd.viewcvs-markup
mvh karl øie
-----Original Message-----
From: Parag Dharmadhikari [mailto:[EMAIL PROTECTED]]
Sent: 19 February 2002 10:12
To: lucene-user
Subject: How to do web sea
Yes it does support boolean queries, you can read about its features here:
http://jakarta.apache.org/lucene/docs/index.html
mvh karl
-----Original Message-----
From: Biswas, Goutam_Kumar [mailto:[EMAIL PROTECTED]]
Sent: 19 February 2002 14:18
To: Lucene-User (E-mail)
Subject: does lucene suppo
ow!... sorry to bother you!
mvh karl øie
-----Original Message-----
From: Karl Øie [mailto:[EMAIL PROTECTED]]
Sent: 28 January 2002 18:16
To: [EMAIL PROTECTED]
Subject: strange search problems(cannot query for more than the first
1 words!?!)
I have created a testclass for working with An
urns 22 i can not
say...
mvh karl øie
-----Original Message-----
From: Jan Stövesand [mailto:[EMAIL PROTECTED]]
Sent: 20 December 2001 12:36
To: Lucene Users List
Subject: Strange Results with German Analyzer
Hi,
I used a German Analyzer for Indexing and Searching. afaik, the search is
*.jj files are compiled with javacc, there is a javacc.zip file in your lib
directory, but you should download the compilerset.
mvh karl øie
-----Original Message-----
From: Christophe GOGUYER DESSAGNES [mailto:[EMAIL PROTECTED]]
Sent: 17 December 2001 17:32
To: [EMAIL PROTECTED]
Subject: HTML
a/org/apache/lucene/quer
yParser/
mvh karl øie
-----Original Message-----
From: Kiran Kumar K.G [mailto:[EMAIL PROTECTED]]
Sent: 8 December 2001 12:43
To: [EMAIL PROTECTED]
Subject: searching words starting with accent characters using UTF-8
I am trying to search for words starting with a
you will need javacc.zip in your classpath to compile lucene. it can be
found in the jakarta-lucene-1.2-rc2/lib/ directory.
mvh karl øie
-----Original Message-----
From: Patrick Codere [mailto:[EMAIL PROTECTED]]
Sent: 5 December 2001 16:00
To: '[EMAIL PROTECTED]'
Subject: FW: In
/portuguese/stemmer.html
mvh karl øie
-----Original Message-----
From: Bizu de Anúncio [mailto:[EMAIL PROTECTED]]
Sent: 3 December 2001 13:22
To: [EMAIL PROTECTED]
Subject: Filter and stop-words
I'm new to Lucene. First of all I would like to know if there is a search
archive like "su
e from the browser
to utf-8 and it worked (guess the browser sent the string as ascii!!! i'm so
happy and thanks to you both jonas and david!!
String query = this.request.getParameter( "query" );
if( query!=null ) {
query = new String( query.getBytes(), "UTF-8" );
after i had replaced "QueryParser.jj" with the newest version from cvs the
queryparser accepts my query, and i can now perform ø/æ/å searches from
commandline, then i guess there is something wrong with my search servlets
unicode handling :-)
thank you very much!
karl øie
it's still translated into ä ?!?
the strange thing is that the cvs version actually already has this in
its code.. perhaps I should try a full rebuild from the cvs version...
could you send me your "QueryParser.jj" so i could have a look at it?
btw: thanks for the tips!
mvh
changed on the way in. if i search for
>"fjøs" (fjøs) i get the swedish "fjä" (fjÄ). Where ø is
>changed to Ä and 's' is removed.
>
>is the querystring translated some where?
>
>mvh karl øie
> -----Original Message-----
> From: David Bonilla
i tried the SimpleAnalyzer and got the same result. but i forgot to provide
the stacktrace;
org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column
1. Encountered: "\u00c3" (195), after : ""
at
org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknow
no it's even stranger than that, i have decoded the querystring, the problem
is that it seems like something is changed on the way in. if i search for
"fjøs" (fjøs) i get the swedish "fjä" (fjÄ). Where ø is
changed to Ä and 's' is removed.
is the querystring t
Hi, i got a problem with scandinavian characters (æåø), when i insert text
with scand-chars it passes the analyzer correctly, but the QueryParser
chokes when i try to search for the same characters.
anyone know anything about how i can fix this?
karl øie/gan meida
60 matches