Thilo,
thanks for your effort. Could you please open a new entry in Bugzilla,
mark it as [PATCH], and add the diff file with your changes? This ensures
that the sources and the information will not get lost in the huge
universe of mailing lists. As soon as there is time, one of the committers
will r
Hello
I encountered a problem when I tried to index large document collections
(about 20 million documents).
The indexing failed with the IOException:
"Cannot delete deletables"
I tried several times (with the same document collection) and always
received the error, but after a different number
of
Hi, I have a problem indexing in the directory C:\TXT\DOC\
but indexing in the directory C:\TXT works OK.
Why is this a problem?
P.S. If anyone on the list speaks Spanish, please reply to me. Thanks.
--
Miguel Angel Angeles R.
Consultancy in Connectivity and
OutOfMemoryError.
By default, no more than 10,000 terms will be indexed for a field.
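The 10,000-term default mentioned above can be raised on the writer. A minimal sketch, assuming the Lucene 1.4-era API discussed in this thread, where maxFieldLength is a public field on IndexWriter (the index path and field name here are made up for illustration):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class RaiseFieldLimit {
    public static void main(String[] args) throws Exception {
        // Open (or create) an index in the "index" directory.
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);

        // Default is 10,000 terms per field; raise it so very large
        // document bodies are indexed in full.
        writer.maxFieldLength = 1000000;

        // ... add documents here ...
        writer.close();
    }
}
```

Note that raising this limit increases memory use during indexing, which is likely why the poster above hit an OutOfMemoryError.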
-Original Message-
From: Gilberto Rodriguez [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 26, 2004 4:04 PM
To: [EMAIL PROTECTED]
Subject: Problem Indexing Large Document Field
I am trying to index a field in a Lucene document with about 90,000
characters. The problem is that it only indexes part of the document.
It seems to only index about 65,000 characters. So
Thanks, James... That solved the problem.
On May 26, 2004, at 4:15 PM, James Dunn wrote:
Gilberto,
Look at the IndexWriter class. It has a property,
maxFieldLength, which you can set to determine the max
number of characters to be stored in the index.
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html
Jim
--- Gilberto Rodriguez
<[EMAIL PROTECTED]> wrote:
I am trying to index a field in a Lucene document with about 90,000
characters. The problem is that it only indexes part of the document.
It seems to only index about 65,000 characters. So, if I search on terms
that are at the beginning of the text, the search works, but it fails
for terms that
Hi all,
Martin was right. I just adapted the HTML demo as Wallen recommended, and it
worked. Now I only have to deal with some crazy documents which are UTF-8
encoded mixed with entities.
Does anyone know a class which can translate entities into UTF-8 or any
other encoding?
Peter MH
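One way to answer Peter's question with nothing but the standard library: walk the string and replace named and numeric character references with their Unicode characters. A minimal sketch (the class name EntityDecoder and the tiny entity table are made up for illustration; a real application would want a complete table, or a library such as Apache Commons Lang's StringEscapeUtils):

```java
import java.util.HashMap;
import java.util.Map;

public class EntityDecoder {

    // Deliberately tiny, illustrative entity table -- extend as needed.
    private static final Map<String, String> NAMED = new HashMap<String, String>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("eacute", "\u00e9"); // é
        NAMED.put("ntilde", "\u00f1"); // ñ
    }

    // Replaces &name;, &#NNN; and &#xHH; references with their characters.
    // Assumes well-formed numeric references; anything unrecognized is
    // passed through unchanged.
    public static String decode(String in) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int semi;
            if (c == '&' && (semi = in.indexOf(';', i)) > i) {
                String body = in.substring(i + 1, semi);
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    out.append((char) Integer.parseInt(body.substring(2), 16));
                    i = semi + 1;
                    continue;
                } else if (body.startsWith("#")) {
                    out.append((char) Integer.parseInt(body.substring(1)));
                    i = semi + 1;
                    continue;
                } else if (NAMED.containsKey(body)) {
                    out.append(NAMED.get(body));
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

Decoding entities before analysis keeps terms like "España" intact in the index instead of splitting on the raw `&ntilde;` markup.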
-Original Message-
rt parsing
}
return pipeIn;
}
-Original Message-
From: Martin Remy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 2:09 PM
To: 'Lucene Users List'
Subject: RE: AW: Problem indexing Spanish Characters
The tokenizers deal with unicode ch
r example.
Martin
-Original Message-
From: Hannah c [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 10:35 AM
To: [EMAIL PROTECTED]
Subject: RE: AW: Problem indexing Spanish Characters
Hi,
I had a quick look at the sandbox, but my problem is that I don't need a
Spanish stemmer
]>
To: <[EMAIL PROTECTED]>
Subject: Re: Problem indexing Spanish Characters
Date: Wed, 19 May 2004 11:41:28 -0400
could you send some sample text that causes this to happen?
- Original Message -
From: "Hannah c" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sen
lauts"
some minutes ago which describes my problem, and Hannah's seems to be similar.
Do you have also UTF-8 encoded pages?
Peter MH
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 19, 2004 5:42 PM
To: Lucene Users List
Subject: Re: Proble
It looks like the Snowball project supports Spanish:
http://www.google.com/search?q=snowball spanish
If it does, take a look at Lucene Sandbox. There is a project that
allows you to use Snowball analyzers with Lucene.
Otis
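The sandbox project Otis mentions can be wired up in a few lines. A minimal sketch, assuming the Lucene Sandbox snowball contribution of that era, whose SnowballAnalyzer takes the stemmer name as a constructor argument (the index path, field name, and sample text are made up for illustration):

```java
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class SpanishIndexer {
    public static void main(String[] args) throws Exception {
        // "Spanish" selects the Spanish Snowball stemmer.
        SnowballAnalyzer analyzer = new SnowballAnalyzer("Spanish");

        // Create a new index using that analyzer.
        IndexWriter writer = new IndexWriter("spanish-index", analyzer, true);

        Document doc = new Document();
        doc.add(Field.Text("contents", "Art\u00edculos sobre resorts espa\u00f1oles"));
        writer.addDocument(doc);
        writer.close();
    }
}
```

Note that stemming changes what gets indexed (e.g. accented word forms are conflated), so queries must be analyzed with the same analyzer.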
--- Hannah c <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I am indexing a numb
Hi,
I am indexing a number of English articles on Spanish resorts. As such,
there are a number of Spanish characters throughout the text; most of these
are in the place names, which are the type of words I would like to use as
queries. My problem is with the StandardTokenizer class, which cuts the w