Re: problem indexing large document collection on windows xp

2004-12-30 Thread Bernhard Messer
Thilo, thanks for your effort. Could you please open a new entry in Bugzilla, mark it as [PATCH], and add the diff file with your changes? This ensures that the sources and the information will not get lost in the huge universe of mailing lists. As soon as there is time, one of the committers will r

problem indexing large document collection on windows xp

2004-12-30 Thread Thilo Will
Hello, I encountered a problem when I tried to index large document collections (about 20 million documents). The indexing failed with the IOException: "Cannot delete deletables". I tried several times (with the same document collection) and always received the error, but after a different number of

Problem indexing

2004-10-12 Thread Miguel Angel
Hi, I have a problem indexing in the path C:\TXT\DOC\, but indexing in the path C:\TXT works. What is the problem? P.S. If anybody on the list speaks Spanish, please reply. Thanks. -- Miguel Angel Angeles R. Asesoria en Conectividad y

Re: Problem Indexing Large Document Field

2004-05-26 Thread Gilberto Rodriguez
OutOfMemoryError. By default, no more than 10,000 terms will be indexed for a field. -Original Message- From: Gilberto Rodriguez [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 26, 2004 4:04 PM To: [EMAIL PROTECTED] Subject: Problem Indexing Large Document Field I am trying to index a field

RE: Problem Indexing Large Document Field

2004-05-26 Thread wallen
] Sent: Wednesday, May 26, 2004 4:04 PM To: [EMAIL PROTECTED] Subject: Problem Indexing Large Document Field I am trying to index a field in a Lucene document with about 90,000 characters. The problem is that it only indexes part of the document. It seems to only index about 65,00 characters. So

Re: Problem Indexing Large Document Field

2004-05-26 Thread Gilberto Rodriguez
Thanks, James... That solved the problem. On May 26, 2004, at 4:15 PM, James Dunn wrote: Gilberto, Look at the IndexWriter class. It has a property, maxFieldLength, which you can set to determine the max number of characters to be stored in the index. http://jakarta.apache.org/lucene/docs/api/org

Re: Problem Indexing Large Document Field

2004-05-26 Thread James Dunn
Gilberto, Look at the IndexWriter class. It has a property, maxFieldLength, which you can set to determine the max number of characters to be stored in the index. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html Jim --- Gilberto Rodriguez <[EMAIL PROTECTED]> w
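A toy illustration of the symptom in this thread, for readers who want to see why terms near the start of a long field are found while later ones are not. It is not Lucene code: the class FieldLimitSketch and the method indexTerms are hypothetical names, and whitespace splitting stands in for a real analyzer. Following the other reply in this thread, the cap is modelled as a count of terms (10,000 by default) rather than characters.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene code. It mimics a per-field cap
// on the number of terms that get indexed.
class FieldLimitSketch {

    // Keeps at most maxTerms whitespace-separated terms from the text,
    // silently dropping everything after the cap is reached.
    static List<String> indexTerms(String text, int maxTerms) {
        List<String> kept = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (kept.size() >= maxTerms) {
                break; // terms past the cap never reach the index
            }
            if (!term.isEmpty()) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Build a field with one term at the start, 12,000 filler terms,
        // and one term at the very end.
        StringBuilder field = new StringBuilder("first");
        for (int i = 0; i < 12000; i++) {
            field.append(" filler");
        }
        field.append(" last");

        List<String> indexed = indexTerms(field.toString(), 10000);
        System.out.println("found 'first': " + indexed.contains("first"));
        System.out.println("found 'last': " + indexed.contains("last"));
    }
}
```

In the Lucene releases of that period, maxFieldLength was, as far as I recall, a plain public field on IndexWriter that could be raised before adding documents; check the IndexWriter API page linked above for the version in use.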

Problem Indexing Large Document Field

2004-05-26 Thread Gilberto Rodriguez
I am trying to index a field in a Lucene document with about 90,000 characters. The problem is that it only indexes part of the document. It seems to only index about 65,00 characters. So, if I search on terms that are at the beginning of the text, the search works, but it fails for terms that

AW: Problem indexing Spanish Characters

2004-05-21 Thread PEP AD Server Administrator
Hi all, Martin was right. I just adapted the HTML demo as Wallen recommended and it worked. Now I only have to deal with some crazy documents which are UTF-8 encoded mixed with entities. Does anyone know a class which can translate entities into UTF-8 or any other encoding? Peter MH -Ursprüngli
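On the question of translating entities: a minimal sketch, assuming only numeric entities (decimal and hexadecimal) plus a handful of named ones need handling. EntityDecoderSketch and its small NAMED table are hypothetical and far from the full HTML entity list; the result is a Java String, which can then be written out as UTF-8 or any other encoding.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or any library.
class EntityDecoderSketch {

    // Matches &#233; (decimal), &#xE9; (hex) and &name; forms.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z]+);");

    // Deliberately tiny table of named entities; extend as needed.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("ntilde", "\u00f1");
        NAMED.put("eacute", "\u00e9");
    }

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement = m.group(); // default: leave entity untouched
            if (body.startsWith("#")) {
                boolean hex = body.startsWith("#x") || body.startsWith("#X");
                int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1));
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } else if (NAMED.containsKey(body)) {
                replacement = NAMED.get(body);
            }
            // quoteReplacement keeps '$' and '\' in the output literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Espa&ntilde;a, caf&#233;, &#xF1;, &bogus;"));
    }
}
```

Unknown names and out-of-range code points are left as they were rather than guessed at, so malformed input passes through unchanged.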

RE: AW: Problem indexing Spanish Characters

2004-05-19 Thread wallen
rt parsing } return pipeIn; } -Original Message- From: Martin Remy [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 19, 2004 2:09 PM To: 'Lucene Users List' Subject: RE: AW: Problem indexing Spanish Characters The tokenizers deal with unicode ch

RE: AW: Problem indexing Spanish Characters

2004-05-19 Thread Martin Remy
r example. Martin -Original Message- From: Hannah c [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 19, 2004 10:35 AM To: [EMAIL PROTECTED] Subject: RE: AW: Problem indexing Spanish Characters Hi, I had a quick look at the sandbox but my problem is that I don't need a Spanish stemmer

RE: AW: Problem indexing Spanish Characters

2004-05-19 Thread Hannah c
]> To: <[EMAIL PROTECTED]> Subject: Re: Problem indexing Spanish Characters Date: Wed, 19 May 2004 11:41:28 -0400 could you send some sample text that causes this to happen? - Original Message - From: "Hannah c" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sen

AW: Problem indexing Spanish Characters

2004-05-19 Thread PEP AD Server Administrator
lauts" some minutes ago which describes my problem, and Hannah's seems to be similar. Do you also have UTF-8 encoded pages? Peter MH -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 19, 2004 17:42 To: Lucene Users List Subject: Re: Proble

Re: Problem indexing Spanish Characters

2004-05-19 Thread Otis Gospodnetic
It looks like the Snowball project supports Spanish: http://www.google.com/search?q=snowball spanish If it does, take a look at Lucene Sandbox. There is a project that allows you to use Snowball analyzers with Lucene. Otis --- Hannah c <[EMAIL PROTECTED]> wrote: > > Hi, > > I am indexing a numb

Problem indexing Spanish Characters

2004-05-19 Thread Hannah c
Hi, I am indexing a number of English articles on Spanish resorts. As such there are a number of Spanish characters throughout the text, most of these are in the place names which are the type of words I would like to use as queries. My problem is with the StandardTokenizer class which cuts the w