Hello, I am indexing files that each contain several tens of thousands of individual documents. For each document, I want to keep the name of the file it comes from and its byte position in that file. When a query returns a hit, I want to seek to that byte position and then read the document. I cannot store all the documents in the index because the whole corpus is about 50GB.
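Suppose each index entry carries the file name and the byte offset. Fetching the text at query time is then just a seek followed by a read. Below is a minimal sketch under two assumptions: the document's length in bytes is stored next to its offset, and the files are UTF-8. `DocumentFetcher` and `readDocument` are hypothetical names, not Lucene API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not Lucene API): fetches one document's text given the
// file name and byte offset kept in the index, plus a byte length, which is
// an assumption; the original only mentions storing the offset.
class DocumentFetcher {
    static String readDocument(String fileName, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            raf.seek(offset);            // jump straight to the stored byte position
            byte[] buf = new byte[length];
            raf.readFully(buf);          // throws EOFException if the file ends early
            // Decode only this document's bytes; assumes the files are UTF-8.
            return new String(buf, StandardCharsets.UTF_8);
        }
    }
}
```

Decoding from the middle of a file is safe here because a document start always falls on the first byte of a character. If only the start offset is stored, the same seek still works. In that case, read until the document's end marker instead of reading a fixed number of bytes.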
Question: While indexing, I read through each file and add a Document to Lucene every time I find one. Is there an easy way to keep track of byte positions while reading characters from the file, or do I have to run the CharsetDecoder myself on top of reading bytes?

Harald.

-- 
Harald Kirsch | [EMAIL PROTECTED] | +44 (0) 1223/49-2593
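On the indexing side there is a simpler option than driving the CharsetDecoder by hand, provided the files are UTF-8. Every UTF-8 character's byte length follows from its code point, so a Reader wrapper can add up those lengths as it hands out characters. The sketch below uses a hypothetical class name and is not a Lucene or JDK facility. It is only correct for well-formed UTF-8. It must also be the outermost Reader: a BufferedReader placed on top of it would read ahead and push the count past the character currently being parsed.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: wraps a Reader over UTF-8 text and counts how many
// bytes the characters returned so far occupied in the underlying file.
// Assumes well-formed UTF-8; a replaced malformed byte would skew the count.
class Utf8ByteCountingReader extends FilterReader {
    private long bytePos = 0;

    Utf8ByteCountingReader(Reader in) {
        super(in);
    }

    /** Byte offset of the next character to be read. */
    long bytePosition() {
        return bytePos;
    }

    // UTF-8 length of one UTF-16 char; each half of a surrogate pair counts 2,
    // so a supplementary character adds up to its 4 bytes.
    private static int utf8Length(char c) {
        if (c < 0x80) return 1;
        if (c < 0x800) return 2;
        if (Character.isSurrogate(c)) return 2;
        return 3;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c >= 0) bytePos += utf8Length((char) c);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) bytePos += utf8Length(buf[off + i]);
        return n;
    }

    // Skip by reading, so skipped characters are counted too.
    @Override
    public long skip(long n) throws IOException {
        char[] tmp = new char[1024];
        long skipped = 0;
        while (skipped < n) {
            int r = read(tmp, 0, (int) Math.min(tmp.length, n - skipped));
            if (r < 0) break;
            skipped += r;
        }
        return skipped;
    }

    // mark/reset would rewind the characters but not the byte count.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        throw new IOException("mark not supported");
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("reset not supported");
    }
}
```

To use it, record `bytePosition()` just before reading the first character of each document and store that value with the file name. Build the InputStreamReader underneath with `StandardCharsets.UTF_8.newDecoder()`, which makes it report malformed input as an exception. Without that, the reader silently substitutes U+FFFD, and the count goes wrong.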