On 10 April 2013 08:11, sdspieg <sdsp...@mail.ru> wrote:
> Another progress report. I 'flattened' all the folders which contained the
> pdf files with Fileboss and then moved the pdf files to the directory where
> I found the post.jar file (in solr-4.2.1\solr-4.2.1\example\exampledocs). I
> then ran "java -Ddata=files -jar post.jar *.pdf" and in the command window
> it seemed to be working fine (these are just academic articles in pdf-format
> that I downloaded with Zotero from EBSCO):
[...]

If it works, great, but it is not generally advisable to have a large number
of files under one directory. However, that is not the source of your error
here.
> But then when I looked in solr, I saw the following:
> 04:34:41
> SEVERE
> SolrCore
> org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at
> char #10, byte #-1)
[...]

Your files seem to be in some encoding other than UTF-8; my guess would be
Windows-1252. You need to convert the files to UTF-8.
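A minimal Python sketch of that check and conversion follows. It assumes the files are plain text in Windows-1252 (Python's codec name is "cp1252"), and the helper names are hypothetical. A binary format such as PDF is not text and cannot be re-encoded byte-for-byte this way.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes assumed to be Windows-1252 text as UTF-8.

    Assumption: the input really is Windows-1252 ("cp1252" in Python).
    Bytes left undefined by cp1252 (e.g. 0x81) raise UnicodeDecodeError.
    """
    return data.decode("cp1252").encode("utf-8")


def convert_file(path: Path) -> bool:
    """Convert one text file in place; return True if it was rewritten."""
    data = path.read_bytes()
    if is_valid_utf8(data):
        return False  # already valid UTF-8, leave untouched
    path.write_bytes(cp1252_to_utf8(data))
    return True


if __name__ == "__main__":
    sample = "café".encode("cp1252")  # b"caf\xe9": 0xe9 is not valid UTF-8 here
    print(is_valid_utf8(sample))
    print(cp1252_to_utf8(sample).decode("utf-8"))
```

Files that already decode as UTF-8 are skipped, so running this over a mixed directory does not double-encode anything.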

Regards,
Gora
