On 10 April 2013 08:11, sdspieg <sdsp...@mail.ru> wrote:
> Another progress report. I 'flattened' all the folders which contained the
> pdf files with Fileboss and then moved the pdf files to the directory where
> I found the post.jar file (in solr-4.2.1\solr-4.2.1\example\exampledocs). I
> then ran "java -Ddata=files -jar post.jar *.pdf" and in the command window
> it seemed to be working fine (these are just academic articles in pdf-format
> that I downloaded with Zotero from EBSCO):
[...]
If it works, great, but it is generally not advisable to keep a large number of files in a single directory. However, that is not the source of your error here.

> But then when I looked in solr, I saw the following:
> 04:34:41
> SEVERE
> SolrCore
> org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at
> char #10, byte #-1)
[...]

Your files seem to have some encoding other than UTF-8: my random guess would be Windows-1252. You need to convert the files to UTF-8.

Regards,
Gora
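Before converting anything, one way to check the encoding diagnosis above is to locate the first byte in each file that is not valid UTF-8, which is the same kind of failure Solr reports ("Invalid UTF-8 middle byte 0xe3"). The script below is a minimal sketch, not part of Solr or post.jar; the helper names `first_invalid_utf8` and `check_file` are hypothetical. It assumes the files are meant to be text: binary formats such as PDF normally contain bytes that are not valid UTF-8, so they will usually fail this check no matter how they were produced.

```python
import sys


def first_invalid_utf8(data: bytes):
    """Return (offset, reason) for the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the byte offset where decoding failed; err.reason says why.
        return err.start, err.reason
    return None


def check_file(path: str) -> None:
    # Read raw bytes so that no decoding happens before the check.
    with open(path, "rb") as fh:
        result = first_invalid_utf8(fh.read())
    if result is None:
        print(f"{path}: valid UTF-8")
    else:
        offset, reason = result
        print(f"{path}: invalid UTF-8 at byte {offset} ({reason})")


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_utf8.py file1 file2 ...
    for name in sys.argv[1:]:
        check_file(name)
```

A Windows-1252 "é" is the single byte 0xe9, which UTF-8 treats as the start of a multi-byte sequence, so text saved in that encoding typically fails at the first accented character.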