I've uploaded the file here: http://www.filesonic.com/file/2342166624/Starting_a_Search_Application.pdf
try this, thanks

2011/10/5 Michael McCandless <luc...@mikemccandless.com>
> Hmm, no attachment; maybe it's too large?
>
> Can you send it directly to me?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> 2011/10/5 Héctor Trujillo <hecto...@gmail.com>:
> > This is the file that gives me errors.
> >
> > 2011/10/5 Michael McCandless <luc...@mikemccandless.com>
> >>
> >> Can you attach this PDF to an email & send to the list? Or is it too
> >> large for that?
> >>
> >> Or, you can try running Tika directly on the PDF to see if it's able
> >> to extract the text.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> 2011/10/5 Héctor Trujillo <hecto...@gmail.com>:
> >> > Sorry, you're right: this file was indexed with a .NET web service
> >> > client that calls a Java application (a web service), which calls
> >> > Solr using SolrJ.
> >> >
> >> > I will try to index it in a different way; maybe that will resolve
> >> > the problem.
> >> >
> >> > Thanks
> >> >
> >> > Best regards
> >> >
> >> > On 5 October 2011 08:42, Héctor Trujillo <hecto...@gmail.com> wrote:
> >> >
> >> >> It seems unreasonable that if I want to index a local file, I have
> >> >> to reference this local file by a URL.
> >> >>
> >> >> This isn't a strange file; it was downloaded from the Lucid web
> >> >> portal and is called: Starting a Search Application.pdf
> >> >>
> >> >> This may be an encoding or charset problem. When I open this file
> >> >> with a PDF reader I have no problems, and I don't know why
> >> >> referencing this file by a URL would fix the problem. Can you help
> >> >> me?
> >> >>
> >> >> I'm working with SolrJ, from Java. Has anyone had the same problem
> >> >> with SolrJ?
> >> >>
> >> >> Thanks to Paul Libbrecht for your suggestion.
> >> >>
> >> >> Best regards
> >> >>
> >> >> 2011/10/4 Paul Libbrecht <p...@hoplahup.net>
> >> >>
> >> >>> full of boxes for me.
> >> >>> Héctor, you need another way to reference these!
> >> >>> (e.g. a URL)
> >> >>>
> >> >>> paul
> >> >>>
> >> >>> On 4 Oct. 2011 at 16:49, Héctor Trujillo wrote:
> >> >>>
> >> >>> > Hi all, I'm indexing PDF files with SolrJ, and most of them work.
> >> >>> > But with some files I've got problems because strange characters
> >> >>> > get stored. This is the content that got stored:
> >> >>> > +++++++
> >> >>> >
> >> >>> > Starting a Search Application
> >> >>> >
> >> >>> > Abstract
> >> >>> >
> >> >>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i
> >> >>> >
> >> >>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page ii Do You Need Full-text Search?
> >> >>> >
> >> >>> > ∞
> >> >>> >
> >> >>> > ∞
> >> >>> > ∞
> >> >>> >
> >> >>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1
> >> >>> >
> >> >>> > Identifying Ideal Results
> >> >>> >
> >> >>> > Starting a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 2
> >> >>> >
> >> >>> > Starting a Search Application A Lucid Imagination White Paper
> >> >>> >
> >> >>> > +++++++
> >> >>> >
> >> >>> > But if I open the PDF file, I can see the content correctly.
> >> >>> >
> >> >>> > I think this is a charset encoding issue, but I don't know whether
> >> >>> > I can avoid this behaviour by applying a different analyzer or
> >> >>> > tokenizer at indexing time.
> >> >>> >
> >> >>> > I've got this problem with some documents downloaded from Lucid's
> >> >>> > website.
> >> >>> >
> >> >>> > Has anyone had the same problem, and does anyone know how to solve
> >> >>> > it?
> >> >>> >
> >> >>> > Thanks
> >> >>> >
> >> >>> > Best regards
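Two points raised in the thread, Héctor's charset hypothesis and Mike's suggestion to check the extracted text before it reaches Solr, can be illustrated with a small, self-contained Java sketch. It is a sketch under stated assumptions, not a diagnosis: the class and method names (ExtractionCheck, decodeByte, suspiciousRatio) are hypothetical and not part of SolrJ or Tika, and reading the "¥" in the stored page headers as a mis-decoded bullet is one plausible explanation that nobody in the thread confirmed. The byte 0xA5 is "¥" in ISO-8859-1 but "•" in Mac OS Roman, so a bullet separator decoded with the wrong single-byte charset would produce headers like "White Paper ¥ April 2009". The second method reports the fraction of code points that commonly display as empty boxes (U+FFFD, private-use code points, and non-whitespace control characters).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for illustration; not part of SolrJ or Tika.
public class ExtractionCheck {

    // Decodes a single byte value with the given charset.
    public static String decodeByte(int value, Charset charset) {
        return new String(new byte[] {(byte) value}, charset);
    }

    // Fraction of code points that often render as empty boxes: U+FFFD,
    // private-use code points, and control characters other than tab,
    // newline and carriage return. Returns 0.0 for empty text.
    public static double suspiciousRatio(String text) {
        int total = 0;
        int suspicious = 0;
        for (int cp : text.codePoints().toArray()) {
            total++;
            boolean whitespaceControl = cp == '\t' || cp == '\n' || cp == '\r';
            if (cp == 0xFFFD
                    || Character.getType(cp) == Character.PRIVATE_USE
                    || (Character.isISOControl(cp) && !whitespaceControl)) {
                suspicious++;
            }
        }
        return total == 0 ? 0.0 : (double) suspicious / total;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the stored yen sign may be a bullet decoded with the
        // wrong charset. Byte 0xA5 is the yen sign in ISO-8859-1 but a
        // bullet in Mac OS Roman.
        System.out.println("0xA5 as ISO-8859-1: "
                + decodeByte(0xA5, StandardCharsets.ISO_8859_1));
        if (Charset.isSupported("x-MacRoman")) { // not guaranteed on every JRE
            System.out.println("0xA5 as Mac OS Roman: "
                    + decodeByte(0xA5, Charset.forName("x-MacRoman")));
        }

        // Optional: pass a UTF-8 text file holding the extracted text.
        if (args.length > 0) {
            String text = new String(Files.readAllBytes(Paths.get(args[0])),
                    StandardCharsets.UTF_8);
            System.out.printf("suspicious ratio: %.4f%n", suspiciousRatio(text));
        }
    }
}
```

To check a real file, one option is to save the text Tika extracts from the PDF (for example with the tika-app command-line jar and its --text option) to a UTF-8 file and pass its path as the first argument; what ratio counts as "garbled" is left open here. Note also that Solr stores a field's original value, before any analyzer or tokenizer runs, so if the extracted text is already wrong, changing the analysis chain would not change the stored content.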