Thanks so much. I didn't know how to make any changes to schema.xml for PDF files, so I used the default Solr schema.xml. Please tell me what I need to do in schema.xml.
The simple Java program I use is below. I have also attached the PDF file. I really appreciate your help!

*********************************
import java.io.File;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class importPDF {

    public static void main(String[] args) {
        try {
            String fileName = "pub2009001.pdf";
            String solrId = "pub2009001.pdf";
            indexFilesSolrCell(fileName, solrId);
        } catch (Exception ex) {
            System.out.println(ex.toString());
        }
    }

    // Sends the PDF to the Solr Cell handler (/update/extract) and commits.
    public static void indexFilesSolrCell(String fileName, String solrId)
            throws IOException, SolrServerException {
        String urlString = "http://lhcinternal.nlm.nih.gov:8989/solr/lhcpdf";
        SolrServer solr = new CommonsHttpSolrServer(urlString);

        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File(fileName));
        up.setParam("literal.id", solrId);            // value for the document's id field
        up.setParam("uprefix", "attr_");              // prefix for fields not defined in the schema
        up.setParam("fmap.content", "attr_content");  // map the extracted body text to attr_content
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        solr.request(up);
    }
}
********************************************

-----Original Message-----
From: Marco Martinez [mailto:mmarti...@paradigmatecnologico.com]
Sent: Thursday, August 12, 2010 11:45 AM
To: solr-user@lucene.apache.org
Subject: Re: index pdf files

To help you we need the description of your fields in your schema.xml and
the query that you do when you search only a single word.

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42

2010/8/12 Ma, Xiaohui (NIH/NLM/LHC) [C] <xiao...@mail.nlm.nih.gov>

> I wrote a simple java program to import a pdf file. I can get a result when
> I do search *:* from admin page. I get nothing if I search a word. I wonder
> if I did something wrong or miss set something.
>
> Here is part of result I get when do *:* search:
> *********************************************
> - <doc>
>   - <arr name="attr_Author">
>       <str>Hristovski D</str>
>     </arr>
>   - <arr name="attr_Content-Type">
>       <str>application/pdf</str>
>     </arr>
>   - <arr name="attr_Keywords">
>       <str>microarray analysis, literature-based discovery, semantic
>       predications, natural language processing</str>
>     </arr>
>   - <arr name="attr_Last-Modified">
>       <str>Thu Aug 12 10:58:37 EDT 2010</str>
>     </arr>
>   - <arr name="attr_content">
>       <str>Combining Semantic Relations and DNA Microarray Data for Novel
>       Hypotheses Generation Combining Semantic Relations and DNA Microarray Data
>       for Novel Hypotheses Generation Dimitar Hristovski, PhD,1 Andrej
>       Kastrin,2...............
> *********************************************
> Please help me out if anyone has experience with pdf files. I really
> appreciate it!
>
> Thanks so much,
>