No, these new documents are not HTML. They are plain text, like what you see in Notepad or Microsoft Word. I have no problem indexing HTML, but I am stuck on these plain-text documents.
________________________________
From: Scott Gonyea <sc...@aitrus.org>
To: solr-user@lucene.apache.org
Sent: Wed, September 29, 2010 1:20:20 PM
Subject: Re: How to Index Pure Text into Seperate Fields?

Break your HTML pages into the desired fields, format it as follows:

http://wiki.apache.org/solr/UpdateXmlMessages

And away you go. You may want to search / review the Wiki.

Also, if you're indexing websites and want to place it in Solr, you should look at Nutch. It can do all that work for you, and more.

Scott

On Wed, Sep 29, 2010 at 12:56 PM, Savannah Beckett <savannah_becket...@yahoo.com> wrote:
> Hi,
> I am using xpath to index different parts of the html pages into different
> fields. Now, I have some pure text documents that has no html. So I can't use
> xpath. How do I index these pure text into different fields of the index? How
> do I make nutch/solr understand these different parts belong to different
> fields? Maybe I can use existing content in the fields in my index?
> Thanks.
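
A minimal sketch of the approach Scott describes, applied to plain text instead of HTML: split each document into fields yourself, then wrap the fields in an <add><doc> update message as shown on the UpdateXmlMessages wiki page. The splitting rule here (first non-empty line is the title, everything after it is the body) is only an assumption for illustration, as are the field names "title" and "body" and the helper names; the real rule depends on how your text files are laid out, and the field names must match your schema.xml.

```python
import xml.etree.ElementTree as ET


def split_plain_text(text):
    """Split plain text into fields.

    Assumed rule (hypothetical): the first non-empty line is the title,
    and all remaining non-empty lines are joined into the body.
    """
    non_empty = [line.strip() for line in text.splitlines() if line.strip()]
    title = non_empty[0] if non_empty else ""
    body = " ".join(non_empty[1:])
    return {"title": title, "body": body}


def to_solr_add_xml(doc_id, fields):
    """Build an <add><doc>...</doc></add> update message as a string."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # "id" is assumed to be the unique key field in the schema.
    ET.SubElement(doc, "field", name="id").text = doc_id
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = value
    # ElementTree escapes &, < and > in field values for us.
    return ET.tostring(add, encoding="unicode")


sample = "Quarterly Report\n\nSales rose in Q3.\nCosts fell.\n"
print(to_solr_add_xml("doc-1", split_plain_text(sample)))
```

The resulting XML can then be posted to Solr's update handler, followed by a commit. If the text files have no reliable structure at all, there is nothing for any tool to split on, and the whole file would have to go into a single field.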