No, these new documents are not HTML. They are pure text, like the ones you
see in Notepad or Microsoft Word.  I have no problem indexing HTML, but I am
stuck on these pure text documents.




________________________________
From: Scott Gonyea <sc...@aitrus.org>
To: solr-user@lucene.apache.org
Sent: Wed, September 29, 2010 1:20:20 PM
Subject: Re: How to Index Pure Text into Seperate Fields?

Break your HTML pages into the desired fields and format them as follows:

http://wiki.apache.org/solr/UpdateXmlMessages

And away you go.  You may want to search / review the Wiki.  Also, if
you're indexing websites and want to place them in Solr, you should look
at Nutch.  It can do all that work for you, and more.
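
The same update message format should work for plain text once you have
decided how to carve the text up. Here is a rough sketch in Python, not a
definitive recipe. It assumes a hypothetical layout where the first non-empty
line of the file is the title and the rest is the body, and it assumes your
schema defines fields named "id", "title" and "body" (those names are made up
for illustration; substitute your own).

```python
import xml.etree.ElementTree as ET


def split_plain_text(text):
    """Split a plain-text document into fields.

    Assumed (hypothetical) layout: the first non-empty line is the title,
    and every remaining non-empty line is part of the body.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    title = lines[0] if lines else ""
    body = " ".join(line for line in lines[1:] if line)
    return {"title": title, "body": body}


def build_add_message(doc_id, fields):
    """Build a Solr <add> update message for a single document.

    ElementTree escapes characters such as & and < in the field values.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for name, value in fields.items():
        # Field names must match fields defined in your Solr schema.
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


sample = "Quarterly Report\n\nSales rose in Q3.\nCosts were flat & stable.\n"
print(build_add_message("doc-1", split_plain_text(sample)))
```

The resulting string is what you would POST to your Solr update handler,
followed by a commit. If your text files have no predictable layout, the
splitting function is the part you would need to replace with your own rules.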

Scott

On Wed, Sep 29, 2010 at 12:56 PM, Savannah Beckett
<savannah_becket...@yahoo.com> wrote:
> Hi,
>   I am using XPath to index different parts of the HTML pages into different
> fields.  Now I have some pure text documents that have no HTML, so I can't use
> XPath.  How do I index this pure text into different fields of the index?  How
> do I make Nutch/Solr understand that these different parts belong to different
> fields?  Maybe I can use existing content in the fields in my index?
> Thanks.
>
>
>



      
