Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Lance Norskog
tely fine with html.
> Thanks.
>
> From: Erick Erickson
> To: solr-user@lucene.apache.org
> Sent: Wed, September 29, 2010 2:59:26 PM
> Subject: Re: How to Index Pure Text into Seperate Fields?
>
> Can you provide a few more det

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
: Re: How to Index Pure Text into Seperate Fields?

Can you provide a few more details? You mention xpath, which leads me to believe that you are using DIH, is that true? How are you getting your documents to index? Parts of a filesystem? Because it's possible to do many things. If you're

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Erick Erickson
Can you provide a few more details? You mention xpath, which leads me to believe that you are using DIH, is that true? How are you getting your documents to index? Parts of a filesystem? Because it's possible to do many things. If you're using DIH against a filesystem, you could use two fileDataSo

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
, 2010 1:20:20 PM
Subject: Re: How to Index Pure Text into Seperate Fields?

Break your HTML pages into the desired fields, format it as follows: http://wiki.apache.org/solr/UpdateXmlMessages And away you go. You may want to search / review the Wiki. Also, if you're indexing websites and wa

Re: How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Scott Gonyea
Break your HTML pages into the desired fields and format them as described here: http://wiki.apache.org/solr/UpdateXmlMessages And away you go. You may want to search / review the Wiki. Also, if you're indexing websites and want to place them in Solr, you should look at Nutch. It can do all that work for yo
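
For the plain-text case, the same idea might look roughly like the sketch below: split the text into fields yourself, then wrap them in an <add><doc> update message of the kind the wiki page describes. This is only an illustration, not anything taken from the thread. The splitting rule (first line is the title, the rest is the body), the helper names split_plain_text and build_add_message, and the field names "title" and "body" are all assumptions, and the field names would have to match fields defined in your own schema.

```python
import xml.etree.ElementTree as ET


def split_plain_text(text):
    """Hypothetical rule: the first line is the title, the remaining lines are the body."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    title = lines[0] if lines else ""
    # Join the remaining non-empty lines into a single body string.
    body = " ".join(ln for ln in lines[1:] if ln)
    return {"title": title, "body": body}


def build_add_message(doc_id, fields):
    """Build an <add><doc>...</doc></add> update message as a string.

    The field names ("id" plus whatever is in `fields`) are assumptions
    and must exist in the Solr schema being indexed into.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = doc_id
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    sample = "Quarterly report\n\nSales rose.\nCosts fell.\n"
    print(build_add_message("doc-1", split_plain_text(sample)))
```

The resulting string is what you would post to Solr's update handler. How you actually cut the text into fields depends entirely on what your documents look like, so split_plain_text is the part to replace.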

How to Index Pure Text into Seperate Fields?

2010-09-29 Thread Savannah Beckett
Hi, I am using XPath to index different parts of HTML pages into different fields. Now I have some pure text documents that have no HTML, so I can't use XPath. How do I index this pure text into different fields of the index? How do I make Nutch/Solr understand these different parts b