On Feb 16, 2015, at 4:54 PM, Levy, Michael ml...@ushmm.org wrote:
I think you can accomplish what you want by using ICUFoldingFilterFactory
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory
which should simply perform ICU (cf
I know the documents I’m indexing are written in Spanish, and by adding the
following filters to my field definition I believe I have resolved my problem:
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Spanish" />
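For anyone following the diacritics question: Michael's ICUFoldingFilterFactory suggestion works by reducing terms to an accent-free, case-free form, so when the filter sits in both the index and query analyzers, "canción" and "cancion" end up as the same term. Below is a rough Python sketch of that idea only. It assumes a naive whitespace tokenizer, `fold` and `tokens` are hypothetical helpers, it approximates ICU folding with `unicodedata` NFKD decomposition plus `casefold()` (not ICU's actual rules), and it does no Spanish stemming.

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate accent + case folding (NOT ICU's exact folding rules)."""
    # NFKD splits a character like "ó" into "o" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (Unicode category Mn), keeping base letters.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() is a more aggressive lowercase (e.g. "ß" becomes "ss").
    return stripped.casefold()


def tokens(text: str) -> list[str]:
    # Hypothetical stand-in for Solr's tokenizer: split on whitespace, then fold.
    return [fold(t) for t in text.split()]


# The same folding is applied to indexed text and to the query,
# so a query typed without accents still matches accented text.
indexed = tokens("La canción del niño")
query = tokens("CANCION")
print(indexed)
print(all(q in indexed for q in query))
```

One trade-off: folding also merges distinct Spanish words such as "año" and "ano", so it gains recall at some cost in precision.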
In other words, my searchable
Subject: Re: [CODE4LIB] indexing word documents using solr [diacritics]
How do I retain diacritics in a Solr index, and how do I search for words
containing them?
I have extracted the plain text out of a set of Word documents. I have then used
a Perl interface (WebService::Solr) to add
On Feb 10, 2015, at 11:46 AM, Erik Hatcher erikhatc...@mac.com wrote:
bin/post -c collection_name /path/to/file.doc
The almost trivial command above for indexing a Word document in Solr is
certainly appealing, but I’m wondering about the underlying index’s schema.
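For the schema-curious, here is a rough Python sketch of the kind of HTTP request involved in indexing a Word file: a POST of the raw file to the Solr Cell extracting handler (`/update/extract`), where Tika pulls out the text and metadata on the server. This is an assumption about the mechanics, not a copy of exactly what bin/post sends. The base URL `http://localhost:8983/solr` (Solr's default port), using `literal.id` for the unique key, and the helper name `build_extract_request` are illustrative, and the collection must have the extracting handler enabled.

```python
import urllib.parse
import urllib.request


def build_extract_request(doc_id: str, data: bytes, collection: str,
                          solr_url: str = "http://localhost:8983/solr"):
    """Build (but do not send) a POST of a Word file to Solr's /update/extract.

    Hypothetical helper: assumes the collection has the Solr Cell
    (ExtractingRequestHandler) endpoint enabled at /update/extract.
    """
    params = urllib.parse.urlencode({
        "literal.id": doc_id,  # store this value in the document's "id" field
        "commit": "true",      # commit so the document is searchable right away
    })
    url = f"{solr_url}/{collection}/update/extract?{params}"
    return urllib.request.Request(
        url,
        data=data,  # raw .doc bytes; Tika extracts text and metadata server-side
        headers={"Content-Type": "application/msword"},
        method="POST",
    )


# Usage, against a running Solr (not executed here):
# from pathlib import Path
# path = "/path/to/file.doc"
# req = build_extract_request(path, Path(path).read_bytes(), "collection_name")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The schema question still matters after extraction: the analyzers on whichever fields the extracted text is mapped into (see the handler's `fmap.*` parameters) decide how diacritics are handled at index and query time.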
Tika makes every effort
Can somebody point me to a good tutorial on how to index Word documents using
Solr?
I have a few hundred Microsoft Word documents I want to search. Through the use
of the Tika library it seems as if I ought to be able to index my Word
documents directly into Solr, but none of the tutorials I
I found that this book helped me get my head around Solr:
https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-beginner%E2%80%99s-guide.
Chapter 8 explains indexing rich text formats including MS Word.
Chris Gray
Systems Analyst
519-888-4567, ext. 35764
cpg...@uwaterloo.ca
On Feb 10, 2015, at 12:43, Eric Lease Morgan emor...@nd.edu wrote:
On Feb 10, 2015, at 11:46 AM, Erik Hatcher erikhatc...@mac.com wrote:
First, with Solr 5, it’s this easy:
Where can I download Solr 5? None of the other versions seem to be
complete. —ELM
It's not yet released.