Re: [CODE4LIB] indexing word documents using solr [diacritics, resolved (i think) ]

2015-02-20 Thread Eric Lease Morgan
On Feb 16, 2015, at 4:54 PM, Levy, Michael ml...@ushmm.org wrote: I think you can accomplish what you want by using ICUFoldingFilterFactory https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory which should simply perform ICU (cf
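
As a rough illustration of what folding diacritics means at index and query time, here is a minimal Python sketch. It is not ICU and not Solr's implementation: it only approximates the idea with the standard library by decomposing characters and dropping combining marks. The function name `fold_diacritics` is hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Approximate diacritic folding (assumption: NOT equivalent to ICU folding).

    Decomposes each character (NFKD), drops the combining marks,
    and case-folds the result.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    # Spanish sample words chosen for illustration only.
    for word in ["Educación", "niño", "pingüino"]:
        print(word, "->", fold_diacritics(word))
```

If the same folding is applied to both the indexed text and the query, a search for "educacion" can match "Educación". Whether that is desirable is a judgment call; in Spanish, for example, folding "ñ" to "n" conflates distinct words.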

Re: [CODE4LIB] indexing word documents using solr [diacritics, resolved (i think) ]

2015-02-16 Thread Eric Lease Morgan
I know the documents I’m indexing are written in Spanish, and adding the following filters to my field definition, I believe I have resolved my problem: <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> In other words, my searchable

Re: [CODE4LIB] indexing word documents using solr [diacritics]

2015-02-12 Thread Karl Holten
How do I retain diacritics in a Solr index, and how do I search for words containing them? I have extracted the plain text out of a set of Word documents. I have then used a Perl interface (WebService::Solr) to add

Re: [CODE4LIB] indexing word documents using solr

2015-02-11 Thread Eric Lease Morgan
On Feb 10, 2015, at 11:46 AM, Erik Hatcher erikhatc...@mac.com wrote: bin/post -c collection_name /path/to/file.doc The almost trivial command to index a Word document in Solr, above, is most certainly appealing, but I’m wondering about the underlying index’s schema. Tika makes every effort

[CODE4LIB] indexing word documents using solr

2015-02-10 Thread Eric Lease Morgan
Can somebody point me to a good tutorial on how to index Word documents using Solr? I have a few hundred Microsoft Word documents I want to search. Through the use of the Tika library it seems as if I ought to be able to index my Word documents directly into Solr, but none of the tutorials I

Re: [CODE4LIB] indexing word documents using solr

2015-02-10 Thread Chris Gray
I found this book helped me get my head around Solr: https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-beginner%E2%80%99s-guide. Chapter 8 explains indexing rich text formats including MS Word. Chris Gray Systems Analyst 519-888-4567, ext. 35764 cpg...@uwaterloo.ca

Re: [CODE4LIB] indexing word documents using solr

2015-02-10 Thread Erik Hatcher
On Feb 10, 2015, at 12:43, Eric Lease Morgan emor...@nd.edu wrote: On Feb 10, 2015, at 11:46 AM, Erik Hatcher erikhatc...@mac.com wrote: First, with Solr 5, it’s this easy: Where can I download Solr 5? None of the other versions seem to be complete. --ELM It's not yet released