Have you looked at Solr Cell? See: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
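As a rough illustration of the Solr Cell route, here is a minimal sketch that builds an extract request URL carrying just the file name in a "name" field. It assumes the core has the /update/extract handler configured and a "name" field in its schema; `literal.<field>` and `commit` are Solr Cell request parameters, while the class name, helper names, core URL and file path are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ExtractUrlSketch {

    // Return only the last path segment, accepting both '\' and '/' separators.
    static String baseName(String path) {
        int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(cut + 1);
    }

    // URL-encode a parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM.
            throw new IllegalStateException(e);
        }
    }

    // Build the URL for posting one file to the extracting request handler.
    // literal.id keeps the full path as the unique key; literal.name carries
    // only the file name (assumes a "name" field exists in the schema).
    static String extractUrl(String coreUrl, String filePath) {
        return coreUrl + "/update/extract"
                + "?literal.id=" + enc(filePath)
                + "&literal.name=" + enc(baseName(filePath))
                + "&commit=true";
    }

    public static void main(String[] args) {
        // Hypothetical core URL and file path, for illustration only.
        System.out.println(extractUrl(
                "http://localhost:8983/solr/mycore",
                "D:\\Lucene\\document\\report.docx"));
    }
}
```

The file itself would then be sent as the body of a POST to that URL, for instance with curl or SolrJ. The point of the sketch is only that the file name is supplied by the client as a literal value rather than expected from the document's own metadata.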
When working with things like MS Word, there are a couple of things to be aware of:
1> there has to be a mapping between the metadata (last_edited, author, whatever) and the field in Solr you want that metadata to go to.
2> each type of document may use different metadata names for the same thing.

The other alternative is to use Tika directly in a Java program and take full control of what goes where. Here's an example (you can remove the database stuff easily):
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Best,
Erick

On Thu, Dec 3, 2015 at 4:00 AM, kostali hassan <med.has.kost...@gmail.com> wrote:
> I started working with Solr 5.x by extracting Solr to D:\solr and starting
> the server with:
>
> D:\solr\solr-5.3.1\bin>solr start
>
> Then I created a core in standalone mode:
>
> D:\solr\solr-5.3.1\bin>solr create -c mycore
>
> I need to index files from the file system (Word and PDF), and the schema
> has no "name" field for the document, so I added this field using curl:
>
> curl -X POST -H 'Content-type:application/json' --data-binary '{
>   "add-field":{
>     "name":"name",
>     "type":"text_general",
>     "stored":true,
>     "indexed":true }
> }' http://localhost:8983/solr/mycore/schema
>
> Then I re-indexed all documents with the SimplePostTool on Windows:
>
> D:\solr\solr-5.3.1>java -classpath example\exampledocs\post.jar -Dauto=yes
> -Dc=mycore -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool
> D:\Lucene\document
>
> But even though the field "name" was added successfully, it is empty; the
> field "title" gets the name only for PDF documents, not for MS Word
> (.doc and .docx).
>
> Then I tried indexing with the techproducts example, because it does not
> use the schema API, so I can modify its schema.xml:
>
> D:\solr\solr-5.3.1>solr -e techproducts
>
> Techproducts returns the name for all the indexed .xml files.
>
> Then I created a new core, called demo, based on the Solr home
> example/techproducts/solr, using the schema.xml (which contains the field
> "name") and solrconfig.xml from techproducts.
>
> When I indexed all the documents, the field "name" existed but was still
> empty for every indexed document.
>
> My question is: how can I get just the name of each document (MS Word and
> PDF), not the path as in the field "id" or the field "resource_name"? Do I
> have to create a new field type, or is there another way?
>
> Sorry for my basic English.
>
> Thank you.
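On the two points in the reply above (a mapping from metadata to Solr fields, and different document types using different names for the same thing), here is a minimal sketch of an alias table in plain Java. The metadata keys and the Solr field names (author, last_edited) are illustrative assumptions, not a list of what Tika actually emits for a given format; in a real program the input map would come from Tika and the output would go into a SolrJ document.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class MetadataMappingSketch {

    // Hypothetical alias table: lower-cased metadata key -> Solr field name.
    static final Map<String, String> FIELD_FOR = new HashMap<>();
    static {
        FIELD_FOR.put("author", "author");
        FIELD_FOR.put("creator", "author");
        FIELD_FOR.put("dc:creator", "author");
        FIELD_FOR.put("last-save-date", "last_edited");
        FIELD_FOR.put("last-modified", "last_edited");
        FIELD_FOR.put("dcterms:modified", "last_edited");
    }

    // Map extracted metadata onto Solr fields. Keys without an alias are
    // dropped; if several aliases hit the same field, the first one wins.
    static Map<String, String> toSolrFields(Map<String, String> extracted) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : extracted.entrySet()) {
            String field = FIELD_FOR.get(e.getKey().toLowerCase(Locale.ROOT));
            if (field != null) {
                doc.putIfAbsent(field, e.getValue());
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        // Two made-up metadata sets that name the same things differently.
        Map<String, String> wordLike = new LinkedHashMap<>();
        wordLike.put("Author", "someone");
        wordLike.put("Last-Save-Date", "2015-12-03");

        Map<String, String> pdfLike = new LinkedHashMap<>();
        pdfLike.put("dc:creator", "someone");
        pdfLike.put("dcterms:modified", "2015-12-03");

        // Both end up with the same Solr field names.
        System.out.println(toSolrFields(wordLike));
        System.out.println(toSolrFields(pdfLike));
    }
}
```

Keeping the aliases in one table means that supporting a new document type is mostly a matter of adding its keys, without touching the indexing code.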