See my response to your question on the Solr users’ list here: 
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201602.mbox/%3CCY1PR09MB0795E8DBA7B2B6603A45820EC7A80%40CY1PR09MB0795.namprd09.prod.outlook.com%3E

I don’t think this is a Tika problem.  This is the standard way that Solr’s DIH 
handles embedded documents: it concatenates the text of all embedded documents 
into one String.

If you want to treat each individual attachment as a separate file, you’ll have 
to preprocess your PST file, or run Tika yourself (see the 
RecursiveParserWrapper, perhaps) and send the documents to Solr via SolrJ 
(https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/).
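Here is a minimal sketch of the "run Tika yourself" route. It is written in Python with only the standard library rather than SolrJ, so it swaps in a different client; it is not the SolrJ approach from the link. It assumes you have already run Tika with recursive JSON output (for example tika-app's `-J` option, which uses the RecursiveParserWrapper) and loaded the resulting list of metadata dicts. Each entry in that list then becomes its own Solr document. The `X-TIKA:content` key, the `attr_` / `attr_content` mapping (copied from the curl commands in the quoted message), the hypothetical `sanitize` helper and the hypothetical `<parent>-<n>` id scheme are all assumptions for illustration.

```python
import json
import re
import urllib.request

# Assumed key under which Tika's recursive JSON output stores extracted text;
# verify against the Tika version you run.
CONTENT_KEY = "X-TIKA:content"


def sanitize(name: str) -> str:
    """Hypothetical field-name cleanup: lowercase, runs of non-alphanumerics -> '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def to_solr_docs(tika_docs, parent_id, uprefix="attr_", content_field="attr_content"):
    """Turn each entry of Tika's recursive metadata list into its own Solr document.

    Entry 0 is the container (the .pst itself); later entries are the embedded
    messages and attachments, each with only its own text as content.
    """
    solr_docs = []
    for i, meta in enumerate(tika_docs):
        # Hypothetical id scheme: container keeps parent_id, children get parent_id-<n>.
        doc = {"id": parent_id if i == 0 else f"{parent_id}-{i}"}
        for key, value in meta.items():
            if key == CONTENT_KEY:
                doc[content_field] = value  # mirrors fmap.content=attr_content
            else:
                # Simplification: prefixes every field, unlike Solr Cell's uprefix,
                # which only applies to fields not already in the schema.
                doc[uprefix + sanitize(key)] = value
        solr_docs.append(doc)
    return solr_docs


def post_to_solr(solr_docs, core_url):
    """Send the documents to the core's JSON update handler and commit."""
    req = urllib.request.Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=json.dumps(solr_docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, assuming the JSON was written by something like `java -jar tika-app.jar -J -t sateamc_0006.pst > sateamc_0006.json`, you would load it with `json.load` and call `post_to_solr(to_solr_docs(tika_docs, "msg7"), "http://localhost:8983/solr/sreenimsg1")`. Each message then gets its own id and its own fields, instead of everything being merged into `msg7`.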

From: Sreenivasa Kallu [mailto:sreenivasaka...@gmail.com]
Sent: Tuesday, February 16, 2016 6:35 PM
To: user@tika.apache.org
Subject: tika is unable to extract outlook messages

Hi,
I am currently indexing individual Outlook messages, and searching is working fine.
I created a Solr core using the following command:
 ./solr create -c sreenimsg1 -d data_driven_schema_configs

I am using the following command to index individual messages:
curl "http://localhost:8983/solr/sreenimsg/update/extract?literal.id=msg9&uprefix=attr_&fmap.content=attr_content&commit=true" \
 -F "myfile=@/home/ec2-user/msg9.msg"

This setup is working fine.

But a new requirement is to extract messages from an Outlook PST file.
I tried the following command to extract messages from the PST file:

curl "http://localhost:8983/solr/sreenimsg1/update/extract?literal.id=msg7&uprefix=attr_&fmap.content=attr_content&commit=true" \
 -F "myfile=@/home/ec2-user/sateamc_0006.pst"

This command extracts only the top-level tags and merges all the messages into 
a single message. I do not get all the tags that I get when extracting individual 
messages. Is the above command correct? Is the problem that it is not using 
recursion? How do I add recursion to the above command? Is it a Tika library problem?

Please help me solve this problem.

Thanks in advance.
--sreenivasa kallu
