[ https://issues.apache.org/jira/browse/SOLR-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Preetam Rao updated SOLR-934:
-----------------------------

    Description: 
Enable importing of mail into Solr through the DataImportHandler (DIH). Take 
the credentials for one or more mailboxes, download their mail, and index its 
content along with the content of any attachments.

The folders to fetch can be made configurable based on various criteria.

Apache Tika can be used to extract content from different kinds of 
attachments.
JavaMail can be used for mailbox-related operations such as fetching and 
filtering mail.

The basic configuration for one mailbox might look something like this:

<document>
  <entity processor="org.apache.solr.handler.dataimport.MailEntityProcessor"
          user="someb...@gmail.com"
          password="something"
          host="imap.gmail.com"
          protocol="imaps"
          folder="test1"/>
</document>

- This can be enhanced with timeouts, a mailbox list read from a file, folder 
filters, delta import, etc.
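To make the proposed configuration concrete, here is a hedged sketch in Java 
(the language Solr and DIH are written in) of how the attributes of the entity 
above could be read and checked using only the JDK's DOM parser. It is not the 
MailEntityProcessor implementation: the class name MailEntityConfigSketch, the 
method readEntity, and the rule that all five attributes are required are 
assumptions made for illustration. Connecting to the mailbox itself would be 
JavaMail's job and is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: reads the mailbox attributes of a DIH <entity> element.
public class MailEntityConfigSketch {
    // Attributes supplied by the example configuration for one mailbox.
    // Treating every one of them as required is an assumption of this sketch.
    static final List<String> REQUIRED =
        List.of("user", "password", "host", "protocol", "folder");

    // Parses a <document> containing an <entity> and returns its mailbox attributes.
    static Map<String, String> readEntity(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration", e);
        }
        // Only the first <entity> is considered in this sketch.
        Element entity = (Element) doc.getElementsByTagName("entity").item(0);
        if (entity == null) {
            throw new IllegalArgumentException("no <entity> element found");
        }
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String name : REQUIRED) {
            // getAttribute returns "" when the attribute is absent.
            String value = entity.getAttribute(name);
            if (value.isEmpty()) {
                throw new IllegalArgumentException("missing attribute: " + name);
            }
            attrs.put(name, value);
        }
        return attrs;
    }

    public static void main(String[] args) {
        // someone@example.com is a hypothetical placeholder address.
        String xml = "<document>"
            + "<entity processor=\"org.apache.solr.handler.dataimport.MailEntityProcessor\""
            + " user=\"someone@example.com\" password=\"something\""
            + " host=\"imap.gmail.com\" protocol=\"imaps\" folder=\"test1\"/>"
            + "</document>";
        Map<String, String> cfg = readEntity(xml);
        // Prints only non-secret fields; the password is never echoed.
        System.out.println(cfg.get("protocol") + "://" + cfg.get("host") + "/" + cfg.get("folder"));
    }
}
```

Running main prints imaps://imap.gmail.com/test1. Failing fast on a missing 
attribute keeps configuration mistakes out of the mail-fetching step.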

  was:
Enable importing of mail into Solr through the DataImportHandler (DIH). Take 
the credentials for one or more mailboxes, download their mail, and index its 
content along with the content of any attachments.

The folders to fetch can be made configurable based on various criteria.

Apache Tika can be used to extract content from different kinds of 
attachments.
JavaMail can be used for mailbox-related operations such as fetching and 
filtering mail.


> Enable importing of mails into a solr index through DIH.
> --------------------------------------------------------
>
>                 Key: SOLR-934
>                 URL: https://issues.apache.org/jira/browse/SOLR-934
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Preetam Rao
>            Priority: Minor
>             Fix For: 1.4
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
