Hello Marius - thank you for the detailed reply. My goal is (2): to find all
documents with a .7z attachment where the archive includes one or more files
containing "foo". If I read your email correctly, the Tika 1.6 issues described
in [5] are the root cause of my failure to search for text within the files
contained in a .7z attachment. The same search succeeds when I use a .zip file
as the attachment, so we will instruct wiki users to avoid .7z attachments.
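
For anyone who wants to check an attachment outside the wiki, here is a rough
Python sketch of the two questions discussed in this thread: which archive type
a file is, and whether a .zip contains a given word. This is not XWiki or Tika
code. The function names are my own, and it only looks at the well-known
leading bytes of each format plus the Python standard library zipfile module,
so treat it as an illustration under those assumptions.

```python
import io
import zipfile

# Leading "magic" bytes of each format. An empty .zip starts with a
# different signature (PK\x05\x06) and is deliberately not handled here.
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"
ZIP_MAGIC = b"PK\x03\x04"


def sniff_archive_type(data: bytes) -> str:
    """Guess a mime type from the first bytes (hypothetical helper)."""
    if data.startswith(SEVEN_ZIP_MAGIC):
        return "application/x-7z-compressed"
    if data.startswith(ZIP_MAGIC):
        return "application/zip"
    return "application/octet-stream"


def zip_contains_text(data: bytes, needle: str) -> bool:
    """Return True if any member of the .zip contains the needle as UTF-8 bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if needle.encode("utf-8") in archive.read(name):
                return True
    return False
```

The second function does a plain byte search over each archive member, so it
will miss text inside formats that need real parsing (office documents, PDFs);
that is the part Tika handles on the XWiki side.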

Garth

> -----Original Message-----
> Date: Thu, 11 Dec 2014 08:42:20 +0200
> From: Marius Dumitru Florea <mariusdumitru.flo...@xwiki.com>
> To: XWiki Users <users@xwiki.org>
> Subject: Re: [xwiki-users] XWiki search/Solr support for additional filetypes
>
> It depends what you mean by "search attachments that are 7-Zip .7z
> archives":
>
> (1) Give me all the documents that have an attachment of mime type
> application/x-7z-compressed
> (2) Give me all the documents that have a 7-Zip archive attached that
> includes a file that contains the word "foo"
>
> If you use Solr, the default search engine for XWiki 6.2.4, then the
> code that is responsible for indexing the attachments is
> AttachmentSolrMetadataExtractor [1]. This is a component so it can be
> overridden as per [2]. The current implementation uses Tika [3] to:
>
> (1) detect the mime type of the attachment
> (2) extract indexable content from the attachment (whatever its mime
> type may be)
>
> For (1) Tika supports detecting the 7-Zip mime type since version 1.2
> [4]. For (2) judging by [5] it seems Tika also supports reading 7-Zip
> archives but there were some issues in 1.6 that have been fixed in
> 1.7. We are currently using Tika 1.6 in XWiki. We should probably
> upgrade.
>
> Hope this helps,
> Marius
>
> [1] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AttachmentSolrMetadataExtractor.java
> [2] http://extensions.xwiki.org/xwiki/bin/view/Extension/Component+Module#HOverrides
> [3] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L458
> [4] https://issues.apache.org/jira/browse/TIKA-940
> [5] https://issues.apache.org/jira/browse/TIKA-1411
>
> On Wed, Dec 10, 2014 at 9:20 PM, Arnold, Garth <arnol...@ghc.org> wrote:
> > Hello - is it possible to enable searching of additional filetypes within
> > XWiki 6.2.4? Specifically I would like to be able to search attachments
> > that are 7-Zip .7z archives. It looks to me as though the underlying
> > library (Commons Compress) supports this filetype, but I am a new XWiki
> > user and non-Java programmer so I may be assuming too much.
> >
> > Thanks in advance for your thoughts on this -
> >
> > Garth Arnold