[ https://issues.apache.org/jira/browse/CONNECTORS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920801#action_12920801 ]

Jack Krupansky edited comment on CONNECTORS-118 at 10/13/10 7:35 PM:
---------------------------------------------------------------------

I have personally written unit tests that generated most of those formats which Aperture then extracted.

See: http://sourceforge.net/apps/trac/aperture/wiki/SubCrawlers

org.apache.tools.bzip2 - BZIP2 archives.
java.util.zip.GZIPInputStream - GZIP archives.
javax.mail - message/rfc822-style messages and mbox files.
org.apache.tools.tar - tar archives.

      was (Author: jkrupan):
    One of those VFS links points to all the Java packages used to access the list of archive formats I listed. I have personally written unit tests that generated most of those formats which Aperture then extracted.

> Crawled archive files should be expanded into their constituent files
> ---------------------------------------------------------------------
>
>                 Key: CONNECTORS-118
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-118
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework crawler agent
>            Reporter: Jack Krupansky
>
> Archive files such as zip, mbox, tar, etc. should be expanded into their
> constituent files during crawling of repositories so that any output
> connector would output the flattened archive.
> This could be an option, defaulted to ON, since someone may want to implement
> a "copy" connector that maintains crawled files as-is.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
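
For illustration only, the expansion requested in the issue might look roughly like the sketch below for the zip case. It uses only java.util.zip from the JDK. The class ArchiveExpander and its methods expandZip and buildSampleZip are hypothetical names invented for this example; they are not ManifoldCF or Aperture code, and a real crawler would likely stream entries to the output connector rather than buffer them in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of ManifoldCF or Aperture.
final class ArchiveExpander {

    // Reads a zip stream and returns every non-directory entry as
    // entry name -> content bytes, preserving archive order.
    static Map<String, byte[]> expandZip(InputStream archive) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only constituent files are emitted
                }
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                files.put(entry.getName(), content.toByteArray());
            }
        }
        return files;
    }

    // Builds a two-entry zip in memory so the sketch runs without files on disk.
    static byte[] buildSampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("alpha".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docs/b.txt"));
            zip.write("beta".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files =
            expandZip(new ByteArrayInputStream(buildSampleZip()));
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            System.out.println(file.getKey() + " (" + file.getValue().length + " bytes)");
        }
    }
}
```

The "defaulted to ON" option from the description could be a flag checked before calling expandZip, with the archive passed through unchanged when it is off. The other formats listed in the comment (tar, bzip2, mbox) would need their own readers.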