[ https://issues.apache.org/jira/browse/SOLR-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735926#action_12735926 ]
Noble Paul commented on SOLR-1313:
----------------------------------

Reading from gzipped files is good, but unless we have a corresponding means to walk through the file list in the zipped file, it is not very useful. Most zip files will contain multiple files.

I should be able to do the following:

{code:xml}
<dataSource name="zip" type="CompressedFileDataSource" format="gzip"/>
<document>
  <entity name="f" processor="CompressedFileListEntityProcessor" file="/some/path/to/files.gz" fileName=".*xml" recursive="true" rootEntity="false">
    <entity name="x" dataSource="zip" processor="XPathEntityProcessor" forEach="/the/record/xpath" url="${f.fileAbsolutePath}">
      <field column="full_name" xpath="/field/xpath"/>
    </entity>
  </entity>
</document>
{code}

I guess the URLDataSource should be able to accept URLs of the format "jar:file:/home/duke/duke.jar!/a.xml".

> DIH should be able to read gziped files
> ---------------------------------------
>
>                 Key: SOLR-1313
>                 URL: https://issues.apache.org/jira/browse/SOLR-1313
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.5
>            Reporter: Yousef Ourabi
>         Attachments: GzipFileDataSource.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> For very large (file) imports it would be beneficial to be able to read from
> gzipped files which should also improve performance (less disk I/O)
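
A note on the "jar:" URL format mentioned in the comment: the JDK's java.net.URL can already resolve a URL of that shape to a single entry inside an archive, which is what a URL-based data source would need. The following is only a minimal sketch under stated assumptions. It builds a throwaway archive in a temp file, and the names JarUrlSketch, writeSampleJar and readEntry are hypothetical; none of this is DIH or URLDataSource code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of Solr or DIH.
public class JarUrlSketch {

    // Writes a one-entry archive so the example has something to read.
    static void writeSampleJar(Path jar, String entryName, String content) throws IOException {
        try (OutputStream out = Files.newOutputStream(jar);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
    }

    // Reads one entry through a URL of the form jar:file:/path/to/archive!/entry.
    static String readEntry(Path jar, String entryName) throws IOException {
        URL url = new URL("jar:" + jar.toUri() + "!/" + entryName);
        URLConnection conn = url.openConnection();
        // Avoid the JVM-wide jar cache so the archive is released when the stream closes.
        conn.setUseCaches(false);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("sketch", ".jar");
        try {
            writeSampleJar(jar, "a.xml", "<record/>");
            System.out.println(readEntry(jar, "a.xml")); // prints <record/>
        } finally {
            Files.deleteIfExists(jar);
        }
    }
}
```

This addresses one entry at a time; walking the list of entries in an archive, as the comment asks for, would still need a separate entity processor.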