[ 
https://issues.apache.org/jira/browse/SOLR-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735926#action_12735926
 ] 

Noble Paul commented on SOLR-1313:
----------------------------------

Reading from gzipped files is good, but unless we have a corresponding way to 
walk through the list of files inside the compressed archive, it is not very 
useful. Most zip files contain multiple files. 
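
To make the "walk through the file list" part concrete, here is a minimal sketch using only the JDK's java.util.zip classes. The class and method names (ZipEntryWalker, listEntries) are made up for illustration and are not part of DIH. Note that it applies to zip archives; a plain .gz stream holds a single compressed file, so there is no entry list to walk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not an existing DIH class.
public class ZipEntryWalker {

    /** Returns the names of all non-directory entries whose full name matches the regex. */
    public static List<String> listEntries(Path archive, String fileNameRegex) throws IOException {
        Pattern pattern = Pattern.compile(fileNameRegex);
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && pattern.matcher(entry.getName()).matches()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway archive so the example is self-contained.
        Path archive = Files.createTempFile("files", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(archive))) {
            for (String name : new String[] {"a.xml", "notes.txt", "sub/b.xml"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write("<record/>".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        // Lists only the entries matching the pattern, similar in spirit to fileName=".*xml".
        System.out.println(listEntries(archive, ".*xml"));
        Files.delete(archive);
    }
}
```

An entity processor along these lines could emit one row per matching entry, which is roughly what the fileName attribute in the config below is meant to express.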


I should be able to do something like the following:
{code:xml}
<dataSource name="zip" type="CompressedFileDataSource" format="gzip"/>
<document>
    <entity name="f" processor="CompressedFileListEntityProcessor"
            file="/some/path/to/files.gz" fileName=".*xml"
            recursive="true" rootEntity="false">
        <entity name="x" dataSource="zip"
                processor="XPathEntityProcessor"
                forEach="/the/record/xpath"
                url="${f.fileAbsolutePath}">
            <field column="full_name" xpath="/field/xpath"/>
        </entity>
    </entity>
</document>
{code}


I guess URLDataSource should also be able to accept URLs of the form 
"jar:file:/home/duke/duke.jar!/a.xml"

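For reference, the JDK itself already resolves URLs of this form through its built-in jar: protocol handler, so a data source that simply opens a java.net.URL might get this with little extra work. The sketch below only shows the plain JDK behaviour; the class and method names (JarUrlRead, readEntry) are hypothetical, and it says nothing about what URLDataSource currently does. It works for zip/jar archives, not for bare .gz streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not an existing DIH class.
public class JarUrlRead {

    /** Reads one entry of a zip/jar archive through a "jar:file:...!/entry" URL. */
    public static String readEntry(Path archive, String entryName) throws IOException {
        URL url = URI.create("jar:" + archive.toUri() + "!/" + entryName).toURL();
        URLConnection conn = url.openConnection();
        // Avoid the JVM-wide jar cache so the archive is released when the stream closes.
        conn.setUseCaches(false);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway archive so the example is self-contained.
        Path archive = Files.createTempFile("duke", ".jar");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(archive))) {
            out.putNextEntry(new ZipEntry("a.xml"));
            out.write("<record/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        System.out.println(readEntry(archive, "a.xml"));
        Files.delete(archive);
    }
}
```
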
> DIH should be able to read gziped files
> ---------------------------------------
>
>                 Key: SOLR-1313
>                 URL: https://issues.apache.org/jira/browse/SOLR-1313
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.5
>            Reporter: Yousef Ourabi
>         Attachments: GzipFileDataSource.java
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> For very large (file) imports it would be beneficial to be able to read from 
> gzipped files which should also improve performance (less disk I/O)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.