Fabio Batista da Silva created ARROW-5317:
---------------------------------------------

             Summary: [Rust] [Parquet] impl IntoIterator for SerializedFileReader
                 Key: ARROW-5317
                 URL: https://issues.apache.org/jira/browse/ARROW-5317
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust
            Reporter: Fabio Batista da Silva


This is a follow up to [https://github.com/apache/arrow/issues/4301].

The current row iterator, *RowIter*, borrows the *FileReader*, so the user has 
to keep the file reader alive for as long as the iterator is alive.

This makes it hard to iterate over multiple *FileReader* / *RowIter* instances, 
because a reader created inside a closure is dropped before the iterator that 
borrows it can be returned:
{code:java}
fn main() {
    let path1 = Path::new("path-to/1.snappy.parquet");
    let path2 = Path::new("path-to/2.snappy.parquet");
    let vec = vec![path1, path2];
    let it = vec.iter()
        .map(|p| {
            File::open(p).unwrap()
        })
        .map(|f| {
            SerializedFileReader::new(f).unwrap()
        })
        .flat_map(|reader| -> RowIter {
            RowIter::from_file(None, &reader).unwrap()
//|             |                        |
//|             |                        `reader` is borrowed here
//|             returns a value referencing data owned by the current function
        })
    ;

    for r in it {
        println!("{}", r);
    }
}
{code}
One solution could be to implement a row iterator that takes ownership of the 
reader.

Perhaps by implementing *std::iter::IntoIterator* for *SerializedFileReader*:
{code:java}
....
.map(|p| {
    File::open(p).unwrap()
})
.map(|f| {
    SerializedFileReader::new(f).unwrap()
})
.flat_map(|r| r)
....
{code}
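To make the idea concrete, here is a minimal, self-contained sketch of the ownership pattern. It does not use the parquet crate: *MockReader*, *OwnedRowIter* and *read_all* are hypothetical stand-ins invented for illustration, and the real *SerializedFileReader* would need its own row-reading logic behind the same shape.

```rust
// Stand-in for a file reader. This is NOT the parquet crate's
// SerializedFileReader; it only holds some in-memory "rows".
struct MockReader {
    rows: Vec<String>,
}

impl MockReader {
    fn new(rows: &[&str]) -> Self {
        MockReader {
            rows: rows.iter().map(|s| s.to_string()).collect(),
        }
    }

    // Returns a copy of the row at `index`, or None past the end.
    fn get_row(&self, index: usize) -> Option<String> {
        self.rows.get(index).cloned()
    }
}

// Hypothetical owning iterator: it holds the reader by value, so the
// reader lives exactly as long as the iterator does.
struct OwnedRowIter {
    reader: MockReader,
    pos: usize,
}

impl Iterator for OwnedRowIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let row = self.reader.get_row(self.pos)?;
        self.pos += 1;
        Some(row)
    }
}

// Consuming the reader yields the owning iterator.
impl IntoIterator for MockReader {
    type Item = String;
    type IntoIter = OwnedRowIter;

    fn into_iter(self) -> OwnedRowIter {
        OwnedRowIter { reader: self, pos: 0 }
    }
}

// Chains the rows of several readers; no reader is borrowed, so nothing
// has to outlive the closure passed to flat_map.
fn read_all(readers: Vec<MockReader>) -> Vec<String> {
    readers.into_iter().flat_map(|r| r).collect()
}

fn main() {
    let readers = vec![MockReader::new(&["a", "b"]), MockReader::new(&["c"])];
    for row in read_all(readers) {
        println!("{}", row);
    }
}
```

Because the iterator owns its reader, the closure passed to *flat_map* can return the reader directly and the borrow error from the first example does not arise.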
 

Happy to put up a PR for this.
Please let me know if this makes sense, or if you already have some way of 
doing this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
