[ https://issues.apache.org/jira/browse/ARROW-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861790#comment-16861790 ]
Joris Van den Bossche commented on ARROW-5568:
----------------------------------------------

{quote}I have JSON data where the columnar (line-delimited) part is in a `data` subkey:{quote}

Note that the {{data}} subpart is not line-delimited but a comma-delimited JSON array, so supporting that would be a good first step.

Some additional resources that might be useful: pandas supports many JSON layouts, called "orients"; see the overview table at http://pandas.pydata.org/pandas-docs/version/0.24/user_guide/io.html#reading-json (disclaimer: I don't know how common the different formats are, so it doesn't necessarily make sense to copy them all from pandas).

One of these formats is the JSON Table Schema (https://frictionlessdata.io/specs/table-schema/): a JSON file with {{'schema'}} and {{'data'}} top-level keys, where {{'data'}} is a comma-delimited array of records (so very similar in structure to what [~dhirschfeld] showed above).

> [Python] Allow parsing more general JSON formats
> ------------------------------------------------
>
>                 Key: ARROW-5568
>                 URL: https://issues.apache.org/jira/browse/ARROW-5568
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Dave Hirschfeld
>            Priority: Minor
>
> I have JSON data where the columnar (line-delimited) part is in a `data`
> subkey:
> {code:java}
> {
>   "metadata": {"name": "block1"},
>   "data" : [
>     {"a": 1, "b": 2.0, "c": "foo", "d": false},
>     {"a": 4, "b": -5.5, "c": null, "d": true}
>   ]
> }
> {code}
>
> It would be good if the Arrow JSON parser allowed specifying where the
> columnar data is stored.
> Since the `metadata` is also important to me, it would be even better if the
> rest of the JSON could be returned as a Python dict, with only the
> specified keys parsed as Arrow tables, e.g.:
>
> {code:java}
> >>> block1 = json.read_json(fn, tables=['data'])
> >>> block1['data']
> pyarrow.Table
> a: int64
> b: double
> c: string
> d: bool
> >>> block1['metadata']
> {'name': 'block1'}
> >>> block1
> {
>   "metadata": {"name": "block1"},
>   "data" : pyarrow.Table
> }
> {code}
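Until something like this exists in pyarrow, one possible workaround is to parse the document with Python's standard {{json}} module and re-serialize each requested array of records as newline-delimited JSON, which is the layout {{pyarrow.json.read_json}} reads. The sketch below assumes the top-level value is a JSON object and that every key listed in {{tables}} holds an array of records; {{read_json_with_tables}}, {{records_to_ndjson}} and the {{parse_table}} callback are hypothetical names chosen to mirror the proposal above, not existing pyarrow API. Because the whole file goes through the pure-Python parser first, it should not be expected to match the speed of a native implementation.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def records_to_ndjson(records: List[Any]) -> bytes:
    """Re-serialize a JSON array of records as newline-delimited JSON bytes."""
    # json.dumps without `indent` never emits raw newlines (newlines inside
    # strings are escaped), so each record occupies exactly one line.
    return b"".join(json.dumps(rec).encode("utf-8") + b"\n" for rec in records)


def read_json_with_tables(
    text: str,
    tables: Iterable[str],
    parse_table: Callable[[bytes], Any] = lambda buf: buf,
) -> Dict[str, Any]:
    """Hypothetical helper (not a pyarrow API).

    Parses `text` as ordinary JSON. Every top-level key listed in `tables`
    must hold a JSON array of records; it is converted to newline-delimited
    JSON and passed to `parse_table` (by default the raw bytes are returned).
    All other keys are returned unchanged as plain Python objects.
    """
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise TypeError("expected a JSON object at the top level")
    result = dict(doc)
    for key in tables:
        records = doc[key]
        if not isinstance(records, list):
            raise TypeError(f"{key!r} must hold a JSON array of records")
        result[key] = parse_table(records_to_ndjson(records))
    return result


# Possible usage with pyarrow (assumes pyarrow is installed; not exercised here):
#   import io
#   import pyarrow.json as pj
#   block1 = read_json_with_tables(
#       text, tables=["data"],
#       parse_table=lambda buf: pj.read_json(io.BytesIO(buf)))
#   block1["data"]      # pyarrow.Table built from the "data" records
#   block1["metadata"]  # still a plain Python dict
```

Keeping the table construction behind a callback means the splitting logic does not depend on pyarrow itself; only the {{parse_table}} passed in does.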