[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-19 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428957#comment-15428957 ] Wes McKinney commented on ARROW-81: --- We're running into what "first class type" means again. I'm going to

Re: [jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-19 Thread Wes McKinney
hi Pino, can you reply on JIRA for the sake of keeping the discussion in one place? From what I know about sector categorization I think this is a slightly separate question -- here we are only concerned with the metadata and memory representation of data with a fixed number of categories (where

[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-19 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428657#comment-15428657 ] Wes McKinney commented on ARROW-81: --- Here's a couple examples from Python and R (I believe SAS / Stata /

[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-19 Thread Jacques Nadeau (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428316#comment-15428316 ] Jacques Nadeau commented on ARROW-81: - Can you guys provide two small example datasets in JSON format

[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-19 Thread Hadley Wickham (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428308#comment-15428308 ] Hadley Wickham commented on ARROW-81: - I agree with Wes that factors/categories are a fundamental data

[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-17 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425061#comment-15425061 ] Wes McKinney commented on ARROW-81: --- The other question I have is the category / dictionary indices. For

[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-17 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425026#comment-15425026 ] Wes McKinney commented on ARROW-81: --- There is no doubt that a Category logical type / metadata is

[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-17 Thread Mohit Jaggi (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424893#comment-15424893 ] Mohit Jaggi commented on ARROW-81: -- When I was working on feature engineering earlier I struggled with this

[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-17 Thread Julian Hyde (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424330#comment-15424330 ] Julian Hyde commented on ARROW-81: -- Since Arrow is a general-purpose data format, this requirement seems to

[jira] [Commented] (ARROW-81) C++: Add a Category nested type

2016-08-09 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414445#comment-15414445 ] Wes McKinney commented on ARROW-81: --- A couple more notes on this: While creating the Feather format,