Re: [CSV] Headers and the first record

2013-07-31 Gary Gregory
On Wed, Jul 31, 2013 at 4:38 PM, Mark Fortner wrote:
> Hi Gary,
>
> > This does not look like a classic CSV file.
>
> I guess it depends on what your definition of "classic" is. :-) This is
> pretty typical for most drug discovery companies.
>
> > It sounds like your files contain differen

Re: [CSV] Headers and the first record

2013-07-31 Mark Fortner
Hi Gary,
> This does not look like a classic CSV file.
I guess it depends on what your definition of "classic" is. :-) This is pretty typical for most drug discovery companies.
> It sounds like your files contain different sections in different formats.
True.
> In its current state, c

Re: [CSV] Headers and the first record

2013-07-31 Gary Gregory
On Wed, Jul 31, 2013 at 3:44 PM, Mark Fortner wrote:
> Hi Gary,
> One other complication I forgot to mention. Compounds are usually run
> multiple times. So the same compound will appear with the same set of
> concentrations. In practice you would end up with column headers that have
> the sam

Re: [CSV] Headers and the first record

2013-07-31 Mark Fortner
Hi Gary, One other complication I forgot to mention. Compounds are usually run multiple times. So the same compound will appear with the same set of concentrations. In practice you would end up with column headers that have the same text in them, so this issue with using a Set vs String[] for th
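Mark's point about repeated column names can be shown with plain Java collections. In the sketch below, `DuplicateHeaders`, `asSet` and `indexesByName` are hypothetical names (this is not Commons CSV code), and the header values are made up. Collapsing a header into a `Set` silently drops repeated names. Keeping the `String[]` instead lets each name map to every column index where it occurs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not Commons CSV code.
final class DuplicateHeaders {

    // Collapsing the header into a Set keeps only one copy of each repeated name.
    static Set<String> asSet(String[] header) {
        return new LinkedHashSet<>(Arrays.asList(header));
    }

    // Keeping the array lets each name map to every column index where it occurs.
    static Map<String, List<Integer>> indexesByName(String[] header) {
        Map<String, List<Integer>> map = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            map.computeIfAbsent(header[i], k -> new ArrayList<>()).add(i);
        }
        return map;
    }

    public static void main(String[] args) {
        // Made-up header: the same concentrations appear twice (two runs of one compound).
        String[] header = {"Compound", "1uM", "10uM", "1uM", "10uM"};
        System.out.println(asSet(header).size());  // 3 names for 5 columns
        System.out.println(indexesByName(header)); // {Compound=[0], 1uM=[1, 3], 10uM=[2, 4]}
    }
}
```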

Re: [CSV] Accessing a subset of the available headers (Was: Re: [CSV] Headers and the first record)

2013-07-31 Gary Gregory
On Wed, Jul 31, 2013 at 2:38 PM, Benedikt Ritter wrote:
> 2013/7/31 Gary Gregory
> > On Wed, Jul 31, 2013 at 10:42 AM, Benedikt Ritter wrote:
> > > >> A use case I have now is a CSV file with a lot of columns (~90) but I
> > > >> only care about a small subset of t

Re: [CSV] Accessing a subset of the available headers (Was: Re: [CSV] Headers and the first record)

2013-07-31 Benedikt Ritter
2013/7/31 Gary Gregory
> On Wed, Jul 31, 2013 at 10:42 AM, Benedikt Ritter wrote:
> > >> A use case I have now is a CSV file with a lot of columns (~90) but I
> > >> only care about a small subset of the columns (~10). I'd like to be able to
> > >> say withHeader(Set) where t

Re: [CSV] Headers and the first record

2013-07-31 Gary Gregory
On Wed, Jul 31, 2013 at 10:48 AM, Gary Gregory wrote:
> On Wed, Jul 31, 2013 at 9:34 AM, Emmanuel Bourg wrote:
>> On 31/07/2013 15:08, Gary Gregory wrote:
>> > But that is exactly what _was_ happening! ;)
>> >
>> > If I called withHeader("A", "B", "C") the header was not skipped.
>>
>> So

Re: [CSV] Headers and the first record

2013-07-31 Gary Gregory
On Wed, Jul 31, 2013 at 11:14 AM, Mark Fortner wrote:
> I took a brief look at the API for CSV, and thought I would share a typical
> use case from the biotech industry. We deal with a lot of instruments that
> produce a multiline header. The header usually contains "experiment
> conditions".

Re: [CSV] Headers and the first record

2013-07-31 Mark Fortner
I took a brief look at the API for CSV, and thought I would share a typical use case from the biotech industry. We deal with a lot of instruments that produce a multiline header. The header usually contains "experiment conditions". You can think of this as metadata for the columnar data. The ex
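Mark's description of the instrument layout is cut off, so the following is only a sketch under an assumed layout. It assumes `key,value` experiment-condition lines, then a blank line, then an ordinary header row and data rows. `InstrumentFile` is a hypothetical helper that separates the metadata section from the columnar section before any header handling. It uses a naive comma split with no support for quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the file layout (metadata block, blank line, table) is an assumption.
final class InstrumentFile {
    final Map<String, String> conditions = new LinkedHashMap<>();
    String[] header;
    final List<String[]> rows = new ArrayList<>();

    static InstrumentFile read(BufferedReader in) throws IOException {
        InstrumentFile f = new InstrumentFile();
        String line;
        // Section 1: "key,value" experiment conditions, up to and including the first blank line.
        while ((line = in.readLine()) != null && !line.trim().isEmpty()) {
            String[] kv = line.split(",", 2);
            f.conditions.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
        }
        // Section 2: the first non-blank line is the column header, the rest are data rows.
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(",", -1); // naive split, no quoted fields
            if (f.header == null) {
                f.header = fields;
            } else {
                f.rows.add(fields);
            }
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        String text = "Instrument,Reader-1\nTemperature,37C\n\nWell,Compound,Signal\nA1,X-101,0.42\n";
        InstrumentFile f = read(new BufferedReader(new StringReader(text)));
        System.out.println(f.conditions);    // {Instrument=Reader-1, Temperature=37C}
        System.out.println(f.header.length); // 3
        System.out.println(f.rows.size());   // 1
    }
}
```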

Re: [CSV] Accessing a subset of the available headers (Was: Re: [CSV] Headers and the first record)

2013-07-31 Gary Gregory
On Wed, Jul 31, 2013 at 10:42 AM, Benedikt Ritter wrote:
> >> A use case I have now is a CSV file with a lot of columns (~90) but I
> >> only care about a small subset of the columns (~10). I'd like to be able to
> >> say withHeader(Set) where the Set may be a subset of the actual column

Re: [CSV] Headers and the first record

2013-07-31 Gary Gregory
On Wed, Jul 31, 2013 at 9:34 AM, Emmanuel Bourg wrote:
> On 31/07/2013 15:08, Gary Gregory wrote:
> > But that is exactly what _was_ happening! ;)
> >
> > If I called withHeader("A", "B", "C") the header was not skipped.
>
> Sounds good. The header is defined in the code, we don't expect to

[CSV] Accessing a subset of the available headers (Was: Re: [CSV] Headers and the first record)

2013-07-31 Benedikt Ritter
>> A use case I have now is a CSV file with a lot of columns (~90) but I only
>> care about a small subset of the columns (~10). I'd like to be able to say
>> withHeader(Set) where the Set may be a subset of the actual column names in
>> the header line. This is different from withHeader(String[]
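Benedikt's subset use case can be sketched without touching the parser. The idea is to resolve the wanted names against the header line once, then project each record onto those indexes. `ColumnSubset` and `project` are hypothetical names chosen for this example. They do not reproduce the `withHeader(Set)` signature being proposed.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of selecting a few named columns out of a wide file.
final class ColumnSubset {
    private final Map<String, Integer> indexes = new LinkedHashMap<>();

    // Resolve each wanted name to its position in the header line (first occurrence wins).
    ColumnSubset(String[] headerLine, Set<String> wanted) {
        for (String name : wanted) {
            int i = Arrays.asList(headerLine).indexOf(name);
            if (i < 0) {
                throw new IllegalArgumentException("Missing column: " + name);
            }
            indexes.put(name, i);
        }
    }

    // Keep only the requested columns of one record, keyed by column name.
    Map<String, String> project(String[] record) {
        Map<String, String> out = new LinkedHashMap<>();
        indexes.forEach((name, i) -> out.put(name, record[i]));
        return out;
    }

    public static void main(String[] args) {
        String[] header = {"id", "name", "mass", "notes"};
        Set<String> wanted = new LinkedHashSet<>(Arrays.asList("name", "mass"));
        ColumnSubset subset = new ColumnSubset(header, wanted);
        System.out.println(subset.project(new String[] {"7", "aspirin", "180.16", "-"}));
        // {name=aspirin, mass=180.16}
    }
}
```

Failing fast on a missing name is a choice made for this sketch. Silently ignoring unknown names would be the other obvious option.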

Re: [CSV] Headers and the first record

2013-07-31 Gary Gregory
On Wed, Jul 31, 2013 at 8:58 AM, Gary Gregory wrote:
> On Jul 31, 2013, at 3:38, Benedikt Ritter wrote:
> > 2013/7/31 Gary Gregory
> >> On Tue, Jul 30, 2013 at 5:29 PM, Emmanuel Bourg wrote:
> >>> On 30/07/2013 23:26, Gary Gregory wrote:
> >>>> And another thing: internally, the

Re: [CSV] Headers and the first record

2013-07-31 Emmanuel Bourg
On 31/07/2013 15:08, Gary Gregory wrote:
> But that is exactly what _was_ happening! ;)
>
> If I called withHeader("A", "B", "C") the header was not skipped.
Sounds good. The header is defined in the code, we don't expect to see the header in the file so nothing is skipped.
> If I called wit

Re: [CSV] Headers and the first record

2013-07-31 Gary Gregory
On Tue, Jul 30, 2013 at 5:47 PM, Emmanuel Bourg wrote:
> On 30/07/2013 23:24, Gary Gregory wrote:
> > Yeah, that's too clever IMO. I expected the same behavior WRT record
> > reading with the only difference being if I let the parser guess or not.
>
> Too clever? I didn't feel like I designe

Re: [CSV] Headers and the first record

2013-07-31 Gary Gregory
On Jul 31, 2013, at 3:38, Benedikt Ritter wrote:
> 2013/7/31 Gary Gregory
>> On Tue, Jul 30, 2013 at 5:29 PM, Emmanuel Bourg wrote:
>>> On 30/07/2013 23:26, Gary Gregory wrote:
>>>> And another thing: internally, the header should be a Set, not a
>>>> String[]. I plan on fixing that

Re: [CSV] Headers and the first record

2013-07-31 sebb
On 31 July 2013 08:38, Benedikt Ritter wrote:
> 2013/7/31 Gary Gregory
>> On Tue, Jul 30, 2013 at 5:29 PM, Emmanuel Bourg wrote:
>> > On 30/07/2013 23:26, Gary Gregory wrote:
>> > > And another thing: internally, the header should be a Set, not a
>> > > String[]. I plan on fixing that

Re: [CSV] Headers and the first record

2013-07-31 Benedikt Ritter
2013/7/31 Gary Gregory
> On Tue, Jul 30, 2013 at 5:29 PM, Emmanuel Bourg wrote:
> > On 30/07/2013 23:26, Gary Gregory wrote:
> > > And another thing: internally, the header should be a Set, not a
> > > String[]. I plan on fixing that later too.
> >
> > Why should it be a set? Is there an

Re: [CSV] Headers and the first record

2013-07-30 Gary Gregory
On Tue, Jul 30, 2013 at 5:47 PM, Emmanuel Bourg wrote:
> On 30/07/2013 23:24, Gary Gregory wrote:
> > Yeah, that's too clever IMO. I expected the same behavior WRT record
> > reading with the only difference being if I let the parser guess or not.
>
> Too clever? I didn't feel like I designe

Re: [CSV] Headers and the first record

2013-07-30 Gary Gregory
On Tue, Jul 30, 2013 at 5:29 PM, Emmanuel Bourg wrote:
> On 30/07/2013 23:26, Gary Gregory wrote:
> > And another thing: internally, the header should be a Set, not a
> > String[]. I plan on fixing that later too.
>
> Why should it be a set? Is there an impact on the performance?
>
Well, I di

Re: [CSV] Headers and the first record

2013-07-30 Emmanuel Bourg
On 30/07/2013 23:24, Gary Gregory wrote:
> Yeah, that's too clever IMO. I expected the same behavior WRT record
> reading with the only difference being if I let the parser guess or not.
Too clever? I didn't feel like I designed a rocket with this feature though :) That's an important feature

Re: [CSV] Headers and the first record

2013-07-30 Emmanuel Bourg
On 30/07/2013 23:26, Gary Gregory wrote:
> And another thing: internally, the header should be a Set, not a
> String[]. I plan on fixing that later too.
Why should it be a set? Is there an impact on the performance?
Emmanuel Bourg
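On Emmanuel's performance question: for access by name, the cost that matters is usually the lookup, not the container used to store the header. Below is a minimal sketch contrasting a linear scan of a `String[]` with an index map built once per file. `HeaderLookup`, `indexByScan` and `indexMap` are hypothetical names, not Commons CSV code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of two name-lookup strategies.
final class HeaderLookup {

    // Linear scan over the header array: O(n) per lookup, -1 if absent.
    static int indexByScan(String[] header, String name) {
        for (int i = 0; i < header.length; i++) {
            if (header[i].equals(name)) {
                return i;
            }
        }
        return -1;
    }

    // Built once per file; afterwards each lookup is an average O(1) hash probe.
    static Map<String, Integer> indexMap(String[] header) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < header.length; i++) {
            map.putIfAbsent(header[i], i); // keep the first occurrence of a repeated name
        }
        return map;
    }

    public static void main(String[] args) {
        String[] header = {"A", "B", "C"};
        Map<String, Integer> map = indexMap(header);
        System.out.println(indexByScan(header, "C") + " " + map.get("C")); // 2 2
    }
}
```

The map can sit alongside the original array, so column order and repeated names are preserved in the array. In this version, though, only the first occurrence of a repeated name is reachable through the map.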

Re: [CSV] Headers and the first record

2013-07-30 Gary Gregory
And another thing: internally, the header should be a Set, not a String[]. I plan on fixing that later too.
Gary
On Tue, Jul 30, 2013 at 5:24 PM, Gary Gregory wrote:
> On Tue, Jul 30, 2013 at 5:15 PM, Emmanuel Bourg wrote:
>> I haven't checked the current code, but the intended behavior was:

Re: [CSV] Headers and the first record

2013-07-30 Gary Gregory
On Tue, Jul 30, 2013 at 5:15 PM, Emmanuel Bourg wrote:
> I haven't checked the current code, but the intended behavior was:
>
> - no args: the first record defines the header and is not returned when
> iterating
>
> - args: the header is defined independently of the data, all the records
> are re

Re: [CSV] Headers and the first record

2013-07-30 Emmanuel Bourg
I haven't checked the current code, but the intended behavior was:
- no args: the first record defines the header and is not returned when iterating
- args: the header is defined independently of the data, all the records are returned when iterating
Emmanuel Bourg
On 30/07/2013 22:23, Gary Gre
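The two modes Emmanuel describes can be sketched with a varargs parameter. An empty argument list means the first line is consumed as the header. Any explicit names mean every line is returned as a record. `HeaderModes`, `parse` and `Result` are hypothetical names, and lines are split naively on commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the two header modes (naive comma split, no quoting).
final class HeaderModes {

    static final class Result {
        final List<String> header;
        final List<List<String>> records;

        Result(List<String> header, List<List<String>> records) {
            this.header = header;
            this.records = records;
        }
    }

    static Result parse(List<String> lines, String... explicitHeader) {
        List<List<String>> records = new ArrayList<>();
        for (String line : lines) {
            records.add(Arrays.asList(line.split(",", -1)));
        }
        if (explicitHeader.length == 0) {
            // No args: the first record defines the header and is not returned.
            List<String> header = new ArrayList<>();
            if (!records.isEmpty()) {
                header = records.remove(0);
            }
            return new Result(header, records);
        }
        // Args: the header is defined in code, so every line is returned as a record.
        return new Result(Arrays.asList(explicitHeader), records);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A,B,C", "1,2,3");
        System.out.println(parse(lines).records.size());                // 1
        System.out.println(parse(lines, "A", "B", "C").records.size()); // 2
    }
}
```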

Re: [CSV] Headers and the first record

2013-07-30 Gary Gregory
Actually, if you use withHeader(), no args, you _cannot_ get back the first record, so that makes skipHeader=false not possible without making the parser track the first record separately. In the interest of simplicity, I am going to make it simple: if you use withHeader of any kind, then the firs
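Gary's concern is that the no-argument form consumes the first record, so `skipHeader=false` cannot be honored. One way around that is to treat "where the header comes from" and "whether the first record is returned" as two independent settings. In this sketch, `HeaderOptions` and `skipFirstRecord` are hypothetical names, and lines are split naively on commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: header source and header skipping as two independent settings.
final class HeaderOptions {
    final List<String> header;
    final List<List<String>> records = new ArrayList<>();

    // explicitHeader == null: take the header from the first record.
    // skipFirstRecord: whether that first record is withheld from the returned records.
    HeaderOptions(List<String> lines, String[] explicitHeader, boolean skipFirstRecord) {
        for (String line : lines) {
            records.add(Arrays.asList(line.split(",", -1))); // naive split, no quoting
        }
        if (explicitHeader != null) {
            header = Arrays.asList(explicitHeader);
        } else {
            header = records.isEmpty() ? new ArrayList<String>() : records.get(0);
        }
        if (skipFirstRecord && !records.isEmpty()) {
            records.remove(0);
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A,B,C", "1,2,3");
        // Header read from the file, but the header record is still returned.
        System.out.println(new HeaderOptions(lines, null, false).records.size()); // 2
        // Header names supplied in code; the file's own header line is skipped.
        HeaderOptions renamed = new HeaderOptions(lines, new String[] {"x", "y", "z"}, true);
        System.out.println(renamed.header + " " + renamed.records); // [x, y, z] [[1, 2, 3]]
    }
}
```

For comparison, Commons CSV 1.x exposes a separate switch of this kind, `CSVFormat.withSkipHeaderRecord(boolean)`, alongside `withHeader(...)`.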

Re: [CSV] Headers and the first record

2013-07-30 Gary Gregory
Hi All: I see now, the behavior is different depending on what you pass to withHeader()! Confusing indeed. If you call withHeader with Strings, the first line is not read and it is returned as a record. If you call withHeader with no arguments, the first line _is_ read and it is NOT returned as

[CSV] Headers and the first record

2013-07-30 Gary Gregory
Hi All: I have Excel files with headers. So I use withHeaders() of course to map the headers. When I call parser.iterator().next(), the first record is the header record, not data. I always have to skip this first line since it is not data. I wonder if: 1) We should automatically skip the head

Re: [csv] Headers

2012-03-15 Emmanuel Bourg
On 15/03/2012 08:55, Benedikt Ritter wrote:
I'm not sure if I understand the approach completely. The Header can not be accessed as a CSVRecord, right? CSVRecords know the header values through get(string). What happens if the format does not support a header? UnsupportedOperationException?

Re: [csv] Headers

2012-03-15 Benedikt Ritter
On 15 March 2012 01:58, Emmanuel Bourg wrote:
> There is another alternative, we might replace the records returned as a
> String[] by a CSVRecord class able to access the fields by id or by name.
> This would be similar to a JDBC resultset (except for the looping logic)
>
sounds good. This disc

Re: [csv] Headers

2012-03-14 Emmanuel Bourg
There is another alternative, we might replace the records returned as a String[] by a CSVRecord class able to access the fields by id or by name. This would be similar to a JDBC resultset (except for the looping logic) This avoids the duplication of the parser, which might still be generifie
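Emmanuel's `CSVRecord` idea (fields by index or by name, in the spirit of a JDBC `ResultSet`) can be sketched as a thin wrapper over the parsed `String[]`. `SimpleRecord` is a hypothetical class, not the real `CSVRecord`. Benedikt asks what should happen when the format has no header. Throwing `IllegalStateException` in that case is one possible answer, chosen here as an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record type; not the Commons CSV CSVRecord class.
final class SimpleRecord {
    private final String[] values;
    private final Map<String, Integer> mapping; // null when the format has no header

    SimpleRecord(String[] values, Map<String, Integer> mapping) {
        this.values = values;
        this.mapping = mapping;
    }

    // Positional access, like ResultSet.getString(int) but zero-based.
    String get(int index) {
        return values[index];
    }

    // Named access; fails loudly when there is no header to resolve the name.
    String get(String name) {
        if (mapping == null) {
            throw new IllegalStateException("No header mapping available");
        }
        Integer index = mapping.get(name);
        if (index == null) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return values[index];
    }

    public static void main(String[] args) {
        Map<String, Integer> mapping = new HashMap<>();
        mapping.put("name", 0);
        mapping.put("age", 1);
        SimpleRecord r = new SimpleRecord(new String[] {"Alice", "30"}, mapping);
        System.out.println(r.get(1) + " " + r.get("name")); // 30 Alice
    }
}
```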

Re: [csv] Headers

2012-03-13 Benedikt Ritter
I think transforming the result of the parse process into instances of some class is a different concern. That should not be part of a CSVParser. In Hibernate they use ResultTransformers for this purpose [1]. I think we should separate these concerns as well.
[1] http://docs.jboss.org/hibernate/o
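Benedikt's separation of concerns can be sketched as a small callback supplied by the caller. The parsing side only produces `String[]` rows, and a transformer turns each row into an object. `Transformers`, `RecordTransformer`, `transformAll` and `Person` are hypothetical names. They are loosely modeled on the idea of Hibernate's result transformers, not on its API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: parsing yields raw String[] rows; mapping to objects is a separate step.
final class Transformers {

    // One raw row in, one object out.
    interface RecordTransformer<T> {
        T transform(String[] row);
    }

    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return name + "(" + age + ")";
        }
    }

    // Knows nothing about Person; it only applies whatever transformer it is given.
    static <T> List<T> transformAll(List<String[]> rows, RecordTransformer<T> transformer) {
        List<T> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(transformer.transform(row));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(new String[] {"Alice", "30"}, new String[] {"Bob", "25"});
        List<Person> people = transformAll(rows, row -> new Person(row[0], Integer.parseInt(row[1])));
        System.out.println(people); // [Alice(30), Bob(25)]
    }
}
```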

Re: [csv] Headers

2012-03-13 Emmanuel Bourg
On 13/03/2012 09:56, sebb wrote:
> It needs to be possible to access columns by index without having to use annotations.
That's still possible with the low level API. I'm just exploring the features I would expect of a bean mapping.
Emmanuel Bourg

Re: [csv] Headers

2012-03-13 sebb
On 13 March 2012 08:52, Emmanuel Bourg wrote:
> On 13/03/2012 09:21, Jörg Schaible wrote:
>
>>> If the file has a header, the fields are matched by attribute name, and
>>> an annotation can override the name of the column associated to an
>>> attribute.
>>
>> Yeah, but that's not required.

Re: [csv] Headers

2012-03-13 Emmanuel Bourg
On 13/03/2012 09:21, Jörg Schaible wrote:
>> If the file has a header, the fields are matched by attribute name, and
>> an annotation can override the name of the column associated to an
>> attribute.
> Yeah, but that's not required. Just because you can read the names of the columns does not mean tha
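The bean mapping Emmanuel describes can be sketched with plain reflection: fields are matched by attribute name when the file has a header, and an annotation can override the column name. The `@Column` annotation, `map` and `Person` are hypothetical. Nothing here is an existing [csv] API, and the proposed `CSVFormat.DEFAULT.withType(Person.class)` entry point is not reproduced. The raw `String[]` row is still the input, so the index-based access that sebb asks to keep remains available.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.List;

// Hypothetical bean-mapping sketch: fields match header names unless @Column overrides them.
final class BeanMapping {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String value();
    }

    static final class Person {
        String name;
        @Column("years") String age; // the file's "years" column feeds this field
    }

    static <T> T map(Class<T> type, String[] header, String[] row) throws ReflectiveOperationException {
        T bean = type.getDeclaredConstructor().newInstance();
        List<String> columns = Arrays.asList(header);
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            String columnName = column != null ? column.value() : field.getName();
            int index = columns.indexOf(columnName);
            if (index >= 0) {
                field.setAccessible(true);
                field.set(bean, row[index]); // String fields only, for brevity
            }
        }
        return bean;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String[] header = {"years", "name"};
        Person p = map(Person.class, header, new String[] {"30", "Alice"});
        System.out.println(p.name + " " + p.age); // Alice 30
    }
}
```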

Re: [csv] Headers

2012-03-13 Jörg Schaible
Emmanuel Bourg wrote:
> On 13/03/2012 00:56, sebb wrote:
>
>>> 1. Do nothing and address it in the next release with the bean mapping.
>>> Parsing the file would then look like this:
>>>
>>> CSVFormat format = CSVFormat.DEFAULT.withType(Person.class);
>>> for (Person person : format.parse

Re: [csv] Headers

2012-03-13 Luc Maisonobe
On 13/03/2012 00:56, sebb wrote:
> On 12 March 2012 22:11, Emmanuel Bourg wrote:
>> [csv] is missing some elements to ease the use of headers. I have no clear
>> idea on how to address this, here are my thoughts.
>>
>> Headers are used when the fields are accessed by the column name rather than

Re: [csv] Headers

2012-03-12 Emmanuel Bourg
On 13/03/2012 00:56, sebb wrote:
>> 1. Do nothing and address it in the next release with the bean mapping.
>> Parsing the file would then look like this:
>>
>> CSVFormat format = CSVFormat.DEFAULT.withType(Person.class);
>> for (Person person : format.parse(in)) {
>>     persons.add(person);
>> }
Do

Re: [csv] Headers

2012-03-12 sebb
On 12 March 2012 22:11, Emmanuel Bourg wrote:
> [csv] is missing some elements to ease the use of headers. I have no clear
> idea on how to address this, here are my thoughts.
>
> Headers are used when the fields are accessed by the column name rather than
> by the index. This provides some flexib

[csv] Headers

2012-03-12 Emmanuel Bourg
[csv] is missing some elements to ease the use of headers. I have no clear idea on how to address this, here are my thoughts. Headers are used when the fields are accessed by the column name rather than by the index. This provides some flexibility because the input file can be slightly modifie
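The sketch below makes the flexibility of name-based access concrete. It reads the same value by index and by header name from two versions of a file whose columns were reordered. Reordering is an assumed example of a "slight modification", since the original message is truncated. `ReorderDemo` and `byName` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration: index-based vs name-based access after columns are reordered.
final class ReorderDemo {

    // Look up a field by column name using the file's own header line.
    static String byName(String[] header, String[] row, String name) {
        int index = Arrays.asList(header).indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + name);
        }
        return row[index];
    }

    public static void main(String[] args) {
        String[] headerV1 = {"id", "email"};
        String[] rowV1 = {"1", "a@example.org"};
        // A later version of the file inserts a column and moves "email".
        String[] headerV2 = {"email", "phone", "id"};
        String[] rowV2 = {"a@example.org", "555", "1"};

        // Index 1 now points at "phone" in the second file.
        System.out.println(rowV1[1] + " / " + rowV2[1]); // a@example.org / 555
        // Name-based access still finds the right column in both files.
        System.out.println(byName(headerV1, rowV1, "email") + " / "
                + byName(headerV2, rowV2, "email"));      // a@example.org / a@example.org
    }
}
```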