I have an incoming xlsx file, with many sheets. I am trying to use ConvertExcelToCSVProcessor processor to extract the sheets. It is currently erroring when it extracts the header.
Here is what the header is in one of the sheets (commas added by me for clarity): Order,FAO,ISO3,Country/territory,Region,Sub-Region,Inc I,Inc II,Notes I,Notes II,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,,1961 This is what it should be: Order,FAO,ISO3,Country/territory,Region,Sub-Region,Inc I,Inc II,Notes I,Notes II,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019, *2020,,1961* ,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020 Note that there is a gap column between the two sections - one section provides data in one unit of measure, the second in hectares of arable land. This is how I have my Convert processor configured: Sheets to Extract No value setNumber of Rows to Skip 0 Columns To Skip No value setFormat Cell Values true CSV Format Microsoft Excel Include Header Line true How must I tune this configuration to successfully extract all the header fields? And why would it make it past that gap column, only to choke after the first column of the second section? It seems that if it was going to give up the ghost, it would do it when it encounters that gap column. How does this processor identify the row that is the header? I'd like to know what it does in case I need to write my own Groovy script to pull this data. What characteristic does it use? What heuristic does it apply to skip all the comment rows that sometimes precede the header row. Thank you.