Yes, that’s my belief. As of now, we’re treating them as text files, which can lead to some really long, bogus tokens in Lucene/Solr with analyzers that don’t split on commas. ☹
Detection without filename would be difficult.

From: lewis john mcgibbney [mailto:lewi...@apache.org]
Sent: Friday, June 19, 2015 9:59 AM
To: user@tika.apache.org
Subject: CSV Parser in Tika

Hi Folks,

Am I correct in saying that we can't detect CSV in Tika? We import commons-csv in tika-parsers/pom.xml, however I don't see a csv package and registered parser.

Also, when I use the webapp I get the following for a test csv file with semicolon ';' separators

Content-Encoding: ISO-8859-1
Content-Length: 217
Content-Type: text/plain; charset=ISO-8859-1
X-Parsed-By: org.apache.tika.parser.DefaultParser
resourceName: test-semicolon.csv

Any comments please?
Thanks
Lewis
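
To illustrate why detecting CSV from content alone (no filename) is hard, here is a rough sketch of a naive delimiter heuristic. It is not Tika code and not how Tika detects anything; the function name `guess_delimiter`, the candidate list, and the "same count on every line" rule are all assumptions made up for this example. It also ignores quoted fields entirely.

```python
def guess_delimiter(text, candidates=(",", ";", "\t", "|"), min_lines=2):
    """Hypothetical heuristic: return a delimiter that occurs the same
    nonzero number of times on every non-empty line, else None."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < min_lines:
        return None  # too little evidence to guess anything
    for delim in candidates:
        # Collect the distinct per-line counts of this candidate.
        counts = {ln.count(delim) for ln in lines}
        # Consistent and nonzero across all lines: treat it as delimited text.
        if len(counts) == 1 and counts != {0}:
            return delim
    return None


# A semicolon-separated file is recognised...
semicolon_csv = "name;age\nalice;30\nbob;25\n"
# ...but so is a short note whose lines each happen to end with a comma.
short_note = "Dear Bob,\nThanks again,\nAlice,\n"
# Ordinary prose with uneven comma counts is rejected.
prose = "Hello, world.\nThis is plain text, with commas, here.\n"
```

The `short_note` case is the problem: plain text can easily look "delimited", so a content-only check like this produces false positives that a filename hint such as `.csv` would avoid.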