Hey Tim and Lewis,

My students and I built a Tika TSVParser and a JSONContentHandler in my course a few semesters ago. I'm going to whip them up and contribute them back.
Cheers,
Chris

--
Chris Mattmann
chris.mattm...@gmail.com

-----Original Message-----
From: "Allison, Timothy B." <talli...@mitre.org>
Reply-To: <user@tika.apache.org>
Date: Friday, June 19, 2015 at 7:27 AM
To: "user@tika.apache.org" <user@tika.apache.org>
Subject: RE: CSV Parser in Tika

>Y, that’s my belief.
>
>As of now, we’re treating them as text files, which can lead to some
>really long bogus tokens in Lucene/Solr with analyzers that don’t split
>on commas.
>
>Detection without filename would be difficult.
>
>From: lewis john mcgibbney [mailto:lewi...@apache.org]
>Sent: Friday, June 19, 2015 9:59 AM
>To: user@tika.apache.org
>Subject: CSV Parser in Tika
>
>Hi Folks,
>
>Am I correct in saying that we can't detect CSV in Tika?
>
>We import commons-csv in tika-parsers/pom.xml; however, I don't see a csv
>package or a registered parser.
>
>Also, when I use the webapp I get the following for a test csv file with
>semicolon ';' separators:
>
>Content-Encoding: ISO-8859-1
>Content-Length: 217
>Content-Type: text/plain; charset=ISO-8859-1
>X-Parsed-By: org.apache.tika.parser.DefaultParser
>resourceName: test-semicolon.csv
>
>Any comments please?
>
>Thanks
>
>Lewis
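Tim points out that detecting CSV without a filename would be difficult, and Lewis's test file uses ';' rather than ','. One naive content-based heuristic, sketched here purely as an assumption (the class and method names are hypothetical and are not part of Tika's detector or parser API), is to check whether some candidate separator occurs the same nonzero number of times on every sampled line.

```java
import java.util.List;

/** Hypothetical helper, not part of Tika: guesses a field separator from sample lines. */
public class DelimiterSniffer {

    // Candidate separators to try, in order of preference.
    private static final char[] CANDIDATES = {',', ';', '\t', '|'};

    /**
     * Returns the first candidate that occurs the same, nonzero number of
     * times on every non-empty sample line, or '\0' if none qualifies.
     * Quoted fields are not handled, so this is only a rough heuristic.
     */
    public static char guessDelimiter(List<String> lines) {
        for (char c : CANDIDATES) {
            int expected = -1;        // count seen on the first non-empty line
            boolean consistent = true;
            for (String line : lines) {
                if (line.isEmpty()) {
                    continue;         // skip blank lines
                }
                int count = 0;
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == c) {
                        count++;
                    }
                }
                // Reject this candidate if it is missing or its count varies.
                if (count == 0 || (expected != -1 && count != expected)) {
                    consistent = false;
                    break;
                }
                expected = count;
            }
            if (consistent && expected > 0) {
                return c;
            }
        }
        return '\0';
    }

    public static void main(String[] args) {
        List<String> sample = List.of("id;name;value", "1;alpha;10", "2;beta;20");
        System.out.println("guessed delimiter: '" + guessDelimiter(sample) + "'");
    }
}
```

A heuristic like this ignores quoting and can misfire on prose that happens to contain a regular number of commas per line, which is part of why filename-free detection is hard.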