Re: CSV Parser in Tika

Chris Mattmann Wed, 01 Jul 2015 15:08:58 -0700

Hey Tim, and Lewis,

My students and I did a Tika TSVParser and a JSONContentHandler
in my course a few semesters ago. I am going to whip it up
and contribute it back.


Cheers,
Chris

—
Chris Mattmann
chris.mattm...@gmail.com

-----Original Message-----
From: "Allison, Timothy B." <talli...@mitre.org>
Reply-To: <user@tika.apache.org>
Date: Friday, June 19, 2015 at 7:27 AM
To: "user@tika.apache.org" <user@tika.apache.org>
Subject: RE: CSV Parser in Tika

>Y, that’s my belief.
> 
>As of now, we’re treating them as text files, which can lead to some
>really long, bogus tokens in Lucene/Solr with analyzers that don’t split
>on commas.
> 
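To make the token problem concrete, here is a plain-Java illustration. It is not Lucene's analyzer API, and the class and method names are made up for this sketch. A tokenizer that splits only on whitespace keeps an entire comma-separated row as one token, while one that also treats the delimiter as a boundary yields one token per field:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene/Solr code.
public class TokenDemo {
    // Splits only on runs of whitespace, roughly what an analyzer that
    // does not split on commas would do with a CSV row.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Also treats the given delimiter as a token boundary by turning it
    // into a space and reusing the whitespace split.
    static List<String> delimiterTokens(String text, char delimiter) {
        return whitespaceTokens(text.replace(delimiter, ' '));
    }

    public static void main(String[] args) {
        String row = "42,Smith,Springfield,2015-06-19";
        System.out.println(whitespaceTokens(row));     // [42,Smith,Springfield,2015-06-19]
        System.out.println(delimiterTokens(row, ',')); // [42, Smith, Springfield, 2015-06-19]
    }
}
```

The second variant also collapses empty fields (`a,,b`), which is harmless for indexing but not a faithful CSV parse.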
>Detection without filename would be difficult.
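As a rough illustration of why content-only detection is shaky, here is a hypothetical delimiter sniffer in plain Java. It is not how Tika's detectors work; the candidate set (comma, semicolon, tab) and the rule "same non-zero count on every non-empty line, over at least two lines" are assumptions of this sketch. It would recognize a semicolon-separated file like Lewis's, but it ignores quoted fields, and ordinary prose with a steady number of commas per line would fool it:

```java
// Hypothetical sketch; not part of Tika.
public class DelimiterSniffer {
    // Assumed candidate set: comma, semicolon, tab.
    private static final char[] CANDIDATES = {',', ';', '\t'};

    // Returns the candidate that occurs the same non-zero number of times on
    // every non-empty line (at least two such lines), preferring the highest
    // per-line count; ties go to the earlier candidate. Returns 0 when no
    // candidate qualifies. Quoted fields containing the delimiter are NOT handled.
    static char guessDelimiter(String sample) {
        String[] lines = sample.split("\\r?\\n");
        char best = 0;
        int bestCount = 0;
        for (char c : CANDIDATES) {
            int expected = -1;
            int checked = 0;
            boolean consistent = true;
            for (String line : lines) {
                if (line.isEmpty()) {
                    continue;
                }
                int n = count(line, c);
                if (expected == -1) {
                    expected = n;
                } else if (n != expected) {
                    consistent = false;
                    break;
                }
                checked++;
            }
            if (consistent && checked >= 2 && expected > bestCount) {
                best = c;
                bestCount = expected;
            }
        }
        return best;
    }

    // Counts occurrences of c in line.
    private static int count(String line, char c) {
        int n = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String semicolons = "name;city;zip\nAnn;Paris;75001\nBob;Lyon;69001\n";
        String prose = "Hello, world.\nThis line has no commas.\n";
        System.out.println(guessDelimiter(semicolons) == ';'); // true
        System.out.println(guessDelimiter(prose) == 0);        // true
    }
}
```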
> 
>From: lewis john mcgibbney [mailto:lewi...@apache.org]
>Sent: Friday, June 19, 2015 9:59 AM
>To: user@tika.apache.org
>Subject: CSV Parser in Tika
> 
>Hi Folks,
>
>Am I correct in saying that we can't detect CSV in Tika?
>
>We import commons-csv in tika-parsers/pom.xml; however, I don't see a csv
>package or a registered parser.
>
>Also, when I use the webapp, I get the following for a test CSV file with
>semicolon (';') separators:
>
>Content-Encoding: ISO-8859-1
>Content-Length: 217
>Content-Type: text/plain; charset=ISO-8859-1
>X-Parsed-By: org.apache.tika.parser.DefaultParser
>resourceName: test-semicolon.csv
>
>Any comments please?
>
>Thanks
>
>Lewis
