Guys -
These are great ideas, and I look forward to having those features.
As a short-term, marginal improvement though, until we can do the full
solution, would it make sense to treat .csv files as plain text? I'm assuming
that's just a matter of adding the following line to tika-mimetypes.xml:
<glob pattern="*.csv" />
to:
<mime-type type="text/plain">
  <magic priority="50">
    <match value="This is TeX," type="string" offset="0" />
    <match value="This is METAFONT," type="string" offset="0" />
  </magic>
  <glob pattern="*.txt" />
  <glob pattern="*.asc" />
</mime-type>
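In other words, the entry would end up looking roughly like the sketch below.
Only the *.csv glob line is new; everything else is copied from the existing
entry, and I'm assuming the new glob can simply follow the other two:

<mime-type type="text/plain">
  <magic priority="50">
    <match value="This is TeX," type="string" offset="0" />
    <match value="This is METAFONT," type="string" offset="0" />
  </magic>
  <glob pattern="*.txt" />
  <glob pattern="*.asc" />
  <!-- proposed addition: treat .csv files as plain text -->
  <glob pattern="*.csv" />
</mime-type>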
Would doing so cause other problems, though? Should we treat files other than
.txt and .asc as binary unless byte header detection reports that they're
plain text? If so, perhaps we should wait until that detection is working
instead of adding the .csv glob pattern?
- Keith
--
View this message in context:
http://www.nabble.com/Add-CSV-as-a-plain-text-extension--tf4649726.html#a13296449
Sent from the Apache Tika - Development mailing list archive at Nabble.com.