Guys -

These are great ideas, and I look forward to having those features.

As a short-term, marginal improvement, though, until we can do the full
solution, would it make sense to treat .csv as plain text?  I'm assuming
that's just a matter of adding this line to tika-mimetypes.xml:

                <glob pattern="*.csv" />

to:

        <mime-type type="text/plain">
                <magic priority="50">
                        <match value="This is TeX," type="string" offset="0" />
                        <match value="This is METAFONT," type="string" offset="0" />
                </magic>
                <glob pattern="*.txt" />
                <glob pattern="*.asc" />
        </mime-type>
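
Assuming the new glob simply sits alongside the existing ones (a sketch
only, not tested against the detector), the entry would then read:

        <mime-type type="text/plain">
                <magic priority="50">
                        <match value="This is TeX," type="string" offset="0" />
                        <match value="This is METAFONT," type="string" offset="0" />
                </magic>
                <glob pattern="*.txt" />
                <glob pattern="*.asc" />
                <!-- proposed addition: treat .csv files as plain text -->
                <glob pattern="*.csv" />
        </mime-type>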

Would doing so cause other problems though?  Should we consider non-.txt and
non-.asc files binary unless byte header detection reports they're plain
text?  Perhaps we should wait until that's working instead of adding the
.csv glob pattern?

- Keith


-- 
View this message in context: 
http://www.nabble.com/Add-CSV-as-a-plain-text-extension--tf4649726.html#a13296449
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
