Hi,

On 10/19/07, Keith R. Bennett <[EMAIL PROTECTED]> wrote:
> As a short term marginal improvement though, until we can do the full
> solution, would it make sense to consider .csv plain text? I'm assuming
> that's just a matter of adding to tika-mimetypes.xml a line:
>
> <glob pattern="*.csv" />

Sounds OK. File an improvement request for that and feel free to
commit the change.

> Would doing so cause other problems though? Should we consider non-.txt and
> non-.asc files binary unless byte header detection reports they're plain
> text? Perhaps we should wait until that's working instead of adding the
> .csv glob pattern?

I don't see how that could cause problems. Anything that's a sequence
of characters should be fine as text/plain unless we have a more
specific type available.

In fact, based on http://www.apache.org/dev/svn-eol-style.txt we could
add the following as text/plain globs:

    <glob pattern="INSTALL"/>
    <glob pattern="KEYS"/>
    <glob pattern="Makefile"/>
    <glob pattern="README"/>
    <glob pattern="abs-linkmap"/>
    <glob pattern="abs-menulinks"/>
    <glob pattern="*.aart"/>
    <glob pattern="*.ac"/>
    <glob pattern="*.am"/>
    <glob pattern="*.bat"/>
    <glob pattern="*.c"/>
    <glob pattern="*.cat"/>
    <glob pattern="*.cgi"/>
    <glob pattern="*.classpath"/>
    <glob pattern="*.cmd"/>
    <glob pattern="*.conf"/>
    <glob pattern="*.config"/>
    <glob pattern="*.cpp"/>
    <glob pattern="*.css"/>
    <glob pattern="*.cwiki"/>
    <glob pattern="*.data"/>
    <glob pattern="*.dcl"/>
    <glob pattern="*.dtd"/>
    <glob pattern="*.egrm"/>
    <glob pattern="*.ent"/>
    <glob pattern="*.ft"/>
    <glob pattern="*.fn"/>
    <glob pattern="*.fv"/>
    <glob pattern="*.grm"/>
    <glob pattern="*.g"/>
    <glob pattern="*.h"/>
    <glob pattern=".htaccess"/>
    <glob pattern="*.ihtml"/>
    <glob pattern="*.in"/>
    <glob pattern="*.java"/>
    <glob pattern="*.jmx"/>
    <glob pattern="*.jsp"/>
    <glob pattern="*.js"/>
    <glob pattern="*.junit"/>
    <glob pattern="*.jx"/>
    <glob pattern="*.manifest"/>
    <glob pattern="*.m4"/>
    <glob pattern="*.mf"/>
    <glob pattern="*.MF"/>
    <glob pattern="*.meta"/>
    <glob pattern="*.mod"/>
    <glob pattern="*.n3"/>
    <glob pattern="*.pen"/>
    <glob pattern="*.pl"/>
    <glob pattern="*.pm"/>
    <glob pattern="*.pod"/>
    <glob pattern="*.pom"/>
    <glob pattern="*.project"/>
    <glob pattern="*.properties"/>
    <glob pattern="*.py"/>
    <glob pattern="*.rb"/>
    <glob pattern="*.rdf"/>
    <glob pattern="*.rnc"/>
    <glob pattern="*.rng"/>
    <glob pattern="*.rnx"/>
    <glob pattern="*.roles"/>
    <glob pattern="*.rss"/>
    <glob pattern="*.sh"/>
    <glob pattern="*.sql"/>
    <glob pattern="*.svg"/>
    <glob pattern="*.tld"/>
    <glob pattern="*.types"/>
    <glob pattern="*.vm"/>
    <glob pattern="*.vsl"/>
    <glob pattern="*.wsdd"/>
    <glob pattern="*.wsdl"/>
    <glob pattern="*.xargs"/>
    <glob pattern="*.xcat"/>
    <glob pattern="*.xconf"/>
    <glob pattern="*.xegrm"/>
    <glob pattern="*.xgrm"/>
    <glob pattern="*.xlex"/>
    <glob pattern="*.xlog"/>
    <glob pattern="*.xmap"/>
    <glob pattern="*.xroles"/>
    <glob pattern="*.xsamples"/>
    <glob pattern="*.xsd"/>
    <glob pattern="*.xsl"/>
    <glob pattern="*.xslt"/>
    <glob pattern="*.xsp"/>
    <glob pattern="*.xul"/>
    <glob pattern="*.xweb"/>
    <glob pattern="*.xwelcome"/>

BR,

Jukka Zitting
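
PS. For reference, a rough sketch of where the proposed .csv line would probably sit. This assumes tika-mimetypes.xml follows the freedesktop shared-mime-info layout, with glob elements nested inside a mime-type element. The surrounding elements and the existing *.txt and *.asc entries are assumptions inferred from this thread, not copied from the actual file.

```xml
<!-- Hypothetical excerpt; element nesting is assumed, not verified
     against the current tika-mimetypes.xml. -->
<mime-info>
  <mime-type type="text/plain">
    <!-- Assumed existing entries (the thread mentions .txt and .asc) -->
    <glob pattern="*.txt"/>
    <glob pattern="*.asc"/>
    <!-- Proposed addition -->
    <glob pattern="*.csv"/>
  </mime-type>
</mime-info>
```

Any of the globs listed above would presumably go into the same text/plain entry, as long as no more specific type already claims the pattern.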
