[ https://issues.apache.org/jira/browse/TIKA-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637302#action_12637302 ]
Jukka Zitting commented on TIKA-167:
------------------------------------

The presentation looks great!

> What does TIKA mean? (literally)

"Tika" comes from the name of Jérôme Charron's son. Jérôme first proposed the project in 2006.

> How many registered media types; glob/magic header (slide 7)

We currently have 77 registered MIME types (plus 36 aliases), 179 glob patterns, and 18 magic patterns.

> How many supported media types (slide 7)

It depends on what you mean by "supported". We currently have 15 parser classes configured for a total of 66 MIME types.

> How many committers? (slide 7)

Six, plus Dave Meikle, who was just voted in.

> Do we have a download/checkout history for Tika? (eventually slide 10)

Not really. I could try to dig something up if you want, though I expect the numbers to still be fairly low, as we've kept a relatively low profile so far.

> Future goals; to be completed? (slide 31)

Main goals off the top of my head (see the issue tracker for more):
- Improved metadata handling, perhaps with XMP support
- Better configurability of Tika
- Improved media type registry
- More parser implementations

> Next parsers to be implemented? (slide 32)

- Office Open XML, based on a POI upgrade
- Structural parsers (i.e. more than just a flat text stream) for PDF, Word, OpenDocument, etc.
- More multimedia formats: image, audio, video

> Who uses Tika? projects using Tika (slide 33)
> Integration scenarios with other Lucene projects (slide 34)

Not that many yet, as we're still incubating. Beyond Nutch, we have at least Apache Jackrabbit, which has a sandbox component with Tika support; the Droids lab (to be incubated), which is currently adding Tika integration; and the UIMA project (incubating), which has a proposed patch with Tika support.

> Related projects: others? (slide 34)

Aperture (http://aperture.sourceforge.net/) is another project with similar (though wider) goals.
> Tika presentation @ ApacheConUs 2008: review
> --------------------------------------------
>
> Key: TIKA-167
> URL: https://issues.apache.org/jira/browse/TIKA-167
> Project: Tika
> Issue Type: Task
> Components: documentation
> Affects Versions: 0.2-incubating
> Reporter: Paolo Mottadelli
> Attachments: ApacheConUS2008_Tika_PaoloMottadelli.pdf
>
> As I have not been involved in the development process, it would be great if
> someone could review the Tika part of my presentation. I am attaching a rough
> version of my slides covering Tika and listing some *** Open Points ***.
> Please let me know if I am out of scope in some parts and how I could
> improve it.
> *** Open Points: ***
> * What does TIKA mean? (literally)
> * How many registered media types; glob/magic header (slide 7)
> * How many supported media types (slide 7)
> * How many committers? (slide 7)
> * Do we have a download/checkout history for Tika? (eventually slide 10)
> * Future goals; to be completed? (slide 31)
> * Next parsers to be implemented? (slide 32)
> * Who uses Tika? projects using Tika (slide 33)
> * Integration scenarios with other Lucene projects (slide 34)
> * Related projects: others? (slide 34)