[ https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney resolved TIKA-94.
--------------------------------------
    Resolution: Fixed

4923 days after this ticket was originally opened, I'm glad to say that it's been merged into main!
The implementation can be improved. For example, we may deprecate the tika.transcribe API and simply `hide` the transcribe functionality behind tika.parse. At the end of the day we just want to interpret a file and get text back, so why introduce a new API to do that? That, however, remains as follow-on work. :)
Excellent work and kudos to the following contributors (alphabetical order):
Ashank Behara
Minh Phan
Nic Prate
Rohan Surana
Sahan Kumarasinghe
This feature was developed through the HackIllinois Program - https://www.hackillinois.org/ which was very well organized and well executed.

> Speech-to-text transcription
> ----------------------------
>
>                 Key: TIKA-94
>                 URL: https://issues.apache.org/jira/browse/TIKA-94
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>              Labels: new-parser
>
> Like OCR for image files (TIKA-93), we could try using speech recognition to
> extract text content (where available) from audio (and video!) files.
> The CMU Sphinx engine (http://cmusphinx.sourceforge.net/) looks promising and
> comes with a friendly license.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)