Tika OneNote Support

2012-11-14 Thread 122jxgcn
Hello, has anyone worked on extracting content from MS OneNote files (*.one)? It would be great if someone could tell me how to parse OneNote files programmatically. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-tp4020393.htm

How can I let Tika know the resource name?

2012-08-13 Thread 122jxgcn
Hello, I'm using Solr's ExtractingRequestHandler to let Tika know the name of the file when indexing. I'm currently sending an HTTP request like /update/extract?stream.file=#{filepath}&literal.id=#{filepath}&resource.name=#{resource_name}&commit=true Will setting the resource.name variabl
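The request above interpolates the file path and resource name directly into the query string, which breaks as soon as a value contains spaces, `&`, or non-ASCII characters. As far as the Solr documentation describes it, `resource.name` is an optional file name that Tika can use as a hint when detecting the content type. The following is a minimal sketch of building the same URL with each value encoded; the class name `ExtractUrlBuilder` and the base URL in the usage note are assumptions for illustration, not part of the original message.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the /update/extract URL from the message,
// URL-encoding each parameter value.
final class ExtractUrlBuilder {

    static String build(String solrBaseUrl, String filePath, String resourceName) {
        return solrBaseUrl + "/update/extract"
                + "?stream.file=" + encode(filePath)
                + "&literal.id=" + encode(filePath)
                + "&resource.name=" + encode(resourceName)
                + "&commit=true";
    }

    // Form-encodes a single query parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `ExtractUrlBuilder.build("http://localhost:8983/solr", "/data/my file.hwp", "my file.hwp")` (the host and paths are made up) yields a URL in which the space is encoded as `+` and each `/` in the path as `%2F`.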

Detecting content type with file extension

2012-08-07 Thread 122jxgcn
Hello, I'm having trouble with auto-detection of a custom content type. I have done some debugging and traced the sequence of calls. Inside the test, when AutoDetectParser() gets called, the detector tries to detect the MediaType. And inside the detect function of MimeTypes.java (still not sure if this is

Re: AutoDetectParser not picking up custom parser

2012-08-06 Thread 122jxgcn
Nick Burch-2 wrote:
> On Mon, 6 Aug 2012, 122jxgcn wrote:
>> In order for AutoDetectParser to pick up my parser, I followed the
>> instructions listed in
>> http://tika.apache.org/1.1/parser_guide.html#List_the_new_parser
>> However, AutoDetectParser is not

AutoDetectParser not picking up custom parser

2012-08-06 Thread 122jxgcn
Hello, thanks to the users who helped with my custom parser. I have one last question (hopefully) regarding my parser. In order for AutoDetectParser to pick up my parser, I followed the instructions listed in http://tika.apache.org/1.1/parser_guide.html#List_the_new_parser However, AutoDetectParser

Executing file inside Parser

2012-08-02 Thread 122jxgcn
Hi, I'm trying to execute a binary file inside my custom parser. I put the binary at tika-parsers/src/main/resources/bin/hwp2xml.bin, and in my parser I'm doing something like: if (!tstream.hasFile()) File f = tstream.getFile(); Process ps = Runtime.getRuntime().exec("/bin/hwp2xml.bin",
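One point worth noting about the snippet above: a file under src/main/resources ends up on the classpath (possibly inside a jar), whereas Runtime.exec resolves "/bin/hwp2xml.bin" as an absolute path on the file system. A common workaround is to copy the resource to a temporary file, mark it executable, and run that copy. The sketch below assumes this approach; the class `BundledBinary`, its method names, and the idea of passing the input file as the only argument are hypothetical, since the original call is cut off before its arguments.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for running an executable that ships as a classpath resource.
final class BundledBinary {

    // Copies the resource stream to a temp file and marks it executable.
    static Path extract(InputStream resource, String suffix) throws IOException {
        if (resource == null) {
            throw new IOException("resource not found on the classpath");
        }
        Path target = Files.createTempFile("bundled-", suffix);
        target.toFile().deleteOnExit();
        try (InputStream in = resource) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        if (!target.toFile().setExecutable(true)) {
            throw new IOException("could not mark " + target + " as executable");
        }
        return target;
    }

    // Runs the extracted binary with one argument (assumed: the input file path)
    // and returns its exit code.
    static int run(Path binary, Path input) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(binary.toString(), input.toString())
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

Inside a parser this might be used as `BundledBinary.extract(getClass().getResourceAsStream("/bin/hwp2xml.bin"), ".bin")`, followed by `BundledBinary.run(binary, tstream.getFile().toPath())`.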

Re: Custom parser error

2012-07-31 Thread 122jxgcn
Hi Nick, sorry to bother you again, but I'm not quite sure what you meant.

Nick Burch-2 wrote:
> On Tue, 31 Jul 2012, 122jxgcn wrote:
> If your TikaInputStream lacks a file, and getFile is called, one will
> automatically be created for you. (That's part of the point!

Re: Custom parser error

2012-07-31 Thread 122jxgcn
Hi Nick, I tried TikaInputStream.get() and tstream is no longer null. But it seems that tstream.hasFile() is null. I'm pretty sure I'm loading the file correctly, as I did the same thing with the parser for PDF. -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-parser-error-tp39983

Custom parser error

2012-07-31 Thread 122jxgcn
Hi, I'm continuing my question from this post: http://lucene.472066.n3.nabble.com/Convert-file-before-Tika-processes-it-td3990629.html I wrote some code and a test, but the test is not passing. In the test, I did something like: InputStream stream = HWPParserTest.class.getResourceAsStream( "/t

Tika build error using Maven

2012-07-11 Thread 122jxgcn
Hi, I'm trying to build Apache Tika from source, but when I run mvn -e clean install in the base directory, I get the following compile error: error: error reading /home/park/.m2/repository/org/apache/poi/poi/3.8-beta5/poi-3.8-beta5.jar; error in opening zip file [INFO] Prepari

Convert file before Tika processes it?

2012-06-20 Thread 122jxgcn
Hi, I'm currently working on getting Tika to properly process a custom file type (*.hwp files). I have a binary executable which converts an hwp file into an XML file. I'm not sure how I can include this binary so that when Tika encounters an hwp file, it can automatically convert it to an XML file using the bin