[jira] [Updated] (TIKA-1437) encoding issue in AutoDetectReader

2014-10-05 Thread Shuai Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Liu updated TIKA-1437: Attachment: ef.jpg e9.jpg the e9.jpg is a screenshot of the raw tsv file; you can see the hex

[jira] [Updated] (TIKA-1437) encoding issue in AutoDetectReader

2014-10-05 Thread Shuai Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Liu updated TIKA-1437: Description: We are having an encoding problem with Tika AutoDetectReader; we are using AutoDetectReader to r

[jira] [Updated] (TIKA-1437) encoding issue in AutoDetectReader

2014-10-05 Thread Shuai Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Liu updated TIKA-1437: Attachment: computrabajo-ar-20121108.tsv The problem tsv file with which we are having the encoding problem.

[jira] [Updated] (TIKA-1437) encoding issue in AutoDetectReader

2014-10-05 Thread Shuai Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Liu updated TIKA-1437: Attachment: EncodingProblem.java Encoding that reads a bunch of tsv files from a directory and prints out the

[jira] [Created] (TIKA-1437) encoding issue in AutoDetectReader

2014-10-05 Thread Shuai Liu (JIRA)
Shuai Liu created TIKA-1437: Summary: encoding issue in AutoDetectReader Key: TIKA-1437 URL: https://issues.apache.org/jira/browse/TIKA-1437 Project: Tika Issue Type: Bug Components: det

Re: [PDFParser] - patch proposal

2014-10-05 Thread Stefano Fornari
done, thanks! https://issues.apache.org/jira/browse/TIKA-1436 Ste On Sun, Oct 5, 2014 at 6:40 PM, Tyler Palsulich wrote: > Hi Stefano, > > Thank you for the patch and the reminder! Could you please create an issue > on the TIKA JIRA [0]? Or, if this patch corresponds to a particular issue, > a

[jira] [Updated] (TIKA-1436) improvement to PDFParser

2014-10-05 Thread Stefano Fornari (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefano Fornari updated TIKA-1436: Attachment: ste-20140927.patch > improvement to PDFParser

[jira] [Created] (TIKA-1436) improvement to PDFParser

2014-10-05 Thread Stefano Fornari (JIRA)
Stefano Fornari created TIKA-1436: Summary: improvement to PDFParser Key: TIKA-1436 URL: https://issues.apache.org/jira/browse/TIKA-1436 Project: Tika Issue Type: Improvement Compo

Re: [PDFParser] - patch proposal

2014-10-05 Thread Tyler Palsulich
Hi Stefano, Thank you for the patch and the reminder! Could you please create an issue on the TIKA JIRA [0]? Or, if this patch corresponds to a particular issue, attach your patch to that issue? Thank you! Tyler [0] https://issues.apache.org/jira/browse/TIKA On Oct 5, 2014 5:57 AM, "Stefano Forn

Re: [PDFParser] - patch proposal

2014-10-05 Thread Stefano Fornari
Hi, a friendly reminder to get feedback on this. Ste On Sat, Sep 27, 2014 at 3:08 PM, Stefano Fornari wrote: > Hi All, > with regard to the thread "[PDFParser] - read limited number of > characters" on Mar 29, I would like to propose the attached patch. I > noticed that in Tika 1.6 there h