Re: A problem in the right-to-left languages

2011-11-06 Thread Ahmad Ajiloo
Hi. Did your inquiry produce a result? On Wed, Nov 2, 2011 at 4:40 AM, Ken Krugler wrote: > I know some of the original team members - I could ask. > > Are there specific questions, or just "is anybody still minding the fire"? > > -- Ken > > On Nov 1, 2011, at 2:43pm, Nick Burch wrote: > > > On Tue

Re: A problem in the right-to-left languages

2011-11-01 Thread Ken Krugler
I know some of the original team members - I could ask. Are there specific questions, or just "is anybody still minding the fire"? -- Ken On Nov 1, 2011, at 2:43pm, Nick Burch wrote: > On Tue, 1 Nov 2011, Robert Muir wrote: >> Well as an alternative for them committing the EBCDIC detection, per

Re: A problem in the right-to-left languages

2011-11-01 Thread Nick Burch
On Tue, 1 Nov 2011, Robert Muir wrote: Well as an alternative for them committing the EBCDIC detection, perhaps we could look at the charset detection APIs and propose some API additions so that users (like Tika) can plug in custom detectors? In theory it should be pluggable, but I seem to rec

Re: A problem in the right-to-left languages

2011-11-01 Thread Robert Muir
On Tue, Nov 1, 2011 at 12:47 PM, Nick Burch wrote: > I've not had any luck with this - I tried submitting some of our changes > back (e.g. the EBCDIC detector) but they didn't seem to want them > Well as an alternative for them committing the EBCDIC detection, perhaps we could look at the Charset

Re: A problem in the right-to-left languages

2011-11-01 Thread Nick Burch
On Tue, 1 Nov 2011, Robert Muir wrote: it would be nice to look at trying to remove the forked charsetdetection code too (whatever changes Tika has, get them into ICU, etc) I've not had any luck with this - I tried submitting some of our changes back (e.g. the EBCDIC detector) but they didn't s

Re: A problem in the right-to-left languages

2011-11-01 Thread Robert Muir
On Tue, Nov 1, 2011 at 9:14 AM, Jukka Zitting wrote: > Hi, > > On Tue, Nov 1, 2011 at 1:48 PM, Robert Muir wrote: >> I really think Tika should include the parts of ICU4J it depends on. >> Often open source projects are hesitant to include the ICU jar because of >> its size, but that's silly since the

Re: A problem in the right-to-left languages

2011-11-01 Thread Jukka Zitting
Hi, On Tue, Nov 1, 2011 at 1:48 PM, Robert Muir wrote: > I really think Tika should include the parts of ICU4J it depends on. > Often open source projects are hesitant to include the ICU jar because of > its size, but that's silly since the size is just a catch-all. > We can use the webapp to make a s

Re: A problem in the right-to-left languages

2011-11-01 Thread Michael McCandless
On Tue, Nov 1, 2011 at 8:48 AM, Robert Muir wrote: > I really think Tika should include the parts of ICU4J it depends on. > Often open source projects are hesitant to include the ICU jar because of > its size, but that's silly since the size is just a catch-all. > We can use the webapp to make a small

Re: A problem in the right-to-left languages

2011-11-01 Thread Robert Muir
On Tue, Nov 1, 2011 at 6:24 AM, Ahmad Ajiloo wrote: > Yes, there is a difference. In Nutch we have an ICU4J library in the lib > directory, but there is no ICU4J lib or class file in the single Tika jar > file. For example, in the pdfbox jar file we have this path: com.ibm.icu, but > there is no com.ibm path

Re: A problem in the right-to-left languages

2011-11-01 Thread Ahmad Ajiloo
Yes, there is a difference. In Nutch we have an ICU4J library in the lib directory, but there is no ICU4J lib or class file in the single Tika jar file. For example, in the pdfbox jar file we have this path: com.ibm.icu, but there is no com.ibm path in the Tika jar file. How can I add the ICU4J library to the tika j

Re: A problem in the right-to-left languages

2011-10-31 Thread Robert Muir
Do you have the ICU4J jar in your classpath in both situations? On Mon, Oct 31, 2011 at 1:35 PM, ahmad ajiloo wrote: > Hello > When I use Tika to extract text from my Persian PDF files, all the characters are > extracted in reverse order. I mean that the characters are shown from the beginning of > the line to the

A problem in the right-to-left languages

2011-10-31 Thread ahmad ajiloo
Hello When I use Tika to extract text from my Persian PDF files, all the characters are extracted in reverse order. I mean that the characters are shown from the beginning of the line to the end, but from left to right. However, when I use the Tika GUI via Nutch there is no mistake and the output text is right-to-left
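
The replies above ask whether the ICU4J jar is on the classpath in both setups. One way to check this from inside the JVM that runs the extraction is sketched below. It is a minimal sketch, not part of Tika or Nutch: the class name Icu4jClasspathCheck and the helper isOnClasspath are made up for illustration, and com.ibm.icu.text.Bidi is used only as a representative ICU4J class under the com.ibm.icu package mentioned in the thread.

```java
// Hypothetical helper (not from Tika or Nutch): reports whether a class
// can be loaded by the class loader that loaded this check.
public class Icu4jClasspathCheck {

    // Returns true if the named class is visible on the classpath.
    // The class is located but not initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false,
                    Icu4jClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: com.ibm.icu.text.Bidi stands in for "ICU4J is present".
        String icuBidi = "com.ibm.icu.text.Bidi";
        System.out.println(icuBidi + " on classpath: " + isOnClasspath(icuBidi));
    }
}
```

Running the same check in both environments (the standalone Tika jar and the Nutch setup with ICU4J in its lib directory) would show whether the two runs actually differ in ICU4J visibility.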