Hi
Did your inquiry produce a result?
On Wed, Nov 2, 2011 at 4:40 AM, Ken Krugler wrote:
> I know some of the original team members - I could ask.
>
> Are there specific questions, or just "is anybody still minding the fire"?
>
> -- Ken
>
> On Nov 1, 2011, at 2:43pm, Nick Burch wrote:
>
> > On Tue
I know some of the original team members - I could ask.
Are there specific questions, or just "is anybody still minding the fire"?
-- Ken
On Nov 1, 2011, at 2:43pm, Nick Burch wrote:
> On Tue, 1 Nov 2011, Robert Muir wrote:
>> Well, as an alternative to them committing the EBCDIC detection, per
On Tue, 1 Nov 2011, Robert Muir wrote:
Well, as an alternative to them committing the EBCDIC detection, perhaps
we could look at the charset detection APIs and propose some API
additions so that users (like Tika) can plug in custom detectors?
In theory it should be pluggable, but I seem to rec
On Tue, Nov 1, 2011 at 12:47 PM, Nick Burch wrote:
> I've not had any luck with this - I tried submitting some of our changes
> back (e.g. the EBCDIC detector) but they didn't seem to want them
>
Well, as an alternative to them committing the EBCDIC detection,
perhaps we could look at the Charset
On Tue, 1 Nov 2011, Robert Muir wrote:
it would be nice to look at removing the forked charset detection
code too (whatever changes Tika has, get them into ICU, etc.)
I've not had any luck with this - I tried submitting some of our changes
back (e.g. the EBCDIC detector) but they didn't s
On Tue, Nov 1, 2011 at 9:14 AM, Jukka Zitting wrote:
> Hi,
>
> On Tue, Nov 1, 2011 at 1:48 PM, Robert Muir wrote:
>> I really think Tika should include the parts of icu4j it depends on.
>> Often open source projects are hesitant to include the ICU jar because of
>> its size, but that's silly since the
Hi,
On Tue, Nov 1, 2011 at 1:48 PM, Robert Muir wrote:
> I really think Tika should include the parts of icu4j it depends on.
> Often open source projects are hesitant to include the ICU jar because of
> its size, but that's silly since the size is just a catch-all.
> We can use the webapp to make a s
On Tue, Nov 1, 2011 at 8:48 AM, Robert Muir wrote:
> I really think Tika should include the parts of icu4j it depends on.
> Often open source projects are hesitant to include the ICU jar because of
> its size, but that's silly since the size is just a catch-all.
> We can use the webapp to make a small
On Tue, Nov 1, 2011 at 6:24 AM, Ahmad Ajiloo wrote:
> Yes, there is a difference. In Nutch we have an ICU4J library in the lib
> directory, but there is no ICU4J lib or class file in a single Tika jar
> file. For example, in the pdfbox jar file we have this path: com.ibm.icu, but
> there is no com.ibm path
Yes, there is a difference. In Nutch we have an ICU4J library in the lib
directory, but there is no ICU4J lib or class file in a single Tika jar
file. For example, in the pdfbox jar file we have this path: com.ibm.icu, but
there is no com.ibm path in a Tika jar file.
How can I add the ICU4J library to the tika j
Do you have the ICU4J jar on your classpath in both situations?
On Mon, Oct 31, 2011 at 1:35 PM, ahmad ajiloo wrote:
> Hello
> When I use Tika to extract text from my Persian PDF files, all the characters
> are extracted in reverse order. I mean that the characters are shown from the
> beginning of the line to the
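The classpath question above can be checked from code. The following is a minimal sketch, not something posted in the thread: it assumes that `com.ibm.icu.text.Bidi` (a class that ships in ICU4J) is a reasonable class to probe for, and the class name `IcuClasspathCheck` and its helper `isPresent` are hypothetical names chosen for illustration. Which ICU4J classes a given Tika or PDFBox version actually loads is not confirmed here.

```java
public class IcuClasspathCheck {

    /** Returns true if the named class is visible to this class's class loader. */
    public static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, IcuClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but not loadable
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: Bidi is used as the probe class because it is part of ICU4J
        String icuClass = "com.ibm.icu.text.Bidi";
        System.out.println(icuClass + " on classpath: " + isPresent(icuClass));
    }
}
```

Running this once with the standalone Tika classpath and once with the Nutch classpath would show whether ICU4J is visible in only one of the two setups.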
Hello
When I use Tika to extract text from my Persian PDF files, all the characters
are extracted in reverse order. I mean that the characters are shown from the
beginning of the line to the end, but from left to right. However, when I
use the Tika GUI via Nutch there is no mistake and the output text is
right-to-left