Hi,
Did your inquiry produce a result?
On Wed, Nov 2, 2011 at 4:40 AM, Ken Krugler wrote:
> I know some of the original team members - I could ask.
>
> Are there specific questions, or just "is anybody still minding the fire"?
>
> -- Ken
>
> On Nov 1, 2011, at 2:43pm, Nick Burch wrote:
>
> > On Tue
apache.pdfbox
> pdfbox
> 1.5.0
>
>
> Also change the version tag to the appropriate number. Then go to
> ../tika-site (the top-level directory of the Tika project) and rerun mvn clean
> install.
>
> If all goes well you will have a new tika .
>
> Hope it helps,
>
>
jar file?
On Mon, Oct 31, 2011 at 10:49 PM, Robert Muir wrote:
> Do you have ICU4J jar in your classpath in both situations?
>
> On Mon, Oct 31, 2011 at 1:35 PM, ahmad ajiloo
> wrote:
> > Hello
> > When I use Tika for extracting my Persian PDF files, all the characters
Hello
I have an edited file in the pdfbox project and want to rebuild Tika with
this new file. But I can't find the location of the pdfbox sources in the Tika
sources to change it. Can anyone help me?
Thanks
Hello
When I use Tika to extract text from my Persian PDF files, all the characters
are extracted in reverse order. I mean that the characters are shown from the
beginning of the line to the end, but from left to right. However, when I
use the Tika GUI via Nutch there is no mistake and the output text is
right-to-left
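
The symptom described above (characters emitted in visual, left-to-right order instead of logical order) can be illustrated with a minimal sketch. This is not how Tika or PDFBox handle text direction; the class and method names (RtlOrderSketch, containsRtl, reverseLine) are hypothetical, and the naive reversal only makes sense for a line consisting purely of right-to-left letters. Lines that mix in digits, Latin text, or combining marks would need a real bidirectional reordering step.

```java
import java.text.Bidi;

// Hypothetical illustration only: shows what "visual order" output looks like
// for a pure right-to-left line and how a naive reversal restores logical order.
class RtlOrderSketch {

    // True if the text contains characters that need bidirectional handling.
    public static boolean containsRtl(String s) {
        char[] chars = s.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    // Naive reversal of a whole line; assumes the line is purely RTL letters.
    public static String reverseLine(String line) {
        return new StringBuilder(line).reverse().toString();
    }

    public static void main(String[] args) {
        // The Persian word "salam" in logical (reading) order.
        String logical = "\u0633\u0644\u0627\u0645";
        // Simulates an extractor that emits glyphs in left-to-right visual order.
        String visual = reverseLine(logical);

        System.out.println("needs bidi handling: " + containsRtl(visual));
        System.out.println("restored equals logical: "
                + reverseLine(visual).equals(logical));
    }
}
```

A check like containsRtl could be used to decide whether a given extracted line is worth inspecting for ordering problems at all; it does not by itself tell whether the line is in visual or logical order.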
[
https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmad Ajiloo updated TIKA-713:
--
Attachment: Simple3.pdf
Complex.pdf
I attached these two files for further investigation.
[
https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140134#comment-13140134
]
Ahmad Ajiloo commented on TIKA-713:
---
I'm testing new Encoding.java file w
[
https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmad Ajiloo updated TIKA-713:
--
Attachment: Simple2.pdf
> Tika can not parse all of the persian pdf fi
[
https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121376#comment-13121376
]
Ahmad Ajiloo commented on TIKA-713:
---
Thanks a lot
> Tika can no
[
https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmad Ajiloo updated TIKA-713:
--
Attachment: ebrat.pdf
This is a Persian PDF file that Tika can't parse.
> Tika can not pars
Versions: 0.9
Reporter: Ahmad Ajiloo
Fix For: 0.9
Hello
I used Tika (of course in Nutch) to parse some Persian PDF files. Some of the
files were converted to plain text correctly, but for some of them the output was
corrupted. I used the ICU4J v4 library and the text changed to right