Subject: Re: Extracting Text from embedded images in PDF docs
Hi Tim
Sure, once I get an initial PR ready I'll send an update explaining what
I did as a start, and we can discuss it further.
-Original Message-
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, May 19, 2017 12:40 PM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
Hi Tim
On 19/05/17 17:31, Allison, Timothy B. wrote:
The autoscaling feature of Beam a
This is fantastic news! Let me know if I can help...I know _nothing_ about
Beam, tho.
:)
-Original Message-
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, May 19, 2017 12:40 PM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
ira/browse/BEAM-2328
It will take me a few more weeks to create a PR,
Thanks, Sergey
-Original Message-
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, May 19, 2017 12:27 PM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
Hi Chris
I'm getting nervous now, what will happen to me if it will not work out in the
end :-). Though, it
💯
On 5/19/17, 9:27 AM, "Sergey Beryozkin" wrote:
Hi Chris
I'm getting nervous now, what will happen to me if it will not work out
in the end :-). Though, it actually does work, for me at least :-)
Cheers, Sergey
On 19/05/17 17:23, Mattmann, Chris A (3010) wrote:
> Well, I'm trying to integrate Tika with Apache Beam,
Awesome! I saw two fantastic Beam talks at ApacheCon (two days ago?). I won't
tell anyone. ;)
Hi Chris
I'm getting nervous now, what will happen to me if it will not work out
in the end :-). Though, it actually does work, for me at least :-)
Cheers, Sergey
On 19/05/17 17:23, Mattmann, Chris A (3010) wrote:
Thanks Sergey, what an awesome surprise, you are the best!
++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
Hi Tim
On 19/05/17 16:47, Allison, Timothy B. wrote:
Yes, I was asking about it because I thought it was confusing that it did not
work - I saw you following up on this possible issue in the other email...
Y, I agree. That _should_ work.
I'm doing some work with Tika now so it was of an immediate inte
>Yes, I was asking about it because I thought it was confusing that it did not
>work - I saw you following up on this possible issue in the other email...
Y, I agree. That _should_ work.
>I'm doing some work with Tika now so it was of an immediate interest to me...
Yay! What are you working on?
>Sure. By
On 19/05/17 16:25, Allison, Timothy B. wrote:
and when is "extractInlineImages" actually effective ?
Not sure I understand the question exactly?
If the question is "why didn't extractInlineImages work on a specific
document?", that's probably a bug, or it could be user error in the configuratio
>>and when is "extractInlineImages" actually effective ?
Not sure I understand the question exactly?
If the question is "why didn't extractInlineImages work on a specific
document?", that's probably a bug, or it could be user error in the
configuration... either way, please follow up and help us so
rom embedded images in PDF docs
Got it working. In case someone else hits the same issue, here is my config
file... Well... That was obvious :D
ocr_and_text
David
On 19 May 2017 at 10:59, David Pilato mailto
Extracting Text from embedded images in PDF docs
Got it working. In case someone else hits the same issue, here is my
config file... Well... That was obvious :D
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser"/>
<parser cl
documentation so that you don’t waste an hour?
From: David Pilato [mailto:da...@pilato.fr]
Sent: Friday, May 19, 2017 5:55 AM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
Got it working. In case someone else hits the same issue, here is my config
file... Well... That
tika.apache.org"
Subject: Re: Extracting Text from embedded images in PDF docs
Got it working. In case someone else hits the same issue, here is my config
file... Well... That was obvious :D
ocr_and_text
Got it working. In case someone else hits the same issue, here is my config
file... Well... That was obvious :D
ocr_and_text
David
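
The markup of the config file did not survive in this archive copy; only the DefaultParser line and the ocr_and_text value remain. As a rough sketch of what a Tika config that enables OCR on inline PDF images might look like, assuming the Tika 1.x tika-config.xml parameter syntax (the parser-exclude element and the parameter names extractInlineImages and ocrStrategy are assumptions here, not David's actual file):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical reconstruction, not the original file from this thread. -->
<properties>
  <parsers>
    <!-- Keep the default parsers, but let the PDF parser below take over PDFs. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <!-- Assumed to map to the flag read by config.getExtractInlineImages(). -->
        <param name="extractInlineImages" type="bool">true</param>
        <!-- The one value that survives in the archived message. -->
        <param name="ocrStrategy" type="string">ocr_and_text</param>
      </params>
    </parser>
  </parsers>
</properties>
```

If a file like this is loaded and config.getExtractInlineImages() still reports false in the debugger, the config is probably not being picked up at all, which matches the symptom described earlier in the thread.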
> On 19 May 2017 at 10:59, David Pilato wrote:
>
> So I saw in debug mode tha
So I saw in debug mode that indeed config.getExtractInlineImages() is false so
I'm going to check my config.
:D
David
> On 18 May 2017 at 22:18, David Pilato wrote:
>
> Hey guys
>
>
> First post here ;)
>
> I'm trying to play with OCR with Tika. I installed Tesseract and I can
> extract
Hey guys
First post here ;)
I'm trying to play with OCR with Tika. I installed Tesseract and I can extract
text from a PNG image.
I created a PDF document with this image embedded and I'm trying now to extract
the text out of it.
I added this configuration but I guess I'm doing it wrong: