Hi,

Your question is a little difficult to understand. It sounds like you are
saying that you have no OCR or image processing background, that you know
Java, and that you want to modify Tesseract toward some aim that you do not
specify?

As far as I understand, Tesseract is developed in C/C++, not Java.
Only the Android JNI bindings would be Java.

You can find the Tesseract source code at:

https://github.com/tesseract-ocr/tesseract

In terms of concepts, you should read "An Overview of the Tesseract OCR
Engine", written by Tesseract's lead Ray Smith, as it will give you insight
into the algorithms that Tesseract employs for its OCR.

http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/33418.pdf

Further concepts for algorithms can be found in the "Techniques" section at:

https://en.wikipedia.org/wiki/Optical_character_recognition
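
Since you come from Java, here is a rough idea of what one of those
pre-processing concepts looks like in code. Binarisation (turning a greyscale
image into pure black and white) is, as far as I recall, one of the techniques
covered on that page. The sketch below uses Otsu's method on a plain array of
grey values. It is only an illustration and is not taken from the Tesseract
source; the class name OtsuBinarizer and its two methods are made up for this
example.

```java
/** Illustrative only: a hypothetical helper, not part of Tesseract. */
final class OtsuBinarizer {

    /**
     * Picks a global threshold with Otsu's method.
     * Assumes every value in gray is in the range 0..255.
     * Pixels with value <= the returned threshold form the dark class.
     */
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) {
            hist[v]++;
        }

        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * hist[i];
        }

        long darkCount = 0;   // number of pixels with value <= t
        long darkSum = 0;     // sum of those pixel values
        double bestScore = -1.0;
        int best = 0;

        for (int t = 0; t < 256; t++) {
            darkCount += hist[t];
            darkSum += (long) t * hist[t];
            long lightCount = total - darkCount;
            if (darkCount == 0) {
                continue;     // no dark pixels yet
            }
            if (lightCount == 0) {
                break;        // every pixel is already in the dark class
            }
            double darkMean = (double) darkSum / darkCount;
            double lightMean = (double) (sumAll - darkSum) / lightCount;
            double diff = darkMean - lightMean;
            // Between-class variance, up to a constant factor.
            double score = (double) darkCount * lightCount * diff * diff;
            if (score > bestScore) {
                bestScore = score;
                best = t;
            }
        }
        return best;
    }

    /** Returns a mask where true marks a dark (likely ink) pixel. */
    static boolean[] binarize(int[] gray) {
        int t = otsuThreshold(gray);
        boolean[] ink = new boolean[gray.length];
        for (int i = 0; i < gray.length; i++) {
            ink[i] = gray[i] <= t;
        }
        return ink;
    }
}
```

A single global threshold like this tends to struggle on unevenly lit or
coloured backgrounds, which is probably part of why a movie poster in a
newspaper is hard, so treat it as a starting point for reading rather than a
fix.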

Sounds like an uphill struggle to me but I wish you luck!

Cheers


On 15 June 2016 at 07:28, ravi katiyar <ravirange...@gmail.com> wrote:

> Hello All,
>
> I am new to the world of OCR and image processing. I come from
> a Java background.
> Can someone tell me what the prerequisites are to understand the Tesseract
> code?
> For example, the java.awt.image package, or digital image processing
> concepts? What would I need to be thorough with so that I am able to
> understand the Tesseract code?
>
> I want this understanding because I am aiming to make modifications to
> this code, so that Tesseract is able to extract text from a movie poster
> printed in a newspaper.
> Tesseract cannot do this currently.
>
> Thanks
> Ravi Katiyar
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/9a488786-ac4d-4d2e-a047-ebe329df1ea8%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/9a488786-ac4d-4d2e-a047-ebe329df1ea8%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

