2011/5/19 Mostafa <[email protected]>

> Hi Again,
>
> Seems nobody knows where it is hiding.
> Should I contact a CIA agent? lol
>

If somebody is really interested, she/he can find the answer ;-). Within 1
minute ;-) ([1] [2] [3]). BTW: there is a developers' forum
<http://groups.google.com/group/tesseract-dev>.


> But I am kinda serious about the data.
>

There were several requests for the training data (in the forum, in the issue
tracker). I asked for it too. There was no official reply to such requests.
AFAIK Google is not obliged to release the data, so I guess they have a reason
for not providing it.

On the other hand, this could be an opportunity for the tesseract community
:-): to create an alternative training set. As Ray mentioned ([3]), they use a
"more automated training process based on rendering text from fonts", so
training based on "real world" scanned documents could be interesting (but more
difficult).


Zdenko

[1] http://code.google.com/p/tesseract-ocr/people/list
[2] http://code.google.com/p/tesseract-ocr/source/list
[3] http://groups.google.com/group/tesseract-dev/msg/1cdf3ebe8743d935


>  Mostafa
>
> On May 18, 2:43 am, Илья <[email protected]> wrote:
> > He needs a table that contains all supported alphabetic characters.
> > Also, parts of scanned books may not be protected by copyright.
> >
> > Can you give any contacts of "jpn.traindata" dev team?
> >
> > --
> >         Best regards,
> >          Ilia.
> >
> > On Tue, 17/05/2011 at 18:24 +0200, zdenko podobny wrote:
> >
> > > On Tue, May 17, 2011 at 5:01 PM, Илья <[email protected]> wrote:
> > >         IMHO alphabets can't be protected by copyright.
> >
> > > Mostafa did not ask for alphabets. He asked for 'all the tif
> > > files that used for creating...' and the content of a tiff file (e.g.
> > > scanned books) could be protected by copyright.
> >
> > >         --
> > >         Best regards,
> > >         Ilia.
> >
> > >         On Tue, 17/05/2011 at 09:24 -0400, Dmitri Silaev wrote:
> >
> > >         > I think copyright issues are preventing the dev team from
> > >         > publishing these source files. However, you can try to contact
> > >         > this forum's moderator directly - he can probably make the
> > >         > decision to share.
> >
> > >         > --
> > >         > Dmitri
> >
> > >         > On Tue, May 17, 2011 at 4:58 AM, Mostafa
> > >         <[email protected]> wrote:
> > >         > > Hi,
> >
> > >         > > I am interested in getting all the tif files that were used
> > >         > > for creating the jpn.traindata.
> > >         > > I just want to see how many characters are supported in that
> > >         > > file, because I have some other Japanese characters that
> > >         > > can't be recognized by the tesseract OCR.
> >
> > >         > > Does anybody know where those tif files are?
> >
> > >         > > Thanks
> >
> > >         > > --
> > >         > > You received this message because you are subscribed to
> > >         the Google
> > >         > > Groups "tesseract-ocr" group.
> > >         > > To post to this group, send email to
> > >         [email protected]
> > >         > > To unsubscribe from this group, send email to
> > >         > > [email protected]
> > >         > > For more options, visit this group at
> > >         > >http://groups.google.com/group/tesseract-ocr?hl=en
> >
>

