For your information.

---------- Forwarded message ----------
From: Soon Hui Ngu <[email protected]>
Date: Thu, Aug 7, 2008 at 11:33 AM
Subject: Re: source codes compiled in VC++2008
To: 74yrs old <[email protected]>

Hi, sorry for the late reply.

I've seen your output. The accuracy of the Chinese output is not very good compared to English: the word-to-word accuracy (the accuracy in identifying Chinese characters) is about 90%. There are 458 Chinese words, and about 40 of them are not properly identified. Besides that, some punctuation marks are misplaced. But overall, a person literate in Chinese can still work out what the passage says.

On Thu, Aug 7, 2008 at 1:02 PM, 74yrs old <[email protected]> wrote:

> Soonhui,
> Awaiting anxious to know your evaluation of output text of Chinese
> generated.
> Greetings,
> -sriranga(75yrsold)
>
> On Wed, Aug 6, 2008 at 4:51 PM, 74yrs old <[email protected]> wrote:
>
>> I may kindly be informed percentage (or number of mistakes) in the
>> Chinese-output text. From my experience, output generally have 95 to 98%
>> correct.
>>
>> On Wed, Aug 6, 2008 at 4:22 PM, 74yrs old <[email protected]> wrote:
>>
>>> forwarded chinese-tessdata.zip
>>>
>>> On Wed, Aug 6, 2008 at 4:19 PM, 74yrs old <[email protected]> wrote:
>>>
>>>> Soon,
>>>> Without installing any fonts, succeeded to generate bmp file
>>>> (attached herewith as compressed tif) as well as box. Also attached
>>>> tesseract log report as well as output text (all in zip).
>>>>
>>>> I think output appears to be perfect - of course there may few mistakes.
>>>> Kindly feedback about correctness.
>>>> -Greetings,
>>>>
>>>> On Wed, Aug 6, 2008 at 3:20 PM, 74yrs old <[email protected]> wrote:
>>>>
>>>>> Thanks I shall check and feedback to you.
>>>>>
>>>>> 2008/8/6 Soon Hui Ngu <[email protected]>
>>>>>
>>>>> Oh OK, sorry :)
>>>>>>
>>>>>> Here it is.
>>>>>>
>>>>>> On Wed, Aug 6, 2008 at 5:34 PM, 74yrs old <[email protected]> wrote:
>>>>>>
>>>>>>> It means not similar to English - which has independent vowels. As
>>>>>>> such complete set of Characters have to be trained.
>>>>>>>
>>>>>>> Required sample in *text form* (*Notepad - text*) - NOT
>>>>>>> image (bmp) file.
>>>>>>> Sample text (.txt) is required to generate image based on the text
>>>>>>> file in bbt tool.
>>>>>>>
>>>>>>> On Wed, Aug 6, 2008 at 1:38 PM, Soon Hui Ngu
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi, Mandarin has no dependent vowels. In fact, the whole concept of
>>>>>>>> vowel is alien in Mandarin.
>>>>>>>>
>>>>>>>> As for how to install Mandarin font, you may want to consult
>>>>>>>> http://www.yellowbridge.com/chinese/fonts.php for more information.
>>>>>>>>
>>>>>>>> I attach a sample text here.
>>>>>>>>
>>>>>>>> On Wed, Aug 6, 2008 at 3:45 PM, 74yrs old
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I like to know whether Mandarin has dependent vowels?
>>>>>>>>> Will you forward sample text to enable me to generate sample
>>>>>>>>> datafiles and forward to you.
>>>>>>>>> Since in XP I could not locate Mandarin or Chinese font - how to
>>>>>>>>> install the same in XP?
>>>>>>>>>
>>>>>>>>> On Wed, Aug 6, 2008 at 11:17 AM, Soon Hui Ngu
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I'm a Chinese, and I write Mandarin. I would be interested in
>>>>>>>>>> training Tesseract to recognize Chinese words...not sure whether
>>>>>>>>>> other devs have done or not...
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 6, 2008 at 1:20 PM, 74yrs old
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> Thanks for the uploading in the forum - which will be benefited
>>>>>>>>>>> tesseract users.
>>>>>>>>>>>
>>>>>>>>>>> I am interested to know which mother tongue you speak and write.
>>>>>>>>>>> I am thinking to experiment in your local lang in tesseract, if
>>>>>>>>>>> possible, and feedback to you
>>>>>>>>>>> - Cheers
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 6, 2008 at 6:10 AM, Soon Hui Ngu
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, thanks for your compliment.
>>>>>>>>>>>>
>>>>>>>>>>>> Ya, I think Ocropus is a good idea, will give it a try
>>>>>>>>>>>> sometime..
>>>>>>>>>>>>
>>>>>>>>>>>> As for which language I am going to train in Tesseract...well,
>>>>>>>>>>>> I haven't thought about this issue yet... will think about this later...
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 6, 2008 at 1:48 AM, 74yrs old
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Soon,
>>>>>>>>>>>>> *Congratulations* !!
>>>>>>>>>>>>> Successfully generated exe files without any error in
>>>>>>>>>>>>> VC++2008. All the exe files are performed very well without
>>>>>>>>>>>>> any trouble - to train Kannada script which have dependent vowels.
>>>>>>>>>>>>> I am thankful to you for your modified source codes..
>>>>>>>>>>>>>
>>>>>>>>>>>>> Which language you are going to train in Tesseract?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since you are good programmer, why not compile the source codes
>>>>>>>>>>>>> of Ocropus in VC++2008 also for benefit of users. I am willing
>>>>>>>>>>>>> to perform beta testing and feedback to you under your valuable
>>>>>>>>>>>>> guidance..
>>>>>>>>>>>>>
>>>>>>>>>>>>> With Best of Luck,
>>>>>>>>>>>>> -sriranga(75yrsold)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 5, 2008 at 4:33 PM, 74yrs old
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> Thanks for the same. I shall test and feedback to you.
>>>>>>>>>>>>>> With Regards,
>>>>>>>>>>>>>> -sriranga(75yrsold)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Aug 5, 2008 at 2:12 PM, Soon Hui Ngu
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here's my modified version. Do contact me if you have
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Aug 5, 2008 at 3:57 PM, 74yrs old
>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> Will you kindly forward zipped source codes of tesseract 2.03
>>>>>>>>>>>>>>>> already modified in VC++2008 by you for beta testing and
>>>>>>>>>>>>>>>> feedback to you. I have installed VC++2008.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would have done myself by replacing as suggested by you.
>>>>>>>>>>>>>>>> But I find difficult to do so - due to overaged and vision
>>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>> As such, you need not take trouble of correcting - in other
>>>>>>>>>>>>>>>> words simply what you have already done (modified), the same be
>>>>>>>>>>>>>>>> zipped direct to me to have hands on experience.
>>>>>>>>>>>>>>>> With Best of Luck,
>>>>>>>>>>>>>>>> -sriranga(75yrsold)

--
http://itscommonsensestupid.blogspot.com/

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

