Re: [Wikisource-l] OCR for Persian

2014-06-27 Thread Yann Forget
Hi,

I have ABBYY FineReader 11 Professional Edition, and Persian/Farsi is not
among the supported languages. :(

Yann

___
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l


Re: [Wikisource-l] OCR for Persian

2014-06-27 Thread Alex Brollo
Is there any Persian/Farsi text in the Internet Archive? I'd like to take a
look at its OCR, just to see whether the OCR engine attempts to interpret it
(even if with no usable result).

Alex




Re: [Wikisource-l] OCR for Persian

2014-06-25 Thread Amir Ladsgroup
I tried ABBYY before and the quality was low.
I will try Tesseract and see what happens.

Best


-- 
Amir


[Wikisource-l] OCR for Persian

2014-06-24 Thread Amir Ladsgroup
Hello,
I have access to a huge collection of old books in Persian (some of them are
even typed), and almost all of them can be imported to Wikisource. The
problem is that I don't have (or know of) any OCR for Persian. Do you know
which OCR software supports Persian texts? (Supporting Arabic is not enough;
I checked several programs.)


Best

-- 
Amir


Re: [Wikisource-l] OCR for Persian

2014-06-24 Thread Federico Leva (Nemo)


The only result for Persian and OCR on the ABBYY website is
http://www.abbyy.com/CaseStudies/SISU-Reveals-Its-Multilingual-Content-to-Academic-Community-Thanks-to-ABBYY-Recognition-Server/
(weird!). It's worth asking them for some details; they might have some
additional plugins.


On the FLOSS side, maybe some library in Iran has invested in Tesseract?
If there's any big digital library of Persian content, you should ask them
as well.


Reminder: archive.org is still in need of people willing to compare 8.0 
vs. 9.0 OCR results of some books in their language. :)

http://thread.gmane.org/gmane.org.wikimedia.wikisource/1552

Nemo



Re: [Wikisource-l] OCR for Persian

2014-06-24 Thread Aleksey Chalabyan
ABBYY FineReader has supported Hebrew and Arabic since v. 11. But I'm afraid
that sharing the same script is not enough. For example, FineReader has 3
versions for Armenian. All three use the same script, with different
orthography and slightly different vocabulary, but if you set the wrong
language, the drop in quality is dramatic. So I'm not sure whether Arabic OCR
would work well for text in Farsi (Persian).
FineReader provides a 30-day full trial, and I think it's worth giving it a
try.

You may try to approach ABBYY and check whether there are any plans for full
support of Persian in the near future.

And trying to train Tesseract seems like a good idea for getting a free/open
source OCR for Persian, if you can get enough resources for that. But I can't
comment on how well it will work with RTL scripts, especially with
Nastaliq/Naskh, where letters and words are not separated from each other.
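
As a rough illustration of the point that sharing a script is not enough:
Persian uses four letters that the basic Arabic alphabet lacks, and it
normally uses its own code points for kaf and yeh. The Python sketch below is
not from the original discussion. The function names are hypothetical, and
the kaf/yeh replacement is only an assumption about a typical clean-up step
for Persian text, not something any particular OCR engine is known to do.

```python
# Letters used in Persian that the basic Arabic alphabet does not have
# (values are the Unicode character names without the common prefix).
PERSIAN_ONLY = {
    "\u067e": "PEH",
    "\u0686": "TCHEH",
    "\u0698": "JEH",
    "\u06af": "GAF",
}

# Arabic code points that Persian text normally writes with its own forms.
# Assumption: mapping them is acceptable for the text being cleaned.
ARABIC_TO_PERSIAN = {
    "\u0643": "\u06a9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
}


def persian_only_letters(text):
    """Return the sorted Persian-specific letters that occur in text."""
    return sorted({ch for ch in text if ch in PERSIAN_ONLY})


def normalize_to_persian(text):
    """Replace Arabic kaf and yeh with the forms used in Persian."""
    return text.translate(str.maketrans(ARABIC_TO_PERSIAN))
```

An engine whose character set covers only Arabic has no way to output the
four letters in PERSIAN_ONLY, so a check like persian_only_letters() on a
known-good transcription can show how much of a page it would necessarily
get wrong.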

