[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-31 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

Amir E. Aharoni amir.ahar...@mail.huji.ac.il changed:

   What|Removed |Added

   Priority|Unprioritized   |Normal

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #9 from Xiangquan Xiao xiaoxiangq...@gmail.com ---
(In reply to David Chan from comment #8)
 (2) is correct. Your method is more precise and works for more languages.
 However the old method is faster[*], and completely reliable if it returns
 false. Therefore we should do the following pseudo-code:
 
 function detectTofu ( text ) {
 maybeTofu = old technique;
 if ( maybeTofu ) {
 isTofu = new technique;
 } else {
 isTofu = false;
 }
 return isTofu;
 }

Hi, what about a sentence that contains only one tofu, which is common in
languages like Chinese?
In that situation detectTofu(text) will return true. Is that correct?
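One way to reason about the single-tofu case: if the whole-string check is reduced to per-character checks, a single missing glyph among otherwise renderable Chinese characters is enough to make the result true. The following is a minimal sketch that assumes a hypothetical per-character predicate isCharTofu (not an existing ULS function). It uses Array.from so that characters outside the BMP, such as those in CJK Extension B, are handled as one code point rather than two UTF-16 surrogate halves.

```javascript
// Sketch only: isCharTofu is a hypothetical per-character predicate,
// e.g. one of the rendering-based checks discussed in this bug.
function findTofuChars( text, isCharTofu ) {
	// Array.from splits by code point, so U+20000 counts as one character.
	return Array.from( text ).filter( isCharTofu );
}

function containsTofu( text, isCharTofu ) {
	// true as soon as any single character lacks a glyph
	return Array.from( text ).some( isCharTofu );
}

// Usage with a stand-in predicate that treats U+20000 as missing.
const fakeIsCharTofu = ( ch ) => ch === '\u{20000}';
console.log( containsTofu( '你好\u{20000}世界', fakeIsCharTofu ) ); // true
console.log( findTofuChars( '你好\u{20000}世界', fakeIsCharTofu ).length ); // 1
```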

BTW, I'll test the performance of both techniques and post the results here :)
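For timing the two techniques, a rough harness along these lines could work. It is a sketch, not a rigorous benchmark: oldCheck and newCheck below are hypothetical placeholders that would be replaced by the real width/height check and the real image comparison when run in a browser. Date.now() has only millisecond resolution, so the loop needs enough iterations; in a browser, performance.now() gives finer-grained timestamps.

```javascript
// Rough timing harness (sketch). Runs fn once untimed so that any
// one-off setup work is not counted, then times `iterations` calls.
function timeIt( label, fn, iterations ) {
	fn();
	const start = Date.now();
	for ( let i = 0; i < iterations; i++ ) {
		fn();
	}
	const elapsedMs = Date.now() - start;
	return { label: label, iterations: iterations, elapsedMs: elapsedMs, perCallMs: elapsedMs / iterations };
}

// Usage with placeholder checks (hypothetical stand-ins only).
const sample = '中文維基百科';
const oldCheck = () => sample.length > 0;
const newCheck = () => Array.from( sample ).every( ( ch ) => ch.codePointAt( 0 ) > 0x3000 );
[ timeIt( 'old technique', oldCheck, 100000 ), timeIt( 'new technique', newCheck, 100000 ) ]
	.forEach( ( r ) => console.log( r.label + ': ' + r.perCallMs + ' ms per call' ) );
```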


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #10 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 122277 had a related patch set uploaded by Xiaoxiangquan:
uls: Improve detectTofu algorithm to detect fixed-width glyphs

https://gerrit.wikimedia.org/r/122277


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

Gerrit Notification Bot gerritad...@wikimedia.org changed:

   What|Removed |Added

 Status|NEW |PATCH_TO_REVIEW


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #11 from Xiangquan Xiao xiaoxiangq...@gmail.com ---
(In reply to Gerrit Notification Bot from comment #10)
 Change 122277 had a related patch set uploaded by Xiaoxiangquan:
 uls: Improve detectTofu algorithm to detect fixed-width glyphs
 
 https://gerrit.wikimedia.org/r/122277

Sorry, I haven't set up a testing environment properly yet (I'm currently
trying Vagrant), so it's not well tested. I tried to make it bug-free.


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-28 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #7 from Xiangquan Xiao xiaoxiangq...@gmail.com ---
(In reply to David Chan from comment #6)
 Thanks Xiangquan, that's an extremely good start!
 
 When you submit to gerrit, I'll post more detailed comments there.
 
 Be sure to put 'Bug: 63122' (without quotes) in your commit message, on a
 line of its own, immediately above the change ID, with no extra whitespace.
 Then gerrit will post comments automatically to this bug.

Hi, I want to make something clear.

1. Do we need a separate function, like detectChineseTofu(), as I did in the
previous patch? If so, in which scenario would it be called?

2. Or is it an improvement to the old detectTofu() to make it work for
Chinese? If so, may I just replace the old solution, since the new one
(comparing images) will work for almost all languages? Though it's slower than
only comparing widths and heights, a unified solution looks much simpler.
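To illustrate why comparing images can succeed where comparing widths and heights fails for fixed-width glyphs, here is a minimal sketch. It assumes each rendered glyph is an object { width, height, data } shaped like the ImageData returned by CanvasRenderingContext2D.getImageData() in a browser; the tiny bitmaps are made up for illustration and are not real font data.

```javascript
// Dimension-only comparison (the idea behind comparing widths and heights).
function sameSize( a, b ) {
	return a.width === b.width && a.height === b.height;
}

// Image comparison: dimensions plus every RGBA byte must match.
function samePixels( a, b ) {
	if ( !sameSize( a, b ) || a.data.length !== b.data.length ) {
		return false;
	}
	for ( let i = 0; i < a.data.length; i++ ) {
		if ( a.data[ i ] !== b.data[ i ] ) {
			return false;
		}
	}
	return true;
}

// Made-up 2x1 bitmaps (2 pixels x 4 RGBA bytes). In a fixed-width font a real
// glyph and the tofu box can have identical dimensions, so only the pixel
// check tells them apart.
const tofuBox = { width: 2, height: 1, data: new Uint8ClampedArray( [ 0, 0, 0, 255, 0, 0, 0, 255 ] ) };
const realGlyph = { width: 2, height: 1, data: new Uint8ClampedArray( [ 0, 0, 0, 255, 255, 255, 255, 255 ] ) };

console.log( sameSize( realGlyph, tofuBox ) ); // true
console.log( samePixels( realGlyph, tofuBox ) ); // false
```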


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-28 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #8 from David Chan da...@sheetmusic.org.uk ---
(2) is correct. Your method is more precise and works for more languages.
However the old method is faster[*], and completely reliable if it returns
false. Therefore we should do the following pseudo-code:

function detectTofu ( text ) {
    maybeTofu = old technique;
    if ( maybeTofu ) {
        isTofu = new technique;
    } else {
        isTofu = false;
    }
    return isTofu;
}

[*] I *presume* the old method is faster, but I have not actually tested this.
Feel free to do so and to post actual numbers here!
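The two-stage pseudo-code above could be sketched in plain JavaScript as follows, assuming the two techniques are passed in as functions. cheapCheck and preciseCheck are hypothetical names standing for the old width/height comparison and the new image comparison; they are not existing ULS functions.

```javascript
// Builds a detectTofu function from a cheap check and a precise check.
function makeDetectTofu( cheapCheck, preciseCheck ) {
	return function detectTofu( text ) {
		// The old technique is trusted when it says "no tofu", so the
		// slower image comparison runs only when tofu is possible.
		if ( !cheapCheck( text ) ) {
			return false;
		}
		return preciseCheck( text );
	};
}

// Usage with stand-in checks that count how often the precise one runs.
let preciseCalls = 0;
const detectTofu = makeDetectTofu(
	( text ) => text.includes( '?' ),
	( text ) => {
		preciseCalls++;
		return text.includes( '??' );
	}
);
console.log( detectTofu( 'abc' ) ); // false (precise check skipped)
console.log( detectTofu( 'a?c' ) ); // false (precise check ran)
console.log( detectTofu( 'a??c' ) ); // true
console.log( preciseCalls ); // 2
```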


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #2 from Xiangquan Xiao xiaoxiangq...@gmail.com ---
Created attachment 14941
  -- https://bugzilla.wikimedia.org/attachment.cgi?id=14941&action=edit
Patch for the detect chinese tofu function

I hope it works. I've just finished the function, but I don't know yet how to
integrate it with ULS.


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #3 from Xiangquan Xiao xiaoxiangq...@gmail.com ---
Created attachment 14942
  -- https://bugzilla.wikimedia.org/attachment.cgi?id=14942&action=edit
A simple test page

It shows an alert for one tofu character and one non-tofu character.


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

Andre Klapper aklap...@wikimedia.org changed:

   What|Removed |Added

   Keywords||patch, patch-need-review
 CC||aklap...@wikimedia.org

--- Comment #4 from Andre Klapper aklap...@wikimedia.org ---
(In reply to Xiangquan Xiao from comment #2)
 Created attachment 14941
 Patch for the detect chinese tofu function

Thanks for your patch!
You are welcome to use Developer access
  https://www.mediawiki.org/wiki/Developer_access
to submit this as a Git branch directly into Gerrit:
  https://www.mediawiki.org/wiki/Git/Tutorial

Putting your branch in Git makes it easier to review it quickly. If you don't
want to set up Git/Gerrit, you can also use
https://tools.wmflabs.org/gerrit-patch-uploader/
Thanks again! We appreciate your contribution.


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #5 from Xiangquan Xiao xiaoxiangq...@gmail.com ---
(In reply to Andre Klapper from comment #4)
 
 Putting your branch in Git makes it easier to review it quickly. If you
 don't want to set up Git/Gerrit, you can also use
 https://tools.wmflabs.org/gerrit-patch-uploader/
 Thanks again! We appreciate your contribution.

Thanks for the information. I've set up gerrit by following the tips.

Actually, it's an incomplete fix, so I've just left the test page there to
show how it works, as the microtask for my GSoC application.

A complete fix will be submitted soon using Gerrit.


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-27 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

--- Comment #6 from David Chan da...@sheetmusic.org.uk ---
Thanks Xiangquan, that's an extremely good start!

When you submit to gerrit, I'll post more detailed comments there.

Be sure to put 'Bug: 63122' (without quotes) in your commit message, on a line
of its own, immediately above the change ID, with no extra whitespace.
Then gerrit will post comments automatically to this bug.
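A quick way to check the footer layout described above before pushing is sketched below. It assumes the Change-Id line has the form Gerrit's commit-msg hook generates ("Change-Id: I" followed by 40 lowercase hex digits); the hasBugFooter name and the Change-Id value in the example are made up for illustration.

```javascript
// Sketch: true if "Bug: <id>" is on its own line, directly above the
// Change-Id line, with no extra whitespace on either line.
function hasBugFooter( message, bugId ) {
	const lines = message.split( '\n' );
	const idx = lines.findIndex( ( line ) => /^Change-Id: I[0-9a-f]{40}$/.test( line ) );
	return idx > 0 && lines[ idx - 1 ] === 'Bug: ' + bugId;
}

// Example message with a dummy Change-Id (not a real change).
const msg = [
	'uls: Improve detectTofu algorithm to detect fixed-width glyphs',
	'',
	'Bug: 63122',
	'Change-Id: I' + '0123456789abcdef0123'.repeat( 2 ),
	''
].join( '\n' );

console.log( hasBugFooter( msg, 63122 ) ); // true
console.log( hasBugFooter( msg.replace( 'Bug: 63122', 'Bug: 63122 ' ), 63122 ) ); // false (trailing space)
```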


[Bug 63122] improve detectTofu algorithm so it can detect replacement characters in fixed-width glyphs

2014-03-26 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=63122

Quim Gil q...@wikimedia.org changed:

   What|Removed |Added

 CC||xiaoxiangq...@gmail.com
   See Also||https://bugzilla.wikimedia.org/show_bug.cgi?id=31791

--- Comment #1 from Quim Gil q...@wikimedia.org ---
Please don't take this bug unless you are a GSoC student working on Bug 31791 -
Add web fonts for Chinese scripts. Thank you.
