
Re: [Wikitech-l] Our CAPTCHA is very unfriendly

2015-03-17 Thread Ricordisamoa


On 18/03/2015 04:30, MZMcBride wrote:

> Ricordisamoa wrote:
>> On 09/11/2014 18:33, MZMcBride wrote:
>>> Marc A. Pelletier wrote:
>>>> But there is also a great heap of anecdotal data that shows that having
>>>> to provide an email account increases the barrier of entry to users
>>>> signing up.  So, there's a tradeoff.
>>> Eh, I think the anecdotal data (such as Facebook's and Google's hundreds
>>> of millions of account registrations) suggests that e-mail confirmation is
>>> not a huge barrier to entry for legitimate users.
>>
>> I think both Facebook and Google have enough staff resources to deal
>> with spam, and they could even let bots create fake accounts as long as
>> they don't harass other users, just to let the accounts counter increase.
>> We can't afford that.
>
> I'm not sure what you mean by "can't afford that". What specific behaviors
> are we trying to prevent? Account registration alone isn't really a
> problem on MediaWiki wikis, just as it isn't a problem on Facebook or
> Google. The system scales. But if the accounts are registering and then
> spamming (creating new pages, making bad edits to existing pages, etc.),
> that's a real problem that we should try to solve as efficiently and
> cleanly as possible. Volunteer time is definitely precious.

If a bot creates 10,000 Facebook profiles and fills them with bogus
content, that's fine for them. More users, more ads, more money.
But if it creates 10,000 Wikimedia accounts with bogus user pages, it
isn't fine for us. Less trust between Wikimedians.

>>> I think calling this issue a sacred cow is a bit overblown, but requiring
>>> an e-mail address would be a violation of our shared values. We strive to
>>> be as open and independent as possible and requiring an e-mail address is
>>> antithetical to that. If anything, we could provide e-mail address
>>> aliases (e.g., mzmcbr...@en.wikipedia.org) for our users as a side
>>> benefit.
>>
>> What about case-sensitivity of user names vs email addresses then?
>
> This is tangential, but... we should fix usernames to be case-insensitive.
> And we should support login via e-mail address. And we should (properly)
> support a display name field, in my opinion. Hopefully, in time. :-)
>
> In addition to better heuristics, as Robert suggested, we could also focus
> on tasks such as , maybe. Using
> AbuseFilter to trigger CAPTCHAs seems like it would either be a really
> great or a really terrible idea. At least making this functionality
> available as an option to potentially try seems worthwhile.

Definitely.

> MZMcBride



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l




Re: [Wikitech-l] Our CAPTCHA is very unfriendly

2015-03-17 Thread MZMcBride

Ricordisamoa wrote:
>On 09/11/2014 18:33, MZMcBride wrote:
>> Marc A. Pelletier wrote:
>>> But there is also a great heap of anecdotal data that shows that having
>>> to provide an email account increases the barrier of entry to users
>>> signing up.  So, there's a tradeoff.
>> Eh, I think the anecdotal data (such as Facebook's and Google's hundreds
>> of millions of account registrations) suggests that e-mail confirmation is
>> not a huge barrier to entry for legitimate users.
>
>I think both Facebook and Google have enough staff resources to deal
>with spam, and they could even let bots create fake accounts as long as
>they don't harass other users, just to let the accounts counter increase.
>We can't afford that.

I'm not sure what you mean by "can't afford that". What specific behaviors
are we trying to prevent? Account registration alone isn't really a
problem on MediaWiki wikis, just as it isn't a problem on Facebook or
Google. The system scales. But if the accounts are registering and then
spamming (creating new pages, making bad edits to existing pages, etc.),
that's a real problem that we should try to solve as efficiently and
cleanly as possible. Volunteer time is definitely precious.

>>I think calling this issue a sacred cow is a bit overblown, but requiring
>>an e-mail address would be a violation of our shared values. We strive to
>>be as open and independent as possible and requiring an e-mail address is
>>antithetical to that. If anything, we could provide e-mail address
>>aliases (e.g., mzmcbr...@en.wikipedia.org) for our users as a side
>>benefit.
>
>What about case-sensitivity of user names vs email addresses then?

This is tangential, but... we should fix usernames to be case-insensitive.
And we should support login via e-mail address. And we should (properly)
support a display name field, in my opinion. Hopefully, in time. :-)

In addition to better heuristics, as Robert suggested, we could also focus
on tasks such as , maybe. Using
AbuseFilter to trigger CAPTCHAs seems like it would either be a really
great or a really terrible idea. At least making this functionality
available as an option to potentially try seems worthwhile.

MZMcBride


Re: [Wikitech-l] Our CAPTCHA is very unfriendly

2015-03-17 Thread Ricordisamoa

On 10/11/2014 17:23, Chris Steipp wrote:

On the general topic, I think either a captcha or verifying an email makes
a small barrier to building a bot, but it's significant enough that it
keeps the amateur bots out. I'd be very interested in seeing an experiment
run to see what the exact impact is though.

Google had a great blog post on this subject, describing how they made
reCAPTCHA easier to solve and instead rely on risk analysis:

"The updated system uses advanced risk analysis techniques, actively
considering the user's entire engagement with the CAPTCHA--before, during
and after they interact with it. That means that today the distorted
letters serve less as a test of humanity and more as a medium of engagement
to elicit a broad range of cues that characterize humans and bots." [1]

So spending time on a new engine that allows for environmental feedback
from the system solving the captcha, and that lets us tune lots of things
besides whether the "user" sent back the right string of letters, I think
would be well worth our time.

[1] http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-just-got-easier-but-only-if.html

On 04/12/2014 05:35, Robert Rohde wrote:

We have many smart people, and undoubtedly we could design a better captcha.

However, no matter how smart the mousetrap, as long as you leave it strewn
around the doors and hallways, well-meaning people are going to trip over
it.

I would support removing the captcha from generic entry points, like the
account registration page, where we know many harmless people are
encountering it.

However, captchas might be useful if used in conjunction with simple
behavioral analysis, such as rate limiters. For example, if an IP is
creating a lot of accounts or editing at a high rate of speed, those are
bad signs. Adding the same external link to multiple pages is often a very
bad sign. However, adding a link to the NYTimes or CNN or an academic
journal is probably fine. With that in mind, I would also eliminate the
external link captcha in most cases where a link has only been added once
and try to be more intelligent about which sites trigger it otherwise.

Basically, I'd advocate a strategy of adding a few heuristics to try and
figure out who the mice are before putting the mousetraps in front of
them. Of course, the biggest rats will still break the captcha and get
through, but that is already true. Though reducing the prevalence of the
captcha may increase the volume of spam by some small measure, I think it
is more important that we stop erecting so many hurdles to new editors.

-Robert Rohde
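
A minimal sketch of the "heuristics before mousetraps" idea above, in Python rather than as a MediaWiki extension. Everything named here is an assumption made for illustration: the `CaptchaGate` class, the thresholds, and the tiny trusted-domain list are hypothetical and are not ConfirmEdit or AbuseFilter behavior. It only asks for a captcha when an IP acts at a high rate or adds the same untrusted external link more than once.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse

# All values below are hypothetical and would need tuning on real data.
TRUSTED_DOMAINS = {"nytimes.com", "cnn.com"}
MAX_ACTIONS = 5          # actions allowed per IP inside the window
WINDOW_SECONDS = 60.0    # length of the sliding window
MAX_LINK_REPEATS = 1     # times one IP may add the same untrusted link


class CaptchaGate:
    """Decide whether an action should be challenged with a captcha."""

    def __init__(self):
        self._actions = defaultdict(deque)    # ip -> recent timestamps
        self._link_counts = defaultdict(int)  # (ip, url) -> times added

    def _too_fast(self, ip, now):
        # Sliding-window rate limiter: record this action, drop old ones.
        recent = self._actions[ip]
        recent.append(now)
        while recent and now - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        return len(recent) > MAX_ACTIONS

    def needs_captcha(self, ip, now, added_links=()):
        too_fast = self._too_fast(ip, now)
        repeated_link = False
        for url in added_links:
            host = (urlparse(url).hostname or "").lower()
            if host.startswith("www."):
                host = host[4:]
            if host in TRUSTED_DOMAINS:
                continue  # well-known sources never trigger the captcha
            self._link_counts[(ip, url)] += 1
            if self._link_counts[(ip, url)] > MAX_LINK_REPEATS:
                repeated_link = True
        return too_fast or repeated_link
```

With these made-up thresholds, a first edit adding one unknown link passes unchallenged, while the same IP adding that link to a second page gets the captcha.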

On 05/12/2014 06:28, Robert Rohde wrote:

I suspect that a lot of the spam is the obvious things such as external
links to junk sites and repetitive promotional postings, though perhaps
there are also less obvious types of spam?

I suspect we could weed out a lot of spammy link behavior by designing an
external link classifier that used knowledge of what external links are
frequently included and what external links are frequently removed to
generate automatic good / suspect / bad ratings for new external links (or
domains). Good links (e.g. NYTimes, CNN) might be automatically allowed
for all users, suspect links (e.g. unknown or rarely used domains) might be
automatically allowed for established users and challenged with captchas or
other tools for new users / IPs, and bad links (i.e. those repeatedly
spammed and removed) could be automatically detected and blocked.

-Robert Rohde
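
The good / suspect / bad rating described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `DomainStats` counts, the function names, and the thresholds are hypothetical, and a real classifier would presumably be trained on actual link addition and removal history rather than use fixed cut-offs.

```python
from dataclasses import dataclass


@dataclass
class DomainStats:
    added: int = 0    # times links to this domain were added
    removed: int = 0  # times those links were later removed


# Hypothetical thresholds, chosen only to make the example concrete.
MIN_OBSERVATIONS = 20     # below this, we know too little about a domain
BAD_REMOVAL_RATIO = 0.8   # mostly removed after being added
GOOD_REMOVAL_RATIO = 0.1  # almost always kept


def rate_domain(stats):
    """Return 'good', 'suspect' or 'bad' for one domain."""
    if stats.added < MIN_OBSERVATIONS:
        return "suspect"  # unknown or rarely used domain
    ratio = stats.removed / stats.added
    if ratio >= BAD_REMOVAL_RATIO:
        return "bad"
    if ratio <= GOOD_REMOVAL_RATIO:
        return "good"
    return "suspect"


def action_for(rating, established_user):
    """Map a rating to what happens when a user adds such a link."""
    if rating == "good":
        return "allow"
    if rating == "bad":
        return "block"
    # Suspect links: trust established users, challenge new users / IPs.
    return "allow" if established_user else "captcha"
```

For instance, a domain added 1,000 times and removed 20 times would rate as good here, while one added 50 times and removed 45 times would rate as bad.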

What about applying ClueBot NG's Vandalism Detection Algorithm to spam?
At this point I think machine learning is the only way a real CAPTCHA
can keep up with evil bots, and a text-based system (such as T34695)
would only be used for tuning, just as reCAPTCHA does.
