Re: wiki slightly broken still?

2010-08-06 Thread Damjan Jovanovic
On Tue, Aug 3, 2010 at 4:18 PM, David Gerard  wrote:
> [to list as well]
>
> On 3 August 2010 15:07, Dimi Paun  wrote:
>> On Tue, 2010-08-03 at 14:30 +0200, Alexandru Băluț wrote:
>
>>> How difficult would it be to use ReCaptcha?
>>> http://www.google.com/recaptcha
>
>> Hm, don't know. We could hack our version to support recaptcha,
>> but I'm not familiar with the code base, and I don't have the
>> time right now. But I can take patches if someone is willing
>> to do it.
>
>
> The MoinMoin developers consider TextCHA inherently superior and so
> have no interest in writing a reCaptcha interface:
>
> http://moinmo.in/FeatureRequests/ReCaptcha
>
> (Note also the problems people have had with TextCHA: it becomes too
> much work to write the questions and to answer the questions.)
>
> If someone wants reCaptcha in MoinMoin, it appears they will need to
> write it all themselves.
>
>
> - d.
>
>
>

reCaptcha has essentially been cracked now
(http://it.slashdot.org/story/10/08/05/2054247/ReCAPTCHAnet-Now-Vulnerable-to-Algorithmic-Attack)
so I'm not sure it's worth using it in the wiki.

Damjan Jovanovic




Re: wiki slightly broken still?

2010-08-03 Thread David Gerard
[to list as well]

On 3 August 2010 15:07, Dimi Paun  wrote:
> On Tue, 2010-08-03 at 14:30 +0200, Alexandru Băluț wrote:

>> How difficult would it be to use ReCaptcha?
>> http://www.google.com/recaptcha

> Hm, don't know. We could hack our version to support recaptcha,
> but I'm not familiar with the code base, and I don't have the
> time right now. But I can take patches if someone is willing
> to do it.


The MoinMoin developers consider TextCHA inherently superior and so
have no interest in writing a reCaptcha interface:

http://moinmo.in/FeatureRequests/ReCaptcha

(Note also the problems people have had with TextCHA: it becomes too
much work to write the questions and to answer the questions.)

If someone wants reCaptcha in MoinMoin, it appears they will need to
write it all themselves.


- d.




Re: wiki slightly broken still?

2010-08-03 Thread Dimi Paun
On Tue, 2010-08-03 at 14:30 +0200, Alexandru Băluț wrote:
> How difficult would it be to use ReCaptcha?
> 
> http://www.google.com/recaptcha

Hm, don't know. We could hack our version to support recaptcha,
but I'm not familiar with the code base, and I don't have the
time right now. But I can take patches if someone is willing 
to do it.

-- 
Dimi Paun 
Lattica, Inc.





Re: wiki slightly broken still?

2010-08-03 Thread Alexandru Băluț
On Thu, Jul 29, 2010 at 18:01, Dimi Paun  wrote:
> Now that this issue is fixed, we can look again at the spam
> problem. It was suggested that we use a 'TextChas' for non
> logged in users:
>    http://moinmo.in/HelpOnSpam
>
> But it seems it's not too easy to come up with decent questions.
> Should we try it?

How difficult would it be to use ReCaptcha?

http://www.google.com/recaptcha

Thanks,
Alex




Re: wiki slightly broken still?

2010-07-30 Thread Dimi Paun
On Fri, 2010-07-30 at 08:58 +0200, Francois Gouget wrote:
> I have a theory: did the script move the remaining files to another 
> directory? 

Yes, it did.

-- 
Dimi Paun 
Lattica, Inc.





Re: wiki slightly broken still?

2010-07-29 Thread Francois Gouget
On Thu, 29 Jul 2010, Dimi Paun wrote:

> On Thu, 2010-07-29 at 18:17 +0200, Michael Stefaniuc wrote:
> > Yes, the LocalBadContent page got pretty long; I'm fairly sure it's
> > the spam checking that takes so long.
> 
> I tried to empty it, and it does seem to help. However, it's not
> the only cause of the problem, it's still not fast even with an
> empty LocalBadContent.

I have a theory: did the script move the remaining files to another
directory? If not, it may be that there's a fragmentation problem at the
directory level; i.e. the directory structure was grown to accommodate
32k entries, and now there are only 5k entries but they are spread over
the old 32k slots, leading to inefficient lookups. If so, something like
this should fix it:

mkdir newdir
mv olddir/* newdir  # hope there's no dot file
rmdir olddir
mv newdir olddir
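
If dot files are a worry, roughly the same thing can be sketched in
Python, since os.listdir() returns hidden entries that the shell glob
skips. This is only an illustration: rebuild_directory is a made-up name,
it assumes nothing else is writing to the directory while it runs, and
it restores the permission bits but not the ownership of the directory.

```python
import os
import shutil
import tempfile


def rebuild_directory(olddir):
    """Move every entry of olddir into a fresh directory, then swap it in."""
    olddir = os.path.abspath(olddir)
    parent = os.path.dirname(olddir)
    # Create the replacement next to the original so every rename stays
    # on the same filesystem.
    newdir = tempfile.mkdtemp(prefix="rebuild-", dir=parent)
    # mkdtemp creates a private (0700) directory; copy the original mode bits.
    shutil.copymode(olddir, newdir)
    # os.listdir() includes dot files, unlike the shell glob olddir/*.
    for name in os.listdir(olddir):
        os.rename(os.path.join(olddir, name), os.path.join(newdir, name))
    os.rmdir(olddir)           # raises OSError if anything was left behind
    os.rename(newdir, olddir)  # the fresh directory takes over the old name
```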


-- 
Francois Gouget   http://fgouget.free.fr/
A polar bear is a cartesian bear after a coordinate transform.




Re: wiki slightly broken still?

2010-07-29 Thread Uwe Bonnes
> "Octavian" == Octavian Voicu  writes:

>>  "What is the name for a  billion bytes?"

Terabyte, at least in Germany. Billion -> 10^12. 10^9 -> "Milliarde"

So these questions can be tricky...
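
To spell the arithmetic out, here is a small Python sketch. The names
are invented for illustration; the only facts used are the SI prefixes
and the short-scale versus long-scale meaning of "billion".

```python
# SI prefixes for byte counts, keyed by power of ten.
SI_PREFIX = {3: "kilo", 6: "mega", 9: "giga", 12: "tera"}

# "Billion" is 10^9 on the short scale (English usage) and 10^12 on the
# long scale (German "Billion"; 10^9 is "Milliarde" there).
BILLION_EXPONENT = {"short": 9, "long": 12}


def name_for_billion_bytes(scale):
    """Return the unit name a reader using the given scale would answer."""
    return SI_PREFIX[BILLION_EXPONENT[scale]] + "byte"


def acceptable_answers():
    """All answers a locale-tolerant question would have to accept."""
    return {name_for_billion_bytes(scale) for scale in BILLION_EXPONENT}
```

So a question like this one would probably need to accept both
"gigabyte" and "terabyte", or be reworded to avoid "billion" entirely.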

-- 
Uwe Bonnes                b...@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
- Tel. 06151 162516  Fax. 06151 164321 --




Re: wiki slightly broken still?

2010-07-29 Thread Octavian Voicu
On Thu, Jul 29, 2010 at 8:41 PM, Dan Kegel  wrote:
> I like the idea.  It is hard, but here are some possible questions:
>
> "What is the first name of the Finn who created the Linux operating system?"
> "What is the abbreviation for the GNU C Compiler?"
> "What is the name of the simple text editor that comes with Windows?"
> "Complete the phrase:  screen of death"
> "What is the name for a billion bytes?"
>
> I have no idea if those will faze spammers.

Those questions will certainly keep away non-geek spammers.
We could make a quiz-like anti-spam system with not-so-trivial questions :)

Octavian




Re: wiki slightly broken still?

2010-07-29 Thread Dan Kegel
On Thu, Jul 29, 2010 at 9:01 AM, Dimi Paun  wrote:
>> Should we also add another hurdle (possibly even manual approval)
>> to make it harder for spammers to get accounts?
>
> Now that this issue is fixed, we can look again at the spam
> problem. It was suggested that we use a 'TextChas' for non
> logged in users:
>    http://moinmo.in/HelpOnSpam
>
> But it seems it's not too easy to come up with decent questions.
> Should we try it?

I like the idea.  It is hard, but here are some possible questions:

"What is the first name of the Finn who created the Linux operating system?"
"What is the abbreviation for the GNU C Compiler?"
"What is the name of the simple text editor that comes with Windows?"
"Complete the phrase:  screen of death"
"What is the name for a billion bytes?"

I have no idea if those will faze spammers.
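
For what it's worth, a question table like this could be checked with a
few lines of Python. This is a standalone sketch, not MoinMoin's actual
TextCha configuration (see the HelpOnSpam page for that); QUESTIONS and
check_answer are made-up names, and the answer patterns are my own
guesses at what should be accepted.

```python
import re

# Hypothetical table: question text -> regex that the whole answer must match.
QUESTIONS = {
    "What is the first name of the Finn who created the Linux operating system?":
        r"linus",
    "What is the abbreviation for the GNU C Compiler?":
        r"gcc",
    "What is the name of the simple text editor that comes with Windows?":
        r"notepad",
    # "Billion" differs between locales, so more than one answer is accepted.
    "What is the name for a billion bytes?":
        r"(gigabyte|terabyte)",
}


def check_answer(question, answer):
    """Return True if the answer matches the stored pattern.

    Matching ignores case and surrounding whitespace.
    """
    pattern = QUESTIONS[question]
    return re.fullmatch(pattern, answer.strip(), re.IGNORECASE) is not None
```

Using regexes rather than exact strings leaves room for several
acceptable spellings of the same answer.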
- Dan




Re: wiki slightly broken still?

2010-07-29 Thread Dimi Paun
On Thu, 2010-07-29 at 18:17 +0200, Michael Stefaniuc wrote:
> Yes, the LocalBadContent page got pretty long; I'm fairly sure it's
> the spam checking that takes so long.

I tried to empty it, and it does seem to help. However, it's not
the only cause of the problem, it's still not fast even with an
empty LocalBadContent.

-- 
Dimi Paun 
Lattica, Inc.





Re: wiki slightly broken still?

2010-07-29 Thread Michael Stefaniuc
Dimi Paun wrote:
> On Wed, 2010-07-28 at 22:35 +0100, David Gerard wrote:
>> Ubuntu hit this one:
>>
>> https://bugs.edge.launchpad.net/ubuntu/+source/moin/+bug/217191
>> http://moinmo.in/MoinMoinBugs/AllPagesSavedToSingleDirectory
> 
> Thanks David for the links.
> 
> I've run the cleanup scripts, and we are now down to ~5K pages,
> down from 32K. So there is still plenty of room to grow for the
> time being. 
> 
> If we hit the limit again, please let me know and I'll clean it
> up right away, now I know what I need to do :)
> 
> P.S. There is still something wrong with the Wiki, saving pages
> takes a really long time with no reason whatsoever (no load on the
> box, etc). I think we're hitting an inefficiency in Moin, as the
> httpd process shoots up to 95% CPU usage for a few good seconds.
> I've trimmed the edit-log and the event-log files, which were very
> big, but that doesn't seem to help. Any other ideas?
Yes, the LocalBadContent page got pretty long; I'm fairly sure it's the
spam checking that takes so long.
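
To illustrate why the length of the list would matter: assuming the
check tries every pattern line against the full page text on each save
(I have not verified that this is how Moin does it), the cost grows with
the number of patterns times the page size. A minimal Python sketch of
such a scan, with first_bad_pattern as a made-up name:

```python
import re


def first_bad_pattern(page_text, pattern_lines):
    """Return the first pattern line that matches page_text, or None."""
    for line in pattern_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        try:
            if re.search(line, page_text, re.IGNORECASE):
                return line
        except re.error:
            continue  # ignore lines that are not valid regular expressions
    return None
```

A clean page is the worst case here, because every single pattern has
to be tried before the save can go through.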

bye
michael




Re: wiki slightly broken still?

2010-07-29 Thread Dimi Paun
On Wed, 2010-07-28 at 15:06 -0700, Dan Kegel wrote:
> > I'm looking into how we can clean this up.
> 
> Should we also add another hurdle (possibly even manual approval)
> to make it harder for spammers to get accounts?

Now that this issue is fixed, we can look again at the spam
problem. It was suggested that we use a 'TextChas' for non
logged in users:
http://moinmo.in/HelpOnSpam

But it seems it's not too easy to come up with decent questions.
Should we try it?

-- 
Dimi Paun 
Lattica, Inc.





Re: wiki slightly broken still?

2010-07-29 Thread Dimi Paun
On Wed, 2010-07-28 at 22:35 +0100, David Gerard wrote:
> Ubuntu hit this one:
> 
> https://bugs.edge.launchpad.net/ubuntu/+source/moin/+bug/217191
> http://moinmo.in/MoinMoinBugs/AllPagesSavedToSingleDirectory

Thanks David for the links.

I've run the cleanup scripts, and we are now down to ~5K pages,
down from 32K. So there is still plenty of room to grow for the
time being. 

If we hit the limit again, please let me know and I'll clean it
up right away, now I know what I need to do :)

P.S. There is still something wrong with the Wiki, saving pages
takes a really long time with no reason whatsoever (no load on the
box, etc). I think we're hitting an inefficiency in Moin, as the
httpd process shoots up to 95% CPU usage for a few good seconds.
I've trimmed the edit-log and the event-log files, which were very
big, but that doesn't seem to help. Any other ideas?

-- 
Dimi Paun 
Lattica, Inc.





Re: wiki slightly broken still?

2010-07-28 Thread Frédéric Delanoy
On Wed, Jul 28, 2010 at 23:35, David Gerard  wrote:
> On 28 July 2010 21:49, Dimi Paun  wrote:
>> On Wed, 2010-07-28 at 13:05 -0700, Dan Kegel wrote:
>
>>> Creating new wiki pages seems broken today...
>
>> Yes, due to all the spam, we've hit the ext3 limit
>> of subdirectories (32k). More here:
>>    http://www.rooftopsolutions.nl/blog/135
>> I'm looking into how we can clean this up.
>
>
> Ubuntu hit this one:
>
> https://bugs.edge.launchpad.net/ubuntu/+source/moin/+bug/217191
> http://moinmo.in/MoinMoinBugs/AllPagesSavedToSingleDirectory
>
> The other solution is permanent deletion of the spam pages from the
> actual file system. I've done such pruning before, and it needs
> (obviously) to be done with *remarkable* care. It's also very fiddly.
> I eventually cobbled together scripts to do the deletion for me. (At
> an old workplace, I don't have them to hand.) The MoinMoin page above
> lists maintenance scripts that can do it for you.
>
> They also suggest moving the wiki directories to a filesystem that can
> allow stupid amounts of directories, like XFS. (Even ext4 only scales
> to 64,000 directories.)

https://ext4.wiki.kernel.org/index.php/Ext4_Howto#Sub_directory_scalability
seems to indicate there is no such limit.
Maybe this was the case a couple of years ago.
Additionally, migrating from ext3 to ext4 should give the least
headaches (maybe a kernel recompile, YMMV)

> MoinMoin 2.0 will apparently use a database instead of flat files.
> ETA: some time or other in the far future. "we can't tell exactly when
> the new storage stuff will be production ready, but I expect end 2008
> .. mid 2009." Ahem.
>
> Oh, and moinmo.in regards this as not being a "bug", but the result of
> bad file system design. (And not, e.g., a wiki that doesn't scale.)
>
>
> - d.
>
>
>




Re: wiki slightly broken still?

2010-07-28 Thread Dan Kegel
On Wed, Jul 28, 2010 at 1:49 PM, Dimi Paun  wrote:
> Yes, due to all the spam, we've hit the ext3 limit
> of subdirectories (32k). More here:
>    http://www.rooftopsolutions.nl/blog/135
>
> I'm looking into how we can clean this up.

Should we also add another hurdle (possibly even manual approval)
to make it harder for spammers to get accounts?




Re: wiki slightly broken still?

2010-07-28 Thread David Gerard
On 28 July 2010 21:49, Dimi Paun  wrote:
> On Wed, 2010-07-28 at 13:05 -0700, Dan Kegel wrote:

>> Creating new wiki pages seems broken today...

> Yes, due to all the spam, we've hit the ext3 limit
> of subdirectories (32k). More here:
>    http://www.rooftopsolutions.nl/blog/135
> I'm looking into how we can clean this up.


Ubuntu hit this one:

https://bugs.edge.launchpad.net/ubuntu/+source/moin/+bug/217191
http://moinmo.in/MoinMoinBugs/AllPagesSavedToSingleDirectory

The other solution is permanent deletion of the spam pages from the
actual file system. I've done such pruning before, and it needs
(obviously) to be done with *remarkable* care. It's also very fiddly.
I eventually cobbled together scripts to do the deletion for me. (At
an old workplace, I don't have them to hand.) The MoinMoin page above
lists maintenance scripts that can do it for you.

They also suggest moving the wiki directories to a filesystem that can
allow stupid amounts of directories, like XFS. (Even ext4 only scales
to 64,000 directories.)

MoinMoin 2.0 will apparently use a database instead of flat files.
ETA: some time or other in the far future. "we can't tell exactly when
the new storage stuff will be production ready, but I expect end 2008
.. mid 2009." Ahem.

Oh, and moinmo.in regards this as not being a "bug", but the result of
bad file system design. (And not, e.g., a wiki that doesn't scale.)


- d.




Re: wiki slightly broken still?

2010-07-28 Thread Dimi Paun
On Wed, 2010-07-28 at 13:05 -0700, Dan Kegel wrote:
> Creating new wiki pages seems broken today...

Yes, due to all the spam, we've hit the ext3 limit
of subdirectories (32k). More here:
http://www.rooftopsolutions.nl/blog/135

I'm looking into how we can clean this up.
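
To keep an eye on this in the future, something like the following
Python sketch could report how close a directory is to the limit. The
function names are made up, and the constant is the ext3 figure as far
as I know it: 32000 links per inode, which leaves 31998 subdirectories.

```python
import os

# ext3 allows at most 32000 links per inode; "." and the parent's entry
# use two of them, leaving 31998 subdirectories per directory.
EXT3_MAX_SUBDIRS = 31998


def count_subdirs(path):
    """Count the immediate subdirectories of path (symlinks not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def headroom(path, limit=EXT3_MAX_SUBDIRS):
    """Return how many more subdirectories fit before the limit is hit."""
    return limit - count_subdirs(path)
```

Pointed at the wiki's pages directory (the path depends on the
installation), a small headroom value would be the cue to clean up.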

-- 
Dimi Paun 
Lattica, Inc.





wiki slightly broken still?

2010-07-28 Thread Dan Kegel
Creating new wiki pages seems broken today...