Re: [Wikitech-l] We're not quite at Google's level

2009-05-21 Thread Steve Bennett
On Fri, May 22, 2009 at 12:13 PM, Thomas Dalton  wrote:
> The thing that prompted me to start this thread was Google, a
> commercial organisation (although not one people pay for at the point
> of use), issuing just such a press release.

Err, yes. But people had already noticed and had been blogging rampantly
about it. So it's not like they were promoting their failure so much
as avoiding being silent on the issue. Whereas we would be actively
promoting it.

Steve

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] We're not quite at Google's level

2009-05-21 Thread Thomas Dalton
2009/5/22 Steve Bennett :
> On Sat, May 16, 2009 at 2:58 AM, The Cunctator  wrote:
>> We should definitely highlight real downtime as a reason for funding,
>> especially in a way that discusses practical steps that would be taken to
>> reduce the problem and how much those steps would cost.
>
> Interesting point. Commercial organisations would never issue a press
> release highlighting poor performance, because they want people to think
> they're getting good value for money. A charity, on the other hand...
> what does Wikipedia have to lose from people thinking its servers are
> unreliable due to lack of funding?

The thing that prompted me to start this thread was Google, a
commercial organisation (although not one people pay for at the point
of use), issuing just such a press release.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] We're not quite at Google's level

2009-05-21 Thread Steve Bennett
On Sat, May 16, 2009 at 2:58 AM, The Cunctator  wrote:
> We should definitely highlight real downtime as a reason for funding,
> especially in a way that discusses practical steps that would be taken to
> reduce the problem and how much those steps would cost.

Interesting point. Commercial organisations would never issue a press
release highlighting poor performance, because they want people to think
they're getting good value for money. A charity, on the other hand...
what does Wikipedia have to lose from people thinking its servers are
unreliable due to lack of funding?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Correct way to Import SQL Dumps of Wikipedia into MediaWiki in Binary

2009-05-21 Thread Platonides
O. O. wrote:
> Hi,
>   This may be a bit obvious – but I don’t have quite as much experience
> in this area. The SQL dumps provided at http://download.wikimedia.org do
> not specify the “DEFAULT CHARSET” of the respective table. When
> installing MediaWiki, it seems to be recommended to use the binary
> charset. I would like to know how to import one of these dumps into a
> table with the binary charset.
> 
>   Right now I import on the command line, e.g.:
> 
> mysql wikidb < enwiki-20090306-pagelinks.sql
> 
> This results in the corresponding table being dropped and then recreated.
> The problem with this is that the newly created table does not have its
> “DEFAULT CHARSET” set to binary, because the SQL dumps do not specify it.
> 
>   I first attempted to set the “DEFAULT CHARSET” to binary for new tables
> by modifying my my.cnf file. I made the following changes to my.cnf:
> 
> [client]
> default-character-set=binary
> [mysqld]
> default-character-set=binary
> default-collation=binary
> character-set-server=binary
> collation-server=binary
> init-connect='SET NAMES binary'
> 
> 
> I restarted the server – but I found that the new table still gets
> created in UTF-8, not binary.
> 
> I then attempted to edit the SQL file itself, i.e. to replace the line
> 
> ) TYPE=InnoDB;
> 
> With
> 
> ) TYPE=InnoDB DEFAULT CHARSET=binary;
> 
> This works, in the sense that the new table now gets created in binary.
> However, I think I am making mistakes in editing the file. These files
> are rather large, so I wrote code in Perl, and again in Java, to do the
> editing. They manage to do the above substitution, but I am not entirely
> confident about their UTF-8 handling.

You can also use sed to edit it:

$ sed -i "n;n;n;n;n;n;n;n;n;n;n;n;n;n;n;n;n;s/InnoDB/InnoDB DEFAULT CHARSET=binary/" \
      enwiki-20090306-pagelinks.sql

This will modify just that line (the 18th; adjust the number of 'n;'s should
the schema change) so that it reads:

) TYPE=InnoDB DEFAULT CHARSET=binary;



> The problem appears when I try to import these modified files: I get a
> "Duplicate entry" error. E.g. for the enwiki-20090306-pagelinks.sql file,
> I get the error:
>
> ERROR 1062 (23000) at line 1359: Duplicate entry
> '1198132-2-Gangleri/tests/links/�' for key 1
>
> I would like to add that importing this file as UTF-8 results in this
> “Duplicate entry” error coming much earlier in the input file.

I have looked at the entries from 1198132
(http://en.wikipedia.org/wiki/User:%D7%9C%D7%A2%D7%A8%D7%99_%D7%A8%D7%99%D7%99%D7%A0%D7%94%D7%90%D7%A8%D7%98/tests/links/char_x00_-_xFF)
in that file and they aren't duplicated (the file is OK), but they do
stress the charset a bit (they use the full 0-255 byte range), so if MySQL
is not interpreting it as fully binary, it will have problems.

That would explain why you get the problem earlier with utf8, but I don't
know exactly why it's failing in your setup.
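
To see what MySQL actually ended up with, something like the following
should show whether the table and the connection are really binary (a
quick sketch, assuming the dump was imported into the wikidb database as
in your command above):

$ mysql wikidb -e "SHOW CREATE TABLE pagelinks\G"         # check DEFAULT CHARSET on the table
$ mysql wikidb -e "SHOW VARIABLES LIKE 'character_set%'"  # check client/connection/server charsets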

You can also try your luck with yesterday's pagelinks.sql.



>   So, what is the correct way of importing these SQL dumps so that they
> end up in a table with the binary charset? If my description above is not
> clear, please let me know and I will try to explain again.
> 
> Thanks a lot,
> O. O.
> 
> P.S. I am running MediaWiki/MySQL under Ubuntu. I hope UTF-8 is handled
> correctly on the bash command line – but I don’t know how to check that.
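
(Regarding the P.S.: as a rough check – not MediaWiki-specific – the locale
settings usually tell you whether the shell passes UTF-8 through; if LANG
and LC_CTYPE end in ".UTF-8", bash itself should not mangle the bytes.)

$ locale        # look for a .UTF-8 suffix on LANG and LC_CTYPE
$ echo $LANG    # quick check of the default locale alone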


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] flagged revisions

2009-05-21 Thread Platonides
Bart wrote:
> I think a better solution would be to upgrade Huggle/Twinkle.  As it now
> stands, the main antivandal programs are... somewhat stodgy.  When you have
> a bunch of people all using them at once, you start to run into edit
> conflicts.  Different people will be trying to revert the same page at the
> same time.  But you know how you can set a page as "patrolled" in the "new
> revisions" section?  Perhaps there should be a way to set that on
> Huggle/Twinkle, but for multiple users.  There could be a flag that flips
> when there are more than X number of users actively running through
> Huggle/Twinkle.  If the number of users is greater than X, then revisions
> are actually sent out to multiple people at once.  This seems somewhat
> contradictory at first, the idea that you'll save time/resources and cover
> more pages if you have more people working on the same page, but it wouldn't
> revert as soon as you hit revert.  It would just set the flag on that page
> and serve up the next page -- if a majority of reviewers reverts it, the
> vandalism is reverted and the vandal is warned.
> 
> If the number of users is lower than X, then of course each person would
> instantly revert a page when they revert.  But I spend a lot of time waiting
> for Huggle to revert a page and warn a user.  This may not be the case for
> everyone, but I read very quickly.  I read the last Harry Potter book in
> like a couple of hours, no joking.  When I use Huggle, I spend the majority
> of my time waiting for Huggle to revert a page and warn a user (well, other
> than using Google to find other sites to check on factual accuracy, but
> that's another story).
> 
> I just feel that the amount of edit conflicts while using Huggle and the
> amount that the same set of pages is looked over by the same set of people,
> all of whom are trying to individually revert, is just too much.  There's
> far too much wasted time, in my opinion, because Huggle and Twinkle,
> although great, are just slightly inadequate to keep up with how big
> Wikipedia has become.  It's so huge that it's impossible for one person to
> read it all, since it'd take a few years of continuous reading and it's
> growing faster than the fastest reader could read.



In summary, your complaint is that Huggle and Twinkle are slow.
Complain to their authors, not to the MediaWiki developers. It's up to you
to use them or not, or even to create a "better" tool.

If two people save the same version, MediaWiki already chooses the first
one. An "add this section if it doesn't exist" command could be added, but
that's all.

Anti-vandal tools are free to synchronize and share their load among
themselves in any way they wish. This is the wrong list to rant about them.




___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Bugzilla components for Extensions

2009-05-21 Thread Sergey Chernyshev
Can somebody please create a component in Bugzilla for the Widgets
extension? You can set my email (sergey.chernys...@gmail.com) as the
default assignee for it.

BTW, maybe it makes sense to have a "- other -" component in there, so that
people who monitor the bugs can create a missing component just by seeing
that it doesn't exist yet? Maybe I'm wrong.

Thank you,

Sergey


--
Sergey Chernyshev
http://www.sergeychernyshev.com/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l