Re: [Wikitech-l] [Toolserver-l] Crawling deWP

2009-01-28 Thread Daniel Kinzler
Marco Schuster wrote:
...
>> But by then, I do hope we have revision flags in the dumps, because that
>> would be The Right Thing to use.
> Still, using the dumps would require me to get the full history dump
> because I only want flagged revisions and not current revisions
> without the flag.

Including the latest revision which is flagged "good" would be an obvious
feature that should be implemented along with including the revision flags. So
the "current" dump would have 1-3 revisions per page.


-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Crawling deWP

2009-01-28 Thread Rolf Lampa
Marco Schuster wrote:

> Rolf Lampa  wrote:
>>
>> Don't the XML dumps contain the flag for flagged revs?
> 
> The XML dumps are no good for me, way too much overhead (besides,
> they are old, and I want to use single files; it's easier to process
> these than one huge XML file). And they don't contain flagged
> revision flags :(

I traverse the latest enwiki dump (last revision only) in 15 minutes (or
the Swedish svwiki in under 3 minutes) with my stream tool (written in
Delphi Pascal).

On the fly I can copy the whole thing (it takes no longer), and while at
it I can also create the "big three" SQL tables (page, revision & text)
from the XML dump, in less than 20 minutes.

I like XML dumps. :)
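For readers who want to try the streaming approach without Rolf's Delphi tool, here is a minimal Python sketch (not his implementation) that walks a MediaWiki XML dump one page at a time with the standard library's `xml.etree.ElementTree.iterparse`, clearing each finished `<page>` subtree so memory use stays low. It assumes the usual `<page>`/`<title>`/`<revision>`/`<text>` layout and ignores the export namespace, whose version differs between dumps; `iter_pages` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix; the export schema version varies between dumps.
    return tag.rsplit("}", 1)[-1]


def iter_pages(source) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    """Stream (title, text) pairs from a MediaWiki XML dump, one <page> at a time.

    If a page holds several revisions, the text of the last one in document
    order is returned.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title: Optional[str] = None
        text: Optional[str] = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title" and title is None:
                title = child.text
            elif name == "text":
                text = child.text
        yield title, text
        # Free the finished subtree so large dumps do not pile up in memory.
        elem.clear()


if __name__ == "__main__":
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
      <page><title>Alpha</title>
        <revision><id>1</id><text>first</text></revision></page>
      <page><title>Beta</title>
        <revision><id>2</id><text>second</text></revision></page>
    </mediawiki>"""
    for title, text in iter_pages(io.BytesIO(sample)):
        print(title, len(text or ""))
```

The same loop could write out page, revision and text rows as it goes, which is the kind of single-pass conversion described above.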

I'd love, however, to see the flagged rev status as an attribute in one 
of the tags, for example 

Regards,

// Rolf Lampa


Re: [Wikitech-l] Crawling deWP

2009-01-28 Thread Daniel Kinzler
Rolf Lampa wrote:
> I'd love, however, to see the flagged rev status as an attribute in one 
> of the tags, for example 
> 
> Regards,

Naw, it's more complex than that. You can have any number of different flags. It
would probably have to be 
foobar

-- daniel


Re: [Wikitech-l] MediaWiki Slow, what to look for?

2009-01-28 Thread Dawson
Thank you Platonides,

It seems I now get the error: "xcache.var_size is either 0 or too small to
enable var data caching in /var/www/includes/BagOStuff.php on line 643"

Googling hasn't provided much info on how to fix this; does anyone know?
2009/1/28 Platonides 

> Dawson wrote:
> > Modified config file as follows:
> >
> > $wgUseDatabaseMessage = false;
> > $wgUseFileCache = true;
> > $wgMainCacheType = "CACHE_ACCEL";
>
> This should be $wgMainCacheType = CACHE_ACCEL; (constant) not
> $wgMainCacheType = "CACHE_ACCEL"; (string)


Re: [Wikitech-l] MediaWiki Slow, what to look for?

2009-01-28 Thread Aryeh Gregor
On Wed, Jan 28, 2009 at 5:33 AM, Dawson  wrote:
> It seems I now get the error: "xcache.var_size is either 0 or too small to
> enable var data caching in /var/www/includes/BagOStuff.php on line 643"
>
> Googling hasn't provided much info on how to fix this; does anyone know?

Add this to php.ini:

xcache.var_size = 32M

Or pick whatever size you like, depending on how much RAM you have
available.  You can check the amount of RAM used (and other things)
using the xcache-admin stuff that should have been provided when you
installed XCache.  You might want to tweak the other options too:

http://xcache.lighttpd.net/wiki/PhpIni
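To double-check what a php.ini actually sets before restarting the web server, a small script can read the value and expand PHP's shorthand byte notation (K, M, G as multiples of 1024). This is a rough Python sketch, not part of XCache or MediaWiki: it handles only simple `key = value` lines, ignores `;` comments and section headers, and treats a missing `xcache.var_size` as 0, the situation the error message complains about. The function names are hypothetical.

```python
import re

# PHP ini shorthand: a plain integer, optionally followed by K, M or G (powers of 1024).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_ini_size(value: str) -> int:
    """Convert a php.ini size such as "32M" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kKmMgG]?)\s*", value.strip('"'))
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def xcache_var_size(ini_text: str) -> int:
    """Return xcache.var_size in bytes; 0 if it is never set (last assignment wins)."""
    size = 0
    for raw_line in ini_text.splitlines():
        line = raw_line.split(";", 1)[0].strip()  # drop ; comments
        if "=" not in line:
            continue  # blank lines and [section] headers
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "xcache.var_size":
            size = parse_ini_size(value)
    return size


if __name__ == "__main__":
    ini = """
[xcache]
; xcache.var_size = 0M
xcache.var_size = 32M
"""
    size = xcache_var_size(ini)
    if size == 0:
        print("xcache.var_size is 0: variable cache disabled")
    else:
        print(f"xcache.var_size = {size} bytes")
```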


Re: [Wikitech-l] hosting wikipedia

2009-01-28 Thread Stephen Dunn
I want to offer something like reference.com, where the results are formatted
in a manner consistent with the look and feel of my website. Try a search on
that site and you will see what I mean. So is this a live mirror? If ads are on
the site, is the revenue shared?



----- Original Message -----
From: Aryeh Gregor 
To: Wikimedia developers 
Sent: Tuesday, January 27, 2009 7:41:24 PM
Subject: Re: [Wikitech-l] hosting wikipedia

On Tue, Jan 27, 2009 at 7:37 PM, George Herbert wrote:
> Right, but a live mirror is a very different thing than a search box link.

Well, as far as I can tell, we have no idea whether the original
poster meant either of those, or perhaps something else altogether.
Obviously nobody minds a search box link, that's just a *link*.  You
can't stop people from linking to you.


Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-28 Thread Russell Blau
"Brion Vibber"  wrote in message 
news:497f9c35.9050...@wikimedia.org...
> On 1/27/09 2:55 PM, Robert Rohde wrote:
>> On Tue, Jan 27, 2009 at 2:42 PM, Brion Vibber wrote:
>>> On 1/27/09 2:35 PM, Thomas Dalton wrote:
>>>> The way I see it, what we need is to get a really powerful server
>>> Nope, it's a software architecture issue. We'll restart it with the new
>>> arch when it's ready to go.
>> The simplest solution is just to kill the current dump job if you have
>> faith that a new architecture can be put in place in less than a year.
>
> We'll probably do that.
>
> -- brion

FWIW, I'll add my vote for aborting the current dump *now* if we don't 
expect it ever to actually be finished, so we can at least get a fresh dump 
of the current pages.

Russ


Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-28 Thread Brion Vibber
Probably wise to poke in a hack to skip the history first. :)

-- brion vibber (brion @ wikimedia.org)

On Jan 28, 2009, at 7:34, "Russell Blau"  wrote:

> FWIW, I'll add my vote for aborting the current dump *now* if we don't
> expect it ever to actually be finished, so we can at least get a fresh dump
> of the current pages.


[Wikitech-l] enable $wgAllowCopyUploads follow-up

2009-01-28 Thread Michael Dale
Revisiting the $wgAllowCopyUploads request ... The thread ended here: 
http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040942.html

Any updates on this, or ideas on how we could support client-initiated
importing of media assets over HTTP?

--michael


[Wikitech-l] Secure Server IPs?

2009-01-28 Thread Robert Rohde
On enwiki, the secure server (i.e. secure.wikimedia.org) is currently
written down as using: 66.230.192.0–66.230.239.255

It seems unlikely that the server really uses or needs such a large range.

In addition, we received a report that 66.230.230.230 is operating as
a TOR exit node.  Since Wikipedia policy is to prohibit anon editing
and account creation from TOR nodes, it would be nice to clarify this.

Thanks.

-Robert Rohde


Re: [Wikitech-l] Make upload headings changeable

2009-01-28 Thread Chad
You're hitting on a core issue here, which is the lack of
support for multilingual projects. MediaWiki does not
currently support this. Using hacks such as uselang has
helped hide the issue, but it's far from ideal. I would
venture that multilingual content could be handled
with the user's language setting, headers, or uselang
param being used to show the appropriate content.
Until that happens, each project only has one content
language. In cases like the ones you mentioned, this
happens to be English. Let's suppose I use the French
Wikipedia with an Arabic interface. I would find it very
odd if the content were not in French, even though I
use Arabic as my interface language.

On multilingual projects, it's OK to present in your user
language. On single-language projects it is not. Using
uselang for content is an icky hack anyway. Multilingual
projects need to be supported in core, or we're just
going to perpetuate these hacks.

Basically, I figured I'd support the majority of cases
(single-language projects) rather than the minority
(multi-language projects). The former get the benefit of
the hack; the latter see no change.

-Chad

On Jan 27, 2009 4:08 PM, "Marcus Buck"  wrote:

Chad wrote:

> Should be done with a wiki's content language as of r46372.
>
> -Chad
Thanks! That's already a big improvement, but why content language? As I
pointed out in response to your question, it needs to be user language
on Meta, Incubator, Wikispecies, Beta Wikiversity, old Wikisource, and
all the multilingual wikis of third-party users. It's not actually
necessary on non-multilingual wikis, but it does no harm either. So why
content language?
This could be solved with an "isMultilingual" setting in LocalSettings.php,
but that's another matter, and as long as that does not exist, we should
use user language.

Marcus Buck
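The two positions can be compared with a small decision function. This Python sketch is purely illustrative: `is_multilingual` stands in for the hypothetical "isMultilingual" switch Marcus suggests (no such MediaWiki setting is assumed to exist), and the language codes are examples. It picks the language for upload-page headings under the content-language rule, the user-language rule, or a combination keyed on that switch.

```python
from typing import Optional


def heading_language(user_lang: str, content_lang: str,
                     policy: str, is_multilingual: Optional[bool] = None) -> str:
    """Choose the language for upload-form headings.

    policy:
      "content" - always use the wiki's content language (the r46372 approach)
      "user"    - always use the reader's interface language (Marcus's proposal)
      "switch"  - user language only if the wiki is flagged multilingual
                  (hypothetical isMultilingual setting)
    """
    if policy == "content":
        return content_lang
    if policy == "user":
        return user_lang
    if policy == "switch":
        if is_multilingual is None:
            raise ValueError("the 'switch' policy needs is_multilingual")
        return user_lang if is_multilingual else content_lang
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    # Chad's example: French Wikipedia viewed with an Arabic interface.
    for policy in ("content", "user"):
        print(policy, heading_language("ar", "fr", policy))
    # A multilingual wiki such as Meta, whose content language is English.
    print("switch", heading_language("ar", "en", "switch", is_multilingual=True))
```

With the switch, monolingual wikis keep the content-language behaviour while multilingual ones get user-language headings, which is the compromise the thread circles around.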


Re: [Wikitech-l] Enwiki dump crawling since 10/15/2008

2009-01-28 Thread Christian Storm
That would be great. I second this notion wholeheartedly.


On Jan 28, 2009, at 7:34 AM, Russell Blau wrote:

> FWIW, I'll add my vote for aborting the current dump *now* if we don't
> expect it ever to actually be finished, so we can at least get a fresh dump
> of the current pages.


Re: [Wikitech-l] Make upload headings changeable

2009-01-28 Thread Marcus Buck
Chad wrote:
> You're hitting on a core issue here, which is the lack of
> support for multilingual projects. MediaWiki does not
> currently support this. Using hacks such as uselang has
> helped hide the issue, but it's far from ideal. I would
> venture that multilingual content could be handled
> with the user's language setting, headers, or uselang
> param being used to show the appropriate content.
> Until that happens, each project only has one content
> language. In cases like the ones you mentioned, this
> happens to be English.
The facts are correct, but if you are implying that English should
therefore be regarded as valid output for non-English users of those
projects, I don't agree. That implication is wrong.
> Let's suppose I use the French
> Wikipedia with an Arabic interface. I would find it very
> odd if the content were not in French, even though I
> use Arabic as my interface language.
>   
The average user with a non-technical approach does not perceive a strict
distinction between "interface" (served by the PHP scripts) and
"content" (rendered from database content). This is especially true on file
description pages, where file history and file links, for example, appear as
headings in just the same way as the content headings. It won't seem
odd to me.
> On multilingual projects, it's OK to present in your user
> language. On single-language projects it is not. Using
> uselang for content is an icky hack anyway. Multilingual
> projects need to be supported in core, or we're just
> going to perpetuate these hacks.
>   
The ways of achieving and accessing this may change in the future, but you
will never have a clean separation between "content" and localizable
elements. Multilingual support can be as deeply built into core as
imaginable, and you will still have localizable elements stored in
"content" areas.
> Basically, I figured I'd support the majority of cases
> (single-language projects) rather than the minority
> (multi-language projects). The former get the benefit of
> the hack; the latter see no change.
>
> -Chad
Well, you could put it in other terms and the majority/minority thing
switches: content lang allows localization for monolingual projects only,
whereas user lang allows it for _all_ projects. So content lang is the
minority. Whether Arabic file description pages for users of the French
Wikipedia who prefer Arabic are a good or a bad thing is not decided, and
not even decidable. There are some points for content lang, but no
strong points. There are some points for user lang, but no strong points
either. If there are equally good points for both solutions, this
supports my interpretation of the majority/minority relation. Your
interpretation is based on the assumption that content lang on monolingual
projects is _obviously_ a good thing.

Marcus Buck


Re: [Wikitech-l] Secure Server IPs?

2009-01-28 Thread Brion Vibber
On 1/28/09 10:18 AM, Robert Rohde wrote:
> On enwiki, the secure server (i.e. secure.wikimedia.org) is currently
> written down as using: 66.230.192.0–66.230.239.255
>
> It seems unlikely that the server really uses or needs such a large range.

Indeed, this is wildly incorrect. :)

Our *old* public IP address space in Tampa was 66.230.200.0/24 -- this 
covers the range from 66.230.200.0 to 66.230.200.255, much smaller than 
the range listed by Pilotguy in 2007:
http://en.wikipedia.org/w/index.php?title=MediaWiki:Blockiptext&diff=131647237&oldid=126717109

Most likely, Pilotguy did a lookup and picked out the result for the 
parent IP space, which would cover many other customers of our provider, 
rather than the specific result for our network.


Further, this range is no longer being actively routed on the internet. 
Our current public IP address space in Tampa is 208.80.152.0/22, which 
covers 208.80.152.0 to 208.80.155.255.
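The range arithmetic in this exchange can be checked with Python's standard `ipaddress` module. This sketch uses only the addresses quoted in the thread: it shows that the range written on enwiki is not even a single CIDR block (it splits into a /19 plus a /20, 12,288 addresses against 256 in the old /24), and that 66.230.230.230 falls inside that over-broad range but outside both the old and the current Tampa blocks. The helper name `contains` is just for illustration.

```python
import ipaddress

# Range currently written on enwiki for the secure server.
LISTED_START = ipaddress.IPv4Address("66.230.192.0")
LISTED_END = ipaddress.IPv4Address("66.230.239.255")
listed_blocks = list(ipaddress.summarize_address_range(LISTED_START, LISTED_END))

# Wikimedia's old and current Tampa address space, as given above.
old_tampa = ipaddress.IPv4Network("66.230.200.0/24")
current_tampa = ipaddress.IPv4Network("208.80.152.0/22")


def contains(networks, address: str) -> bool:
    """True if the address lies in any of the given networks."""
    ip = ipaddress.IPv4Address(address)
    return any(ip in net for net in networks)


if __name__ == "__main__":
    print("listed range as CIDR:", [str(net) for net in listed_blocks])
    # -> ['66.230.192.0/19', '66.230.224.0/20']
    print("listed size:", sum(net.num_addresses for net in listed_blocks))  # 12288
    print("old /24 size:", old_tampa.num_addresses)  # 256
    print("current /22:", current_tampa[0], "-", current_tampa[-1])
    # -> 208.80.152.0 - 208.80.155.255
    print("66.230.230.230 in listed range:",
          contains(listed_blocks, "66.230.230.230"))  # True
    print("66.230.230.230 in Wikimedia blocks:",
          contains([old_tampa, current_tampa], "66.230.230.230"))  # False
```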

Note that while edits made through the secure server would list that 
server in their proxy forwarding headers, which would be visible in 
CheckUser results, it would not be visible as the final public IP unless 
there were a misconfiguration of our proxy whitelist.

> In addition, we received a report that 66.230.230.230 is operating as
> a TOR exit node.  Since Wikipedia policy is to prohibit anon editing
> and account creation from TOR nodes, it would be nice to clarify this.

This IP is not and never has been in our address range. It's probably in 
the same building, though. :)

-- brion


Re: [Wikitech-l] enable $wgAllowCopyUploads follow-up

2009-01-28 Thread Brion Vibber
On 1/28/09 8:57 AM, Michael Dale wrote:
> Revisiting the $wgAllowCopyUploads request ... The thread ended here:
> http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040942.html
>
> Any updates on this,

I'm poking poor Mark about it. :)

(And dude, I'm sitting right next to you. ;)

> or ideas on how we could support client-initiated
> importing of media assets over HTTP?

I don't think that can be done without a browser plugin, and it would be 
kinda crappy anyway -- the poor client would be using up both their 
upstream and downstream bandwidth for the whole file.

-- brion


[Wikitech-l] Wikimedia IdeaTorrent?

2009-01-28 Thread Erik Moeller
If you haven't seen it yet, Ubuntu is running an interesting piece of
brainstorming software called IdeaTorrent to think collectively about
common problems and solutions:

http://brainstorm.ubuntu.com/

The software:

http://www.ideatorrent.org/

I wonder - would people consider it useful to set up something like
brainstorm.wikimedia.org using this software, or would it be too
duplicative of Bugzilla and listservs? The benefit of IdeaTorrent is
that it's very straightforward for non-technical users to contribute
ideas and solutions. And, of course, it could be used for
non-technical problems as well.

-- 
Erik Möller
Deputy Director, Wikimedia Foundation

Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate


Re: [Wikitech-l] Crawling deWP

2009-01-28 Thread Platonides
Daniel Kinzler wrote:
> Rolf Lampa wrote:
>> I'd love, however, to see the flagged rev status as an attribute in one 
>> of the tags, for example 
>>
>> Regards,
> 
> Naw, it's more complex than that. You can have any number of different flags. It
> would probably have to be 
> foobar
> 
> -- daniel

It would be "", child of , just as 


Re: [Wikitech-l] Crawling deWP

2009-01-28 Thread Thomas Dalton
2009/1/28 Platonides :
> Daniel Kinzler wrote:
>> Rolf Lampa wrote:
>>> I'd love, however, to see the flagged rev status as an attribute in one
>>> of the tags, for example 
>>>
>>> Regards,
>>
>> Naw, it's more complex than that. You can have any number of different flags. It
>> would probably have to be 
>> foobar
>>
>> -- daniel
>
> It would be "", child of , just as 

But, as daniel said, "flagged" isn't enough, you need to know what flag.


Re: [Wikitech-l] [Toolserver-l] Crawling deWP

2009-01-28 Thread Thomas Dalton
2009/1/28 Daniel Kinzler :
> Marco Schuster wrote:
> ...
>>> But by then, I do hope we have revision flags in the dumps, because that
>>> would be The Right Thing to use.
>> Still, using the dumps would require me to get the full history dump
>> because I only want flagged revisions and not current revisions
>> without the flag.
>
> Including the latest revision which is flagged "good" would be an obvious
> feature that should be implemented along with including the revision flags. So
> the "current" dump would have 1-3 revisions per page.

The extension is highly customisable, so different projects will have
different flags available. Would you include the latest revision with
each flag? The latest revision with any flag? The latest revision with
a particular flag chosen for each project?
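Thomas's three options give different answers on the same page history. A hedged Python sketch over a made-up data model (each revision is a `(rev_id, flags)` pair, oldest first; the flag names "sighted" and "quality" are illustrative, not taken from any particular wiki's configuration) makes the difference concrete. It also shows one way to read Daniel's "1-3 revisions per page": the current revision plus the newest revision for each of two tracked flags, with duplicates merged.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical model: (rev_id, set of flags on that revision), oldest first.
Revision = Tuple[int, FrozenSet[str]]


def latest_with_flag(revs: List[Revision], flag: str) -> Optional[int]:
    """Newest revision carrying one particular flag."""
    for rev_id, flags in reversed(revs):
        if flag in flags:
            return rev_id
    return None


def latest_with_any_flag(revs: List[Revision]) -> Optional[int]:
    """Newest revision carrying any flag at all."""
    for rev_id, flags in reversed(revs):
        if flags:
            return rev_id
    return None


def latest_per_flag(revs: List[Revision]) -> Dict[str, int]:
    """For every flag that occurs, its newest revision."""
    newest: Dict[str, int] = {}
    for rev_id, flags in reversed(revs):
        for flag in flags:
            newest.setdefault(flag, rev_id)
    return newest


def current_dump_selection(revs: List[Revision], flags: Iterable[str]) -> List[int]:
    """Current revision plus the newest revision for each listed flag, deduplicated."""
    selected = {revs[-1][0]}
    for flag in flags:
        rev_id = latest_with_flag(revs, flag)
        if rev_id is not None:
            selected.add(rev_id)
    return sorted(selected)


if __name__ == "__main__":
    history: List[Revision] = [
        (101, frozenset({"sighted"})),
        (102, frozenset({"sighted", "quality"})),
        (103, frozenset({"sighted"})),
        (104, frozenset()),  # current, not yet reviewed
    ]
    print(latest_with_any_flag(history))  # 103
    print(latest_per_flag(history))
    print(current_dump_selection(history, ["sighted", "quality"]))  # [102, 103, 104]
```

Which rule a dump should follow is exactly the open question; the sketch only makes the trade-off visible.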


Re: [Wikitech-l] Wikimedia IdeaTorrent?

2009-01-28 Thread Brion Vibber
On 1/28/09 12:22 PM, Erik Moeller wrote:
> If you haven't seen it yet, Ubuntu is running an interesting piece of
> brainstorming software called IdeaTorrent to think collectively about
> common problems and solutions:
>
> http://brainstorm.ubuntu.com/
>
> The software:
>
> http://www.ideatorrent.org/
>
> I wonder - would people consider it useful to set up something like
> brainstorm.wikimedia.org using this software, or would it be too
> duplicative of Bugzilla and listservs? The benefit of IdeaTorrent is
> that it's very straightforward for non-technical users to contribute
> ideas and solutions. And, of course, it could be used for
> non-technical problems as well.

Taking a quick peek, it's giving me a warm fuzzy feeling. :) Not sure how 
best to integrate things, but it's definitely worth investigating.

-- brion