[Wikitech-l] Visual impairment

2010-05-15 Thread emijrp
Hi all;

Solving a captcha during registration is mandatory. Can this be replaced with
an audio captcha for visually impaired people? This is also a suggestion for
the usability project. Thanks.

Regards,
emijrp
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] Visual impairment

2010-05-16 Thread emijrp
Perhaps we can offer two captchas: first the current one, plus a link
labelled "if you can't read this captcha, try this one" pointing to the audio
reCAPTCHA. Requesting an account from the admins is not a good solution
(perhaps as a third option).

Regards,
emijrp

2010/5/16 Christopher Grant 

> On Sun, May 16, 2010 at 11:09 AM, K. Peachey 
> wrote:
> > I believe reCaptcha has it implemented as part of their service (we
> > do/did have an extension to implement theirs) but then we would have to
> > rely on third-party servers.
>
> Yes, reCaptcha does. However iirc it has been rejected in the past because
> of both the reliance on 3rd party servers and not all the code is open.
> - Chris
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] Visual impairment

2010-05-16 Thread emijrp
Interesting thread on Jimbo's talk page[1] from June 2008.

[1]
http://en.wikipedia.org/wiki/User_talk:Jimbo_Wales/Archive_37#Wikipedia_and_Captcha

2010/5/16 Chad 

> On Sun, May 16, 2010 at 3:04 AM, Christopher Grant
>  wrote:
> > On Sun, May 16, 2010 at 11:09 AM, K. Peachey 
> wrote:
> >> I believe reCaptcha has it implemented as part of their service (we
> >> do/did have an extension to implement theirs) but then we would have to
> >> rely on third-party servers.
> >
> > Yes, reCaptcha does. However iirc it has been rejected in the past
> because
> > of both the reliance on 3rd party servers and not all the code is open.
> > - Chris
> >
>
> Yes, and I think that makes it pretty much a non-starter for
> both reasons. Nothing's really changed there.
>
> -Chad
>
> ___
> foundation-l mailing list
> foundatio...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] GSOC2012 Image_recognition

2012-03-24 Thread emijrp
Perhaps she wants to develop it outside of GSoC? I think it is a very
interesting topic.

If she knows a bit about image processing, she probably knows more about it
than most of us. But we can help her with the pywikipedia bot side: retrieving
images/wikipages and submitting the categorized image results to Wikimedia
Commons.
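
A rough sketch of what that bot side could look like, using the modern
Pywikibot API rather than the old pywikipedia framework (the category names
and the classify() step are hypothetical placeholders):

# Fetch files from a Commons category, download them for classification,
# and append a category suggested by the image-recognition step.
import pywikibot
from pywikibot import pagegenerators

def classify(path):
    """Placeholder for the actual image-recognition step."""
    return None  # e.g. 'Sunflowers'

site = pywikibot.Site('commons', 'commons')
source_cat = pywikibot.Category(site, 'Category:Media needing categories')

for page in pagegenerators.CategorizedPageGenerator(source_cat):
    if not isinstance(page, pywikibot.FilePage):
        continue
    local_name = page.title(with_ns=False)
    page.download(local_name)          # save the file locally for the classifier
    label = classify(local_name)
    if label:
        page.text += '\n[[Category:%s]]' % label
        page.save(summary='Bot: adding category from image recognition')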

2012/3/23 Sumana Harihareswara 

> On 03/21/2012 04:00 PM, Emanuela Boroș wrote:
> > Hello,
> >
> > My name is Emanuela Boros and I am a second year Software Engineering
> > master's student at the "Al. Ioan Cuza" University of Iasi, Romania and,
> > after I graduate, I plan to follow a phd program.
> >
> > My research experience is focused mainly towards computer vision, image
> > processing, machine learning, multimodal information retrieval. It also
> > involves participating at international evaluation campaigns (ImageClef,
> > RobotVision) focusing on content-based image retrieval, image
> > classification and topological localization using visual information
> >
> > I am interested in this project
> > https://www.mediawiki.org/wiki/Summer_of_Code_2012#Image_recognition .
> I've
> > noticed that there isn't a mentor mentioned for this project and I would
> > like to know who should I contact for further clarifications regarding
> this
> > project before I start writing my proposal.
> >
> > Thank you and best regards,
> > Emanuela Boros
>
> Emanuela, thank you for your interest in MediaWiki.  Given the lack of
> response to you (and to a few other students who are interested in this
> idea), I'm sorry to say that I do not think there is a mentor available
> for this idea.
>
> We do have a lot of ideas that mentors are interested in guiding:
> https://www.mediawiki.org/wiki/Summer_of_Code_2012#Project_ideas so
> perhaps you will find one of them interesting.  I'm sorry for the trouble.
>
> --
> Sumana Harihareswara
> Volunteer Development Coordinator
> Wikimedia Foundation
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Page views

2012-04-08 Thread emijrp
2012/4/8 Erik Zachte 

> Hi Lars,
>
> You have a point here, especially for smaller projects:
>
> For Swedish Wikisource:
>
> zcat sampled-1000.log-20120404.gz | grep  'GET http://sv.wikisource.org' |
> awk '{print $9, $11,$14}'
>
> returns 20 lines from this 1:1000 sampled squid log file
> after removing javascript/json/robots.txt there are 13 left,
> which fits perfectly with 10,000 to 13,000 per day
>
> however 9 of these are bots!!
>
>
How many requests in that 1:1000 sampled log came from robots (across all
languages)?
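
A rough way to estimate that from the same sampled file, assuming the udp2log
field layout used in the awk line above (user agent as the 14th and final
field); the filename is the one from the example:

# Count sampled requests whose user-agent field looks like a bot.
import gzip
import re

BOT_RE = re.compile(r'bot|crawler|spider|slurp|wget|curl|python', re.IGNORECASE)

total = bots = 0
with gzip.open('sampled-1000.log-20120404.gz', 'rt', errors='replace') as log:
    for line in log:
        fields = line.rstrip('\n').split(' ', 13)
        if len(fields) < 14:
            continue
        total += 1
        if BOT_RE.search(fields[13]):
            bots += 1

print('%d of %d sampled requests look like bots' % (bots, total))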

-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Release of educational videos under creative commons

2012-04-24 Thread emijrp
2012/4/24 Andrew Cates 

> I am wondering about releasing many hundreds of Africa educational videos
> under creative commons.
>
> They are the videos currently at www.our-africa.org which is a child
> generated reference site about Africa.
>
> There is a lot of material in the videos which could be edited and used to
> improve Wikipedia article (a solar kettle in operation, a maize plant
> grinding maize, a variety of musical instruments in use, different
> religious festivals, cocoa plantations etc etc)
>
> However at present Wikipedia does not seem to support or want video
> material.
>

Wikipedia supports embedding video with the [[Media:]] tag. Also, Wikimedia
Commons has a small collection of them:
https://commons.wikimedia.org/wiki/Category:Video


>
> Does anyone have a feel whether this is likely to change?
>
>
http://www.videoonwikipedia.com

Also, this message is more relevant to the Wikimedia or Commons mailing lists
(cc:). If you are the owner of those videos and want to donate them, some
people can help you with the process.


> Andrew
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Wikimedia-l] Release of educational videos under creative commons

2012-04-25 Thread emijrp
2012/4/24 Samuel Klein 

> Where's the latest thread on the Timed Media Handler progress?
>
> I am meeting with MIT Open CourseWare tomorrow - they want to expand
> the set of videos they released last year under CC-SA, starting with
> categories / vids that would be fill gaps on Wikipedia.  Any thoughts
> on how to make that collaboration more effective would be welcome.
>
> SJ
>
>
You can upload them to the Internet Archive if Wikipedia has temporary issues
with videos. When the problems are fixed, we can move them from the Internet
Archive to Wikimedia Commons.
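
A minimal sketch of such a staging upload, assuming the third-party
"internetarchive" Python package (pip install internetarchive); the item
identifier, file name, metadata and keys below are made-up examples:

from internetarchive import upload

responses = upload(
    'mit-ocw-example-lecture-01',        # hypothetical item identifier
    files=['lecture-01.ogv'],
    metadata={
        'collection': 'opensource_movies',
        'mediatype': 'movies',
        'title': 'Example OCW lecture 01',
        'licenseurl': 'http://creativecommons.org/licenses/by-sa/3.0/',
    },
    access_key='YOUR_IA_ACCESS_KEY',
    secret_key='YOUR_IA_SECRET_KEY',
)
print([r.status_code for r in responses])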

-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] External link tracking

2012-04-25 Thread emijrp
2012/4/25 Strainu 

> Hi,
>
> Are there any statistics about the number of visitors that go from
> Wikipedia to different websites linked with external links?


Hi;

I don't think so. If you ask for them, perhaps you can get a random
anonymized sample.

> We've
> recently seen some people adding external links (as references) to
> articles from different newspapers and I was wondering if it's really
> worth it for the newspaper to have someone add such links?
>
>
This paper[1] discusses a project to add links to Wikipedia; you can see the
results and how it improved visits.

Wikipedia uses nofollow, so adding links to your website doesn't increase
your PageRank, but it works fine for reaching new readers.

These sites[2] certainly receive a lot of traffic from Wikipedia.

Regards,
emijrp

(Forwarding to the research mailing list.)

[1] http://www.dlib.org/dlib/may07/lally/05lally.html
[2] https://en.wikipedia.org/wiki/User:Emijrp/External_Links_Ranking

-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Wikimedia-l] Release of educational videos under creative commons

2012-05-02 Thread emijrp
Another example of a recent video donation
https://commons.wikimedia.org/wiki/Category:Files_from_the_Australian_Broadcasting_Corporation

2012/4/25 emijrp 

> 2012/4/24 Samuel Klein 
>
>> Where's the latest thread on the Timed Media Handler progress?
>>
>> I am meeting with MIT Open CourseWare tomorrow - they want to expand
>> the set of videos they released last year under CC-SA, starting with
>> categories / vids that would be fill gaps on Wikipedia.  Any thoughts
>> on how to make that collaboration more effective would be welcome.
>>
>> SJ
>>
>>
> You can upload them to the Internet Archive if Wikipedia has temporary issues
> with videos. When the problems are fixed, we can move them from the Internet
> Archive to Wikimedia Commons.
>
> --
> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
> Pre-doctoral student at the University of Cádiz (Spain)
> Projects: AVBOT <http://code.google.com/p/avbot/> | 
> StatMediaWiki<http://statmediawiki.forja.rediris.es>
> | WikiEvidens <http://code.google.com/p/wikievidens/> | 
> WikiPapers<http://wikipapers.referata.com>
> | WikiTeam <http://code.google.com/p/wikiteam/>
> Personal website: https://sites.google.com/site/emijrp/
>
>


-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-17 Thread emijrp
Good work. We are finally approaching an indestructible corpus of
knowledge.

2012/5/17 Ariel T. Glenn 

> We now have three mirror sites, yay!  The full list is linked to from
> http://dumps.wikimedia.org/ and is also available at
>
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>
> Summarizing, we have:
>
> C3L (Brazil) with the last 5 good known dumps,
> Masaryk University (Czech Republic) with the last 5 known good dumps,
> Your.org (USA) with the complete archive of dumps, and
>
> for the latest version of uploaded media, Your.org with http/ftp/rsync
> access.
>
> Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> volunteering space, time and effort to make this happen.
>
> As people noticed earlier, a series of media tarballs per-project
> (excluding commons) is being generated.  As soon as the first run of
> these is complete we'll announce its location and start generating them
> on a semi-regular basis.
>
> As we've been getting the bugs out of the mirroring setup, it is getting
> easier to add new locations.  Know anyone interested?  Please let us
> know; we would love to have them.
>
> Ariel
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-21 Thread emijrp
You can create a script that uses Special:Export to export all articles in
the deletion categories just before they are deleted.

Then import them into your "Deletionpedia".

2012/5/17 Mike Dupont 

> Hi,
> I am thinking about how to collect articles deleted based on the "not
> notable" criteria,
> is there any way we can extract them from the mysql binlogs? how are
> these mirrors working? I would be interested in setting up a mirror of
> deleted data, at least that which is not spam/vandalism based on tags.
> mike
>
> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn 
> wrote:
> > We now have three mirror sites, yay!  The full list is linked to from
> > http://dumps.wikimedia.org/ and is also available at
> >
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
> >
> > Summarizing, we have:
> >
> > C3L (Brazil) with the last 5 good known dumps,
> > Masaryk University (Czech Republic) with the last 5 known good dumps,
> > Your.org (USA) with the complete archive of dumps, and
> >
> > for the latest version of uploaded media, Your.org with http/ftp/rsync
> > access.
> >
> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> > volunteering space, time and effort to make this happen.
> >
> > As people noticed earlier, a series of media tarballs per-project
> > (excluding commons) is being generated.  As soon as the first run of
> > these is complete we'll announce its location and start generating them
> > on a semi-regular basis.
> >
> > As we've been getting the bugs out of the mirroring setup, it is getting
> > easier to add new locations.  Know anyone interested?  Please let us
> > know; we would love to have them.
> >
> > Ariel
> >
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> James Michael DuPont
> Member of Free Libre Open Source Software Kosova http://flossk.org
> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] XML dumps/Media mirrors update

2012-05-21 Thread emijrp
Create a script that makes a request to Special:Export using this category
as a feed:
https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion

More info: https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
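
A minimal sketch of that script, using the "requests" library: list the
category members through the API, then POST the titles to Special:Export
(parameter names as documented on the manual page above). Paging beyond 500
titles and error handling are left out:

# Export the current revisions of all candidates for speedy deletion.
import requests

API = 'https://en.wikipedia.org/w/api.php'
EXPORT = 'https://en.wikipedia.org/wiki/Special:Export'
CATEGORY = 'Category:Candidates for speedy deletion'

r = requests.get(API, params={
    'action': 'query',
    'list': 'categorymembers',
    'cmtitle': CATEGORY,
    'cmlimit': '500',
    'format': 'json',
})
titles = [m['title'] for m in r.json()['query']['categorymembers']]

xml = requests.post(EXPORT, data={
    'pages': '\n'.join(titles),
    'curonly': '1',    # current revisions only; drop this for full history
}).text

with open('speedy-deletion-candidates.xml', 'w', encoding='utf-8') as out:
    out.write(xml)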

2012/5/21 Mike Dupont 

> Well I whould be happy for items like this :
> http://en.wikipedia.org/wiki/Template:Db-a7
> would it be possible to extract them easily?
> mike
>
> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn 
> wrote:
> > There's a few other reasons articles get deleted: copyright issues,
> > personal identifying data, etc.  This makes maintaning the sort of
> > mirror you propose problematic, although a similar mirror is here:
> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
> >
> > The dumps contain only data publically available at the time of the run,
> > without deleted data.
> >
> > The articles aren't permanently deleted of course.  The revisions texts
> > live on in the database, so a query on toolserver, for example, could be
> > used to get at them, but that would need to be for research purposes.
> >
> > Ariel
> >
> > Στις 17-05-2012, ημέρα Πεμ, και ώρα 13:30 +0200, ο/η Mike Dupont έγραψε:
> >> Hi,
> >> I am thinking about how to collect articles deleted based on the "not
> >> notable" criteria,
> >> is there any way we can extract them from the mysql binlogs? how are
> >> these mirrors working? I would be interested in setting up a mirror of
> >> deleted data, at least that which is not spam/vandalism based on tags.
> >> mike
> >>
> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn 
> wrote:
> >> > We now have three mirror sites, yay!  The full list is linked to from
> >> > http://dumps.wikimedia.org/ and is also available at
> >> >
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
> >> >
> >> > Summarizing, we have:
> >> >
> >> > C3L (Brazil) with the last 5 good known dumps,
> >> > Masaryk University (Czech Republic) with the last 5 known good dumps,
> >> > Your.org (USA) with the complete archive of dumps, and
> >> >
> >> > for the latest version of uploaded media, Your.org with http/ftp/rsync
> >> > access.
> >> >
> >> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
> >> > volunteering space, time and effort to make this happen.
> >> >
> >> > As people noticed earlier, a series of media tarballs per-project
> >> > (excluding commons) is being generated.  As soon as the first run of
> >> > these is complete we'll announce its location and start generating
> them
> >> > on a semi-regular basis.
> >> >
> >> > As we've been getting the bugs out of the mirroring setup, it is
> getting
> >> > easier to add new locations.  Know anyone interested?  Please let us
> >> > know; we would love to have them.
> >> >
> >> > Ariel
> >> >
> >> >
> >> > ___
> >> > Wikitech-l mailing list
> >> > Wikitech-l@lists.wikimedia.org
> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >>
> >>
> >>
> >
> >
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> James Michael DuPont
> Member of Free Libre Open Source Software Kosova http://flossk.org
> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki<http://statmediawiki.forja.rediris.es>
| WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers<http://wikipapers.referata.com>
| WikiTeam <http://code.google.com/p/wikiteam/>
Personal website: https://sites.google.com/site/emijrp/
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Level to which Wikimedia wikis care about data integrity

2012-11-11 Thread emijrp
On Commons there are a bunch of broken/corrupt/missing files (mostly old
versions of the same file).

2012/11/11 MZMcBride 

> Hi.
>
> Is there a policy or guideline about the level to which Wikimedia wikis
> care
> about data integrity? There are a few specific cases I'm talking about:
>
> * edits or other logged actions with a wrong timestamp;
> * incomplete user renames (contributions split between two accounts);
> * weird entries in various *links tables (categorylinks, pagelinks, etc.);
> * weird entries in various non-links tables (page, user, etc.); and
> * revisions with weird user_id (page import bug, I think?).
>
> This has come up in the context of some old Bugzilla bugs about edits with
> the wrong timestamp. There's been some recent activity to mark at least
> some
> of these old bugs as "wontfix". And perhaps this makes sense, given that
> some of them are very old and it may be quite likely that nobody will ever
> go back and tweak the old revision timestamps.
>
> But I'm left wondering if there's anything concrete in this area to guide
> the other Bugzilla bugs, such as the bugs about botched user renames or
> *links tables having orphaned or invalid entries. And to some degree,
> there's a human component too, I suppose. An edit with a bad timestamp
> isn't
> a big deal; breaking a user's account simply because they have a lot of
> edits is a much bigger deal. So there's some level of triaging to be done.
>
> I see a lot of these issues as falling under general "database integrity".
> Is there a page on MediaWiki.org or Meta-Wiki that discusses this
> particular
> issue?
>
> I believe 
> ("Database
> table cleanup (tracking)") is the relevant tracking bug for most of what
> I'm
> describing.
>
> MZMcBride
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Emilio J. Rodríguez-Posada
http://LibreFind.org - The wiki search engine
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] list of things to do for image dumps

2010-09-10 Thread emijrp
Hi Lars, are you going to upload more logs to the Internet Archive? Domas's
website only shows the last 3 (?) months. I think there are many of these
files on the Toolserver, but we must preserve this raw data in another
secure place (for posterity).

2010/9/10 Lars Aronsson 

> On 09/09/2010 10:54 PM, Jamie Morken wrote:
> > Hi all,
> >
> > If anyone can help with #2 to provide the access log of image usage stats
> please send me an email!
> > 2. sort the image list based on usage frequency from access log files
>
> The raw data is one file per hour, containing a list of page names
> and visit counts. From just one such file, you get statistics on what's
> the most visited pages during that particular hour. By combining
> more files, you can get statistics for a whole day, a week, a month,
> a year, all Mondays, all 7am hours around the year, the 3rd Sunday
> after Easter, or whatever. The combinations are almost endless.
>
> How do we boil this down to a few datasets that are most useful?
> Is that the total visit count per month? Or what?
>
> Are these visitor stats already in a database on the toolserver?
> If so, how are they organized?
>
> I wrote some documentation on the access log format here,
> http://www.archive.org/details/wikipedia_visitor_stats_200712
>
>
> --
>   Lars Aronsson (l...@aronsson.se)
>   Aronsson Datateknik - http://aronsson.se
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Community vs. centralized development

2010-09-13 Thread emijrp
Hi all;

I think that Jamie has started an important topic. I don't think that the
WMF is going to usurp Wikipedia and the sister projects now or in the future,
but it is statistically possible. If we want to protect ourselves, human
knowledge and our work from this hypothetical scenario, we need complete full
dumps frequently. But that scenario is a malicious one, and I think there are
many more dangerous possibilities which, unfortunately, are common.

For example, small or massive loss of data due to natural disasters, cracker
attacks, stolen passwords, hardware and software bugs, sysops suddenly going
rogue, and _human errors_. Is the WMF ready for that?

A long time ago I searched for information about this, but I only found these
links[1][2]. Recently, I have become concerned about it again. Most of the
Wiki[mp]edia projects are small; their full backups are updated every week[3]
and can be stored anywhere, but the largest ones, like the English Wikipedia,
get outdated quickly[4] (right now, the dump is more than 200 days old).

I don't know much about the infrastructure or how the WMF servers are
distributed around the world, so I want to ask a simple question:

In the case of a complete disaster in the "main" servers, will the WMF be
able to restore all the Wiki[mp]edia content from backups?

We got a terrible fright when 3,000 images were accidentally deleted in
2008[5], and I think not all of them were recovered.

When people ask about image dumps, the most common reply is: "Are you going
to store 7 TB (Commons)?" I can't store that at home, of course, but I'm sure
that a few universities or other institutions around the world can, not only
for backup purposes but also for research (in full resolution or as thumbnails).

Also, I think we need to start mirroring the Wiki[mp]edia dumps to other
servers around the globe, as the common GNU/Linux ISO mirrors do. The Library
of Congress said some time ago that they are going to save a copy of all the
tweets sent to Twitter.[6] When are they going to save a copy of
Wiki[mp]edia? I hope we have learnt a bit since the Library of Alexandria was
destroyed.

I don't want an error to move us back to January 15, 2001.
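
A minimal sketch of the kind of mirroring script I mean, assuming the current
download.wikimedia.org/{db}/latest/ layout (the wiki list is just an example;
big wikis with split history dumps need more work than this):

# Fetch the latest pages-articles dump for a few wikis, so that copies
# exist outside the WMF servers.
import urllib.request

BASE = 'http://download.wikimedia.org'
WIKIS = ['eswiki', 'ptwiki', 'cawiki']  # example list, not a recommendation

for wiki in WIKIS:
    name = '%s-latest-pages-articles.xml.bz2' % wiki
    url = '%s/%s/latest/%s' % (BASE, wiki, name)
    print('fetching', url)
    urllib.request.urlretrieve(url, name)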

Regards,
emijrp

[1] http://wikitech.wikimedia.org/view/Disaster_Recovery
[2] http://wikitech.wikimedia.org/view/Offsite_Backups
[3] http://download.wikimedia.org/
[4] http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
[5]
http://lists.wikimedia.org/pipermail/wikitech-l/2008-September/039265.html
[6] http://www.wired.com/epicenter/2010/04/loc-google-twitter/

2010/9/8 Jamie Morken 

> Hi,
>
> I was involved in an open source project that was usurped by one of the
> main developers for the sole reason of making money, and that project
> continues now to take advantage of the community to increase the profit of
> that developer.  I never would have thought such a thing was possible until
> I saw that happen.  If that developer wasn't acting greedy, there would now
> be open source hardware for radio transceivers of all types, but instead
> there is only open source software for radio of all types.  I find it a
> shame, and when I was working on that project I could *feel* it being
> usurped!  I unfortunately may be paranoid as I feel the same thing here with
> the wikimedia foundation usurping wikipedia.  If you don't believe me, just
> consider that it is a very gradual process, like getting people used to not
> being able to download image dumps anymore, and ignoring ALL requests to
> restore this functionality.  Also failing to provide full history backups of
> the flagship wiki.  These two facts allow the wikimedia foundation to
> maintain the control of intellectual property that wasn't created by the
> people.  If you want the wikimedia foundation to respect you as volunteers,
> you will have to DEMAND respect by making sure that they never usurp the
> project.  I think the best way to do this is to make sure we can all
> download up to date full history with images wikipedia's so a fork at any
> time is possible.  Sure it may be paranoid, but trust me it is worth it to
> be paranoid regarding a project as important as wikipedia.  I have been in
> situations like this before, I wish I had acted before even if I was wrong!
> I wouldn't even be speaking now except for reading the heart-felt words of
> volunteers in this thread that are unhappy with how the wikimedia foundation
> is running.  We need to organize to get wikimedia foundation to release
> images tarballs, they are only ignoring multiple requests to do so, so far.
>
> cheers,
> Jamie
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] list of things to do for image dumps

2010-09-18 Thread emijrp
Thanks! : )

2010/9/17 Lars Aronsson 

> On September 10, emijrp wrote:
> > Hi Lars, are you going to upload more logs to Internet Archive?
>
> No, I can't. I have not downloaded more recent logs. I only uploaded
> what was on my disk, because I needed to free some space.
>
> > Domas
> > website only shows the last 3 (?) months. I think that there are many of
> > these files at Toolserver, but we must preserve this raw data in another
> > secure (for posterity) place.
>
> "Must"? Says who? That sounds like a naive opinion. If you have an
> interest, you can do the job. Otherwise they will get lost. In the
> future, maybe this should be a task for the paid staff, but so far
> it has not been.
>
>
> --
>Lars Aronsson (l...@aronsson.se)
>   Aronsson Datateknik - http://aronsson.se
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-l] dataset1 maintenance Sat Oct 1 (dumps unavailable)

2010-10-04 Thread emijrp
So, will English Wikipedia dumps be created with this new method from now?

2010/10/2 Ariel T. Glenn 

> The server that hosts XML dumps was moved this morning and all
> maintenance completed.  The dumps for dewiki, arwiki, srwiki and
> ptwikiquote were restarted from the beginning; everything else should be
> running normally. (Except for enwiki, which is a special case.)
>
> Ariel Glenn
>
>
>
> ___
> Xmldatadumps-l mailing list
> xmldatadump...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-l] XML dumps stopped, possible fs/disk issues on dump server under investigation

2010-11-10 Thread emijrp
What data is at risk?

2010/11/10 Ariel T. Glenn 

> The server refused to come up on reboot; raid errors.  The backplane is
> suspect.  A ticket is being opened with the vendor.  The host will
> remain offline until we have good information about how to resolve the
> problem or we get a replacement part from the vendor; we don't want to
> risk losing the data.  The backplane is suspect due to earlier errors
> from April that seemed to be disk errors.
>
> Future updates will be made available here:
>
> http://wikitech.wikimedia.org/view/Dataset1
>
> Ariel
>
> Στις 09-11-2010, ημέρα Τρι, και ώρα 21:44 -0800, ο/η Ariel T. Glenn
> έγραψε:
> > We noticed a kernel panic message and stack trace in the logs on the
> > server that servers XML dumps.  The web server that provides access to
> > these files is temporarily out of commission; we hope to have it back on
> > line in 12 hours or less.  Dumps themselves have been suspended while we
> > investigate.  I hope to have an update on this tomorrow as well.
> >
> > Ariel
> >
> >
> >
> > ___
> > Xmldatadumps-l mailing list
> > xmldatadump...@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
>
>
> ___
> Xmldatadumps-l mailing list
> xmldatadump...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Is that download.wikimedia.org server is down?

2010-11-11 Thread emijrp
The dump generation process is halted. Also, the official XML download page
is "offline" until they fix the hardware.

I don't know if there are mirrors; I don't think so.

2010/11/11 Billy Chan 

> Hi Robin,
>
> Thanks for your link. Do u know where i can download the xml dumps now?
> Thanks.
>
> 2010/11/11 Robin Krahl 
>
> > Hi Billy,
> >
> > On 10.11.2010 19:48, Billy Chan wrote:
> > > I am trying to download mediawiki by the following link, but with no
> > luck,
> > > the server seems down:
> >
> > That’s a known problem (see tech channel or server admin log). You may
> > user the following file instead:
> >  http://noc.wikimedia.org/mediawiki-1.16.0.tar.gz
> >
> > Regards,
> >Robin
> >
> > --
> > Robin Krahl || ireas
> >  http://robin-krahl.de
> >  m...@robin-krahl.de
> >
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Is that download.wikimedia.org server is down?

2010-11-11 Thread emijrp
There are some old dumps in Internet Archive,[1] but I guess you are
interested in the most recent ones.

Also, I have a copy of all the pages-meta-history.xml.7z from August 2010 at
home. But I can't upload them anywhere, they are 100 GB.

[1] http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive

2010/11/11 emijrp 

> The dump generating process is halted. Also, the official XML download page
> is "offline", until they fix the hardware.
>
> I don't know if there are mirrors. I don't think so.
>
> 2010/11/11 Billy Chan 
>
> Hi Robin,
>>
>> Thanks for your link. Do u know where i can download the xml dumps now?
>> Thanks.
>>
>> 2010/11/11 Robin Krahl 
>>
>> > Hi Billy,
>> >
>> > On 10.11.2010 19:48, Billy Chan wrote:
>> > > I am trying to download mediawiki by the following link, but with no
>> > luck,
>> > > the server seems down:
>> >
>> > That’s a known problem (see tech channel or server admin log). You may
>> > user the following file instead:
>> >  http://noc.wikimedia.org/mediawiki-1.16.0.tar.gz
>> >
>> > Regards,
>> >Robin
>> >
>> > --
>> > Robin Krahl || ireas
>> >  http://robin-krahl.de
>> >  m...@robin-krahl.de
>> >
>> >
>> > ___
>> > Wikitech-l mailing list
>> > Wikitech-l@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>> >
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Is that download.wikimedia.org server is down?

2010-11-11 Thread emijrp
Sorry; where I said "from August 2010", I meant "of August 2010". I have only
one .7z for each WMF wiki.

2010/11/11 emijrp 

> There are some old dumps in Internet Archive,[1] but I guess you are
> interested in the most recent ones.
>
> Also, I have a copy of all the pages-meta-history.xml.7z from August 2010
> at home. But I can't upload them anywhere, they are 100 GB.
>
> [1] http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
>
> 2010/11/11 emijrp 
>
> The dump generating process is halted. Also, the official XML download page
>> is "offline", until they fix the hardware.
>>
>> I don't know if there are mirrors. I don't think so.
>>
>> 2010/11/11 Billy Chan 
>>
>> Hi Robin,
>>>
>>> Thanks for your link. Do u know where i can download the xml dumps now?
>>> Thanks.
>>>
>>> 2010/11/11 Robin Krahl 
>>>
>>> > Hi Billy,
>>> >
>>> > On 10.11.2010 19:48, Billy Chan wrote:
>>> > > I am trying to download mediawiki by the following link, but with no
>>> > luck,
>>> > > the server seems down:
>>> >
>>> > That’s a known problem (see tech channel or server admin log). You may
>>> > user the following file instead:
>>> >  http://noc.wikimedia.org/mediawiki-1.16.0.tar.gz
>>> >
>>> > Regards,
>>> >Robin
>>> >
>>> > --
>>> > Robin Krahl || ireas
>>> >  http://robin-krahl.de
>>> >  m...@robin-krahl.de
>>> >
>>> >
>>> > ___
>>> > Wikitech-l mailing list
>>> > Wikitech-l@lists.wikimedia.org
>>> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>> >
>>> ___
>>> Wikitech-l mailing list
>>> Wikitech-l@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>
>>
>>
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] XML dumps stopped, possible fs/disk issues on dump server under investigation

2010-11-22 Thread emijrp
You can follow the updates here
http://wikitech.wikimedia.org/history/Dataset1

2010/11/21 masti 

> On 11/10/2010 06:44 AM, Ariel T. Glenn wrote:
> > We noticed a kernel panic message and stack trace in the logs on the
> > server that servers XML dumps.  The web server that provides access to
> > these files is temporarily out of commission; we hope to have it back on
> > line in 12 hours or less.  Dumps themselves have been suspended while we
> > investigate.  I hope to have an update on this tomorrow as well.
> >
> > Ariel
> >
>
> any news/outlook when the new dumps will be available?
>
> masti
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] alternative way to get wikipedia dump while server is down

2010-11-26 Thread emijrp
Crossposting.

This dump is in /mnt/user-store/dump (or dumps) on the Toolserver. If the
admins don't see any problem, it could be made available for download (~30 GB).

Regards,
emijrp

2010/11/25 Oliver Schmidt 

> Hello alltogether,
>
> is there any alternative way to get hands on a wikipedia dump?
> Preferably the last complete one.
> Which was supposed to be found at this address:
> http://download.wikimedia.org/enwiki/20100130/
>
> I would need that dump asap for my research.
> Thank you for any help!
>
> Best regards
>
>
> —
>
> Oliver Schmidt
> PhD student
> Nano Systems Biology Research Group
>
> University of Ulster, School of Biomedical Sciences
> Cromore Road, Coleraine BT52 1SA, Northern Ireland
>
> T: +44 / (0)28 / 7032 3367
> F: +44 / (0)28 / 7032 4375
> E: schmidt...@email.ulster.ac.uk<mailto:schmidt...@email.ulster.ac.uk>
>
> —
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] alternative way to get wikipedia dump while server is down

2010-11-28 Thread emijrp
What are the ISO codes? ro and ka?

I have kawiktionary-20100807-pages-meta-history.xml.7z (1.3 MB) and
rowiktionary-20100810-pages-meta-history.xml.7z (10.1 MB). Very tiny.

2010/11/28 Andrew Dunbar 

> On 28 November 2010 02:42, Jeff Kubina  wrote:
> > I have a copy of the 20091009 enwiki dumps if that would do:
> >
> > http://jeffkubina.org/data/download.wikimedia.org/enwiki/20091009/
> >
> > Jeff
> > --
> > Jeff Kubina 
> > 410-988-4436
> > 8am-10pm EST
> >
> > On Thu, Nov 25, 2010 at 12:30 PM, Oliver Schmidt <
> > schmidt...@email.ulster.ac.uk> wrote:
> >
> >> Hello alltogether,
> >>
> >> is there any alternative way to get hands on a wikipedia dump?
> >> Preferably the last complete one.
> >> Which was supposed to be found at this address:
> >> http://download.wikimedia.org/enwiki/20100130/
> >>
> >> I would need that dump asap for my research.
> >> Thank you for any help!
> >>
> >> Best regards
> >>
> >>
> >> —
> >>
> >> Oliver Schmidt
> >> PhD student
> >> Nano Systems Biology Research Group
> >>
> >> University of Ulster, School of Biomedical Sciences
> >> Cromore Road, Coleraine BT52 1SA, Northern Ireland
> >>
> >> T: +44 / (0)28 / 7032 3367
> >> F: +44 / (0)28 / 7032 4375
> >> E: schmidt...@email.ulster.ac.uk
> >>
> >> —
> >>
> >> ___
> >> Wikitech-l mailing list
> >> Wikitech-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >>
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
> I don't suppose anybody has a copy of any Romanian or Georgian
> Wiktionary from any time? (-:
>
> Andrew Dunbar (hippietrail)
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Looking for a mediawiki.org dump

2010-12-10 Thread emijrp
I have this one: mediawikiwiki-20100808-pages-meta-history.xml.7z (37 MB). I
can upload it to MegaUpload if needed.

2010/12/6 Andrew Dunbar 

> Could anybody help me locate a dump of mediawiki.org while the dump
> server is broken please? I only need current revisions.
>
> Thanks in advance.
>
> Andrew Dunbar (hippietrail)
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia dumps

2010-12-10 Thread emijrp
2010/12/10 James Linden 

> This may or may not be appropriate to this list -- this is where I
> found most of the discussions on the matter, so posting here.
>
> From reading the past couple of weeks of messages, I surmise that
> there isn't a way to get a current data dump (for enwiki), while the
> server is fubar.
>
> I have the 20100312 dump, which seems to be more recent than others
> available from archive.org, Amazon EC2, and others. However, even this
> dump is significantly behind the current article revisions from
> en.wikipedia.org.
>
> I pulled 333 semi-random articles from the live API -- of those, 329
> of them have significant content changes since 20100312 dump.
>
> Thus, my question:
>
> What is the current preference/recommendation regarding pulling
> significant quantities of articles (250k/ish) from the live API, until
> the dumps are available again?
>
> Sidenote 1: I'm in the process of uploading the 20100312 dump to a
> public web location, in case it is helpful to others.
>
>
Thanks


> Sidenote 2: Is there any discussion regarding insuring current dumps
> are mirrored in the future, say with archive.org ?
>
>
http://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps


> --
> James Linden
> kodekr...@gmail.com
> --
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Looking for a mediawiki.org dump

2010-12-11 Thread emijrp
Here you have it: http://www.megaupload.com/?f=WRDUHD3E If it says
"temporarily disabled", wait a few minutes and retry.

2010/12/11 Andrew Dunbar 

> Thanks that would be awesom! I don't know megaupload so give me a URL
> or whatever I need when it's there.
>
> Andrew Dunbar (hippietrail)
>
>
> On 11 December 2010 10:34, emijrp  wrote:
> > I have this one: mediawikiwiki-20100808-pages-meta-history.xml.7z (37
> MB). I
> > can upload it to MegaUpload if needed.
> >
> > 2010/12/6 Andrew Dunbar 
> >
> >> Could anybody help me locate a dump of mediawiki.org while the dump
> >> server is broken please? I only need current revisions.
> >>
> >> Thanks in advance.
> >>
> >> Andrew Dunbar (hippietrail)
> >>
> >> ___
> >> Wikitech-l mailing list
> >> Wikitech-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >>
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread emijrp
Hi;

It would be better if you could give us the md5sum of the file. If you are on
Linux, use the command "md5sum filename" (if it is not installed, you can get
it with apt-get). If you are on Windows, search for a tutorial.
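
A cross-platform alternative is to compute it with plain Python and hashlib,
reading the (possibly multi-GB) dump in chunks; the filename below is just an
example:

import hashlib

def md5_of(path, chunk_size=1024 * 1024):
    digest = hashlib.md5()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

print(md5_of('enwiki-latest-pages-articles.xml.bz2'))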

Also, the file size and the project language and family (wikipedia,
wiktionary...) would be nice.

Regards,
emijrp

2010/12/13 Monica shu 

> Hi all,
>
> I have downloaded a dump several month ago.
> By accidentally, I lost the version info of this dump, so I don't know when
> this dump was generated.
> Is there any place that list out info about the past dumps(such as
> size...)?
>
> Thanks!
>
> Monica
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] dataset1, xml dumps

2010-12-14 Thread emijrp
Thanks.

Double good news:
http://lists.wikimedia.org/pipermail/foundation-l/2010-December/063088.html

2010/12/14 Ariel T. Glenn 

> For folks who have not been following the saga on
> http://wikitech.wikimedia.org/view/Dataset1
> we were able to get the raid array back in service last night on the XML
> data dumps server, and we are now busily copying data off of it to
> another host.  There's about 11T of dumps to copy over; once that's done
> we will start serving these dumps read-only to the public again.
> Because the state of the server hardware is still uncertain, we don't
> want to do anything that might put the data at risk until that copy has
> been made.
>
> The replacement server is on order and we are watching that closely.
>
> We have also been working on deploying a server to run one round of
> dumps in the interrim.
>
> Thanks for your patience (which is a way of saying, I know you are all
> out of patience, as am I, but hang on just a little longer).
>
> Ariel
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2010-12-15 Thread emijrp
Good work.

2010/12/15 Ariel T. Glenn 

> We now have a copy of the dumps on a backup host.  Although we are still
> resolving hardware issues on the XML dumps server, we think it is safe
> enough to serve the existing dumps read-only.  DNS was updated to that
> effect already; people should see the dumps within the hour.
>
> Ariel
>
>
>
> ___
> Xmldatadumps-l mailing list
> xmldatadump...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
Hi James;

download.wikimedia.org is available again, so you can download that file from
http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-pages-articles.xml.bz2
(6.2 GB).

Regards,
emijrp

2010/12/14 James Linden 

> On Mon, Dec 13, 2010 at 7:09 PM, Michael Gurlitz
>  wrote:
> > I grabbed the following files in the days before the server broke, and
> > I can set up a torrent file if anyone's interested, or I could FTP
> > them to a server. 2010-10-11 was the last full Wikipedia dump that was
> > completed.
> > 6652983189 (6.2GB) enwiki-20101011-pages-articles.xml.bz2
>
> I would very much like to get a copy of
> enwiki-20101011-pages-articles.xml.bz2 if that's possible?
>
> If you need a server to upload to, message me off-list and I can provide
> it.
>
> -- James
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
Hi Monica;

Your dump is this one, dated 2010-03-12:[1][2]

a3a5ee062abc16a79d111273d4a1a99a  enwiki-20100312-pages-articles.xml.bz2

There are some old English Wikipedia dumps and md5sum files in a directory
called "archive"[3].

Regards,
emijrp

[1]
http://download.wikimedia.org/archive/enwiki/20100312/enwiki-20100312-md5sums.txt
[2] http://download.wikimedia.org/archive/enwiki/20100312/
[3] http://download.wikimedia.org/archive/

2010/12/14 Monica shu 

> Hi emijrp,
>
> Here is my dump's info:
>
> *enwiki-latest-pages-articles.xml.bz2 *
> *a3a5ee062abc16a79d111273d4a1a99a*
>
> Thanks~
>
> On Mon, Dec 13, 2010 at 10:00 PM, emijrp  wrote:
>
> > Hi;
> >
> > It would be better if you can give us the md5sum of the file. If you are
> on
> > Linux, use the command "md5sum filename" (you have to install it with
> > apt-get). If you are on Windows search for a tutorial.
> >
> > Also, the file size and the project language and family (wikipedia,
> > wiktionary...) would be nice.
> >
> > Regards,
> > emijrp
> >
> > 2010/12/13 Monica shu 
> >
> > > Hi all,
> > >
> > > I have downloaded a dump several month ago.
> > > By accidentally, I lost the version info of this dump, so I don't know
> > when
> > > this dump was generated.
> > > Is there any place that list out info about the past dumps(such as
> > > size...)?
> > >
> > > Thanks!
> > >
> > > Monica
> > > ___
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
All? The 2006 one too?

2010/12/16 Ariel T. Glenn 

> The dumps in the archive are there because they are incomplete, by the
> way.
>
> Ariel
>
> Στις 16-12-2010, ημέρα Πεμ, και ώρα 16:50 +0100, ο/η emijrp έγραψε:
> > Hi Monica;
> >
> > You dump is this one, with date 2010-03-12:[1][2]
> >
> > a3a5ee062abc16a79d111273d4a1a99a  enwiki-20100312-pages-articles.xml.bz2
> >
> > There are some old English Wikipedia dumps and md5sum files in a
> directory
> > called "archive"[3].
> >
> > Regards,
> > emijrp
> >
> > [1]
> >
> http://download.wikimedia.org/archive/enwiki/20100312/enwiki-20100312-md5sums.txt
> > [2] http://download.wikimedia.org/archive/enwiki/20100312/
> > [3] http://download.wikimedia.org/archive/
> >
> > 2010/12/14 Monica shu 
> >
> > > Hi emijrp,
> > >
> > > Here is my dump's info:
> > >
> > > *enwiki-latest-pages-articles.xml.bz2 *
> > > *a3a5ee062abc16a79d111273d4a1a99a*
> > >
> > > Thanks~
> > >
> > > On Mon, Dec 13, 2010 at 10:00 PM, emijrp  wrote:
> > >
> > > > Hi;
> > > >
> > > > It would be better if you can give us the md5sum of the file. If you
> are
> > > on
> > > > Linux, use the command "md5sum filename" (you have to install it with
> > > > apt-get). If you are on Windows search for a tutorial.
> > > >
> > > > Also, the file size and the project language and family (wikipedia,
> > > > wiktionary...) would be nice.
> > > >
> > > > Regards,
> > > > emijrp
> > > >
> > > > 2010/12/13 Monica shu 
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have downloaded a dump several month ago.
> > > > > By accidentally, I lost the version info of this dump, so I don't
> know
> > > > when
> > > > > this dump was generated.
> > > > > Is there any place that list out info about the past dumps(such as
> > > > > size...)?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Monica
> > > > > ___
> > > > > Wikitech-l mailing list
> > > > > Wikitech-l@lists.wikimedia.org
> > > > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > > > >
> > > > ___
> > > > Wikitech-l mailing list
> > > > Wikitech-l@lists.wikimedia.org
> > > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > > >
> > > ___
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] dataset1, xml dumps

2010-12-16 Thread emijrp
Have you checked the md5sum?

2010/12/16 Gabriel Weinberg 

> Ariel T. Glenn  wikimedia.org> writes:
>
> >
> > We now have a copy of the dumps on a backup host.  Although we are still
> > resolving hardware issues on the XML dumps server, we think it is safe
> > enough to serve the existing dumps read-only.  DNS was updated to that
> > effect already; people should see the dumps within the hour.
> >
> > Ariel
> >
>
> Hi, thank you for working so hard on this issue, but I'm still having
> trouble
> with the latest en.wikipedia dump, however. I downloaded
> http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-pages-
> articles.xml.bz2 and am running into trouble decompressing.
>
> In particular, bzip2 -d enwiki-20101011-pages-articles.xml.bz2 fails.
>
> And bzip2 -tvv enwiki-20101011-pages-articles.xml.bz2 reports:
>
>[2752: huff+mtf data integrity (CRC) error in data
>
> I ran bzip2recover & then bzip2 -t rec* and got the following:
>
> bzip2: rec02752enwiki-20101011-pages-articles.xml.bz2: data integrity (CRC)
> error in data
> bzip2: rec08881enwiki-20101011-pages-articles.xml.bz2: data integrity (CRC)
> error in data
> bzip2: rec26198enwiki-20101011-pages-articles.xml.bz2: data integrity (CRC)
> error in data
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] dataset1, xml dumps

2010-12-16 Thread emijrp
If the md5s don't match, the files are obviously different; I mean, one of
them is corrupt.

What is the size of your local file? I usually download dumps with the wget
UNIX command and I don't get errors. If you are using FAT32, the file size is
limited to 4 GB and larger files get truncated. Is that your case?

2010/12/16 Gabriel Weinberg 

> md5sum doesn't match. I get e74170eaaedc65e02249e1a54b1087cb (as
> opposed to 7a4805475bba1599933b3acd5150bd4d
> on
> http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-md5sums.txt
> ).
>
> I've downloaded it twice now and have gotten the same md5sum. Can anyone
> else confirm?
>
> On Thu, Dec 16, 2010 at 5:41 PM, emijrp  wrote:
>
> > Have you checked the md5sum?
> >
> > 2010/12/16 Gabriel Weinberg 
> >
> > > Ariel T. Glenn  wikimedia.org> writes:
> > >
> > > >
> > > > We now have a copy of the dumps on a backup host.  Although we are
> > still
> > > > resolving hardware issues on the XML dumps server, we think it is
> safe
> > > > enough to serve the existing dumps read-only.  DNS was updated to
> that
> > > > effect already; people should see the dumps within the hour.
> > > >
> > > > Ariel
> > > >
> > >
> > > Hi, thank you for working so hard on this issue, but I'm still having
> > > trouble
> > > with the latest en.wikipedia dump, however. I downloaded
> > > http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-pages-
> > > articles.xml.bz2 and am running into trouble decompressing.
> > >
> > > In particular, bzip2 -d enwiki-20101011-pages-articles.xml.bz2 fails.
> > >
> > > And bzip2 -tvv enwiki-20101011-pages-articles.xml.bz2 reports:
> > >
> > >[2752: huff+mtf data integrity (CRC) error in data
> > >
> > > I ran bzip2recover & then bzip2 -t rec* and got the following:
> > >
> > > bzip2: rec02752enwiki-20101011-pages-articles.xml.bz2: data integrity
> > (CRC)
> > > error in data
> > > bzip2: rec08881enwiki-20101011-pages-articles.xml.bz2: data integrity
> > (CRC)
> > > error in data
> > > bzip2: rec26198enwiki-20101011-pages-articles.xml.bz2: data integrity
> > (CRC)
> > > error in data
> > >
> > >
> > >
> > > ___
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] YouTube and Creative Commons

2011-06-04 Thread emijrp
A nice script for downloading YouTube videos is youtube-dl[1]. Chain it with
an flv/mp4 -> Ogg converter and an uploader to Commons, and the rest is
trivial.

[1] http://rg3.github.com/youtube-dl/
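
Roughly the glue I have in mind; a sketch only, assuming the youtube-dl and
ffmpeg2theora command line tools are installed, and leaving the actual
Commons upload (e.g. with pywikipedia's upload.py) and the licence check to
the reader:

import subprocess

def youtube_to_ogv(url, out_base='video'):
    """Download a YouTube video and convert it to Ogg Theora/Vorbis (.ogv)."""
    src = out_base + '.mp4'
    # youtube-dl fetches the video; --output fixes the local file name.
    subprocess.check_call(['youtube-dl', '--output', src, url])
    # ffmpeg2theora converts the mp4/flv into a free format suitable for Commons.
    subprocess.check_call(['ffmpeg2theora', src, '--output', out_base + '.ogv'])
    return out_base + '.ogv'

if __name__ == '__main__':
    # Placeholder URL; only use this on videos that really are CC licensed.
    print(youtube_to_ogv('http://www.youtube.com/watch?v=VIDEO_ID', 'example'))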

2011/6/4 Michael Dale 

> Comments inline:
>
> On Fri, Jun 3, 2011 at 4:51 PM, Brion Vibber  wrote:
>
> > (I'm not sure offhand if I'm set up to cross-post to Foundation-l; if
> this
> > doesn't make it, somebody please CC a mention if necessary. Thanks!)
> >
> > On Fri, Jun 3, 2011 at 4:42 PM, aude  wrote:
> >
> > > Aside from the very real privacy issue, YouTube videos can disappear at
> > any
> > > time.  I would much rather we host them on Commons.
> > >
> > > A youtube2commons script is pretty easy to implement,
> >
>
>
> yes a basic youtube2commons script was posted by Jan on wikivideo-l list
> recently:
> http://lists.wikimedia.org/pipermail/wikivideo-l/2011-May/56.html But
> as
> you point out we really need to work on increasing the upload size limit.
>
>
>
> >
> > There's been some ongoing work on TimedMediaHandler extension which will
> > replace the older OggHandler
> >
>
>
> Yes, been hammering away on associated bugs. People can help by testing and
> filing bugs :) thedj has helped file a lot of bugs, and Brion too recently
> has been taking a look at the transcoding side of things and Roan did a
> good
> first pass review and those suggestions have since been integrated.  I hope
> to have a new version up prototype soon that integrates all the known
> requested features / bugs listed in bugzilla some time next week. (with the
> exception of features tagged for version 1.1 like server side srt parsing
> and timed wikitext -> html -> srt text with html tag removal )  Once I get
> this update out to prototype I will try and do a blog post at that point to
> invite people to test it out.
>
>
>
> > Basic uploads by URL work in theory, but I'm not sure the deployment
> > status.
> > Background large-file downloads are currently disabled in the latest code
> > and needs to be reimplemented if that's to be used.
> >
>
>
> Yea we have bug
> https://bugzilla.wikimedia.org/show_bug.cgi?id=20512 tracking
> re-enabling copy by url. Once we have webm TMH deployed it would
> make importing YouTube CC content without conversion simple :)
>
>
> >
> > For straight uploads, regular uploads of large files are a bit
> problematic
> > in general (they hit memory limits and such and have to make it through
> > caching proxies and whatnot), but there's also been some new work on
> > improved chunked uploads for FireFogg (and perhaps for general modern
> > browsers that can do fancier uploads). Michael Dale can probably give
> some
> > updates on this, but it'll be a bit yet before it's ready to go.
> >
>
> Yes we are reimplementing the firefogg chunk uploading as ResumableUpload (
> name of new extension ) in a way that allows both HTML5 XHR browsers to use
> the chunk protocol in addition to firefogg ( if your converting video from
> a
> proprietary source ). In addition we had discussions at the Berlin
> Hackathon that cleared up some confusion about the concerns with the
> firefogg protocol and modified it to explicitly state the byte ranges of
> chunks in requests and server responses. Also had a brief chat with Russell
> on IRC, so that we can support this append chunks system as we move to the
> swiftMedia back end.
>
> --michael
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Using computer vision to categorize images at Commons

2012-02-20 Thread emijrp
Hi Maarten;

I think that this is a perfect example of an open question in wiki research.
WikiPapers has a page for that stuff.[1] Can you add some bits there about
this?

I didn't know about OpenCV; I will check it out for sure, and I will try to
do something with it (I'm a bot developer).

Regards,
emijrp

[1] http://wikipapers.referata.com/wiki/List_of_open_questions

2012/2/20 Maarten Dammers 

> Hi everyone,
>
> Some time ago I played around with computer vision to get images
> categorized on Commons. I documented this at
> https://commons.wikimedia.org/wiki/User:Multichill/Using_OpenCV_to_categorize_files.
>  I don't think I'm going to spend time on it soon, but the results were
> quite promising, so maybe someone else feels like working on this? Would
> probably be a pretty nice student project or just fun to do.
>
> Maarten
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Using computer vision to categorize images at Commons

2012-02-20 Thread emijrp
I have found a tutorial for Python coders:
http://creatingwithcode.com/howto/face-detection-in-static-images-with-python/
After some tests, it works fine (including René Descartes' face : )).

This is going to be very helpful to improve the accuracy of Images for
biographies: http://toolserver.org/~emijrp/imagesforbio/
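
In case anyone else wants to play with it, this is more or less the core of
it; a sketch with the OpenCV Python bindings (cv2), assuming the frontal face
Haar cascade XML that ships with OpenCV is available locally (adjust
CASCADE_PATH to wherever your installation keeps it):

import sys
import cv2  # OpenCV Python bindings

# Frontal face Haar cascade shipped with OpenCV; adjust to your installation.
CASCADE_PATH = 'haarcascade_frontalface_default.xml'

def count_faces(image_path):
    """Return the number of frontal faces OpenCV detects in an image file."""
    image = cv2.imread(image_path)
    if image is None:
        raise IOError('could not read %s' % image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(CASCADE_PATH)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces)

if __name__ == '__main__':
    for path in sys.argv[1:]:
        print('%s: %d face(s)' % (path, count_faces(path)))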

2012/2/20 emijrp 

> Hi Maarten;
>
> I think that this is a perfect example of open question in wiki research.
> WikiPapers has a page for that stuff.[1] Can you add some bits there about
> this?
>
> I dind't know about OpenCV, I will check it for sure, and I will try to
> something (I'm a bot developer).
>
> Regards,
> emijrp
>
> [1] http://wikipapers.referata.com/wiki/List_of_open_questions
>
>
> 2012/2/20 Maarten Dammers 
>
>> Hi everyone,
>>
>> Some time ago I played around with computer vision to get images
>> categorized on Commons. I documented this at
>> https://commons.wikimedia.org/wiki/User:Multichill/Using_OpenCV_to_categorize_files.
>>  I don't think I'm going to spend time on it soon, but the results were
>> quite promising, so maybe someone else feels like working on this? Would
>> probably be a pretty nice student project or just fun to do.
>>
>> Maarten
>>
>>
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Errors in Wikimedia Commons old files

2012-02-29 Thread emijrp
Hi all;

I'm trying to download Wikimedia Commons, but I have found some errors. For
example:
* oi_archive_name is empty:
http://commons.wikimedia.org/wiki/File:Nl-scheikundig.ogg#filehistory
* the link is broken and you get an empty file:
http://commons.wikimedia.org/wiki/File:SMS_Bluecher.jpg#filehistory

Are you aware of this? Is this going to be fixed?
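
For reference, this is roughly how such cases can be spotted; a sketch
against the public API (standard library only), flagging old revisions of a
file that come back without an archivename:

import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = 'https://commons.wikimedia.org/w/api.php'

def file_revisions(title):
    """Return all imageinfo entries (current + old versions) for a Commons file."""
    params = {
        'action': 'query',
        'titles': title,
        'prop': 'imageinfo',
        'iiprop': 'timestamp|url|archivename|size|sha1',
        'iilimit': 'max',
        'format': 'json',
    }
    req = Request(API + '?' + urlencode(params),
                  headers={'User-Agent': 'commons-oldfile-check (example sketch)'})
    data = json.loads(urlopen(req).read().decode('utf-8'))
    page = next(iter(data['query']['pages'].values()))
    return page.get('imageinfo', [])

if __name__ == '__main__':
    revisions = file_revisions('File:SMS Bluecher.jpg')
    # The first entry is the current version; every old version should carry
    # an archivename, so the ones without it are the suspicious/broken cases.
    for info in revisions[1:]:
        if not info.get('archivename'):
            print('%s %s' % (info['timestamp'], info.get('url')))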

Regards,
emijrp
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Errors in Wikimedia Commons old files

2012-03-01 Thread emijrp
2012/3/1 Peter Gervai 

> On Thu, Mar 1, 2012 at 00:56, emijrp  wrote:
> > I'm trying to download Wikimedia Commons, but I have found some errors.
> For
>
> There are still occasional errors around, it would be nice to run a
> script against the files database... but it can usually be fixed by
> downloading (sometimes from history) and uploading again.
>
>
Didn't you see the broken image links? How are you going to download the
files if the links themselves are broken?


> g
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Picture of the Year torrents

2011-09-16 Thread emijrp
https://bugzilla.wikimedia.org/show_bug.cgi?id=30946

2011/9/12 emijrp 

> Hi all;
>
> I have created two torrent files for the Picture of the Year dumps[1]. They
> use the Wikimedia server as a webseed.[2][3] Can you add them to the page?
>
> Thanks,
> emijrp
>
> [1] http://dumps.wikimedia.org/other/poty/
> [2] http://burnbit.com/torrent/177023/poty2006_zip
> [3] http://burnbit.com/torrent/177024/poty2007_zip
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] page view stats redux

2011-09-18 Thread emijrp
Thanks Ariel. That is important data to preserve.

2011/9/15 Ariel T. Glenn 

> I think we finally have a complete copy from December 2007 through
> August 2011 of the pageview stats scrounged from various sources, now
> available on our dumps server.
>
> See http://dumps.wikimedia.org/other/pagecounts-raw/
>
> Ariel
>
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] Request: WMF commitment as a long term cultural archive?

2011-09-21 Thread emijrp
Hi all;

Just like the scripts to preserve wikis[1], I'm working on a new script to
download all Wikimedia Commons images packed by day. But I have limited
spare time. It is sad that volunteers have to do this without any help from
the Wikimedia Foundation.
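
To give an idea of what "packed by day" means, here is a rough sketch of the
download loop, using only the Python standard library and the public
allimages API; the real script will also have to care about errors, resuming,
odd file names and packing the result:

import json
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = 'https://commons.wikimedia.org/w/api.php'
USER_AGENT = 'commons-by-day downloader (example sketch)'

def images_uploaded_on(day):
    """Yield (name, url) for every file uploaded to Commons on a YYYY-MM-DD day."""
    params = {
        'action': 'query',
        'list': 'allimages',
        'aisort': 'timestamp',
        'aistart': day + 'T00:00:00Z',
        'aiend': day + 'T23:59:59Z',
        'aiprop': 'url',
        'ailimit': '500',
        'format': 'json',
        'continue': '',
    }
    while True:
        req = Request(API + '?' + urlencode(params),
                      headers={'User-Agent': USER_AGENT})
        data = json.loads(urlopen(req).read().decode('utf-8'))
        for img in data['query']['allimages']:
            yield img['name'], img['url']
        if 'continue' not in data:
            break
        params.update(data['continue'])  # the API tells us where to resume

if __name__ == '__main__':
    day = '2011-09-01'
    os.makedirs(day, exist_ok=True)
    for name, url in images_uploaded_on(day):
        req = Request(url, headers={'User-Agent': USER_AGENT})
        with open(os.path.join(day, name), 'wb') as out:
            out.write(urlopen(req).read())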

I also started an effort on Meta (with low activity so far) to mirror the
XML dumps.[2] If you know about universities or research groups which work
with Wiki[pm]edia XML dumps, they would be good candidates for hosting
mirrors.

If you want to download the texts to your PC, you only need 100 GB of free
space and this Python script.[3]

I heard that the Internet Archive saves the XML dumps quarterly or so, but
there has been no official announcement. I also heard about the Library of
Congress wanting to mirror the dumps, but there has been no news for a long
time.

L'Encyclopédie has an "uptime"[4] of 260 years[5] and growing. Will
Wiki[pm]edia projects reach that?

Regards,
emijrp

[1] http://code.google.com/p/wikiteam/
[2] http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
[3]
http://code.google.com/p/wikiteam/source/browse/trunk/wikipediadownloader.py
[4] http://en.wikipedia.org/wiki/Uptime
[5] http://en.wikipedia.org/wiki/Encyclop%C3%A9die


2011/6/2 Fae 

> Hi,
>
> I'm taking part in an images discussion workshop with a number of
> academics tomorrow and could do with a statement about the WMF's long
> term commitment to supporting Wikimedia Commons (and other projects)
> in terms of the public availability of media. Is there an official
> published policy I can point to that includes, say, a 10 year or 100
> commitment?
>
> If it exists, this would be a key factor for researchers choosing
> where to share their images with the public.
>
> Thanks,
> Fae
> --
> http://enwp.org/user_talk:fae
> Guide to email tags: http://j.mp/faetags
>
> ___
> foundation-l mailing list
> foundatio...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-l] first mirror of most recent XML dumps, at C3SL in Brazil

2011-10-23 Thread emijrp
Congratulations, a big step in wiki preservation.

2011/10/13 Ariel T. Glenn 

> As the subject says, the first mirror of our XML dumps is up, hosted at
> C3Sl in BRazil.  We're really excited about it.  Details are listed on
> the main index page on our download server
> ( http://dumps.wikimedia.org/ ) and are reproduced below for everyone's
> convenience:
>
> Site: Centro de Computação Científica e Software Livre (C3SL), at the
> Universidade Federal do Paraná in Brazil.
> Contents: the 5 most current complete and successful dumps of each
> project
> Access: HTTP: http://wikipedia.c3sl.ufpr.br/
> FTP: ftp://wikipedia.c3sl.ufpr.br/wikipedia/
> rsync: rsync://wikipedia.c3sl.ufpr.br/wikipedia/
>
> A big thank you to the folks there for providing the space and working
> with us to make it happen.
>
> Please forward this on to researchers or others who might want to know
> about it but aren't on these lists.
>
> Ariel Glenn
> Software Developer / Systems Engineer
> Wikimedia Foundation
> ar...@wikimedia.org
>
>
> ___
> Xmldatadumps-l mailing list
> xmldatadump...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-l] first mirror of most recent XML dumps, at C3SL in Brazil

2011-10-23 Thread emijrp
Some of the most recent dump links are broken.[1]

[1] http://wikipedia.c3sl.ufpr.br/jawikisource/20111018

2011/10/13 Ariel T. Glenn 

> As the subject says, the first mirror of our XML dumps is up, hosted at
> C3Sl in BRazil.  We're really excited about it.  Details are listed on
> the main index page on our download server
> ( http://dumps.wikimedia.org/ ) and are reproduced below for everyone's
> convenience:
>
> Site: Centro de Computação Científica e Software Livre (C3SL), at the
> Universidade Federal do Paraná in Brazil.
> Contents: the 5 most current complete and successful dumps of each
> project
> Access: HTTP: http://wikipedia.c3sl.ufpr.br/
> FTP: ftp://wikipedia.c3sl.ufpr.br/wikipedia/
> rsync: rsync://wikipedia.c3sl.ufpr.br/wikipedia/
>
> A big thank you to the folks there for providing the space and working
> with us to make it happen.
>
> Please forward this on to researchers or others who might want to know
> about it but aren't on these lists.
>
> Ariel Glenn
> Software Developer / Systems Engineer
> Wikimedia Foundation
> ar...@wikimedia.org
>
>
> ___
> Xmldatadumps-l mailing list
> xmldatadump...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Xmldatadumps-l] first mirror of most recent XML dumps, at C3SL in Brazil

2011-10-24 Thread emijrp
They have probably changed something. I don't remember an "Index of" page at
http://wikipedia.c3sl.ufpr.br/ yesterday, but rather a backup-index.html.

2011/10/24 Ariel T. Glenn 

> And now that my eyes are open a bit better I see they don't actually
> mirror the backup-index.html file.  So where did you find that link?
> I don't see it here:
>
> http://wikipedia.c3sl.ufpr.br/jawikisource/
>
> Ariel
>
> Στις 24-10-2011, ημέρα Δευ, και ώρα 00:18 +0200, ο/η emijrp έγραψε:
> > Some of the most recent dumps links are broken[1].
> >
> > [1] http://wikipedia.c3sl.ufpr.br/jawikisource/20111018
> >
> > 2011/10/13 Ariel T. Glenn 
> > As the subject says, the first mirror of our XML dumps is up,
> > hosted at
> > C3Sl in BRazil.  We're really excited about it.  Details are
> > listed on
> > the main index page on our download server
> > ( http://dumps.wikimedia.org/ ) and are reproduced below for
> > everyone's
> > convenience:
> >
> > Site: Centro de Computação Científica e Software Livre (C3SL),
> > at the
> > Universidade Federal do Paraná in Brazil.
> > Contents: the 5 most current complete and successful dumps of
> > each
> > project
> > Access: HTTP: http://wikipedia.c3sl.ufpr.br/
> > FTP: ftp://wikipedia.c3sl.ufpr.br/wikipedia/
> > rsync: rsync://wikipedia.c3sl.ufpr.br/wikipedia/
> >
> > A big thank you to the folks there for providing the space and
> > working
> > with us to make it happen.
> >
> > Please forward this on to researchers or others who might want
> > to know
> > about it but aren't on these lists.
> >
> > Ariel Glenn
> > Software Developer / Systems Engineer
> > Wikimedia Foundation
> > ar...@wikimedia.org
> >
> >
> > ___
> > Xmldatadumps-l mailing list
> > xmldatadump...@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
> >
>
>
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Fwd: Old English Wikipedia image dump from 2005

2011-11-11 Thread emijrp
Forwarding...

-- Forwarded message --
From: emijrp 
Date: 2011/11/11
Subject: Old English Wikipedia image dump from 2005
To: wikiteam-disc...@googlegroups.com


Hi all;

I want to share with you this Archive Team link[1]. It is an old English
Wikipedia image dump from 2005, probably one of the last ones before the
Wikimedia Foundation stopped publishing image dumps. Enjoy.

Regards,
emijrp

[1] http://www.archive.org/details/wikimedia-image-dump-2005-11
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] Announcement: Selected Books from Malayalam Wikisource on CD released

2011-06-11 Thread emijrp
Creating an offline version of a wiki project is hard work. Keep up the
good work! Congratulations! : )

P.S.: downloading...

2011/6/11 Jyothis E 

> Dear fellow Wikimedians,
>
> With great pleasure, Malayalam Wikimedia Community announced its 2011 CD
> project "Selected Books from Malayalam Wikisource on CD - 1.0" at the 4th
> annual Wiki Meetup in Kannur, Kerala. This is by far the biggest digital
> collection of free books in Malayalam language available on CD for offline
> use. This is an important milestone, as majority of the households in
> Kerala
> does not have internet or does not have an always on connection and this
> will enable them to access these books as an offline content.
>
> Contents:
>
> Selected Poems by
>  * Kumaranasan
>  * Cherusseri
>  * Changampuzha Krishna Pillai
>  * Kalakkaththu Kunchan Nambiar
>  * Irayimman Thampi
>  * Ramapurathu Warrier
>
> Malayalam Grammer
>  * Kerala Panineeyam by AR Rajaraja Varma
>
> Legends/Folklore
>  * Aithihyamala
>
> Novels
>  * Indulekha
>
> Religious
>  * Bhagavad Gita
>  * Adhyatma Ramayanam Kilippaatu
>  * Harinama Keerthanam
>  * Geetha Govindam
>  * Sathya Veda Pusthakam (Malayalam Bible)
>  * Quran
>  * Works of Sree Narayana Guru
>  * Devotional songs for Christian, Hindu and Islamic religions
>
>  Native Art Form
>  * Parichamuttukali pattukal
>
>  Philosophy (Political)
>  * Communist Manifesto
>  * Principles of Communism (Friedrich Engels)
>
> The CD also contains the commons collections of images on food, plants,
> birds, maps and  celebrations from Kerala. The CD is made available for
> download in iso format as well as browsing at our community website -
> http://www.mlwiki.in. For those who are interested in the technical
> challenges and aspects of the background work may read Santhosh's blog
> post<
> http://thottingal.in/blog/2011/06/11/malayalam-wikisource-offline-version/
> >about
> it.
>
> We thank every one who participated in the effort. Comments and questions
> are welcome.
>
> Thanks and Regards,
> Malayalam Wikimedia Community.
> ___
> foundation-l mailing list
> foundatio...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] Announcement: Selected Books from Malayalam Wikisource on CD released

2011-06-12 Thread emijrp
I'm interested in uploading these CD ISOs to the Internet Archive. Are you OK
with this? Your server is a bit slow, so this way you will also have a faster
mirror.

2011/6/11 Jyothis E 

> Dear fellow Wikimedians,
>
> With great pleasure, Malayalam Wikimedia Community announced its 2011 CD
> project "Selected Books from Malayalam Wikisource on CD - 1.0" at the 4th
> annual Wiki Meetup in Kannur, Kerala. This is by far the biggest digital
> collection of free books in Malayalam language available on CD for offline
> use. This is an important milestone, as majority of the households in
> Kerala
> does not have internet or does not have an always on connection and this
> will enable them to access these books as an offline content.
>
> Contents:
>
> Selected Poems by
>  * Kumaranasan
>  * Cherusseri
>  * Changampuzha Krishna Pillai
>  * Kalakkaththu Kunchan Nambiar
>  * Irayimman Thampi
>  * Ramapurathu Warrier
>
> Malayalam Grammer
>  * Kerala Panineeyam by AR Rajaraja Varma
>
> Legends/Folklore
>  * Aithihyamala
>
> Novels
>  * Indulekha
>
> Religious
>  * Bhagavad Gita
>  * Adhyatma Ramayanam Kilippaatu
>  * Harinama Keerthanam
>  * Geetha Govindam
>  * Sathya Veda Pusthakam (Malayalam Bible)
>  * Quran
>  * Works of Sree Narayana Guru
>  * Devotional songs for Christian, Hindu and Islamic religions
>
>  Native Art Form
>  * Parichamuttukali pattukal
>
>  Philosophy (Political)
>  * Communist Manifesto
>  * Principles of Communism (Friedrich Engels)
>
> The CD also contains the commons collections of images on food, plants,
> birds, maps and  celebrations from Kerala. The CD is made available for
> download in iso format as well as browsing at our community website -
> http://www.mlwiki.in. For those who are interested in the technical
> challenges and aspects of the background work may read Santhosh's blog
> post<
> http://thottingal.in/blog/2011/06/11/malayalam-wikisource-offline-version/
> >about
> it.
>
> We thank every one who participated in the effort. Comments and questions
> are welcome.
>
> Thanks and Regards,
> Malayalam Wikimedia Community.
> ___
> foundation-l mailing list
> foundatio...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] We need to make it easy to fork and leave

2011-08-12 Thread emijrp
Man, Gerard is thinking about new methods to fork (in an easy way) single
articles, sets of articles or complete Wikipedias, and people reply about
setting up servers/MediaWiki/importing databases and other geeky weekend
parties. That is why there are no successful forks. Forking Wikipedia is
_hard_.

People need a button to create a branch of an article or a set of articles,
and to be allowed to re-write and work on it the way they want. Of course,
the resulting articles can't be saved/shown next to the Wikipedia articles,
but on a new platform. It would be an interesting experiment.

2011/8/12 David Gerard 

> [posted to foundation-l and wikitech-l, thread fork of a discussion
> elsewhere]
>
>
> THESIS: Our inadvertent monopoly is *bad*. We need to make it easy to
> fork the projects, so as to preserve them.
>
> This is the single point of failure problem. The reasons for it having
> happened are obvious, but it's still a problem. Blog posts (please
> excuse me linking these yet again):
>
> * http://davidgerard.co.uk/notes/2007/04/10/disaster-recovery-planning/
> * http://davidgerard.co.uk/notes/2011/01/19/single-point-of-failure/
>
> I dream of the encyclopedia being meaningfully backed up. This will
> require technical attention specifically to making the projects -
> particularly that huge encyclopedia in English - meaningfully
> forkable.
>
> Yes, we should be making ourselves forkable. That way people don't
> *have* to trust us.
>
> We're digital natives - we know the most effective way to keep
> something safe is to make sure there's lots of copies around.
>
> How easy is it to set up a copy of English Wikipedia - all text, all
> pictures, all software, all extensions and customisations to the
> software? What bits are hard? If a sizable chunk of the community
> wanted to fork, how can we make it *easy* for them to do so?
>
> And I ask all this knowing that we don't have the paid tech resources
> to look into it - tech is a huge chunk of the WMF budget and we're
> still flat-out just keeping the lights on. But I do think it needs
> serious consideration for long-term preservation of all this work.
>
>
> - d.
>
> ___
> foundation-l mailing list
> foundatio...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] We need to make it easy to fork and leave

2011-08-13 Thread emijrp
Yes, that tool looks similar to the idea I wrote about. Other approaches may
be possible too.

2011/8/13 John Vandenberg 

> On Sat, Aug 13, 2011 at 4:53 AM, emijrp  wrote:
> > Man, Gerard is thinking about new methods to fork (in an easy way) single
> > articles, sets of articles or complete wikipedias, and people reply about
> > setting up servers/mediawiki/importing_databases and other geeky weekend
> > parties. That is why there is no successful forks. Forking Wikipedia is
> > _hard_.
> >
> > People need a button to create a branch of an article or sets of
> articles,
> > and be allowed to re-write and work in the way they want. Of course, the
> > resulting articles can't be saved/showed close to the Wikipedia articles,
> > but in a new plataform. It would be an interesting experiment.
>
> Something like this.. ?
>
> http://wikimedia.org.au/wiki/Proposal:PersonalWikiTool
>
> --
> John Vandenberg
>
> ___
> foundation-l mailing list
> foundatio...@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Foundation-l] We need to make it easy to fork and leave

2011-08-15 Thread emijrp
2011/8/14 Krinkle 

> Hi all,
>
> 've read most of the previous mails so far. I'd like to clear some
> confusion
> (just in case). Please do correct me if I'm wrong and got caught
> by the confusion myself:
>
> The thread is about one of the following:
> * .. the ability to clone a MediaWiki install and upload it to your own
> domain
> to continue making edits, writing articles etc.
>

Installing MediaWiki yourself is easy if you are a geek. The only solution
for newbies is using wiki farms.


> * .. getting better dumps of Wikimedia wikis in particular (ie. Wikipedia)
>

A ten-year-old, ongoing task.


> * .. being able to install MediaWiki easier or even online (like new wikis
> on
> Wikia.com)
>

MediaWiki developers issue.


> * .. making it easy for developers to fork the MediaWiki source code
> repository.
>
>
Trivial. Any developer can set up a repository with a source code snapshot.


Gerard in the first post was speaking about 1) forks and 2) digital
preservation.

Forking single articles is easy: you just copy/paste (to keep the histories
you have to use import/export). Forking a set of articles is just a bit more
difficult. Forking the whole of Wikipedia is _hard_: you need good
infrastructure and skills.
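
For the single-article (or small set) case the export half already exists:
Special:Export can give you a page with its full history as XML, which
Special:Import or importDump.php can load on another wiki. A rough sketch,
assuming the source wiki allows full-history export (the English Wikipedia
restricts it for pages with very long histories):

from urllib.parse import urlencode
from urllib.request import Request, urlopen

def export_with_history(wiki, title):
    """Fetch the full-history XML of one page via Special:Export.

    `wiki` is a base URL such as 'https://en.wikipedia.org'; the returned XML
    can be fed to Special:Import or importDump.php on another MediaWiki.
    """
    params = {'pages': title, 'history': '1', 'action': 'submit'}
    req = Request(wiki + '/wiki/Special:Export',
                  data=urlencode(params).encode('utf-8'),
                  headers={'User-Agent': 'single-article fork sketch'})
    return urlopen(req).read()

if __name__ == '__main__':
    xml = export_with_history('https://en.wikipedia.org', 'Encyclopédie')
    with open('Encyclopedie-history.xml', 'wb') as f:
        f.write(xml)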

Digital preservation is a big problem in computer science. It is not solved
yet, but if you make backups frequently and in several places, you have a
good chance of keeping the data safe.

To fork, you first need the data to be preserved, and this links back to the
dump generation problem above.

I think people are getting nervous about Wikipedia (me too), in the same way
people are getting worried about Google having control of their whole online
life (Gmail, Google Reader, Google Calendar, Google+, etc.). If Google closes
your account, your online life vanishes. If Google dies, so does your online
life. Of course you can export all your e-mail, contacts, etc., but you lose
the @gmail.com address, all the search engine links to your data break, and
so on. Google has a good policy about exporting data; most Internet services
don't.

Mankind is compiling all human knowledge into an encyclopedia, which is
hosted on faulty metal plates spinning thousands of times per minute, managed
by faulty humans, and located in only one or two places in the world
(Florida, the land of hurricanes, and San Francisco, the land of
earthquakes).

Making fun of Wikipedia is so 2007. Playing with Wikipedia is so 2001.
Losing knowledge is so 48 BC. This is the most important mission the human
race has ever undertaken.

Regards,
emijrp

--
> Krinkle
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] sep11.wikipedia.org

2011-09-09 Thread emijrp
Hi;

sep11.wikipedia.org redirects to a spam domain; the old domain probably
expired and was registered by other people.

Can you redirect it to this[1] or this[2]? Or make a simple index.html with
both links...

Thanks,
emijrp

[1] http://dumps.wikimedia.org/sep11wiki/20071116/
[2]
http://web.archive.org/web/20090312042108/http://www.sep11memories.org/wiki/In_Memoriam
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Picture of the Year torrents

2011-09-12 Thread emijrp
Hi all;

I have created two torrent files for the Picture of the Year dumps[1]. They
use the Wikimedia server as a webseed.[2][3] Can you add them to the page?

Thanks,
emijrp

[1] http://dumps.wikimedia.org/other/poty/
[2] http://burnbit.com/torrent/177023/poty2006_zip
[3] http://burnbit.com/torrent/177024/poty2007_zip
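
For the record, the webseed part is nothing magic: it is just a "url-list"
key (BEP 19) in the torrent metainfo pointing at the file on
dumps.wikimedia.org. A rough sketch of building such a single-file torrent by
hand, in case the burnbit service ever disappears (a tracker can be added
with an extra "announce" key):

import hashlib
import os

def bencode(obj):
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(obj, int):
        return b'i%de' % obj
    if isinstance(obj, bytes):
        return b'%d:%s' % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode('utf-8'))
    if isinstance(obj, list):
        return b'l' + b''.join(bencode(x) for x in obj) + b'e'
    if isinstance(obj, dict):
        return b'd' + b''.join(bencode(k) + bencode(v)
                               for k, v in sorted(obj.items())) + b'e'
    raise TypeError('cannot bencode %r' % type(obj))

def make_webseed_torrent(path, webseed_url, piece_length=2 ** 20):
    """Build torrent metainfo for one file, with the dump server as webseed."""
    pieces = b''
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(piece_length), b''):
            pieces += hashlib.sha1(chunk).digest()
    meta = {
        'info': {
            'name': os.path.basename(path),
            'length': os.path.getsize(path),
            'piece length': piece_length,
            'pieces': pieces,
        },
        'url-list': [webseed_url],  # BEP 19 web seed
    }
    return bencode(meta)

if __name__ == '__main__':
    data = make_webseed_torrent(
        'poty2006.zip', 'http://dumps.wikimedia.org/other/poty/poty2006.zip')
    with open('poty2006.zip.torrent', 'wb') as f:
        f.write(data)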
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l