Re: [Wikitech-l] Hi!

2013-12-12 Thread Petr Onderka
You might want to have a look at the Annoying little bugs page [1];
I think it's meant exactly for people in your situation.

[1]: https://www.mediawiki.org/wiki/Annoying_little_bugs

Petr Onderka
[[en:User:Svick]]

On Thu, Dec 12, 2013 at 10:27 AM, Amanpreet Singh wrote:
> Hello, I am new to MediaWiki and want to contribute; I have been lurking
> around the developer docs and articles for about 3 days.
> I have skills in PHP and JavaScript.
> At the same time I am searching for bugs that I can fix, but I can't really
> find one, and I can't think of a starting point for fixing a bug.
>
> Can somebody point me in the right direction for fixing some bugs, or should
> the whole MediaWiki manual (I mean each and everything) be read carefully
> before starting?
>
> Thanks

[Wikitech-l] GSoC 2013 summary: Incremental dumps

2013-10-07 Thread Petr Onderka
Hi,

during the summer I've worked on making dumps of page and revision
information for Wikimedia wikis incremental [1].
This includes both the server side (faster updating of dumps) and the client
side (downloading only the changes since the last dump).

The project was successful, though there remain some issues that have
to be fixed before this goes into production [2].

I've had fun working on this, and I plan to continue with it as time permits.
I would like to thank my mentors, Tyler Romeo, and especially
Ariel T. Glenn, for being there for me.

Petr Onderka
[[User:Svick]]

[1]: https://www.mediawiki.org/wiki/User:Svick/Incremental_dumps
[2]: https://bugzilla.wikimedia.org/show_bug.cgi?id=54633


Re: [Wikitech-l] [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

2013-09-16 Thread Petr Onderka
On Tue, Sep 17, 2013 at 12:34 AM, Gabriel Wicke  wrote:
> In practice I doubt that there are any articles starting with 'w/'.

Actually, there are. Looking at enwiktionary only, there are 10 pages
starting with "w/".
Some of those are redirects (e.g "w/r/t"), but others are normal
articles (e.g. "w/", "w/e").

Petr Onderka
[[en:User:Svick]]


Re: [Wikitech-l] OAuth

2013-08-21 Thread Petr Onderka
Shouldn't Special:MWOAuth with no other parameters do something better than
just returning an error?

Also, how is a normal user supposed to learn about
Special:MWOAuthManageMyGrants?
I would expect this to be available from Preferences, but I didn't find
anything there.

Petr Onderka
[[en:User:Svick]]


On Wed, Aug 21, 2013 at 6:15 AM, Chris Steipp  wrote:

> As mentioned earlier this week, we deployed an initial version of the OAuth
> extension to the test wikis yesterday. I wanted to follow up with a few
> more details about the extension that we deployed (although if you're just
> curious about OAuth in general, I recommend starting at oauth.net, or
> https://www.mediawiki.org/wiki/Auth_systems/OAuth):
>
> * Use it: https://www.mediawiki.org/wiki/Extension:OAuth#Using_OAuth should
> get you started towards using OAuth in your application.
>
> * Demo: Anomie set up an excellent initial app (which I think counts as our
> first official, approved consumer) here
> https://tools.wmflabs.org/oauth-hello-world/. Feel free to try it out, so
> you can get a feel for the user experience as a user!
>
> * Timeline: We're hoping to get some use this week, and deploy to the rest
> of the WMF wikis next week if we don't encounter any issues.
>
> * Bugs: Please open bugzilla tickets for any issues you find, or
> enhancement requests--
>
> https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=OAuth
>
>
> And some other details for the curious:
>
> * Yes, you can use this on your own wiki right now! It's meant to be used
> in a single or shared environment, so the defaults will work on a
> standalone wiki. Input and patches are welcome, if you have any issues
> setting this up on your own wiki.
>
> * TLS: Since a few of you seem to care about https... The extension
> currently implements OAuth 1.0a, which is designed to be used without https
> (except to deliver the shared secret to the app owner, when the app is
> registered). So calls to the API don't need to use https.
>
> * Logging: All edits are tagged with the consumer's id (CID), so you can
> see when OAuth was used to contribute an edit.
>
> Enjoy!

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-31 Thread Petr Onderka
>
> For storing updateable indexes, Berkeley DB 4-5, GDBM, and higher-level
> options like SQLite are widely used. 
> LevelDB<https://code.google.com/p/leveldb/> is
> pretty cool too.
>

I think that with the amount of data we're dealing with, it makes sense to
have the file format under tight control. For example, saving a single byte
on each revision means total savings of ~500 MB for enwiki.
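(Back-of-the-envelope, using the roughly 600 million enwiki revisions cited
elsewhere in this thread -- my assumption is that this is where the figure
comes from:)

    6 \times 10^{8}\ \text{revisions} \times 1\ \text{byte} = 6 \times 10^{8}\ \text{B} \approx 570\ \text{MiB}

i.e. on the order of the ~500 MB quoted above.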

In any case, at this point it would be more work to switch to one of those
than to keep using the format I created.


> For delta coding, there's xdelta3 <http://xdelta.org/>,
> open-vcdiff <https://code.google.com/p/open-vcdiff/>,
> and Git's delta code
> <https://github.com/git/git/blob/master/diff-delta.c>
> <https://github.com/git/git/blob/master/patch-delta.c>
> <http://stackoverflow.com/questions/9478023/is-the-git-binary-diff-algorithm-delta-storage-standardized>.
> (rzip <http://rzip.samba.org/>/rsync are wicked awesome, but not as easy
> to just drop in as a library.)
>

I'm certainly going to try to use some library for delta compression,
because they seem to do pretty much exactly what's needed here. Thanks for
the suggestions.
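To show the idea delta coding is built on, here is a toy prefix/suffix delta
between two revisions of a page: for a small edit it stores only a handful of
bytes instead of the whole new text. This is only a sketch of mine -- the real
libraries above find arbitrary copies, not just a shared prefix and suffix:

    #include <cstddef>
    #include <iostream>
    #include <string>

    // Toy delta: remember how much of the old revision's prefix and suffix is
    // reused and store only the replacement for the middle.
    struct Delta {
        std::size_t prefix;
        std::size_t suffix;
        std::string middle;
    };

    Delta make_delta(const std::string& oldText, const std::string& newText) {
        std::size_t prefix = 0;
        while (prefix < oldText.size() && prefix < newText.size()
               && oldText[prefix] == newText[prefix])
            ++prefix;
        std::size_t suffix = 0;
        while (suffix < oldText.size() - prefix && suffix < newText.size() - prefix
               && oldText[oldText.size() - 1 - suffix] == newText[newText.size() - 1 - suffix])
            ++suffix;
        Delta d = { prefix, suffix,
                    newText.substr(prefix, newText.size() - prefix - suffix) };
        return d;
    }

    std::string apply_delta(const std::string& oldText, const Delta& d) {
        return oldText.substr(0, d.prefix) + d.middle
             + oldText.substr(oldText.size() - d.suffix);
    }

    int main() {
        std::string r1 = "'''Example''' is a page about examples.";
        std::string r2 = "'''Example''' is a wiki page about examples.";
        Delta d = make_delta(r1, r2);
        std::cout << "stored " << d.middle.size() << " bytes instead of "
                  << r2.size() << "\n"
                  << (apply_delta(r1, d) == r2 ? "round-trip ok" : "bug") << "\n";
        return 0;
    }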

Petr Onderka

[Wikitech-l] First preview version of incremental dumps

2013-07-31 Thread Petr Onderka
Hi,

after a month of work on my GSoC project Incremental Dumps [1], I think I
now have something worth sharing and talking about, though it's still far
from complete.

What the code can do now is to read a pages-history XML dump and create the
various kinds of dumps (pages/stub, current/history) in the new format from
that.
It can then convert a dump in the new format back to XML.

The XML output is almost the same as existing XML dumps, but there are some
differences [2].
The new format also now has a detailed specification
[3] (this describes the current version; the format is still in flux and
can change daily).

If you want, you can also try running the code. [4]
It's not production-quality yet (e.g. it doesn't report errors properly),
but it should work.
Compilation instructions are in the README file.

Any comments or questions are welcome.

Petr Onderka
User:Svick

[1]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps
[2]:
http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format/XML_output
[3]:
http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format/Specification
[4]: https://github.com/wikimedia/operations-dumps-incremental/tree/gsoc

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-10 Thread Petr Onderka
On Mon, Jul 8, 2013 at 6:53 AM, Randall Farmer  wrote:

> > Keeping the dumps in a text-based format doesn't make sense, because
> that can't be updated efficiently, which is the whole reason for the new
> dumps.
>
> First, glad to see there's motion here.
>
> It's definitely true that recompressing the entire history to .bz2 or .7z
> goes very, very slowly. Also, I don't know of an existing tool that lets
> you just insert new data here and there without compressing all of the
> unchanged data as well. Those point towards some sort of format change.
>
> I'm not sure a new format has to be sparse or indexed to get around those
> two big problems.
>
> For full-history dumps, delta coding (or the related idea of long-range
> redundancy compression) runs faster than bzip2 or 7z and produces good
> compression ratios on full-history dumps, based on some tests
> <https://www.mediawiki.org/wiki/Dbzip2#rzip_and_xdelta3>.
> (I'm going to focus mostly on full-history dumps here because they're
> the hard case and one Ariel said is currently painful--not everything here
> will apply to latest-revs dumps.)
>
> For inserting data, you do seemingly need to break the file up into
> independently-compressed sections containing just one page's revision
> history or a fragment of it, so you can add new diff(s) to a page's
> revision history without decompressing and recompressing the previous
> revisions. (Removing previously-dumped revisions is another story, but it's
> rarer.) You'd be in new territory just doing that; I don't know of existing
> compression tools that really allow that.
>
> You could do those two things, though, while still keeping full-history
> dumps a once-every-so-often batch process that produces a sorted file. The
> time to rewrite the file, stripped of the big compression steps, could be
> bearable--a disk can read or write about 100 MB/s, so just copying the 70G
> of the .7z enwiki dumps is well under an hour; if the part bound by CPU and
> other steps is smallish, you're OK.
>
> A format like the proposed one, with revisions inserted wherever there's
> free space when they come in, will also eventually fragment the revision
> history for one page (I think Ariel alluded to this in some early notes).
> Unlike sequential read/writes, seeks are something HDDs are sadly pretty
> slow at (hence the excitement about solid-state disks); if thousands of
> revisions are coming in a day, it eventually becomes slow to read things in
> the old page/revision order, and you need fancy techniques to defrag (maybe
> a big external-memory sort <http://en.wikipedia.org/wiki/External_sorting>)
> or you need to only read the dump on fast hardware that can handle the
> seeks. Doing occasional batch jobs that produce sorted files could help
> avoid the fragmentation question.
>

These are some interesting ideas.

You're right that copying the whole dump is fast enough (it would
probably add about an hour to a process that currently takes several days).
But it would also pretty much force the use of delta compression. And while
I would like to use delta compression, I don't think it's a good idea to be
forced to use it, because I might not have the time for it or it might not
be good enough.

Because of that, I decided to stay with my indexed approach.


>  There's a great quote about the difficulty of "constructing a software
> design...to make it so simple that there are obviously no deficiencies."
> (Wikiquote came through with the full text/attribution, of course
> <http://en.wikiquote.org/wiki/C._A._R._Hoare>.)
> I admit it's tricky and people can disagree about what's simple enough or
> even what approach is simpler of two choices, but it's something to strive
> for.
>
> Anyway, I'm wary about going into the technical weeds of other folks'
> projects, because, hey, it's your project! I'm trying to map out the
> options in the hope that you could get a product you're happier with and
> maybe give you more time in a tight three-month schedule to improve on your
> work and not just complete it. Whatever you do, good luck and I'm
> interested to see the results!
>

Feel free to comment more. I am the one implementing the project, but
that's all. Input from others is always welcome.

Petr Onderka

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-04 Thread Petr Onderka
On Wed, Jul 3, 2013 at 11:29 PM, Tyler Romeo  wrote:

> You should look into maybe using cmake or some other automated build system
> to handle the cross-platform compatibility.


I will look into that.


> Also, are you planning on using
> C++11 features? (Just asking because I'm a big C++11 fan. ;) ).


Yeah, I'm already using unique_ptr. And I will use lambdas if I think they
would be useful in some code.

Petr Onderka

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-03 Thread Petr Onderka
I'm writing it in C++.
If you want, you can follow my progress in the operations/dumps/incremental
repo, branch gsoc [1] (but there's almost nothing there yet).
And I don't have any computers with non-x86 architecture, so I won't be
able to test that.

[1]:
https://git.wikimedia.org/log/operations%2Fdumps%2Fincremental/refs%2Fheads%2Fgsoc


On Wed, Jul 3, 2013 at 10:13 PM, Byrial Jensen wrote:

> At 03-07-2013 18:29, Petr Onderka wrote:
>
>> I'm primarily a Windows guy, so I'm trying to write the code in a
>> portable way and I will make sure the application works on both Linux
>> and Windows.
>>
>
> That sounds good. Just remember that portable not only means that it works
> on different operating systems on the same computer architecture, but also
> on different architectures. What programming language do you intend to use?
>
>
>

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-03 Thread Petr Onderka
I'm primarily a Windows guy, so I'm trying to write the code in a portable
way and I will make sure the application works on both Linux and Windows.

Petr Onderka


On Wed, Jul 3, 2013 at 4:49 PM, Erik Zachte  wrote:

> > it will now be a command line application that outputs the data as
> uncompressed XML, in the same format as current dumps.
>
> That will help a great deal. But I assume your application will be for
> Linux only?
> So it would help to still generate the current compressed dumps, as a
> post-processing step, and store them online for download.
>
> One of the reasons for XML dumps is platform independence, both on the
> producer side (we had ever-evolving SQL dumps earlier) and the consumer side
> (not everyone uses Linux).
>
> Erik Zachte
>
> -Original Message-
> From: wikitech-l-boun...@lists.wikimedia.org [mailto:
> wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Petr Onderka
> Sent: Wednesday, July 03, 2013 4:04 PM
> To: Wikimedia developers; Wikipedia Xmldatadumps-l
> Subject: Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new
> incremental dumps
>
> A reply to all those who basically want to keep the current XML dumps:
>
> I have decided to change the primary way of reading the dumps: it will now
> be a command line application that outputs the data as uncompressed XML, in
> the same format as current dumps.
>
> This way, you should be able to use the new dumps with minimal changes to
> your code.
>
> Keeping the dumps in a text-based format doesn't make sense, because that
> can't be updated efficiently, which is the whole reason for the new dumps.
>
> Petr Onderka
>
>
On Mon, Jul 1, 2013 at 11:10 PM, Byrial Jensen wrote:
>
> > Hi,
> >
> > As a regular user of dump files I would not want a "fancy" file
> > format with indexes stored as trees etc.
> >
> > I parse all the dump files (both for SQL tables and the XML files)
> > with a one pass parser which inserts the data I want (which sometimes
> > is only a small fraction of the total amount of data in the file) into
> > my local database. I will normally never store uncompressed dump
> > files, but pipe the uncompressed data directly from bunzip or gunzip
> > to my parser to save disk space. Therefore it is important to me that
> > the format is simple enough for a one pass parser.
> >
> > I cannot really imagine who would use a library with object oriented
> > API to read dump files. No matter what it would be inefficient and
> > have fewer features and possibilities than using a real database.
> >
> > I could live with a binary format, but I have doubts if it is a good
> > idea.
> > It will be harder to make sure that your parser is working correctly,
> > and you have to consider things like endianness, size of integers,
> > format of floats etc. which give no problems in text formats. The
> > binary files may be smaller uncompressed (which I don't store anyway)
> > but not necessarily when compressed, as the compression will do better on
> > text files.
> >
> > Regards,
> > - Byrial
> >
> >

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-03 Thread Petr Onderka
The problem is that appending is not enough, especially if you want to keep
the current format.

1. With the current format you almost could append new pages, but not new
revisions of existing pages, because they belong in the middle of the XML.
2. We also need to handle deletions (and undeletions) of pages and
revisions.
3. There are also "current" dumps, which always contain only the most
recent revision of a page.

And another advantage of the binary format is that you *can* seek easily.
If you're looking for a specific page or revision, you don't have to go
through the whole file: you can tell the application what you want, and it
will look it up and output only that.
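To illustrate what that lookup amounts to: with an index, finding a page is
one index probe plus one seek instead of a scan of the whole file. A minimal
sketch of the idea (this is not the project's actual file layout or index
structure, and the ids/offsets are made up):

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <map>

    int main() {
        // Hypothetical index: page id -> byte offset of that page's record.
        // In the real format this would itself live in the file, not in memory.
        std::map<std::int64_t, std::int64_t> pageIndex;
        pageIndex[12] = 0;
        pageIndex[42] = 12345;

        std::ifstream dump("pages.dump", std::ios::binary);
        std::int64_t wantedPage = 42;

        auto it = pageIndex.find(wantedPage);
        if (it == pageIndex.end()) {
            std::cout << "page not in dump\n";
            return 1;
        }

        // One seek straight to the page record -- no need to decompress and
        // scan everything, which is what a compressed XML stream forces.
        dump.seekg(it->second);
        // ... read and decode the page and revision records from here ...
        return 0;
    }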

Also, even if you couldn't seek, I don't see how is this any worse than the
current situation, when you also can't seek into a specific position of the
compressed XML (unless you use multistream dumps).

Petr Onderka



On Wed, Jul 3, 2013 at 4:45 PM, Giovanni Luca Ciampaglia <glciamp...@gmail.com> wrote:

> Petr, could you please elaborate more on this last claim? If turning the
> dump generation into an incremental process is the task you are interested
> in solving, then I don't understand how text constitutes a problem. Text
> files can be appended to like any regular file, and it shouldn't be difficult
> to do this in a way that keeps the XML structure valid.
>
> As I said, having the possibility to seek and inspect the files manually
> is a tremendous boon when debugging your code. With what you propose that
> would be possible but more complicated, since one cannot seek to a specific
> position of stdout without going through the whole contents.
>
> Best
>
> Giovanni
> On Jul 3, 2013 4:05 PM, "Petr Onderka"  wrote:
>
>> A reply to all those who basically want to keep the current XML dumps:
>>
>> I have decided to change the primary way of reading the dumps: it will
>> now be a command line application that outputs the data as uncompressed
>> XML, in the same format as current dumps.
>>
>> This way, you should be able to use the new dumps with minimal changes to
>> your code.
>>
>> Keeping the dumps in a text-based format doesn't make sense, because that
>> can't be updated efficiently, which is the whole reason for the new dumps.
>>
>> Petr Onderka
>>
>>
>> On Mon, Jul 1, 2013 at 11:10 PM, Byrial Jensen wrote:
>>
>>> Hi,
>>>
>>> As a regular user of dump files I would not want a "fancy" file
>>> format with indexes stored as trees etc.
>>>
>>> I parse all the dump files (both for SQL tables and the XML files) with
>>> a one pass parser which inserts the data I want (which sometimes is only a
>>> small fraction of the total amount of data in the file) into my local
>>> database. I will normally never store uncompressed dump files, but pipe the
>>> uncompressed data directly from bunzip or gunzip to my parser to save disk
>>> space. Therefore it is important to me that the format is simple enough for
>>> a one pass parser.
>>>
>>> I cannot really imagine who would use a library with object oriented API
>>> to read dump files. No matter what it would be inefficient and have fewer
>>> features and possibilities than using a real database.
>>>
>>> I could live with a binary format, but I have doubts if it is a good
>>> idea. It will be harder to make sure that your parser is working correctly,
>>> and you have to consider things like endianness, size of integers, format
>>> of floats etc. which give no problems in text formats. The binary files may
>>> be smaller uncompressed (which I don't store anyway) but not necessarily when
>>> compressed, as the compression will do better on text files.
>>>
>>> Regards,
>>> - Byrial
>>>
>>>

Re: [Wikitech-l] [Xmldatadumps-l] Suggested file format of new incremental dumps

2013-07-03 Thread Petr Onderka
A reply to all those who basically want to keep the current XML dumps:

I have decided to change the primary way of reading the dumps: it will now
be a command line application that outputs the data as uncompressed XML, in
the same format as current dumps.

This way, you should be able to use the new dumps with minimal changes to
your code.

Keeping the dumps in a text-based format doesn't make sense, because that
can't be updated efficiently, which is the whole reason for the new dumps.

Petr Onderka


On Mon, Jul 1, 2013 at 11:10 PM, Byrial Jensen wrote:

> Hi,
>
> As a regular user of dump files I would not want a "fancy" file format
> with indexes stored as trees etc.
>
> I parse all the dump files (both for SQL tables and the XML files) with a
> one pass parser which inserts the data I want (which sometimes is only a
> small fraction of the total amount of data in the file) into my local
> database. I will normally never store uncompressed dump files, but pipe the
> uncompressed data directly from bunzip or gunzip to my parser to save disk
> space. Therefore it is important to me that the format is simple enough for
> a one pass parser.
>
> I cannot really imagine who would use a library with object oriented API
> to read dump files. No matter what it would be inefficient and have fewer
> features and possibilities than using a real database.
>
> I could live with a binary format, but I have doubts if it is a good idea.
> It will be harder to make sure that your parser is working correctly, and
> you have to consider things like endianness, size of integers, format of
> floats etc. which give no problems in text formats. The binary files may be
> smaller uncompressed (which I don't store anyway) but not necessarily when
> compressed, as the compression will do better on text files.
>
> Regards,
> - Byrial
>
>

Re: [Wikitech-l] Suggested file format of new incremental dumps

2013-07-02 Thread Petr Onderka
On Mon, Jul 1, 2013 at 10:15 PM, Daniel Friesen wrote:

> How are you dealing with extensibility?
>
> We need to be able to extend the format. The fields of data we need to
> export change over time (just look at the changelog for our export's XSD
> file https://www.mediawiki.org/xml/export-0.7.xsd).
>

I have touched on this in answer to Ariel's email.
I think that for now, there will be just a single data version number in
the header of the dump file.
But I will make sure to leave the possibility of having a version number on
each object open.
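To illustrate, the file-level version can be as simple as a small fixed header
at the front of the dump, and a per-object version could later become one
extra byte in front of each page/revision record. This is only a sketch of the
idea, not the header actually specified for the project (the magic string and
field widths here are made up):

    #include <cstdint>
    #include <cstring>
    #include <fstream>

    // Hypothetical fixed-size dump header with one global data version.
    // Bumping dataVersion lets readers handle old and new dumps; a per-object
    // version byte could be added later without changing this header.
    struct DumpHeader {
        char magic[4];                    // e.g. "MWID"
        std::uint16_t fileFormatVersion;
        std::uint16_t dataVersion;
    };

    int main() {
        DumpHeader h;
        std::memcpy(h.magic, "MWID", 4);
        h.fileFormatVersion = 1;
        h.dataVersion = 1;

        std::ofstream out("test.dump", std::ios::binary);
        // A real implementation would write the fields one by one with a fixed
        // endianness instead of dumping the struct, as discussed in this thread.
        out.write(reinterpret_cast<const char*>(&h), sizeof h);
        return 0;
    }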


> Here are some things in that XML format you are missing in the incremental:
> - Redirect info
> - Upload info
> - Log items
> - Liquid Threads support
>

I should have gone to the source instead of assuming that looking at a few
samples is enough.
I will add redirect and upload info to the format description.

As far as I know, log items are in a separate XML dump and I'm not planning
to replace that one.

Unless I'm mistaken, Liquid Threads don't have much of a future and are
used only on a few wikis like mediawiki.org.
Does anyone actually use this information from the dumps?


> And something that I don't think we've thought about support for in our
> current export format, ContentHandler. There's metadata for it missing
> from our dumps and the data format is somewhat different than our text
> dumps have traditionally expected.


The current dumps already store model and format.
Is there something else needed for ContentHandler?
The dumps don't really care what the format or encoding of the revision
text is; it's just a byte stream to them.

Petr Onderka

Re: [Wikitech-l] Suggested file format of new incremental dumps

2013-07-01 Thread Petr Onderka
Protocol Buffers are not a bad idea, but I'm not sure about their overhead.

AFAIK, PB have overhead of 1 byte per field.
If I'm counting correctly, with enwiki's 600M revisions and 8 fields per
revision, that means total overhead of more than 4 GB.
The fixed-size part of all revisions (i.e. without comment and text)
amounts to ~22 GB.
I think this means PB have too much overhead.
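(The arithmetic, for anyone checking -- the 1 byte of tag overhead per field
is my reading of the PB wire format for small fields:)

    6 \times 10^{8}\ \text{revisions} \times 8\ \text{fields} \times 1\ \text{B} = 4.8 \times 10^{9}\ \text{B} \approx 4.5\ \text{GiB}

    4.8\ \text{GB} \mathbin{/} 22\ \text{GB} \approx 22\%\ \text{of the fixed-size metadata}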

The overhead could be alleviated by using compression, but I didn't intend
to compress metadata.

So, I think I will start without PB. If I later decide to compress
metadata, I will also try to use PB and see if it works.

Also, I think that reading the binary format isn't going to be the biggest
issue if you're implementing your own library for incremental dumps,
especially if I'm going to use delta compression of revision texts.

Petr Onderka


On Mon, Jul 1, 2013 at 9:16 PM, Daniel Friesen wrote:

> Instead of XML "or" a proprietary binary format could we try using a
> standard binary format such as Protocol Buffers as a base to reduce the
> issues with having to implement the reading/writing in multiple languages?
>
> --
> ~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
>
>
> On Mon, 01 Jul 2013 11:56:50 -0700, Tyler Romeo wrote:
>
>  Petr is right on par with this one. The purpose of this version 2 for
>> dumps
>> is to allow protocol-specific incremental updating of the dump, which
>> would
>> be significantly more difficult in non-binary format.
>>
>> *-- *
>> *Tyler Romeo*
>> Stevens Institute of Technology, Class of 2016
>> Major in Computer Science
>> www.whizkidztech.com | tylerro...@gmail.com
>>
>
>

Re: [Wikitech-l] Suggested file format of new incremental dumps

2013-07-01 Thread Petr Onderka
I think this would work well only for the use case where you're always
looking through the whole history of all pages.

How would you find the current revision of a specific page? Or all
revisions of a page?
What if you don't want the whole history, just current versions of all
pages?
And don't forget about deletions (and undeletions).

You could somewhat solve some of these problems (e.g. by adding indexes),
but I don't think you can solve all of them.

Petr Onderka


On Mon, Jul 1, 2013 at 9:13 PM, Dmitriy Sintsov  wrote:

> On 01.07.2013 22:56, Tyler Romeo wrote:
>
>> Petr is right on par with this one. The purpose of this version 2 for
>> dumps
>> is to allow protocol-specific incremental updating of the dump, which
>> would
>> be significantly more difficult in non-binary format.
>>
>>
> Why can't the dumps just be split into daily or weekly XML files
> (optionally compressed ones)? That way, seeking would be performed by
> simply opening the .MM.DD.xml file.
> It is so much simpler than going for binary git-like formats, which would
> take a bit less space but are more prone to bugs and impossible to extract
> and analyze/edit via text/XML processing utils.
> Dmitriy
>
>
>

Re: [Wikitech-l] Suggested file format of new incremental dumps

2013-07-01 Thread Petr Onderka
Compressed XML is what the current dumps use and it doesn't work well
because:
* it can't be edited
* it doesn't support seeking

I think the only way to solve this is "obscure" and requires special code
to read and write.
(And endianness is not a problem if the specification says which one it
uses and the implementation sticks to it.)

Theoretically, I could use compressed XML in internal data structures, but
I think that just combines the disadvantages of both.

So, the size is not the main reason not to use XML, it's just one of the
reasons.

Petr Onderka


On Mon, Jul 1, 2013 at 7:26 PM,  wrote:

> On 07/01/2013 12:48:11 PM, Petr Onderka - gsv...@gmail.com wrote:
>
>> >
>> > What is the intended format of the dump files? The page makes it sound
>> like
>> > it will have a binary format, which I'm not opposed to, but is
>> definitely
>> > something you should decide on.
>> >
>>
>> Yes, it is a binary format, I will make that clearer on the page.
>>
>> The advantage of a binary format is that it's smaller, which I think is
>> quite important.
>>
>
> In my experience binary formats have very little to recommend them.
>
> They are definitely more obscure. They sometimes suffer from endian
> problems. They require special code to read and write.
>
> In my experience I have found that the notion that they offer an advantage
> by being "smaller" is somewhat misguided.
>
> In particular, with XML, there is generally a very high degree of
> redundancy in the text, far more than in normal writing.
>
> The consequence of this regularity is that text based XML often compresses
> very, very well.
>
> I remember one particular instance where we were generating 30-50
> Megabytes of XML a day and needed to send it from the USA to the UK every
> day, in a situation where our leased data rate was really limiting. We were
> surprised and pleased to discover that zipping the files reduced them to
> only 1-2 MB. I have been skeptical of claims that binary formats are more
> efficient on the wire (where it matters most) ever since.
>
> I think you should do some experiments versus compressed XML to justify
> your claimed benefits of using a binary format.
>
> Jim
>
> 
>
> --
> Jim Laurino
> wican.x.jiml...@dfgh.net
> Please direct any reply to the list.
> Only mail from the listserver reaches this address.
>
>

Re: [Wikitech-l] Suggested file format of new incremental dumps

2013-07-01 Thread Petr Onderka
>
> I was envisioning that we would produce "diff dumps" in one pass
> (presumably in a much shorter time than the fulls we generate now) and
> would apply those against previous fulls (in the new format) to produce
> new fulls, hopefully also in less time.  What do you have in mind for
> the production of the new fulls?
>

What I originally imagined is that the full dump would be modified directly
and a description of the changes made to it would also be written to the
diff dump.
But now I think that creating the diff and then applying it makes more
sense, because it's simpler.
But I also think that doing the two at the same time will be faster,
because it's less work (no need to read and parse the diff).
So what I imagine now is something like this:

1. Read information about a change in a page/revision
2. Create diff object in memory
3. Write the diff object to the diff file
4. Apply the diff object to the full dump
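In (very) rough code, with all of the real work stubbed out, that loop would
look something like this. Every type and function name below is made up for
illustration; none of them are the project's actual classes:

    #include <iostream>
    #include <vector>

    // Stand-ins for the real dump classes.
    struct PageChange { long long pageId; long long revisionId; };

    struct DiffObject {
        PageChange change;
    };

    struct DiffFile {
        void write(const DiffObject& d) { /* serialize to the diff dump */ }
    };

    struct FullDump {
        void apply(const DiffObject& d) { /* update pages/revisions in place */ }
    };

    int main() {
        // Step 1 (reading the changes) is represented by this vector.
        std::vector<PageChange> changes = { {12, 1001}, {12, 1002}, {42, 1003} };
        DiffFile diff;
        FullDump full;

        for (const PageChange& change : changes) {
            DiffObject d = { change };   // 2. create the diff object in memory
            diff.write(d);               // 3. write it to the diff file
            full.apply(d);               // 4. apply it to the full dump
        }
        std::cout << "processed " << changes.size() << " changes\n";
        return 0;
    }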


> It might be worth seeing how large the resulting en wp history files are
> going to be if you compress each revision separaately for version 1 of
> this project.  My fear is that even with 7z it's going to make the size
> unwieldy.  If the thought is that it's a first round prototype, not
> meant to be run on large projects, that's another story.
>

I do expect that a full dump of enwiki using this compression would be way
too big.
So yes, this was meant just to have something working, so that I can
concentrate on doing compression properly later (after the mid-term).
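For reference, the "compress each revision separately" baseline is only a few
lines. A minimal sketch using zlib (zlib is my choice just to keep the example
self-contained; it is not necessarily the codec the project will use -- link
with -lz):

    #include <iostream>
    #include <string>
    #include <vector>
    #include <zlib.h>

    // Compress one revision text on its own. Per-revision compression is simple
    // and seekable, but compresses much worse than packing a whole page history
    // together, which is why it is only a first-version approach.
    std::vector<unsigned char> compressRevision(const std::string& text) {
        uLongf destLen = compressBound(text.size());
        std::vector<unsigned char> out(destLen);
        int ret = compress2(out.data(), &destLen,
                            reinterpret_cast<const Bytef*>(text.data()), text.size(),
                            Z_BEST_COMPRESSION);
        if (ret != Z_OK)
            out.clear();
        else
            out.resize(destLen);
        return out;
    }

    int main() {
        std::string revision(10000, 'a');   // stand-in for revision wikitext
        std::vector<unsigned char> packed = compressRevision(revision);
        std::cout << revision.size() << " -> " << packed.size() << " bytes\n";
        return 0;
    }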


> I'm not sure about removing the restrictions data; someone must have
> wanted it, like the other various fields that have crept in over time.
> And we should expect there will be more such fields over time...
>

If I understand the code in XmlDumpWriter.openPage correctly, that data
comes from the page_restrictions field of the page table [1], which doesn't
seem to be used in non-ancient versions of MediaWiki.

I did think about versioning the page and revision objects in the dump, but
I'm not sure how exactly to handle upgrades from one version to another.
For now, I think I'll have just one global "data version" per file, but
I'll make sure that adding a version to each object in the future will be
possible.


> We need to get some of the wikidata users in on the model/format
> discussion, to see what use they plan to make of those fields and what
> would be most convenient for them.
>
> It's quite likely that these new fulls will need to be split into chunks
> much as we do with the current en wp files.  I don't know what that
> would mean for the diff files.  Currently we split in an arbitrary way
> based on sequences of page numbers, writing out separate stub files and
> using those for the content dumps.  Any thoughts?
>

If possible, I would prefer to keep everything in a single file.
If that isn't possible, I think it makes sense to split on page IDs, but
make the split ID visible (probably in the file name) and unchanging from
month to month.
If it turns out that a single chunk grows too big, we might consider adding
a "split" instruction to diff dumps, but that's probably not necessary now.

Petr Onderka

[1]: http://www.mediawiki.org/wiki/Manual:Page_table#page_restrictions

Re: [Wikitech-l] Suggested file format of new incremental dumps

2013-07-01 Thread Petr Onderka
>
> What is the intended format of the dump files? The page makes it sound like
> it will have a binary format, which I'm not opposed to, but is definitely
> something you should decide on.
>

Yes, it is a binary format; I will make that clearer on the page.

The advantage of a binary format is that it's smaller, which I think is
quite important.

I think the main advantages of text-based formats are that there are lots of
tools for the common ones (XML and JSON) and that they are human readable.
But those tools wouldn't be very useful, because we certainly want to have
some sort of custom compression scheme and the tools wouldn't be able to
work with that.
And I think human readability is mostly useful if we want others to be able
to write their own code that directly accesses the data.
And, because of the custom compression, doing that won't be that easy
anyway. And hopefully, it won't be necessary, because there will be a nice
library usable by everyone (see below).


> Also, I really like the idea of writing it in a low level language and then
> having bindings for something higher. However, unless you plan of having
> multiple language bindings (e.g., *both* C# and Python), you may want to
> pick a different route. For example, if you decide to only bind to Python,
> you can use something like Cython, which would allow you to write
> pseudo-Python that is still compiled to C. Of course, if you want multiple
> language bindings, this is likely no longer an option.
>

Right now, everyone can read the dumps in their favorite language.
If I write the library interface well, writing bindings for it for another
language should be relatively trivial, so everyone can keep using their
favorite language.

And I admit, I'm proposing doing it this way partially because of selfish
reasons: I'd like to use this library in my future C# code.
But I realize creating something that works only in C# doesn't make sense,
because most people in this community don't use it.
So, to me, writing the code so that it can be used from anywhere makes the
most sense.
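To give an idea of what "bindings should be relatively trivial" means: if the
C++ library exposes a small, flat C interface, most languages (C# via
P/Invoke, Python via ctypes, and so on) can call it directly. The function
names and behaviour below are made up for illustration only; they are not the
library's actual API:

    #include <cstring>
    #include <string>

    // Internal C++ implementation (stubbed).
    namespace incr_dumps {
    struct Dump {
        std::string path;
    };
    }

    // Flat C ABI on top of it: no exceptions, no C++ types across the boundary.
    extern "C" {

    void* dumps_open(const char* path) {
        return new incr_dumps::Dump{ path };
    }

    // Copies the revision text into a caller-provided buffer; returns the
    // number of bytes written, or -1 on error. (Stubbed out here.)
    int dumps_read_revision(void* handle, long long revisionId,
                            char* buffer, int bufferSize) {
        (void)handle; (void)revisionId;
        const char* fake = "revision text would go here";
        int len = static_cast<int>(std::strlen(fake));
        if (len >= bufferSize) return -1;
        std::memcpy(buffer, fake, len + 1);
        return len;
    }

    void dumps_close(void* handle) {
        delete static_cast<incr_dumps::Dump*>(handle);
    }

    } // extern "C"

    int main() {
        void* d = dumps_open("test.dump");
        char buf[256];
        int n = dumps_read_revision(d, 12345, buf, sizeof buf);
        dumps_close(d);
        return n >= 0 ? 0 : 1;
    }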

Petr Onderka


>  On Mon, Jul 1, 2013 at 10:00 AM, Petr Onderka  wrote:
>
> > For my GSoC project Incremental data dumps [1], I'm creating a new file
> > format to replace Wikimedia's XML data dumps.
> > A sketch of how I imagine the file format to look like is at
> > http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format.
> >
> > What do you think? Does it make sense? Would it work for your use case?
> > Any comments or suggestions are welcome.
> >
> > Petr Onderka
> > [[User:Svick]]
> >
> > [1]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps

[Wikitech-l] Suggested file format of new incremental dumps

2013-07-01 Thread Petr Onderka
For my GSoC project Incremental data dumps [1], I'm creating a new file
format to replace Wikimedia's XML data dumps.
A sketch of how I imagine the file format to look like is at
http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format.

What do you think? Does it make sense? Would it work for your use case?
Any comments or suggestions are welcome.

Petr Onderka
[[User:Svick]]

[1]: http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps

Re: [Wikitech-l] showing videos and images in modal viewers within articles

2013-05-30 Thread Petr Onderka
On Thu, May 30, 2013 at 12:21 PM, billinghurst wrote:

> > (On a side note, the TMH behavior should be improved to actually play
> the video
> > immediately, not require a second click to play in modal view.)
>
> Please NO, if there has to be anything please make it a preference that
> users can toggle, but have the default as OFF.  There is more than
> Wikipedia here; there are some of us who don't want to be playing videos
> that others have selected for us.
>

I think you might have misunderstood.

As far as I understand it, the current behavior is:
Click 1 on thumbnail -> modal view opens
Click 2 on the video in modal view -> video starts playing

And the proposed change is to:
Click 1 on thumbnail -> modal window opens and starts playing

This is not about playing videos as soon as a page loads.

Petr Onderka
[[en:User:Svick]]

Re: [Wikitech-l] Querying the database

2013-05-22 Thread Petr Onderka
This is probably not what you want to hear, but one way would be to get a
Toolserver account.
That way, you wouldn't need the query service; you could run those queries
yourself.

Petr Onderka
[[en:User:Svick]]


On Wed, May 22, 2013 at 10:03 PM, Tuszynski, Jaroslaw W. <jaroslaw.w.tuszyn...@saic.com> wrote:

> I do a lot of maintenance tasks on Commons, and many tasks require some
> sort of database query to find the oddball cases. The queries can be done
> in one of several ways:
> 1) Using the CatScan [1] and CatScan2 [2] tools
> 2) Database query service [3]
> 3) Weekly Database reports [4]
>
> Unfortunately lately some of those ways are breaking down. CatScan and
> CatScan2 rarely work, failing in many different ways: usually due to
> exceeding the 'max_user_connections' limit (30 for Magnus's CatScan2, and 15
> for Daniel's CatScan), but otherwise with some timeout or no-connection
> errors, or they can work on a query for hours (or days if you let it) and
> never return anything. I developed some CatScan2-based queries for Creator
> template maintenance that worked fine 2-3 years ago, but they have always
> timed out since. That might be due to more and more images on Commons.
> Similarly, the Database query service also seems very inactive. There are
> many requests and few replies, like my request from April 2 [5].
>
> For example, lately I was searching for images on Commons that do not have
> any license templates (sometimes since 2007 or earlier), see [5]. At some
> point Magnus was helping me with that query; however, after it failed several
> times with a "server not found" error, we gave up. It seems like less and less
> can be done with current infrastructure.
>
> So are there any non-toolserver based alternatives for database queries? I
> was trying to read about Wikimedia Labs, looking for tools based on it.
> Ideally there would be some CatScan2-like tool that is based on a different
> database, with a higher number of users allowed.
>
> Jarek T.
> User:jarekt [6]
>
> [1] http://toolserver.org/~daniel/WikiSense/CategoryIntersect.php
> [2] http://toolserver.org/~magnus/catscan_rewrite.php
> [3] https://jira.toolserver.org/browse/DBQ
> [4] http://commons.wikimedia.org/wiki/Commons:Database_reports
> [5] https://jira.toolserver.org/browse/DBQ-201
> [6] http://commons.wikimedia.org/wiki/User:Jarekt

[Wikitech-l] Incremental XML dumps GSoC proposal

2013-05-02 Thread Petr Onderka
I realized I didn't post my proposal to the list yet (I have added it to
the official GSoC site few days ago), so here it is:

http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps

In short, the project aims to create a new format for dumps (which allow
users to download parts of the databases of Wikimedia projects). The primary
advantage of the new format will be that creating a dump should take less
time, because the previous dump can be reused.
Any comments or co-mentors (as far as I know, Ariel Glenn is currently the
only potential mentor on this project) are welcome.

Petr Onderka
[[en:User:Svick]]

Re: [Wikitech-l] OPW: deadline for submissions is May 1

2013-04-30 Thread Petr Onderka
When I'm confused about timezones, I tend to use Wolfram Alpha:

http://www.wolframalpha.com/input/?i=19%3A00pm+UTC+on+May+1%2C+2013

19:00pm UTC on May 1, 2013
1 day 33 minutes 44 seconds in the future

Petr Onderka
[[en:User:Svick]]


On Tue, Apr 30, 2013 at 8:22 PM, Jiabao Wu  wrote:

> Hi Quim,
>
> Thank you for your reminder. Does this mean that the applicants have one
> more day, or half an hour? >...< Sorry, I always feel pretty confused about
> this...
>
>
> On Wed, May 1, 2013 at 3:07 AM, Quim Gil  wrote:
>
> > Hi,
> >
> > The deadline for submissions to the FOSS Outreach Program for Women is
> >
> > *** 19:00pm UTC on May 1, 2013 ***
> >
> > You must send your application to opw-list{}gnome.org before the
> > deadline. No exceptions!
> >
> >
> > https://live.gnome.org/OutreachProgramForWomen#Send_in_an_Application
> >
> > You can keep improving your proposal in your project wiki page after the
> > deadline. It is ok if you haven't completed your mandatory contribution
> > yet. It is also ok if you don't have two co-mentors confirmed yet.
> >
> > Also remember: http://lists.wikimedia.org/pipermail/wikitech-l/2013-April/068842.html
> >
> > If you hesitate, just apply. You will have more days to ask questions and
> > improve your proposal.
> >
> > After applying you can add yourself to https://www.mediawiki.org/wiki/Outreach_Program_for_Women#Candidates
> >
> >
> > What happens after submitting your OPW proposal:
> >
> > * If you are also applying to GSoC then focus on GSoC. The first
> > evaluation will be done in that context following the GSoC process and
> > without any gender specific considerations.
> >
> > * If your proposal is strictly for OPW then we will start evaluating it
> > and we might come back to you with questions.
> >
> >
> > The official announcement of accepted interns will be done by the OPW
> > program on May 27:
> > https://live.gnome.org/OutreachProgramForWomen/2013/JuneSeptember#Schedule
> >
> > However, we might confirm some or all the Wikimedia interns by May 10. It
> > will partially depend on the GSoC selection process.
> >
> > Good luck to everybody!
> >
> > --
> > Quim Gil
> > Technical Contributor Coordinator @ Wikimedia Foundation
> > http://www.mediawiki.org/wiki/User:Qgil
> >

Re: [Wikitech-l] Making inter-language links shorter

2013-04-18 Thread Petr Onderka
There is a user script [1] that does a primitive version of this.
I have found it to be quite useful, so I think it's a good idea to do this
properly.

Petr Onderka
[[en:User:Svick]]

[1]: http://en.wikipedia.org/wiki/User:Lampak/MyLanguages


On Thu, Apr 18, 2013 at 6:50 PM, Pau Giner  wrote:

> As multilingual content grows, interlanguage links become longer on
> Wikipedia articles. Articles such as "Barack Obama" or "Sun" have more than
> 200 links, and that becomes a problem for users that often switch among
> several languages.
>
> As part of the future plans for the Universal Language Selector, we were
> considering to:
>
>- Show only a short list of the relevant languages for the user based on
>geo-IP, previous choices and browser settings of the current user. The
>language the users are looking for will be there most of the times.
>- Include a "more" option to access the rest of the languages for which
>the content exists with an indicator of the number of languages.
>- Provide a list of the rest of the languages that users can easily scan
>(grouped by script and region ao that alphabetical ordering is
> possible),
>and search (allowing users to search a language name in another
> language,
>using ISO codes or even making typos).
>
> I have created a prototype <http://pauginer.github.io/prototype-uls/#lisa>
> to
> illustrate the idea. Since this is not connected to the MediaWiki backend,
> it lacks the advanced capabilities commented above but you can get the
> idea.
> If you are interested in the missing parts, you can check the flexible
> search and the list of likely languages ("common languages" section) on the
> language selector used at http://translatewiki.net/ which is connected to
> MediaWiki backend.
>
> As part of the testing process for the ULS language settings, I included a
> task to test also the compact interlanguage designs. Users seem to
> understand their use (view
> recording<https://www.usertesting.com/highlight_reels/qPYxPW1aRi1UazTMFreR
> >),
> but I wanted to get some feedback for changes affecting such an important
> element.
>
> Please let me know if you see any possible concern with this approach.
>
> Thanks
>
>
> --
> Pau Giner
> Interaction Designer
> Wikimedia Foundation

Re: [Wikitech-l] Article on API Characteristics

2013-04-17 Thread Petr Onderka
>
> So you're suggesting we go *against* the HTTP standard? That's not exactly
> what you're supposed to do.
>

Well, ignoring the header makes more sense to me and, personally, I would
prefer that behavior.

But it's a minor issue and I think going against the standard is not
actually worth it.

Petr Onderka

Re: [Wikitech-l] Article on API Characteristics

2013-04-17 Thread Petr Onderka
I didn't necessarily mean that the 100-Continue workflow should be fully
supported.
I think ignoring the header would be much better than completely refusing
to work with it (and replying with error 417) and my guess is that doing
that should be possible.

Looking at the HTTP specification [1], it says that a proxy *has to* return
417 if the target server doesn't support HTTP/1.1.
Though I have no idea why the specification would require this.

Petr Onderka
[[en:user:Svick]]

[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.3


On Wed, Apr 17, 2013 at 10:33 PM, Brian Wolff  wrote:

> My understanding is it's not really possible to do this in PHP in a way
> that would actually be of use to anyone. See
> https://bugzilla.wikimedia.org/show_bug.cgi?id=26631#c1
>
> -bawolff
>
> On 4/17/13, Petr Onderka  wrote:
> > Regarding #7 in that list (Expect: 100-Continue), I think it would be
> nice
> > if Wikimedia wikis did this.
> >
> > I know that at least in .Net, if I send a POST request to
> > http://en.wikipedia.org/w/api.php,
> > the Expect: 100-Continue header will be set, which results in an 417
> > Expectation failed error.
> >
> > .Net has a switch to turn that header off, and with that the request will
> > work fine.
> > But I think it would be nice if Wikimedia wikis supported this.
> >
> > I think this is an issue with something in Wikimedia's configuration
> > (Squid? or maybe something like that) and not MediaWiki itself, because
> it
> > works fine for my local MediaWiki installation even with Expect:
> > 100-Continue set.
> >
> > Petr Onderka
> > [[en:User:Svick]]
> >
> >
> On Wed, Apr 17, 2013 at 5:50 AM, Tyler Romeo wrote:
> >
> >> Found this interesting articles on designing an API for what it's worth.
> >> Thought some people my find it interesting.
> >>
> >> http://mathieu.fenniak.net/the-api-checklist/
> >> *-- *
> >> *Tyler Romeo*
> >> Stevens Institute of Technology, Class of 2015
> >> Major in Computer Science
> >> www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] Article on API Characteristics

2013-04-17 Thread Petr Onderka
Regarding #7 in that list (Expect: 100-Continue), I think it would be nice
if Wikimedia wikis did this.

I know that at least in .Net, if I send a POST request to
http://en.wikipedia.org/w/api.php,
the Expect: 100-Continue header will be set, which results in a 417
Expectation failed error.

.Net has a switch to turn that header off, and with that the request will
work fine.
But I think it would be nice if Wikimedia wikis supported this.

I think this is an issue with something in Wikimedia's configuration
(Squid? or maybe something like that) and not MediaWiki itself, because it
works fine for my local MediaWiki installation even with Expect:
100-Continue set.
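(The thread is about .Net, but the same issue exists in other HTTP clients;
for example libcurl also sends Expect: 100-continue for larger POST bodies,
and the usual workaround there is an empty Expect header. A sketch in
C++/libcurl, as an analogy only -- this is not the .Net switch mentioned
above, and the request body here is just an example:)

    #include <curl/curl.h>

    int main() {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL* curl = curl_easy_init();
        if (!curl) return 1;

        // An empty "Expect:" header stops libcurl from sending
        // "Expect: 100-continue" with the POST, avoiding the 417 problem.
        curl_slist* headers = curl_slist_append(NULL, "Expect:");

        curl_easy_setopt(curl, CURLOPT_URL, "http://en.wikipedia.org/w/api.php");
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS,
                         "action=query&meta=siteinfo&format=json");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

        CURLcode res = curl_easy_perform(curl);

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return res == CURLE_OK ? 0 : 1;
    }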

Petr Onderka
[[en:User:Svick]]


On Wed, Apr 17, 2013 at 5:50 AM, Tyler Romeo  wrote:

> Found this interesting articles on designing an API for what it's worth.
> Thought some people my find it interesting.
>
> http://mathieu.fenniak.net/the-api-checklist/
> *-- *
> *Tyler Romeo*
> Stevens Institute of Technology, Class of 2015
> Major in Computer Science
> www.whizkidztech.com | tylerro...@gmail.com

Re: [Wikitech-l] Updating the dump file?

2013-04-14 Thread Petr Onderka
There are experimental incremental dumps [1],
but I'm not sure if they would work for you, especially since they don't
seem to go back to January.

There's also a proposed Google Summer of Code project to make this possible
[2],
but that's going to be completed in September at the earliest.

So, right now, I think the best option is to download the whole dump again.

Petr Onderka
[[en:User:Svick]]

[1]: http://dumps.wikimedia.org/other/incr/
[2]:
https://www.mediawiki.org/wiki/Summer_of_Code_2013#Incremental_data_dumps


On Sun, Apr 14, 2013 at 4:37 PM, Sajid Hussain wrote:

> Hi! I downloaded “pages-articles.xml.bz2” and I am using it through
> WikiTaxi. It’s working fine but I have a question. I downloaded the above
> mentioned dump which was created in Jan. 2013. Now there’s the new version
> of this dump available created in April 2013. My question is can I update
> my older dump to the current one without downloading all the data? I only
> need the updates.
> Please help.
>
> --
> Sajid Hussain,

Re: [Wikitech-l] Wikidata queries

2013-03-28 Thread Petr Onderka
How will the queries be formatted? Do I understand it correctly that a
QueryConcept is a JSON object?
Have you considered using something more in line with the format of
action=query queries?
Though I guess what you need is much more complicated and trying to
fit it into the action=query model wouldn't end well.

Petr Onderka
[[en:User:Svick]]

2013/3/28 Denny Vrandečić :
> We have a first write up of how we plan to support queries in Wikidata.
> Comments on our errors and requests for clarifications are more than
> welcome.
>
> <https://meta.wikimedia.org/wiki/Wikidata/Development/Queries>
>
> Cheers,
> Denny
>
> P.S.: unfortunately, no easter eggs inside.
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> Körperschaften I Berlin, Steuernummer 27/681/51985.
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] serial programming using termios

2013-03-17 Thread Petr Onderka
This list is about "the technical organization of the Wikimedia projects."

That includes MediaWiki programming and related things,
but certainly not general Unix IO programming.

And the fact that a page on a Wikimedia wiki talks about it doesn't
change anything about that.

So I'm afraid you will have to find help elsewhere.
Personally, I would suggest that you ask your questions on stackoverflow.com.

Petr Onderka
[[en:User:Svick]]

On Sun, Mar 17, 2013 at 1:39 PM, Ted Reynard  wrote:
> Hi guys
> Does anybody know about serial programming using termios?
> I found this link in wikibook : 
> http://en.wikibooks.org/wiki/Serial_Programming/termios
> I have some questions. Please let me know if anybody has ever worked with 
> termios.
>
> Thanks
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Seemingly proprietary Javascript

2013-03-05 Thread Petr Onderka
On Tue, Mar 5, 2013 at 9:16 PM, Tyler Romeo  wrote:
> Also, popular libraries
> (such as Google's hosted versions of jQuery and others) always include
> license headers in the minified versions.

That's not what I see.
If I look at jQuery as hosted by Google [1], it starts with the
following comment (and nothing more):

/*! jQuery v1.9.1 | (c) 2005, 2012 jQuery Foundation, Inc. |
jquery.org/license //@ sourceMappingURL=jquery.min.map */

It does link to a license (though it doesn't even mention what the
license is directly),
but it certainly doesn't contain the whole license itself.
And, as I understand it, that's what you claim is required and
what others claim would be a waste of bandwidth.

[1]: http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js

Petr Onderka
[[en:User:Svick]]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Module namespace

2013-02-20 Thread Petr Onderka
According to 
http://hu.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces
(and http://en.wikipedia.org/wiki/Wikipedia:Namespace), it's 828.
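
If a bot or tool needs to look this up rather than hard-code 828, a minimal
sketch in C# could look like this (the User-Agent string is made up):

using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

class FindModuleNamespace
{
    static void Main()
    {
        var client = new WebClient();
        client.Headers["User-Agent"] = "NamespaceLookup/1.0 (example)";
        string xml = client.DownloadString(
            "http://hu.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces&format=xml");

        // Each namespace is an <ns> element with "id" and "canonical" attributes
        // and the localized name as its text content.
        XElement moduleNs = XDocument.Parse(xml)
            .Descendants("ns")
            .First(ns => (string)ns.Attribute("canonical") == "Module");

        Console.WriteLine("id = {0}, local name = {1}",
            (string)moduleNs.Attribute("id"), moduleNs.Value);   // id = 828, local name = Modul
    }
}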

Petr Onderka
[[en:User:Svick]]

On Thu, Feb 21, 2013 at 1:30 AM, Bináris  wrote:
> What is the number of the new namespace we got from Scribunto?
> How can we localize the name of it? In huwiki it should be "Modul" rather
> than "Module". We found only core namespaces in translatewiki.
>
> Thank you!
>
> --
> Bináris
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] LinqToWiki: new library for accessing the API from .Net

2013-02-17 Thread Petr Onderka
I didn't realize that was a requirement for libraries (the library
allowed setting the UserAgent, but it didn't force it until now).
I have made that change and setting the UserAgent is now required.
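
For scripts that talk to the API directly rather than through a library, the
same identification is set on the request by hand; a minimal sketch (the bot
name, URL and address below are placeholders):

using System;
using System.Net;

class IdentifiedClient
{
    static void Main()
    {
        var client = new WebClient();
        // Identify the tool and its operator; the name, URL and address are placeholders.
        client.Headers["User-Agent"] =
            "ExampleBot/1.0 (http://example.org/ExampleBot; operator@example.org) LinqToWiki/1.0";

        Console.WriteLine(client.DownloadString(
            "http://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json"));
    }
}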

Petr Onderka
[[en:User:Svick]]

On Sun, Feb 17, 2013 at 7:42 PM, Yuri Astrakhan  wrote:
> Petr, make sure you require users to set their *User-Agent* string. Your
> library should not use any defaults.
>
> For the examples I would recommend this *User-Agent:*
>
> *MyCoolTool/1.1 (http://example.com/MyCoolTool/; mycoolt...@example.com)
> LinqToWiki/1.0*
>
> See http://www.mediawiki.org/wiki/API:Main_page#Identifying_your_client
>
>
> On Sun, Feb 17, 2013 at 12:56 PM, Petr Onderka  wrote:
>
>> I'd like to introduce LinqToWiki: a new library for accessing the
>> MediaWiki API from .Net languages (e.g. C#).
>> Its main advantage is that it knows the API and is strongly-typed,
>> which means autocompletion works on API modules, module parameters and
>> result properties and correctness is checked at compile time.
>>
>> More information is at http://en.wikipedia.org/wiki/User:Svick/LinqToWiki.
>>
>> Any comments are welcome.
>>
>> Petr Onderka
>> [[en:User:Svick]]
>>
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] LinqToWiki: new library for accessing the API from .Net

2013-02-17 Thread Petr Onderka
I'd like to introduce LinqToWiki: a new library for accessing the
MediaWiki API from .Net languages (e.g. C#).
Its main advantage is that it knows the API and is strongly-typed,
which means autocompletion works on API modules, module parameters and
result properties and correctness is checked at compile time.

More information is at http://en.wikipedia.org/wiki/User:Svick/LinqToWiki.
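
As a rough illustration of what such a query can look like, here is a sketch;
the identifiers in it are made up for illustration rather than taken from the
library, so see the page above for real examples:

using System;
using System.Linq;

class Example
{
    static void Main()
    {
        // Hypothetical names; the library's actual types and methods are documented on the page above.
        var wiki = new Wiki("ExampleBot/1.0 (operator@example.org)", "en.wikipedia.org");

        var titles = (from cm in wiki.Query.categorymembers()
                      where cm.title == "Category:Query languages"   // parameter names are checked at compile time
                      select cm.title)                               // result properties autocomplete in the IDE
                     .ToEnumerable();

        foreach (var title in titles)
            Console.WriteLine(title);
    }
}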

Any comments are welcome.

Petr Onderka
[[en:User:Svick]]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] API data push/pull guidelines

2013-02-02 Thread Petr Onderka
There is a page about the etiquette of using the API [1].

In short, your bot should ideally make its requests one at a time (not
multiple requests in parallel)
and use maxlag=5.
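
A minimal sketch of what that means in practice (the bot name and the queried
titles are made up, and a real bot would parse the response instead of
scanning the text):

using System;
using System.Net;
using System.Threading;

class PoliteClient
{
    static string Get(WebClient client, string url)
    {
        while (true)
        {
            // WebClient can clear custom headers between requests, so set it every time.
            client.Headers["User-Agent"] = "PoliteBot/1.0 (example)";
            string result = client.DownloadString(url + "&maxlag=5");
            if (!result.Contains("\"code\":\"maxlag\""))   // crude check; a real bot would parse the error
                return result;
            Thread.Sleep(5000);                            // back off (or honor the Retry-After header)
        }
    }

    static void Main()
    {
        var client = new WebClient();

        // Requests are made one after another, never in parallel.
        foreach (var title in new[] { "Earth", "Moon" })
            Console.WriteLine(Get(client,
                "http://en.wikipedia.org/w/api.php?action=query&prop=info&format=json&titles="
                + Uri.EscapeDataString(title)));
    }
}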

But the bug you referred to already includes this link,
and I don't think there are better guidelines.

[1]: http://www.mediawiki.org/wiki/API:Etiquette

Petr Onderka
[[en:User:Svick]]

On Sat, Feb 2, 2013 at 8:07 PM, Small M  wrote:
> Hello,
>
> There is a lack of guidelines regarding the rate at the mediawiki API may be 
> used.
>
> Specifically for the enwiki and commons api (if it doesn't matter, please say 
> so):
>
> For bots:
>
> Are simultaneous uploads permitted? E.g. uploading 3 20MB files 
> simultaneously to the commons on a 1Gbit/s line. What is the max rate 
> permitted?
>
> I was recommended at https://bugzilla.wikimedia.org/show_bug.cgi?id=44584 to 
> post to this mailing list.
>
> -Small
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] a slightly weird search result in the Italian Wikipedia

2012-09-16 Thread Petr Onderka
I think it's related to the other search result (Fiori di Bach):

> … necessarie per effettuare [[diagnosi|autodiagnosi]] e 
> [[terapia|autopratica]].

That article links to the article "Terapia", but the link text is "autopratica",
so, presumably, the search engine takes this to mean that
"terapia" and "autopratica" are synonyms (or at least closely related).

Petr Onderka
[[en:User:Svick]]

On Sun, Sep 16, 2012 at 4:28 PM, Amir E. Aharoni
 wrote:
> Hi,
>
> If I search for the word "autopratica" [1] in the Italian Wikipedia,
> the article "Terapia" [2] comes up as the first result. That word
> doesn't appear in that article. Why does it appear in the results?
>
> [1] 
> https://it.wikipedia.org/w/index.php?search=autopratica&title=Speciale%3ARicerca
> [2] https://it.wikipedia.org/wiki/Terapia
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> ‪“We're living in pieces,
> I want to live in peace.” – T. Moore‬
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Design comments (and note about no-www)

2012-08-14 Thread Petr Onderka
That's what they proposed it should look like.

Currently, there is a direct link to the Czech Wikipedia
(and all the other Wikipedias) on www.wikipedia.org, like you said.

But in the proposal from http://www.wikipediaredefined.com/,
which is what I was replying to (but wasn't quoted in Mark's email),
it is the way I described.

Petr Onderka
[[en:User:Svick]]

On Tue, Aug 14, 2012 at 10:51 PM, Daniel Zahn  wrote:
>> On 14 August 2012 20:08, Mark Holmquist  wrote:
>>
>>> For example, to get to Czech Wikipedia from www.wikipedia.org,
>>>> I have to roll over the top right corner?
>>>> That's absolutely unusable, I would never think of that.
>
> I don't get the part about "roll over the top right corner". I see a
> direct link to Česky Wikipedia in the "100 000+" AND in the drop-down
> menu.
>
> --
> Daniel Zahn 
> Operations Engineer
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Design comments

2012-08-14 Thread Petr Onderka
I have a feeling that they are trying to make Wikipedia pretty,
but at the cost of making it much less functional.

For example, to get to Czech Wikipedia from www.wikipedia.org,
I have to roll over the top right corner?
That's absolutely unusable, I would never think of that.

Petr Onderka
[[en:User:Svick]]

On Tue, Aug 14, 2012 at 7:44 PM, Martijn Hoekstra
 wrote:
> I found this not at all bad looking. whatever your take, it's always
> nice to have an outside view: http://www.wikipediaredefined.com/
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Wikisource link stats

2012-05-03 Thread Petr Onderka
This information is in the iwlinks table [1]; I don't know of any
Special: page that can be used to access it.

You can search the table for iwl_prefix = 'wikisource'.

[1]: http://www.mediawiki.org/wiki/Manual:Iwlinks_table

Petr Onderka
[[en:User:Svick]]

On Thu, May 3, 2012 at 10:46 PM, Lars Aronsson  wrote:
> From [[Special:Linksearch]] I can find all the external links,
> based on the external links table in the database, which can
> be accessed by tools on the German toolserver.
>
> But is there any way to find similar information about links to
> Wikisource? I.e. what are the total number of links? Which pages
> link to a particular Wikisource page?
>
>
> --
>  Lars Aronsson (l...@aronsson.se)
>  Aronsson Datateknik - http://aronsson.se
>
>
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] API problem

2012-03-23 Thread Petr Onderka
Hi,

> That's what I say: preload is empty and watched is completely missing from
> result.

Watched behaves correctly for me: if you aren't watching the page,
then the watched attribute *should* be missing. Try watching the page
and you'll notice the attribute is there (as long as you query the API
while logged in under the same account).

>> But I think pages without a preload subpage will show with an empty
>> preload. Just because you can link to a page with a preload link,
>> doesn't mean that page has preload.
>>
> This one has, you may see if you click on my original link.

No, the page doesn't have any preload. You can see that if you try
editing it in the normal way. *Your link* has preload (notice the preload
parameter), but that has nothing to do with the page itself, so it
won't show in the API.

>> Also, I think it would be best if you included what you have tried in
>> your question the next time
>
> E.g. this is what I tried what you have just linked; it is an example of
> fail.

And if you had said that right away, you might have gotten an answer that
actually helps you faster, which was my point.

Petr Onderka
[[en:User:Svick]]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] API problem

2012-03-23 Thread Petr Onderka
Hi, the query would be something like:

http://hu.wikipedia.org/w/api.php?action=query&titles=Wikip%C3%A9dia:Javaslat%20tiszts%C3%A9gvisel%C5%91k%20megv%C3%A1laszt%C3%A1s%C3%A1ra/ddd&prop=info&inprop=watched|preload

But I think pages without a preload subpage will show with an empty
preload. Just because you can link to a page with a preload link,
doesn't mean that page has preload.

Also, I think it would be best if you included what you have tried in
your question the next time, because that way people can pinpoint
where exactly your mistake was, or look somewhere else if you did
everything correctly.

Petr Onderka
[[en:User:Svick]]

On Fri, Mar 23, 2012 at 16:37, Bináris  wrote:
> Hi,
> I was playing with properties as written at
> https://www.mediawiki.org/wiki/API:Properties#Info:_Parameters
> I spent a lot of time, but I cannot get the properties watched and preload
> anyway. Preload is empty for an existing page and None for a non-existing
> page with preload; watched doeas not appear at all. I tried it in browser
> as well as with bot, logged in as admin. Could somebody please provide me a
> working link that shows watched and preload?
> Page example with
> preload<http://hu.wikipedia.org/w/index.php?action=edit&preload=Wikip%C3%A9dia%3AJavaslat+tiszts%C3%A9gvisel%C5%91k+megv%C3%A1laszt%C3%A1s%C3%A1ra%2Fpreload&editintro=&summary=&nosummary=&prefix=&minor=&title=Wikip%C3%A9dia%3AJavaslat+tiszts%C3%A9gvisel%C5%91k+megv%C3%A1laszt%C3%A1s%C3%A1ra%2Fddd&create=%C3%9Aj+v%C3%A1laszt%C3%A1si+allap+l%C3%A9trehoz%C3%A1sa>
>
> I am interested in new pages in namespace MediaWiki where a text is
> preloaded but without a preload subpage (I guess taken from translatewiki).
> Do they qualify as preload and may be got by API?
>
>
> --
> Bináris
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] About page ID

2012-02-18 Thread Petr Onderka
You can do that, only the URL is slightly longer:

http://en.wikipedia.org/wiki?curid=2312711

Although I don't understand what the benefit of doing that would be.

Petr Onderka
[[User:Svick]]

On Sat, Feb 18, 2012 at 14:09, John Erling Blad  wrote:
> In some cases it would be better to linke on article ids than their
> names, something like
> http://en.wikipedia.org/aid/123456
>
> One example is as a link to an article in Wikipedia from tweet posted
> through the Twitter API.
>
> John
>
> On Sat, Feb 18, 2012 at 1:51 PM, Bináris  wrote:
>> 2012/2/18 Alex Brollo 
>>
>>> Is there a sound reason to hidden so well the main id of pages? Is there
>>> any drawback to show it anywhere into wikies, and to use it much largely
>>> for links and API calls?
>>>
>>> Deleting and restoring/recreating results in a new id, and pages take
>> their id upon renaming; is the id still useful for linking with these
>> limitations? I just ask it because it is not perfectly clean for me what
>> you mean by that.
>>
>>
>> --
>> Bináris
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Mediawiki-api] Getting list of possible result properties

2011-12-12 Thread Petr Onderka
Hi,

> Nice work. Two questions:
> * Is that only for retrieving properties of list elements?

Yeah, that's what I think I need in order to dynamically create a nice
object-oriented library that accesses the API. What else do you think
would be useful?

> * Isn't that the right place to start provinding xml/json/whatever
> schemata for api results?

With my addition, I think paraminfo provides most of the information
that something like XSD would. I'm not sure it would be a good idea to add
one or more completely different ways to do the same thing, especially
since the schema would be different for each specific query.

Petr Onderka
[[en:User:Svick]]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Mediawiki-api] Getting list of possible result properties

2011-12-08 Thread Petr Onderka
Hi,

I have created a proof of concept of this, which contains the
information only for categorymembers. The code is at [1] and it looks
something like this (XML formatted):

  [the XML example lost its tags in the list archive; it showed the result
  properties of list=categorymembers, including an enumerated property
  whose possible values are page, subcat and file]

What do you think?

One problem I'm aware of is that the output uses “prop” and “property”
to mean something else. Do you have any suggestions for better naming?

After this, I will add the necessary information to the rest of the
API modules and then post a patch to bugzilla.

[1] 
https://github.com/svick/mediawiki/commit/868910637445ea0dcf3ad84bc1ee9fc337f7b9c3

Petr Onderka
[[en:User:Svick]]

On Thu, Nov 10, 2011 at 11:37, Roan Kattouw  wrote:
> On Wed, Nov 9, 2011 at 11:36 PM, Petr Onderka  wrote:
>> Is this information available somewhere? Is trying the query and
>> seeing what properties are returned the best I can do currently?
> Unfortunately, no, at least not programmatically.
>
>> Do
>> you think it would be a good idea if I (or someone else) modified
>> “action=paraminfo” to include this information in some form?
>>
> Yes, please do! Patches are very welcome.
>
> Roan
>
> ___
> Mediawiki-api mailing list
> mediawiki-...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] No such anchor: CITEREF...

2010-04-11 Thread Petr Onderka
It's caused by incorrect use of templates for Harvard referencing, like
Template:Harv.
Some of those errors are caused by a change to Template:Cite book and
similar templates a few months back: they were changed so that they don't
produce the anchors by default, because the anchors were often causing invalid
HTML. (This kind of error can be fixed by adding the parameter
ref=harv to the appropriate citation template.)
The other errors are caused by typos or by not understanding how Harvard
referencing works on WP.
And not many people check those errors or know how to fix them (my guess).

I created an (incomplete) list of articles with those errors at
http://svick.aspweb.cz/ (details for Mumbai are at
http://svick.aspweb.cz/Harv2.aspx/Detail/1933) and a user javascript
to show error messages for those errors at
http://en.wikipedia.org/wiki/User:Svick/HarvErrors.js.

Svick

On Thu, Apr 8, 2010 at 01:52,   wrote:
> I don't get it. In e.g., http://en.wikipedia.org/wiki/Mumbai there are
> tons of e.g.,
> Patel & Masselos 2003
> Mehta 2004
> Hansen 2001
> But no corresponding anchors. One will only get
> No such anchor: CITEREFHansen2001
> in ones browser. What's the deal?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l