Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-15 Thread Shawn Jones
Thanks Brian,

Defaulting to only allow $wgContentNamespaces, or more specifically, 
MWNamespace::getContentNamespaces(), worked great.
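For anyone curious, a minimal sketch of the resulting check (hedged; the
hook context and variable names are illustrative, not our exact code):

  // Restrict Memento handling to content namespaces.
  $contentNamespaces = MWNamespace::getContentNamespaces(); // always includes NS_MAIN
  if ( !in_array( $title->getNamespace(), $contentNamespaces ) ) {
      return true; // not a content page: let MediaWiki proceed normally
  }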

--Shawn


From: wikitech-l-boun...@lists.wikimedia.org 
[wikitech-l-boun...@lists.wikimedia.org] on behalf of Brian Wolff 
[bawo...@gmail.com]
Sent: Friday, November 01, 2013 6:43 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further
Development

Hi, I responded inline.

On 11/1/13, Shawn Jones  wrote:
> Hi,
>
> I'm currently working on the Memento Extension for Mediawiki, as announced
> earlier today by Herbert Van de Sompel.
>
> The goal of this extension is to work with the Memento framework, which
> attempts to display web pages as they appeared at a given date and time in
> the past.
>
> Our goal is for this to be a collaborative effort focusing on solving issues
> and providing functionality in "the Wikimedia Way" as much as possible.
>
> Without further ado, I have the following technical questions (I apologize
> in advance for the fire hose):
>
> 1.  The Memento protocol has a resource called a TimeMap [1] that takes an
> article name and returns text formatted as application/link-format.  This
> text contains a machine-readable list of all of the prior revisions
> (mementos) of this page.  It is currently implemented as a SpecialPage which
> can be accessed like
> http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is this the
> best method, or is it more preferable for us to extend the Action class and
> add a new action to $wgActions in order to return a TimeMap from the regular
> page like
> http://www.example.com/index.php?title=Article_Name&action=gettimemap
> without using the SpecialPage?  Is there another preferred way of solving
> this problem?

Special Page vs Action is usually considered equally OK for this sort
of thing. However, creating an API module would probably be the
preferred method to return such machine-readable data about a page.
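For illustration, a rough sketch of what such an API module might look like
(hedged; the class name, module name, and registration are illustrative, not
an existing module):

  // Hypothetical API module returning TimeMap data for a page.
  class ApiTimeMap extends ApiBase {
      public function execute() {
          $params = $this->extractRequestParams();
          $title = Title::newFromText( $params['title'] );
          if ( !$title || !$title->exists() ) {
              $this->dieUsage( 'Invalid title', 'invalidtitle' );
          }
          $data = array(); // fill with revision ids/timestamps for the page
          $this->getResult()->addValue( null, 'timemap', $data );
      }

      public function getAllowedParams() {
          return array(
              'title' => array(
                  ApiBase::PARAM_TYPE => 'string',
                  ApiBase::PARAM_REQUIRED => true,
              ),
          );
      }
  }

  // Registration in the extension setup file:
  $wgAPIModules['timemap'] = 'ApiTimeMap';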

> 2.  We currently make several database calls using the select method of
> the Database Object.  After some research, we realized that Mediawiki
> provides some functions that do what we need without making these database
> calls directly.  One of these needs is to acquire the oldid and timestamp of
> the first revision of a page, which can be done using
> Title->getFirstRevision()->getId() and
> Title->getFirstRevision()->getTimestamp() methods.  Is there a way to get
> the latest ID and latest timestamp?  I see I can do Title->getLatestRevID()
> to get the latest revision ID; what is the best way to get the latest
> timestamp?

Use existing wrapper functions around DB calls where you can, but if
you need to, it's OK to query the DB directly.

For the last part, probably something along the lines of
WikiPage::factory( $titleObj )->getRevision()->getTimestamp()
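
Putting the two together, a minimal sketch (assuming a 1.21-era MediaWiki;
the article name is illustrative):

  $title = Title::newFromText( 'Article_Name' );

  $first = $title->getFirstRevision();       // Revision object (or null)
  $firstId = $first->getId();
  $firstTimestamp = $first->getTimestamp();  // TS_MW format, e.g. 20131101224300

  $latest = WikiPage::factory( $title )->getRevision();
  $latestId = $latest->getId();              // same value as $title->getLatestRevID()
  $latestTimestamp = $latest->getTimestamp();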

> 3.  In order to create the correct headers for use with the Memento
> protocol, we have to generate URIs.  To accomplish this, we use the
> $wgServer global variable (through a layer of abstraction); how do we
> correctly handle situations if it isn't set by the installation?  Is there
> an alternative?  Is there a better way to construct URIs?

$wgServer is always filled out (Setup.php sets it if the user doesn't).
However, you probably shouldn't be using it directly. Which method is
most appropriate depends on what sort of URLs you want, but generally
the Title class has methods like getFullURL for this sort of thing.
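
For example (a minimal sketch; the title and oldid are illustrative):

  $title = Title::newFromText( 'Article_Name' );
  $uri = $title->getFullURL( array( 'oldid' => 12345 ) );
  // e.g. http://www.example.com/index.php?title=Article_Name&oldid=12345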


> 4.  We use exceptions to indicate when showErrorPage should be run; should
> the hooks that catch these exceptions and then run showErrorPage also return
> false?

I haven't looked at your code, so not sure about the context - but: In
general a hook returns true to denote no further processing should take
place. Displaying an error message sounds like a good criterion for
returning true. That said, things may depend on the hook and what
precisely you're doing.
>
> 5.  Is there a way to get previous revisions of embedded content, like
> images?  I tried using the ImageBeforeProduceHTML hook, but found that
> setting the $time parameter didn't return a previous revision of an image.
> Am I doing something wrong?  Is there a better way?

FlaggedRevisions manages to set an old version of an image, so it's
possible. I think you might want to do something with the
BeforeParserFetchFileAndTitle hook as well. For the time parameter,
make sure the function you're using has the $time parameter marked as
pass-by-reference. Also note: the time parameter is the timestamp at
which the image version was created; it does not mean "get whatever
image would be relevant at the time specified" (I believe).
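
A rough sketch of such a handler, assuming the current hook signature (the
negotiated timestamp value is hypothetical):

  public static function onBeforeParserFetchFileAndTitle(
      $parser, Title $title, &$options, &$descQuery
  ) {
      // 'time' selects the file version uploaded at exactly this timestamp;
      // per the caveat above, it is not "the newest file as of this time".
      $options['time'] = '20131101000000'; // hypothetical negotiated TS_MW value
      return true;
  }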

>
> 6.  Are there any additional coding standards we should be following
> besides those on the "Manual:Coding_conventions" and "Manual:Coding
> Conventions - Mediawiki" pages?

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-15 Thread Shawn Jones
Hi Dan,

Thank you very much for your offer of assistance from the WMF.  We have several 
issues that need to be addressed.

1.  Completely eliminating the use of Mediawiki's global variables.

In our extension, we have eliminated the use of all of Mediawiki's global 
variables except $wgScriptPath.  We use it to construct the URIs for the 
Memento headers with the wfAppendQuery and wfExpandUrl functions.  Is there a 
better way to get the full URI for the Mediawiki installation (including the 
'index.php' part of the path) without resorting to this variable so we can 
reconstruct the URIs of past articles?
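
For reference, a minimal sketch of the pattern we use today (hedged; the
title and oldid values are illustrative):

  global $wgScriptPath;
  $base = $wgScriptPath . '/index.php';
  $uri = wfExpandUrl(
      wfAppendQuery( $base, 'title=Article_Name&oldid=12345' ),
      PROTO_CURRENT
  );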

2.  Test installations

We were hoping one of your test Wikipedia instances was available so that the 
community could experiment with our extension further.

3.  How best to handle performance testing

We are planning on conducting performance testing, either at Los Alamos, Old 
Dominion University, or one of the test Wikipedia instances, and wanted your 
input on what credible experiments we should set up to demonstrate the 
performance impact of our extension on a Mediawiki installation.

Our plan was to use the following test groups:
1.  no Memento Mediawiki extension installed - access to current and old 
revision (memento) pages
2.  no Memento Mediawiki extension installed - using a screen scraping 
script to simulate the use of the history pages associated with each article in 
a way that attempts to achieve the goals of Memento, but only via Mediawiki's 
native UI
3.  no Memento Mediawiki extension installed - use of Mediawiki's existing 
XML api to achieve the same goals of Memento
4.  use of our native Memento Mediawiki extension with only the mandatory 
headers - access to current and old revision (memento) pages
5.  use of our native Memento Mediawiki extension with only the mandatory 
headers - with the focus on performing time negotiation and acquiring the 
correct revision
6.  use of our native Memento Mediawiki extension with all headers - access 
to current and old revision (memento) pages
7.  use of our native Memento Mediawiki extension with all headers - again 
focusing on time negotiation

During each of these test runs, we would use a utility like vmstat, iostat, 
and/or collectl to measure load on the system, including memory/disk access, 
and compare the results across multiple runs.

Also, are there pre-existing tools for testing Mediawiki that we should be 
using, and is there anything we are missing with our methodology?

4.  Architectural feedback to ensure that we've followed Mediawiki's best 
practices

Our extension is more object-oriented than its first incarnation, utilizing a 
mediator pattern, strategy pattern, template methods and factory methods to 
achieve its goals.  I can generate a simplified inheritance diagram to show the 
relationships, but was wondering if we should trim down the levels of 
inheritance for performance reasons.

5.  Advice on how best to market this extension

We can advertise the extension on the wikitech-l and mediawiki-l lists, and do 
have a Mediawiki Extension page, but were wondering if there were conferences, 
web sites, etc. that could be used to help get the word out that our extension 
is available for use, review, input, and further extension.  Any advice would 
be most helpful.

Thanks in advance,

Shawn M. Jones
Graduate Research Assistant
Department of Computer Science
Old Dominion University

From: wikitech-l-boun...@lists.wikimedia.org 
[wikitech-l-boun...@lists.wikimedia.org] on behalf of Dan Garry 
[dga...@wikimedia.org]
Sent: Monday, November 11, 2013 5:47 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further
Development

Hi Shawn,

Thanks for starting this discussion!

Other than the suggestions that've been provided, how are you looking for
the WMF to help you with this extension? Our engineers are very limited on
time, so it might be helpful to hear from you about how you'd like us to
help.

Thanks,
Dan


On 1 November 2013 19:50, Shawn Jones  wrote:

> Hi,
>
> I'm currently working on the Memento Extension for Mediawiki, as announced
> earlier today by Herbert Van de Sompel.
>
> The goal of this extension is to work with the Memento framework, which
> attempts to display web pages as they appeared at a given date and time in
> the past.
>
> Our goal is for this to be a collaborative effort focusing on solving
> issues and providing functionality in "the Wikimedia Way" as much as
> possible.
>
> Without further ado, I have the following technical questions (I apologize
> in advance for the fire hose):
>
> 1.  The Memento protocol has a resource called a TimeMap [1] that takes an
> article name and returns text formatted as application/link-format.  This
> text contains a machine-readable list of all of the prior revisions
> (mementos) of this page.

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-11 Thread Dan Garry
Hi Shawn,

Thanks for starting this discussion!

Other than the suggestions that've been provided, how are you looking for
the WMF to help you with this extension? Our engineers are very limited on
time, so it might be helpful to hear from you about how you'd like us to
help.

Thanks,
Dan


On 1 November 2013 19:50, Shawn Jones  wrote:

> Hi,
>
> I'm currently working on the Memento Extension for Mediawiki, as announced
> earlier today by Herbert Van de Sompel.
>
> The goal of this extension is to work with the Memento framework, which
> attempts to display web pages as they appeared at a given date and time in
> the past.
>
> Our goal is for this to be a collaborative effort focusing on solving
> issues and providing functionality in "the Wikimedia Way" as much as
> possible.
>
> Without further ado, I have the following technical questions (I apologize
> in advance for the fire hose):
>
> 1.  The Memento protocol has a resource called a TimeMap [1] that takes an
> article name and returns text formatted as application/link-format.  This
> text contains a machine-readable list of all of the prior revisions
> (mementos) of this page.  It is currently implemented as a SpecialPage
> which can be accessed like
> http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is this
> the best method, or is it more preferable for us to extend the Action class
> and add a new action to $wgActions in order to return a TimeMap from the
> regular page like
> http://www.example.com/index.php?title=Article_Name&action=gettimemap
> without using the SpecialPage?  Is there another preferred way of solving
> this problem?
>
> 2.  We currently make several database calls using the select method
> of the Database Object.  After some research, we realized that Mediawiki
> provides some functions that do what we need without making these database
> calls directly.  One of these needs is to acquire the oldid and timestamp
> of the first revision of a page, which can be done using
> Title->getFirstRevision()->getId() and
> Title->getFirstRevision()->getTimestamp() methods.  Is there a way to get
> the latest ID and latest timestamp?  I see I can do Title->getLatestRevID()
> to get the latest revision ID; what is the best way to get the latest
> timestamp?
>
> 3.  In order to create the correct headers for use with the Memento
> protocol, we have to generate URIs.  To accomplish this, we use the
> $wgServer global variable (through a layer of abstraction); how do we
> correctly handle situations if it isn't set by the installation?  Is there
> an alternative?  Is there a better way to construct URIs?
>
> 4.  We use exceptions to indicate when showErrorPage should be run; should
> the hooks that catch these exceptions and then run showErrorPage also
> return false?
>
> 5.  Is there a way to get previous revisions of embedded content, like
> images?  I tried using the ImageBeforeProduceHTML hook, but found that
> setting the $time parameter didn't return a previous revision of an image.
>  Am I doing something wrong?  Is there a better way?
>
> 6.  Are there any additional coding standards we should be following
> besides those on the "Manual:Coding_conventions" and "Manual:Coding
> Conventions - Mediawiki" pages?
>
> 7.  We have two styles for serving pages back to the user:
>* 302-style[2], which uses a 302 redirect to tell the user's
> browser to go fetch the old revision of the page (e.g.
> http://www.example.com/index.php?title=Article&oldid=12345)
>* 200-style[3], which actually modifies the page content in place
> so that it resembles the old revision of the page
>  Which of these styles is preferable as a default?
>
> 8.  Some sites don't wish to have their past Talk/Discussion pages
> accessible via Memento.  We have the ability to exclude namespaces (Talk,
> Template, Category, etc.) via configurable option.  By default it excludes
> nothing.  What namespaces should be excluded by default?
>
> Thanks in advance for any advice, assistance, further discussion, and
> criticism on these and other topics.
>
> Shawn M. Jones
> Graduate Research Assistant
> Department of Computer Science
> Old Dominion University
>
> [1] http://www.mementoweb.org/guide/rfc/ID/#Pattern6
> [2] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.1
> [3] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.2
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Dan Garry
Associate Product Manager for Platform
Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-08 Thread Shawn Jones

This worked beautifully.

Thanks Daniel Friesen,

--Shawn

From: wikitech-l-boun...@lists.wikimedia.org 
[wikitech-l-boun...@lists.wikimedia.org] on behalf of Daniel Friesen 
[dan...@nadir-seen-fire.com]
Sent: Saturday, November 02, 2013 7:08 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further 
Development

On 2013-11-02 11:53 AM, Shawn Jones wrote:
> That makes me feel a little bit better about our dependencies.
>
> Since our rewrite, we only use $wgServer (via abstraction) in two places now, 
> and they both involve the TimeMap SpecialPage.
>
> We actually have 3 different types of TimeMaps in the Memento Mediawiki 
> Extension:
> 1. full (starter) - shows the latest 500 revisions
> 2. pivot descending - shows the last 500 (or fewer) revisions prior to a given 
> timestamp pivot
> 3. pivot ascending - shows the next 500 (or fewer) revisions after a given 
> timestamp pivot
>
> The pivot ascending and pivot descending TimeMaps are what use the $wgServer 
> URI.
>
> They take the form of 
> http://example.com/index.php/Special:TimeMap/2013072003/1/Article for 
> ascending and 
> http://example.com/index.php/Special:TimeMap/2013072003/-1/Page for 
> descending.
>
> The $wgServer variable is used (as $this->mwbaseurl) to construct the URIs 
> like so:
>
>   $timeMapPage['uri'] = $this->mwbaseurl . '/'
>   . SpecialPage::getTitleFor('TimeMap') . '/'
>   . $pivotTimestamp . '/-1/' . $title;
>
> A similar statement exists for a pivot ascending TimeMap elsewhere in the 
> code.
>
> I've been trying to find a way to eliminate the use of $wgServer altogether, 
> but need to construct these URIs for headers, TimeMap entries, etc.
>
> Is there a better way?
>
$timeMapPage['uri'] = SpecialPage::getTitleFor( 'TimeMap', 
$pivotTimestamp . '/-1/' . $title )->get{???}URL();


{???} will be Full, Local, or Canonical depending on where you're
outputting it.

  * href="" -> Local
  * Somewhere used on other domains -> Full (does output protocol-relative)
  * Print and email -> Canonical
  * HTTP Headers -> Local + wfExpandUrl( , PROTO_CURRENT ); unless you
use OutputPage::redirect, in which case you can simply use Local, as
url expansion is taken care of for you (see the sketch below).
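
For example, the HTTP Headers case (a sketch under the same assumptions as
the snippet above):

  $local = SpecialPage::getTitleFor( 'TimeMap',
      $pivotTimestamp . '/-1/' . $title )->getLocalURL();
  $headerUri = wfExpandUrl( $local, PROTO_CURRENT );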

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-06 Thread Remco de Boer
Good! I'd like to run some tests on some of our data (we run several SMW
instances). I will have to prepare a separate environment with the latest
versions of MW and SMW and the Memento extension. Nothing too difficult,
but it'll probably take some time.


On Tue, Nov 5, 2013 at 1:48 AM, Herbert Van de Sompel wrote:

> On Nov 4, 2013, at 14:24, Remco de Boer  wrote:
> > Hi Shawn,
> >
> >
> > I'm currently working on the Memento Extension for Mediawiki, as
> announced
> >> earlier today by Herbert Van de Sompel.
> >
> > This is very exciting! Coincidentally, at last week's SMWCon (the
> Semantic
> > MediaWiki conference) in Berlin I gave a presentation to argue that we
> need
> > some sort of 'time travelling' feature (slides are available at
> > http://slidesha.re/1iIf3F9). One of the other participants also pointed
> out
> > the Memento protocol.
> >
> > Are you familiar with Semantic MediaWiki (
> http://www.semantic-mediawiki.org/)
> > as an extension to MediaWiki? I'm curious what it would take to let SMW
> > play nice together with Memento.
>
> Before announcing the Memento extension to this list we tested it with a
> locally installed Semantic MediaWiki and all seemed OK. It would be great
> if someone could test it on a live one with actual real data. We got in
> touch with the people behind http://neurolex.org/wiki/Main_Page but they
> are running an older MediaWiki version and are not in a hurry to upgrade
> because they have a lot of extensions.
>
> From the early days of Memento, we have been very interested in semantic
> web, linked data applications of the Memento protocol. See, for example:
> - http://arxiv.org/abs/1003.3661 - illustrates the power of the protocol
> to do time series analysis across versions of linked data descriptions (in
> DBpedia)
> - http://mementoweb.org/depot/native/dbpedia/ - the DBpedia archive that
> we operate and that is Memento compliant
>
> Greetings
>
> Herbert
>
> >
> > Kind regards,
> >
> > Remco de Boer
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-04 Thread Herbert Van de Sompel
On Nov 4, 2013, at 14:24, Remco de Boer  wrote:
> Hi Shawn,
> 
> 
> I'm currently working on the Memento Extension for Mediawiki, as announced
>> earlier today by Herbert Van de Sompel.
> 
> This is very exciting! Coincidentally, at last week's SMWCon (the Semantic
> MediaWiki conference) in Berlin I gave a presentation to argue that we need
> some sort of 'time travelling' feature (slides are available at
> http://slidesha.re/1iIf3F9). One of the other participants also pointed out
> the Memento protocol.
> 
> Are you familiar with Semantic MediaWiki (http://www.semantic-mediawiki.org/)
> as an extension to MediaWiki? I'm curious what it would take to let SMW
> play nice together with Memento.

Before announcing the Memento extension to this list we tested it with a 
locally installed Semantic MediaWiki and all seemed OK. It would be great if 
someone could test it on a live one with actual real data. We got in touch with 
the people behind http://neurolex.org/wiki/Main_Page but they are running an 
older MediaWiki version and are not in a hurry to upgrade because they have a 
lot of extensions.

From the early days of Memento, we have been very interested in semantic web, 
linked data applications of the Memento protocol. See, for example:
- http://arxiv.org/abs/1003.3661 - illustrates the power of the protocol to do 
time series analysis across versions of linked data descriptions (in DBpedia)
- http://mementoweb.org/depot/native/dbpedia/ - the DBpedia archive that we 
operate and that is Memento compliant

Greetings

Herbert

> 
> Kind regards,
> 
> Remco de Boer
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-04 Thread Remco de Boer
Hi Shawn,


I'm currently working on the Memento Extension for Mediawiki, as announced
> earlier today by Herbert Van de Sompel.
>

This is very exciting! Coincidentally, at last week's SMWCon (the Semantic
MediaWiki conference) in Berlin I gave a presentation to argue that we need
some sort of 'time travelling' feature (slides are available at
http://slidesha.re/1iIf3F9). One of the other participants also pointed out
the Memento protocol.

Are you familiar with Semantic MediaWiki (http://www.semantic-mediawiki.org/)
as an extension to MediaWiki? I'm curious what it would take to let SMW
play nice together with Memento.

Kind regards,

Remco de Boer
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-04 Thread Lee Worden

From: Brian Wolff
On 2013-11-04 11:04 AM, "Brad Jorsch (Anomie)" wrote:

>
>On Fri, Nov 1, 2013 at 6:43 PM, Brian Wolff  wrote:

> >I haven't looked at your code, so not sure about the context - but: In
> >general a hook returns true to denote no further processing should take
> >place.

>
>If we're talking about wfRunHooks hooks, the usual case is that they
>return *false* to indicate no further processing, and true means to
>*continue* processing.
>

D'oh. You are of course correct. Sorry for the mistake.

As an aside, perhaps we should introduce constants for this. It's easy to
mix up the two values.

-bawolff


+1 - great idea!

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-04 Thread Seb35
On Sat, 02 Nov 2013 21:15:01 +0100, Shawn Jones wrote:

> Seb35,
>
> I came across your extension a month ago.  Ours is different in that it
> is also implementing the Memento protocol as used by the Internet
> Archive, Archive-It, and others.
>
> I do, however, appreciate your insight in trying to solve many of the
> same problems.  I, too, was trying to address the retrieval of old
> versions of templates, which brought me to your extension.  Your use of
> BeforeParserFetchTemplateAndtitle inspired parts of our Template
> solution.
>
> We're currently trying to figure out how to handle images.
>
> What did you mean by MediaWiki messages?  Are you referring to the
> Messages API as part of I18N?

In my attempt to recreate a more exact display of a past version, I thought
about retrieving the old versions of interface messages (such as the
stylesheet MediaWiki:Common.css/js and other stylesheets); I also thought
about modifying LocalSettings.php and the MediaWiki version in order to
recreate previous bugs, but this would be quite difficult and probably not
very interesting from a user's point of view (and probably not secure).

Seb35

> Thanks again,
>
> --Shawn


On Nov 1, 2013, at 9:50 PM, Seb35  wrote:


Hi,

No responses to your specific questions, but just to mention I worked  
some years ago on an extension [1] aiming at retrieving the  
as-exact-as-possible display of the page at a given past datetime,  
because the current implementation of oldid is only "past wikitext with  
current context (templates, images, etc.)".


I mainly implemented the retrieval of old versions of templates, but a  
lot of other smaller improvements could be done (MediaWiki messages,  
styles/JS, images, etc.). With this approach, some details are  
irremediably lost (e.g. the number of articles at a given datetime, some
tricky delete-and-move actions, etc.) and additional information would
have to be recorded to retrieve the past versions more exactly.


[1] https://www.mediawiki.org/wiki/Extension:BackwardsTimeTravel

~ Seb35

On Fri, 01 Nov 2013 20:50:06 +0100, Shawn Jones wrote:

Hi,

I'm currently working on the Memento Extension for Mediawiki, as  
announced earlier today by Herbert Van de Sompel.


The goal of this extension is to work with the Memento framework,  
which attempts to display web pages as they appeared at a given date  
and time in the past.


Our goal is for this to be a collaborative effort focusing on solving  
issues and providing functionality in "the Wikimedia Way" as much as  
possible.


Without further ado, I have the following technical questions (I  
apologize in advance for the fire hose):


1.  The Memento protocol has a resource called a TimeMap [1] that  
takes an article name and returns text formatted as  
application/link-format.  This text contains a machine-readable list  
of all of the prior revisions (mementos) of this page.  It is  
currently implemented as a SpecialPage which can be accessed like  
http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is  
this the best method, or is it more preferable for us to extend the  
Action class and add a new action to $wgActions in order to return a  
TimeMap from the regular page like  
http://www.example.com/index.php?title=Article_Name&action=gettimemap  
without using the SpecialPage?  Is there another preferred way of  
solving this problem?


2.  We currently make several database calls using the select
method of the Database Object.  After some research, we realized that  
Mediawiki provides some functions that do what we need without making  
these database calls directly.  One of these needs is to acquire the  
oldid and timestamp of the first revision of a page, which can be done  
using Title->getFirstRevision()->getId() and  
Title->getFirstRevision()->getTimestamp() methods.  Is there a way to  
get the latest ID and latest timestamp?  I see I can do  
Title->getLatestRevID() to get the latest revision ID; what is the  
best way to get the latest timestamp?


3.  In order to create the correct headers for use with the Memento  
protocol, we have to generate URIs.  To accomplish this, we use the  
$wgServer global variable (through a layer of abstraction); how do we  
correctly handle situations if it isn't set by the installation?  Is  
there an alternative?  Is there a better way to construct URIs?


4.  We use exceptions to indicate when showErrorPage should be run;  
should the hooks that catch these exceptions and then run  
showErrorPage also return false?


5.  Is there a way to get previous revisions of embedded content, like  
images?  I tried using the ImageBeforeProduceHTML hook, but found that  
setting the $time parameter didn't return a previous revision of an  
image.  Am I doing something wrong?  Is there a better way?


6.  Are there any additional coding standards we should be following  
besides those on the "Manual:Coding_conventions" and "Manual:Coding  
Conventions - Mediawiki" pages?

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-04 Thread Brian Wolff
On 2013-11-04 11:04 AM, "Brad Jorsch (Anomie)" wrote:
>
> On Fri, Nov 1, 2013 at 6:43 PM, Brian Wolff  wrote:
> > I haven't looked at your code, so not sure about the context - but: In
> > general a hook returns true to denote no further processing should take
> > place.
>
> If we're talking about wfRunHooks hooks, the usual case is that they
> return *false* to indicate no further processing, and true means to
> *continue* processing.
>
>
> --
> Brad Jorsch (Anomie)
> Software Engineer
> Wikimedia Foundation
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

D'oh. You are of course correct. Sorry for the mistake.

As an aside, perhaps we should introduce constants for this. It's easy to
mix up the two values.

-bawolff
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-04 Thread Brad Jorsch (Anomie)
On Fri, Nov 1, 2013 at 6:43 PM, Brian Wolff  wrote:
> I haven't looked at your code, so not sure about the context - but: In
> general a hook returns true to denote no further processing should take
> place.

If we're talking about wfRunHooks hooks, the usual case is that they
return *false* to indicate no further processing, and true means to
*continue* processing.
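
For instance, a minimal sketch (the hook choice and the Accept-Datetime
check are illustrative, not Memento's actual code):

  public static function onArticleViewHeader( &$article, &$outputDone, &$pcache ) {
      global $wgRequest;
      if ( $wgRequest->getHeader( 'Accept-Datetime' ) !== false ) {
          // ... render the negotiated old revision ourselves ...
          $outputDone = true;
          return false; // stop: no further processing for this hook
      }
      return true; // continue processing as normal
  }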


-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-02 Thread Daniel Friesen
On 2013-11-02 11:53 AM, Shawn Jones wrote:
> That makes me feel a little bit better about our dependencies.
>
> Since our rewrite, we only use $wgServer (via abstraction) in two places now, 
> and they both involve the TimeMap SpecialPage.
>
> We actually have 3 different types of TimeMaps in the Memento Mediawiki 
> Extension:
> 1. full (starter) - shows the latest 500 revisions
> 2. pivot descending - shows the last 500 (or fewer) revisions prior to a given 
> timestamp pivot
> 3. pivot ascending - shows the next 500 (or fewer) revisions after a given 
> timestamp pivot
>
> The pivot ascending and pivot descending TimeMaps are what use the $wgServer 
> URI.
>
> They take the form of 
> http://example.com/index.php/Special:TimeMap/2013072003/1/Article for 
> ascending and 
> http://example.com/index.php/Special:TimeMap/2013072003/-1/Page for 
> descending.
>
> The $wgServer variable is used (as $this->mwbaseurl) to construct the URIs 
> like so:
>
>   $timeMapPage['uri'] = $this->mwbaseurl . '/'
>   . SpecialPage::getTitleFor('TimeMap') . '/'
>   . $pivotTimestamp . '/-1/' . $title;
>
> A similar statement exists for a pivot ascending TimeMap elsewhere in the 
> code.
>
> I've been trying to find a way to eliminate the use of $wgServer altogether, 
> but need to construct these URIs for headers, TimeMap entries, etc.
>
> Is there a better way?
>
$timeMapPage['uri'] = SpecialPage::getTitleFor( 'TimeMap', 
$pivotTimestamp . '/-1/' . $title )->get{???}URL();


{???} will be Full, Local, or Canonical depending on where you're
outputting it.

  * href="" -> Local
  * Somewhere used on other domains -> Full (does output protocol-relative)
  * Print and email -> Canonical
  * HTTP Headers -> Local + wfExpandUrl( , PROTO_CURRENT ); unless you
use OutputPage::redirect, in which case you can simply use Local, as
url expansion is taken care of for you.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-02 Thread Shawn Jones
Seb35,

I came across your extension a month ago.  Ours is different in that it is also 
implementing the Memento protocol as used by the Internet Archive, Archive-It, 
and others.

I do, however, appreciate your insight in trying to solve many of the same 
problems.  I, too, was trying to address the retrieval of old versions of 
templates, which brought me to your extension.  Your use of 
BeforeParserFetchTemplateAndtitle inspired parts of our Template solution.

We're currently trying to figure out how to handle images.

What did you mean by MediaWiki messages?  Are you referring to the Messages API 
as part of I18N?

Thanks again,

--Shawn


On Nov 1, 2013, at 9:50 PM, Seb35  wrote:

> Hi,
> 
> No responses to your specific questions, but just to mention I worked some 
> years ago on an extension [1] aiming at retrieving the as-exact-as-possible 
> display of the page at a given past datetime, because the current 
> implementation of oldid is only "past wikitext with current context 
> (templates, images, etc.)".
> 
> I mainly implemented the retrieval of old versions of templates, but a lot of 
> other smaller improvements could be done (MediaWiki messages, styles/JS, 
> images, etc.). With this approach, some details are irremediably lost (e.g. 
> the number of articles at a given datetime, some tricky delete-and-move 
> actions, etc.) and additional information would have to be recorded to 
> retrieve the past versions more exactly.
> 
> [1] https://www.mediawiki.org/wiki/Extension:BackwardsTimeTravel
> 
> ~ Seb35
> 
On Fri, 01 Nov 2013 20:50:06 +0100, Shawn Jones wrote:
>> Hi,
>> 
>> I'm currently working on the Memento Extension for Mediawiki, as announced 
>> earlier today by Herbert Van de Sompel.
>> 
>> The goal of this extension is to work with the Memento framework, which 
>> attempts to display web pages as they appeared at a given date and time in 
>> the past.
>> 
>> Our goal is for this to be a collaborative effort focusing on solving issues 
>> and providing functionality in "the Wikimedia Way" as much as possible.
>> 
>> Without further ado, I have the following technical questions (I apologize 
>> in advance for the fire hose):
>> 
>> 1.  The Memento protocol has a resource called a TimeMap [1] that takes an 
>> article name and returns text formatted as application/link-format.  This 
>> text contains a machine-readable list of all of the prior revisions 
>> (mementos) of this page.  It is currently implemented as a SpecialPage which 
>> can be accessed like 
>> http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is this the 
>> best method, or is it more preferable for us to extend the Action class and 
>> add a new action to $wgActions in order to return a TimeMap from the regular 
>> page like 
>> http://www.example.com/index.php?title=Article_Name&action=gettimemap 
>> without using the SpecialPage?  Is there another preferred way of solving 
>> this problem?
>> 
>> 2.  We currently make several database calls using the select method of 
>> the Database Object.  After some research, we realized that Mediawiki 
>> provides some functions that do what we need without making these database 
>> calls directly.  One of these needs is to acquire the oldid and timestamp of 
>> the first revision of a page, which can be done using 
>> Title->getFirstRevision()->getId() and 
>> Title->getFirstRevision()->getTimestamp() methods.  Is there a way to get 
>> the latest ID and latest timestamp?  I see I can do Title->getLatestRevID() 
>> to get the latest revision ID; what is the best way to get the latest 
>> timestamp?
>> 
>> 3.  In order to create the correct headers for use with the Memento 
>> protocol, we have to generate URIs.  To accomplish this, we use the 
>> $wgServer global variable (through a layer of abstraction); how do we 
>> correctly handle situations if it isn't set by the installation?  Is there 
>> an alternative?  Is there a better way to construct URIs?
>> 
>> 4.  We use exceptions to indicate when showErrorPage should be run; should 
>> the hooks that catch these exceptions and then run showErrorPage also return 
>> false?
>> 
>> 5.  Is there a way to get previous revisions of embedded content, like 
>> images?  I tried using the ImageBeforeProduceHTML hook, but found that 
>> setting the $time parameter didn't return a previous revision of an image.  
>> Am I doing something wrong?  Is there a better way?
>> 
>> 6.  Are there any additional coding standards we should be following besides 
>> those on the "Manual:Coding_conventions" and "Manual:Coding Conventions - 
>> Mediawiki" pages?
>> 
>> 7.  We have two styles for serving pages back to the user:
>>   * 302-style[2], which uses a 302 redirect to tell the user's browser 
>> to go fetch the old revision of the page (e.g. 
>> http://www.example.com/index.php?title=Article&oldid=12345)
>>   * 200-style[3], which actually modifies the page content in place so 
>> that it resembles the old revision of the page

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-02 Thread Shawn Jones
Thanks Marcin for the response.

I've provided comments and questions inline, where I have them.

On Nov 1, 2013, at 6:51 PM, Marcin Cieslak  wrote:

>>> Shawn Jones  wrote:
> 
>> 1.  The Memento protocol has a resource called a TimeMap [1]
>> that takes an article name and returns text formatted as
>> application/link-format.  This text contains a machine-readable
>> list of all of the prior revisions (mementos) of this page.  It is
>> currently implemented as a SpecialPage which can be accessed like
>> http://www.example.com/index.php/Special:TimeMap/Article_Name.
>> Is this the best method, or is it more preferable for us to
>> extend the Action class and add a new action to $wgActions
>> in order to return a TimeMap from the regular page like
>> http://www.example.com/index.php?title=Article_Name&action=gettimemap
>> without using the SpecialPage?  Is there another preferred way of
>> solving this problem?
> 
> It just occurred to me that if TimeMap were a microformat, this
> information could be embedded into ?title=Article_Name&action=history
> itself.
> 
> Even then, if we need an additional MIME type for that, maybe
> we could vary the action=history response based on the desired MIME
> type (text/html or linking format).

It would be excellent to have it available as a Microformat.  We had not 
considered it.

The way the Memento framework operates, these TimeMaps are directly accessible 
resources (e.g. GET http://example/TimeMap) and no additional processing is 
performed to extract them.

I'm glad you brought up the action=history.  One of the ideas we had discussed 
was actually varying action=history with an additional set of arguments to 
produce the TimeMap.

We were concerned with what best fit into MediaWiki's future 
plans/goals/philosophy.

> 
>> 3.  In order to create the correct headers for use with the Memento
>> protocol, we have to generate URIs.  To accomplish this, we use the
>> $wgServer global variable (through a layer of abstraction); how do we
>> correctly handle situations if it isn't set by the installation?  Is
>> there an alternative?  Is there a better way to construct URIs?
> 
> We have wfExpandUrl (yes, there are some bugs currently wrt empty $wgServer
> now... https://bugzilla.wikimedia.org/show_bug.cgi?id=54950).

Actually, looking at our code, we have used wfExpandUrl, but can likely use it 
on the few lines left that access $wgServer.  My longer response to Brian Wolff 
now seems unnecessary.

Now that I'm looking at the docs, they state "Assumes $wgServer is correct."

If the local installation munges $wgServer in some way, and we're not using it 
directly, then I guess it's their responsibility to deal with the fallout?

Can I count it good if I just move our remaining lines to use wfExpandUrl?

> 
>> 5.  Is there a way to get previous revisions of embedded content, like
>> images?  I tried using the ImageBeforeProduceHTML hook, but found that
>> setting the $time parameter didn't return a previous revision of an
>> image.  Am I doing something wrong?  Is there a better way?
> 
> I'm not in a position to give you a full answer, but what I would
> do is try to see if I can set up a MediaWiki with $wgInstantCommons = true
> and see how I can make ForeignAPIRepo fetch older revisions
> from Wikimedia via the API. Then we can have a look at other media
> storage backends, including those used by the WMF installation.

I'll look into this.

>> 8.  Some sites don't wish to have their past Talk/Discussion pages
>> accessible via Memento.  We have the ability to exclude namespaces
>> (Talk, Template, Category, etc.) via configurable option.  By default
>> it excludes nothing.  What namespaces should be excluded by default?
> 
> There might be interesting issues about deleted content; some
> people feel very strongly about making it unavailable to others
> (partly due to some legal issues), while some people set up wikis dedicated
> to providing content deleted from Wikipedia. Are you sure history
> should not be redacted at times? :-)
> 
> Not sure why somebody does not like archiving Talk pages like this,
> but I think this feature could be enabled per-namespace like many
> others in MediaWiki. Archiving media and files will certainly be different,
> and you will run into interesting issues with versioning Categories
> and Templates. Extension:FlaggedRevs has some method to
> track what kind of ancillary content has been modified 
> (FRInclusionManager.php and FRInclusionCache.php might be things
> to look at).

I'll look into this.

> And a question back to you:
> 
> How are you going to handle versioning of stuff like MediaWiki:Common.js,
> MediaWiki:Common.css independently of the proper content itself?
> Some changes might affect presentation of the content meaningfully,
> for example see how https://en.wikipedia.org/wiki/Template:Nts works.

We had not considered Common.js and Common.css yet.  Our first goal was to get 
previous page content loaded, then move on to include previous 

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-02 Thread Shawn Jones
Thanks Brian, this is all good stuff.

To avoid text overload, I, too, have responded inline where I have more comments 
and questions.

> 
>> 2.  We currently make several database calls using the select method of
>> the Database Object.  After some research, we realized that Mediawiki
>> provides some functions that do what we need without making these database
>> calls directly.  One of these needs is to acquire the oldid and timestamp of
>> the first revision of a page, which can be done using
>> Title->getFirstRevision()->getId() and
>> Title->getFirstRevision()->getTimestamp() methods.  Is there a way to get
>> the latest ID and latest timestamp?  I see I can do Title->getLatestRevID()
>> to get the latest revision ID; what is the best way to get the latest
>> timestamp?
> 
> Use existing wrapper functions around DB calls where you can, but if
> you need to, it's OK to query the DB directly.
> 
> For the last part, probably something along the lines of
> WikiPage::factory( $titleObj )->getRevision()->getTimestamp()

That enormous sound you heard was my palm hitting my forehead.  Thanks for 
pointing that one out for me.

We'll be replacing our getFirstMemento and getLastMemento functions soon now 
that we have Mediawiki-esque solutions for them.

There are other instances in which we access the database:
* the revision with timestamp <= a given timestamp (this is what gets the old 
revision of the page)
* Time Map data (fetch the id and timestamp of the last 500 revisions)

I doubt there is something built into Mediawiki that already provides that 
capability.  If there is, please advise.  :)
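
For completeness, here is a sketch of the kind of direct query we use for
the TimeMap case (hedged; trimmed to the relevant fields):

  $dbr = wfGetDB( DB_SLAVE );
  $res = $dbr->select(
      'revision',
      array( 'rev_id', 'rev_timestamp' ),
      array(
          'rev_page' => $title->getArticleID(),
          'rev_timestamp <= ' .
              $dbr->addQuotes( $dbr->timestamp( $pivotTimestamp ) ),
      ),
      __METHOD__,
      array( 'ORDER BY' => 'rev_timestamp DESC', 'LIMIT' => 500 )
  );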

> 
>> 3.  In order to create the correct headers for use with the Memento
>> protocol, we have to generate URIs.  To accomplish this, we use the
>> $wgServer global variable (through a layer of abstraction); how do we
>> correctly handle situations if it isn't set by the installation?  Is there
>> an alternative?  Is there a better way to construct URIs?
> 
> $wgServer is always filled out (Setup.php sets it if the user doesn't).
> However, you probably shouldn't be using it directly. Which method is
> most appropriate depends on what sort of URLs you want, but generally
> the Title class has methods like getFullURL for this sort of
> thing.

That makes me feel a little bit better about our dependencies.

Since our rewrite, we only use $wgServer (via abstraction) in two places now, 
and they both involve the TimeMap SpecialPage.

We actually have 3 different types of TimeMaps in the Memento Mediawiki 
Extension:
1. full (starter) - shows the latest 500 revisions
2. pivot descending - shows the last 500 (or fewer) revisions prior to a given 
timestamp pivot
3. pivot ascending - shows the next 500 (or fewer) revisions after a given 
timestamp pivot

The pivot ascending and pivot descending TimeMaps are what use the $wgServer 
URI.

They take the form of 
http://example.com/index.php/Special:TimeMap/2013072003/1/Article for 
ascending and 
http://example.com/index.php/Special:TimeMap/2013072003/-1/Page for 
descending.

The $wgServer variable is used (as $this->mwbaseurl) to construct the URIs like 
so:

$timeMapPage['uri'] = $this->mwbaseurl . '/'
. SpecialPage::getTitleFor('TimeMap') . '/'
. $pivotTimestamp . '/-1/' . $title;

A similar statement exists for a pivot ascending TimeMap elsewhere in the code.

I've been trying to find a way to eliminate the use of $wgServer altogether, 
but need to construct these URIs for headers, TimeMap entries, etc.

Is there a better way?

>> 5.  Is there a way to get previous revisions of embedded content, like
>> images?  I tried using the ImageBeforeProduceHTML hook, but found that
>> setting the $time parameter didn't return a previous revision of an image.
>> Am I doing something wrong?  Is there a better way?
> 
> FlaggedRevisions manages to set an old version of an image, so it's
> possible. I think you might want to do something with the
> BeforeParserFetchFileAndTitle hook as well. For the time parameter,
> make sure the function you're using has the $time parameter marked as
> pass-by-reference. Also note: the time parameter is the timestamp at
> which the image version was created; it does not mean "get whatever image
> would be relevant at the time specified" (I believe).

I'll have to experiment with that and get back.

Thanks again,

--Shawn



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-02 Thread Shawn Jones
Thank you all very much for your timely responses.

I'll be reviewing them today and will probably have more questions as time goes 
on.

You've given us a lot to consider and discuss.

--Shawn

On Nov 1, 2013, at 7:33 PM, Chad [innocentkil...@gmail.com] wrote:

On Fri, Nov 1, 2013 at 3:43 PM, Brian Wolff [bawo...@gmail.com] wrote:

Hi, I responded inline.

On 11/1/13, Shawn Jones [sj...@cs.odu.edu] wrote:
Hi,

I'm currently working on the Memento Extension for Mediawiki, as
announced
earlier today by Herbert Van de Sompel.

The goal of this extension is to work with the Memento framework, which
attempts to display web pages as they appeared at a given date and time
in
the past.

Our goal is for this to be a collaborative effort focusing on solving
issues
and providing functionality in "the Wikimedia Way" as much as possible.

Without further ado, I have the following technical questions (I
apologize
in advance for the fire hose):

1.  The Memento protocol has a resource called a TimeMap [1] that takes
an
article name and returns text formatted as application/link-format.  This
text contains a machine-readable list of all of the prior revisions
(mementos) of this page.  It is currently implemented as a SpecialPage
which
can be accessed like
http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is this
the
best method, or is it more preferable for us to extend the Action class
and
add a new action to $wgActions in order to return a TimeMap from the
regular
page like
http://www.example.com/index.php?title=Article_Name&action=gettimemap
without using the SpecialPage?  Is there another preferred way of solving
this problem?

Special Page vs Action is usually considered equally OK for this sort
of thing. However, creating an API module would probably be the
preferred method to return such machine-readable data about a page.


I disagree, but maybe that's because it's been a long-term goal of mine
to kill action urls entirely.

-Chad
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-01 Thread Seb35

Hi,

No responses to your specific questions, but just to mention I worked some  
years ago on an extension [1] aiming at retrieving the  
as-exact-as-possible display of the page at a given past datetime, because  
the current implementation of oldid is only "past wikitext with current  
context (templates, images, etc.)".


I mainly implemented the retrieval of old versions of templates, but a lot  
of other smaller improvements could be done (MediaWiki messages,  
styles/JS, images, etc.). With this approach, some details are  
irremediably lost (e.g. the number of articles at a given datetime, some tricky 
delete-and-move actions, etc.) and additional information would have to 
be recorded to retrieve the past versions more exactly.


[1] https://www.mediawiki.org/wiki/Extension:BackwardsTimeTravel

~ Seb35

On Fri, 01 Nov 2013 20:50:06 +0100, Shawn Jones wrote:

Hi,

I'm currently working on the Memento Extension for Mediawiki, as  
announced earlier today by Herbert Van de Sompel.


The goal of this extension is to work with the Memento framework, which  
attempts to display web pages as they appeared at a given date and time  
in the past.


Our goal is for this to be a collaborative effort focusing on solving  
issues and providing functionality in "the Wikimedia Way" as much as  
possible.


Without further ado, I have the following technical questions (I  
apologize in advance for the fire hose):


1.  The Memento protocol has a resource called a TimeMap [1] that takes  
an article name and returns text formatted as application/link-format.   
This text contains a machine-readable list of all of the prior revisions  
(mementos) of this page.  It is currently implemented as a SpecialPage  
which can be accessed like  
http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is this  
the best method, or is it more preferable for us to extend the Action  
class and add a new action to $wgActions in order to return a TimeMap  
from the regular page like  
http://www.example.com/index.php?title=Article_Name&action=gettimemap  
without using the SpecialPage?  Is there another preferred way of  
solving this problem?


2.  We currently make several database calls using the select method  
of the Database Object.  After some research, we realized that Mediawiki  
provides some functions that do what we need without making these  
database calls directly.  One of these needs is to acquire the oldid and  
timestamp of the first revision of a page, which can be done using  
Title->getFirstRevision()->getId() and  
Title->getFirstRevision()->getTimestamp() methods.  Is there a way to  
get the latest ID and latest timestamp?  I see I can do  
Title->getLatestRevID() to get the latest revision ID; what is the best  
way to get the latest timestamp?


3.  In order to create the correct headers for use with the Memento  
protocol, we have to generate URIs.  To accomplish this, we use the  
$wgServer global variable (through a layer of abstraction); how do we  
correctly handle situations if it isn't set by the installation?  Is  
there an alternative?  Is there a better way to construct URIs?


4.  We use exceptions to indicate when showErrorPage should be run;  
should the hooks that catch these exceptions and then run showErrorPage  
also return false?


5.  Is there a way to get previous revisions of embedded content, like  
images?  I tried using the ImageBeforeProduceHTML hook, but found that  
setting the $time parameter didn't return a previous revision of an  
image.  Am I doing something wrong?  Is there a better way?


6.  Are there any additional coding standards we should be following  
besides those on the "Manual:Coding_conventions" and "Manual:Coding  
Conventions - Mediawiki" pages?


7.  We have two styles for serving pages back to the user:
   * 302-style[2], which uses a 302 redirect to tell the user's  
browser to go fetch the old revision of the page (e.g.  
http://www.example.com/index.php?title=Article&oldid=12345)
   * 200-style[3], which actually modifies the page content in place  
so that it resembles the old revision of the page

 Which of these styles is preferable as a default?

8.  Some sites don't wish to have their past Talk/Discussion pages  
accessible via Memento.  We have the ability to exclude namespaces  
(Talk, Template, Category, etc.) via configurable option.  By default it  
excludes nothing.  What namespaces should be excluded by default?


Thanks in advance for any advice, assistance, further discussion, and  
criticism on these and other topics.


Shawn M. Jones
Graduate Research Assistant
Department of Computer Science
Old Dominion University

[1] http://www.mementoweb.org/guide/rfc/ID/#Pattern6
[2] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.1
[3] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.2
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-01 Thread Chad
On Fri, Nov 1, 2013 at 3:43 PM, Brian Wolff  wrote:

> Hi, I responded inline.
>
> On 11/1/13, Shawn Jones  wrote:
> > Hi,
> >
> > I'm currently working on the Memento Extension for Mediawiki, as
> announced
> > earlier today by Herbert Van de Sompel.
> >
> > The goal of this extension is to work with the Memento framework, which
> > attempts to display web pages as they appeared at a given date and time
> in
> > the past.
> >
> > Our goal is for this to be a collaborative effort focusing on solving
> issues
> > and providing functionality in "the Wikimedia Way" as much as possible.
> >
> > Without further ado, I have the following technical questions (I
> apologize
> > in advance for the fire hose):
> >
> > 1.  The Memento protocol has a resource called a TimeMap [1] that takes
> an
> > article name and returns text formatted as application/link-format.  This
> > text contains a machine-readable list of all of the prior revisions
> > (mementos) of this page.  It is currently implemented as a SpecialPage
> which
> > can be accessed like
> > http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is this
> the
> > best method, or is it more preferable for us to extend the Action class
> and
> > add a new action to $wgActions in order to return a TimeMap from the
> regular
> > page like
> > http://www.example.com/index.php?title=Article_Name&action=gettimemap
> > without using the SpecialPage?  Is there another preferred way of solving
> > this problem?
>
> Special Page vs Action is usually considered equally OK for this sort
> of thing. However, creating an API module would probably be the
> preferred method to return such machine-readable data about a page.
>
>
I disagree, but maybe that's because it's been a long-term goal of mine
to kill action urls entirely.

-Chad
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-01 Thread Brian Wolff
On 11/1/13, Marcin Cieslak  wrote:

>
> I'm not in a position to give you a full answer, but what I would
> do is try to see if I can set up a MediaWiki with $wgInstantCommons =
> true
> and see how I can make ForeignAPIRepo fetch older revisions
> from Wikimedia via the API. Then we can have a look at other media
> storage backends, including those used by the WMF installation.
>

For what it's worth, ForeignAPIRepo is currently marked as a repo that
does not support "old" versions of files.  However, in terms of the
interface, it's probably pretty straightforward to change that: all one
needs to do is implement some methods to get old versions of files.  I
assume the original reason is that it would mean a lot of extra API
requests for something most people don't care about.  (If we did this, I
imagine we'd want it disabled by default and configurable as a repo
option, as the average InstantCommons user doesn't need it, and it would
slow things down.)
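
If it helps, here is a purely hypothetical LocalSettings.php sketch of
what such a repo option could look like ('fetchOldVersions' is an
invented name, not an existing ForeignAPIRepo setting):

    $wgForeignFileRepos[] = array(
        'class' => 'ForeignAPIRepo',
        'name' => 'sharedCommons',
        'apibase' => 'https://commons.wikimedia.org/w/api.php',
        'fetchDescription' => true,
        // Hypothetical new flag, off by default as suggested above:
        'fetchOldVersions' => true,
    );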


---bawolff

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-01 Thread Marcin Cieslak
>> Shawn Jones  wrote:

> 1.  The Memento protocol has a resource called a TimeMap [1]
> that takes an article name and returns text formatted as
> application/link-format.  This text contains a machine-readable
> list of all of the prior revisions (mementos) of this page.  It is
> currently implemented as a SpecialPage which can be accessed like
> http://www.example.com/index.php/Special:TimeMap/Article_Name.
> Is this the best method, or is it more preferable for us to
> extend the Action class and add a new action to $wgActions
> in order to return a TimeMap from the regular page like
> http://www.example.com/index.php?title=Article_Name&action=gettimemap
> without using the SpecialPage?  Is there another preferred way of
> solving this problem?

It just occurred to me that if TimeMap were a microformat, this
information could be embedded into ?title=Article_Name&action=history
itself.

Even then, if we need an additional MIME type for that, maybe we could
vary the action=history response based on the desired MIME type
(text/html or link format).
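
For reference, the application/link-format serialization under
discussion looks roughly like this (URIs, oldids, and dates are made up
for illustration; see the TimeMap spec [1] cited above for the exact
syntax):

    <http://www.example.com/index.php/Article_Name>;rel="original",
    <http://www.example.com/index.php/Special:TimeMap/Article_Name>
      ;rel="self";type="application/link-format",
    <http://www.example.com/index.php?title=Article_Name&oldid=12340>
      ;rel="first memento";datetime="Tue, 15 Oct 2013 11:28:26 GMT",
    <http://www.example.com/index.php?title=Article_Name&oldid=12345>
      ;rel="last memento";datetime="Wed, 30 Oct 2013 04:12:20 GMT"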

> 3.  In order to create the correct headers for use with the Memento
> protocol, we have to generate URIs.  To accomplish this, we use the
> $wgServer global variable (through a layer of abstraction); how do we
> correctly handle situations if it isn't set by the installation?  Is
> there an alternative?  Is there a better way to construct URIs?

We have wfExpandUrl (yes, there are currently some bugs with an empty
$wgServer: https://bugzilla.wikimedia.org/show_bug.cgi?id=54950).
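
For example (untested sketch):

    // Expand a script-relative URL into an absolute URI;
    // PROTO_CANONICAL resolves against the wiki's canonical server:
    $uri = wfExpandUrl(
        '/index.php?title=Article_Name&oldid=12345',
        PROTO_CANONICAL
    );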

> 5.  Is there a way to get previous revisions of embedded content, like
> images?  I tried using the ImageBeforeProduceHTML hook, but found that
> setting the $time parameter didn't return a previous revision of an
> image.  Am I doing something wrong?  Is there a better way?

I'm not in a position to give you a full answer, but what I would
do I would try to see if I can setup a MediaWiki with $wgInstantCommons = true
and see how I can make ForeignAPIRepo to fetch older revisions
from Wikimedia via API. Then we can have a look at other media
storage backends, including those used by WMF installation.

> 7.  We have two styles for serving pages back to the user:
>* 302-style[2], which uses a 302 redirect to tell the user's browser 
> to go fetch the old revision of the page (e.g. 
> http://www.example.com/index.php?title=Article&oldid=12345)
>* 200-style[3], which actually modifies the page content in place so 
> that it resembles the old revision of the page
>  Which of these styles is preferable as a default?

I guess that 302 is better; it sounds like a much better idea to me due
to caching.
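
Roughly (untested sketch; assumes you already have the OutputPage $out,
the Title $title, and the negotiated revision id $oldid at hand):

    // Issue an explicit 302 pointing at the old revision instead of
    // rewriting the page content in place:
    $url = $title->getFullURL( array( 'oldid' => $oldid ) );
    $out->redirect( $url, '302' );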

> 8.  Some sites don't wish to have their past Talk/Discussion pages
> accessible via Memento.  We have the ability to exclude namespaces
> (Talk, Template, Category, etc.) via a configurable option.  By default
> it excludes nothing.  What namespaces should be excluded by default?

There might be interesting issues around deleted content: some people
feel very strongly about making it unavailable to others (partly due to
legal issues), while others set up wikis dedicated to providing content
deleted from Wikipedia.  Are you sure history should not be redacted at
times? :-)

I'm not sure why somebody wouldn't want Talk pages archived like this,
but I think this feature could be enabled per-namespace like many others
in MediaWiki.  Archiving media and files will certainly be different,
and you will run into interesting issues with versioning Categories and
Templates.  Extension:FlaggedRevs has a method to track what kind of
ancillary content has been modified (FRInclusionManager.php and
FRInclusionCache.php might be things to look at).

And a question back to you:

How are you going to handle versioning of things like MediaWiki:Common.js
and MediaWiki:Common.css independently of the content proper?
Some changes might affect the presentation of the content meaningfully;
for example, see how https://en.wikipedia.org/wiki/Template:Nts works.

If you don't know it already, PediaPress developed a generator of static
documents from wiki content (http://code.pediapress.com/,
see Extension:Collection), and they had to deal with lots of
similar issues in their renderer, mwlib.  The renderer accesses
the wiki as a client and fetches all ancillary content as needed.


//Saper


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-01 Thread Brian Wolff
Hi, I responded inline.

On 11/1/13, Shawn Jones  wrote:
> Hi,
>
> I'm currently working on the Memento Extension for MediaWiki, as announced
> earlier today by Herbert Van de Sompel.
>
> The goal of this extension is to work with the Memento framework, which
> attempts to display web pages as they appeared at a given date and time in
> the past.
>
> Our goal is for this to be a collaborative effort focusing on solving issues
> and providing functionality in "the Wikimedia Way" as much as possible.
>
> Without further ado, I have the following technical questions (I apologize
> in advance for the fire hose):
>
> 1.  The Memento protocol has a resource called a TimeMap [1] that takes an
> article name and returns text formatted as application/link-format.  This
> text contains a machine-readable list of all of the prior revisions
> (mementos) of this page.  It is currently implemented as a SpecialPage which
> can be accessed like
> http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is this the
> best method, or is it more preferable for us to extend the Action class and
> add a new action to $wgActions in order to return a TimeMap from the regular
> page like
> http://www.example.com/index.php?title=Article_Name&action=gettimemap
> without using the SpecialPage?  Is there another preferred way of solving
> this problem?

Special Page vs Action is usually considered equally ok for this sort
of thing. However creating an api module would probably be the
preferred method to return such machine readable data about a page.

> 2.  We currently make several database calls using the select method of
> the Database Object.  After some research, we realized that MediaWiki
> provides some functions that do what we need without making these database
> calls directly.  One of these needs is to acquire the oldid and timestamp of
> the first revision of a page, which can be done using
> Title->getFirstRevision()->getId() and
> Title->getFirstRevision()->getTimestamp() methods.  Is there a way to get
> the latest ID and latest timestamp?  I see I can do Title->getLatestRevID()
> to get the latest revision ID; what is the best way to get the latest
> timestamp?

Use existing wrapper functions around DB calls where you can, but if
you need to, it's OK to query the db directly.

For the last part, probably something along the lines of
WikiPage::factory( $titleObj )->getRevision()->getTimestamp()
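
i.e., something like (untested):

    // Latest revision id and timestamp for a Title:
    $rev = WikiPage::factory( $titleObj )->getRevision();
    if ( $rev ) {
        $latestId = $rev->getId();
        $latestTimestamp = $rev->getTimestamp(); // TS_MW, e.g. 20131101224300
    }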

> 3.  In order to create the correct headers for use with the Memento
> protocol, we have to generate URIs.  To accomplish this, we use the
> $wgServer global variable (through a layer of abstraction); how do we
> correctly handle situations if it isn't set by the installation?  Is there
> an alternative?  Is there a better way to construct URIs?

$wgServer is always filled out (Setup.php sets it if the user doesn't).
However, you probably shouldn't be using it directly.  Which method is
most appropriate depends on what sort of URLs you want, but generally
the Title class has methods like getFullURL for this sort of thing.
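
For instance (sketch):

    // Absolute URI for a page, e.g. for use in a Link header:
    $uri = $titleObj->getFullURL();
    // ...and for a specific revision of it:
    $mementoUri = $titleObj->getFullURL( array( 'oldid' => 12345 ) );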


> 4.  We use exceptions to indicate when showErrorPage should be run; should
> the hooks that catch these exceptions and then run showErrorPage also return
> false?

I haven't looked at your code, so I'm not sure about the context, but:
in general a hook returns false to denote that no further processing
should take place.  Displaying an error message sounds like a good
criterion for returning false.  That said, things may depend on the hook
and what precisely you're doing.
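
To make that concrete, a rough sketch of the pattern (the hook name is
just an example, and the message keys are hypothetical):

    public static function onArticleViewHeader( &$article, &$outputDone, &$pcache ) {
        try {
            // ... Memento datetime negotiation ...
        } catch ( MWException $e ) {
            $article->getContext()->getOutput()->showErrorPage(
                'memento-error-title', // hypothetical i18n keys
                'memento-error-text'
            );
            return false; // stop further processing / normal rendering
        }
        return true; // continue normal processing
    }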
>
> 5.  Is there a way to get previous revisions of embedded content, like
> images?  I tried using the ImageBeforeProduceHTML hook, but found that
> setting the $time parameter didn't return a previous revision of an image.
> Am I doing something wrong?  Is there a better way?

FlaggedRevs manages to set an old version of an image, so it's
possible.  I think you might want to do something with the
BeforeParserFetchFileAndTitle hook as well.  For the time parameter,
make sure the function you're using has the $time parameter marked as
pass-by-reference.  Also note: the time parameter is the timestamp at
which the image version was created; it does not mean "get whatever
image version was current at the specified time" (I believe).
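
Very roughly, something like this (untested; self::$requestedTime is
made up, standing in for wherever you store the negotiated timestamp):

    public static function onBeforeParserFetchFileAndTitle(
        $parser, $nt, &$options, &$descQuery
    ) {
        // Per the caveat above, this must be the exact timestamp of an
        // existing version of the file, not an arbitrary point in time:
        $options['time'] = self::$requestedTime;
        return true;
    }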

>
> 6.  Are there any additional coding standards we should be following besides
> those on the "Manual:Coding_conventions" and "Manual:Coding Conventions -
> Mediawiki" pages?

Those are the important ones.  As a rule of thumb, try to make your
code look like it fits in with the rest of MediaWiki.

>
> 7.  We have two styles for serving pages back to the user:
>* 302-style[2], which uses a 302 redirect to tell the user's browser
> to go fetch the old revision of the page (e.g.
> http://www.example.com/index.php?title=Article&oldid=12345)
>* 200-style[3], which actually modifies the page content in place so
> that it resembles the old revision of the page
>  Which of these styles is preferable as a default?

[Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development

2013-11-01 Thread Shawn Jones
Hi,

I'm currently working on the Memento Extension for MediaWiki, as announced 
earlier today by Herbert Van de Sompel.

The goal of this extension is to work with the Memento framework, which 
attempts to display web pages as they appeared at a given date and time in the 
past.

Our goal is for this to be a collaborative effort focusing on solving issues 
and providing functionality in "the Wikimedia Way" as much as possible.

Without further ado, I have the following technical questions (I apologize in 
advance for the fire hose):

1.  The Memento protocol has a resource called a TimeMap [1] that takes an 
article name and returns text formatted as application/link-format.  This text 
contains a machine-readable list of all of the prior revisions (mementos) of 
this page.  It is currently implemented as a SpecialPage which can be accessed 
like http://www.example.com/index.php/Special:TimeMap/Article_Name.  Is this 
the best method, or is it more preferable for us to extend the Action class and 
add a new action to $wgActions in order to return a TimeMap from the regular 
page like http://www.example.com/index.php?title=Article_Name&action=gettimemap 
without using the SpecialPage?  Is there another preferred way of solving this 
problem?

2.  We currently make several database calls using the select method of the 
Database Object.  After some research, we realized that MediaWiki provides some 
functions that do what we need without making these database calls directly.  
One of these needs is to acquire the oldid and timestamp of the first revision 
of a page, which can be done using Title->getFirstRevision()->getId() and 
Title->getFirstRevision()->getTimestamp() methods.  Is there a way to get the 
latest ID and latest timestamp?  I see I can do Title->getLatestRevID() to get 
the latest revision ID; what is the best way to get the latest timestamp?

3.  In order to create the correct headers for use with the Memento protocol, 
we have to generate URIs.  To accomplish this, we use the $wgServer global 
variable (through a layer of abstraction); how do we correctly handle 
situations if it isn't set by the installation?  Is there an alternative?  Is 
there a better way to construct URIs?

4.  We use exceptions to indicate when showErrorPage should be run; should the 
hooks that catch these exceptions and then run showErrorPage also return false?

5.  Is there a way to get previous revisions of embedded content, like images?  
I tried using the ImageBeforeProduceHTML hook, but found that setting the $time 
parameter didn't return a previous revision of an image.  Am I doing something 
wrong?  Is there a better way?

6.  Are there any additional coding standards we should be following besides 
those on the "Manual:Coding_conventions" and "Manual:Coding Conventions - 
Mediawiki" pages?

7.  We have two styles for serving pages back to the user:
   * 302-style[2], which uses a 302 redirect to tell the user's browser to 
go fetch the old revision of the page (e.g. 
http://www.example.com/index.php?title=Article&oldid=12345)
   * 200-style[3], which actually modifies the page content in place so 
that it resembles the old revision of the page
 Which of these styles is preferable as a default?

8.  Some sites don't wish to have their past Talk/Discussion pages accessible 
via Memento.  We have the ability to exclude namespaces (Talk, Template, 
Category, etc.) via a configurable option.  By default it excludes nothing.  What 
namespaces should be excluded by default?

Thanks in advance for any advice, assistance, further discussion, and criticism 
on these and other topics.

Shawn M. Jones
Graduate Research Assistant
Department of Computer Science
Old Dominion University

[1] http://www.mementoweb.org/guide/rfc/ID/#Pattern6
[2] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.1
[3] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.2
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l