Re: [Wikitech-l] GSoC PROJECT PROPOSAL DRAFT

2021-04-07 Thread Jay Prakash
Hi Ritwik,

Thank you for submitting the proposal, but I could not find it on
https://phabricator.wikimedia.org/. Could you share a link to your proposal
on Phabricator so that I can review it?

Jay Prakash (he/him)

On Thu, Apr 8, 2021 at 11:18 AM Ritwik Srivastava 
wrote:

> Dear Community members,
> I have submitted a draft proposal for the Wikimedia Foundation project
> "Develop a user script/gadget tutorial for wikimedia.org".
>
> Kindly review it and give feedback so that I can improve it.
>
> Thanks & Regards
> Ritwik Srivastava
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] GSoC PROJECT PROPOSAL DRAFT

2021-04-07 Thread Ritwik Srivastava
Dear Community members,
I have submitted a draft proposal for the Wikimedia Foundation project
"Develop a user script/gadget tutorial for wikimedia.org".

Kindly review it and give feedback so that I can improve it.

Thanks & Regards
Ritwik Srivastava


[Wikitech-l] GSOC project for multiple watch lists - looking for mentors

2017-04-01 Thread Techman224
Hello,

Sorry if this is cutting it close to the deadline, but I'm interested in doing
a GSoC project over the summer to implement multiple watchlists for users.

What I have in mind is to first deliver a minimum viable product where users
can add/remove/change custom watchlists. If time permits, it can be integrated
with other components, such as notifications, a way to add pages to custom
watchlists from the page itself, the API, etc.
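To make the MVP concrete, here is a rough sketch of the data model in Python
with SQLite. The table and helper names are my own invention for illustration;
MediaWiki's actual watchlist schema and APIs would differ:

```python
import sqlite3

# Hypothetical schema: one row per (user, list, page). The real MediaWiki
# implementation would live in its own table alongside `watchlist`.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE custom_watchlist (
        user_id   INTEGER NOT NULL,
        list_name TEXT    NOT NULL,
        page      TEXT    NOT NULL,
        PRIMARY KEY (user_id, list_name, page)
    )
""")

def add_page(user_id, list_name, page):
    # Adding a page to a named list; duplicates are silently ignored.
    conn.execute("INSERT OR IGNORE INTO custom_watchlist VALUES (?, ?, ?)",
                 (user_id, list_name, page))

def remove_page(user_id, list_name, page):
    conn.execute("DELETE FROM custom_watchlist "
                 "WHERE user_id=? AND list_name=? AND page=?",
                 (user_id, list_name, page))

def pages_in(user_id, list_name):
    rows = conn.execute("SELECT page FROM custom_watchlist "
                        "WHERE user_id=? AND list_name=? ORDER BY page",
                        (user_id, list_name))
    return [r[0] for r in rows]

# A page can sit on several lists at once, which is the point of the feature.
add_page(1, "Physics", "Quantum_mechanics")
add_page(1, "Physics", "Relativity")
add_page(1, "To review", "Relativity")
remove_page(1, "Physics", "Quantum_mechanics")
```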

I'm looking for potential mentors: in particular, someone who knows their way
around MediaWiki's codebase and can guide me through coding and development.
If there is any interest, let me know!

Thanks,

Techman224

Task: https://phabricator.wikimedia.org/T3492




Re: [Wikitech-l] GSoc Project related assistance

2014-03-12 Thread Andre Klapper
Hi Rohit,

On Wed, 2014-03-12 at 19:46 +0530, Rohit Dua wrote:
> It would be great if we could get the community's valuable *suggestions/
> feedback* for the proposal/project.

Do you have specific questions in mind when you ask for feedback? Specific
questions normally help me a lot in providing *helpful* feedback...

andre
-- 
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/



[Wikitech-l] GSoc Project related assistance

2014-03-12 Thread Rohit Dua
Hi to all

I have selected a mentorship project named "Google Books > Internet Archive >
Commons upload cycle".

I have drafted a rough proposal (after discussing the steps with
mentors/community).

Link to proposal:
https://www.mediawiki.org/wiki/User:8ohit.dua/GSoC_proposal_2014

It would be great if we could get the community's valuable *suggestions/
feedback* for the proposal/project.

Thanks a lot.
Rohit Dua
(8ohit.dua)
Delhi, India

Re: [Wikitech-l] GSOC project: calculating the quality of editors and content (was Guidance for the Project Idea for GSOC 2014)

2014-03-07 Thread Gryllida
I oppose this idea and its implementation: automated ranking of content
sounds like a way to get people to focus aggressively on the rank/score
instead of on human work on the content. They already focus on 'number of
GA reviews' and 'number of FAs I contributed to', relying on style and
content guides for these more than on the concept of freedom of knowledge.
I recently created https://meta.wikimedia.org/wiki/Karma in an attempt to
start gathering examples of such blindly misleading work.

If implemented, I dare to ask that this be opt-in...

On Fri, 7 Mar 2014, at 10:25, Quim Gil wrote:
> Hi Devender, I'm not a developer but I hope my feedback as editor is useful.
> 
> On 03/06/2014 12:02 AM, Devender wrote:
> > I want to implement a ranking system of the editors(especially 3rd party
> > editors) of the Wikipedia through which viewers can differentiate between
> > the content of the page. 
> 
> What do you mean by "3rd party editors"?
> 
> 
> > This ranking system will increase the content reliability
> 
> Content reliability is indeed an interesting value for wiki content,
> especially in projects like Wikipedia. However, basing the reliability
> of the content on the quantity of edits done by an editor is risky, to
> say the least.
> 
> That bases reliability on quantity, not quality. If you could find a
> way to assess the quality of an editor's edits (and therefore the
> reliability of an editor), then maybe you could provide a hint about
> the reliability of an article based on the reliability of the editors
> who edited it.
> 
> Even in that case it might be complex to figure out when the reliable
> editors are acting to add more quality to an already good article, or to
> fix the worst issues of a horrible article. When they add and when they
> revert...
> 
> And of course it may also happen that editors not identified as reliable
> produce great content, as is often the case with editors who are very
> specialized in a certain topic and have a short history of excellent edits.
> 
> > 2. Use a different color for the line/paragraph if its content is
> > very new and its reliability score is low.
> 
> Even if there is some probability that older paragraphs that have
> survived many edits intact are somewhat reliable, it is too easy to find
> examples disproving this point. This is true especially for the articles
> most in need of a quality assessment: those that are not edited often and
> are not watched by many experienced editors.
> 
> 
> > Please let me know if I should go with this idea. If not, guide me on
> > how to start working on a different idea.
> 
> This is just my personal opinion and I'm not an expert. Maybe someone
> else will have a different, more positive opinion about your project, or
> advice on how to re-focus it.
> 
> In general, students proposing new projects have more chances of success
> if they start pitching and testing their ideas months before the GSoC.
> Add a factor of x5 at least if your main target is a Wikimedia project.
> 
> If you don't get mentors for your project very soon, then the safest
> option is to choose a project at
> https://www.mediawiki.org/wiki/Summer_of_Code_2014 and go for it.
> 
> Thank you for your interest in contributing to Wikimedia. Also thank you
> for following my suggestion to post at wikitech-l. I hope you will get
> more feedback from other people on this list.
> 
> -- 
> Quim Gil
> Technical Contributor Coordinator @ Wikimedia Foundation
> http://www.mediawiki.org/wiki/User:Qgil
> 


Re: [Wikitech-l] GSOC project: calculating the quality of editors and content (was Guidance for the Project Idea for GSOC 2014)

2014-03-07 Thread Tilman Bayer
See also https://meta.wikimedia.org/wiki/Research:Content_persistence

On Thu, Mar 6, 2014 at 7:32 PM, Benjamin Lees  wrote:
> Hi, Devender.  Have you looked at WikiTrust[0]?  It does roughly what you
> describe (though I don't think the live demo works anymore).
>
> [0] https://en.wikipedia.org/wiki/Wikipedia:WikiTrust



-- 
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB


Re: [Wikitech-l] GSOC project: calculating the quality of editors and content (was Guidance for the Project Idea for GSOC 2014)

2014-03-06 Thread Benjamin Lees
Hi, Devender.  Have you looked at WikiTrust[0]?  It does roughly what you
describe (though I don't think the live demo works anymore).

[0] https://en.wikipedia.org/wiki/Wikipedia:WikiTrust

[Wikitech-l] GSOC project: calculating the quality of editors and content (was Guidance for the Project Idea for GSOC 2014)

2014-03-06 Thread Quim Gil
Hi Devender, I'm not a developer but I hope my feedback as editor is useful.

On 03/06/2014 12:02 AM, Devender wrote:
> I want to implement a ranking system of the editors(especially 3rd party
> editors) of the Wikipedia through which viewers can differentiate between
> the content of the page. 

What do you mean by "3rd party editors"?


> This ranking system will increase the content reliability

Content reliability is indeed an interesting value for wiki content,
especially in projects like Wikipedia. However, basing the reliability
of the content on the quantity of edits done by an editor is risky, to
say the least.

That bases reliability on quantity, not quality. If you could find a
way to assess the quality of an editor's edits (and therefore the
reliability of an editor), then maybe you could provide a hint about
the reliability of an article based on the reliability of the editors
who edited it.
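As a toy illustration of the kind of hint described here (entirely
hypothetical scoring, not an existing MediaWiki feature), one could weight
per-editor reliability by how much each editor contributed to the article:

```python
# Toy model: article reliability as the contribution-weighted mean of
# hypothetical per-editor reliability scores in [0, 1].
def article_reliability(contributions):
    """contributions: list of (editor_reliability, bytes_contributed)."""
    total = sum(size for _, size in contributions)
    if total == 0:
        return 0.0
    return sum(rel * size for rel, size in contributions) / total

# One reliable editor wrote most of the article; an unknown editor added a
# little. The hint leans toward the dominant contributor's reliability.
score = article_reliability([(0.9, 8000), (0.2, 2000)])
```

As the thread notes, this would still misfire on specialized newcomers with
short but excellent histories; it is a hint, not a verdict.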

Even in that case it might be complex to figure out when the reliable
editors are acting to add more quality to an already good article, or to
fix the worst issues of a horrible article. When they add and when they
revert...

And of course it may also happen that editors not identified as reliable
produce great content, as is often the case with editors who are very
specialized in a certain topic and have a short history of excellent edits.

> 2. Use a different color for the line/paragraph if its content is very
> new and its reliability score is low.

Even if there is some probability that older paragraphs that have
survived many edits intact are somewhat reliable, it is too easy to find
examples disproving this point. This is true especially for the articles
most in need of a quality assessment: those that are not edited often and
are not watched by many experienced editors.


> Please let me know if I should go with this idea. If not, guide me on how
> to start working on a different idea.

This is just my personal opinion and I'm not an expert. Maybe someone
else will have a different, more positive opinion about your project, or
advice on how to re-focus it.

In general, students proposing new projects have more chances of success
if they start pitching and testing their ideas months before the GSoC.
Add a factor of x5 at least if your main target is a Wikimedia project.

If you don't get mentors for your project very soon, then the safest
option is to choose a project at
https://www.mediawiki.org/wiki/Summer_of_Code_2014 and go for it.

Thank you for your interest in contributing to Wikimedia. Also thank you
for following my suggestion to post at wikitech-l. I hope you will get
more feedback from other people on this list.

-- 
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil




[Wikitech-l] GSoC project proposal - Section handling in Semantic Forms

2013-05-02 Thread Himeshi De Silva
Hi,

I have submitted a proposal for GSoC 2013 for the project Section handling
in Semantic Forms.

Link to proposal -
https://www.mediawiki.org/wiki/User:Himeshi/GSoC_2013_Application

Link to bug report - https://bugzilla.wikimedia.org/show_bug.cgi?id=46662

I would appreciate your comments and suggestions for improvement on this.

Thank you,
Himeshi De Silva.

[Wikitech-l] GSoC project: Extending UploadWizard to more easily upload books

2013-04-29 Thread Tianhao Wang
Hi, I'm Tianhao Wang, from China.
I'll participate in GSoC this summer and help improve the
UploadWizard to be more friendly to book uploaders.
Here is my proposal:
https://www.mediawiki.org/wiki/User:Vvv214wth/UploadWizard
Any feedback or advice is welcome!




Re: [Wikitech-l] GSoC Project

2013-04-29 Thread Kiran Mathew Koshy
First of all, let me thank everyone who has commented on this thread. Sorry
about not responding earlier; my exams are going on. You can certainly
expect more responses from me once they are over.


On Tue, Apr 30, 2013 at 4:18 AM, Emmanuel Engelhart
wrote:

> Dear Kiran
>
> Before commenting your proposal, let me thank:
> * Quim for having renamed this thread... I wouldn't have got a chance to
> read it otherwise.
> * Gnosygnu and Sumana for their previous answers.
>
> Your email points out three problems:
> (1) The size of the offline dumps
> (2) Server mode of the offline solution
> (3) The need of incremental updates
>
> Regarding (1), I disagree. We have the ZIM format, which is open, has an
> extremely efficient standard implementation, provides high compression
> rates, and offers fast random access: http://www.openzim.org
>
> Regarding (2), Kiwix, which is a ZIM reader, already does this: you can
> either share Kiwix on a network disk or use Kiwix's HTTP-compatible
> daemon, kiwix-serve: http://www.kiwix.org/wiki/Kiwix-serve
>
> Regarding (3), I agree. This is an old feature request in the openZIM
> project. It's both on the roadmap and in the bug tracker:
> * http://www.openzim.org/wiki/Roadmap
> * https://bugzilla.wikimedia.org/show_bug.cgi?id=47406
>
> But I also think the solution you propose isn't suited to the problem.
> Setting up a MediaWiki instance is not easy, it's resource-intensive,
> and you don't need all that power (of the software setup) for what you
> want to do.
>


> On the other hand, with ZIM you have a format which provides all that
> you need and runs on devices that cost only a few dozen USD, and we
> will make this incremental update trivial for the end user (it's just
> a matter of time ;).
>


I don't think power is much of a priority, but I agree the ZIM format would
be easier, since it directly reads from the ZIM file



>
> So to fix that problem, here is my approach: we should implement two
> tools I call "zimdiff" and "zimpatch":
> * zimdiff is a tool able to compute the difference between two ZIM files
> * zimpatch is a tool able to patch a ZIM file with a ZIM diff file
>
> The incremental update process would be:
> * Compute a ZIM diff file (done by the ZIM provider)
> * Download and patch the "old" ZIM file with the ZIM diff file (done by
> the user)
>
> We could implement two modes for zimpatch, "lazy" and "normal":
> * lazy mode: simple merge of the files and rewriting of the index (fast
> but needs a lot of mass storage)
> * normal mode: recompute a new file (slow but needs less mass storage)
>
> Regarding the ZIM diff file format... the discussion is open, but it
> looks like we could simply reuse the ZIM format and zimpatch would work
> like a "zimmerge" (does not exist, it's just for the explanation).
>
> Everything could be done, IMO, in "only" a few hundred smart lines of
> C++. I would be really surprised if this needed more than 2000 lines.
> But to do that, we need a pretty talented C++ developer, maybe you?
>
>
Yes, this is quite an easy task. I can do this. I can go through the ZIM
format and the zimlib library in a few days.

Regarding *zimpatch*, I think it would be better to implement both
methods (although I prefer the 2nd one). The user can then select the one
they want, depending on their configuration.
Lastly, we can add *zimdiff* as an automated task on the server.
Running zimpatch and downloading the ZIM file can also be automated and
added to Kiwix.

If there's time left, I can port the zimlib library to Python or PHP, so it
becomes easier for people to hack on.

If you have any more suggestions, please comment. I'll submit the proposal
in ~12 hours (again, exams).


> If you or someone else is interested, we would probably be able to find
> a mentor.
>
> Kind regards
> Emmanuel
>
> PS: Wikimedia has an offline centric mailing list, let me add it in CC:
> https://lists.wikimedia.org/mailman/listinfo/offline-l
>
> On 26/04/2013 22:27, Kiran Mathew Koshy wrote:
> > Hi guys,
> >
> > I have an idea of my own for a GSoC project that I'd like to share with
> > you. It's not a perfect one, so please forgive any mistakes.
> >
> > The project is related to the existing GSoC project "Incremental Data
> > dumps", but is in no way a replacement for it.
> >
> >
> > *Offline Wikipedia*
> >
> > For a long time, a lot of offline solutions for Wikipedia have sprung up
> on
> > the internet. All of these have been unofficial solutions, and  have
> > limitations. A major problem is the *increasing size of the data dumps*,
> > and the problem of *updating the local content*.
> >
> > Consider the situation in a place where internet is costly or
> > unavailable. (For the purpose of discussion, let's consider a school
> > in a 3rd world country.) Internet speeds are extremely slow, and
> > accessing Wikipedia directly from the web is out of the question.
> > Such a school would greatly benefit from an instance of Wikipedia on  a
> > local server. Now up to here, the sc

Re: [Wikitech-l] GSoC Project

2013-04-29 Thread Emmanuel Engelhart
Dear Kiran

Before commenting your proposal, let me thank:
* Quim for having renamed this thread... I wouldn't have got a chance to
read it otherwise.
* Gnosygnu and Sumana for their previous answers.

Your email points out three problems:
(1) The size of the offline dumps
(2) Server mode of the offline solution
(3) The need of incremental updates

Regarding (1), I disagree. We have the ZIM format, which is open, has an
extremely efficient standard implementation, provides high compression
rates, and offers fast random access: http://www.openzim.org

Regarding (2), Kiwix, which is a ZIM reader, already does this: you can
either share Kiwix on a network disk or use Kiwix's HTTP-compatible
daemon, kiwix-serve: http://www.kiwix.org/wiki/Kiwix-serve

Regarding (3), I agree. This is an old feature request in the openZIM
project. It's both on the roadmap and in the bug tracker:
* http://www.openzim.org/wiki/Roadmap
* https://bugzilla.wikimedia.org/show_bug.cgi?id=47406

But I also think the solution you propose isn't suited to the problem.
Setting up a MediaWiki instance is not easy, it's resource-intensive,
and you don't need all that power (of the software setup) for what you
want to do.

On the other hand, with ZIM you have a format which provides all that
you need and runs on devices that cost only a few dozen USD, and we
will make this incremental update trivial for the end user (it's just
a matter of time ;).

So to fix that problem, here is my approach: we should implement two
tools I call "zimdiff" and "zimpatch":
* zimdiff is a tool able to compute the difference between two ZIM files
* zimpatch is a tool able to patch a ZIM file with a ZIM diff file

The incremental update process would be:
* Compute a ZIM diff file (done by the ZIM provider)
* Download and patch the "old" ZIM file with the ZIM diff file (done by
the user)

We could implement two modes for zimpatch, "lazy" and "normal":
* lazy mode: simple merge of the files and rewriting of the index (fast
but needs a lot of mass storage)
* normal mode: recompute a new file (slow but needs less mass storage)

Regarding the ZIM diff file format... the discussion is open, but it
looks like we could simply reuse the ZIM format and zimpatch would work
like a "zimmerge" (does not exist, it's just for the explanation).

Everything could be done, IMO, in "only" a few hundred smart lines of
C++. I would be really surprised if this needed more than 2000 lines.
But to do that, we need a pretty talented C++ developer, maybe you?
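As a rough illustration of the zimdiff/zimpatch contract, here is a sketch in
Python rather than C++, operating on plain dicts that stand in for ZIM
archives. It shows the diff/patch logic only, not the real ZIM container
format; all names are invented for the example:

```python
# A "ZIM archive" here is just {article_id: content}. A diff records
# added/changed entries plus the IDs deleted since the old snapshot.
def zimdiff(old, new):
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    deleted = [k for k in old if k not in new]
    return {"changed": changed, "deleted": deleted}

def zimpatch(old, diff):
    # Comparable to the "lazy" mode: merge entries and drop deletions,
    # without recompressing anything (here we simply build a new dict).
    patched = {k: v for k, v in old.items() if k not in diff["deleted"]}
    patched.update(diff["changed"])
    return patched

old = {"A1": "v1 of article 1", "A2": "v1 of article 2"}
new = {"A1": "v2 of article 1", "A3": "v1 of article 3"}
patched = zimpatch(old, zimdiff(old, new))
```

The invariant the two tools must preserve is that patching the old archive
with the diff reproduces the new archive exactly.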

If you or someone else is interested, we would probably be able to find
a mentor.

Kind regards
Emmanuel

PS: Wikimedia has an offline centric mailing list, let me add it in CC:
https://lists.wikimedia.org/mailman/listinfo/offline-l

On 26/04/2013 22:27, Kiran Mathew Koshy wrote:
> Hi guys,
> 
> I have an idea of my own for a GSoC project that I'd like to share with you.
> It's not a perfect one, so please forgive any mistakes.
> 
> The project is related to the existing GSoC project "Incremental Data
> dumps", but is in no way a replacement for it.
> 
> 
> *Offline Wikipedia*
> 
> For a long time, a lot of offline solutions for Wikipedia have sprung up on
> the internet. All of these have been unofficial solutions, and  have
> limitations. A major problem is the *increasing size of the data dumps*,
> and the problem of *updating the local content*.
> 
> Consider the situation in a place where internet is costly or
> unavailable. (For the purpose of discussion, let's consider a school in a
> 3rd world country.) Internet speeds are extremely slow, and accessing
> Wikipedia directly from the web is out of the question.
> Such a school would greatly benefit from an instance of Wikipedia on  a
> local server. Now up to here, the school can use any of the freely
> available offline Wikipedia solutions to make a local instance. The problem
> arises when the database in the local instance becomes obsolete. The client
> is then required to download an entire new dump(approx. 10 GB in size) and
> load it into the database.
> Another problem that arises is that most 3rd party programs *do not allow
> network access*, and a new instance of the database is required (approx. 40
> GB) on each installation. For instance, in a school with around 50 desktops,
> each desktop would require a 40 GB  database. Plus, *updating* them becomes
> even more difficult.
> 
> So here's my *idea*:
> Modify the existing MediaWiki software to add a few PHP/Python scripts
> which will automatically update the database and run in the
> background. (Details on how the update is done are described later.)
> Initially, the MediaWiki(modified) will take an XML dump/ SQL dump (SQL
> dump preferred) as input and will create the local instance of Wikipedia.
> Later on, the updates will be added to the database automatically by the
> script.
> 
> The installation process is extremely easy, it just requires a server
> package like XAMPP and the MediaWiki bundle.
> 
> 
> Process of updating:

Re: [Wikitech-l] GSoC Project Idea

2013-04-29 Thread Platonides
On 26/04/13 22:23, Kiran Mathew Koshy wrote:
> Hi guys,
> 
> I have an idea of my own for a GSoC project that I'd like to share with you.
> It's not a perfect one, so please forgive any mistakes.
> 
> The project is related to the existing GSoC project "Incremental Data
> dumps", but is in no way a replacement for it.
> 
> 
> *Offline Wikipedia*
> 
> For a long time, a lot of offline solutions for Wikipedia have sprung up on
> the internet. All of these have been unofficial solutions, and  have
> limitations. A major problem is the *increasing size of the data dumps*,
> and the problem of *updating the local content*.
> 
> Consider the situation in a place where internet is costly or
> unavailable. (For the purpose of discussion, let's consider a school in a
> 3rd world country.) Internet speeds are extremely slow, and accessing
> Wikipedia directly from the web is out of the question.
> Such a school would greatly benefit from an instance of Wikipedia on  a
> local server. Now up to here, the school can use any of the freely
> available offline Wikipedia solutions to make a local instance. The problem
> arises when the database in the local instance becomes obsolete. The client
> is then required to download an entire new dump(approx. 10 GB in size) and
> load it into the database.
> Another problem that arises is that most 3rd party programs *do not allow
> network access*, and a new instance of the database is required (approx. 40
> GB) on each installation. For instance, in a school with around 50 desktops,
> each desktop would require a 40 GB  database. Plus, *updating* them becomes
> even more difficult.

Well, some programs allow network access, and even if not, the school
should download once and distribute from there to the desktops, not
download once per installation. But I agree having a copy on each
computer could be problematic.


> So here's my *idea*:
> Modify the existing MediaWiki software to add a few PHP/Python scripts
> which will automatically update the database and run in the
> background. (Details on how the update is done are described later.)
> Initially, the MediaWiki(modified) will take an XML dump/ SQL dump (SQL
> dump preferred) as input and will create the local instance of Wikipedia.
> Later on, the updates will be added to the database automatically by the
> script.

Actually, you only need to add some scripts, not modify MediaWiki :)

> The installation process is extremely easy, it just requires a server
> package like XAMPP and the MediaWiki bundle.



> Process of updating:
> 
> There will be two methods of updating the server. Both will be implemented
> into the MediaWiki bundle. Method 2 requires the functionality of
> incremental data dumps, so it can be completed only after the functionality
> is available. Perhaps I can collaborate with the student selected for
> incremental data dumps.
> 
> Method 1: (online update) A list of all pages are made and published by
> Wikipedia. This can be in an XML format. The only information  in the XML
> file will be the page IDs and the last-touched date. This file will be
> downloaded by the MediaWiki bundle, and the page IDs will be compared with
> the pages of the existing local database.

This is available in page.sql.gz


> case 1: A new page ID in XML file: denotes a new page added.
> case 2: A page which is present in the local database is not among the page
> IDs- denotes a deleted page.
> case 3: A page in the local database has a different 'last touched'
>  compared to the one in the XML file - denotes an edited page.
(here you would compare the revision id)
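The three cases above can be sketched in Python over simple
{page_id: last_touched} maps (hypothetical helper names; a real
implementation would read these from the published list and the local
database, ideally comparing revision IDs as noted):

```python
# Classify pages by comparing a local {page_id: last_touched} map against
# the map published by Wikipedia, per the three cases above.
def classify(local, remote):
    new_pages     = [pid for pid in remote if pid not in local]      # case 1
    deleted_pages = [pid for pid in local if pid not in remote]      # case 2
    edited_pages  = [pid for pid in remote                           # case 3
                     if pid in local and local[pid] != remote[pid]]
    return new_pages, deleted_pages, edited_pages

local  = {1: "20130401", 2: "20130401", 3: "20130401"}
remote = {1: "20130401", 3: "20130415", 4: "20130410"}
new_pages, deleted_pages, edited_pages = classify(local, remote)
```

Only the pages flagged as new or edited would then be fetched via the
MediaWiki API, keeping the load on the servers small.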


> In each case, the change is made in the local database and if the new page
> data is required, the data is obtained using MediaWiki API.
> These offline instances of Wikipedia will be only used in cases where the
> internet speeds are very low, so they *won't cause much load on the servers*
> .
> 
> method 2: (offline update): (Requires the functionality of the existing
> project "Incremental data dumps"):
>In this case, the incremental data dumps are downloaded by the
> user(admin) and fed to the MediaWiki installation the same way the original
> dump is fed(as a normal file), and the corresponding changes are made by
> the bundle. Since I'm not aware of the XML format used in incremental
> updates, I cannot describe it now.
> 
> Advantages : An offline solution can be provided for regions where internet
> access is a scarce resource. this would greatly benefit developing nations
> , and would help in making the world's information more free and openly
> available to everyone.
> 
> All comments are welcome !

Some work on improving the import scripts would be welcome, although I
wonder if what you propose would be big enough for GSoC.


Re: [Wikitech-l] GSoC Project

2013-04-26 Thread Sumana Harihareswara
On 04/26/2013 04:27 PM, Kiran Mathew Koshy wrote:
> Hi guys,
> 
> I have an idea of my own for a GSoC project that I'd like to share with you.
> It's not a perfect one, so please forgive any mistakes.
> 
> The project is related to the existing GSoC project "Incremental Data
> dumps", but is in no way a replacement for it.
> 
> 
> *Offline Wikipedia*
> 
> For a long time, a lot of offline solutions for Wikipedia have sprung up on
> the internet. All of these have been unofficial solutions, and  have
> limitations. A major problem is the *increasing size of the data dumps*,
> and the problem of *updating the local content*.
> 
> Consider the situation in a place where internet is costly or
> unavailable. (For the purpose of discussion, let's consider a school in a
> 3rd world country.) Internet speeds are extremely slow, and accessing
> Wikipedia directly from the web is out of the question.
> Such a school would greatly benefit from an instance of Wikipedia on  a
> local server. Now up to here, the school can use any of the freely
> available offline Wikipedia solutions to make a local instance. The problem
> arises when the database in the local instance becomes obsolete. The client
> is then required to download an entire new dump(approx. 10 GB in size) and
> load it into the database.
> Another problem that arises is that most 3rd party programs *do not allow
> network access*, and a new instance of the database is required (approx. 40
> GB) on each installation. For instance, in a school with around 50 desktops,
> each desktop would require a 40 GB  database. Plus, *updating* them becomes
> even more difficult.
> 
> So here's my *idea*:
> Modify the existing MediaWiki software to add a few PHP/Python scripts
> which will automatically update the database and run in the
> background. (Details on how the update is done are described later.)
> Initially, the MediaWiki(modified) will take an XML dump/ SQL dump (SQL
> dump preferred) as input and will create the local instance of Wikipedia.
> Later on, the updates will be added to the database automatically by the
> script.
> 
> The installation process is extremely easy, it just requires a server
> package like XAMPP and the MediaWiki bundle.
> 
> 
> Process of updating:
> 
> There will be two methods of updating the server. Both will be implemented
> into the MediaWiki bundle. Method 2 requires the functionality of
> incremental data dumps, so it can be completed only after the functionality
> is available. Perhaps I can collaborate with the student selected for
> incremental data dumps.
> 
> Method 1: (online update) A list of all pages are made and published by
> Wikipedia. This can be in an XML format. The only information  in the XML
> file will be the page IDs and the last-touched date. This file will be
> downloaded by the MediaWiki bundle, and the page IDs will be compared with
> the pages of the existing local database.
> 
> case 1: A new page ID in XML file: denotes a new page added.
> case 2: A page which is present in the local database is not among the page
> IDs- denotes a deleted page.
> case 3: A page in the local database has a different 'last touched'
>  compared to the one in the XML file - denotes an edited page.
> 
> In each case, the change is made in the local database and if the new page
> data is required, the data is obtained using MediaWiki API.
> These offline instances of Wikipedia will be only used in cases where the
> internet speeds are very low, so they *won't cause much load on the servers*
> .
> 
> method 2: (offline update): (Requires the functionality of the existing
> project "Incremental data dumps"):
>In this case, the incremental data dumps are downloaded by the
> user(admin) and fed to the MediaWiki installation the same way the original
> dump is fed(as a normal file), and the corresponding changes are made by
> the bundle. Since I'm not aware of the XML format used in incremental
> updates, I cannot describe it now.
> 
> Advantages: An offline solution can be provided for regions where internet
> access is a scarce resource. This would greatly benefit developing nations,
> and would help in making the world's information more free and openly
> available to everyone.
> 
> All comments are welcome !
> 
> PS: about me: I'm a 2nd year undergraduate student in Indian Institute of
> Technology, Patna. I code for fun.
> Languages: C/C++,Python,PHP,etc.
> hobbies: CUDA programming, robotics, etc.

Thanks for your ideas, Kiran!  So, a few comments:

* In the future, please use a more descriptive email subject line.  As
you can see in
http://lists.wikimedia.org/pipermail/wikitech-l/2013-April/ there's a
lot of mail on this list, especially mail about Google Summer of Code
proposals.  A subject line like "GSoC proposal: supplementing
incremental data dumps with indexes" or something like that helps people
decide to read it.
* You probably want to try to look at some statistics and research to
make sure you are s

Re: [Wikitech-l] GSoC Project

2013-04-26 Thread gnosygnu
I think this is a well-thought out idea. I'm just going to add a few
comments on Method 1:

* Wikimedia provides page.sql.gz dumps (EX:
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz)
This table does have page_id and page_touched (the latter seems to
correlate to your "last touched")
The file is hefty at 935 MB. (This is because it has other columns, like
page_title). However, I think with 11 million+ pages, you're not probably
going to do much better than 100 MB (using 28 characters per entry, like
"(1234567,'20130407202126')," and a 30% zip ratio)

* Synchronizing latest versions will still be time-consuming.
I'd guesstimate that there are something like 50k changed articles per
month. I'm basing this number on
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm which lists 800 new
articles per day. I then threw in another 800 unique page edits per day and
multiplying by 30 to get to a ballpark 50k. This correlates to a monthly
churn of 1%-2% of the entire article namespace (4.1 million) which I think
is a conservative percentage.
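The back-of-the-envelope numbers above can be checked with a short script. All of the inputs (28 bytes per entry, a 30% compression ratio, 800 new articles plus 800 unique edited pages per day) are the rough assumptions from this thread, not measured values:

```python
# Rough size estimate for a page-ID/last-touched manifest of English Wikipedia.
# All inputs are ballpark assumptions from the discussion above.

PAGES = 11_000_000          # total pages
BYTES_PER_ENTRY = 28        # e.g. "(1234567,'20130407202126'),"
ZIP_RATIO = 0.30            # assumed compression ratio

raw_mb = PAGES * BYTES_PER_ENTRY / 1_000_000
compressed_mb = raw_mb * ZIP_RATIO
print(f"manifest: ~{raw_mb:.0f} MB raw, ~{compressed_mb:.0f} MB compressed")
# prints: manifest: ~308 MB raw, ~92 MB compressed

# Monthly churn estimate: new articles plus unique edited pages.
NEW_PER_DAY = 800
EDITED_PER_DAY = 800
ARTICLES = 4_100_000        # article namespace size

changed_per_month = (NEW_PER_DAY + EDITED_PER_DAY) * 30
churn_pct = 100 * changed_per_month / ARTICLES
print(f"~{changed_per_month} changed pages/month (~{churn_pct:.1f}% of articles)")
# prints: ~48000 changed pages/month (~1.2% of articles)
```

This lands in the same ballpark as the estimates above: roughly 100 MB for the compressed manifest and roughly 50k changed pages per month.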

So, assuming this number is somewhat accurate, 50,000 API calls would not
be trivial -- especially for a user with limited internet connectivity.
This is to say nothing of Wikimedia's servers which will need to handle 50k
calls per client at that time of month. In short, I think synchronizing
that many pages would best be served by its own dump.

Also, there may be some months where this percentage is much higher. For
example, when Wikipedia switched its links over to Wikidata, I assume that
at least 50% of the pages were touched. Granted, this is not a common
occurrence, but as more bot activity rises (Wikidata properties for
infoboxes?), then this will complicate the sync accordingly.

Hope this helps and good luck with your project.



On Fri, Apr 26, 2013 at 4:27 PM, Kiran Mathew Koshy <
kiranmathewko...@gmail.com> wrote:

> Hi guys,
>
> I have my own idea for my GSoC project that I'd like to share with you.
> It's not a perfect one, so please forgive any mistakes.
>
> The project is related to the existing GSoC project "*Incremental Data
> dumps
> *" , but is in no way a replacement for it.
>
>
> *Offline Wikipedia*
>
> For a long time, a lot of offline solutions for Wikipedia have sprung up on
> the internet. All of these have been unofficial solutions, and  have
> limitations. A major problem is the* increasing size of  the data dumps*,
> and the problem of *updating the local content. *
>
> Consider the situation in a place where internet is costly/
> unavailable.(For the purpose of discussion, lets consider a school in a 3rd
> world country.) Internet speeds are extremely slow, and accessing Wikipedia
> directly from the web is out of the question.
> Such a school would greatly benefit from an instance of Wikipedia on  a
> local server. Now up to here, the school can use any of the freely
> available offline Wikipedia solutions to make a local instance. The problem
> arises when the database in the local instance becomes obsolete. The client
> is then required to download an entire new dump(approx. 10 GB in size) and
> load it into the database.
> Another problem that arises is that most 3rd-party programs *do not allow
> network access*, and a new instance of the database is required (approx. 40
> GB) on each installation. For instance, in a school with around 50 desktops,
> even more difficult.
>
> So here's my *idea*:
> Modify the existing MediaWiki software and to add a few PHP/Python scripts
> which will automatically update the database and will run in the
> background.(Details on how the update is done is described later).
> Initially, the MediaWiki(modified) will take an XML dump/ SQL dump (SQL
> dump preferred) as input and will create the local instance of Wikipedia.
> Later on, the updates will be added to the database automatically by the
> script.
>
> The installation process is extremely easy, it just requires a server
> package like XAMPP and the MediaWiki bundle.
>
>
> Process of updating:
>
> There will be two methods of updating the server. Both will be implemented
> into the MediaWiki bundle. Method 2 requires the functionality of
> incremental data dumps, so it can be completed only after the functionality
> is available. Perhaps I can collaborate with the student selected for
> incremental data dumps.
>
> Method 1: (online update) A list of all pages are made and published by
> Wikipedia. This can be in an XML format. The only information  in the XML
> file will be the page IDs and the last-touched date. This file will be
> downloaded by the MediaWiki bundle, and the page IDs will be compared with
> the pages of the existing local database.
>
> case 1: A new page ID in XML file: denotes a new page added.
> case 2: A page which is present in the local database is not among the page
> IDs- denotes a deleted page.
> case 3: A page in the local database has 

[Wikitech-l] GSoC Project

2013-04-26 Thread Kiran Mathew Koshy
Hi guys,

I have my own idea for my GSoC project that I'd like to share with you.
It's not a perfect one, so please forgive any mistakes.

The project is related to the existing GSoC project "*Incremental Data
dumps*", but is in no way a replacement for it.


*Offline Wikipedia*

For a long time, a lot of offline solutions for Wikipedia have sprung up on
the internet. All of these have been unofficial solutions, and have
limitations. A major problem is the *increasing size of the data dumps*,
and the problem of *updating the local content*.

Consider the situation in a place where internet is costly or
unavailable. (For the purpose of discussion, let's consider a school in a 3rd
world country.) Internet speeds are extremely slow, and accessing Wikipedia
directly from the web is out of the question.
Such a school would greatly benefit from an instance of Wikipedia on  a
local server. Now up to here, the school can use any of the freely
available offline Wikipedia solutions to make a local instance. The problem
arises when the database in the local instance becomes obsolete. The client
is then required to download an entire new dump (approx. 10 GB in size) and
load it into the database.
Another problem that arises is that most 3rd-party programs *do not allow
network access*, and a new instance of the database is required (approx. 40
GB) on each installation. For instance, in a school with around 50 desktops,
each desktop would require a 40 GB database. Plus, *updating* them becomes
even more difficult.

So here's my *idea*:
Modify the existing MediaWiki software to add a few PHP/Python scripts
which will automatically update the database and run in the
background. (Details on how the update is done are described later.)
Initially, the modified MediaWiki will take an XML dump or SQL dump (SQL
dump preferred) as input and will create the local instance of Wikipedia.
Later on, the updates will be added to the database automatically by the
script.

The installation process is extremely easy: it just requires a server
package like XAMPP and the MediaWiki bundle.


Process of updating:

There will be two methods of updating the server. Both will be implemented
into the MediaWiki bundle. Method 2 requires the functionality of
incremental data dumps, so it can be completed only after the functionality
is available. Perhaps I can collaborate with the student selected for
incremental data dumps.

Method 1: (online update) A list of all pages is made and published by
Wikipedia. This can be in an XML format. The only information in the XML
file will be the page IDs and the last-touched date. This file will be
downloaded by the MediaWiki bundle, and the page IDs will be compared with
the pages of the existing local database.

case 1: A new page ID in the XML file denotes a new page added.
case 2: A page which is present in the local database but not among the page
IDs denotes a deleted page.
case 3: A page in the local database with a different 'last touched' value
compared to the one in the XML file denotes an edited page.

In each case, the change is made in the local database, and if the new page
data is required, it is obtained using the MediaWiki API.
These offline instances of Wikipedia will only be used in cases where the
internet speeds are very low, so they *won't cause much load on the
servers*.
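As a sketch, the Method 1 comparison could look like this in Python. The manifest format and the function name are illustrative assumptions — Wikipedia publishes no such page-ID/last-touched manifest today:

```python
# Illustrative sketch of the Method 1 diff: compare a downloaded manifest of
# (page_id, last_touched) pairs against the local mirror's page table, and
# classify each page as new, deleted, or edited. Names and formats are
# hypothetical assumptions, not an existing dump format.

def diff_pages(remote: dict[int, str], local: dict[int, str]):
    """remote/local map page_id -> last-touched timestamp, e.g. '20130407202126'."""
    new_ids     = remote.keys() - local.keys()             # case 1: added pages
    deleted_ids = local.keys() - remote.keys()             # case 2: deleted pages
    edited_ids  = {pid for pid in remote.keys() & local.keys()
                   if remote[pid] != local[pid]}           # case 3: edited pages
    return new_ids, deleted_ids, edited_ids

# Example: page 1 unchanged, page 2 edited, page 3 deleted, page 4 newly added.
remote = {1: '20130401000000', 2: '20130405120000', 4: '20130407202126'}
local  = {1: '20130401000000', 2: '20130101000000', 3: '20121231000000'}
new, deleted, edited = diff_pages(remote, local)
print(new, deleted, edited)   # {4} {3} {2}
```

The content of new and edited pages could then be fetched with the standard MediaWiki API (action=query with prop=revisions), as the proposal suggests.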

Method 2: (offline update) (Requires the functionality of the existing
project "Incremental data dumps"):
   In this case, the incremental data dumps are downloaded by the
user (admin) and fed to the MediaWiki installation the same way the original
dump is fed (as a normal file), and the corresponding changes are made by
the bundle. Since I'm not aware of the XML format used in incremental
updates, I cannot describe it now.

Advantages: An offline solution can be provided for regions where internet
access is a scarce resource. This would greatly benefit developing nations,
and would help in making the world's information more free and openly
available to everyone.

All comments are welcome!

PS: about me: I'm a 2nd-year undergraduate student at the Indian Institute
of Technology, Patna. I code for fun.
Languages: C/C++, Python, PHP, etc.
Hobbies: CUDA programming, robotics, etc.

-- 
Kiran Mathew Koshy
Electrical Engineering,
IIT Patna,
Patna
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-28 Thread Freako F. Freakolowsky
OK, let me try to clarify this issue a bit. I know Akshay understands 
the concept, as we talked about this for about a week before he even 
started this project, but I guess he is not using the right terms for the 
message to come across.


Basically what you're asking is "Why use native MW functionality instead 
of custom DB objects to store data". You might not think you are asking 
this, but if you had tried to understand Akshay's concept, gone through 
the code, and instead of looking for spacing issues taken a moment to 
look at the logic, you'd see this is what you're asking.


And here's my answer:

1. Forward compatibility (upgradability). If your code uses a very narrow 
link to the core, you need to maintain that link throughout the native code 
upgrades. By using native concepts of storing and accessing data in the 
first place, your code gets upgraded for you when the native code 
upgrades. This extension is using a lot of such basic concepts that 
can't change radically in a short time without breaking a lot of other 
extensions out there. If we used the DB, we'd first have to make sure our DB 
is abstracted, and then maintain compatibility as the DB schema and its 
abstraction evolve. We'd also have to make sure our code is able to 
install and upgrade using MW installer principles, which would mean yet 
another additional task.


2. Core MW functionality. This extension is in fact a way of linking 
individual pieces of data (in our case articles about a conference, 
about events, about locations, about people) into one structure, while 
still keeping them what they are: articles. We are applying 
properties to articles that define their purpose and position in this 
structure, and for that we're using the page_props table, as these are just 
that ... page properties. As page_props are, in a sense, volatile data 
(they get rebuilt every time a page gets parsed), the simplest way to 
keep these properties consistent across page rebuilds is to make 
sure that they get recreated by the core itself along with other 
properties, and this is done by using a parser tag. There is still some 
work to be done to enable editors to do some inline editing of these 
tags, by masking them and by using alternative editor hook principles. 
But to link back to the question at hand, my counter-question would be 
"why reinvent core MW functionality by using custom DB objects for 
storing page properties?"


3. Cost/effect. The parser tag, besides containing parameters needed 
to establish the right page_props state for a given page, also has a few 
additional data chunks that could possibly be put into custom DB 
objects. However, as those chunks of data are small, get used rarely, and 
even then mostly only when editing through the administrative dashboard, 
it is prudent to ask yourself whether you can justify creating that much 
additional code for so little effect. I failed to see the effect being 
enough for the cost, and that's why I suggested Akshay not use custom DB 
objects just to store that.



This extension is still a beta and as such needs some lovin' to get it 
from this hackish state to a stable release, but by my account the core 
functionality of it is sane.


And as for standards ... I've mentored Akshay the same way I mentored 
some people locally and the same way I have mentored all of the interns 
I've had. I've told them to forget about standards when doing test cases 
and alpha work, remember they exist when doing beta, and try to apply 
most of them when doing releases, as this has proven to be a good strategy 
for getting fresh solutions and not just crappy copy-pasting of stale 
snippets to get the work done. And also, just like the code, the standards 
too need to be updated once in a while (and that can't happen if you 
consider them a law at all times). You might "frown upon" certain things 
at a given moment, but in the next moment that might just become a 
logical solution for your problem ... so do take into account its 
"flaws", be skeptical about its validity ... just don't dismiss it too 
soon.




And John, as I've told you (or at least tried to tell you) on IRC: 
before understanding why someone does things in a certain way, it's 
impossible to say if it's good or bad (or even disgusting, in your case), 
so please, try to understand before drawing conclusions.


LP, Jure


On 08/26/2012 08:04 PM, John Du Hart wrote:

On Wed, Aug 22, 2012 at 8:40 AM, akshay chugh  wrote:

6. Parser tags, Magic Words (Variables) and a parser function
  parser tags --> , , ,
, and


This is a disgusting way to store data.






Re: [Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-27 Thread Mark Holmquist

On 12-08-27 04:10 PM, John Du Hart wrote:

Thanks for the in-depth explanation of why storing configuration in articles
is a good thing. Keep up the good work.


See this is also unnecessary.

Your original message might have been better stated as


Hey, I love this idea, but is there a reason you decided to use articles
instead of a database structure to store the data? Thanks in advance for
the no doubt interesting answer.


Instead, you antagonized Akshay and didn't get an answer. And now here 
we are.


Wasn't there a thread about conduct? Where did that end up? :)

Akshay, incidentally, I would also love to hear more about why you 
decided this, if you have a minute to answer!


Thanks all,

--
Mark Holmquist
Contractor, Wikimedia Foundation
mtrac...@member.fsf.org
http://marktraceur.info



Re: [Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-27 Thread John Du Hart
Thanks for the in-depth explanation of why storing configuration in articles
is a good thing. Keep up the good work.
On Aug 26, 2012 2:11 PM, "akshay chugh"  wrote:

> -1
>
> On Sun, Aug 26, 2012 at 11:34 PM, John Du Hart 
> wrote:
>
> > On Wed, Aug 22, 2012 at 8:40 AM, akshay chugh 
> > wrote:
> > > 6. Parser tags, Magic Words (Variables) and a parser function
> > >  parser tags --> , , ,
> > > , and
> > > 
> >
> > This is a disgusting way to store data.
> >
> > --
> > John
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
>
>
> --
> Thanks,
> Akshay Chugh
> skype- chughakshay16
> irc - chughakshay16(#mediawiki)
> [[User:Chughakshay16]] on mediawiki.org
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>


Re: [Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-26 Thread akshay chugh
Hello Peachey,

Honestly, I never got around to reading any Coding Conventions or style
guide while I was developing code for my extension, but I have been going
through them for the past couple of weeks and have been fixing those style
issues in my code since. It's only a matter of time before all such issues
are addressed; I am continually working on fixing them. You can look at my
latest patches, where you will get a sense of all those issues being taken
into account.
I have kept a todo list of all such comments which were posted on my
changesets, so before my code is finally ready to be tested, or even
reaches a point of being presentable, it will address all the points that
you are trying to make here.


On Mon, Aug 27, 2012 at 2:54 AM, K. Peachey  wrote:

> During your time in GSoC, what type of things did your mentor explain?
>
>
> Because I've had a quick peruse of your Gerrit change sets, and I know
> they are only minor, but I do see a few things that our Coding
> Conventions cover, as well as stylize (which is a script that can be
> run)
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Thanks,
Akshay Chugh
skype- chughakshay16
irc - chughakshay16(#mediawiki)
[[User:Chughakshay16]] on mediawiki.org


Re: [Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-26 Thread K. Peachey
During your time in GSoC, what type of things did your mentor explain?


Because I've had a quick peruse of your Gerrit change sets, and I know
they are only minor, but I do see a few things that our Coding
Conventions cover, as well as stylize (which is a script that can be
run)



Re: [Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-26 Thread Brandon Harris

On Aug 26, 2012, at 11:04 AM, John Du Hart  wrote:
> 
> This is a disgusting way to store data.
> 

I don't think we need to talk to each other like this.


---
Brandon Harris, Senior Designer, Wikimedia Foundation

Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate




Re: [Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-26 Thread akshay chugh
-1

On Sun, Aug 26, 2012 at 11:34 PM, John Du Hart  wrote:

> On Wed, Aug 22, 2012 at 8:40 AM, akshay chugh 
> wrote:
> > 6. Parser tags, Magic Words (Variables) and a parser function
> >  parser tags --> , , ,
> > , and
> > 
>
> This is a disgusting way to store data.
>
> --
> John
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Thanks,
Akshay Chugh
skype- chughakshay16
irc - chughakshay16(#mediawiki)
[[User:Chughakshay16]] on mediawiki.org


Re: [Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-26 Thread John Du Hart
On Wed, Aug 22, 2012 at 8:40 AM, akshay chugh  wrote:
> 6. Parser tags, Magic Words (Variables) and a parser function
>  parser tags --> , , ,
> , and
> 

This is a disgusting way to store data.

-- 
John



[Wikitech-l] GSoC Project Update (ConventionExtension)

2012-08-22 Thread akshay chugh
Hello everyone,

This is one of my major updates regarding my GSoC project (named
ConventionExtension), which I have been working on for about three months
now. This project has come a long way and it has reached a point where a
lot about it can be shared with others. Since I don't post that often in
this list I would like to make this post a long one, and talk about the
status of my extension and where its headed in the coming weeks. Some of
the features which were part of my timeline for GSoC but were not completed
are put under the section "Things yet to be done" along with the other
features that I would be working on in the upcoming weeks.


*1. Completed Features*
1. Dashboard Page (more features are likely to be added depending upon the
feedback I gather from the people who have set up conferences on their wiki
in the past)
2. Author Registration Page
3. Conference Setup Page
4. Backend (DB) for storing the conference details
5. The basic architecture of the extension:

  5.a) Model classes - encapsulating the basic objects required for this
extension
  5.b) Api Module --  for interacting with ajax calls from the client
  5.c) Util classes
  5.d) Templates -- classes extending QuickTemplate class, providing a basic
layout for Dashboard and Author Register pages
  5.e) UI classes - classes extending SpecialPage class (Dashboard,
AuthorRegister and ConferenceSetup pages)
  5.f) JS + CSS resource modules

6. Parser tags, Magic Words (Variables) and a parser function
 parser tags --> , , ,
, and


  variables --> {{CONFERENCENAME}}, {{CONFERENCEVENUE}},
{{CONFERENCECITY}}, {{CONFERENCEPLACE}}, {{CONFERENCECOUNTRY}},
{{CONFERENCECAPACITY}}, {{CONFERENCEDESCRIPTION}}

  parser function --> {{#cvext-page-link}}
7. Sidebar modification (added some new portals for the conference)
8. Schedule Template System - which automates the process of creating a
schedule for the conference, as new locations and events are added to the
system.
9. Content Pages - these are the default set of pages that are created for
the conference by the extension (Note : these are just like any other wiki
pages whose content can be modified using the wiki interface)

*2. Things yet to be done !*
1. *DB rollback implementation in most of my model classes
2. *Account Setup Page (for registration of users)
3. *Modification of User pages for displaying content related to the
conference
4. Organizer management module (most of it is already implemented in
the basic architecture; just some additions are needed regarding the
permissions and rights for this group)
5. Payment Gateway
6. Support for languages other than English
7. Some more parser functions and variables which would help in editing the
content pages of the conference

* - These features were not completed during the GSoC period.

I really enjoyed my experience of working with such a vibrant community
over this summer. I am especially thankful to all the people who helped me
out in the IRC channel, whether it was with setting up labs, helping me out
with the localisation issues, or even suggesting I come up with a better
feature than what I had already implemented. The community members who
reviewed my big chunks of code pointed out many issues which I had easily
missed, with proper explanations of what needed to be done, and helped me a
great deal in improving it. And finally I would like to thank Sumana and
Greg for managing this program so well, and my mentor Jure Kajzer for his
unmatched support and guidance throughout the summer.


Some important links:
Proposal Page -
http://www.mediawiki.org/wiki/User:Chughakshay16/GSOCProposal(2012)
Gerrit changesets -
https://gerrit.wikimedia.org/r/#/q/ConventionExtension,n,z
Extension Page - http://www.mediawiki.org/wiki/Extension:ConventionExtension

Suggestions are always welcome!
-- 
Thanks,
Akshay Chugh
skype- chughakshay16
irc - chughakshay16(#mediawiki)
[[User:Chughakshay16]] on mediawiki.org


Re: [Wikitech-l] GSOC Project 2012 - Convention Extension

2012-06-22 Thread akshay chugh
Details regarding the progress of this project can be found here -
http://www.mediawiki.org/wiki/User:Chughakshay16/GSOCProposal(2012)#Official_Schedule


Thanks,
Akshay Chugh
(irc- chughakshay16
Skype - chughakshay16
User: Chughakshay16)


On Fri, Jun 22, 2012 at 1:17 PM, akshay chugh wrote:

> Hello everyone,
>
> This is my first update on this list since the GSoC term started,
> regarding the project that I have been working on and will continue to
> work on for the rest of the summer. The title of my project is Convention
> Extension, i.e. an extension to help convert a wiki into a
> conference management system. To begin with, I have made some changes to
> the road map that I had suggested for this extension before. Below are
> the components that I have completed in the past month:
>
> 1. Model ( core ) classes for handling create, edit and delete operations
> 2. Api module (edit+create+delete) for handling the ajax requests.
> 3. Admin interfaces (dashboard and setup special pages)
>
> Tasks that I would be performing in the next two weeks :
> 1. Writing js scripts for the admin interfaces
> 2. Adding ajax calls in js scripts
> 3. Testing admin interface UI components
> 4. Documentation for the code produced
>
> I haven't pushed any of my code to Gerrit yet; my live repository lives
> here - [1].
> Any feedback related to the features I have developed is most welcome.
> More in-depth details about this extension can be found here - [2].
>
> Links:
> [1] - https://github.com/chughakshay16/ConventionExtension/tree/model
> [2] - http://www.mediawiki.org/wiki/User:Chughakshay16/ConventionExtension
> Thanks,
> Akshay Chugh
> (irc - chughakshay16
> Skype - chughakshay16
> User: Chughakshay16)
>
>
>


[Wikitech-l] GSOC Project 2012 - Convention Extension

2012-06-22 Thread akshay chugh
Hello everyone,

This is my first update on this list since the GSoC term started,
regarding the project that I have been working on and will continue to
work on for the rest of the summer. The title of my project is Convention
Extension, i.e. an extension to help convert a wiki into a
conference management system. To begin with, I have made some changes to
the road map that I had suggested for this extension before. Below are
the components that I have completed in the past month:

1. Model ( core ) classes for handling create, edit and delete operations
2. API module (edit+create+delete) for handling the AJAX requests.
3. Admin interfaces (dashboard and setup special pages)

Tasks that I would be performing in the next two weeks :
1. Writing JS scripts for the admin interfaces
2. Adding AJAX calls to the JS scripts
3. Testing admin interface UI components
4. Documentation for the code produced

I haven't pushed any of my code to Gerrit yet; my live repository lives
here - [1].
Any feedback on the features I have developed is most welcome. More
in-depth details about this extension can be found here - [2].

Links:
[1] - https://github.com/chughakshay16/ConventionExtension/tree/model
[2] - http://www.mediawiki.org/wiki/User:Chughakshay16/ConventionExtension
Thanks,
Akshay Chugh
(irc - chughakshay16
Skype - chughakshay16
User: Chughakshay16)


Re: [Wikitech-l] [GSoC] project proposal

2012-04-04 Thread Trinh Hoang Nguyen
Thank you very much Bryan. I have got an idea from the tool. 

-Original Message-
From: wikitech-l-boun...@lists.wikimedia.org
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Bryan Tong Minh
Sent: 04 April, 2012 9:39 AM
To: Wikimedia developers
Subject: Re: [Wikitech-l] [GSoC] project proposal

On Tue, Apr 3, 2012 at 9:51 PM, Trinh Hoang Nguyen 
wrote:
> Hi there,
>
>
>
> Could you please have a look at my project proposal for Google Summer of
> Code
>
> https://www.mediawiki.org/wiki/User:Trinhtomsk/GSoC_2012_application
>
>

It would be awesome to have a proper Flickr upload system. For some
inspiration, take a look at [1], which has been used quite extensively on
Commons up to now. I hope that I will soon be able to shut that service
down for good.


Bryan


[1] https://toolserver.org/~bryan/flickr/upload



Re: [Wikitech-l] [GSoC] project proposal

2012-04-04 Thread Bryan Tong Minh
On Tue, Apr 3, 2012 at 9:51 PM, Trinh Hoang Nguyen  wrote:
> Hi there,
>
>
>
> Could you please have a look at my project proposal for Google Summer of
> Code
>
> https://www.mediawiki.org/wiki/User:Trinhtomsk/GSoC_2012_application
>
>

It would be awesome to have a proper Flickr upload system. For some
inspiration, take a look at [1], which has been used quite extensively on
Commons up to now. I hope that I will soon be able to shut that service
down for good.


Bryan


[1] https://toolserver.org/~bryan/flickr/upload



Re: [Wikitech-l] [GSoC] project proposal

2012-04-03 Thread Trinh Hoang Nguyen
Hi Gregory,

Thank you for your comment. There are Flickr APIs available for Java,
JavaScript, and other languages. I have used them to get images from Flickr.
I have never been in touch with folks on toolserver or at Commons.

--
Trinh

-Original Message-
From: wikitech-l-boun...@lists.wikimedia.org
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Gregory Varnum
Sent: 04 April, 2012 5:35 AM
To: Wikimedia developers
Subject: Re: [Wikitech-l] [GSoC] project proposal

Trinh,

Thank you for your GSOC proposal!  Between now and Google's deadline of
April 6th (when you'll need to submit via their site as well) - you're
welcome to make more modifications to your proposal.

I actually think you did a great job of concisely explaining what you'd like
to do.  If you have any more info on the how - it would be great.  Does
Flickr have APIs and other tools for this (I assume so) or will these need
to be developed from scratch?

Also, have you been in touch with folks operating similar tools on
toolserver or at Commons?
http://commons.wikimedia.org/wiki/Commons:Flickr_files#Tools

-greg aka varnent


On Apr 3, 2012, at 3:51 PM, Trinh Hoang Nguyen  wrote:

> Hi there,
> 
> 
> 
> Could you please have a look at my project proposal for Google Summer of
> Code
> 
> https://www.mediawiki.org/wiki/User:Trinhtomsk/GSoC_2012_application
> 
> 
> 
> Any comments would be appreciated!
> 
> 
> 
> Thank you.
> 
> 
> 
> --
> 
> Best regards,
> 
> Trinh Hoang Nguyen
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: [Wikitech-l] [GSoC] project proposal

2012-04-03 Thread Gregory Varnum
Trinh,

Thank you for your GSOC proposal!  Between now and Google's deadline of April 
6th (when you'll need to submit via their site as well) - you're welcome to 
make more modifications to your proposal.

I actually think you did a great job of concisely explaining what you'd like to 
do.  If you have any more info on the how - it would be great.  Does Flickr 
have APIs and other tools for this (I assume so) or will these need to be 
developed from scratch?

Also, have you been in touch with folks operating similar tools on toolserver 
or at Commons?
http://commons.wikimedia.org/wiki/Commons:Flickr_files#Tools

-greg aka varnent


On Apr 3, 2012, at 3:51 PM, Trinh Hoang Nguyen  wrote:

> Hi there,
> 
> 
> 
> Could you please have a look at my project proposal for Google Summer of
> Code
> 
> https://www.mediawiki.org/wiki/User:Trinhtomsk/GSoC_2012_application
> 
> 
> 
> Any comments would be appreciated!
> 
> 
> 
> Thank you.
> 
> 
> 
> --
> 
> Best regards,
> 
> Trinh Hoang Nguyen
> 
> 
> 
> 
> 
> 
> 
> 
> 


[Wikitech-l] [GSoC] project proposal

2012-04-03 Thread Trinh Hoang Nguyen
Hi there,

 

Could you please have a look at my project proposal for Google Summer of
Code

https://www.mediawiki.org/wiki/User:Trinhtomsk/GSoC_2012_application

 

Any comments would be appreciated!

 

Thank you.

 

--

Best regards,

Trinh Hoang Nguyen


Re: [Wikitech-l] GSoC project : Implement pre- or post-commit checks in code repositories

2012-03-18 Thread Akash Nawani
Hi,

If somebody wants to mentor this project, please let me know. I'm really
interested in doing it this summer.

On Sun, Mar 18, 2012 at 3:03 AM, Akash Nawani wrote:

> Hi Everyone
>
> I am a 4th year Engineering student from India. I would love to work on
> the "Implementing pre- or post-commit checks in code repositories" project
> if somebody can mentor it.
> It is a great idea to have these kinds of hooks on the repositories; this
> project is surely going to save code reviewers a lot of time.
> Looking forward to finding a mentor. Thanks.
>
> --
> Cheers,
> Akash
>
>


-- 
Cheers,
Akash


[Wikitech-l] GSoC project : Implement pre- or post-commit checks in code repositories

2012-03-17 Thread Akash Nawani
Hi Everyone

I am a 4th year engineering student from India. I would love to work on the
"Implementing pre- or post-commit checks in code repositories" project if
somebody can mentor it.
It is a great idea to have these kinds of hooks on the repositories; this
project is surely going to save code reviewers a lot of time.
Looking forward to finding a mentor. Thanks.

-- 
Cheers,
Akash
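One concrete check such a hook could run is scanning the added lines of a diff for leftover debugging calls before a commit is accepted. A minimal Node sketch (the function name and the patterns are purely illustrative assumptions, not part of any existing MediaWiki tooling):

```javascript
// Sketch: flag added lines in a unified diff that still contain debug calls.
const DEBUG_PATTERNS = [/\bvar_dump\s*\(/, /\bconsole\.log\s*\(/];

function findDebugLines(diff) {
  const hits = [];
  diff.split('\n').forEach((line, i) => {
    // '+++' is the file header; a single '+' marks an added line.
    if (line.startsWith('+') && !line.startsWith('+++') &&
        DEBUG_PATTERNS.some((re) => re.test(line))) {
      hits.push(i + 1); // 1-based line number within the diff text
    }
  });
  return hits;
}
```

A real pre-commit hook would feed this the output of `git diff --cached` and abort the commit when the list is non-empty.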


Re: [Wikitech-l] GSOC project "Improve our Android application"

2012-03-12 Thread Tomasz Finc
On Tue, Mar 6, 2012 at 4:36 AM, Yeshow Lao  wrote:
> Thank you very much, Yuvi and Marcin. Your advice is very helpful. I'm
> excited; I will digest the information and hope to set up the build
> environment on my PC and study the source code ASAP.

Yeshow, did you manage to build our app? If so, come by #wikimedia-mobile
(on freenode) and we can guide you along to some bugs that need tending to.

--tomasz



Re: [Wikitech-l] GSOC project "Improve our Android application"

2012-03-06 Thread Yeshow Lao
Thank you very much, Yuvi and Marcin. Your advice is very helpful. I'm
excited; I will digest the information and hope to set up the build
environment on my PC and study the source code ASAP.

Best Wishes!

2012/3/6 Yuvi Panda 

> Hello!
>
> On Mon, Mar 5, 2012 at 6:50 PM, Yeshow Lao  wrote:
> > Hello, everybody. I'm a GSOC student from China. With some development
> > experience on Android, I would love to work on this project "Improve our
> > Android application -- integrate with SuggestBot to suggest a mobile task
> > to a user."[1.]
>
> Awesome!
>
> > Could somebody tell me some information, please?
> > 1.Where to download the current Android application? I have searched the
> > keyword "Android" on the wiki, but can not be sure which is the right
> page
> > in so many result pages.
> > 2.Where to see the source code for current Android application?
> >
> > 3.Where to see documents about this project?
>
> http://www.mediawiki.org/wiki/Mobile/PhoneGap/Tutorial has information
> on how to set it up and where to find the code. We use PhoneGap, so it
> is primarily a HTML/CSS/JS application. Give it a spin and see how you
> find it :)
>
> > 4.Who developed the Android application? Is the developer a GSOC student?
> > And whether will he still work on the project this GSOC?
>
> Come to #wikimedia-mobile on IRC to find us :)
>
> > Wish for your reply. Thanks a lot!
>
> Thank you! Hoping to see contributions from you soon :)
>
> --
> Yuvi Panda T
> http://yuvi.in/blog
>
>



-- 
Yeshow


Re: [Wikitech-l] GSOC project "Improve our Android application"

2012-03-05 Thread Marcin Cieslak
>> Yeshow Lao  wrote:
> Hello, everybody. I'm a GSOC student from China. With some development
> experience on Android, I would love to work on this project "Improve our
> Android application -- integrate with SuggestBot to suggest a mobile task
> to a user."[1.]
>
> Could somebody tell me some information, please?

Hello,

You will find lots of information about the project from the recent 
announcement:

http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/59033

//Saper




Re: [Wikitech-l] GSOC project "Improve our Android application"

2012-03-05 Thread Yuvi Panda
Hello!

On Mon, Mar 5, 2012 at 6:50 PM, Yeshow Lao  wrote:
> Hello, everybody. I'm a GSOC student from China. With some development
> experience on Android, I would love to work on this project "Improve our
> Android application -- integrate with SuggestBot to suggest a mobile task
> to a user."[1.]

Awesome!

> Could somebody tell me some information, please?
> 1.Where to download the current Android application? I have searched the
> keyword "Android" on the wiki, but can not be sure which is the right page
> in so many result pages.
> 2.Where to see the source code for current Android application?
>
> 3.Where to see documents about this project?

http://www.mediawiki.org/wiki/Mobile/PhoneGap/Tutorial has information
on how to set it up and where to find the code. We use PhoneGap, so it
is primarily an HTML/CSS/JS application. Give it a spin and see how you
find it :)

> 4.Who developed the Android application? Is the developer a GSOC student?
> And whether will he still work on the project this GSOC?

Come to #wikimedia-mobile on IRC to find us :)

> Wish for your reply. Thanks a lot!

Thank you! Hoping to see contributions from you soon :)

-- 
Yuvi Panda T
http://yuvi.in/blog



[Wikitech-l] GSOC project "Improve our Android application"

2012-03-05 Thread Yeshow Lao
Hello, everybody. I'm a GSoC student from China. With some development
experience on Android, I would love to work on this project "Improve our
Android application -- integrate with SuggestBot to suggest a mobile task
to a user."[1.]

Could somebody tell me some information, please?
1. Where can I download the current Android application? I have searched for
the keyword "Android" on the wiki, but cannot be sure which is the right
page among so many results.

2. Where can I see the source code for the current Android application?

3. Where can I see documentation for this project?

4. Who developed the Android application? Is the developer a GSoC student?
And will he still work on the project this GSoC?

Looking forward to your reply. Thanks a lot!


Best Wishes.


-- 
Yeshow


Re: [Wikitech-l] GSoC project - SocialProfile Extension - UserStatus feature

2011-09-03 Thread Sumana Harihareswara
On 09/03/2011 02:55 PM, Женя Власенко wrote:
> Hi, my name is Yevhenii Vlasenko, and I am MediaWiki GSoC student.
> My project was making UserStatus feature in SocialProfile Extension.
> If you are interested, you can read about the results of my work here:
> 
> https://docs.google.com/document/d/1b9kd2Lu992VyTkvCNLvoN6AeJnX1Q5pTQO_XSpAsi9g/edit?hl=en_US
> 
> --
> Thanks,
> Zhenya

For ease of linking, searching, commenting, changetracking, etc.,
Zhenya's mentor Jack Phoenix moved the document to:

http://www.mediawiki.org/wiki/Extension:SocialProfile/UserStatus

In considering how this feature might be improved in the future, Zhenya
writes:

"I think one thing that could be done to improve the usability of this
feature is implementing APIs from different social networks, allowing
UserStatuses to be shared through them with a link to the primary source
(the wiki where the status was posted)."

You can see Zhenya's code at

http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/zhenya

Thanks, Zhenya!

-- 
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation


[Wikitech-l] GSoC project - SocialProfile Extension - UserStatus feature

2011-09-03 Thread Женя Власенко
Hi, my name is Yevhenii Vlasenko, and I am MediaWiki GSoC student.
My project was making UserStatus feature in SocialProfile Extension.
If you are interested, you can read about the results of my work here:

https://docs.google.com/document/d/1b9kd2Lu992VyTkvCNLvoN6AeJnX1Q5pTQO_XSpAsi9g/edit?hl=en_US

--
Thanks,
Zhenya



[Wikitech-l] Gsoc Project Finished "javascript overhaul of Semantic MediaWiki "

2010-08-23 Thread Sanyam goyal
Hi,
My name is Sanyam Goyal. I have recently finished my GSoC project
"JavaScript overhaul of SMW" under the guidance of Yaron Koren. The project
was successfully finished as planned. More information can be found on the
status page.

The basic summary is that a lot of functionality in SMW and its spin-off
extensions (autocomplete, float-box, datepicker, combo-box, etc.) has been
re-implemented using the jQuery library, which is becoming a MediaWiki
standard now. A few new features have also been added, again using jQuery
only: autocomplete in Special:Ask, jqPlot and jqPie in SRF, an auto-growing
textarea in SF, a more dynamic Special:CreateClass, a simple datepicker in
SFI, etc.

I understand if the wiki pages look less informative for now, but I will
try to put up more detailed wiki pages soon.

Moreover, if you have any questions about any functionality, or anything in
general, I will be glad to answer.

Thanks
Hope you like the new features.


-- 
Sanyam Goyal
BTech/Alumni @ IIT Bombay
9343115798


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-04-05 Thread Ilmari Karonen
Conrad Irwin wrote:
> On 03/31/2010 12:31 PM, Victor wrote:
>> Hi, now I see...
>>
>> I've posted a message to the fa.caml newsgroup:
>> http://groups.google.com/group/fa.caml/browse_frm/thread/1593e053759d7679
>>
>> hopefully somebody will volunteer to fix the issues, thus
>> saving the human resources for better tasks.
> 
> While I'll no doubt regret saying this, I am happy to fix some of these
> bugs. With the majority of these, the hard part is deciding "should we fix
> this" and "how should we fix it" rather than the implementation itself.
> 
> What I don't want to do is fix things only to find that they then get
> immediately re-implemented in PHP, which seems to be what people want.

I don't think "reimplement texvc in PHP" is anyone's goal as such.  The 
real goal is "make texvc actively maintained".  Reimplementing it in PHP 
would be one way to achieve that, since we have plenty of PHP 
programmers who could then maintain it.  Finding some person or people 
who know OCaml and are willing to do the work would be another route to 
the same end.


In general, there's a form of decision paralysis common to volunteer 
projects, particularly ones relying on skilled volunteers.  Basically, 
there are two solutions to a problem, X and Y:

Person A says "I can try to do X, but I don't want to spend the time if 
we're just going to do Y instead."

Person B says "I can try to do Y, but I don't want to spend the time if 
we're just going to do X instead."

No-one else wants to commit to either X or Y, since they want to keep 
the other option open in case A or B doesn't succeed with their favored 
approach after all.

End result is that neither X nor Y actually gets done.

There are two ways out of this situation: either the project needs to 
commit to one option and make sure it gets done, or A and B need to 
accept the risk that they might end up doing redundant work. 
Ironically, the meta-decision on whether to commit to one approach or 
try both can also get stuck in a similar dilemma on a higher level.


Going back to the concrete issue here, I'd personally recommend trying 
both _for now_.  In particular, rewriting texvc in Python or PHP, as a 
MediaWiki extension, would seem like a good GSoC project even if it didn't 
actually end up being adopted into MW core in the end.

Meanwhile, fixing at least the simplest and most critical issues in the 
current OCaml implementation would also be of immediate value, even if 
that implementation might possibly end up being replaced at some point 
off in the future.  I wouldn't necessarily recommend going immediately 
for the more tricky issues, or the ones with lower short-term benefit 
per effort, but I'm sure there must be some low-hanging fruit ready to 
be picked by anyone who's simply familiar with the language.

By autumn, we ought to have some idea how much, if any, progress has 
been made with each approach.  At that point, we should be better able 
to decide whether to commit to one approach or the other, and if so, which.

I should also note that, as long as one implementation doesn't 
_completely_ supersede the other in every way, there would probably be 
people interested in using each of them if they were available as 
optional extensions.  In particular, I'm sure there are people who have 
access to Python but haven't managed to set up OCaml -- and I wouldn't 
be completely surprised if the opposite turned out to be also true.

Of course, that's just my opinion as a random occasional contributor. 
Take it with as much salt as you think appropriate.

-- 
Ilmari Karonen



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread David Gerard
On 30 March 2010 15:34, Victor  wrote:
> On Tue, 30 Mar 2010 16:05:02 +0300, David Gerard  wrote:

>> Getting it off Ocaml is an excellent first step. I have tried and
>> failed to get texvc working properly in MediaWiki myself more than a
>> few times, because of Ocaml not wanting to play nice ...

> Actually I completely disagree. Since I've got some experience with both
> OCaml and PHP the idea to convert Maths processing to PHP looks
> like a not so good idea at all.
> Probably the issues you had were more like a wrong/problematic
> configuration or something like that. OCaml itself is actively developed
> and is a mature language and development environment, much better than
> PHP or Python (IMHO).


Oh, I don't doubt that at all. It just didn't work for me :-)


- d.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Roan Kattouw
2010/3/31 Daniel Schwen :
>> ([8, 32, 189].sort()[0]) === (new Boolean(false) ? 189 : 8)
>
> Why such a contrived example?! This all boils down to
>
> new Boolean(false) == false
> returning true
>
> new Boolean(false) === false
> returning false
>
Not quite. There's also the issue of [8, 32, 189].sort() returning
[189, 32, 8]. The real fun is that a "sane" interpretation of both
operands yields 8 === 8, which is true, whereas the expression really
evaluates to true via 189 === 189.
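For anyone following along, the pieces of the expression can be checked step by step in plain Node (a sketch of the thread's example only; nothing here is MediaWiki-specific):

```javascript
// The thread's expression, broken into its two operands.
const sorted = [8, 32, 189].sort();         // default sort compares as strings: "189" < "32" < "8"
const left = sorted[0];                     // 189, not 8

const right = new Boolean(false) ? 189 : 8; // any object, even a wrapped false, is truthy => 189

const result = left === right;              // 189 === 189, so the whole comparison is true
```

So the expression is true, but via 189 === 189 rather than the "sane" 8 === 8.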

But I digress. We should indeed be talking about GSoC in this thread.

Roan Kattouw (Catrope)



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Daniel Schwen
> ([8, 32, 189].sort()[0]) === (new Boolean(false) ? 189 : 8)

Why such a contrived example?! This all boils down to

new Boolean(false) == false
returning true

new Boolean(false) === false
returning false

The ?: operator is just not performing an implicit cast. The fact that
the Boolean object is different from the Boolean primitive type is...
...well, unfortunate. That's why its use as an object has been deprecated
since JavaScript 1.3.
Your example gives the expected outcome if you call Boolean as a function:
([8, 32, 189].sort()[0]) === (Boolean(false) ? 189 : 8)

So, can we drop this dreaded debate now?



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Conrad Irwin

On 03/31/2010 05:32 PM, Daniel Schwen wrote:
>> It could be worse. It could be math in JavaScript:
>> v = (011 + "1" + 0.1)/3;
>> 303.36667
> 
> Somebody ought to slap you for mixing four different types in such a
> horrendous manner to construct something that is supposed to make
> people who do not understand octal numbers and string concatenation
> think JavaScript is insane. *shakes head*

The point should stand even if there's faulty reasoning:

([8, 32, 189].sort()[0]) === (new Boolean(false) ? 189 : 8)

Conrad



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Daniel Schwen
> It could be worse. It could be math in JavaScript:
> v = (011 + "1" + 0.1)/3;
> 303.36667

Somebody ought to slap you for mixing four different types in such a
horrendous manner to construct something that is supposed to make
people who do not understand octal numbers and string concatenation
think JavaScript is insane. *shakes head*



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Ævar Arnfjörð Bjarmason
On Wed, Mar 31, 2010 at 14:24, Tei  wrote:
> Doing Math in any programming language or digital computer is a bad
> idea. Anyway.

The texvc component doesn't "do math". It just sanitizes LaTeX and
passes it off to have a PNG generated from it.
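As a rough illustration of that sanitizing step, a whitelist check might look like the sketch below. The function name and the command list are made up purely for illustration; texvc's real grammar is far richer and lives in OCaml, not JavaScript.

```javascript
// Sketch of a whitelist pass in the spirit of texvc's sanitizing:
// accept input only if every \command it uses is on a known-safe list.
const ALLOWED = new Set(['frac', 'sqrt', 'alpha', 'beta', 'sum', 'int', 'left', 'right']);

function validateTex(src) {
  // Collect every backslash command and reject on the first unknown one.
  const commands = src.match(/\\[A-Za-z]+/g) || [];
  for (const cmd of commands) {
    if (!ALLOWED.has(cmd.slice(1))) {
      return { ok: false, badCommand: cmd };
    }
  }
  return { ok: true, badCommand: null };
}
```

Only input that passes a check like this would be handed to LaTeX for PNG rendering, which is what keeps dangerous commands out of the rendering pipeline.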



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Tei
On 30 March 2010 16:34, Victor  wrote:

>>
>> Getting it off Ocaml is an excellent first step. I have tried and
>> failed to get texvc working properly in MediaWiki myself more than a
>> few times, because of Ocaml not wanting to play nice ...
>
>
> Actually I completely disagree. Since I've got some experience with both
> OCaml and PHP the idea to convert Maths processing to PHP looks
> like a not so good idea at all.
>

Doing math in any programming language or on a digital computer is a bad
idea, anyway.

It could be worse. It could be math in JavaScript:


v = (011 + "1" + 0.1)/3;

303.36667
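Broken into steps, the coercions behind that result look like this (a Node sketch; the legacy 011 octal literal is spelled with the modern 0o prefix here, since 011 is rejected in strict mode but has the same value, 9):

```javascript
// Step-by-step version of the expression above.
const octal = 0o11;        // 9 (same value as the legacy literal 011)
const step1 = octal + "1"; // number + string => string "91"
const step2 = step1 + 0.1; // string + number => string "910.1"
const v = step2 / 3;       // division coerces the string back to a number
// v is approximately 303.3666666666667, which rounds to the thread's 303.36667
```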



-- 
--
ℱin del ℳensaje.


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Conrad Irwin

On 03/31/2010 12:31 PM, Victor wrote:
> 
> Hi, now I see...
> 
> I've posted a message to the fa.caml newsgroup:
> http://groups.google.com/group/fa.caml/browse_frm/thread/1593e053759d7679
> 
> hopefully somebody will volunteer to fix the issues, thus
> saving human resources for better tasks.
> 
> With best regards,
> Victor

While I'll no doubt regret saying this, I am happy to fix some of these
bugs. With the majority of these, the hard part is deciding "should we fix
this" and "how should we fix it" rather than the implementation itself.

What I don't want to do is fix things only to find that they then get
immediately re-implemented in PHP, which seems to be what people want.

Some of the issues have LaTeX dependencies, particularly "support
Unicode", so fixing them could be dangerous unless we conditionally
include support for them, i.e. by running a feature-test as part of the
installation (we should probably do that anyway, so that we can provide
nicer error messages to the user if they are missing one of the other
dependencies).

Conrad



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Victor
On Wed, 31 Mar 2010 13:58:03 +0300, Ævar Arnfjörð Bjarmason wrote:

> On Tue, Mar 30, 2010 at 14:34, Victor  wrote:
>> Actually I completely disagree. Since I've got some experience with both
>> OCaml and PHP the idea to convert Maths processing to PHP looks
>> like a not so good idea at all.
>>
>> Probably the issues you had were more like a wrong/problematic
>> configuration or something like that. OCaml itself is actively developed
>> and is a mature language and development environment, much better than
>> PHP or Python (IMHO).
>>
>> It is just interesting to wait a bit and compare the PHP and OCaml
>> implementations of texvc (if there will be anything at all to compare).
>
> c is "better", so is Common Lisp, Scheme, Haskell, Clojure or a number
> of other languages.
>
> The problem is that worse is better. OCaml isn't widely known among
> programmers or as easy for PHP programmers to get into as say Perl,
> Python or Ruby. As a result the math/ directory has been untouched
> (aside from the stray doc+bug fix) since 2003.
>
> There are many long standing core issues with the texvc component in
> Bugzilla (http://bit.ly/bsSUPM) that no one is looking at.
>
> I don't think anyone would have a problem with it remaining in OCaml
> if it was being maintained and these long-standing bugs were being
> fixed.

Hi, now I see...

I've posted a message to the fa.caml newsgroup:
http://groups.google.com/group/fa.caml/browse_frm/thread/1593e053759d7679

hopefully somebody will volunteer to fix the issues, thus
saving human resources for better tasks.

With best regards,
Victor



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Nikola Smolenski
Ævar Arnfjörð Bjarmason wrote:
> On Wed, Mar 31, 2010 at 10:58, Ævar Arnfjörð Bjarmason  
> wrote:
>> c is "better"
> 
> That should have been "OCaml".

No it shouldn't! :)


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Ævar Arnfjörð Bjarmason
On Wed, Mar 31, 2010 at 10:58, Ævar Arnfjörð Bjarmason  wrote:
> c is "better"

That should have been "OCaml".


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-31 Thread Ævar Arnfjörð Bjarmason
On Tue, Mar 30, 2010 at 14:34, Victor  wrote:
> Actually I completely disagree. Since I've got some experience with both
> OCaml and PHP the idea to convert Maths processing to PHP looks
> like a not so good idea at all.
>
> Probably the issues you had were more like a wrong/problematic
> configuration or something like that. OCaml itself is actively developed
> and is a mature language and development environment, much better than
> PHP or Python (IMHO).
>
> It is just interesting to wait a bit and compare the PHP and OCaml
> implementations of texvc (if there will be anything at all to compare).

c is "better", so is Common Lisp, Scheme, Haskell, Clojure or a number
of other languages.

The problem is that worse is better. OCaml isn't widely known among
programmers or as easy for PHP programmers to get into as say Perl,
Python or Ruby. As a result the math/ directory has been untouched
(aside from the stray doc+bug fix) since 2003.

There are many long-standing core issues with the texvc component in
Bugzilla (http://bit.ly/bsSUPM) that no one is looking at.

I don't think anyone would have a problem with it remaining in OCaml
if it was being maintained and these long-standing bugs were being
fixed.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread Victor
On Tue, 30 Mar 2010 16:05:02 +0300, David Gerard  wrote:

> On 23 March 2010 19:17, Rob Lanphier  wrote:
>
>> As I'm sure you've already gathered from the other responses, this is
>> exactly the right place.  I'm a little skeptical myself that porting that
>> particular piece of code from OCaml to Python is going to be a really big
>> win for us (because it's still a "foreign" language as far as PHP-based
>> MediaWiki is concerned, so integration is still a little clunky and
>> performance may take a hit due to yet another interpreter needing to load),
>> but I'll let others weigh in on whether I'm making too big a deal about
>> that.
>
>
> Getting it off Ocaml is an excellent first step. I have tried and
> failed to get texvc working properly in MediaWiki myself more than a
> few times, because of Ocaml not wanting to play nice ...


Actually I completely disagree. Since I've got some experience with both
OCaml and PHP the idea to convert Maths processing to PHP looks
like a not so good idea at all.

Probably the issues you had were more like a wrong/problematic
configuration or something like that. OCaml itself is actively developed
and is a mature language and development environment, much better than
PHP or Python (IMHO).

It is just interesting to wait a bit and compare the PHP and OCaml
implementations of texvc (if there will be anything at all to compare).

Wish you good luck,
Victor




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread Platonides
Bryan Tong Minh wrote:
> On Tue, Mar 30, 2010 at 2:13 PM, Platonides wrote:
>> Changing to python will also break for people that compiled math, update
>> without reading the release notes and don't have python.
>>
> While this is of course possible, how big is the chance that somebody
> will have ocaml but not python?

My point is, we shouldn't strive so much for backwards compatibility.
It's possible, but extremely unlikely.


Dmitry wrote:
> Fedora Linux has ocaml for ages. yum install ocaml or something like 
> that. Compiling texvc is fast and easy - never had any problems. Since 
> ocaml was developed in France, chances are bigger that it has wider 
> spread over there.
> Dmitriy

Note that people installing mediawiki from packages will be using
something like mediawiki-math package, and upgrade would be transparent
for them.




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread Dmitriy Sintsov
* Bryan Tong Minh  [Tue, 30 Mar 2010 17:22:09 
+0200]:
> While this is of course possible, how big is the chance that somebody
> will have ocaml but not python?
>
Fedora Linux has ocaml for ages. yum install ocaml or something like 
that. Compiling texvc is fast and easy - never had any problems. Since 
ocaml was developed in France, chances are bigger that it has wider 
spread over there.
Dmitriy



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread Bryan Tong Minh
On Tue, Mar 30, 2010 at 2:13 PM, Platonides  wrote:
> Changing to python will also break for people that compiled math, update
> without reading the release notes and don't have python.
>
While this is of course possible, how big is the chance that somebody
will have ocaml but not python?



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread David Gerard
On 29 March 2010 01:12, Aryeh Gregor  wrote:

>> I have never built a wiki
>> where texvc has been needed, wanted, or even thought harmless.

> Granted, this is not as widely used as some other optional features.
> There are certainly many wikis that do use it, though -- it's not like
> no one will be affected.


I'd *like* to use it, but it's such an arse I've never yet got it
working ... perhaps it's just me.


- d.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread David Gerard
On 23 March 2010 23:19, K. Peachey  wrote:
> On Wed, Mar 24, 2010 at 9:16 AM, Trevor Parscal  
> wrote:

>> I think we should really consider LOLCODE for this sort of thing.
>> http://en.wikipedia.org/wiki/Lolcode
>> It's just more fun!

> Also rewrite parser functions to use it? that would be interesting on
> en.wiki since they are always complaining about the syntax.


I can think of little more appropriate.


- d.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread David Gerard
On 23 March 2010 19:17, Rob Lanphier  wrote:

> As I'm sure you've already gathered from the other responses, this is
> exactly the right place.  I'm a little skeptical myself that porting that
> particular piece of code from OCaml to Python is going to be a really big
> win for us (because it's still a "foreign" language as far as PHP-based
> MediaWiki is concerned, so integration is still a little clunky and
> performance may take a hit due to yet another interpreter needing to load),
> but I'll let others weigh in on whether I'm making too big a deal about
> that.


Getting it off Ocaml is an excellent first step. I have tried and
failed to get texvc working properly in MediaWiki myself more than a
few times, because of Ocaml not wanting to play nice ...


- d.


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread Platonides
Damon Wang wrote:
> Option (2) is the most maintainable and feasible option, and it's
> precisely the one that cannot be done in PHP. As far as I know, PHP has
> no parser-generator package. (Please, please let me know if that's
> incorrect so I can stop embarrassing myself and get on with writing a
> GSoC proposal.)

A quick search shows http://pear.php.net/package/PHP_ParserGenerator and
http://code.google.com/p/antlrphpruntime/
Maybe they are useless, but it's worth evaluating.
I suppose you could keep open the language to use in the GSoC proposal.




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-30 Thread Platonides
Aryeh Gregor wrote:
> On Mon, Mar 29, 2010 at 12:46 PM, Chad wrote:
>> What if it was written as an extension and moved to /extensions?
>> Then we get the benefit of decoupling Math from the core software,
> 
> What benefit is this?  It's not realistically decoupled from the core
> software unless it avoids using MediaWiki functions and classes.  We
> have some code in core that deliberately does this, like
> IEContentAnalyzer.php and all of includes/normal/.  Making something
> an extension per se doesn't change how tightly it's coupled with
> MediaWiki -- it's an orthogonal issue.
> 
> The only reasons I see to have extensions in trunk at all, instead of
> having everything in core, is that 1) it helps keep us honest about
> ensuring there are extension points for third parties to use, and 2)
> otherwise the tarball would be huge.  Neither point argues for
> breaking things already in core out to extensions.

<math> was implemented directly in the parser a really long time ago.
That's the reason it has always been in core, despite it not being
available for 99% of users.
Only recently, I freed the 'math' from the parser (r57997), so it can be
used by another tag hook to provide the same functionality (I gave up on
math and wanted to use [[Extension:Mimetex_alternative]]) and then Tim
moved it out creating CoreTagHooks (r61913).
It logically is an extension, much more so than e.g. ParserFunctions. I bet
there are more users of ParserFunctions than of math.
Changing to Python will also break things for people who compiled math, update
without reading the release notes, and don't have Python. We could as well
move it to a separate extension instead of embedding it, although that
wouldn't be much of an issue. The only way to ensure it will work for
everyone would be to provide a PHP implementation, in which case
thousands of installs will get a toolbar button suddenly working.




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-29 Thread Aryeh Gregor
On Mon, Mar 29, 2010 at 12:46 PM, Chad  wrote:
> What if it was written as an extension and moved to /extensions?
> Then we get the benefit of decoupling Math from the core software,

What benefit is this?  It's not realistically decoupled from the core
software unless it avoids using MediaWiki functions and classes.  We
have some code in core that deliberately does this, like
IEContentAnalyzer.php and all of includes/normal/.  Making something
an extension per se doesn't change how tightly it's coupled with
MediaWiki -- it's an orthogonal issue.

The only reasons I see to have extensions in trunk at all, instead of
having everything in core, is that 1) it helps keep us honest about
ensuring there are extension points for third parties to use, and 2)
otherwise the tarball would be huge.  Neither point argues for
breaking things already in core out to extensions.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-29 Thread Chad
On Mon, Mar 29, 2010 at 12:12 PM, Aryeh Gregor
 wrote:
> On Sun, Mar 28, 2010 at 11:45 PM, Damon Wang  wrote:
>> Can we make update.php ask the user if he wants to install the new extension?
>
> That would be hacky and unreliable.  We'd have to make sure the
> versions match, automatically alter LocalSettings.php (!), hope that
> the wiki files are writable to the web server (they probably aren't),
> hope that it's not on a firewalled intranet, . . . also, update.php
> doesn't require user interaction, and changing that would break
> everything.
>
>> Is there any place we could get usage statistics for the math feature?
>
> No, we don't have this kind of tracking in place.  People would
> probably object if we did.
>
>> I think the advantages for new installations justify inconveniencing
>> some existing users, especially if we can automate installation of the
>> new extension, but this discussion would be better with some data.
>
> I'm not so much worried about math specifically as about what would
> happen if we started systematically moving relatively-unused things
> from core to extensions.  Few people use math, but a whole lot of
> people probably use at least one little-used feature that could be
> moved to an extension.  We generally haven't moved things from core to
> extensions AFAIK -- if we started doing it, it could cumulatively have
> repercussions on ease of upgrade, for no real benefit that I see.
>

What if it was written as an extension and moved to /extensions?
Then we get the benefit of decoupling Math from the core software,
but we don't require users to download a new extension to keep
their existing functionality. As long as it's clearly indicated in the
RELEASE-NOTES and relevant Manual pages that you might need
to update the path to Math, I don't see a huge drawback.

-Chad


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-29 Thread Aryeh Gregor
On Sun, Mar 28, 2010 at 11:45 PM, Damon Wang  wrote:
> Can we make update.php ask the user if he wants to install the new extension?

That would be hacky and unreliable.  We'd have to make sure the
versions match, automatically alter LocalSettings.php (!), hope that
the wiki files are writable to the web server (they probably aren't),
hope that it's not on a firewalled intranet, . . . also, update.php
doesn't require user interaction, and changing that would break
everything.

> Is there any place we could get usage statistics for the math feature?

No, we don't have this kind of tracking in place.  People would
probably object if we did.

> I think the advantages for new installations justify inconveniencing
> some existing users, especially if we can automate installation of the
> new extension, but this discussion would be better with some data.

I'm not so much worried about math specifically as about what would
happen if we started systematically moving relatively-unused things
from core to extensions.  Few people use math, but a whole lot of
people probably use at least one little-used feature that could be
moved to an extension.  We generally haven't moved things from core to
extensions AFAIK -- if we started doing it, it could cumulatively have
repercussions on ease of upgrade, for no real benefit that I see.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-28 Thread Damon Wang
> If you make sure to run update.php, it's very rare for your wiki to
> break, unless you've hacked things or not updated your extensions or
> such.  We're usually pretty careful to avoid significant regressions
> when upgrading wikis that are using supported/sane configurations.

Can we make update.php ask the user if he wants to install the new extension?

> It's a regression for people who already have math working.  What's
> the advantage?  We have an awful lot of marginal features in core.
> When have we ever split a feature that we'd released in core into an
> extension, when a significant number of people were using it?  I don't
> see the point.  It's not like we're going to significantly reduce the
> size of the tarball or anything.

Is there any place we could get usage statistics for the math feature?
I think the advantages for new installations justify inconveniencing
some existing users, especially if we can automate installation of the
new extension, but this discussion would be better with some data.

Yours,
Damon Wang



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-28 Thread Aryeh Gregor
On Sun, Mar 28, 2010 at 7:10 PM, Happy-melon  wrote:
> As opposed to their wiki breaking when they upgrade for all the other
> reasons that we document in the release notes?

If you make sure to run update.php, it's very rare for your wiki to
break, unless you've hacked things or not updated your extensions or
such.  We're usually pretty careful to avoid significant regressions
when upgrading wikis that are using supported/sane configurations.

> I have never built a wiki
> where texvc has been needed, wanted, or even thought harmless.

Granted, this is not as widely used as some other optional features.
There are certainly many wikis that do use it, though -- it's not like
no one will be affected.

> Currently MW
> users have to compile and configure a binary from a language 99.99% of them
> cannot understand, and enable the functionality using config variables.
> Asking them instead to download and install an extension like every other
> non-ubiquitous feature in MediaWiki is far from being a regression.

It's a regression for people who already have math working.  What's
the advantage?  We have an awful lot of marginal features in core.
When have we ever split a feature that we'd released in core into an
extension, when a significant number of people were using it?  I don't
see the point.  It's not like we're going to significantly reduce the
size of the tarball or anything.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-28 Thread Happy-melon

"Aryeh Gregor"  wrote in message 
news:7c2a12e21003281059i551c4650p8a8e51e100b62...@mail.gmail.com...
> On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang  
> wrote:
>> (You also want it as a MediaWiki extension rather than a core feature; I'm going
>> to do that, but I won't say anything more because it seems fairly
>> uncontroversial.)
>
> I actually disagree with this pretty strongly.  It would be a
> regression in functionality for existing users -- if they upgrade,
> their wiki breaks unless they install a new extension.  There's no
> reason to remove it from core that I see that outweighs this
> disadvantage.

As opposed to their wiki breaking when they upgrade for all the other 
reasons that we document in the release notes?  I have never built a wiki 
where texvc has been needed, wanted, or even thought harmless.  Currently MW 
users have to compile and configure a binary from a language 99.99% of them 
cannot understand, and enable the functionality using config variables. 
Asking them instead to download and install an extension like every other 
non-ubiquitous feature in MediaWiki is far from being a regression.

--HM
 





Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-28 Thread Neil Harris
On 28/03/10 18:59, Aryeh Gregor wrote:
> On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang  wrote:
>
>> (You also want it as a MediaWiki extension rather than a core feature; I'm going
>> to do that, but I won't say anything more because it seems fairly
>> uncontroversial.)
>>  
> I actually disagree with this pretty strongly.  It would be a
> regression in functionality for existing users -- if they upgrade,
> their wiki breaks unless they install a new extension.  There's no
> reason to remove it from core that I see that outweighs this
> disadvantage.
>
>
>> Since the subset of TeX you need parsed has a context-free grammar, it
>> needs an LALR parser, not just a bunch of regexes. I know three ways to
>> get an LALR parser:
>>
>> (1) write a pushdown automaton manually (i.e., be yacc)
>> (2) write input for a parser-generator
>> (3) write a parser-generator, and give it input
>>
>> Option (2) is the most maintainable and feasible option, and it's
>> precisely the one that cannot be done in PHP. As far as I know, PHP has
>> no parser-generator package. (Please, please let me know if that's
>> incorrect so I can stop embarrassing myself and get on with writing a
>> GSoC proposal.)
>>
>> I could probably do (1), or some hackish kludge at half of it, by
>> throwing custom control structures into a bucketload of regexes, but I
>> don't think that's in the project's best interests. As has been pointed
>> out, the OCaml implementation is really concise and elegant. A large
>> fraction of that concision and elegance comes from not actually being a
>> parser but rather only a context-free grammar written in a BNF-like
>> syntax common to most parser-generators.
>>  
> Okay, well, maybe you're right.  I'd be interested to hear Tim
> Starling's opinion on this (using parser generators vs. writing by
> hand).  Writing it in Python would certainly be a big step forward
> from OCaml -- any site with LaTeX accessible to MediaWiki will almost
> certainly have Python available, so Python vs. PHP should make no
> difference to end-users.  And Python is probably the second-best-known
> language among MediaWiki hackers.
>

Have you had a look at pyparsing, which is a ready-made 
all-singing-all-dancing Python parser package with a large amount of 
syntactic sugar built in to allow the more-or-less direct input of 
grammar notations?

Given that the texvc source already has a grammar encoded into it in
machine-executable form, it might be an idea to consider mechanically
extracting that grammar from the texvc OCaml source, and then reformatting
it into a grammar in pyparsing's natural format.
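To give a flavour of what that could look like (a sketch only: it assumes
pyparsing is installed, and covers just \frac plus brace groups, nothing
like the real texvc grammar), a recursive rule reads almost like BNF:

```python
# Sketch: a tiny TeX-like fragment in pyparsing. Assumes pyparsing is
# installed; covers only \frac and brace groups, not the texvc grammar.
from pyparsing import Forward, Group, Suppress, Word, alphanums

expr = Forward()
braced = Suppress("{") + Group(expr) + Suppress("}")  # { expr }
frac = Suppress("\\frac") + braced + braced           # \frac{..}{..}
expr <<= frac | Word(alphanums)                       # recursion via Forward

tree = expr.parseString(r"\frac{x}{\frac{y}{z}}", parseAll=True).asList()
print(tree)
```

The grammar stays declarative, so porting texvc's BNF-ish rules would be
mostly transcription rather than reimplementation.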

-- Neil




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-28 Thread Aryeh Gregor
On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang  wrote:
> (You also want it as a MediaWiki extension rather than a core feature; I'm going
> to do that, but I won't say anything more because it seems fairly
> uncontroversial.)

I actually disagree with this pretty strongly.  It would be a
regression in functionality for existing users -- if they upgrade,
their wiki breaks unless they install a new extension.  There's no
reason to remove it from core that I see that outweighs this
disadvantage.

> Since the subset of TeX you need parsed has a context-free grammar, it
> needs an LALR parser, not just a bunch of regexes. I know three ways to
> get an LALR parser:
>
>    (1) write a pushdown automaton manually (i.e., be yacc)
>    (2) write input for a parser-generator
>    (3) write a parser-generator, and give it input
>
> Option (2) is the most maintainable and feasible option, and it's
> precisely the one that cannot be done in PHP. As far as I know, PHP has
> no parser-generator package. (Please, please let me know if that's
> incorrect so I can stop embarrassing myself and get on with writing a
> GSoC proposal.)
>
> I could probably do (1), or some hackish kludge at half of it, by
> throwing custom control structures into a bucketload of regexes, but I
> don't think that's in the project's best interests. As has been pointed
> out, the OCaml implementation is really concise and elegant. A large
> fraction of that concision and elegance comes from not actually being a
> parser but rather only a context-free grammar written in a BNF-like
> syntax common to most parser-generators.

Okay, well, maybe you're right.  I'd be interested to hear Tim
Starling's opinion on this (using parser generators vs. writing by
hand).  Writing it in Python would certainly be a big step forward
from OCaml -- any site with LaTeX accessible to MediaWiki will almost
certainly have Python available, so Python vs. PHP should make no
difference to end-users.  And Python is probably the second-best-known
language among MediaWiki hackers.

> I think it'd be easier to find a programmer who has worked with a
> parser-generator and can learn a little bit of OCaml, than it would be
> to find a PHP programmer who has to read himself into a manually
> implemented parser. After all, how many PHP programmers do you know who
> have experience mucking around inside an LALR parser?

The parsing part is unlikely to need much maintenance.  There are
other things currently in OCaml that make more sense to modify from
time to time -- like the whitelist of commands, and (some of?) the
code for non-image output formats.  So for instance, MathML output is
theoretically supported, but I don't know how good the support is.
That might become more important in the future, since Firefox is
likely to support inline MathML in text/html not too long from now.
This sort of thing would be harder if it were Python rather than PHP.

I don't think it would be a big deal if it were rewritten entirely in
Python, though.  It would be a big step forward in any case, and if
it's easier for you, great.  So personally I'd be okay with it,
although it's perhaps not ideal.

> Also, would anyone be interested in mentoring this project?

I probably wouldn't be of any help for this particular project, since
I don't know anything about parsers, and my Python and TeX are
passable but not great.  We could probably come up with a mentor,
though.


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-26 Thread Rob Lanphier
On Fri, Mar 26, 2010 at 7:48 PM, Damon Wang  wrote:

> > There's a few Python-based things that might be interesting, but I
> > think you'll get a lot more love for doing something in PHP or C.
> >  Since this is a student internship, you shouldn't be bashful about
> > using this as a learning opportunity.
> >
> > I'd only caution against convincing yourself (and us) that you'll be
> > more interested in learning something like PHP than you truly are.  It
> > might help you land a spot, but it will work against you in having a
> > successful project, and
>
> > this has such high visibility that you'll really want to be
> > successful.
>
> What visibility does this have? I thought it was some abandoned corner
> of the wiki that nobody has touched in the seven years since it was
> first written.  What happens if I make a hash of this?
>


Hi Damon,

Oops... that was a little ambiguous and probably applies a little more
pressure than intended.  What I meant to say is that Google Summer of Code
generally is pretty high visibility, not this project in particular.
 Projects often go back and review results from previous years (just like we
did: http://www.mediawiki.org/wiki/Summer_of_Code_Past_Projects ).  There's
plenty of ways to have a noble failure that won't reflect poorly on you, but
that's probably not what you should aim for.  There's nothing particularly
high profile about this particular project relative to other GSoC stuff.

Anyway, in response to the specifics about Python/texvc.  I was looking
around for some ideas about how to approach replacing texvc with a Python
implementation, and stumbled into this:
http://www.mediawiki.org/wiki/Texvc_PHP_Alternative

That implementation seems to punt on the whole parsing thing, and as near as
I can tell from a cursory reading, just passes it all through to latex, so
that probably won't do.  However, there may be something I'm missing.

Interestingly enough, though, looking at the Talk page for that leads you
here:
http://sourceforge.net/projects/latex2mathml/

This *does* have a parser.
As you might expect, the code looks pretty involved, and seems to be
handling parsing 101 without the benefit of anything other than the trusty
substr and strpos functions.  There's enough code there doing enough
character-by-character manipulation that it makes me fear for the
performance.  Still, it looks like there's some serious work that's actually
done, so it bears some level of investigation.

Anyway, I hear what you're saying about Python's much better parsing support
(it wasn't too long ago I was gushing about the simpleparse module on my
blog[1]).  Given the number of other external dependencies that would
probably still remain even with a PHP implementation, it's probably not
worth sweating the additional Python dependency in the grand scheme of
things.  Python seems like a much less daunting dependency than OCaml, but I
know far too little about OCaml to actually assert that with any confidence.

Regardless of which path you choose, I'd be happy to be your mentor assuming
we have enough slots for this project.

Rob
[1]  http://blog.robla.net/2010/simpleparse/


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-26 Thread Damon Wang
> There's a few Python-based things that might be interesting, but I
> think you'll get a lot more love for doing something in PHP or C.
>  Since this is a student internship, you shouldn't be bashful about
> using this as a learning opportunity.
>
> I'd only caution against convincing yourself (and us) that you'll be
> more interested in learning something like PHP than you truly are.  It
> might help you land a spot, but it will work against you in having a
> successful project, and

> this has such high visibility that you'll really want to be
> successful.

What visibility does this have? I thought it was some abandoned corner
of the wiki that nobody has touched in the seven years since it was
first written.  What happens if I make a hash of this?

> So, if you find yourself thinking about doing this in PHP and having
> your inner voice say "meh", then I'd recommend sticking to your guns
> and propose doing this or something else in Python and/or C.

Well, now my inner voice says, "I really don't want to make a hash of
this texvc port!", so let me explain why I want to do it in Python
rather than PHP.  I agree that the performance will probably be just
fine, and that it would be a great coup for maintainability and
installation and usage. The problem is, I don't think PHP has a
parser-generator package.

So let me make sure I understand the problem here. You already have a
texvc implementation that has worked just fine for the last seven years.
TeX is pretty stable at this point, so chances are good you'd make it
another seven years without problems. But you're still dissatisfied
because OCaml is a hard language to find programmers for, and the
existing implementation isn't really maintained. You want it ported to a
different language that has more programmers available.

(You also want it as a MediaWiki extension rather than a core feature; I'm going
to do that, but I won't say anything more because it seems fairly
uncontroversial.)

Since the subset of TeX you need parsed has a context-free grammar, it
needs an LALR parser, not just a bunch of regexes. I know three ways to
get an LALR parser:

(1) write a pushdown automaton manually (i.e., be yacc)
(2) write input for a parser-generator
(3) write a parser-generator, and give it input

Option (2) is the most maintainable and feasible option, and it's
precisely the one that cannot be done in PHP. As far as I know, PHP has
no parser-generator package. (Please, please let me know if that's
incorrect so I can stop embarrassing myself and get on with writing a
GSoC proposal.)

I could probably do (1), or some hackish kludge at half of it, by
throwing custom control structures into a bucketload of regexes, but I
don't think that's in the project's best interests. As has been pointed
out, the OCaml implementation is really concise and elegant. A large
fraction of that concision and elegance comes from not actually being a
parser but rather only a context-free grammar written in a BNF-like
syntax common to most parser-generators.
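
For concreteness, here is option (1) in miniature -- a hand-written parser
(not texvc, just a sketch) for nested brace groups. The recursion supplies
the stack that a plain regex cannot; a parser-generator would let us
declare the same grammar as rules instead of writing this code:

```python
# Minimal illustration (not texvc): nested brace groups form a
# context-free language, so parsing them needs a stack. The call stack
# of this recursive-descent parser is that stack.
def parse_group(s, i=0):
    """Parse '{' body '}' starting at s[i]; body is characters or
    nested groups. Returns (tree, index just past the closing brace)."""
    assert s[i] == "{"
    i += 1
    children = []
    while i < len(s) and s[i] != "}":
        if s[i] == "{":
            sub, i = parse_group(s, i)   # recurse on a nested group
            children.append(sub)
        else:
            children.append(s[i])
            i += 1
    if i == len(s):
        raise ValueError("unbalanced braces")
    return children, i + 1               # skip the closing '}'

tree, end = parse_group("{a{bc}d}")
print(tree)  # ['a', ['b', 'c'], 'd']
```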

I think it'd be easier to find a programmer who has worked with a
parser-generator and can learn a little bit of OCaml, than it would be
to find a PHP programmer who has to read himself into a manually
implemented parser. After all, how many PHP programmers do you know who
have experience mucking around inside an LALR parser?

So that's why, while I'm happy to take it on in PHP as a learning
experience for myself, I think it'd be better for Mediawiki to port
texvc to Python. That gets us the larger pool of potential maintainers
that comes with using a commonly known language, without sacrificing the
amazing advantage of only needing to maintain a grammar rather than the
parser itself.

And as far as dependencies are concerned, Python is still a much easier
dependency to satisfy, both for programmers working with the code and
for sysadmins installing it.

What do you guys think?

Also, would anyone be interested in mentoring this project?

Yours,
Damon Wang



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-24 Thread Platonides
Aryeh Gregor wrote:
> As long as the worst that could happen on a large majority of
> installations is DoS, I don't think we should be afraid to rewrite the
> code just because *maybe* it would be less secure.  We should
> obviously check over the new code carefully, but I wouldn't say it's
> any more security-critical than random pieces of MediaWiki -- which
> are typically vulnerable to XSS if someone forgets to escape
> something.

Getting shell access is not a DoS or XSS, especially for the large majority
of installs where it would compromise their only account.
Does this mean that we shouldn't rewrite it? No. We should rewrite it,
and make it more secure. We start it by having enough eyes on the code.
I wouldn't be surprised if we found a vulnerability on texvc during the
rewrite.

Running the LaTeX interpreter under ulimit -u 1 should provide a
reasonably safe default against external process launches. But take into account that
file writes are also a dangerous vector.
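That ulimit suggestion can be sketched in Python with the standard resource module, applying the limits in the child between fork() and exec() via preexec_fn. The specific numbers and the command are illustrative only, and this does nothing about the file-write vector mentioned above:

```python
import resource
import subprocess

def limit_child():
    """Runs in the child between fork() and exec().

    RLIMIT_NPROC is the analogue of `ulimit -u`: it stops the rendering
    job from spawning further processes.  RLIMIT_CPU kills runaway TeX
    loops after a few seconds.  Both values are illustrative.
    """
    resource.setrlimit(resource.RLIMIT_NPROC, (1, 1))
    resource.setrlimit(resource.RLIMIT_CPU, (10, 10))

def render_sandboxed(cmd):
    # Capture output rather than letting the child write to our terminal;
    # the timeout is a second line of defence on top of RLIMIT_CPU.
    return subprocess.run(cmd, preexec_fn=limit_child,
                          capture_output=True, timeout=30)
```

This only constrains process creation and CPU time; confining file writes would additionally need something like a chroot or TeX's own openout_any setting.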




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-24 Thread Aryeh Gregor
On Wed, Mar 24, 2010 at 10:43 AM, Conrad Irwin
 wrote:
> Yes, \openout, \write, \closeout, \openin, \read, \closein. The infamous
> one is \write18, 18 is a special file descriptor that just executes
> shell commands, you can also use \openin={|}.
>
> People have noticed this problem, so some distributions disable \write18
> (and opening with |), and also configure it such that files can only be
> read and written within the current directory or subdirectories. This
> is, to my knowledge, not by-passable.

As long as the worst that could happen on a large majority of
installations is DoS, I don't think we should be afraid to rewrite the
code just because *maybe* it would be less secure.  We should
obviously check over the new code carefully, but I wouldn't say it's
any more security-critical than random pieces of MediaWiki -- which
are typically vulnerable to XSS if someone forgets to escape
something.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-24 Thread Daniel Schwen
> Even if it were not possible to break out of restricted write18, there
> will exist installations with write18 enabled, and I can't imagine
> people remembering to check. Depending on the flavour of LaTeX in use it

If people won't remember, surely either the MediaWiki installer or the
extension itself can be made to check this and simply refuse to enable
<math> tags.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-24 Thread Conrad Irwin

On 03/24/2010 02:00 PM, Aryeh Gregor wrote:
> On Tue, Mar 23, 2010 at 6:28 PM, Conrad Irwin
>  wrote:
>> Many LaTeX installations can be made read/write/execute anything by
>> default.
> 
> What does that mean?  LaTeX can invoke external programs?  Using what
> commands?  Is this functionality actually enabled in practice in stock
> LaTeX installs?

Yes, \openout, \write, \closeout, \openin, \read, \closein. The infamous
one is \write18, 18 is a special file descriptor that just executes
shell commands, you can also use \openin={|}.

People have noticed this problem, so some distributions disable \write18
(and opening with |), and also configure it such that files can only be
read and written within the current directory or subdirectories. This
is, to my knowledge, not by-passable.

There's also a (more recent) mode called "restricted write18", used by
more user-friendly distributions, this allows some commands (such as
bibtex or tex) to be used with write18 and |. Sadly, as arguments passed
to the commands are not validated (though they are shell escaped), it is
possible to break out of the sandbox.

Even if it were not possible to break out of restricted write18, there
will exist installations with write18 enabled, and I can't imagine
people remembering to check. Depending on the flavour of LaTeX in use it
is often possible to pass -no-shell-escape or --disable-write18 on the
command line, we should probably do that, unless unknown flags cause
errors, I'm not sure.
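Building the command line defensively along those lines could look like the sketch below; the flag names come from this thread, but whether a given TeX binary accepts -no-shell-escape or --disable-write18 (or errors out on an unknown flag) depends on the distribution, so it would need to be probed at install time:

```python
def build_latex_command(texfile, engine="latex"):
    """Assemble an argv list for invoking LaTeX with shell escapes off.

    The engine name and flag set are illustrative assumptions, not a
    guaranteed-portable invocation.
    """
    return [
        engine,
        "-no-shell-escape",          # disable \write18 even if texmf.cnf enables it
        "-interaction=nonstopmode",  # never stop to prompt on errors
        "-halt-on-error",
        texfile,
    ]
```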

Conrad



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-24 Thread Aryeh Gregor
On Tue, Mar 23, 2010 at 6:28 PM, Conrad Irwin
 wrote:
> Many LaTeX installations can be made read/write/execute anything by
> default.

What does that mean?  LaTeX can invoke external programs?  Using what
commands?  Is this functionality actually enabled in practice in stock
LaTeX installs?



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-24 Thread Bryan Tong Minh
On Tue, Mar 23, 2010 at 9:06 AM, Damon Wang  wrote:
> Hello everyone,
>
> I'm interested in porting texvc to Python, and I was hoping this list
> here might help me hash out the plan. Please let me know if I should
> take my questions elsewhere.
>
If I understand correctly, you want to write a 
script that validates latex, and calls the latex compiler.
Why can't the validator be written as an integral extension for
MediaWiki itself and have MediaWiki call the latex compiler. Is there
a particular reason to have the validation done by an external
program?


Bryan



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-24 Thread Tei
My impression is that this is a problem where a PHP implementation could
be a good fit. Who cares if it is slow? The result can be cached
forever; it is something you will run only once, and the heavyweight
work (drawing) will be done by compiled C code like the GD library.

You need speed in code that runs inside loops (runs N times), code that
delays other work (can't be made async), code that is CPU-intensive and
runs on every request, code that is very IO-intensive (mechanical disks
are slow), or code that needs gigabytes of memory (touching lots of
pages gets expensive).

Code that is parallelizable, async, memory-light, and not IO-intensive
doesn't need any optimization at all. Write the clearest code
and have a big smiley :-)
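The cache-forever point can be sketched in a few lines: key the rendered image by a hash of the TeX source and render at most once, much as the existing math renderer names its output files by a hash. The render callable and cache layout here are hypothetical:

```python
import hashlib
from pathlib import Path

def cached_render(tex, render, cache_dir="math-cache"):
    """Render `tex` at most once; afterwards serve the cached bytes.

    `render` is any callable turning TeX source into image bytes
    (a stand-in for the real LaTeX pipeline).
    """
    key = hashlib.md5(tex.strip().encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.png"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(render(tex))  # the slow step runs only once
    return path.read_bytes()
```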

-- 
--
End of message.


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Conrad Irwin

On 03/23/2010 10:44 PM, Tim Starling wrote:
> Just because a language is context-sensitive doesn't mean it will be
> hard to write a parser for it. That's just a myth propagated by
> computer scientists who, strangely enough given their profession, have
> a disdain for the algorithm as a descriptive framework.

Context free grammars have a strong advantage because they come with
documentation built in. Given a BNF-esque description of a language, it
is possible to understand, at a high level, how the language works. This
means it's easy to write a parser (though much easier to get the parser
written automatically), it's also more pleasant to verify properties of
the language (no state to keep track of), and to reason about the
consequences of modifications. Of course it is possible to give decent
documentation for a context-sensitive language; from what I've seen,
this just doesn't happen.

Take for example Python and perl, or Markdown and MediaWiki. In both
cases the former has a syntax that can be mainly modeled by a context
free grammar and there are many implementations that all work. The
latter of the pair has a context sensitive grammar defined only by the
reference implementation, there are no other parsers with feature
completeness. This is certainly not a technical limitation, rather a
reflection on the ability of your average human programmer.

Conrad



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Platonides
Happy-melon wrote:
> I took it to mean that he wanted to split the math parsing out as a 
> **MediaWiki** extension, implementing <math> as a parser tag hook in the 
> usual way.  Which is definitely highly desirable.
> 
> --HM 

Making it a MediaWiki extension is of course desirable (moving texvc out
of core is a pending issue, at least now <math> can be used by extensions).

but Damon wrote:
> Another possibility be writing it in C to avoid all interpreter
> overhead, and using a foreign function interface. Unfortunately, I'm not
> familiar with PHP's FFI. Google takes me to
> http://wiki.php.net/rfc/php_native_interface
> which seems to think that as of a year ago there weren't any good ones,
> but this doesn't look too painful:
> http://theserverpages.com/php/manual/en/zend.creating.php

That's about PHP extensions (which are written in C).
So, instead of going that path, he should make a C program which does
what texvc does. It can then be moved into a PHP extension if really
needed, but starting with Zend extensions would be an unneeded pain for
this project.




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Happy-melon

"Platonides"  wrote in message 
news:hobfpi$4u...@dough.gmane.org...
> You seem to be thinking about creating a PHP extension. I don't think
> you should go that route. A binary is good enough, we don't need it to
> be in a PHP extension. That glue could be added later if needed, but
> would increase the complexity to write and debug.

I took it to mean that he wanted to split the math parsing out as a 
**MediaWiki** extension, implementing <math> as a parser tag hook in the 
usual way.  Which is definitely highly desirable.

--HM 





Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Rob Lanphier
On Tue, Mar 23, 2010 at 2:00 PM, Damon Wang  wrote:

> I've been writing projects
> for university and for a computer lab I work at, but it's mostly small,
> one-off sysadmin things and usually the emphasis is more on "xyz server
> has to be back up before we open tomorrow" than writing good, clean code.
> So, yes, I'd welcome other suggestions.
>


Cool!  So, I'm assuming you're looking forward to an opportunity to write
good, clean code as a summer project.  :)


There are ways to make [Python-based extensions] run faster if performance
> is a concern. For
> example, mod_python or mod_wscgi, or explicitly pulling the Python out
> into a standalone daemon that listens for requests from the webserver.
>


Personally, I'd avoid trying to make that pitch for a GSoC project.  While
you're right that Python is a pretty defensible choice when embarking on a
large project, trading one dependency for another for this size/scale of
project won't be as compelling as eliminating a dependency altogether.

Of course, as I say that, I see Platonides disagrees with me here.  Choosing
Python is not a huge disadvantage in this context, but it's not going to
have the same unanimous(-ish) approval of using PHP.



> Another possibility would be writing it in C to avoid all interpreter
> overhead, and using a foreign function interface. Unfortunately, I'm not
> familiar with PHP's FFI. Google takes me to
>http://wiki.php.net/rfc/php_native_interface
> which seems to think that as of a year ago there weren't any good ones,
> but this doesn't look too painful:
>http://theserverpages.com/php/manual/en/zend.creating.php
>
>
I think straight PHP would be fine for this particular project.  The
downside of a C implementation is that, while its almost certainly going to
have the best performance characteristics, it also makes it more likely to
fall into disrepair and be a possible source of buffer overruns and other
security issues.

The nice thing about a PHP port (if done correctly) is that it would be a
trivial install for small wikis and Wikipedia alike.  That translates into
more usage, which in turn translates into higher likelihood that it stays
maintained.

That said, there have got to be a ton of projects that could benefit from
PHP->native C bindings.  I'm going to leave it to some other folks to
suggest projects in this area.


> I'm most familiar with Python and C, for whatever that's worth coming
> from an undergrad who didn't know Python existed five years ago. I
> learned PHP to maintain the web interfaces of an in-house print system
> at work, but I haven't used it for anything as involved as what we're
> discussing here. So, in terms of productivity, yes, if I have to work in
> PHP my mentor will probably get asked a few more newbie questions.
>
> In terms of happiness, though, it'd be a great opportunity to dig into
> PHP and finally learn to use it as more than really smart CSS with a
> database connection. Although I prefer Python or even C because I think
> I'd be more useful, I wouldn't be very upset at all if it turned out you
> guys were willing to let me learn PHP on your time.
>


There's a few Python-based things that might be interesting, but I think
you'll get a lot more love for doing something in PHP or C.  Since this is a
student internship, you shouldn't be bashful about using this as a learning
opportunity.

I'd only caution against convincing yourself (and us) that you'll be more
interested in learning something like PHP than you truly are.  It might help
you land a spot, but it will work against you in having a successful
project, and this has such high visibility that you'll really want to be
successful.  So, if you find yourself thinking about doing this in PHP and
having your inner voice say "meh", then I'd recommend sticking to your guns
and propose doing this or something else in Python and/or C.



> > 2.  Are you zeroing in on <math> parsing and parsing in general because
> > that's an area that you're already developing expertise in and/or are
> deeply
> > interested in getting into, or is that just something that looked kinda
> > interesting to learn about relative to other opportunities you
> considered?
>
> I like the <math> parsing project because it seems well-suited for a
> third-year undergrad who knows LaTeX and reads a few other functional
> languages and has studied lex/yacc before in his coursework. The goals
> are clear, and I know how to break them down into smaller problems and
> how to tackle each one. It's a little isolated from the rest of
> Mediawiki, so I don't need to grok the entire code base.
>
> Basically, this looks like a way to make a concrete contribution despite
> being a newcomer to the project. That doesn't mean I'm not happy to
> entertain alternatives, just that they have a pretty high bar to clear.
>

This is a really smart way of thinking about this, so that's great that
you're thinking the right way about the project scope.  I agree with you
that finding something

Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread K. Peachey
On Wed, Mar 24, 2010 at 9:16 AM, Trevor Parscal  wrote:
> I think we should really consider LOLCODE for this sort of thing.
>
> http://en.wikipedia.org/wiki/Lolcode
>
> It's just more fun!
>
> - Trevor
Also rewrite parser functions to use it? That would be interesting on
en.wiki since they are always complaining about the syntax.



jks

-Peachey



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Trevor Parscal
I think we should really consider LOLCODE for this sort of thing.

http://en.wikipedia.org/wiki/Lolcode

It's just more fun!

- Trevor

On 3/23/10 3:44 PM, Tim Starling wrote:
> Conrad Irwin wrote:
>
>> On 03/23/2010 05:23 PM, Aryeh Gregor wrote:
>>  
>>> On Tue, Mar 23, 2010 at 1:00 PM, Roan Kattouw  
>>> wrote:
>>>
 DFAs parse regular languages, which means those languages can also be
 expressed as regexes. In fact, the regexes accepted by the preg_*()
 functions allow certain extensions to the language theory definition
 of regular expressions, allowing them to describe certain non-regular
 languages as well. In short: preg_split() can do everything a DFA can
 do, and more. The only reason to use a DFA parser would be
 performance, but since the preg_*() functions are so heavily optimized
 I don't think that'll be an issue.
  
>>> This much I know, but is LaTeX actually a regular language?
>>>
>> It's not even context free, luckily the subset we are interested in is
>> (as clearly shown by the texvc parser :p).
>>  
> Just because a language is context-sensitive doesn't mean it will be
> hard to write a parser for it. That's just a myth propagated by
> computer scientists who, strangely enough given their profession, have
> a disdain for the algorithm as a descriptive framework.
>
> In the last few decades, pure mathematicians have been exploring the
> power of algorithms as a general description of an axiomatic system.
> And simultaneously, computer scientists have embraced the idea that
> the best way to process text is by trying to shoehorn all computer
> languages into some Chomsky-inspired representation, regardless of how
> awkward that representation is, or how inefficient the resulting
> algorithm becomes, when compared to an algorithm constructed a priori.
>
> -- Tim Starling
>
>
>




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Tim Starling
Conrad Irwin wrote:
> On 03/23/2010 05:23 PM, Aryeh Gregor wrote:
>> On Tue, Mar 23, 2010 at 1:00 PM, Roan Kattouw  wrote:
>>> DFAs parse regular languages, which means those languages can also be
>>> expressed as regexes. In fact, the regexes accepted by the preg_*()
>>> functions allow certain extensions to the language theory definition
>>> of regular expressions, allowing them to describe certain non-regular
>>> languages as well. In short: preg_split() can do everything a DFA can
>>> do, and more. The only reason to use a DFA parser would be
>>> performance, but since the preg_*() functions are so heavily optimized
>>> I don't think that'll be an issue.
>> This much I know, but is LaTeX actually a regular language?
> 
> It's not even context free, luckily the subset we are interested in is
> (as clearly shown by the texvc parser :p).

Just because a language is context-sensitive doesn't mean it will be
hard to write a parser for it. That's just a myth propagated by
computer scientists who, strangely enough given their profession, have
a disdain for the algorithm as a descriptive framework.

In the last few decades, pure mathematicians have been exploring the
power of algorithms as a general description of an axiomatic system.
And simultaneously, computer scientists have embraced the idea that
the best way to process text is by trying to shoehorn all computer
languages into some Chomsky-inspired representation, regardless of how
awkward that representation is, or how inefficient the resulting
algorithm becomes, when compared to an algorithm constructed a priori.

-- Tim Starling




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Platonides
Python is a nice language. PHP (portability) or C/C++ (speed) would be
better but Python is preferable to OCaml.

You mention ANTLR, something like that could be a good because it should
allow to generate the same parser in a different language with not so
much effort (probably you won't have enough time in gsoc for that, but a
design taking that option into account would be interesting).

So you could do (please don't take this as a requisites list):
*Figure out wth is doing the current texvc.
*Document it heavily.
*Design how to create the next textvc.
*Any parser you make for it.
*Actual implementation.

You seem to be thinking about creating a PHP extension. I don't think
you should go that route. A binary is good enough, we don't need it to
be in a PHP extension. That glue could be added later if needed, but
would increase the complexity to write and debug.




Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Conrad Irwin

On 03/23/2010 05:23 PM, Aryeh Gregor wrote:
> On Tue, Mar 23, 2010 at 1:00 PM, Roan Kattouw  wrote:
>> DFAs parse regular languages, which means those languages can also be
>> expressed as regexes. In fact, the regexes accepted by the preg_*()
>> functions allow certain extensions to the language theory definition
>> of regular expressions, allowing them to describe certain non-regular
>> languages as well. In short: preg_split() can do everything a DFA can
>> do, and more. The only reason to use a DFA parser would be
>> performance, but since the preg_*() functions are so heavily optimized
>> I don't think that'll be an issue.
> 
> This much I know, but is LaTeX actually a regular language?

It's not even context free, luckily the subset we are interested in is
(as clearly shown by the texvc parser :p).

> 
> On Tue, Mar 23, 2010 at 1:13 PM, Conrad Irwin
>  wrote:
>> And here was me thinking that maintenance didn't happen because making
>> changes to security critical sections of the code is dangerous :).
> 
> It's not security-critical.  The worst you could possibly do is DoS,
> and any DoS could be instantly shut off by just turning off math
> briefly.  Furthermore, the part that makes DoS impossible is a quite
> small portion of the code that would need to change effectively never.
>  No, the problem is that most PHP programmers have never even heard of
> OCaml, let alone used it.

Many LaTeX installations can be made read/write/execute anything by
default. LaTeX also allows you to redefine the meaning of characters in
the input, if you accidentally let a single command through, then all
the whitelisting becomes pointless. It certainly is a security issue.
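This is why texvc whitelists rather than blacklists: unknown commands are rejected by default, whereas a blacklist of known-dangerous commands (\write and friends) could be bypassed through redefinition tricks. A toy default-deny validator, with a purely illustrative command table that is nothing like texvc's real one, might look like:

```python
import re

# Illustrative whitelist only -- the real texvc table is far larger.
ALLOWED = {r"\frac", r"\sqrt", r"\alpha", r"\beta", r"\sum", r"\int",
           r"\left", r"\right", r"\begin", r"\end"}

# A control word (\letters...) or a single-character control symbol.
COMMAND = re.compile(r"\\[a-zA-Z]+|\\.")

def validate(tex):
    # Default deny: collect every command not on the whitelist.
    bad = [c for c in COMMAND.findall(tex) if c not in ALLOWED]
    return (len(bad) == 0, bad)
```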

Conrad



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Damon Wang
Hello Rob,

> Just to be really clear, I'm not looking for a "right" answer on any of
> those questions.  It's not necessary for you to be even interested in
> getting deeply involved in the Wikipedia user community to have a really
> successful project.  The purpose of this line of questions is to figure out
> if we should continue helping you refine your current idea, or suggest some
> other direction that's a bigger payoff and/or easier sell.

I understand, and that'd be very helpful. To be honest, I'm not
passionately committed to any project at all. I've been writing projects
for university and for a computer lab I work at, but it's mostly small,
one-off sysadmin things and usually the emphasis is more on "xyz server
has to be back up before we open tomorrow" than writing good, clean code.
So, yes, I'd welcome other suggestions.

> As I'm sure you've already gathered from the other responses, this is
> exactly the right place.  I'm a little skeptical myself that porting that
> particular piece of code from OCaml to Python is going to be a really big
> win for us (because it's still a "foreign" language as far as PHP-based
> MediaWiki is concerned, so integration is still a little clunky and
> performance may take a hit due to yet another interpreter needing to load),
> but I'll let others weigh in on whether I'm making too big a deal about
> that.

There are ways to make this run faster if performance is a concern. For
example, mod_python or mod_wscgi, or explicitly pulling the Python out
into a standalone daemon that listens for requests from the webserver.

Another possibility would be writing it in C to avoid all interpreter
overhead, and using a foreign function interface. Unfortunately, I'm not
familiar with PHP's FFI. Google takes me to
http://wiki.php.net/rfc/php_native_interface
which seems to think that as of a year ago there weren't any good ones,
but this doesn't look too painful:
http://theserverpages.com/php/manual/en/zend.creating.php

> Stepping back from the specifics of your proposal (which I think the others
> on this list have responded to pretty well), I'd like to find out more about
> what general sorts of projects interest you the most, which may help us
> figure out if we should keep going in this direction.  Some questions:
> 1.  Are you most interested in having a Python-based project, or would you
> be *equally* happy and productive programming something in PHP?

I'm most familiar with Python and C, for whatever that's worth coming
from an undergrad who didn't know Python existed five years ago. I
learned PHP to maintain the web interfaces of an in-house print system
at work, but I haven't used it for anything as involved as what we're
discussing here. So, in terms of productivity, yes, if I have to work in
PHP my mentor will probably get asked a few more newbie questions.

In terms of happiness, though, it'd be a great opportunity to dig into
PHP and finally learn to use it as more than really smart CSS with a
database connection. Although I prefer Python or even C because I think
I'd be more useful, I wouldn't be very upset at all if it turned out you
guys were willing to let me learn PHP on your time.

> 2.  Are you zeroing in on <math> parsing and parsing in general because
> that's an area that you're already developing expertise in and/or are deeply
> interested in getting into, or is that just something that looked kinda
> interesting to learn about relative to other opportunities you considered?

I like the <math> parsing project because it seems well-suited for a
third-year undergrad who knows LaTeX and reads a few other functional
languages and has studied lex/yacc before in his coursework. The goals
are clear, and I know how to break them down into smaller problems and
how to tackle each one. It's a little isolated from the rest of
Mediawiki, so I don't need to grok the entire code base.

Basically, this looks like a way to make a concrete contribution despite
being a newcomer to the project. That doesn't mean I'm not happy to
entertain alternatives, just that they have a pretty high bar to clear.

> 3.  Are you coming at this as someone who is already deep into
> Wikipedia/MediaWiki usage who is looking to resolve particular things (like
> <math> parsing) that are painful as an end user, or are you more casually
> involved and more interested in applying in this project because it looks
> like we've got a lot of interesting programming problems to solve?

The second. I just want to tackle a problem that's near but not quite
beyond my limits, and if I can help out a site I use daily, so much the
better.

Yours,
Damon Wang



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Rob Lanphier
Hi Damon,

Thank you so much for floating your GSoC ideas early here on the mailing
list!  Putting out concrete examples we can weigh in on is really helpful,
and engaging in this way is a fantastic way of demonstrating how you'll be
able to engage with us if we select your project.


On Tue, Mar 23, 2010 at 1:06 AM, Damon Wang  wrote:

> I'm interested in porting texvc to Python, and I was hoping this list
> here might help me hash out the plan.



As I'm sure you've already gathered from the other responses, this is
exactly the right place.  I'm a little skeptical myself that porting that
particular piece of code from OCaml to Python is going to be a really big
win for us (because it's still a "foreign" language as far as PHP-based
MediaWiki is concerned, so integration is still a little clunky and
performance may take a hit due to yet another interpreter needing to load),
but I'll let others weigh in on whether I'm making too big a deal about
that.

Stepping back from the specifics of your proposal (which I think the others
on this list have responded to pretty well), I'd like to find out more about
what general sorts of projects interest you the most, which may help us
figure out if we should keep going in this direction.  Some questions:
1.  Are you most interested in having a Python-based project, or would you
be *equally* happy and productive programming something in PHP?
2.  Are you zeroing in on <math> parsing and parsing in general because
that's an area that you're already developing expertise in and/or are deeply
interested in getting into, or is that just something that looked kinda
interesting to learn about relative to other opportunities you considered?
3.  Are you coming at this as someone who is already deep into
Wikipedia/MediaWiki usage who is looking to resolve particular things (like
<math> parsing) that are painful as an end user, or are you more casually
involved and more interested in applying in this project because it looks
like we've got a lot of interesting programming problems to solve?

Just to be really clear, I'm not looking for a "right" answer on any of
those questions.  It's not necessary for you to be even interested in
getting deeply involved in the Wikipedia user community to have a really
successful project.  The purpose of this line of questions is to figure out
if we should continue helping you refine your current idea, or suggest some
other direction that's a bigger payoff and/or easier sell.

Rob


Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Damon Wang
2010/3/23 Roan Kattouw :
> 2010/3/23 Aryeh Gregor :
>> This much I know, but is LaTeX actually a regular language?
>>
> I don't know; I was just making the point that writing a DFA parser in
> PHP is probably not very useful.

Sorry, I got confused and wrote DFA when I should have written LALR.
DFAs cannot parse even the allowed subset of AMS-LaTeX, because there
are some permitted environments.

Without claiming to know much formal language theory, a rule of thumb is
that languages with arbitrarily nested matched delimiters are never
regular, because of the pumping lemma:
http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages

So, for example, it's theoretically impossible to check that parentheses
nested correctly using regular expressions, and similarly it'd be
impossible to check that the \begin and \end commands matched up.

In practice there might be ways to hack around that by using multiple
regular expressions and manually tracking how they nest, but at that
point we're basically writing half of a bad LALR parser.
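The \begin/\end case can be made concrete: one regex finds the environment commands, but matching them up needs an explicit stack, which is exactly the unbounded state a finite automaton cannot carry. A toy checker (not texvc's grammar) shows the shape:

```python
import re

# Captures "begin"/"end" and the environment name, e.g. \begin{matrix}.
ENV = re.compile(r"\\(begin|end)\{([A-Za-z*]+)\}")

def environments_balanced(tex):
    stack = []
    for which, name in ENV.findall(tex):
        if which == "begin":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False          # \end with no matching \begin
    return not stack              # every \begin must have been closed
```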

Fortunately, though, Python has parser generators! And if we're really
concerned about speed, there's PyBison, which does the parsing in C and
apparently produces (at least) five-fold improvements over Python-native
alternatives.

Yours,
Damon Wang



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Roan Kattouw
2010/3/23 Aryeh Gregor :
> This much I know, but is LaTeX actually a regular language?
>
I don't know; I was just making the point that writing a DFA parser in
PHP is probably not very useful.

Roan Kattouw (Catrope)



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Aryeh Gregor
On Tue, Mar 23, 2010 at 1:00 PM, Roan Kattouw  wrote:
> DFAs parse regular languages, which means those languages can also be
> expressed as regexes. In fact, the regexes accepted by the preg_*()
> functions allow certain extensions to the language theory definition
> of regular expressions, allowing them to describe certain non-regular
> languages as well. In short: preg_split() can do everything a DFA can
> do, and more. The only reason to use a DFA parser would be
> performance, but since the preg_*() functions are so heavily optimized
> I don't think that'll be an issue.

This much I know, but is LaTeX actually a regular language?

On Tue, Mar 23, 2010 at 1:13 PM, Conrad Irwin
 wrote:
> And here was me thinking that maintenance didn't happen because making
> changes to security critical sections of the code is dangerous :).

It's not security-critical.  The worst you could possibly do is DoS,
and any DoS could be instantly shut off by just turning off math
briefly.  Furthermore, the part that makes DoS impossible is quite a
small portion of the code, which would effectively never need to
change.  No, the problem is that most PHP programmers have never even
heard of OCaml, let alone used it.



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Conrad Irwin

On 03/23/2010 05:00 PM, Roan Kattouw wrote:

>>> I suggested a Python port because
>>>http://www.mediawiki.org/wiki/Summer_of_Code_2010#MediaWiki_core
>>> lists it as a potential project idea. I was under the impression that
>>> people around here did not want to leave texvc in OCaml. Is this wrong?
>>
>> No, it's right.  Conrad is crazy.  :P
>>
> Having it in a language no one understands is a bad thing and leads to
> maintenance not happening, so yeah, we definitely want it rewritten in
> PHP. If the PHP implementation turns out to be too slow to run on WMF,
> for instance, we could do a C++ port à la wikidiff2 (a C++ port of our
> ludicrously slow PHP diff implementation).
> 

And here was me thinking that maintenance didn't happen because making
changes to security critical sections of the code is dangerous :). The
current implementation is just over a thousand lines of exceedingly
concise code, while I agree that a re-implementation in PHP is probably
sensible, I'll stubbornly maintain that the existing OCaml is more
suited to the task. (Oh, and it seems I misread that proposal; I could
not imagine a language other than LaTeX being useful for doing maths :p).

While re-implementing the syntax whitelister would not be too hard,
LaTeX, with its wonderfully redefinable syntax, is incredibly
dangerous. Have fun, and be careful!
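
A tiny illustration of that danger (my example, not from texvc's test
suite): once macro definitions get through the whitelist, a user can
make the renderer loop forever.

```latex
% If \def is allowed through the whitelist, this input never
% finishes expanding -- the macro rewrites itself to itself.
\def\bad{\bad}
\bad
```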

Conrad

http://tug.ctan.org/cgi-bin/ctanPackageInformation.py?id=xii



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Roan Kattouw
2010/3/23 Aryeh Gregor :
>> I've never used PHP for real programming, but how difficult would it be
>> to write a really simple, stupid first pass at a DFA parser? I suspect
>> I'd need much more than three months to make it useful, but would it be
>> possible to implement some coherent subset of the features? E.g.,
>> building the LR0 automaton, at least?
>
> I don't think you'd need a "real" parser here.  Mostly we just use
> preg_split() for this sort of thing.  I'm not familiar with formal
> grammars and such, so I can't say what the concrete disadvantages of
> that approach are.
>
DFAs parse regular languages, which means those languages can also be
expressed as regexes. In fact, the regexes accepted by the preg_*()
functions allow certain extensions to the language theory definition
of regular expressions, allowing them to describe certain non-regular
languages as well. In short: preg_split() can do everything a DFA can
do, and more. The only reason to use a DFA parser would be
performance, but since the preg_*() functions are so heavily optimized
I don't think that'll be an issue.
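
The same kind of extension exists in Python's re module, for what it's
worth (an illustrative aside of mine, not from the thread).
Backreferences alone already take "regexes" beyond the regular
languages: a^n b a^n fails the pumping lemma, yet one pattern accepts
exactly that set.

```python
import re

# { a^n b a^n : n >= 1 } is a textbook non-regular language, but a
# backreference matches it anyway -- so preg-style "regexes" are
# strictly more powerful than true regular expressions.
PATTERN = re.compile(r'^(a+)b\1$')

def in_language(s: str) -> bool:
    return PATTERN.match(s) is not None
```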

>> I suggested a Python port because
>>    http://www.mediawiki.org/wiki/Summer_of_Code_2010#MediaWiki_core
>> lists it as a potential project idea. I was under the impression that
>> people around here did not want to leave texvc in OCaml. Is this wrong?
>
> No, it's right.  Conrad is crazy.  :P
>
Having it in a language no one understands is a bad thing and leads to
maintenance not happening, so yeah, we definitely want it rewritten in
PHP. If the PHP implementation turns out to be too slow to run on WMF,
for instance, we could do a C++ port à la wikidiff2 (a C++ port of our
ludicrously slow PHP diff implementation).

Roan Kattouw (Catrope)



Re: [Wikitech-l] GSoC project advice: port texvc to Python?

2010-03-23 Thread Aryeh Gregor
On Tue, Mar 23, 2010 at 4:06 AM, Damon Wang  wrote:
> I'm interested in porting texvc to Python, and I was hoping this list
> here might help me hash out the plan. Please let me know if I should
> take my questions elsewhere.

Python is much better than OCaml, and I prefer Python to PHP, but a
PHP implementation would be preferable for core IMO.  Not all
MediaWiki developers know Python, but all obviously know PHP.  If you
did a Python implementation, though, then at least someone could
translate it to PHP pretty easily.

> 1. Collect test cases and write a testing script
> Thanks to avar from #wikimedia, I already have the ... bits
> from enwiki and dewiki. I would also construct some simpler ones by hand
> to test each of the acceptable LaTeX commands.
>
> Would there be any possibility of logging the input seen by texvc on a
> production instance of Mediawiki, so I could get some invalid input
> submitted by actual users?
>
> This could also be useful to future maintainers for regression testing.

If you have a Unix box handy, it's pretty easy to install MediaWiki
with math support so you can test yourself.  sudo apt-get install
mediawiki mediawiki-math should do it on anything Debian-based, for
example.

> 2. Implement an AMS-TeX validator
> I'll probably use PLY because it's rumored to have helpful debugging
> features (designed for a first-year compilers class, apparently). ANTLR
> is another popular option, but this guy
>    http://www.bearcave.com/software/antlr/antlr_expr.html
> thinks it's complicated and hard to debug. I've never used either, so if
> anyone on this list knows of a good Python parsing package I'd welcome
> suggestions.

If it's in PHP, you'd probably have to write a parser yourself, but
LaTeX is pretty easy to parse, I'd think.

> 4. Add HTML rendering to texvc and test script
> I don't even understand how the existing texvc decides whether HTML is
> good enough. It looks like the original programmer just decreed that
> certain LaTeX commands could be rendered to HTML, and defaults to PNG if
> it sees anything not on that list. How important is this feature?

Fairly important, IMO, if the goal is to replace texvc, although not
critical.  A lone x shouldn't render as a PNG -- that's silly.

> Python doesn't have parsing just locked right down the way C does with
> flex/bison, but there are some good options, I have the most experience
> with it, and I think I'd be able to complete the port faster in Python
> than in either of the other languages. I was tempted at first to port to
> PHP, to conform with the rest of Mediawiki, but there don't seem to be
> any good parsing packages for PHP. (Please tell me if that's wrong.)

Would it really be very hard to write a LaTeX parser in PHP?  I'd
think it could be done easily, if you permit only a carefully-selected
subset.  I don't think you'd need any parser theory, just use
preg_split() and loop through all the tokens.
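
A sketch of that split-and-loop approach, in Python rather than PHP and
with a toy whitelist of my own (texvc's real command list is much
longer): one regex tokenizes, and the loop rejects any command not
explicitly allowed.

```python
import re

# Toy whitelist validator: one regex splits the input into tokens
# (commands, escaped characters, single characters), then the loop
# refuses anything backslash-prefixed that isn't on the allow list.
TOKEN = re.compile(r'\\[A-Za-z]+|\\.|[^\\]')
ALLOWED = {r'\alpha', r'\frac', r'\sqrt', r'\\'}

def validate(tex: str) -> bool:
    for tok in TOKEN.findall(tex):
        if tok.startswith('\\') and tok not in ALLOWED:
            return False   # unknown command: refuse to render
    return True
```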

> I'd appreciate any advice or criticism. Since my only previous
> experience has been using Wikipedia and setting up a test Mediawiki
> instance for my ACM chapter, I'm only just now learning my way around
> the code base and it's not always evident why things were done as they
> are. Does this look like a reasonable and worthwhile project?

Rewriting texvc in PHP would be a nice project to have, which is small
enough in scope that I'm optimistic that it could be done in a summer.
I'd say it's a good choice.

On Tue, Mar 23, 2010 at 6:23 AM, Conrad Irwin
 wrote:
> I am not too fussed about the HTML output, though I can't speak for
> everyone, at the moment it seems that many more of the Unicode
> characters should be let through (at least at some level of HTML),
> though I don't know enough about worldwide unicode support.

I suspect we need to be about as conservative as we currently are for
platforms like IE6 on XP.  We should be able to expand the range of
HTML characters in the future, though.

> A good PHP parser library would be exceptionally useful for MediaWiki
> (and many extensions), at the moment we have loads of methods that do
> regex "parsing", so if you felt like writing one... :D.

Wouldn't a real generic parser implementation written in PHP be too
slow to be useful?  preg_replace() has the advantage of being
implemented in C.

> I am
> less convinced of the utility of a Python port, OCaml is a great
> language for implementing this, and I fear a lot of your time would be
> wasted trying to make the Python similarly nice. As you note, MediaWiki
> is not written in Python, doing this in PHP would be a larger step in
> the right direction, though without such nice frameworks, maybe less
> nice to do.

OCaml might be a great language for implementing this, but very few of
us understand it.  texvc has been totally unmaintained for years,
other than new things being added to the whitelist sometimes by means
of cargo-culting what previous commits do.  Rewriting texvc in
*anything* that more people understand would 
