[Wikitech-l] Call for participation in OpenSym 2015, Aug 19-20, San Francisco!
Call for participation in OpenSym 2015! Aug 19-20, 2015, San Francisco, http://opensym.org

FOUR FANTASTIC KEYNOTES

- Richard Gabriel (IBM) on Using Machines to Manage Public Sentiment on Social Media
- Peter Norvig (Google) on Applying Machine Learning to Programs
- Robert Glushko (UC Berkeley) on Collaborative Authoring, Evolution, and Personalization
- Anthony Wassermann (CMU SV) on Barriers and Pathways to Successful Collaboration

More at http://www.opensym.org/category/conference-contributions/keynotes-invited-talks/

GREAT RESEARCH PROGRAM

All core open collaboration tracks, including
- free/libre/open source
- open data
- Wikipedia
- wikis and open collaboration, and
- open innovation

More at http://www.opensym.org/2015/06/25/preliminary-opensym-2015-program-announced/

INCLUDING OPEN SPACE

The facilities provide room and space for your own working groups.

AT A WONDERFUL LOCATION

OpenSym 2015 takes place Aug 19-20 at the Golden Gate Club of San Francisco, smack in the middle of the Presidio, with a wonderful view of the Golden Gate Bridge.

More at http://www.opensym.org/os2015/location/

REGISTRATION

Registration is simple, subsidized, and all-encompassing. Find it here: http://www.opensym.org/os2015/registration/

Prices will go up after July 12th, so be sure to register early!

We would like to thank our sponsors: the Wikimedia Foundation, Google, TJEF, and the ACM.
[Wikitech-l] WikiSym + OpenSym 2013: Less than 2 weeks for Community Track Submissions
the demo, a specific description of what you plan to demo, what you hope to get out of demoing, and how the audience will benefit. A short note of any special technical requirements should be included. Demo submissions will be reviewed based on their relevance to the community. All accepted demos will be given space at a joint demo session (90 minutes) during the conference.

Tutorials

Tutorials are half-day classes, taught by experts, designed to help professionals rapidly come up to speed on a specific technology or methodology. Tutorials can be lecture-oriented or participatory. Tutorial attendees deserve the highest standard of excellence in tutorial preparation and delivery. Tutorial presenters are typically experts in their chosen topic and experienced speakers skilled in preparing and delivering educational presentations. When selecting tutorials, we will consider the presenters' knowledge of the proposed topic and past success at teaching it.

SUBMISSION INFORMATION AND INSTRUCTIONS

There are two submission deadlines, an early and a regular one. The early deadline is for those who need to know early that their community track submission has been accepted. This mostly applies to workshops that require a program committee and their own paper submission and review process (as opposed, for example, to walk-in workshops). Also, some may need the additional time to raise funds and acquire a visa.

Submissions should follow the standard ACM SIG proceedings format. For advice and templates, please see http://www.acm.org/sigs/publications/proceedings-templates. All papers must conform at time of submission to the formatting instructions and must not exceed the page limits, including all text, references, appendices, and figures. All submissions must be in PDF format.

All papers and proposals should be submitted electronically through EasyChair using the following URL: https://www.easychair.org/conferences/?conf=opensym2013community

SUBMISSION AND NOTIFICATION DEADLINES

* Early submission deadline: March 17, 2013
* Notification for early submissions: March 31, 2013
* Regular submission deadline: May 17, 2013
* Notification for regular submissions: May 31, 2013
* Camera-ready for both rounds: June 9, 2013

As long as it is May 17 somewhere on earth, your submission will be accepted.

COMMUNITY TRACK PROGRAM COMMITTEE

Chairs
Regis Barondeau (Université du Québec à Montréal)
Dirk Riehle (Friedrich-Alexander University Erlangen-Nürnberg)

--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550
Re: [Wikitech-l] programmatically extracting lists from list pages on Wikipedia
Try the Sweble parser for extracting structured data from Wikitext: http://sweble.org

http://dirkriehle.com, +49 157 8153 4150, +1 650 450 8550

On Nov 22, 2011 9:35 PM, "Fred Zimmerman" wrote:
> hi,
>
> I want to programmatically extract lists from list pages on Wikipedia. That
> is to say, if there is a page that mostly consists of a list (list of
> episodes, list of presidents, etc.) I want to be able to extract the list
> from the page, with article names/links. Has anyone already done this? Can
> anyone suggest a good strategy?
>
> FredZ
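[Since the question asks for a concrete strategy, here is a minimal, untested Java sketch of the suggested approach: parse a page with Sweble and collect the internal links that occur inside list items. The package and class names (DefaultConfigEnWp, WtEngineImpl, WtInternalLink, and so on) follow Sweble's published example code for a later release and should be treated as assumptions, not as the exact API of the version current at the time of this thread.]

    // Sketch: extract internal links found inside list items of a page,
    // using Sweble. Class/package names are assumptions (see note above).
    import org.sweble.wikitext.engine.PageId;
    import org.sweble.wikitext.engine.PageTitle;
    import org.sweble.wikitext.engine.WtEngineImpl;
    import org.sweble.wikitext.engine.config.WikiConfig;
    import org.sweble.wikitext.engine.nodes.EngProcessedPage;
    import org.sweble.wikitext.engine.utils.DefaultConfigEnWp;
    import org.sweble.wikitext.parser.nodes.WtInternalLink;
    import org.sweble.wikitext.parser.nodes.WtListItem;
    import org.sweble.wikitext.parser.nodes.WtNode;

    public class ListExtractor
    {
        public static void main(String[] args) throws Exception
        {
            String wikitext = "* [[First Article]]\n* [[Second Article]]\n";

            // Standard Sweble setup: an English-Wikipedia config and an engine.
            WikiConfig config = DefaultConfigEnWp.generate();
            WtEngineImpl engine = new WtEngineImpl(config);
            PageTitle title = PageTitle.make(config, "List of examples");
            EngProcessedPage page =
                    engine.postprocess(new PageId(title, -1), wikitext, null);

            walk(page.getPage(), false);
        }

        // Sweble AST nodes are iterable over their children, so a plain
        // recursive walk suffices: remember when we are inside a list item
        // and report every internal link found there. getTarget() returns a
        // page-name node; rendering it to a clean title string is version
        // specific and left out of this sketch.
        private static void walk(WtNode node, boolean inListItem)
        {
            if (inListItem && node instanceof WtInternalLink)
                System.out.println(((WtInternalLink) node).getTarget());

            boolean inside = inListItem || node instanceof WtListItem;
            for (WtNode child : node)
                walk(child, inside);
        }
    }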
Re: [Wikitech-l] Announcing Wikihadoop: using Hadoop to analyze Wikipedia dump files
Hello everyone!

Wikihadoop sounds like a great project! I wanted to point out that you can make it even more powerful for many research applications by combining it with the Sweble Wikitext parser. Doing so, you could enable Wikipedia dump processing not only at the raw XML dump level, but at the level of fine-grained individual elements (bold text, headings, paragraphs, categories, pages, etc.).

You can learn more about Sweble here: http://sweble.org

Cheers,
Dirk

On 08/17/2011 06:58 PM, Diederik van Liere wrote:
> Hello!
>
> Over the last few weeks, Yusuke Matsubara, Shawn Walker, Aaron Halfaker and
> Fabian Kaelin (who are all Summer of Research fellows)[0] have worked hard
> on a customized stream-based InputFormatReader that allows parsing of both
> bz2 compressed and uncompressed files of the full Wikipedia dump (dump file
> with the complete edit histories) using Hadoop. Prior to WikiHadoop and the
> accompanying InputFormatReader it was not possible to use Hadoop to analyze
> the full Wikipedia dump files (see the detailed tutorial / background for an
> explanation why that was not possible).
>
> This means:
> 1) We can now harness Hadoop's distributed computing capabilities in
> analyzing the full dump files.
> 2) You can send either one or two revisions to a single mapper so it's
> possible to diff two revisions and see what content has been added /
> removed.
> 3) You can exclude namespaces by supplying a regular expression.
> 4) We are using Hadoop's Streaming interface which means people can use this
> InputFormat Reader using different languages such as Java, Python, Ruby and
> PHP.
>
> The source code is available at: https://github.com/whym/wikihadoop
> A more detailed tutorial and installation guide is available at:
> https://github.com/whym/wikihadoop/wiki
>
> (Apologies for cross-posting to wikitech-l and wiki-research-l)
>
> [0] http://blog.wikimedia.org/2011/06/01/summerofresearchannouncement/
>
> Best,
>
> Diederik

--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550
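[To make point 4 of the announcement concrete: a Hadoop Streaming mapper is just a program that reads records from stdin and writes key<TAB>value lines to stdout, which is why any language works. The sketch below, in Java, counts revisions per page title. It assumes, purely for illustration and not as WikiHadoop's documented record format, that each record arrives as a single line carrying the page's <title> element alongside the revision.]

    // Hypothetical Hadoop Streaming mapper: reads one dump record per line
    // from stdin, emits "title<TAB>1" so a reducer can sum revisions per page.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RevisionCountMapper
    {
        private static final Pattern TITLE =
                Pattern.compile("<title>(.*?)</title>");

        public static void main(String[] args) throws Exception
        {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(System.in, StandardCharsets.UTF_8));
            String line;
            while ((line = in.readLine()) != null)
            {
                // Hadoop Streaming delivers each record as a line and reads
                // key<TAB>value pairs back from stdout for the shuffle phase.
                Matcher m = TITLE.matcher(line);
                if (m.find())
                    System.out.println(m.group(1) + "\t1");
            }
        }
    }

[A reducer would then sum the 1s per title; since only stdin and stdout are involved, Python, Ruby, or PHP could replace Java here line for line.]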
Re: [Wikitech-l] WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)
On 05/03/2011 08:28 PM, Neil Harris wrote:
> On 03/05/11 19:44, MZMcBride wrote:
...
>> The point is that the wikitext and its parsing should be completely separate
>> from MediaWiki/PHP/HipHop/Zend.
>>
>> I think some of the bigger picture is getting lost here. Wikimedia produces
>> XML dumps that contain wikitext. For most people, this is the only way to
>> obtain and reuse large amounts of content from Wikimedia wikis (especially
>> as the HTML dumps haven't been re-created since 2008). There needs to be a
>> way for others to be able to very easily deal with this content.
>>
>> Many people have suggested (with good reason) that this means that wikitext
>> parsing needs to be reproducible in other programming languages. While
>> HipHop may be the best thing since sliced bread, I've yet to see anyone put
>> forward a compelling reason that the current state of affairs is acceptable.
>> Saying "well, it'll soon be much faster for MediaWiki to parse" doesn't
>> overcome the legitimate issues that re-users have (such as programming in a
>> language other than PHP, banish the thought).
>>
>> For me, the idea that all that's needed is a faster parser in PHP is a
>> complete non-starter.
>>
>> MZMcBride
>
> I agree completely.
>
> I think it cannot be emphasized enough that what's valuable about
> Wikipedia and other similar wikis is the hard-won _content_, not the
> software used to write and display it at any given moment, which is merely a
> means to that end.
>
> Fashions in programming languages and data formats come and go, but the
> person-centuries of writing effort already embodied in MediaWiki's
> wikitext format needs to have a much longer lifespan: having a
> well-defined syntax for its current wikitext format will allow the
> content itself to continue to be maintained for the long term, beyond
> the restrictions of its current software or encoding format.
>
> -- Neil

+1 to both MZMcBride and Neil. So relieved to see things put so eloquently.

Dirk

--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550
Re: [Wikitech-l] Announcing the Open Source Sweble Wikitext Parser v1.0
>>> You should identify whether you mean "MediaWikitext", or some other
>>> dialect -- MediaWiki Is Not The Only Wiki...
>>>
>>> and you should post to wikitext-l as well. The real parser maniacs
>>> hang out over there, even though traffic is low.
>>
>> It is MediaWiki's Wikitext; elsewhere it is usually called wiki
>> markup.
>
> Improperly and incompletely, perhaps, yes.
>
> I'm a MW partisan, and think it's better than nearly all its competitors,
> for nearly all uses... but even I try not to be *that* partisan.

Hmm, never viewed it that way. IMO, MediaWiki (developers) invented a wiki markup language and called it Wikitext; other engines just call it wiki markup or what not. For me, Wikitext always was the particular markup of MediaWiki, much like PHP or C++ are particular language names. Is there any other engine that calls its markup Wikitext? I'd be surprised. Even for WikiCreole (wikicreole.org) we used "wiki markup".

Cheers,
Dirk

--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550
Re: [Wikitech-l] Announcing the Open Source Sweble Wikitext Parser v1.0
> You should identify whether you mean "MediaWikitext", or some other
> dialect -- MediaWiki Is Not The Only Wiki...
>
> and you should post to wikitext-l as well. The real parser maniacs hang
> out over there, even though traffic is low.

It is MediaWiki's Wikitext; elsewhere it is usually called wiki markup.

Cheers,
Dirk

--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550
Re: [Wikitech-l] WikiCreole (was Re: What would be a perfect wiki syntax? (Re: WYSIWYG))
> As long as we're hung up on details of the markup syntax, it's going to be
> very very hard to make useful forward motion on things that are actually
> going to enhance the capabilities of the system and put creative power in
> the hands of the users.
>
> Forget about syntax -- what do we want to *accomplish*?

I think you got this sideways. The concrete syntax doesn't matter, but the abstract syntax does. Without a clear specification there can be no competing parsers, no interoperability, no decoupled APIs, no independently evolving components. (Abstract syntax here means an "XML representation", a structured representation, or a DOM tree, i.e. an abstract syntax tree. But for that you need a language specification, i.e. a Wikitext specification; a parser implementation, which is all we have today, doesn't do the job.)

> worrying about memorizing ASCII code points, it's let us go beyond
> fixed-width ASCII text (a monitor emulating a teletype, which was really a
> friendlier version of punch cards) to have things like _graphics_. Text can
> be in different sizes, different styles, and different languages. We can see
> pictures; we can draw pictures; we can use colors and shapes to create a far
> richer, more creative experience for the user.
>
> GUIs didn't come about from a better, more universal way of encoding text --
> Unicode came years after GUI conventions were largely standardized in
> practice.

In order to have a visual editor or three, combined with a plain text editor, combined with some fancy other editor we have yet to invent, you will still need that specification that tells you what a valid wiki instance is. This is the core data; only if you have a clear spec of that can you have tool and UI innovation on top of it.

Cheers,
Dirk

--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550
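[To illustrate what "abstract syntax" means in this argument, independent of any concrete markup: a deliberately tiny, hypothetical set of node types in Java. None of these types come from a real parser; they are purely illustrative of the shared vocabulary a specification would pin down.]

    // Hypothetical abstract syntax for a sliver of wikitext. Any parser,
    // visual editor, or exporter could agree on these node types without
    // ever agreeing on the concrete markup characters.
    import java.util.List;

    interface WikiNode {}

    record Text(String value) implements WikiNode {}
    record Bold(List<WikiNode> children) implements WikiNode {}
    record Heading(int level, List<WikiNode> children) implements WikiNode {}
    record Paragraph(List<WikiNode> children) implements WikiNode {}
    record InternalLink(String target, List<WikiNode> label) implements WikiNode {}

    // Both ''bold'' in one concrete syntax and **bold** in another map to
    // the same Bold node; tools built on WikiNode never see the difference.

[A plain-text editor, a visual editor, and an XML exporter could then all be written against such node types; whether the user typed apostrophes or clicked a toolbar button never reaches them.]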
Re: [Wikitech-l] WikiCreole (was Re: What would be a perfect wiki syntax? (Re: WYSIWYG))
>> (Note that I think any conversation about parser changes should consider
>> the GoodPractices page from http://www.wikicreole.org/wiki/GoodPractices.)
>>
>> If nothing else, perhaps there would be some use for the EBNF grammar
>> that was developed for WikiCreole.
>> http://dirkriehle.com/2008/01/09/an-ebnf-grammar-for-wiki-creole-10/
>
> WikiCreole used to not be parsable by a grammar, either. And it has
> inconsistencies like "italic is // unless it appears in a url".
> Good to see they improved.

WikiCreole originally only had a prose specification, and hence it was ambiguous. Our syntax definition improved on that, so that in theory (and in practice) you could now have multiple competing parser implementations. The issue with WikiCreole now is that it is simply too small: lots of stuff that it can't do but that any wiki engine will want.

The real reason to care about a precise specification (one that is not, as in the case of MediaWiki, simply the implementation) is the option to evolve faster. The real paper for this is http://dirkriehle.com/2008/07/19/a-grammar-for-standardized-wiki-markup/ - wouldn't it be nice if we could be innovating on a wiki platform?

Cheers,
Dirk

--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550
[Wikitech-l] Alternative MediaWiki (Parser) Implementations (was: Re: Extension of Wikitext)
One option might be to use one of the alternative MediaWiki (parser) implementations. I know of JAMWiki and Bliki. JAMWiki is a mostly complete Java implementation. The parser can be taken out and is reasonably well factored, based on a grammar for JFlex, a scanner generator (if I remember this correctly). Bliki is purely a MediaWiki parser implementation, not a full-blown wiki engine, also done in Java.

I'm generally interested in finding a non-PHP, well-factored MediaWiki syntax parser, ideally written in Java, that I can use for my own projects. Are there new alternatives; does anyone have opinions/insights into the state of the tools mentioned above? It seems pretty tough to track the MediaWiki syntax...

Cheers,
Dirk

On Mon, Nov 17, 2008 at 5:34 AM, Alex Bernier <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I hope it is the right place to ask my question...
>
> I work on a "collaborative correction of books" project. I know there are
> already some projects related to this subject, like Wikisource. The main
> difference between my project and Wikisource is that my books are stored
> in text using DAISY (see http://www.daisy.org/), a format based on XML. I
> have some questions:
>
> 1) Are there tools to import XML files into a Wiki?
>
> 2) Are there tools to export a Wiki page to XML?
>
> 3) I will have to extend the Wikitext (I want to import DAISY XML files in
> my Wiki and export them from the Wiki to DAISY XML after correction,
> without losing information). I think it would be easy for the majority of
> the new tags I want to add, but it would be more difficult for some of them.
> For example, I need to improve the heading possibilities of the Wikitext.
> For the moment, it is limited to 5 levels. I need potentially infinite
> nesting, like this:
>
> Title 1
>   Title 2
>   ...
>     Title n
>
> Is it possible to add this kind of thing to the Wikitext? If yes, is it
> possible to do this with an extension, or is it necessary to make "low-level"
> modifications to the Wikitext parser?
>
> Best regards,
>
> Alex Bernier

--
Phone: +1 650 215 3459
Weblog: http://www.riehle.org
Twitter: http://twitter.com/driehle
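[For anyone evaluating the parsers mentioned at the top of this message, a minimal sketch of driving Bliki follows. The API shown, a WikiModel constructed with image and link URL templates plus a render method, follows Bliki's published examples; treat the exact signatures as assumptions to verify against the release you actually use.]

    // Sketch: render MediaWiki markup to HTML with the Bliki parser.
    import info.bliki.wiki.model.WikiModel;

    public class BlikiDemo
    {
        public static void main(String[] args) throws Exception
        {
            // The two constructor arguments are URL templates used when
            // Bliki expands image references and internal page links.
            WikiModel model = new WikiModel(
                    "http://en.wikipedia.org/wiki/${image}",
                    "http://en.wikipedia.org/wiki/${title}");

            String html = model.render("== Heading ==\nA [[Hello World]] link.");
            System.out.println(html);
        }
    }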