Re: [Mediawiki-l] Wikitext grammar

2010-08-09 Thread BPJ
2010-08-07 20:24, lmhelp wrote: > >> So why not use the "real" parser? > > Exactly. Where can it be found, please? > > Thanks and all the best, > -- > Lmhelp Fetch the HTML from wikipedia.org with something like wget (playing nicely and using delays!) and then extract the first element with somet

Re: [Mediawiki-l] Wikitext grammar

2010-08-09 Thread Léa Massiot
OK, I can answer my own question: no. It doesn't depend on the Wikipedia language. -- Lmhelp On 8/9/2010 9:19 AM, lmhelp2 wrote: > > Hi Axel, > > Thank you for your answer. > > I am wondering... how do you explain that the two templates > "{{Guil|'''parti philosophique'''}}" and "{{s-

Re: [Mediawiki-l] Wikitext grammar

2010-08-09 Thread Léa Massiot
Hi Magnus, This would be really great if I could do that! Where can I download the "real" parser? Can I use it in the following way: => let's suppose: - the parser's name is "wiki_to_html_parser", - I have a "Wikipedia" article in its "Wikitext" version "article.wikitext", - I want

Re: [Mediawiki-l] Wikitext grammar

2010-08-09 Thread lmhelp2
Hi Axel, Thank you for your answer. I am wondering... how do you explain that the two templates "{{Guil|'''parti philosophique'''}}" and "{{s-|XVIII|e|}}" in my example are not processed correctly (by default) (*)? Is it because "Bliki" works correctly with English "wiki" articles and not with,

Re: [Mediawiki-l] Wikitext grammar

2010-08-08 Thread Axel
On Sun, Aug 8, 2010 at 9:49 PM, lmhelp wrote: > > Hi, > > I have abandoned "Bliki" because look what happened: > > Here is what I gave to "Bliki" as an input: > --- > Le {{Guil|'''parti philosophique'''}} désignait globa

Re: [Mediawiki-l] Wikitext grammar

2010-08-08 Thread lmhelp
Hi, I have abandoned "Bliki" because look what happened: Here is what I gave to "Bliki" as an input: --- Le {{Guil|'''parti philosophique'''}} désignait globalement au {{s-|XVIII|e|}}, en [[France]], les intellectuels

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread lmhelp
> So why not use the "real" parser? Exactly. Where can it be found, please? Thanks and all the best, -- Lmhelp -- View this message in context: http://old.nabble.com/Wikitext-grammar-tp29350471p29376156.html Sent from the WikiMedia General mailing list archive at Nabble.com.

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread lmhelp
>- > mwlib was written in conjunction with the WMF, and IIRC had at least some > input from Brion Vibber. It's high quality and works well. There is a 2-3 > hour learning curve for navigating the python modules and methods

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread Magnus Manske
So why not use the "real" parser? * Get rendered HTML page * Extract * Take the first element in there Profit! Magnus On Sat, Aug 7, 2010 at 6:19 PM, Brian J Mingus wrote: > On Sat, Aug 7, 2010 at 10:54 AM, lmhelp wrote: > >> >> Hi, >> >> Thank you for your answer. >> >> > mwlib is the bes
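The approach Magnus outlines (work from the rendered HTML rather than from the Wikitext) can be sketched as follows. This is only an illustration under stated assumptions: the snippet above has lost the names of the elements to extract, so the sketch assumes the target is the first non-empty `<p>` element, and `first_paragraph` and `FirstParagraph` are hypothetical names. Fetching the page (with polite delays) is left out.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text of the first <p> element that has visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_p = False   # currently inside a <p> element
        self.done = False   # a non-empty paragraph has been captured
        self.parts = []     # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            # Skip empty paragraphs and keep looking.
            if "".join(self.parts).strip():
                self.done = True

    def handle_data(self, data):
        if self.in_p and not self.done:
            self.parts.append(data)


def first_paragraph(html):
    """Return the plain text of the first non-empty paragraph, or ''."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    if not parser.done:
        return ""
    # Normalise runs of whitespace left over from the markup.
    return " ".join("".join(parser.parts).split())
```

Because the HTML has already been through the real parser, templates such as `{{Guil|...}}` arrive expanded, which is the point of this route.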

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread Brian J Mingus
On Sat, Aug 7, 2010 at 10:54 AM, lmhelp wrote: > > Hi, > > Thank you for your answer. > > > mwlib is the best parser available for folks who want to do a quick job > > such > > as yours. > > Maybe it is, I don't know... > I know (since recently) it is not an easy task constructing a parser for >

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread lmhelp
Hi, Thank you for your answer. > mwlib is the best parser available for folks who want to do a quick job > such > as yours. Maybe it is, I don't know... I know (since recently) it is not an easy task constructing a parser for "Wikitext"... but, fairly, it is not really satisfactory to have {{l

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread Brian J Mingus
On Sat, Aug 7, 2010 at 9:21 AM, lmhelp wrote: > > MY FIRST QUESTION IS: > = > I was wondering if you knew a better tool than this one... one which > wouldn't "miss" some "Wikitext" chunks of code like in the above > example (or maybe which at least would handle usual templates

Re: [Mediawiki-l] Wikitext grammar

2010-08-07 Thread lmhelp
Thank you all for your contribs :). Hi, So... I was over-optimistic about managing to extract the first paragraph of a "Wikipedia" article out of its "Wikitext" easily... Yet, I managed (1) for instance (for the "Wikipedia" article "Čokot") to get the following "Wikitext" sentence:

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread David Gerard
On 6 August 2010 18:59, Trevor Parscal wrote: > In short, the current "parser" is a bad example of how to write a > parser, I forgot to call it "a box of pure malevolent evil, a purveyor of insidious insanity, an eldritch manifestation that would make Bill Gates let out a low whistle of admirat

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Trevor Parscal
The current "parser" is, as David Gerard said, not much of a parser by any conventional definition. It's more of a macro-expander (for parser tags and templates) and a series of mostly-regular-expression-based replacement routines, which result in partially valid HTML which is then repaired i

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Brian J Mingus
On Fri, Aug 6, 2010 at 10:18 AM, Léa Massiot wrote: > Are you sure this will be able to extract the > introductory paragraph (only) which is not in any > section... (because it is not trivial). > > There is only one example I could find at > http://code.pediapress.com/wiki/wiki/mwlib > ... which

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Léa Massiot
Are you sure this will be able to extract the introductory paragraph (only) which is not in any section... (because it is not trivial). There is only one example I could find at http://code.pediapress.com/wiki/wiki/mwlib ... which is not so easy to understand by the way... Cheers, -- Lmhelp O

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Brian J Mingus
On Fri, Aug 6, 2010 at 10:06 AM, Léa Massiot wrote: > A colleague told me about that... so we had a look at it. > Unfortunately, abstracts are not correct most of the time... > > - > Example (in French): > ---

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Léa Massiot
A colleague told me about that... so we had a look at it. Unfortunately, abstracts are not correct most of the time... - Example (in French): - Wikipédia : Arabie saoudit

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Brian J Mingus
On Wed, Aug 4, 2010 at 1:45 PM, lmhelp wrote: > > > I need to extract automatically the first paragraph of a Wiki article... > > See "Extracted page abstracts for Yahoo": http://download.wikimedia.org/enwiki/20100730/

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread Magnus Manske
Also ignore lines starting with "#", ":", " " (space), or ";" . Then there are (potentially nested) tables, which start with a line beginning with "{|" and end in a line beginning with "|}". There are more "magic words" with the general pattern "__SOMEUPPERCASECHARACTERS__", IIRC. Note that some
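The line-level rules listed in this message can be sketched as a small filter. This is a rough illustration, not a parser: the function name `candidate_lines` is made up, the magic-word pattern follows the "IIRC" description above (double underscores around uppercase letters), and tables are assumed to open and close only at the start of a line, as the message states.

```python
import re

# Assumed pattern for magic words such as __TOC__ or __NOTOC__.
MAGIC_WORD = re.compile(r"__[A-Z]+__")
# Line prefixes the message says to ignore.
SKIP_PREFIXES = ("#", ":", " ", ";")


def candidate_lines(wikitext):
    """Yield lines that survive the line-level rules described above."""
    table_depth = 0
    for line in wikitext.splitlines():
        if line.startswith("{|"):      # table start, possibly nested
            table_depth += 1
            continue
        if table_depth:
            if line.startswith("|}"):  # table end
                table_depth -= 1
            continue                   # drop everything inside a table
        line = MAGIC_WORD.sub("", line)
        if not line.strip() or line.startswith(SKIP_PREFIXES):
            continue
        yield line
```

The first line yielded is only a candidate for the first paragraph; templates and links inside it still have to be dealt with separately.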

Re: [Mediawiki-l] Wikitext grammar

2010-08-06 Thread nevio carlos de alarcão
If you only need to extract the first paragraph of Wikipedia articles, no problem. 2010/8/6 Katharina Wolkwitz > Hi, > > On 05.08.2010 at 16:47, lmhelp2 wrote: > > > > Thank you! > > > > So here is the list I have for the moment: > > I need to ignore lines: > > - containing: {{...}} > > => pos

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Katharina Wolkwitz
Hi, On 05.08.2010 at 16:47, lmhelp2 wrote: > > Thank you! > > So here is the list I have for the moment: > I need to ignore lines: > - containing: {{...}} > => possibly spreading over several lines, > => being possibly nested {{... {{ ... }} ... }}. > - containing: [[...]] >

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread lmhelp2
Thank you! So here is the list I have for the moment: I need to ignore lines: - containing: {{...}} => possibly spreading over several lines, => being possibly nested {{... {{ ... }} ... }}. - containing: [[...]] => being possibly nested [[... [[ ... ]] ... ]]. - eq
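Because `{{...}}` and `[[...]]` can nest and span several lines, a single regular expression is not enough; a depth counter is the usual workaround. The following is a minimal sketch under that assumption, with a hypothetical helper name `strip_delimited`. It removes the delimited spans outright, so it also discards the text inside inline templates such as `{{Guil|...}}`, and it does not handle triple braces.

```python
def strip_delimited(text, opener="{{", closer="}}"):
    """Remove opener...closer spans, including nested and multi-line ones."""
    out = []
    depth = 0  # how many openers are currently unclosed
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            depth += 1
            i += len(opener)
        elif depth and text.startswith(closer, i):
            depth -= 1
            i += len(closer)
        else:
            if depth == 0:          # keep only text outside any span
                out.append(text[i])
            i += 1
    return "".join(out)
```

The same function covers nested links when called with `"[["` and `"]]"`, although for links one would normally want to keep the label rather than drop the whole span.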

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Katharina Wolkwitz
Hi, there might be an occurrence of __TOC__ or __NOTOC__ before the first "real" paragraph. Good luck with finding all exceptions. :) Katharina On 05.08.2010 at 14:10, lmhelp2 wrote: > > Hi, > > Thanks to all of you for your answers. > > I have decided (in the light of what you told me) > to rea

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Magnus Manske
On Thu, Aug 5, 2010 at 1:10 PM, lmhelp2 wrote: > > Hi, > > Thanks to all of you for your answers. > > I have decided (in the light of what you told me) > to read the "Wikitext" line after line. > > I must "ignore" leading: > - templates (including the ones which span >  over several consecutive li

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread lmhelp2
Hi, Thanks to all of you for your answers. I have decided (in the light of what you told me) to read the "Wikitext" line after line. I must "ignore" leading: - templates (including the ones which span over several consecutive lines like "infoboxes"): {{...}}, - isolated internal links: [[...]

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Edward Swing
media.org] On Behalf Of lmhelp Sent: Wednesday, August 04, 2010 3:45 PM To: mediawiki-l@lists.wikimedia.org Subject: [Mediawiki-l] Wikitext grammar Hi, Thank you for reading my post. I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext" language (

Re: [Mediawiki-l] Wikitext grammar

2010-08-05 Thread Scheid, Bernhard
@lists.wikimedia.org Subject: [Mediawiki-l] Wikitext grammar Hi, Thank you for reading my post. I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext" language (or an *exhaustive* (and formal) set of rules about how is constructed a "Wikitext")

Re: [Mediawiki-l] Wikitext grammar

2010-08-04 Thread David Gerard
On 4 August 2010 23:58, David Gerard wrote: > On 4 August 2010 20:45, lmhelp wrote: >> - Is a grammar available somewhere? >> - Do you have any idea how to extract the first paragraph of a Wiki article? >> - Any advice? >> - Does a Java "Wikitext" "parser" exist which would do it? > If anyone e

Re: [Mediawiki-l] Wikitext grammar

2010-08-04 Thread David Gerard
On 4 August 2010 20:45, lmhelp wrote: > I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext" > language (or an *exhaustive* (and formal) set of rules about how is > constructed > a "Wikitext"). > I've looked for such a grammar/set of rules on the Web but I couldn't find > one.

Re: [Mediawiki-l] Wikitext grammar

2010-08-04 Thread Platonides
lmhelp wrote: > > Hi, > > Thank you for reading my post. > > I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext" > language (or an *exhaustive* (and formal) set of rules about how is > constructed > a "Wikitext"). > I've looked for such a grammar/set of rules on the Web but

[Mediawiki-l] Wikitext grammar

2010-08-04 Thread lmhelp
Hi, Thank you for reading my post. I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext" language (or an *exhaustive* (and formal) set of rules about how is constructed a "Wikitext"). I've looked for such a grammar/set of rules on the Web but I couldn't find one... I need to