On 2010-08-07 20:24, lmhelp wrote:
>
>> So why not use the "real" parser?
>
> Exactly. Where can it be found, please?
>
> Thanks and all the best,
> --
> Lmhelp
fetch the html from wikipedia.org with something like wget
(playing nicely and using delays!) and then extract the
first element with somet
OK, I can answer my own question: the answer is no.
It doesn't depend on the Wikipedia language.
--
Lmhelp
On 8/9/2010 9:19 AM, lmhelp2 wrote:
>
> Hi Axel,
>
> Thank you for your answer.
>
> I am wondering... how do you explain that the two templates
> "{{Guil|'''parti philosophique'''}}" and "{{s-
Hi Magnus,
This would be really great if I could do that!
Where can I download the "real" parser?
Can I use it in the following way:
=> let's suppose:
- the parser's name is "wiki_to_html_parser",
- I have a "Wikipedia" article in its "Wikitext" version
"article.wikitext",
- I want
Hi Axel,
Thank you for your answer.
I am wondering... how do you explain that the two templates
"{{Guil|'''parti philosophique'''}}" and "{{s-|XVIII|e|}}"
in my example are not processed correctly (by default) (*)?
Is it because "Bliki" works correctly with English
"wiki" articles and not with,
On Sun, Aug 8, 2010 at 9:49 PM, lmhelp wrote:
>
> Hi,
>
> I have abandoned "Bliki" because of what happened:
>
> Here is what I gave to "Bliki" as an input:
> ---
> Le {{Guil|'''parti philosophique'''}} désignait globa
Hi,
I have abandoned "Bliki" because of what happened:
Here is what I gave to "Bliki" as an input:
---
Le {{Guil|'''parti philosophique'''}} désignait globalement au
{{s-|XVIII|e|}},
en [[France]], les intellectuels
> So why not use the "real" parser?
Exactly. Where can it be found, please?
Thanks and all the best,
--
Lmhelp
--
View this message in context:
http://old.nabble.com/Wikitext-grammar-tp29350471p29376156.html
Sent from the WikiMedia General mailing list archive at Nabble.com.
>-
> mwlib was written in conjunction with the WMF, and IIRC had at least some
> input from Brion Vibber. It's high quality and works well. There is a 2-3
> hour learning curve for navigating the python modules and methods
So why not use the "real" parser?
* Get rendered HTML page
* Extract
* Take the first element in there
Profit!
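Magnus's three steps can be sketched in Python using only the standard library. This is a minimal illustration of the "take the first element of the rendered HTML" idea, not mwlib or Bliki; it assumes the rendered page HTML has already been fetched (e.g. with wget, politely), and the function name `first_paragraph` is my own:

```python
from html.parser import HTMLParser

class _FirstParagraph(HTMLParser):
    """Collect the text content of the first <p> element in a page."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # are we inside the first <p>?
        self.done = False   # set once that <p> closes
        self.text = []

    def handle_starttag(self, tag, attrs):
        if not self.done and tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.text.append(data)

def first_paragraph(html):
    """Return the plain text of the first <p> in the rendered HTML."""
    p = _FirstParagraph()
    p.feed(html)
    return "".join(p.text).strip()
```

Real Wikipedia pages put navigation markup before the body text, so in practice you would restrict the scan to the content div; this only shows the extraction step.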
Magnus
On Sat, Aug 7, 2010 at 6:19 PM, Brian J Mingus
wrote:
> On Sat, Aug 7, 2010 at 10:54 AM, lmhelp wrote:
>
>>
>> Hi,
>>
>> Thank you for your answer.
>>
>> > mwlib is the bes
On Sat, Aug 7, 2010 at 10:54 AM, lmhelp wrote:
>
> Hi,
>
> Thank you for your answer.
>
> > mwlib is the best parser available for folks who want to do a quick job
> > such
> > as yours.
>
> Maybe it is, I don't know...
> I know (since recently) it is not an easy task constructing a parser for
>
Hi,
Thank you for your answer.
> mwlib is the best parser available for folks who want to do a quick job
> such
> as yours.
Maybe it is, I don't know...
I have recently learned that constructing a parser for
"Wikitext" is not an easy task...
but, frankly, it is not really satisfactory to have {{l
On Sat, Aug 7, 2010 at 9:21 AM, lmhelp wrote:
>
> MY FIRST QUESTION IS:
> =
> I was wondering if you knew a better tool than this one... one which
> wouldn't "miss" some "Wikitext" chunks of code like in the above
> example (or maybe which at least would handle usual templates
Thank you all for your contribs :).
Hi,
So... I was over-optimistic about managing to extract the first
paragraph of a "Wikipedia" article out of its "Wikitext" easily...
Still, for the "Wikipedia" article "Čokot", for instance,
I managed (1) to get the following "Wikitext" sentence:
On 6 August 2010 18:59, Trevor Parscal wrote:
> In short, the current "parser" is a bad example of how to write a
> parser,
I forgot to call it "a box of pure malevolent evil, a purveyor of
insidious insanity, an eldritch manifestation that would make Bill
Gates let out a low whistle of admirat
The current "parser" is, as David Gerard said, not much of a parser by
any conventional definition. It's more of a macro-expander (for parser
tags and templates) and a series of mostly-regular-expression-based
replacement routines, which result in partially valid HTML which is then
repaired i
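As a flavor of the "series of mostly-regular-expression-based replacement routines" described above, here is a deliberately tiny caricature in Python of one such pass, the bold/italic quote markup. This is not MediaWiki's actual code, only an illustration of the technique:

```python
import re

def quotes_pass(line):
    """One regex-replacement pass: rewrite '''bold''' and ''italic''
    wikitext quotes as HTML. Triple quotes must be handled first,
    or the double-quote rule would eat them."""
    line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)
    line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
    return line
```

MediaWiki's real quote handling is far more involved (it disambiguates unbalanced runs of apostrophes), which is exactly why the thread calls the result "partially valid HTML" that then needs repair.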
On Fri, Aug 6, 2010 at 10:18 AM, Léa Massiot wrote:
> Are you sure this will be able to extract the
> introductory paragraph (only) which is not in any
> section... (because it is not trivial).
>
> There is only one example I could find at
> http://code.pediapress.com/wiki/wiki/mwlib
> ... which
Are you sure this will be able to extract the
introductory paragraph (only) which is not in any
section... (because it is not trivial).
There is only one example I could find at
http://code.pediapress.com/wiki/wiki/mwlib
... which is not so easy to understand by the way...
Cheers,
--
Lmhelp
O
On Fri, Aug 6, 2010 at 10:06 AM, Léa Massiot wrote:
> A colleague told me about that... so we had a look at it.
> Unfortunately, abstracts are not correct most of the time...
>
> -
> Example (in French):
> ---
A colleague told me about that... so we had a look at it.
Unfortunately, abstracts are not correct most of the time...
-
Example (in French):
-
Wikipédia : Arabie saoudit
On Wed, Aug 4, 2010 at 1:45 PM, lmhelp wrote:
>
>
> I need to extract automatically the first paragraph of a Wiki article...
>
>
See Extracted page extracts for Yahoo:
http://download.wikimedia.org/enwiki/20100730/
Also ignore lines starting with "#", ":", " " (a space), or ";".
Then there are (potentially nested) tables, which start with a line
beginning with "{|" and end in a line beginning with "|}".
There are more "magic words" with the general pattern
"__SOMEUPPERCASECHARACTERS__", IIRC.
Note that some
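The heuristics collected in this message (skip list/indent prefixes, skip table blocks, drop magic words) can be sketched as a simple line filter in Python. The function name and the exact rule set are my own illustration; real wikitext has many more exceptions, as this thread keeps discovering:

```python
import re

# Prefixes that mark non-prose lines (lists, definitions,
# indents, preformatted text).
SKIP_PREFIXES = ("#", ":", ";", " ")
MAGIC_WORD = re.compile(r"__[A-Z]+__")   # e.g. __TOC__, __NOTOC__

def prose_lines(wikitext):
    """Yield lines plausibly belonging to running prose, dropping
    list/indent lines, {| ... |} tables, and magic words."""
    table_depth = 0
    for line in wikitext.splitlines():
        if line.startswith("{|"):            # table opens (may nest)
            table_depth += 1
            continue
        if line.startswith("|}") and table_depth:
            table_depth -= 1                  # table closes
            continue
        if table_depth:
            continue                          # still inside a table
        if line.startswith(SKIP_PREFIXES):
            continue
        line = MAGIC_WORD.sub("", line)       # strip __TOC__ etc.
        if line.strip():
            yield line
```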
If you only need to extract the first paragraph of Wikipedia articles, that is no problem.
2010/8/6 Katharina Wolkwitz
> Hi,
>
> On 05.08.2010 16:47, lmhelp2 wrote:
> >
> > Thank you!
> >
> > So here is the list I have for the moment:
> > I need to ignore lines:
> > - containing: {{...}}
> > => pos
Hi,
On 05.08.2010 16:47, lmhelp2 wrote:
>
> Thank you!
>
> So here is the list I have for the moment:
> I need to ignore lines:
> - containing: {{...}}
> => possibly spreading over several lines,
> => being possibly nested {{... {{ ... }} ... }}.
> - containing: [[...]]
>
Thank you!
So here is the list I have for the moment:
I need to ignore lines:
- containing: {{...}}
=> possibly spreading over several lines,
=> being possibly nested {{... {{ ... }} ... }}.
- containing: [[...]]
=> being possibly nested [[... [[ ... ]] ... ]].
- eq
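The "possibly nested" part is the tricky bit: a plain regular expression cannot match balanced `{{ ... }}` pairs, but a depth counter can. A minimal sketch (the helper name `strip_nested` is mine; note that stripping `[[...]]` wholesale also discards the link's display text, which is why the plan above only ignores whole lines containing them):

```python
def strip_nested(text, open_tok, close_tok):
    """Remove every balanced open_tok ... close_tok region,
    including nested ones, using a depth counter."""
    out, depth, i = [], 0, 0
    while i < len(text):
        if text.startswith(open_tok, i):
            depth += 1
            i += len(open_tok)
        elif text.startswith(close_tok, i) and depth:
            depth -= 1
            i += len(close_tok)
        else:
            if depth == 0:          # only keep text outside any region
                out.append(text[i])
            i += 1
    return "".join(out)
```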
Hi,
there might be an occurrence of __TOC__ or __NOTOC__ before the first "real"
paragraph.
Good luck with finding all exceptions. :)
Katharina
On 05.08.2010 14:10, lmhelp2 wrote:
>
> Hi,
>
> Thanks to all of you for your answers.
>
> I have decided (in the light of what you told me)
> to rea
On Thu, Aug 5, 2010 at 1:10 PM, lmhelp2 wrote:
>
> Hi,
>
> Thanks to all of you for your answers.
>
> I have decided (in the light of what you told me)
> to read the "Wikitext" line after line.
>
> I must "ignore" leading:
> - templates (including the ones which span
> over several consecutive li
Hi,
Thanks to all of you for your answers.
I have decided (in the light of what you told me)
to read the "Wikitext" line after line.
I must "ignore" leading:
- templates (including the ones which span
over several consecutive lines like "infoboxes"): {{...}},
- isolated internal links: [[...]
media.org] On Behalf Of lmhelp
Sent: Wednesday, August 04, 2010 3:45 PM
To: mediawiki-l@lists.wikimedia.org
Subject: [Mediawiki-l] Wikitext grammar
Hi,
Thank you for reading my post.
I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext"
language (
@lists.wikimedia.org
Subject: [Mediawiki-l] Wikitext grammar
Hi,
Thank you for reading my post.
I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext"
language (or an *exhaustive* (and formal) set of rules about how
a "Wikitext" is constructed).
On 4 August 2010 23:58, David Gerard wrote:
> On 4 August 2010 20:45, lmhelp wrote:
>> - Is a grammar available somewhere?
>> - Do you have any idea how to extract the first paragraph of a Wiki article?
>> - Any advice?
>> - Does a Java "Wikitext" "parser" exist which would do it?
> If anyone e
On 4 August 2010 20:45, lmhelp wrote:
> I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext"
> language (or an *exhaustive* (and formal) set of rules about how
> a "Wikitext" is constructed).
> I've looked for such a grammar/set of rules on the Web but I couldn't find
> one.
lmhelp wrote:
>
> Hi,
>
> Thank you for reading my post.
>
> I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext"
> language (or an *exhaustive* (and formal) set of rules about how
> a "Wikitext" is constructed).
> I've looked for such a grammar/set of rules on the Web but
Hi,
Thank you for reading my post.
I am wondering if there exists a "grammar" for the "Wikicode"/"Wikitext"
language (or an *exhaustive* (and formal) set of rules about how
a "Wikitext" is constructed).
I've looked for such a grammar/set of rules on the Web but I couldn't find
one...
I need to