On Fri, Oct 23, 2009 at 12:20 PM, Andrew Dunbar wrote:
> Yes I didn't specify tl_namespace
In MySQL that will usually make it impossible to effectively use an
index on (tl_namespace, tl_title), so it's essential that you specify
the NS. (Which you should anyway to avoid hitting things like
[[Tem
On Fri, Oct 23, 2009 at 7:04 AM, Roan Kattouw wrote:
> 2009/10/23 Robert Rohde :
>> Given the fairly obvious utility for data mining, it might make sense
>> for someone to extend the MediaWiki API to generate a list of template
>> calls and the parameters sent in each case.
>>
> We had a discussio
Note: the trailing "}" is part of the URL. Some mail readers may
cut it off.
On Fri, Oct 23, 2009 at 18:45, Jona Christopher Sahnwaldt
wrote:
> Because of result count restrictions, these queries don't
> return all ISO language codes extracted by DBpedia,
> but I think they give a good impression
Because of result count restrictions, these queries don't
return all ISO language codes extracted by DBpedia,
but I think they give a good impression of the data quality
and coverage (or sometimes lack thereof):
http://dbpedia.org/sparql?query=select+distinct+%3Fs%2C+%3Fo+where{%3Fs+%3Chttp%3A%2F%
2009/10/23 Aryeh Gregor :
> On Fri, Oct 23, 2009 at 8:27 AM, Andrew Dunbar wrote:
>> Yes I found how to get it through the API now. It was actually just
>> the Toolserver database that was intractably slow.
>
> There's nothing slow about the TS database here:
>
> mysql> pager true
> PAGER set to '
2009/10/23 William Pietri :
> George Herbert wrote:
>> This discussion brings to mind several historical threads.
>> I wonder if a project to simply mine the whole article contents and
>> provide a DB of some sort with the articles and infobox contents would
>> be worthwhile. Develop a specific p
Fascinating!
It seems to be a repeating pattern on these mailing lists that people
ignore existing solutions and discuss re-inventing wheels (please
correct me if I'm wrong here).
While I agree this is fun, it rarely helps the OP...
[[User:Dschwen]]
___
On Fri, Oct 23, 2009 at 8:27 AM, Andrew Dunbar wrote:
> Yes I found how to get it through the API now. It was actually just
> the Toolserver database that was intractably slow.
There's nothing slow about the TS database here:
mysql> pager true
PAGER set to 'true'
mysql> SELECT tl_from FROM templ
Robert Ullmann wrote:
>> I've been spending hours on the parsing now and don't find it simple
>> at all due to the fact that templates can be nested. Just extracting
>> the Infobox as one big lump is hard due to the need to match nested {{
>> and }}
>>
>> Andrew Dunbar (hippietrail)
>>
>
> Hi,
2009/10/23 Robert Rohde :
> Given the fairly obvious utility for data mining, it might make sense
> for someone to extend the MediaWiki API to generate a list of template
> calls and the parameters sent in each case.
>
We had a discussion about this Tuesday in the tech staff meeting, and
decided th
2009/10/23 Robert Ullmann :
>> I've been spending hours on the parsing now and don't find it simple
>> at all due to the fact that templates can be nested. Just extracting
>> the Infobox as one big lump is hard due to the need to match nested {{
>> and }}
>>
>> Andrew Dunbar (hippietrail)
>
> Hi,
>
> I've been spending hours on the parsing now and don't find it simple
> at all due to the fact that templates can be nested. Just extracting
> the Infobox as one big lump is hard due to the need to match nested {{
> and }}
>
> Andrew Dunbar (hippietrail)
Hi,
Come now, you are over-thinking it. F
I am so glad that someone re-re-resurrects this topic :-)
On Fri, Oct 23, 2009 at 1:27 PM, Andrew Dunbar wrote:
> I've been spending hours on the parsing now and don't find it simple
> at all due to the fact that templates can be nested. Just extracting
> the Infobox as one big lump is hard due
Given the fairly obvious utility for data mining, it might make sense
for someone to extend the MediaWiki API to generate a list of template
calls and the parameters sent in each case.
-Robert Rohde
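
Until something like that exists in the API, the nested {{ and }} problem raised earlier in the thread can be handled client-side with a simple stack of open-brace offsets. What follows is only a minimal sketch, not anything MediaWiki provides: the function name find_templates and the sample wikitext are made up for illustration, and it assumes well-formed wikitext without triple-brace {{{parameters}}}, <nowiki> sections or comments, all of which a real parser would need to handle.

```python
def find_templates(wikitext):
    """Return (start, end) offsets of every {{...}} call, inner calls first.

    Hypothetical helper: it only balances double braces and knows nothing
    about {{{triple braces}}}, <nowiki> or HTML comments.
    """
    spans = []
    stack = []  # offsets of the "{{" openers that are still unclosed
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            start = stack.pop()
            spans.append((start, i + 2))
            i += 2
        else:
            i += 1
    return spans


# Made-up sample text with one template nested inside another.
sample = "a {{Infobox Language|name={{lang|fr|French}}|iso=fr}} b"
calls = [sample[start:end] for start, end in find_templates(sample)]
```

Because a closing }} always pops the most recent opener, nested calls come out before the call that contains them, so the whole Infobox "lump" is the last span reported for it.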
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia
2009/10/23 Robert Ullmann :
> Hi Hippietrail!
>
> What do you mean by "intractably slow"? Just how fast must it be?
>
> If I do
> http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Infobox_Language&eilimit=100&einamespace=0
> it says (on one given try) that it was serve
> I wonder if a project to simply mine the whole article contents and
> provide a DB of some sort with the articles and infobox contents would
> be worthwhile. Develop a specific parser and generate and publish the
> complete set of article-infobox-(key-value) sets...
That is a brilliant idea...
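
For what it's worth, the key-value step of such a parser is small once one infobox call has been isolated. The following is a rough sketch under stated assumptions: parse_template is a hypothetical name, the parameter names in the sample (name, iso1, states, fam) are invented, and the input is assumed to be a single well-formed {{...}} call. It splits only on pipes that sit outside nested {{ }} and [[ ]].

```python
def parse_template(call):
    """Split one '{{Name|key=value|...}}' call into (name, params).

    Hypothetical helper; assumes `call` is a single balanced template call.
    """
    if not (call.startswith("{{") and call.endswith("}}")):
        raise ValueError("not a template call")
    body = call[2:-2]
    parts, current, depth = [], [], 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1          # entering a nested template or link
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1          # leaving a nested template or link
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))  # top-level separator
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    name = parts[0].strip()
    params = {}
    positional = 0
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip()] = value.strip()
        else:
            positional += 1     # unnamed parameters are numbered from 1
            params[str(positional)] = part.strip()
    return name, params


# Invented sample call; the parameter names are illustrative only.
name, params = parse_template(
    "{{Infobox Language|name=French|iso1=fr"
    "|states=[[France|French Republic]]|fam={{lang|x}}}}"
)
```

One known limitation of this sketch: an unnamed parameter whose value contains "=" (inside a nested template, say) is treated as named, so real infobox data would need a more careful rule there.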
Hi Hippietrail!
What do you mean by "intractably slow"? Just how fast must it be?
If I do
http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Infobox_Language&eilimit=100&einamespace=0
it says (on one given try) that it was served in 0,047 seconds. How
long can it take
On Fri, Oct 23, 2009 at 08:37, George Herbert wrote:
> I wonder if a project to simply mine the whole article contents and
> provide a DB of some sort with the articles and infobox contents would
> be worthwhile. Develop a specific parser and generate and publish the
> complete set of article-inf
2009/10/23 Dmitriy Sintsov :
> 2. My extension generates dynamic content. Because of that, I use
> $parser->disableCache() in my tag parser hook. But the dynamic
> content changes only in two cases:
>
> a) The user edits the page. In such case, disableCache() is not
> required, becaus
2009/10/23 Andrew Dunbar :
> But my attempts to find such pages using either the Toolserver's
> Wikipedia database or the MediaWiki API have not been fruitful. In
> particular, SQL queries on the templatelinks table are intractably
> slow. Why are there no keys on tl_from or tl_title?
>
There are:
Hi!
I've done a significant cleanup and restructuring of my extension's
code that I'd like to submit to SVN. Before trying to submit my
extension, I'd like to ask two important questions related to Parser and
Article cache.
1. I've implemented my own parser function. The description of functi
21 matches