Hi All,

For an applied research project, I am working on extracting links from the
Wikipedia corpus.

In the past I have been using the XML streams, but now I was hoping to speed
things up and handle the situation better by parsing the SQL tables.

However, I am stuck on this:

I could not find a way to filter the relevant links.

Apparently I can only filter by namespace, whereas I want to keep only the
links that appear in the main text: still namespace 0, but not belonging to
the infobox and navbox menus.

How could I do that?
Is there any information, beyond the namespace, indicating whether a link
belongs to a menu or to the main content?

Thanks All for your help,
L.
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org
