Hi all, for an applied research project I am working on extracting links from the Wikipedia corpus.
In the past I have used the XML streams, but now I was hoping to speed things up and handle the situation better by parsing the SQL tables. However, I am stuck: I could not find a way to filter the relevant links. Apparently I can only filter by namespace, whereas I want to keep only the links that appear in the main text of an article (still namespace 0), excluding those that belong to the infobox and navbox menus. How could I do that? Is there any information, beyond the namespace, that tells whether a link belongs to a menu or to the main content?

Thanks all for your help,
L.
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org