Re: [Wiki-research-l] Wikipedia mathematical search engine
Hi Daniel,

if you want precise extraction, try the Sweble parser, http://sweble.org

Cheers,
Dirk

On 03.04.2012 00:15, Daniel Mietchen wrote:
> Hi Jozef,
>
> I just played around a bit and liked what I saw, though I didn't see much, as the site was very slow.
>
> How did you strip the dump of the non-mathematical articles? I am asking because one of the major uses that I have in mind for a good mathematical search engine would be to identify areas around topic A (say, theoretical biology) that use the same concepts as those in topic B (say, economics). Very often such distant fields are only weakly connected, but solutions or approaches that work in one of them are not infrequently transferable.
>
> In order to be useful for such purposes, your corpus would still have to contain the economics/theoretical biology articles (at least those that use equations), but I couldn't find evidence for that.
>
> Daniel
>
> On Mon, Apr 2, 2012 at 2:21 PM, Jozef Misutka misu...@ksi.mff.cuni.cz wrote:
> > Hi,
> >
> > I want to introduce a *mathematical* search engine working over English Wikipedia dump. The key advantage is simple - *it works* ;). Better than a nice speech is a real demo which can be found here: http://egomath.projekty.ms.mff.cuni.cz
> >
> > If you are somehow interested or just want to share your thoughts do not hesitate to contact me.
> >
> > Best regards,
> > Jozef Misutka
> > __
> > Charles University in Prague, Department of Software Engineering, www: http://www.ksi.mff.cuni.cz/cs/~misutka

--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] Wikipedia mathematical search engine
Hi Daniel,

On Tue, Apr 3, 2012 at 12:15 AM, Daniel Mietchen daniel.mietc...@googlemail.com wrote:
> Hi Jozef,
>
> I just played around a bit and liked what I saw, though I didn't see much, as the site was very slow.

it was a HW failure (RAID5 I think...). Anyway, it was fixed several hours ago.

> How did you strip the dump of the non-mathematical articles?

Very simply: a mathematical article is an article which contains </math> inside. I do not claim to have a perfect Wikipedia tag parser but the vast majority of the formulae in Wikipedia are typeset using standard Wikipedia rules and are simply inside text which is fine.

> I am asking because one of the major uses that I have in mind for a good mathematical search engine would be to identify areas around topic A (say, theoretical biology) that use the same concepts as those in topic B (say, economics). Very often such distant fields are only weakly connected, but solutions or approaches that work in one of them are not infrequently transferable.

That is exactly one of the interesting applications for a mathematical search engine. I wanted to reply to you with something interesting, so I called my friend asking him about interesting formulae from economy. He told me about Vasicek model, so I tried to search for the formula

    dr_t = a(b-r_t) dt + \sigma dW_t

which resulted in 2 hits at no abstraction level - no big deal. But then I tried to abstract it and another hit came which is imho interesting (different variables used but the same formula).

Vasicek model: http://egomath.projekty.ms.mff.cuni.cz/index.php?q=math=dr_t+%3D+a%28b-r_t%29+dt+%2B+%5Csigma+dW_thide_snippets=0
Vasicek model and similar: http://egomath.projekty.ms.mff.cuni.cz/index.php?q=math=dr_t+%3D+a%28b-r_t%29+dt+%2B+%5Csigma+dW_tlevel=9hide_snippets=0

> In order to be useful for such purposes, your corpus would still have to contain the economics/theoretical biology articles (at least those that use equations), but I couldn't find evidence for that.

See the number of documents (and categories) when you search for simple text, e.g.,

economy: http://egomath.projekty.ms.mff.cuni.cz/index.php?math=q=economy
biology: http://egomath.projekty.ms.mff.cuni.cz/index.php?math=q=biology

Jozef

> Daniel
>
> On Mon, Apr 2, 2012 at 2:21 PM, Jozef Misutka misu...@ksi.mff.cuni.cz wrote:
> > Hi,
> >
> > I want to introduce a *mathematical* search engine working over English Wikipedia dump. The key advantage is simple - *it works* ;). Better than a nice speech is a real demo which can be found here: http://egomath.projekty.ms.mff.cuni.cz
> >
> > If you are somehow interested or just want to share your thoughts do not hesitate to contact me.
> >
> > Best regards,
> > Jozef Misutka
> > __
> > Charles University in Prague, Department of Software Engineering, www: http://www.ksi.mff.cuni.cz/cs/~misutka
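The selection rule given in this message (an article counts as mathematical if its wikitext contains a closing </math> tag) can be sketched in a few lines of Python. This is only a minimal sketch, not the EgoMath code: the names `is_mathematical` and `math_titles` and the two-page sample dump are made up for illustration, and it assumes the simplified page/title/text layout of a MediaWiki XML export.

```python
import io
import xml.etree.ElementTree as ET

MATH_CLOSE = "</math>"  # the marker named in the message


def is_mathematical(wikitext):
    """Return True if the wikitext contains a closing math tag."""
    return MATH_CLOSE in wikitext


def math_titles(dump_stream):
    """Yield titles of pages whose text contains a closing math tag.

    Assumes each <page> holds one <title> followed by one <text>
    (hypothetical simplification of the real dump schema).
    """
    title = None
    for _event, elem in ET.iterparse(dump_stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace if present
        if tag == "title":
            title = elem.text
        elif tag == "text":
            if is_mathematical(elem.text or ""):
                yield title
        elif tag == "page":
            elem.clear()  # free memory; real dumps are large


# Made-up two-page sample; the XML parser decodes &lt; and &gt; for us.
SAMPLE = """<mediawiki>
<page><title>Vasicek model</title><revision>
<text>The model is &lt;math&gt;dr_t = a(b-r_t) dt&lt;/math&gt;.</text>
</revision></page>
<page><title>Prague</title><revision>
<text>Capital of the Czech Republic.</text>
</revision></page>
</mediawiki>"""

if __name__ == "__main__":
    for found in math_titles(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(found)
```

A plain substring test like this keeps economics or biology articles as long as they contain at least one formula, which is the property Daniel asks about; it will miss formulae typeset without <math> tags.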
Re: [Wiki-research-l] Wikipedia mathematical search engine
Dear Jozef,

a good example - abstractions like that from

    dr_t = a(b-r_t) dt + \sigma dW_t

in http://en.wikipedia.org/wiki/Vasicek_model to

    dx_t = \theta (\mu-x_t)\,dt + \sigma\, dW_t

in http://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process are indeed very useful and an invitation to play.

Thank you!

Daniel

On Wed, Apr 4, 2012 at 3:03 AM, Jozef Misutka misu...@ksi.mff.cuni.cz wrote:
> Hi Daniel,
>
> On Tue, Apr 3, 2012 at 12:15 AM, Daniel Mietchen daniel.mietc...@googlemail.com wrote:
> > Hi Jozef,
> >
> > I just played around a bit and liked what I saw, though I didn't see much, as the site was very slow.
>
> it was a HW failure (RAID5 I think...). Anyway, it was fixed several hours ago.
>
> > How did you strip the dump of the non-mathematical articles?
>
> Very simply: a mathematical article is an article which contains </math> inside. I do not claim to have a perfect Wikipedia tag parser but the vast majority of the formulae in Wikipedia are typeset using standard Wikipedia rules and are simply inside text which is fine.
>
> > I am asking because one of the major uses that I have in mind for a good mathematical search engine would be to identify areas around topic A (say, theoretical biology) that use the same concepts as those in topic B (say, economics). Very often such distant fields are only weakly connected, but solutions or approaches that work in one of them are not infrequently transferable.
>
> That is exactly one of the interesting applications for a mathematical search engine. I wanted to reply to you with something interesting, so I called my friend asking him about interesting formulae from economy. He told me about Vasicek model, so I tried to search for the formula
>
>     dr_t = a(b-r_t) dt + \sigma dW_t
>
> which resulted in 2 hits at no abstraction level - no big deal. But then I tried to abstract it and another hit came which is imho interesting (different variables used but the same formula).
>
> Vasicek model
> Vasicek model and similar
>
> > In order to be useful for such purposes, your corpus would still have to contain the economics/theoretical biology articles (at least those that use equations), but I couldn't find evidence for that.
>
> See the number of documents (and categories) when you search for simple text, e.g.,
>
> economy: http://egomath.projekty.ms.mff.cuni.cz/index.php?math=q=economy
> biology: http://egomath.projekty.ms.mff.cuni.cz/index.php?math=q=biology
>
> Jozef
>
> > Daniel
> >
> > On Mon, Apr 2, 2012 at 2:21 PM, Jozef Misutka misu...@ksi.mff.cuni.cz wrote:
> > > Hi,
> > >
> > > I want to introduce a *mathematical* search engine working over English Wikipedia dump. The key advantage is simple - *it works* ;). Better than a nice speech is a real demo which can be found here: http://egomath.projekty.ms.mff.cuni.cz
> > >
> > > If you are somehow interested or just want to share your thoughts do not hesitate to contact me.
> > >
> > > Best regards,
> > > Jozef Misutka
> > > __
> > > Charles University in Prague, Department of Software Engineering, www: http://www.ksi.mff.cuni.cz/cs/~misutka
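To make the "different variables used but the same formula" point concrete, here is a minimal Python sketch of one possible abstraction: rename every variable to a placeholder in order of first appearance and compare the results. The thread does not describe how EgoMath's abstraction levels actually work, so this is an assumed stand-in technique, and `abstract`, `TOKEN`, `SPACING` and the short `GREEK` list are hypothetical names chosen for the example. It also naively treats the differential `d` as just another variable.

```python
import re

# LaTeX commands, escaped single characters, single letters, numbers, other symbols.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|[A-Za-z]|\d+|\S")

# Spacing commands carry no mathematical meaning and are dropped.
SPACING = {r"\,", r"\;", r"\:", r"\!", r"\ ", r"\quad", r"\qquad"}

# Partial list of Greek letters that are treated as variables (assumption).
GREEK = {"\\" + name for name in (
    "alpha beta gamma delta epsilon theta lambda mu nu pi rho sigma tau phi omega"
).split()}


def abstract(formula):
    """Rename variables to v0, v1, ... in order of first appearance."""
    names = {}
    out = []
    for tok in TOKEN.findall(formula):
        if tok in SPACING:
            continue
        if tok in GREEK or (len(tok) == 1 and tok.isalpha()):
            tok = names.setdefault(tok, f"v{len(names)}")
        out.append(tok)
    return " ".join(out)


vasicek = r"dr_t = a(b-r_t) dt + \sigma dW_t"
ornstein_uhlenbeck = r"dx_t = \theta (\mu-x_t)\,dt + \sigma\, dW_t"

if __name__ == "__main__":
    print(abstract(vasicek))
    print(abstract(vasicek) == abstract(ornstein_uhlenbeck))
```

With this renaming the two formulas quoted above reduce to the same token sequence, while the raw strings differ, which mirrors the difference between searching with no abstraction and with abstraction switched on.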
Re: [Wiki-research-l] Wikipedia mathematical search engine
Hi Jozef,

I just played around a bit and liked what I saw, though I didn't see much, as the site was very slow.

How did you strip the dump of the non-mathematical articles? I am asking because one of the major uses that I have in mind for a good mathematical search engine would be to identify areas around topic A (say, theoretical biology) that use the same concepts as those in topic B (say, economics). Very often such distant fields are only weakly connected, but solutions or approaches that work in one of them are not infrequently transferable.

In order to be useful for such purposes, your corpus would still have to contain the economics/theoretical biology articles (at least those that use equations), but I couldn't find evidence for that.

Daniel

On Mon, Apr 2, 2012 at 2:21 PM, Jozef Misutka misu...@ksi.mff.cuni.cz wrote:
> Hi,
>
> I want to introduce a *mathematical* search engine working over English Wikipedia dump. The key advantage is simple - *it works* ;). Better than a nice speech is a real demo which can be found here: http://egomath.projekty.ms.mff.cuni.cz
>
> If you are somehow interested or just want to share your thoughts do not hesitate to contact me.
>
> Best regards,
> Jozef Misutka
> __
> Charles University in Prague, Department of Software Engineering, www: http://www.ksi.mff.cuni.cz/cs/~misutka