Re: [Wiki-research-l] Wikipedia mathematical search engine

2012-04-04 Thread Dirk Riehle

Hi Daniel,

If you want precise extraction, try the Sweble parser: http://sweble.org

Cheers,
Dirk

On 03.04.2012 00:15, Daniel Mietchen wrote:

Hi Jozef,

I just played around a bit and liked what I saw, though I didn't see
much, as the site was very slow.

How did you strip the dump of the non-mathematical articles? I am
asking because one of the major uses that I have in mind for a good
mathematical search engine would be to identify areas around topic A
(say, theoretical biology) that use the same concepts as those in
topic B (say, economics). Very often such distant fields are only
weakly connected, but solutions or approaches that work in one of them
are not infrequently transferable. In order to be useful for such
purposes, your corpus would still have to contain the economics/
theoretical biology articles (at least those that use equations), but
I couldn't find evidence for that.

Daniel

On Mon, Apr 2, 2012 at 2:21 PM, Jozef Misutka misu...@ksi.mff.cuni.cz wrote:

Hi,

I want to introduce a *mathematical* search engine working over an English
Wikipedia dump. The key advantage is simple - *it works* ;).
Better than a nice speech is a real demo, which can be found here:
http://egomath.projekty.ms.mff.cuni.cz

If you are interested or just want to share your thoughts, do not
hesitate to contact me.

Best regards,
Jozef Misutka
__
Charles University in Prague,
Department of Software Engineering,
www: http://www.ksi.mff.cuni.cz/cs/~misutka


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l





--
Website: http://dirkriehle.com - Twitter: @dirkriehle
Ph (DE): +49-157-8153-4150 - Ph (US): +1-650-450-8550




Re: [Wiki-research-l] Wikipedia mathematical search engine

2012-04-03 Thread Jozef Misutka
Hi Daniel,


On Tue, Apr 3, 2012 at 12:15 AM, Daniel Mietchen 
daniel.mietc...@googlemail.com wrote:

> Hi Jozef,
>
> I just played around a bit and liked what I saw, though I didn't see
> much, as the site was very slow.

It was a HW failure (RAID5, I think...). Anyway, it was fixed several hours
ago.

> How did you strip the dump of the non-mathematical articles?


Very simply: a mathematical article is an article which contains
</math> inside.

I do not claim to have a perfect Wikipedia tag parser, but the vast majority
of the formulae in Wikipedia are typeset using standard Wikipedia rules and
simply sit inside the text, which is fine.
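
A minimal sketch of such a filter, assuming a standard MediaWiki XML export in which each page element carries a title and a text element. The names math_articles, local_name, MATH_CLOSE and SAMPLE are made up for illustration; this is not EgoMath's actual code, only the rule stated above applied literally.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple

MATH_CLOSE = "</math>"  # the marker named in the message above


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix: '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def math_articles(dump: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) for every page whose text contains '</math>'."""
    title, text = "", ""
    # Stream the dump so the whole file is never parsed in one go.
    for _event, elem in ET.iterparse(dump, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if MATH_CLOSE in text:
                yield title, text
            title, text = "", ""
            elem.clear()  # release the finished page's children


# Tiny hand-written stand-in for a dump (assumption, not real dump data).
SAMPLE = b"""<mediawiki>
<page><title>Vasicek model</title><revision>
<text>&lt;math&gt;dr_t = a(b-r_t) dt + \\sigma dW_t&lt;/math&gt;</text>
</revision></page>
<page><title>Prague</title><revision><text>A city.</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, _ in math_articles(io.BytesIO(SAMPLE)):
        print(page_title)
```

A plain substring test like this keeps any article with at least one formula, regardless of its subject area, which is why economics or biology articles that use equations would stay in the corpus.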


> I am asking because one of the major uses that I have in mind for a
> good mathematical search engine would be to identify areas around
> topic A (say, theoretical biology) that use the same concepts as those
> in topic B (say, economics). Very often such distant fields are only
> weakly connected, but solutions or approaches that work in one of them
> are not infrequently transferable.


That is exactly one of the interesting applications for a mathematical
search engine.

I wanted to reply to you with something interesting, so I called a friend
and asked him about interesting formulae from economics. He told me about
the Vasicek model, so I searched for the formula
dr_t = a(b-r_t) dt + \sigma dW_t
which gave 2 hits with no abstraction - no big deal. But then I raised
the abstraction level and another hit came up, which is imho interesting
(different variables used but the same formula).
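
One naive way to picture "abstracting" a formula is to rename every symbol to a placeholder in order of first appearance and to ignore spacing commands, so that formulae differing only in variable names collapse to the same string. The sketch below does just that. It is an illustration under that assumption, not a description of how EgoMath abstracts or ranks formulae, and the names TOKEN, SPACING, GREEK and abstract are hypothetical.

```python
import re

# LaTeX command, escaped single character, single letter, number, or any other symbol.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|[A-Za-z]|\d+|\S")

# Spacing commands carry no mathematical meaning and are dropped.
SPACING = {r"\,", r"\;", r"\:", r"\!", "\\ ", r"\quad", r"\qquad"}

# Greek-letter commands are treated as variable names, like plain letters.
GREEK = {
    "\\" + name
    for name in (
        "alpha beta gamma delta epsilon theta kappa lambda "
        "mu nu pi rho sigma tau phi omega"
    ).split()
}


def abstract(latex: str) -> str:
    """Replace each distinct symbol with v0, v1, ... in order of first use."""
    names = {}
    out = []
    for tok in TOKEN.findall(latex):
        if tok in SPACING:
            continue
        if tok in GREEK or (len(tok) == 1 and tok.isalpha()):
            tok = names.setdefault(tok, f"v{len(names)}")
        out.append(tok)
    return " ".join(out)


VASICEK = r"dr_t = a(b-r_t) dt + \sigma dW_t"
ORNSTEIN_UHLENBECK = r"dx_t = \theta (\mu-x_t)\,dt + \sigma\, dW_t"

if __name__ == "__main__":
    print(abstract(VASICEK))
    print(abstract(VASICEK) == abstract(ORNSTEIN_UHLENBECK))
```

This renaming is deliberately crude: it also turns the differential d and the process W into placeholders, so a real engine would presumably offer several abstraction levels rather than this single one.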

Vasicek model:
http://egomath.projekty.ms.mff.cuni.cz/index.php?q=math=dr_t+%3D+a%28b-r_t%29+dt+%2B+%5Csigma+dW_thide_snippets=0
Vasicek model and similar:
http://egomath.projekty.ms.mff.cuni.cz/index.php?q=math=dr_t+%3D+a%28b-r_t%29+dt+%2B+%5Csigma+dW_tlevel=9hide_snippets=0



> In order to be useful for such
> purposes, your corpus would still have to contain the economics/
> theoretical biology articles (at least those that use equations), but
> I couldn't find evidence for that.


See the number of documents (and categories) returned when you search for
simple text, e.g.:
economy
http://egomath.projekty.ms.mff.cuni.cz/index.php?math=q=economy
biology
http://egomath.projekty.ms.mff.cuni.cz/index.php?math=q=biology

Jozef






Re: [Wiki-research-l] Wikipedia mathematical search engine

2012-04-03 Thread Daniel Mietchen
Dear Jozef,

a good example - abstractions like that from
dr_t = a(b-r_t) dt + \sigma dW_t
in
http://en.wikipedia.org/wiki/Vasicek_model
to
dx_t = \theta (\mu-x_t)\,dt + \sigma\, dW_t
in
http://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process
are indeed very useful and an invitation to play.
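
Written side by side, the two equations quoted above differ only in the names of their symbols. A short LaTeX rendering (assuming the amsmath package for align*), with the substitution spelled out in a comment:

```latex
\begin{align*}
\text{Vasicek:} \quad & dr_t = a(b - r_t)\,dt + \sigma\,dW_t \\
\text{Ornstein--Uhlenbeck:} \quad & dx_t = \theta(\mu - x_t)\,dt + \sigma\,dW_t
\end{align*}
% Substitution mapping the first line onto the second:
%   r_t -> x_t,  a -> \theta,  b -> \mu   (\sigma and W_t are unchanged)
```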

Thank you!


Daniel





Re: [Wiki-research-l] Wikipedia mathematical search engine

2012-04-02 Thread Daniel Mietchen
Hi Jozef,

I just played around a bit and liked what I saw, though I didn't see
much, as the site was very slow.

How did you strip the dump of the non-mathematical articles? I am
asking because one of the major uses that I have in mind for a good
mathematical search engine would be to identify areas around topic A
(say, theoretical biology) that use the same concepts as those in
topic B (say, economics). Very often such distant fields are only
weakly connected, but solutions or approaches that work in one of them
are not infrequently transferable. In order to be useful for such
purposes, your corpus would still have to contain the economics/
theoretical biology articles (at least those that use equations), but
I couldn't find evidence for that.

Daniel


