(perhaps more appropriate on solr-user@) It sounds like you want to make a MathML filter? Check out the analyzer packages...
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Simple example:
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/LengthFilterFactory.java

ryan

2010/4/14 <m...@gjgt.sk>:
> Hello everybody,
>
> I'm new to all this, so I hope this isn't too noob a question and that it
> isn't very inappropriate here.
>
> I'm currently working on an indexing/searching application based on Apache
> Lucene core that can process mathematical formulae in MathML format
> (which is an extension of XML) and store them in the index for searching. No
> troubles here, since I'm building everything on top of Lucene.
>
> But I started to think it would be nice to write this mathematical
> extension so it could be incorporated into Solr as easily as possible in the
> future. The thing is, I looked into Solr's sources and, to be honest, I'm all
> confused and don't know which way to do this.
>
> The basic workflow of the whole math processing would be:
> Check the input document for any math -> if found, the mathematical unit needs
> to process it and produce many string-represented formulae with different
> boosts -> put these into the index without further tokenization.
>
> That's about it.
> Any ideas? Any help will be appreciated.
>
> Thank you
>
> Martin
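The "filter" idea can be sketched in plain Java, independent of any Lucene or Solr API: walk the token stream, pass ordinary tokens through, and replace each MathML token with several whole-string forms of the formula, each carrying its own boost. Everything here is hypothetical: `WeightedTerm`, `MathExpansion`, the `<math` prefix check, and the two "variants" (raw markup, tags stripped) stand in for Martin's real mathematical unit. In Solr this logic would live in a TokenFilter plus a factory along the lines of the LengthFilterFactory linked above, and how the per-variant boosts map onto Lucene (for example, as payloads) is left open.

```java
import java.util.ArrayList;
import java.util.List;

/** A term plus the boost it should be indexed with (hypothetical type, not a Lucene class). */
final class WeightedTerm {
    final String text;
    final float boost;

    WeightedTerm(String text, float boost) {
        this.text = text;
        this.boost = boost;
    }

    @Override
    public String toString() {
        return text + "^" + boost;
    }
}

/**
 * Hypothetical sketch of the filter step: ordinary tokens pass through with
 * boost 1.0, while a MathML token is replaced by several string forms of the
 * formula, each kept whole (untokenized) with its own boost.
 */
final class MathExpansion {

    // Hypothetical detection rule: treat any token starting with "<math" as MathML.
    static boolean isMathML(String token) {
        return token.startsWith("<math");
    }

    // Hypothetical "mathematical unit": a real one would parse the MathML.
    // Here it only keeps the raw markup and a tags-stripped symbol string.
    static List<WeightedTerm> variants(String mathml) {
        List<WeightedTerm> out = new ArrayList<>();
        out.add(new WeightedTerm(mathml, 1.0f));                            // exact markup
        out.add(new WeightedTerm(mathml.replaceAll("<[^>]+>", ""), 0.5f)); // symbols only
        return out;
    }

    // Expand a token list, substituting weighted variants for each MathML token.
    static List<WeightedTerm> expand(List<String> tokens) {
        List<WeightedTerm> out = new ArrayList<>();
        for (String token : tokens) {
            if (isMathML(token)) {
                out.addAll(variants(token));
            } else {
                out.add(new WeightedTerm(token, 1.0f));
            }
        }
        return out;
    }
}
```

For example, `MathExpansion.expand(Arrays.asList("area", "<math><mi>a</mi><mo>+</mo><mi>b</mi></math>"))` yields `[area^1.0, <math><mi>a</mi><mo>+</mo><mi>b</mi></math>^1.0, a+b^0.5]`. Keeping each variant as a single term matches the "not tokenized further" requirement; the boosts are arbitrary placeholders.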