Hi Gian Maria,

OpenGrok <http://opengrok.github.io/OpenGrok/> has a bunch of JFlex-based computer language tokenizers for Lucene: <https://github.com/OpenGrok/OpenGrok/tree/master/src/org/opensolaris/opengrok/analysis>. Not sure how much work it would be to use them in another project, though.

There's a bunch of JFlex grammars listed here, though most (almost all?) are not integrated with Lucene: <http://sourceforge.net/apps/mediawiki/jflex/index.php?title=ExternalJFlexGrammars>

Looks like at least the Jsyntaxpane and RSyntaxTextArea projects have multiple programming language lexers.

Steve

On Jun 13, 2013, at 1:40 PM, Gian Maria Ricci <alkamp...@nablasoft.com> wrote:

> Thanks for the suggestions, I’ll try with the WordDelimiterFilterFactory. My
> aim is not to have a perfect analysis, just a way to quickly search for words
> in the whole history of a codebase. :-)
>
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
>
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, June 13, 2013 1:24 PM
> To: solr-user@lucene.apache.org; Gian Maria Ricci
> Subject: Re: analyzer for Code
>
> Well, WordDelimiterFilterFactory would split on the punctuation, so
> you could add it to the analyzer chain along with StandardAnalyzer.
>
> You could use one of the regex filters to break up tokens that make it
> through the analyzer as you see fit.
>
> But in general, this will be a bunch of compromises since programming
> languages are, shall we say, not standard <G>
>
> Best
> Erick
>
> On Thu, Jun 13, 2013 at 4:19 AM, Gian Maria Ricci <alkamp...@nablasoft.com>
> wrote:
> I searched around a bit and did not find anything interesting. Does anyone
> know if any analyzers exist to better index source code (e.g. C#, C++, Java,
> etc.)?
>
> Standard analyzer is quite good, but I wish to know if there are some more
> specific analyzers that can do a better indexing. E.g., I did a quick test
> with C# and the full class name was indexed without splitting by dots. So
> MyLib.Helpers.MyClass becomes one token, and when I searched for MyClass I
> did not find any matches.
>
> Thanks in advance.
>
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
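
To make the splitting discussed in the quoted thread concrete, here is a rough Python sketch of the kind of token splitting a word-delimiter-style filter performs on an identifier like MyLib.Helpers.MyClass. This is only an illustration of the idea, not Solr's WordDelimiterFilterFactory code or its exact rules; the function name `code_tokens` and the `split_on_case_change` flag are made up for this example.

```python
import re

# Hypothetical helper for illustration only; not part of Lucene or Solr.
def code_tokens(text, split_on_case_change=False):
    """Split text on punctuation and lowercase each part.

    With split_on_case_change=True, camelCase parts are additionally
    broken into sub-words, and the unsplit part is kept as well.
    """
    tokens = []
    # Treat anything other than letters, digits and underscore as a delimiter.
    for part in re.split(r"[^A-Za-z0-9_]+", text):
        if not part:
            continue
        tokens.append(part.lower())
        if split_on_case_change:
            # Acronym runs (HTTP), capitalized or lowercase words, digit runs.
            sub_words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
            if len(sub_words) > 1:
                tokens.extend(word.lower() for word in sub_words)
    return tokens


if __name__ == "__main__":
    indexed = code_tokens("MyLib.Helpers.MyClass")
    print(indexed)                                  # ['mylib', 'helpers', 'myclass']
    print("myclass" in indexed)                     # True
    print(code_tokens("MyLib.Helpers.MyClass", split_on_case_change=True))
```

Once the dotted name is split this way, a lowercased query for MyClass matches one of the indexed tokens, which is the behaviour Gian Maria was missing. Splitting on case changes as well gives more matches (for example "class"), at the cost of more noise in the index.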