Hi Gian Maria,

OpenGrok <http://opengrok.github.io/OpenGrok/> has a bunch of JFlex-based
computer language tokenizers for Lucene:
<https://github.com/OpenGrok/OpenGrok/tree/master/src/org/opensolaris/opengrok/analysis>.
I'm not sure how much work it would be to use them in another project, though.

There are a bunch of JFlex grammars listed here, though most (almost all?) are
not integrated with Lucene:

<http://sourceforge.net/apps/mediawiki/jflex/index.php?title=ExternalJFlexGrammars>

Looks like at least the JSyntaxPane and RSyntaxTextArea projects have multiple
programming language lexers.

Steve

On Jun 13, 2013, at 1:40 PM, Gian Maria Ricci <alkamp...@nablasoft.com> wrote:

> Thanks for the suggestions, I'll try the WordDelimiterFilterFactory. My
> aim is not to have a perfect analysis, just a way to quickly search for words
> in the whole history of a codebase. :)
>  
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
>    
>  
>  
> From: Erick Erickson [mailto:erickerick...@gmail.com] 
> Sent: Thursday, June 13, 2013 1:24 PM
> To: solr-user@lucene.apache.org; Gian Maria Ricci
> Subject: Re: analyzer for Code
>  
> Well, WordDelimiterFilterFactory would split on the punctuation, so
> you could add it to an analyzer chain built on StandardTokenizer (the
> tokenizer that StandardAnalyzer uses).
>  
> You could use one of the regex filters to break up tokens that make it
> through the analyzer as you see fit.
>  
> But in general, this will be a bunch of compromises since programming
> languages are, shall we say, not standard <G>
>  
> Best
> Erick
>  
> 
> On Thu, Jun 13, 2013 at 4:19 AM, Gian Maria Ricci <alkamp...@nablasoft.com> 
> wrote:
> I searched around a little and did not find anything interesting. Does
> anyone know whether any analyzers exist to better index source code (e.g. C#,
> C++, Java, etc.)?
>  
> The standard analyzer is quite good, but I would like to know whether there
> are more specific analyzers that index code better. E.g. I did a quick test
> with C# and the full class name was indexed without being split on dots. So
> MyLib.Helpers.MyClass becomes one token, and when I searched for MyClass I
> did not find any matches.
>  
> Thanks in advance.
>  
> --
> Gian Maria Ricci
> Mobile: +39 320 0136949
>    
>  
>  
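
To make the tokenization problem in this thread concrete, here is a rough
sketch in plain Python of the kind of splitting being discussed: breaking a
token such as MyLib.Helpers.MyClass on punctuation and, optionally, on case
changes. It is only an illustration and not Lucene's or Solr's actual
WordDelimiterFilter logic; the function names split_code_token and
index_terms are hypothetical and do not come from any library.

```python
import re


def split_code_token(token, split_on_case_change=True):
    """Roughly emulate splitting a source-code token into sub-words.

    Illustration only; this is not how Lucene implements it.
    """
    # Split on any run of non-alphanumeric characters (".", "_", "::", ...).
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if not split_on_case_change:
        return parts
    words = []
    for part in parts:
        # Break on case changes and letter/digit boundaries,
        # keeping runs of capitals (acronyms) together.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return words


def index_terms(token):
    """Lowercased terms to index: the original token, the parts between
    punctuation, and the case-split sub-words, without duplicates."""
    candidates = [token]
    candidates += split_code_token(token, split_on_case_change=False)
    candidates += split_code_token(token, split_on_case_change=True)
    terms = []
    for term in (c.lower() for c in candidates):
        if term not in terms:  # keep first occurrence, preserve order
            terms.append(term)
    return terms


if __name__ == "__main__":
    print(index_terms("MyLib.Helpers.MyClass"))
```

With terms produced this way, a lowercased query for MyClass would match the
indexed class name, which is the behaviour the original question asks for. In
a real Solr setup the equivalent splitting would be configured in the
analyzer chain rather than written by hand.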
