The following describes the code as of rev 4288779e2. This posting 
documents how the new importers work. Feel free to skip it.

I'm not going to show much code in this post, so it is best to have 
importers/basescanner.py and importers/perl.py open in LeoPlugins.leo 
as you read this.

I hope to convince you that the new code is clearly the simplest thing that 
could possibly work. 

*Executive Overview*

1. The javascript and perl importers simply copy *entire lines* from a text 
file to Leo nodes. This makes the new importers much less error prone than 
the legacy (character-by-character) importers.

2. These importers know *nothing* about parsing javascript or perl.  They 
know *only* how to scan tokens *accurately.*  Again, this makes the 
new importers simpler and more robust than the legacy importers.

3. Importers are simple to write because base classes handle all complex 
details. Importers override just three methods. The scan_line method is the 
most important of these.  It encapsulates *all* language-specific details.  
It is typically about one page of straightforward token-scanning code.

*Overview of the code*

leo/plugins/importers/basescanner.py now contains two new classes: 
BaseLineScanner (BLS) and ScanState. The BLS class replaces the horribly 
complex BaseScanner class. 

The ScanState class encapsulates all knowledge relating to scan state, that 
is, tokens.  

Using ScanState methods, the BLS.scan method breaks input lines into 
Blocks: runs of contiguous input lines, each of which will end up in a 
separate Leo node.  This is necessarily a complex algorithm.
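
To give a feel for the idea (this is *not* the code in basescanner.py, which 
is considerably more involved), here is a minimal sketch of splitting lines 
into blocks by tracking net curly-bracket depth. The names count_braces and 
split_into_blocks are hypothetical, and the counting is deliberately naive: 
it ignores strings and comments.

```python
def count_braces(line):
    """Hypothetical stand-in for a scan_line method.

    Returns the net change in curly-bracket depth for one line.
    Deliberately naive: strings and comments are not skipped.
    """
    return line.count('{') - line.count('}')


def split_into_blocks(lines):
    """Group contiguous lines into blocks.

    A block is closed when a line containing '}' brings the depth back to 0.
    Any trailing lines form a final block.
    """
    blocks, current, depth = [], [], 0
    for line in lines:
        current.append(line)
        depth += count_braces(line)
        if depth == 0 and '}' in line:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


source = ["sub a {", "  print 1;", "}", "sub b {", "}", "print 2;"]
blocks = split_into_blocks(source)
```

Each inner list in blocks would correspond to one node in this toy model. The 
whole scheme depends on the depth being right at the end of every line.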

Happily, importers use the BLS class without knowing *anything* about how 
it works, or even that BLS.scan exists!  Leo's import infrastructure, in 
leoImport.py, calls BLS.run, which calls BLS.scan.

*Writing a new importer*

The perl and javascript importers now consist of nothing but:

1. A subclass of the BaseLineScanner class. This subclass will typically 
override only two methods:

- The ctor.  The ctor just sets a few easy-to-understand language-dependent 
options.

- An optional clean_headline method.  If this method exists, it tells how 
to massage headlines.

2. A subclass of the ScanState class that just overrides 
ScanState.scan_line.

ScanState.scan_line updates the net number of curly brackets and parens at 
the *end* of each line.  scan_line must compute these numbers *accurately*, 
taking into account constructs such as multi-line comments, strings and 
regular expressions that might contain curly brackets or parens.
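
The following is a minimal sketch of what such a method might look like for 
a C-like language. It is *not* Leo's ScanState or either real importer: the 
class name DemoScanState and the attributes curlies, parens and context are 
assumptions made for illustration, and regular expressions are left out 
entirely. It shows only how state carried from line to line lets the counts 
ignore brackets inside strings and comments.

```python
class DemoScanState:
    """Hypothetical scan state for a C-like language; not Leo's ScanState."""

    def __init__(self):
        self.curlies = 0   # net count of '{' minus '}' seen so far
        self.parens = 0    # net count of '(' minus ')' seen so far
        self.context = ''  # '', a quote character, or '/*' (block comment)

    def scan_line(self, line):
        """Update the counts for one line, skipping strings and comments."""
        i = 0
        while i < len(line):
            ch = line[i]
            if self.context == '/*':
                # Inside a block comment: look only for its end.
                if line.startswith('*/', i):
                    self.context = ''
                    i += 1
            elif self.context:
                # Inside a string: honor backslash escapes.
                if ch == '\\':
                    i += 1
                elif ch == self.context:
                    self.context = ''
            elif line.startswith('//', i):
                break  # The rest of the line is a comment.
            elif line.startswith('/*', i):
                self.context = '/*'
                i += 1
            elif ch in '"\'':
                self.context = ch
            elif ch == '{':
                self.curlies += 1
            elif ch == '}':
                self.curlies -= 1
            elif ch == '(':
                self.parens += 1
            elif ch == ')':
                self.parens -= 1
            i += 1
        return self


demo_lines = [
    'function f(a) {',
    '  s = "} not a brace {";  // }',
    '  /* { spans',
    '  lines } */',
    '}',
]
state = DemoScanState()
for demo_line in demo_lines:
    state.scan_line(demo_line)
```

A naive character count over demo_lines finds three '{' and four '}', which 
would leave the depth at -1. The sketch ends with both counts at zero, 
because the brackets inside the string and the comments are skipped.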

*The Perl importer*

The perl importer consists of the PerlScanState and PerlScanner classes.  
See leo/plugins/importers/perl.py.

PerlScanner.__init__ just sets a few arguments for the BaseLineScanner 
class.

Here is PerlScanner.clean_headline:

def clean_headline(self, p):
    '''Return a cleaned up headline for p, or None for no change.'''
    m = re.match(r'sub\s+(\w+)', p.h)
    return 'sub ' + m.group(1) if m else None

This replaces "sub name whatever" by "sub name".
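
To try the regex outside Leo, here is a small standalone sketch. FakePosition 
is a hypothetical stand-in for a Leo position that models only the headline 
attribute h, and clean_headline is written as a plain function rather than a 
method.

```python
import re


class FakePosition:
    """Hypothetical stand-in for a Leo position; only p.h is modelled."""

    def __init__(self, h):
        self.h = h


def clean_headline(p):
    """Return a cleaned up headline for p, or None for no change."""
    m = re.match(r'sub\s+(\w+)', p.h)
    return 'sub ' + m.group(1) if m else None


cleaned = clean_headline(FakePosition('sub parse_args ($$) {'))
unchanged = clean_headline(FakePosition('my $x = 1;'))
```

Here cleaned is 'sub parse_args' and unchanged is None. Because re.match 
anchors at the start of the string, a headline with leading whitespace would 
also be left unchanged.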

PerlScanState.scan_line is straightforward, but it absolutely must be 
accurate. Take a look at it.

*Importing Python*

If we were to convert the Python importer to use the new 
scheme, the entire ScanState class would have to be rewritten.  The reason 
should be clear: Python uses indentation levels to indicate structure, not 
curly brackets.

Happily, rewriting the ScanState class is *all* that would be required.  
The BLS class would remain completely unchanged, and the importer would be 
just as simple as the perl and javascript importers.
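
As a rough sketch of what an indentation-based scan state might track (no 
such class exists in the code described here; IndentScanState and its 
tab_width parameter are hypothetical), the state per line would be the 
indentation of the most recent non-blank line rather than a bracket count. 
This sketch ignores multi-line strings, continuation lines and comments, 
all of which a real version would have to handle accurately.

```python
class IndentScanState:
    """Hypothetical indentation-based scan state; not part of Leo."""

    def __init__(self, tab_width=4):
        self.tab_width = tab_width
        self.indent = 0  # indentation of the most recent non-blank line

    def scan_line(self, line):
        """Record the line's indentation; blank lines leave it unchanged."""
        expanded = line.expandtabs(self.tab_width)
        stripped = expanded.lstrip(' ')
        if stripped.strip():
            self.indent = len(expanded) - len(stripped)
        return self.indent


py_lines = ['class A:', '    def f(self):', '', '        return 1', '\tx = 2']
indent_state = IndentScanState()
levels = [indent_state.scan_line(py_line) for py_line in py_lines]
```

A drop in the recorded indentation plays the same role here that a closing 
curly bracket plays in the perl and javascript importers.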

*Summary*

Writing a new importer is easy because the ScanState and 
BaseLineScanner classes hide the details of a complex algorithm.  Importers 
don't even know that BLS.scan exists. Leo's import machinery in 
leoImport.py calls BLS.run, which calls BLS.scan.

The overridden scan_line method encapsulates *all* language-specific 
knowledge.  This is *token-level* knowledge.  No parsing is ever done 
anywhere.

Overriding the scan_line method is the simplest possible way of providing 
*all* needed language-specific knowledge to the BLS class. The entire 
scheme is the simplest thing that could possibly work.

The perl and javascript importers are just a couple of pages of code each.  
All new importers will be easy to write. A Python importer would use a 
completely rewritten ScanState class.

All comments and questions are welcome.

Edward
