The following describes the code as of rev 4288779e2. This posting documents how the new importers work. Feel free to skip it.
I'm not going to show much code in this post, so it would be best if you are looking at importers/basescanner.py and importers/perl.py in LeoPlugins.leo as you read this. I hope to convince you that the new code is clearly the simplest thing that could possibly work.

*Executive Overview*

1. The JavaScript and Perl importers simply copy *entire lines* from a text file to Leo nodes. This makes the new importers much less error prone than the legacy (character-by-character) importers.

2. These importers know *nothing* about parsing JavaScript or Perl. They know *only* about how to scan tokens *accurately*. Again, this makes the new importers simpler and more robust than the legacy importers.

3. Importers are simple to write because base classes handle all complex details. Importers override just three methods. The scan_line method is the most important of these. It encapsulates *all* language-specific details. It is typically about one page of straightforward token-scanning code.

*Overview of the code*

leo/plugins/importers/basescanner.py now contains two new classes: BaseLineScanner (BLS) and ScanState.

The BLS class replaces the horribly complex BaseScanner class.

The ScanState class encapsulates all knowledge relating to scan state, that is, tokens.

Using ScanState methods, the BLS.scan method breaks input lines into Blocks: contiguous input lines that will end up in separate Leo nodes. This is necessarily a complex algorithm. Happily, importers use the BLS class without knowing *anything* about how it works, or even that BLS.scan exists! Leo's import infrastructure, in leoImport.py, calls BLS.run, which calls BLS.scan.

*Writing a new importer*

The Perl and JavaScript importers now consist of nothing but:

1. A subclass of the BaseLineScanner class. This subclass will typically override only two methods:

- The ctor. The ctor just sets a few easy-to-understand language-dependent options.
- An optional clean_headline method. If this method exists, it tells how to massage headlines.

2. A subclass of the ScanState class that just overrides ScanState.scan_line.

ScanState.scan_line updates the net number of curly brackets and parens at the *end* of each line. scan_line must compute these numbers *accurately*, taking into account constructs such as multi-line comments, strings and regular expressions that might contain curly brackets or parens.

*The Perl importer*

The Perl importer consists of the PerlScanState and PerlScanner classes. See leo/plugins/importers/perl.py.

PerlScanner.__init__ just sets a few arguments for the BaseLineScanner class.

Here is PerlScanner.clean_headline:

    # Requires "import re" at module level.
    def clean_headline(self, p):
        '''Return a cleaned up headline for p, or None for no change.'''
        m = re.match(r'sub\s+(\w+)', p.h)
        return 'sub ' + m.group(1) if m else None

This replaces "sub name whatever" by "sub name".

PerlScanState.scan_line is straightforward, but it absolutely must be accurate. Take a look at it.

*Importing Python*

If we were to convert the Python importer to use the new scheme, the entire ScanState class would have to be rewritten. The reason should be clear: Python uses indentation levels to indicate structure, not curly brackets.

Happily, rewriting the ScanState class is *all* that would be required. The BLS class would remain completely unchanged, and the importer would be just as simple as the Perl and JavaScript importers.

*Summary*

Writing a new importer is easy because the ScanState and BaseLineScanner classes hide the details of a complex algorithm. Importers don't even know that BLS.scan exists. Leo's import machinery in leoImport.py calls BLS.run, which calls BLS.scan.

The overridden scan_line method encapsulates *all* language-specific knowledge. This is *token-level* knowledge. No parsing is ever done anywhere.

Overriding the scan_line method is the simplest possible way of providing *all* needed language-specific knowledge to the BLS class. The entire scheme is the simplest thing that could possibly work.
The Perl and JavaScript importers are just a couple of pages of code each. All new importers will be easy to write. A Python importer would use a completely rewritten ScanState class.

All comments and questions are welcome.

Edward
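
P.S. For readers who want a concrete picture of the token-level scanning that a scan_line override performs, here is a minimal sketch. It is not the code in perl.py. The class name SketchScanState, its attributes and the skip_string helper are all hypothetical, and the sketch assumes only C-style comments and simple quoted strings. A real scan_line must also handle regular expressions and other language-specific constructs.

```python
class SketchScanState:
    """Hypothetical stand-in for a ScanState subclass (not Leo's actual class)."""

    def __init__(self):
        self.curlies = 0               # net '{' minus '}' seen so far
        self.parens = 0                # net '(' minus ')' seen so far
        self.in_block_comment = False  # True while inside /* ... */

    def scan_line(self, line):
        """Update the net bracket counts to reflect the end of line."""
        i, n = 0, len(line)
        while i < n:
            if self.in_block_comment:
                end = line.find('*/', i)
                if end == -1:
                    return  # the comment continues on the next line
                self.in_block_comment = False
                i = end + 2
            elif line.startswith('//', i):
                return  # the rest of the line is a comment
            elif line.startswith('/*', i):
                self.in_block_comment = True
                i += 2
            elif line[i] in '"\'':
                i = self.skip_string(line, i)
            else:
                ch = line[i]
                if ch == '{':
                    self.curlies += 1
                elif ch == '}':
                    self.curlies -= 1
                elif ch == '(':
                    self.parens += 1
                elif ch == ')':
                    self.parens -= 1
                i += 1

    def skip_string(self, line, i):
        """Return the index just past the string that starts at line[i]."""
        quote = line[i]
        i += 1
        while i < len(line):
            if line[i] == '\\':
                i += 2  # skip the escaped character
            elif line[i] == quote:
                return i + 1
            else:
                i += 1
        return i  # assumption: an unterminated string ends with the line


# Usage: feed lines one at a time, then read the counts.
state = SketchScanState()
for s in ['sub f { // open {', '  print("}");', '}']:
    state.scan_line(s)
```

The point of the sketch is that brackets inside comments and strings never reach the counters, so the counts at the end of each line stay accurate without any parsing.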