The urweb-mode for Emacs has very slow syntax highlighting, to the point of being a real hindrance to developing non-trivial projects. I know exactly which code is to blame, but the harder question is how to accomplish the same goal more efficiently. In the past, I've sent pleas to this list asking for help with the issue, with no response. I'm going to try again, and this time I'm able to give more information on the problem.

The issue comes only from detecting text that is literal XML CDATA; that is, normal text that, in the case of HTML, should be passed on directly to the user. I built urweb-mode by modifying sml-mode. I presume sml-mode is doing syntax highlighting in a standard way, but, in any case, it's based on regular expressions identifying spans of text that should have particular Emacs font faces associated with them.

The crux of the problem, then, is that, in Ur/Web, being XML CDATA is a context-free property, but not a regular property (in the sense of regular languages and regular expressions). An XML sequence appears within <xml>...</xml> brackets, and within there may be "antiquoted" Ur sequences appearing within {...} brackets, within which there may be further XML, and so on, up to unbounded depth.

My current urweb-mode code uses a regular expression to identify maximal segments of text that could possibly be CDATA. Then, for each such segment, a custom Elisp function is called to search backward from that point, counting open and close brackets to figure out whether we are in XML. This search may proceed arbitrarily far back in the buffer, and it is repeated from scratch for each sequence of CDATA between tags/antiquotes, so in the worst case the total work grows roughly quadratically with the size of the buffer. That can be a lot of different calls to this not-particularly-efficient recursive function, with no reuse of results!
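
To make the contrast concrete, here is a minimal sketch of the alternative: one forward pass that keeps a stack of contexts and records every CDATA span as it goes, so no span triggers its own backward search. It is written in Python rather than Elisp and is not the urweb-mode code. The function name cdata_spans and the toy grammar are assumptions made for illustration: only <xml>, </xml>, {, } and <...> tags are recognized, and strings, comments and <xml> nested directly inside XML are ignored.

```python
def cdata_spans(text):
    """Return (start, end) index pairs of literal XML text, in one forward pass.

    Toy model (an assumption, not the real Ur/Web grammar): the innermost
    context is either "code" or "xml", and it is tracked on a stack.
    """
    spans = []
    stack = ["code"]      # stack[-1] is the innermost context
    run_start = None      # start of the CDATA run in progress, if any
    i, n = 0, len(text)

    def close_run(end):
        nonlocal run_start
        if run_start is not None:
            # Skip runs that are only whitespace between tags.
            if text[run_start:end].strip():
                spans.append((run_start, end))
            run_start = None

    while i < n:
        if stack[-1] == "xml":
            if text.startswith("</xml>", i):
                close_run(i)
                stack.pop()
                i += 6
            elif text[i] == "{":          # antiquote: back to code
                close_run(i)
                stack.append("code")
                i += 1
            elif text[i] == "<":          # some other tag: skip it whole
                close_run(i)
                j = text.find(">", i)
                i = n if j == -1 else j + 1
            else:                         # ordinary character: CDATA
                if run_start is None:
                    run_start = i
                i += 1
        else:
            if text.startswith("<xml>", i):
                stack.append("xml")
                i += 5
            elif text[i] == "{":          # record or antiquote-internal brace
                stack.append("code")
                i += 1
            elif text[i] == "}" and len(stack) > 1:
                stack.pop()
                i += 1
            else:
                i += 1
    close_run(n)
    return spans


sample = 'val p = <xml><b>Hi</b> {f {A = 1} <xml>in</xml>} bye</xml>'
for start, end in cdata_spans(sample):
    print(start, end, repr(sample[start:end]))
```

Each character is visited once, so the cost is linear in the length of the text, regardless of how many CDATA segments it contains. Whether and how a pass like this can be hooked into font-lock is a separate question that the sketch does not answer.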

I've tried to bumble my way through Emacs mode authorship without sitting down to learn Elisp properly, and I'm hoping to stay on that path! Would any Emacs wizards do us the favor of reworking this part of the code to improve the efficiency? For instance, it wouldn't surprise me if there is an easy way to examine formatting already set on some text segments to speed up the decision for later segments.
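
One way to read "reuse of results" is as a checkpoint cache: remember the nesting context at a few positions already scanned, and answer a later query by resuming from the nearest earlier checkpoint instead of rescanning. The sketch below illustrates that idea in Python, not Elisp. The names step and ContextCache, the checkpoint spacing, and the toy grammar (only <xml>, </xml>, {, } and <...> tags) are all assumptions for illustration. Emacs's built-in syntax-ppss caches parse states in a broadly similar spirit, but nothing here relies on it.

```python
import bisect


def step(text, i, stack):
    """Consume one token at index i; return (next index, new context stack)."""
    if stack[-1] == "xml":
        if text.startswith("</xml>", i):
            return i + 6, stack[:-1]
        if text[i] == "{":
            return i + 1, stack + ("code",)
        if text[i] == "<":                      # some other tag: skip it whole
            j = text.find(">", i)
            return (len(text) if j == -1 else j + 1), stack
        return i + 1, stack
    if text.startswith("<xml>", i):
        return i + 5, stack + ("xml",)
    if text[i] == "{":
        return i + 1, stack + ("code",)
    if text[i] == "}" and len(stack) > 1:
        return i + 1, stack[:-1]
    return i + 1, stack


class ContextCache:
    """Answer 'is this position inside XML?' while reusing earlier scans."""

    def __init__(self, text, every=64):
        self.text = text
        self.every = every            # minimum spacing between checkpoints
        self.positions = [0]          # sorted checkpoint positions
        self.stacks = [("code",)]     # context stack at each checkpoint
        self.steps = 0                # tokens scanned so far (for measuring)

    def in_xml(self, pos):
        # Resume from the last checkpoint at or before pos.
        k = bisect.bisect_right(self.positions, pos) - 1
        i, stack = self.positions[k], self.stacks[k]
        while i < pos:
            nxt, new_stack = step(self.text, i, stack)
            if nxt > pos:             # pos lies inside the token starting at i
                break
            i, stack = nxt, new_stack
            self.steps += 1
            if i - self.positions[-1] >= self.every:
                self.positions.append(i)
                self.stacks.append(stack)
        return stack[-1] == "xml"


text = 'val p = <xml><b>Hi</b> {f {A = 1} <xml>in</xml>} bye</xml>\n' * 40
cache = ContextCache(text)
pos = text.rindex("bye")
print(cache.in_xml(pos), cache.steps)      # first query scans from the start
print(cache.in_xml(pos + 1), cache.steps)  # second query resumes nearby
```

A real version would also have to discard checkpoints after the position of any buffer edit, which the sketch leaves out.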

All the relevant source code is in urweb/src/elisp/urweb-mode.el. I suspect most of the time is spent in the function 'urweb-in-xml', which is called from one of the actions in 'urweb-font-lock-keywords'.

Thanks in advance to anyone who can help fix this long-standing problem!

_______________________________________________
Ur mailing list
[email protected]
http://www.impredicative.com/cgi-bin/mailman/listinfo/ur