On Sun, 03 Dec 2006 20:10:34 -0500, Michel Fortin <[EMAIL PROTECTED]> wrote:

> My experience optimizing PHP Markdown, and building the custom mixed Markdown/HTML-block pseudo-tokenizer of PHP Markdown Extra, tells me that it'll probably stay very slow as long as the implementation is made of PHP code.

Yeah, it is. I'm not much of a programmer, but I thought the algorithm too useful not to try implementing it.

> Assuming you've implemented the algorithm in the spec as PHP code, you could probably make it faster by using regular expressions in the tokenization steps instead of iterating character by character. For instance, you could implement many of the tokenizer states by matching from the start of a string with a regex. Maybe it would even be possible to combine a couple of states within the same regex.

This is precisely what I've done. Before I made that optimization, the parser would, more often than not, crash on documents larger than a few kilobytes on my machine.
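
For illustration, collapsing a couple of states into one anchored pattern might look roughly like the sketch below. The token shapes and variable names are invented for the example, and real tag names admit more characters than this pattern allows:

    <?php
    // Simplified sketch: recognize the spec's tag-open and tag-name states
    // with a single anchored match rather than one character per loop pass.
    $input  = 'Hello <em>world</em>'; // example input
    $pos    = 6;                      // scanning position, sitting on '<'
    $tokens = [];

    // The A modifier (PCRE_ANCHORED) pins the match to the offset, so this
    // behaves exactly like inspecting the current character first.
    if (preg_match('%<(/?)([a-zA-Z][a-zA-Z0-9]*)%A', $input, $m, 0, $pos)) {
        $tokens[] = [
            'type' => $m[1] === '/' ? 'endTag' : 'startTag',
            'name' => strtolower($m[2]),
        ];
        $pos += strlen($m[0]);
        // Attributes, "/>", or ">" are still handled afterwards by the
        // ordinary per-character state transitions.
    }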

> The more we replace PHP code with regular expressions, the faster it'll go, but the further we deviate from the processing algorithm described in the spec. I wonder how far we could go while keeping the exact same behaviour.

My pattern optimization is pretty simple: when switching states, the parser first tries to match whatever run of characters will keep the machine in the same state, then acts as normal on the first character that doesn't match. The only deviation from the spec is that it emits one character token per unbroken run rather than one token per character, and since those tokens are merged into a single text node in the tree builder anyway, the deviation is effectively nil.
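
As a rough sketch of what that looks like in the data state (again, the token shape is invented for illustration, not lifted from my parser):

    <?php
    // Illustrative sketch of the run-matching optimization in the data state.
    $input  = 'plain text & <em>markup</em>';
    $pos    = 0;  // current scanning position
    $tokens = [];

    // Any character other than '<' or '&' keeps the machine in the data
    // state, so the whole run can be consumed in one anchored match and
    // emitted as a single character token.
    if (preg_match('/[^<&]+/A', $input, $m, 0, $pos)) {
        $tokens[] = ['type' => 'characters', 'data' => $m[0]]; // 'plain text '
        $pos += strlen($m[0]);
    }
    // $pos now rests on '&', the first character that broke the run; the
    // spec's ordinary per-character transitions take over from here. Since
    // the tree builder merges adjacent character tokens into one text node,
    // the resulting DOM is identical.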

> The truly good solution would be to have a parser implemented in C and available through every standard installation of PHP. It could be used by other languages too.

I am keeping my fingers crossed, hoping that someone much more knowledgeable than I will do this. :)

--
J. King
http://jking.dark-phantasy.com/
