On 14/05/15 20:27, Martynas Jusevičius wrote:
Andy,
I took a crack at it:
https://github.com/Graphity/graphity-core/blob/master/src/main/java/org/graphity/core/riot/lang/RDFPostReader.java
https://github.com/Graphity/graphity-core/blob/master/src/main/java/org/graphity/core/riot/lang/TokenizerText.java
TokenizerRDFPost
I'd drop the "extends TokenizerText", or at least write an
AbstractTokenizerText with the machinery you want and an
"abstract protected Token parseToken()".
Throw out all unused code so it won't accidentally get in the way in
the future.
(If you do this, please contribute it - it would be useful and maybe
should have been done originally if it makes no speed difference.)
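The AbstractTokenizerText split might look something like this minimal sketch. It assumes the base class owns the input, the position and one character of lookahead, and subclasses supply only parseToken(). Token is a hypothetical stand-in, not Jena's Token class. The AMP/EQ/STRING rules are illustrative only, not what the Graphity code does. Input is a plain String to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token; Jena's real Token class carries much more.
final class Token {
    final String type;
    final String image;

    Token(String type, String image) {
        this.type = type;
        this.image = image;
    }

    @Override
    public String toString() {
        return type + "(" + image + ")";
    }
}

// Shared machinery: input, position, one-character lookahead and the token loop.
abstract class AbstractTokenizerText {
    private final String input;
    private int pos = 0;

    protected AbstractTokenizerText(String input) {
        this.input = input;
    }

    protected boolean eof() { return pos >= input.length(); }

    // Next character without consuming it, or -1 at end of input.
    protected int peekChar() { return eof() ? -1 : input.charAt(pos); }

    // Next character, consuming it, or -1 at end of input.
    protected int readChar() { return eof() ? -1 : input.charAt(pos++); }

    // Each concrete syntax decides how a single token is recognized.
    abstract protected Token parseToken();

    public List<Token> tokenize() {
        List<Token> tokens = new ArrayList<>();
        while (!eof()) {
            tokens.add(parseToken());
        }
        return tokens;
    }
}

// Illustrative subclass: only the syntax-specific rule lives here.
class TokenizerRDFPost extends AbstractTokenizerText {
    TokenizerRDFPost(String input) {
        super(input);
    }

    @Override
    protected Token parseToken() {
        int c = readChar();
        if (c == '&') return new Token("AMP", "&");
        if (c == '=') return new Token("EQ", "=");
        // Anything else: read up to the next '&' or '=' (always at least one char).
        StringBuilder sb = new StringBuilder().append((char) c);
        while (!eof() && peekChar() != '&' && peekChar() != '=') {
            sb.append((char) readChar());
        }
        return new Token("STRING", sb.toString());
    }
}

public class Main {
    public static void main(String[] args) {
        System.out.println(new TokenizerRDFPost("rdf=&su=http://example.org/s").tokenize());
    }
}
```

Running it prints [STRING(rdf), EQ(=), AMP(&), STRING(su), EQ(=), STRING(http://example.org/s)]. This is the usual template-method arrangement. The loop and character handling live once in the base class, so another syntax only needs its own parseToken() override.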
It was surely one of the more labor-intensive pieces of code in a while...
That means you are on the right track! When a parser isn't tedious it
is either not helpful or slow :-)
Works with the example from the RDF/POST spec, but I need to do more
testing. It could probably be more DRY as well. If you have some advice,
please let me know.
For grammars and tokenizers, comprehensive testing of each pays big
rewards. There is not much worse than chasing bugs when the core
machinery is not doing the right thing. Tests pin that down and make
you think of every case that can come up.
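One cheap way to make tests cover every case is a table of inputs and expected results. The sketch below applies that to a hypothetical pairs() helper. The helper splits an application/x-www-form-urlencoded body (the encoding RDF/POST builds on) into decoded key=value strings. The helper, and the RDF/POST-style keys in the inputs, are illustrative assumptions, not the Graphity code.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Hypothetical helper: split a form-urlencoded body into decoded "key=value"
    // strings, skipping empty pairs such as the one in "a=1&&b=2".
    static List<String> pairs(String body) {
        List<String> out = new ArrayList<>();
        for (String part : body.split("&", -1)) {
            if (part.isEmpty()) continue;
            int eq = part.indexOf('=');
            String key = eq < 0 ? part : part.substring(0, eq);
            String value = eq < 0 ? "" : part.substring(eq + 1);
            out.add(URLDecoder.decode(key, StandardCharsets.UTF_8) + "="
                    + URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return out;
    }

    // One table row: the input, then the expected pairs in order.
    static void check(String input, String... expected) {
        List<String> actual = pairs(input);
        if (!actual.equals(List.of(expected))) {
            throw new AssertionError(input + " -> " + actual);
        }
    }

    public static void main(String[] args) {
        check("");                                                 // empty body
        check("rdf=", "rdf=");                                     // empty value
        check("rdf", "rdf=");                                      // no '=' at all
        check("ol=a+b", "ol=a b");                                 // '+' decodes to a space
        check("ou=http%3A%2F%2Fex.org%2Fo", "ou=http://ex.org/o"); // percent-escapes
        check("ol=x%26y", "ol=x&y");                               // escaped '&' is data, not a separator
        check("rdf=&&su=s", "rdf=", "su=s");                       // empty pair skipped
        System.out.println("all cases pass");
    }
}
```

Each edge case is one line. When a bug turns up, pinning it down is just another check(...) row.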
For speed, the tokenizer is more likely to be the bottleneck.
PeekReader should give reasonable (for Java) I/O speed for
tokenizing with one character of lookahead.
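To illustrate the idea without relying on PeekReader's actual API, here is a hypothetical stand-in, OneCharPeekReader. It reads the underlying Reader in bulk into a char[] buffer and serves peekChar()/readChar() from that buffer, so there is no call to Reader.read() for every character. The digit/symbol tokenizer that drives it is just an example of one-character-lookahead tokenizing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a peek reader (this is not Jena's PeekReader API):
// reads the underlying Reader in bulk and serves one character of lookahead.
final class OneCharPeekReader {
    private final Reader in;
    private final char[] buf = new char[8192];
    private int len = 0; // number of valid chars in buf
    private int idx = 0; // index of the lookahead char in buf

    OneCharPeekReader(Reader in) {
        this.in = in;
    }

    // Make sure buf holds at least one unread char; false at end of input.
    private boolean fill() throws IOException {
        if (idx < len) return true;
        int n = in.read(buf, 0, buf.length); // -1 signals end of stream
        if (n <= 0) return false;
        len = n;
        idx = 0;
        return true;
    }

    int peekChar() throws IOException { return fill() ? buf[idx] : -1; }

    int readChar() throws IOException { return fill() ? buf[idx++] : -1; }

    boolean eof() throws IOException { return !fill(); }
}

public class Main {
    // Example one-char-lookahead tokenizer: runs of digits, or single other chars.
    static List<String> tokens(String s) throws IOException {
        OneCharPeekReader r = new OneCharPeekReader(new StringReader(s));
        List<String> out = new ArrayList<>();
        while (!r.eof()) {
            StringBuilder sb = new StringBuilder();
            if (Character.isDigit(r.peekChar())) {
                // Peeking decides where the number ends without consuming the next char.
                while (!r.eof() && Character.isDigit(r.peekChar())) {
                    sb.append((char) r.readChar());
                }
            } else {
                sb.append((char) r.readChar());
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("12+345")); // prints [12, +, 345]
    }
}
```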
Andy