Hi,
adapting the current implementation to the "extra optional
parameter" was rather trivial; please consider the attached patch,
whose bulk is unit tests and (certainly improvable) documentation.
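To make the new parameter concrete, here is a minimal usage sketch
(assuming a CharacterSet named "whitespace" that matches SP, as in the
attached unit tests):

    Parser::Tokenizer tok(SBuf("GET /path HT"));
    SBuf method, uri, partial;
    tok.token(method, whitespace);        // true: "GET" (delimiter follows)
    tok.token(uri, whitespace, true);     // true: "/path" (delimiter follows)
    tok.token(partial, whitespace, true); // false: "HT" reaches end of buffer
    tok.token(partial, whitespace);       // true: "HT" (end of buffer is an
                                          //       implicit delimiter by default)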
On Mon, Jun 2, 2014 at 8:07 PM, Alex Rousskov
<[email protected]> wrote:
> On 06/02/2014 11:27 AM, Kinkie wrote:
>>>> + //fixme: account for buf_.size()
>>>> + bool neg = false;
>>>> + const char *s = buf_.rawContent();
>>>> + const char *end = buf_.rawContent() + buf_.length();
>>>
>>> Could you please explain what the above comment means? There is no
>>> SBuf::size() AFAICT.
>>
>> length() is an alias of size(). Changed to size() to avoid confusion.
>
> The above code already accounts for .length() or .size(), right? What is
> there to fix? Is it a bug (XXX) or just a future improvement note (TODO)?
>
>
>>>> + * At least one terminating delimiter is required. \0 may be passed
>>>> + * as a delimiter to treat end of buffer content as the end of token.
>>>
>>>> + if (tokenLen == SBuf::npos && !delimiters['\0']) {
>>>
>>> The above is from the actually committed code (not the code posted for
>>> review). Please remove \0 as a special case. As far as I can tell, the
>>> reason it was added is already covered well enough by the existing API.
>>> The addition of this special case is likely to just confuse and break
>>> things. For example, adding \0 to delimiters for the purpose of
>>> disabling this special case will also tell Tokenizer to skip \0
>>> characters in input, which is often wrong.
>>
>> I'll let Amos argue for this as he's the one who has the clearest
>> picture of the use case.
>> Given your objection, my 2c suggestion to address the need is to
>> either have two token() methods, one which requires a separator at the
>> end of a token and one which doesn't, or to have a bool flag passed to
>> the method.
>
> I hope neither is needed, but I will wait for the "use case" discussion.
>
> BTW, if possible, please work on your proposal to avoid these kinds of
> rushed commits of knowingly contested code.
>
>
> Thank you,
>
> Alex.
>
--
Francesco
=== modified file 'src/parser/Tokenizer.cc'
--- src/parser/Tokenizer.cc 2014-06-02 17:52:31 +0000
+++ src/parser/Tokenizer.cc 2014-06-02 18:59:23 +0000
@@ -29,12 +29,12 @@
#endif
bool
-Parser::Tokenizer::token(SBuf &returnedToken, const CharacterSet &delimiters)
+Parser::Tokenizer::token(SBuf &returnedToken, const CharacterSet &delimiters, const bool requireComplete)
{
const SBuf savebuf(buf_);
skip(delimiters);
const SBuf::size_type tokenLen = buf_.findFirstOf(delimiters); // not found = npos => consume to end
-    if (tokenLen == SBuf::npos && !delimiters['\0']) {
-        // no delimiter found, nor is NUL/EOS/npos acceptible as one
+    if (tokenLen == SBuf::npos && requireComplete) {
+        // no delimiter found, and the caller requires a complete token
buf_ = savebuf;
return false;
=== modified file 'src/parser/Tokenizer.h'
--- src/parser/Tokenizer.h 2014-06-02 00:15:10 +0000
+++ src/parser/Tokenizer.h 2014-06-02 18:59:24 +0000
@@ -42,12 +42,16 @@
*
* Want to extract delimiters? Use prefix() instead.
*
- * At least one terminating delimiter is required. \0 may be passed
- * as a delimiter to treat end of buffer content as the end of token.
- *
+ * \param returnedToken output parameter; set to the extracted token if
+ *        tokenization succeeds, left untouched otherwise
+ * \param delimiters the set of delimiter characters, optionally found
+ *        before the token and terminating it
+ * \param requireComplete if true, a delimiter must be found after the
+ *        token for parsing to succeed; if false (the default), the end
+ *        of the parse buffer is treated as an implicit delimiter
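+ *
+ *        For example, parsing "two words" with space as the only delimiter:
+ *        token(t, delims, true) extracts "two" but then fails on "words",
+ *        while token(t, delims) extracts "two" and then "words".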
* \return false if no terminal delimiter is found.
*/
- bool token(SBuf &returnedToken, const CharacterSet &delimiters);
+ bool token(SBuf &returnedToken, const CharacterSet &delimiters, const bool requireComplete = false);
/** Accumulates all sequential permitted characters up to an optional length limit.
*
=== modified file 'src/parser/testTokenizer.cc'
--- src/parser/testTokenizer.cc 2014-06-02 00:15:10 +0000
+++ src/parser/testTokenizer.cc 2014-06-02 19:28:27 +0000
@@ -82,19 +82,45 @@
void
testTokenizer::testTokenizerToken()
{
- Parser::Tokenizer t(text);
- SBuf s;
-
- // first scenario: patterns match
- CPPUNIT_ASSERT(t.token(s,whitespace));
- CPPUNIT_ASSERT_EQUAL(SBuf("GET"),s);
- CPPUNIT_ASSERT(t.token(s,whitespace));
- CPPUNIT_ASSERT_EQUAL(SBuf("http://resource.com/path"),s);
- CPPUNIT_ASSERT(t.token(s,whitespace));
- CPPUNIT_ASSERT_EQUAL(SBuf("HTTP/1.1"),s);
- CPPUNIT_ASSERT(t.token(s,whitespace));
- CPPUNIT_ASSERT_EQUAL(SBuf("Host:"),s);
-
+    {
+        // first scenario: patterns match
+        Parser::Tokenizer t(text);
+        SBuf s;
+
+ CPPUNIT_ASSERT(t.token(s,whitespace));
+ CPPUNIT_ASSERT_EQUAL(SBuf("GET"),s);
+ CPPUNIT_ASSERT(t.token(s,whitespace));
+ CPPUNIT_ASSERT_EQUAL(SBuf("http://resource.com/path"),s);
+ CPPUNIT_ASSERT(t.token(s,whitespace));
+ CPPUNIT_ASSERT_EQUAL(SBuf("HTTP/1.1"),s);
+ CPPUNIT_ASSERT(t.token(s,whitespace));
+ CPPUNIT_ASSERT_EQUAL(SBuf("Host:"),s);
+ }
+ {
+        // second scenario: complete vs. incomplete tokens (requireComplete)
+ SBuf token;
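+        // token is empty here; it merely seeds the Tokenizer, while each
+        // reset() below replaces the parse buffer with fresh input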
+ Parser::Tokenizer t(token);
+
+ t.reset(SBuf(" foo "));
+ CPPUNIT_ASSERT(t.token(token,whitespace));
+ CPPUNIT_ASSERT_EQUAL(SBuf("foo"),token);
+ token.clear();
+
+ t.reset(SBuf(" foo "));
+ CPPUNIT_ASSERT(t.token(token,whitespace,true));
+ CPPUNIT_ASSERT_EQUAL(SBuf("foo"),token);
+ token.clear();
+
+ t.reset(SBuf(" foo"));
+ CPPUNIT_ASSERT(t.token(token,whitespace));
+ CPPUNIT_ASSERT_EQUAL(SBuf("foo"),token);
+ token.clear();
+
+ t.reset(SBuf(" foo"));
+ CPPUNIT_ASSERT(!t.token(token,whitespace,true));
+ CPPUNIT_ASSERT_EQUAL(SBuf(""),token);
+ }
}
void