Re: Extracting tokens from an expression and matching an object against that expression without parsing twice

Andrus Adamchik Mon, 01 Dec 2014 03:24:32 -0800

To be clear, 'expression.match(..)' is a perfectly valid use of Cayenne. 
Anyways, good to hear that you found a solution.


Andrus
 

> On Dec 1, 2014, at 1:10 PM, Davide Vecchi <[email protected]> wrote:
> 
> Just for the record, I solved my problem by keeping the first step of 
> creating the tokens. In the second step (matching objects against that 
> expression), I use the existing tokens to do the match myself if the 
> expression is simple enough (basically, if it contains no parenthesized 
> grouping); otherwise I still call Expression.fromString, which parses the 
> string again and recreates the tokens.
> 
> The execution time of the test suites in the Jenkins build went from ~45 
> minutes back to the usual ~20 minutes, so in this specific case avoiding 
> the double parsing was a substantial optimization.
> 
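The two-step approach described above (reuse the saved tokens when the expression is simple, fall back to a full parse otherwise) could be sketched roughly as follows. This is only an illustration under assumptions: the class TokenFastPath, its methods, the use of plain token strings instead of Cayenne Token objects, and the definition of "simple" as a flat "field = value and field = value" chain are all hypothetical and are not taken from the application or from Cayenne.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical helper: none of these names come from Cayenne or the application.
final class TokenFastPath {

    // Assumed meaning of "simple": a flat chain of the form
    // field = value [and field = value ...], with no parentheses anywhere.
    static boolean isSimple(List<String> tokens) {
        if (tokens.contains("(") || tokens.contains(")")) {
            return false;
        }
        if (tokens.size() % 4 != 3) {
            return false;
        }
        for (int i = 0; i < tokens.size(); i += 4) {
            if (!"=".equals(tokens.get(i + 1))) {
                return false;
            }
            // Every comparison except the last must be followed by "and".
            if (i + 3 < tokens.size() && !"and".equalsIgnoreCase(tokens.get(i + 3))) {
                return false;
            }
        }
        return true;
    }

    // Evaluates a chain accepted by isSimple against a map of field values.
    static boolean matchSimple(List<String> tokens, Map<String, String> object) {
        for (int i = 0; i < tokens.size(); i += 4) {
            String field = tokens.get(i);
            String expected = tokens.get(i + 2);
            if (!expected.equals(object.get(field))) {
                return false;
            }
        }
        return true;
    }

    // Uses the saved tokens when possible; otherwise delegates to a full parse.
    static boolean matches(String where, List<String> tokens, Map<String, String> object,
                           BiPredicate<String, Map<String, String>> fullParseMatch) {
        return isSimple(tokens)
                ? matchSimple(tokens, object)
                : fullParseMatch.test(where, object);
    }
}
```

In the setting of this thread, the fallback passed as fullParseMatch would presumably wrap the Expression.fromString(where) and expression.match(object) calls quoted further down, so the second parse is only paid for expressions the fast path cannot handle.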
>> Yeah, I still don't understand why would the code care to poke inside the 
>> parser and deal directly with tokens.
> 
> I had not explained that.
> As I said, the design of the application I'm modifying was already based on 
> tokens and I was not supposed to redesign the application; I was just asked 
> to improve the parsing, which in my opinion makes a lot of sense, whether 
> this is right or wrong from one's perspective.
> 
> I don't think it should necessarily be considered wrong that an application 
> that parses expressions also wants to, for example, show or store the 
> resulting tokens, especially without knowing the purpose of the application.
> 
> However, I accept that although the token-related methods I'm calling are 
> public (ExpressionParser.getToken(int) and ExpressionParser.getNextToken()), 
> they were not intended to be called by an application; I probably didn't 
> read the Cayenne docs closely enough to realize that sooner. Next time I 
> need just parsing and matching I will not use Cayenne, which I now realize 
> is intended for a much wider purpose than that. But in this case I will keep 
> the Cayenne-based solution because it's doing the job very well.
> 
> 
> 
> -----Original Message-----
> From: Andrus Adamchik [mailto:[email protected]] 
> Sent: Monday, November 17, 2014 13:13
> To: [email protected]
> Subject: Re: Extracting tokens from an expression and matching an object 
> against that expression without parsing twice
> 
>> It's not easy to explain properly why I need the tokens; the general reason 
>> is that the preexisting application, written long ago by several other 
>> persons, is designed to use them, and changing its design would be too big 
>> an undertaking.
> 
> Yeah, I still don't understand why would the code care to poke inside the 
> parser and deal directly with tokens.
> 
>> I will see if I can use Andrus' pointers to extract the tokens from the 
>> Expression instance.
> 
> I am afraid you won't find any *tokens* in an Expression instance. Expression 
> is just a tree of objects that can be used to evaluate stuff. If you need it 
> to match something, you can. But a parsed expression is devoid of any links 
> to the original lexical structure. 
> 
> Andrus
> 
> 
> 
>> On Nov 17, 2014, at 11:46 AM, Davide Vecchi <[email protected]> wrote:
>> 
>> Thanks for your inputs.
>> 
>> I'm probably showing my technological age here, but I certainly admit that I 
>> have this tendency to avoid repeating complex operations as a matter of 
>> principle when it's known in advance that the second process will produce 
>> exactly the same result as the first one. When I catch myself doing that I 
>> always feel that my design is not OK.
>> 
>> However in this case I am quite sure I need to get rid of the double 
>> parsing, although I did not demonstrate in a particularly strict way that 
>> that's the cause of the slowdown. It's more like a qualified (in my opinion) 
>> guess, reinforced by the fact that method Expression.fromString(String) has 
>> a TODO saying "TODO: cache expression strings, since this operation is 
>> pretty slow" (I'm using version 3.0.2). So it looks like the Cayenne coders 
>> too had reasons to worry to some extent about optimization in this area.
>> 
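The caching idea hinted at by that TODO can be illustrated with a small memoizing wrapper. This is a minimal sketch, not something Cayenne provides: the class ParseCache is hypothetical, the parser is passed in as a generic function so the example stays self-contained, and it assumes Java 8 or later for computeIfAbsent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: remembers the parse result for each distinct source string.
final class ParseCache<T> {

    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> parser;

    ParseCache(Function<String, T> parser) {
        this.parser = parser;
    }

    // Runs the parser only the first time a given string is seen.
    T get(String source) {
        return cache.computeIfAbsent(source, parser);
    }
}
```

With Cayenne on the classpath, the parser function could presumably be Expression::fromString, so that repeated matching against the same string reuses one Expression. That assumes the cached instance is not modified after it is created and that the set of distinct expression strings is small enough to keep in memory.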
>> I just used JVisualVM to profile the execution and two of the methods where 
>> by far most of the time is spent are Expression.fromString(String) and 
>> ExpressionParser.getNextToken() . Since I have to cut down the processing 
>> time I do have to focus on them first.
>> 
>> The situation here is that I modified a preexisting application which was 
>> doing some basic parsing, and after creating the tokens from the parsing it 
>> was using them to match the expression against objects. That parsing is 
>> basic in that it can only parse simple expressions; for example, it doesn't 
>> support parentheses grouping.
>> 
>> My changes consisted of removing that parsing code from the application and 
>> replacing it with calls to Cayenne, because we need real parsing. Of course 
>> the parsing done by Cayenne is way more powerful and that might be the real 
>> and fair reason why it takes longer, but even if this is the case it's 
>> important for me not to do that parsing twice.
>> 
>> It's not easy to explain properly why I need the tokens; the general reason 
>> is that the preexisting application, written long ago by several other 
>> persons, is designed to use them, and changing its design would be too big 
>> an undertaking. Since all that needs to be improved is the parsing and 
>> matching I thought I'd just use a powerful tool to replace only those parts.
>> 
>> I will see if I can use Andrus' pointers to extract the tokens from the 
>> Expression instance.
>> 
>> 
>> 
>> -----Original Message-----
>> From: Andrus Adamchik [mailto:[email protected]]
>> Sent: Sunday, November 16, 2014 14:57
>> To: [email protected]
>> Subject: Re: Extracting tokens from an expression and matching an 
>> object against that expression without parsing twice
>> 
>> I second John's assessment. 
>> 
>> BTW, what are the tokens for? Do you actually need to have access to the 
>> lexical structure of the String? Of course, a parsed Expression object is 
>> itself a tree and gives you access to its own structure, either directly 
>> ('getOperand(int)') or via the 'traverse' and 'transform' methods.
>> 
>> Andrus
>> 
>>> On Nov 14, 2014, at 9:54 PM, John Huss <[email protected]> wrote:
>>> 
>>> This looks like a serious micro optimization.  Is the performance for 
>>> this really that critical?  Have you demonstrated that this is your 
>>> application's crucial hot spot?
>>> 
>>> On Fri, Nov 14, 2014 at 7:35 AM, Davide Vecchi <[email protected]> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I have an expression in a string, and I use Cayenne to parse the 
>>>> expression into tokens, which are needed for a specific purpose.
>>>> 
>>>> However in addition to having the tokens I also need to evaluate an 
>>>> object against that expression, to see if that object matches the 
>>>> expression.
>>>> 
>>>> My problem is that the way I'm doing it causes the parsing to be 
>>>> done twice on the same expression, and I would like to avoid 
>>>> parsing the same expression twice.
>>>> 
>>>> I'm doing the token creation like this:
>>>> 
>>>> -----------------------------------
>>>> String where = "myField=0";
>>>> 
>>>> Reader reader = new StringReader(where);
>>>> 
>>>> ExpressionParser parser = new ExpressionParser(reader);
>>>> 
>>>> List<Token> tokens = new ArrayList<>();
>>>> 
>>>> Token token = parser.getNextToken();
>>>> 
>>>> while (token != null) {
>>>> 
>>>>   tokens.add(token);
>>>> 
>>>>   token = parser.getNextToken();
>>>> }
>>>> -----------------------------------
>>>> 
>>>> I'm doing the object matching like this:
>>>> 
>>>> -----------------------------------
>>>> String where = "myField=0";
>>>> 
>>>> Expression expression = Expression.fromString(where);
>>>> 
>>>> boolean matches = expression.match(object);
>>>> -----------------------------------
>>>> 
>>>> The call to Expression.fromString made in the object matching 
>>>> operation performs a parsing, but the parsing of the same expression 
>>>> had already been done in the token creation operation.
>>>> 
>>>> Is there a way to redesign this process in order to get the tokens 
>>>> and also match an object against the expression without parsing the 
>>>> same expression twice?
>>>> 
>>>> For example, I believe that the call to Expression.fromString must 
>>>> have created the tokens, because it has parsed the string. So I 
>>>> thought I could reverse the order and do the object matching first, 
>>>> keep the Expression instance created in that process and use it to 
>>>> extract the tokens. But I can't see how to extract the tokens from 
>>>> an Expression instance instead of from an ExpressionParser instance as I'm 
>>>> currently doing.
>>>> 
>>>> Or another possibility could be that I keep creating the tokens 
>>>> first, and then I match my object against them, instead of against 
>>>> the string expression that generated those tokens. But I can't see 
>>>> how to match an object against tokens.
>>>> 
>>>> So I'm looking for some ideas.
>>>> 
>>>> Thanks in advance.
>>>> 
>>>> Davide Vecchi
>>>> 
>> 
>> 
> 
>
