[jira] [Comment Edited] (TORQUE-364) RecordMapper very slow on many columns in table

Max Philipp Wriedt (Jira) Sat, 20 Apr 2024 15:24:05 -0700


    [ 
https://issues.apache.org/jira/browse/TORQUE-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839298#comment-17839298
 ]


Max Philipp Wriedt edited comment on TORQUE-364 at 4/20/24 10:23 PM:
---------------------------------------------------------------------

To point 3. we finally got a slightly nicer working prototype to share;
{code:java}
### recordMapperBase.vm
---+
// at the top import the new runtime MappingStrategy class
import org.apache.torque.om.mapper.MappingStrategy;
---
---+
// Should this be cached per RecordMapper (Thread safety/Multi query safety?)
private MappingStrategy<${dbObjectClassName}> strategy;

// A sample prepareStrategy implementation, still missing the new HashSet 
optimization!
public MappingStrategy<${dbObjectClassName}> prepareStrategy(Criteria criteria, 
int offset)
{
    MappingStrategy<${dbObjectClassName}> strategy = new 
MappingStrategy<${dbObjectClassName}>();
    if (criteria == null)
    {
        return strategy;
    }

    int nextOuterOffset = offset + 1;
    List<Column> selectColumns = criteria.getSelectColumns();
    List<Column> columnsWithoutOffset = selectColumns.subList(
            offset,
            selectColumns.size());

    for (Column column : columnsWithoutOffset)
    {
        // Create a final local copy for this iteration to be passed into the 
lambdas
        final int nextOffset = nextOuterOffset;
        String columnExpression = column.getSqlExpression();
    #set ( $else = "" )
    #foreach ($columnElement in $torqueGen.getChildren("column"))
        #set ( $setter = $columnElement.getAttribute("setter") )
        #set ( $getter = $columnElement.getAttribute("getter") )
        #set ( $peerColumnName = $columnElement.getAttribute("peerColumnName") )
        #set ( $fieldType = $columnElement.getAttribute("fieldType") )
        ${else}if (${peerColumnName}_EXPRESSION.equals(columnExpression))
        {
            strategy.addColumn(nextOffset, (res, inst) -> 
inst.${setter}(this.${getter}(res, nextOffset)));
        }
        #set ( $else = "else ")
    #end
    nextOuterOffset += 1;
    }
    strategy.finish($torqueGen.getChildren("column").size());
    return strategy;
}
---
---+
    // in processRow()
    // this could theoretically replace BOTh if cases (criteria == null AND != 
null)
    // However criteria == null might still be slightly faster than this!

    // Wherever the MappingStrategy is ultimately stored should probably handle 
invalidation
    // Storing it in Criteria for example would allow invalidating whenever a 
select columns is added/modified
    //  However, for testing purposes we just placed it directly in processRow 
to show viability.
    if (this.strategy == null) {
        this.strategy = prepareStrategy(criteria, offset);
    }
    this.strategy.execute(resultSet, $field);
---


### MappingStrategy.java

package org.apache.torque.om.mapper;

import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.lang3.function.FailableBiConsumer;
import org.apache.torque.TorqueException;

import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MappingStrategy<T> {
    private final List<Pair<Integer, FailableBiConsumer<ResultSet, T, 
TorqueException>>> strategy;
    private boolean allSet;

    public MappingStrategy()
    {
        this.strategy = new ArrayList<>();
        this.allSet = false;
    }

    public void addColumn(int offset, FailableBiConsumer<ResultSet, T, 
TorqueException> setter)
    {
        this.strategy.add(Pair.of(offset, setter));
    }

    public void finish(int num_fields)
    {
        // The list should already be in the correct order because Criteria 
loops through the columns
        // in the same order in which they are added to the SQL statement but 
just in case something weird
        // is being done this gets us closer to the desired contract of 
ResultSet of looping over monotonically
        // increasing indices of columns only.
        this.strategy.sort(Comparator.comparing(Pair::getLeft));
        this.allSet = this.strategy.size() == num_fields;
    }

    public boolean isEmpty()
    {
        return this.strategy.isEmpty();
    }

    public boolean isAllSet() {
        return this.allSet;
    }

    public void execute(ResultSet result, T instance) throws TorqueException
    {
        for (Pair<Integer, FailableBiConsumer<ResultSet, T, TorqueException>> 
strategy : this.strategy)
        {
            strategy.getRight().accept(result, instance);
        }
    }
}
{code}
This code adds a single MappingStrategy<T> class which holds a list of lambdas 
which know how to best query a ResultSet to construct a T from it as well as 
the necessary code for a RecordMapper to utilize this MappingStrategy to speed 
up object construction significantly!
However, the actual strategy object is obviously tied to a given 
Criterias/Queries selected columns and thus can't be kept by the 
RecordMapperXXX objects themselves. (This would lead to exceptions with multple 
threads doing queries and would require extensive cache invalidation checks for 
all processRow invokations to create a new strategy even in single threaded 
execution!)

As such I would like to open the discussion wether or not this implementation 
seems worthwhile and in case it does, how best to store the MappingStrategy 
object so it can be passed to the relevant RecordMapper, is kept up to date 
with the given Criteria/ResultSet passed to its processRow and is invalidated 
correctly.

[~refarb] [~mwriedt] (Sorry for still not having my actual account back up yet)


was (Author: JIRAUSER299058):
To point 3. we finally got a slightly nicer working prototype to share;
{code:java}
### recordMapperBase.vm
---+
// at the top import the new runtime MappingStrategy class
import org.apache.torque.om.mapper.MappingStrategy;
---
---+
// Should this be cached per RecordMapper (Thread safety/Multi query safety?)
private MappingStrategy<${dbObjectClassName}> strategy;

// A sample prepareStrategy implementation, still missing the new HashSet 
optimization!
public MappingStrategy<${dbObjectClassName}> prepareStrategy(Criteria criteria, 
int offset)
{
    MappingStrategy<${dbObjectClassName}> strategy = new 
MappingStrategy<${dbObjectClassName}>();
    if (criteria == null)
    {
        return strategy;
    }

    int nextOuterOffset = offset + 1;
    List<Column> selectColumns = criteria.getSelectColumns();
    List<Column> columnsWithoutOffset = selectColumns.subList(
            offset,
            selectColumns.size());

    for (Column column : columnsWithoutOffset)
    {
        // Create a final local copy for this iteration to be passed into the 
lambdas
        final int nextOffset = nextOuterOffset;
        String columnExpression = column.getSqlExpression();
    #set ( $else = "" )
    #foreach ($columnElement in $torqueGen.getChildren("column"))
        #set ( $setter = $columnElement.getAttribute("setter") )
        #set ( $getter = $columnElement.getAttribute("getter") )
        #set ( $peerColumnName = $columnElement.getAttribute("peerColumnName") )
        #set ( $fieldType = $columnElement.getAttribute("fieldType") )
        ${else}if (${peerColumnName}_EXPRESSION.equals(columnExpression))
        {
            strategy.addColumn(nextOffset, (res, inst) -> 
inst.${setter}(this.${getter}(res, nextOffset)));
        }
        #set ( $else = "else ")
    #end
    nextOuterOffset += 1;
    }
    strategy.finish($torqueGen.getChildren("column").size());
    return strategy;
}
---
---+
    // in processRow()
    // this could theoretically replace BOTh if cases (criteria == null AND != 
null)
    // However criteria == null might still be slightly faster than this!

    // Wherever the MappingStrategy is ultimately stored should probably handle 
invalidation
    // Storing it in Criteria for example would allow invalidating whenever a 
select columns is added/modified
    //  However, for testing purposes we just placed it directly in processRow 
to show viability.
    if (this.strategy == null) {
        this.strategy = prepareStrategy(criteria, offset);
    }
    this.strategy.execute(resultSet, $field);
---


### MappingStrategy.java

package org.apache.torque.om.mapper;

import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.lang3.function.FailableBiConsumer;
import org.apache.torque.TorqueException;

import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MappingStrategy<T> {
    private final List<Pair<Integer, FailableBiConsumer<ResultSet, T, 
TorqueException>>> strategy;
    private boolean allSet;

    public MappingStrategy()
    {
        this.strategy = new ArrayList<>();
        this.allSet = false;
    }

    public void addColumn(int offset, FailableBiConsumer<ResultSet, T, 
TorqueException> setter)
    {
        this.strategy.add(Pair.of(offset, setter));
    }

    public void finish(int num_fields)
    {
        // The list should already be in the correct order because Criteria 
loops through the columns
        // in the same order in which they are added to the SQL statement but 
just in case something weird
        // is being done this gets us closer to the desired contract of 
ResultSet of looping over monotonically
        // increasing indices of columns only.
        this.strategy.sort(Comparator.comparing(Pair::getLeft));
        this.allSet = this.strategy.size() == num_fields;
    }

    public boolean isEmpty()
    {
        return this.strategy.isEmpty();
    }

    public boolean isAllSet() {
        return this.allSet;
    }

    public void execute(ResultSet result, T instance) throws TorqueException
    {
        for (Pair<Integer, FailableBiConsumer<ResultSet, T, TorqueException>> 
strategy : this.strategy)
        {
            strategy.getRight().accept(result, instance);
        }
    }
}
{code}
This code adds a single MappingStrategy<T> class which holds a list of lambdas 
which know how to best query a ResultSet to construct a T from it as well as 
the necessary code for a RecordMapper to utilize this MappingStrategy to speed 
up object construction significantly!
However, the actual strategy object is obviously tied to a given 
Criterias/Queries selected columns and thus can't be kept by the 
RecordMapperXXX objects themselves. (This would lead to exceptions with multple 
threads doing queries and would require extensive cache invalidation checks for 
all processRow invokations to create a new strategy even in single threaded 
execution!)

As such I would like to open the discussion wether or not this implementation 
seems worthwhile and in case it does, how best to store the MappingStrategy 
object so it can be passed to the relevant RecordMapper, is kept up to date 
with the given Criteria/ResultSet passed to its processRow and is invalidated 
correctly.
[~refarb] [~mwriedt] (Sorry for still not having my actual account back up yet)

> RecordMapper very slow on many columns in table
> -----------------------------------------------
>
>                 Key: TORQUE-364
>                 URL: https://issues.apache.org/jira/browse/TORQUE-364
>             Project: Torque
>          Issue Type: Improvement
>          Components: Runtime, Templates
>    Affects Versions: 5.1
>            Reporter: Max Philipp Wriedt
>            Priority: Major
>              Labels: criteria_api, om, performance, recordmapper, templates
>
> When "doSelect()" a large quantity of columns in a table the default 
> RecordMappers generated by Om-Templates (processRow()) cause an  
> !https://wikimedia.org/api/rest_v1/media/math/render/svg/4441d9689c0e6b2c47994e2f587ac5378faeefba!
>  Problem. (technically O(rows * columns))
> Specifically, constantly generating the SQL expression of all possible 
> columns for every row in the result causes excessive use of StringBuilders 
> which slow the mapping process to a crawl.
> I currently have two ideas on how best to tackle this problem:
>  # Either generate all SQL column expressions once when a (template 
> generated) RecordMapper is first created (using final static String fields) 
> thus reducing the cost for every row to generating all selected column SQL 
> expressions  once(instead of every selected column times every available 
> column)
>  # Or (in case the first approach generates unacceptably excessive number of 
> fields for RecordMappers) adjust the RecordMapper API to allow a 
> "prepare(Criteria, int offset)" method to be called once before processing 
> any rows and implement it on generated RecordMappers to scan the Criteria and 
> build two lists: One containing references to the setXXX methods of the 
> mapper in the order they appear in the ResultSet (via the order in the 
> Criteria) and a second list containing the corresponding column offsets. This 
> would allow the processRow method to only iterated over both lists 
> simultaneously and call the referenced methods with the result set and offset 
> immediately. (Alternatively one list using lambdas could be used but I am 
> currently not sure about the stance or impact of these lambdas in the Torque 
> project.)
> credits to [~refarb] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (TORQUE-364) RecordMapper very slow on many columns in table

Reply via email to