I wish to define custom aggregate functions in Tcl, but this capability is currently not exposed through the Tcl interface. Thus I am thinking about how best to add it. Here's a first crack at a design proposal:
Extend the [db function] command to accept an -aggregate switch that makes the new function be an aggregate function. Otherwise, the function defaults to being a scalar function. When an aggregate Tcl function is called from SQL, it is invoked in two phases: step and final. The step phase receives the input data, and the final phase produces the result. During step, the function is invoked once for each row (in the group). For the first row (in the group), the first argument to the function will be empty string, and subsequent arguments are the SQL values from the row being processed. For each additional row (in the group), the first argument is the Tcl value returned by the prior invocation, and subsequent arguments are as above. During final, the function is invoked one last time (at the end of each group). Its sole argument is the return value of the last step invocation, or empty string if there were no rows. The return value of the Tcl function is used as the return value of the SQL function. If there were no rows and GROUP BY was used, the function is not invoked at all, because there were no groups. The Tcl function can tell whether it is in the step or final phase by how many arguments it receives. If it receives multiple, it is in step. If it receives only one, it is in final. Depending on how it is written, the Tcl function may be able to tell if it's being called for the first time (within the group) by checking if its first argument is empty string. If non-empty, it definitely has been called before (within the group). If empty, it is most likely on the first row and, if in step, may need to initialize. However, if the aggregate function wishes to disregard some of its input data, it may choose to return empty string to discard any state data arising from the current and prior rows (in the group). This will result in empty string being the first argument to the next invocation. This mode of operation is ill-advised but not illegal, though maybe some valid use cases may exist, provided that ORDER BY is being used. It is an error to pass an aggregate Tcl function zero arguments. Here's an example that will return the entire Fossil global_config table as a key/value dict: db function list -deterministic -aggregate { apply {{state args} { concat $state $args } } db onecolumn { SELECT list(name, value) FROM global_config ORDER BY name } As for implementation, I think the existing code can be leveraged to a great extent. Adapt the existing tclSqlFunc() to be the backend to new tclSqlFuncScalar(), tclSqlFuncStep(), and tclSqlFuncFinal() functions, and adjust the DB_FUNCTION code to recognize -aggregate and pass different function pointers accordingly. Use sqlite3_aggregate_context() to hold the Tcl_Obj * resulting from each invocation, then let it be the first argument to the next invocation. I would be happy to implement this myself, since this is functionality I will be needing soon. Once I'm satisfied with it and have test suite updates, what's the recommended method for sharing my patch? Question: does xFinal() get called if an error occurs during (or between) calling xStep()? Are errors even possible? I'm curious if there's any way to leak the Tcl_Obj pointed to by the aggregate context. _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users