Hi folks,

I was thinking about how we can simplify troubleshooting of Ignite clusters. The best option, of course, is for the cluster to heal itself, for example by cancelling a transaction that blocks exchange or by restarting a node on an OOM error. However, sometimes those mechanisms don't work well, or user interaction is required. Not all errors are obvious to users, and it is often unclear what actions are required to restore the cluster.

If you google an exception or error message, the results can be ambiguous, because different errors can produce similar exceptions and you have to analyze the stack trace to tell them apart. So googling isn't a straightforward process in this case.

Almost all major DBs have error codes [1][2][3]. Let's do the same for Ignite. Error codes are easy to google, so the user/dev lists will become significantly more useful. We can also have documentation with an error code registry and solutions for the errors.
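
To make the idea concrete, here is a minimal sketch in Java of what a coded exception might look like. Everything in it is an assumption for illustration only: the class names IgniteCodedException and ErrorCodes and the "IGN-" code format are hypothetical and do not exist in Ignite today.

```java
import java.util.Locale;

// Hypothetical exception type: carries a stable, searchable error code.
class IgniteCodedException extends RuntimeException {
    private final String code;

    IgniteCodedException(String code, String msg) {
        // Prefix the message with the code so it shows up in logs and stack traces.
        super("[" + code + "] " + msg);
        this.code = code;
    }

    String code() {
        return code;
    }
}

// Hypothetical helper that renders numeric codes in one fixed format.
final class ErrorCodes {
    private ErrorCodes() {
    }

    // Assumed format: "IGN-" followed by a zero-padded five-digit number.
    static String format(int number) {
        if (number < 0 || number > 99999)
            throw new IllegalArgumentException("Error code out of range: " + number);

        return String.format(Locale.ROOT, "IGN-%05d", number);
    }
}

class ErrorCodeDemo {
    public static void main(String[] args) {
        try {
            throw new IgniteCodedException(ErrorCodes.format(42),
                "Transaction blocks partition map exchange");
        }
        catch (IgniteCodedException e) {
            // Prints: [IGN-00042] Transaction blocks partition map exchange
            System.out.println(e.getMessage());
        }
    }
}
```

With something like this, a user who sees the message in a log can search for the code alone instead of the full exception text.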
To implement this, we need to do the following:

1. All error messages/exceptions must have a unique error code (so a new PR must NOT be accepted if any of its exceptions/errors lack an error code).
2. To avoid error code duplication, all error codes will be stored as files under some folder.
3. Those files can serve as the source of documentation for the corresponding error codes.

All these files can be empty at first. Later, if an exception appears on the user list and someone finds a solution, then first, other people can easily google it by error code, and second, we can build documentation for that error code based on the user-list thread, Stack Overflow, or another source.

Any thoughts?

[1] MySQL https://dev.mysql.com/doc/refman/8.0/en/error-message-elements.html
[2] OracleDB https://docs.oracle.com/pls/db92/db92.error_search
[3] PostgreSQL https://www.postgresql.org/docs/10/errcodes-appendix.html

Thanks,
Mike.