Hi folks,

I was thinking about how we can simplify troubleshooting of Ignite clusters.
The best case, of course, is when the cluster can heal itself, e.g. by
cancelling a transaction that blocks exchange or restarting a node on an OOM
error. However, sometimes those mechanisms don't work well, or user
interaction is required. Not all errors are obvious to users, and it's not
always clear what actions are required to restore the cluster.
If you google exceptions or error messages, the results can be ambiguous,
because different errors can have similar exceptions and you need to analyze
the stack trace to distinguish them. So googling isn't a straightforward
process in this case.
Almost all major DBs have error codes [1][2][3].
Let's do the same for Ignite: error codes are easy to google, so the user/dev
lists will become significantly more useful. We can have documentation with
an error code registry and solutions for the errors.

To implement this we need to do the following:
1. All error messages/exceptions must have a unique error code (so a new PR
must NOT be accepted if it adds exceptions/errors without error codes).
2. To avoid error code duplication, all error codes will be stored as files
under some folder.
3. Those files can be the source of documentation for the corresponding
error code.
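
As a rough illustration of steps 1-3, here is a minimal Java sketch. Everything
in it is an assumption made up for this example and does not exist in Ignite:
the class names CodedException and ErrorCodeRegistry, the "IGN-0001" code
format, and the layout where each file name in the folder is the error code
itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Hypothetical exception type that always carries an error code (step 1). */
class CodedException extends RuntimeException {
    private final String code;

    CodedException(String code, String msg) {
        // Put the code into the message so it shows up in logs and is easy to google.
        super("[" + code + "] " + msg);
        this.code = code;
    }

    String code() {
        return code;
    }
}

/** Hypothetical registry: one file per error code under a folder (steps 2 and 3). */
class ErrorCodeRegistry {
    /** Reads all registered codes; each regular file name is treated as one code. */
    static Set<String> load(Path dir) throws IOException {
        Set<String> codes = new TreeSet<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(Files::isRegularFile)
                 .forEach(p -> codes.add(p.getFileName().toString()));
        }
        return codes;
    }

    /** A PR check could fail the build when a thrown code has no file. */
    static boolean isRegistered(Set<String> codes, String code) {
        return codes.contains(code);
    }
}
```

Since two files in one folder can't share a name, a code can't be registered
twice, and the (possibly empty) file is the natural place for its
documentation later.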

All these files can be empty at first. Later, if an exception appears on the
user list and someone finds a solution, then first, other people can easily
google it by error code, and second, we can build documentation for this
error code based on the user-list thread/Stack Overflow/other sources.

Any thoughts?

[1] MySQL
https://dev.mysql.com/doc/refman/8.0/en/error-message-elements.html
[2] OracleDB https://docs.oracle.com/pls/db92/db92.error_search
[3] PostgreSQL https://www.postgresql.org/docs/10/errcodes-appendix.html

Thanks,
Mike.
