I saw this project on the wiki, and it reminds me of a problem I have been trying to understand better: https://wiki.netbsd.org/projects/project/language-neutral-interfaces/

I am a retired compiler engineer. I used to work on LLVM and other compilers, including LLVM's code generator and register allocators. I brought up the sparc64 support and implemented the System V ABI for that architecture in Clang and LLVM. I haven't contributed directly to NetBSD before, but my name appears once or twice in src/external.

I am building an Ada compiler as a hobby project. I realize this is a huge undertaking that I will probably never finish. It's fun anyway. The compiler will generate native code for various platforms and architectures, but to help with bootstrapping, I also want it to be able to generate portable C++11/POSIX code. This means I have to think about the difference between API and ABI on POSIX platforms.

As I see it, there are three levels of interface definition to consider:

1. The API level, or source code compatibility level. Standards like POSIX tend to describe interfaces in terms of what your C source code should look like:

    #include <sys/stat.h>
    #include <errno.h>

    int check_dir(const char *pathname) {
        struct stat buffer;
        if (stat(pathname, &buffer) != 0)
            return errno;
        if (S_ISDIR(buffer.st_mode))
            return 0;
        else
            return ENOTDIR;
    }

POSIX doesn't specify the exact contents of struct stat nor the value of ENOTDIR. It only defines that you can use those symbols and struct members in C code after including the right headers.

2. The Pure-C level. The C compiler lowers the C code to something self-contained. It will:

- Run the preprocessor,
- Expand typedefs, and
- Expand inline functions.

(I'm not saying compilers work this way, but they work as-if this happened). This leaves code consisting of only C primitives:

    struct stat { int st_mode; ... };
    extern __tls int errno;

    int check_dir(const char *pathname) {
        struct stat buffer;
        if (stat(pathname, &buffer) != 0)
            return errno;
        if ((buffer.st_mode & 0170000) == 0040000)
            return 0;
        else
            return 20;
    }

This Pure-C code is not portable.
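
To make the non-portability concrete, here is a minimal sketch of my own that asks the system C compiler for the facts the Pure-C version above hard-codes. The struct purec_facts type and the probe_purec_facts(), purec_is_dir(), and print_purec_facts() names are hypothetical; they are only for illustration and not part of any existing interface.

```c
#define _XOPEN_SOURCE 700  /* ask for S_IFMT and friends on strict compilers */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical record of the facts check_dir() depends on once it has
 * been lowered to Pure-C. None of these values are fixed by POSIX. */
struct purec_facts {
    size_t stat_size;       /* sizeof(struct stat) */
    size_t st_mode_offset;  /* byte offset of st_mode inside struct stat */
    size_t st_mode_size;    /* size of the st_mode member */
    unsigned long ifmt;     /* S_IFMT, the file type mask */
    unsigned long ifdir;    /* S_IFDIR, the directory type bits */
    int enotdir;            /* numeric value of ENOTDIR */
};

/* Let the system C compiler and headers fill in the record. */
static struct purec_facts probe_purec_facts(void) {
    struct purec_facts f;
    f.stat_size = sizeof(struct stat);
    f.st_mode_offset = offsetof(struct stat, st_mode);
    f.st_mode_size = sizeof(((struct stat *)0)->st_mode);
    f.ifmt = (unsigned long)S_IFMT;
    f.ifdir = (unsigned long)S_IFDIR;
    f.enotdir = ENOTDIR;
    /* Sanity check: the directory bits must lie inside the type mask. */
    assert((f.ifdir & f.ifmt) == f.ifdir);
    return f;
}

/* The S_ISDIR() test as Pure-C would spell it, using the probed constants. */
static int purec_is_dir(const struct purec_facts *f, unsigned long mode) {
    return (mode & f->ifmt) == f->ifdir;
}

/* Print one fact per line so that output from two systems can be diffed. */
static void print_purec_facts(const struct purec_facts *f) {
    printf("sizeof(struct stat) = %zu\n", f->stat_size);
    printf("offsetof(st_mode)   = %zu\n", f->st_mode_offset);
    printf("sizeof(st_mode)     = %zu\n", f->st_mode_size);
    printf("S_IFMT              = 0%lo\n", f->ifmt);
    printf("S_IFDIR             = 0%lo\n", f->ifdir);
    printf("ENOTDIR             = %d\n", f->enotdir);
}
```

Calling probe_purec_facts() and print_purec_facts() from main() on two systems and diffing the output would show which of these facts differ between them. It still needs the system C compiler, of course, which is exactly the dependency I would like to avoid.
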
There are differences between platforms and architectures, and even some compiler flags can affect this code.

Note that Pure-C also doesn't have to be standard C. It is common to use vendor-specific extensions like __attribute__ to get the right alignments and calling conventions. I see NetBSD sources are using a __RENAME macro to change linkage names.

3. The ABI level. Documents like the System V ABI describe how Pure-C is translated to machine code calls. This includes size and alignment of primitive types, layout of structs, and how arguments and return values are passed in function calls.

The translation from Pure-C to ABI is a pretty well understood problem. There are ABI documents describing the standard stuff, and you may have to deal with a few compiler extensions for alignment, SIMD types, and alternate calling conventions. This is all stuff that compilers need to deal with anyway.

The translation from API level to Pure-C level is more difficult (for me, anyway). It seems like you basically have to run C code through the system C compiler to make sure you covered all the corner cases. It is not very satisfying for an Ada compiler to have to depend on the system C compiler in order to generate binary code that interacts with the system.

Ada actually has a standardized foreign function interface that can be used to interface with C and Fortran. The problem is that it interfaces to the Pure-C level, not the API level. I can't use it to call stat() in a portable way.

I am interested in an Interface Description Language that can be used to:

a. Define the source-level API in a way that is detached from the C language.
b. Define stronger types than C allows: S_ISDIR() is only supposed to work on a mode_t returned from stat(), not any old integer.
c. Define data flow better than C allows: The struct stat* argument to stat() is only meant to move data out of the function. The pathname pointer isn't captured by the function call.
d. Describe how the API level gets translated to the Pure-C level or similar. This is different for different platforms and architectures.

This IDL would make it possible for me to generate portable Ada bindings for POSIX and other APIs without having to rely on the system C compiler. I think it could also be useful for a project like NetBSD to be able to track source compatibility and binary compatibility individually.

I haven't done anything concrete yet, and I agree that it is a good idea to research prior art. There is a lot of it:

- Wikipedia has a long list of IDLs: https://en.wikipedia.org/wiki/Interface_description_language
- CORBA IDL: https://www.omg.org/spec/IDL
- ASN.1: https://www.itu.int/en/ITU-T/asn1/Pages/asn1_project.aspx
- Apache Thrift: https://thrift.apache.org
- Google protobuf: https://developers.google.com/protocol-buffers/
- Zephyr ASDL: https://www.usenix.org/conference/dsl-97/zephyr-abstract-syntax-description-language
- SWIG: https://swig.org
- Rust's bindgen: https://rust-lang.github.io/rust-bindgen/

I don't know if any of these are usable for us. IDLs are often associated with a specific protocol or data format, and they tend to expose the quirks of those use cases. They are good for defining new things, but not so good for describing existing things. This is why there are so many IDLs.

It seems to me that we have slightly different use cases for an IDL, but there is a lot of overlap, and it could be possible to use the same language for both.

I am not looking to do a GSoC project or anything like that, but I would like to research this some more, and perhaps learn from your expertise. In terms of existing IDLs, do you have anything in mind that you think could work?

Best,
Jakob