Monday, January 10, 2022

C++20 migration woes

I have an extensive list of subjects I can work on in the Egel language. Among them: performance (it needs to become an order of magnitude faster), the introduction of quotation and local combinator definitions, fixing small unfinished corners, mobile code, and, last but not least, various language experiments around features not easily expressible in typed languages.

But I also want to migrate to C++20 modules and introduce an `egel` namespace. I want to do that before all of the above, since it should simplify and sanitize the interpreter: modules should allow for a more principled approach to the implementation, and I'll be visiting, and subsequently cleaning, a lot of code not touched in years. I prefer single-file (complete) modules for a bit of a silly reason: I mostly program in vim and it's just faster and easier to modify declarations that are not spread over different files.

Egel is intended to be an interpreted language implemented in C++, closely following the C++ standard and not much more (except for Unicode/libicu), and to be easily extendable and embeddable in C++. This already causes problems, since the C++20 compilers (gcc, clang, msvc) are at various stages of supporting the C++20 module system. That comes on top of the problems I already have supporting gcc/clang: the interpreter usually only builds on up-to-date systems due to various moving targets, i.e., cmake and libraries.

I am at the start of the process, and I decided to document a number of problems I encountered.

  • Naming. A silly observation, but do I go for `.ixx`, `.cppm`, or `.cpp` files? I decided on `.ixx`. Rationale: I don't like four-letter extensions, and `.cpp` is too overloaded for my taste. This is weird because `.ixx` is the msvc convention and I don't support msvc at the moment; I have only worked with clang since my shift to a Mac M1, and I primarily want to support the venerable gcc.
  • Macros. I make extensive use of macros. A number of them I have been phasing out (like casts), but there are two types of macros I would prefer to keep: assertions (like `PANIC(m)`, abort with a message) and multiple assignment helpers (like `SPLIT_LAMBDA(l, vv, t)`, split a lambda AST object into variables and a term). Then there are boilerplate macros (like `DYADIC_PREAMBLE`, which sets up all the boilerplate of a dyadic combinator). Where do they go? Do I need to include a `macro.hpp` header file in every module?
  • Constants. A number of constants are defined as macros in the interpreter, like `#define LPAREN '('`. I can easily switch over to `constexpr`, but that implies I need to decide on a type, in my case `char` or `UChar32`. I would prefer not to. It gets worse for string constants: `char*`, `string`, or `icu::UnicodeString`? Sometimes it makes sense to treat something as text instead of a typed expression. This is a minor detail, but switching over to `constexpr` actually makes the code more tedious to support long term, instead of the opposite.
  • Inclusion of other libraries. Where are the C++20 module equivalents of `<iostream>` and libicu? Do I include or do I import? If I import, then what do I import? Where do I find the information, i.e., what website documents where the objects are defined?
  • CMake. I have no clue how to write cmake modules that support C++20 modules, let alone in a portable manner.
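
On the macro question: as far as I can tell, a named module cannot export macros, so whatever stays a macro still needs a header that is included where it is used. One way to shrink that header is to turn the multiple assignment helpers into ordinary functions combined with structured bindings, since functions can be exported. The following is a minimal sketch under that assumption. The `Term` type, `split_lambda`, and `make_lambda` are hypothetical stand-ins and not the interpreter's real AST.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST node; not the interpreter's real types.
struct Term {
    std::string tag;                // kind of node, e.g. "lambda" or "var"
    std::vector<std::string> vars;  // bound variables, only used by lambdas
    std::shared_ptr<Term> body;     // body term, only used by lambdas
};

using TermPtr = std::shared_ptr<Term>;

// Hypothetical helper that builds a lambda node.
inline TermPtr make_lambda(std::vector<std::string> vars, TermPtr body) {
    return std::make_shared<Term>(Term{"lambda", std::move(vars), std::move(body)});
}

// Function counterpart of a macro like SPLIT_LAMBDA(l, vv, t):
// returns the variables and the body term as a pair.
inline std::pair<std::vector<std::string>, TermPtr> split_lambda(const TermPtr& l) {
    assert(l && l->tag == "lambda");  // stands in for a PANIC-style check
    return {l->vars, l->body};
}
```

At the call site, `auto [vv, t] = split_lambda(l);` introduces both names in one line, which is what the macro was for. This copies the variable list; whether that matters depends on how the real AST is represented.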
These are the problems I have now. What I need are migration guides.
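
On the constants bullet, the type decision may hurt less than it looks for single characters. A minimal sketch, assuming the constants are plain ASCII: a `constexpr char` promotes to a wider integer without a cast, so it still compares against a 32-bit code point. `CodePoint` is a stand-in for ICU's `UChar32` (a signed 32-bit integer) so that the example does not depend on libicu; `ARROW` and `is_lparen` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Stand-in for ICU's UChar32, which is a signed 32-bit integer type.
using CodePoint = std::int32_t;

// Instead of: #define LPAREN '('
inline constexpr char LPAREN = '(';
inline constexpr char RPAREN = ')';

// A string constant as a string_view: no allocation, and it can be
// converted explicitly to std::string where an owning string is needed.
// ARROW is a hypothetical example constant.
inline constexpr std::string_view ARROW = "->";

// The char constant is promoted to int in the comparison, so no cast is
// needed to compare it with a 32-bit code point.
constexpr bool is_lparen(CodePoint c) { return c == LPAREN; }

// Checked at compile time: a char32_t literal converts to CodePoint.
static_assert(is_lparen(U'('));
static_assert(!is_lparen(U')'));
```

This does not settle the string case: a `std::string_view` still has to be converted explicitly wherever an `icu::UnicodeString` is expected, so the tedium I complained about does not go away entirely.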
