As a FOSS enthusiast, I can configure most of my software in way too many ways. However, I noticed that this is not true for most compilers, which got me thinking: why isn't that the case? In gcc (or your favorite compiler) I have a shitload of options for what counts as an error or a warning, how the code should be compiled, and tons of other things. But none for how the code should be interpreted or what the source code should look like.
Why can't I simply add a module to a build process to make the language [object-oriented | use indentation instead of brackets | automatically allocate memory | automatically infer types | auto forward-declare | some other thing that differentiates one language from another]? It's so weird that I have a PDF reader with an option to set the window icon, a mail client that lets me specify a regex to search for a mentioned but forgotten attachment, and a game that lets me set my texture picmip resolution, but that the tool used to build these things (gcc) doesn't even have a config file built in. Instead we have build tools around it to supply the arguments.
This could look like the following (oversimplified), going from today's fixed pipeline to a configurable one:
- preprocess
- compile
- assemble
- link
v
- add brackets from indentation
- preprocess
- check if object-oriented constraints are all satisfied
- do something else
- compile
- assemble
- run the assembly through, for example, an AI for antivirus scanning
- link
- run test
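As a rough illustration of the idea, here is a minimal Python sketch of a pipeline whose stages are just an ordered, swappable list. Everything in it is made up for the example: the `Stage` type, the two toy stages, and `run_pipeline` are hypothetical names, and the stages only manipulate text rather than really compiling anything.

```python
from typing import Callable, Iterable

# Assumption: every stage maps program text to program text.
Stage = Callable[[str], str]


def braces_from_indentation(src: str, indent: int = 4) -> str:
    """Toy stage: turn indentation-based blocks into brace-delimited ones."""
    out, depth = [], 0
    for line in src.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        level = (len(line) - len(line.lstrip(" "))) // indent
        while depth < level:  # deeper indentation opens a block
            out.append("{")
            depth += 1
        while depth > level:  # shallower indentation closes blocks
            out.append("}")
            depth -= 1
        out.append(line.strip())
    out.extend("}" * depth)  # close whatever is still open at end of file
    return "\n".join(out)


def strip_comments(src: str) -> str:
    """Toy 'preprocess' stage: drop everything after '//' on each line."""
    return "\n".join(line.split("//", 1)[0].rstrip() for line in src.splitlines())


def run_pipeline(src: str, stages: Iterable[Stage]) -> str:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        src = stage(src)
    return src


# The per-project configuration would be nothing more than this list.
PIPELINE = [braces_from_indentation, strip_comments]

source = "int main()\n    return 0;  // done\n"
result = run_pipeline(source, PIPELINE)
print(result)
```

Reordering, adding, or removing entries in `PIPELINE` is the whole configuration step here; a real tool would also need stages that pass something richer than plain text between them.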
There could also be a fork in this process: for example, sending the source code to both a compiler and an interpreter to detect edge-case behavior while compiling. Or compiling with both automatic typing and your own declared types, so that when rounding errors are big you can instantly compare against a dynamically typed version of your program. Or the other way around: maybe you want different parts of your code to be handled by different preprocessors.
The build process should be configured per project for things about the input, like syntax, and per computer for things about the output, like optimizations.
There are of course some drawbacks. One is a trust issue, where someone pulls in an obscure module to build malicious releases. It is probably also harder to maintain stability when you have to keep in mind that your preprocessor isn't the first one to run. And compiling can take a lot longer if you have to go through multiple pre-, post- or even compilation phases.
If you know of such a build tool, or c (: haha :) some obvious reasons why this should not exist, please let me know. Thank you for reading this lengthy post.
Thanks for the comments. Based on them, I think I can better explain what I want: I would like a language with a minimal specification, so that its preprocessor, compiler, assembler and linker are a collection of plugins rather than one chunky program.
So the compiler reads a line, for example void main(int argc, char argv), and then all main body plugins get an event_newline. The function plugin reads the line and creates a new object that contains the function main. It then emits an event_functionBody, which is caught by other plugin(s) that read the contents of main and return what it has to do.
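A minimal Python sketch of that event flow might look like this. Only the event names `event_newline` and `event_functionBody` come from the description above; `EventBus`, `install_plugins`, `read_source`, the brace-delimited body, and the very rough header check are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterator, List


class EventBus:
    """Minimal publish/subscribe hub that plugins register with."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[..., None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, *args) -> None:
        for handler in list(self._handlers[event]):
            handler(*args)


def install_plugins(bus: EventBus, functions: Dict[str, List[str]]) -> None:
    def function_plugin(line: str, rest: Iterator[str]) -> None:
        # Very rough header check; a real plugin would parse properly.
        if "(" in line and line.endswith("{"):
            name = line.split("(", 1)[0].split()[-1]
            functions[name] = []  # the "object" that holds the function
            bus.emit("event_functionBody", name, rest)

    def body_plugin(name: str, rest: Iterator[str]) -> None:
        # Consume lines from the shared iterator until the closing brace.
        for line in rest:
            line = line.strip()
            if line == "}":
                break
            functions[name].append(line)

    bus.on("event_newline", function_plugin)
    bus.on("event_functionBody", body_plugin)


def read_source(src: str) -> Dict[str, List[str]]:
    bus = EventBus()
    functions: Dict[str, List[str]] = {}
    install_plugins(bus, functions)
    lines = iter(src.splitlines())
    for line in lines:  # the core only knows how to read lines
        bus.emit("event_newline", line.strip(), lines)
    return functions


parsed = read_source("int main(int argc, char **argv) {\n    return 0;\n}\n")
print(parsed)
```

The core loop in `read_source` knows nothing about functions; swapping `function_plugin` for a different one would change what the "language" accepts without touching the driver.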
https://en.m.wikipedia.org/wiki/Unix_philosophy
Yes. I'm not sure exactly what you mean by this, but it's indeed what I'm getting at: our compilers aren't built enough in the Unix fashion for my liking. gcc handles preprocessing, compilation and linking, but I wouldn't know how to run a second preprocessor after the first one. I just did a quick search and apparently gcc -E covers this, but it doesn't seem very intuitive to run gcc -E on all files into some temporary directory, run some other program on all the code there, and then compile and link. A pipeline would be nicer, and I also don't know of any tools that can do additional preprocessing.
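For what it's worth, gcc can be chained without a temporary directory: `-E` writes the preprocessed source to stdout, and `-x cpp-output` with `-` as the input file tells gcc to read already-preprocessed C from stdin. A hedged Python sketch of such a pipeline follows. The helper names `pipeline_commands` and `run_pipeline` and the filter `my-second-preprocessor` are hypothetical, and the filter is assumed to read stdin, write stdout, and cope with (or pass through) the `# 1 "file.c"` linemarkers that `gcc -E` emits.

```python
import subprocess
from typing import List, Sequence


def pipeline_commands(source: str, obj: str, extra_filter: Sequence[str]) -> List[List[str]]:
    """Build the three commands; nothing is executed here."""
    return [
        ["gcc", "-E", source],                              # preprocess only, output on stdout
        list(extra_filter),                                 # hypothetical second preprocessor
        ["gcc", "-x", "cpp-output", "-c", "-o", obj, "-"],  # compile preprocessed C from stdin
    ]


def run_pipeline(commands: Sequence[Sequence[str]]) -> None:
    """Run the commands in order, feeding each one's stdout to the next one's stdin."""
    data = b""
    for cmd in commands:
        done = subprocess.run(list(cmd), input=data, stdout=subprocess.PIPE, check=True)
        data = done.stdout


commands = pipeline_commands("main.c", "main.o", ["my-second-preprocessor"])
# run_pipeline(commands)  # needs gcc and the (hypothetical) filter on PATH
```

This buffers each stage's output in memory rather than streaming it, which keeps the sketch short; a shell pipe does the same job with less ceremony.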
LLVM is designed in a very modular way and the LLVM IR allows you to specify e.g. if memory management should be manual/garbage collected.
You could make a frontend (design a language) for LLVM that exposes those options through some compiler directives.
In general I'd heavily recommend looking into LLVM's documentation.
Wow, I knew a bit about LLVM IR, but I had no idea it had high-level options like garbage collection.
Oh yeah, it's actually pretty extensive and expressive. If you're interested in this sort of stuff, it's worth checking out the IR language reference a bit. Apparently you can even choose the garbage collection strategy on a per-function basis if you want to. They do however specify the following: "Note that LLVM itself does not contain a garbage collector, this functionality is restricted to generating machine code which can interoperate with a collector provided externally" (source: https://llvm.org/docs/LangRef.html#garbage-collector-strategy-names )
If you're interested in this stuff it's definitely fun to work through a part of that language reference document. It's pretty approachable. After going through the first few chapters I had some fun writing some IR manually for some toy programs.
LLVM really looks like something that I need to look into.
LLVM is the engine everything compiles to. The problem is there's no car, it's just the engine lol.
And other than Rust (which uses LLVM) the existing cars are not very configurab--well I mean they're configurable but not at the extreme level of configuration you're talking about.