add an assembler to the toolchain #21169

Open
Tracked by #16270
andrewrk opened this issue Aug 22, 2024 · 13 comments
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase.
Comments

@andrewrk
Member

Prerequisite for #16270.

Builds of Zig that do not link against LLVM and Clang still need to be able to compile assembly files.

The existing commands already work, and they already support compiling assembly files: zig build-obj, zig build-exe, zig build-lib. The logic needs to be modified to use Zig's own assembler rather than invoking Clang as a subprocess.

For the x86 family specifically, let us jump on the Intel syntax train, embracing that as the better syntax. However, we also want to be able to compile the multitude of existing files from the wild without any changes, so it will need to support AT&T syntax as well.
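To make the difference between the two syntaxes concrete, here is a minimal Python sketch of a toy AT&T-to-Intel converter. It is an illustration only, not part of the proposal: the function name `att_to_intel`, the three-mnemonic table, and the restriction to register and immediate operands are all assumptions made for this example. It shows the three visible differences: operand order, the `%` and `$` sigils, and the operand-size suffix.

```python
import re

# Toy subset (assumed for this sketch): known mnemonics and AT&T size suffixes.
MNEMONICS = {"mov", "add", "sub"}
SIZE_SUFFIXES = "bwlq"


def att_to_intel(line: str) -> str:
    """Convert one AT&T-syntax instruction to Intel syntax.

    Handles only register (%rax) and immediate ($1) operands. Memory
    operands, prefixes, and directives are out of scope, and the operands
    are not checked for being a valid combination.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    # AT&T spells the operand size as a suffix (movq); in this register-only
    # subset the Intel form gets the size from the register name instead.
    if mnemonic[:-1] in MNEMONICS and mnemonic[-1] in SIZE_SUFFIXES:
        mnemonic = mnemonic[:-1]
    if mnemonic not in MNEMONICS:
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    operands = [op.strip() for op in rest.split(",")]
    if len(operands) != 2 or not all(re.fullmatch(r"[%$]\w+", op) for op in operands):
        raise ValueError(f"unsupported operands: {rest!r}")
    # AT&T order is "source, destination"; Intel order is "destination, source".
    src, dst = (op[1:] for op in operands)  # drop the % and $ sigils
    return f"{mnemonic} {dst}, {src}"
```

For example, `att_to_intel("movq $1, %rax")` yields `mov rax, 1`. A real assembler would parse both syntaxes into one internal instruction representation rather than translate text, so this only shows what differs on the surface.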

I suggest we start by borrowing LLVM's CPU instruction data via another tool in the tools/ directory. At some point the backends should start using this data as well instead of using an ad-hoc parser, but that will be a follow-up issue.

In order to close this issue, Zig must use its own assembler for all input files, never calling the clang binary for assembly.

Related:

@alexrp
Member

alexrp commented Dec 13, 2024

Should this include a C preprocessor? A lot of assembly files in the wild (.S) are written with the assumption that they'll be run through one.

@andrewrk
Member Author

Yes I think so. Aro implements a C preprocessor.

@Slackadays

Is a RISC-V assembler in the scope of this issue?

@Rexicon226
Contributor

> Is a RISC-V assembler in the scope of this issue?

Yes, all targets that Zig supports are in the scope of this issue.

@Slackadays

I'm already writing a RISC-V assembler to make my own project independent of GCC/LLVM because there are absolutely no others out there, so I'd love to help with the same here. However, it's in C++ and I don't know any Zig, so porting might be the best strategy. Here's a direct link to it: https://github.com/Slackadays/Chata/blob/main/libchata/src/assembler.cpp

@alexrp
Member

alexrp commented Dec 24, 2024

> Yes I think so. Aro implements a C preprocessor.

But this would have implications for whether the assembler is in-tree or in a separate repo like ziglang/translate-c, right? What's the thinking there?

@Slackadays

> But this would have implications for whether the assembler is in-tree or in a separate repo like ziglang/translate-c, right? What's the thinking there?

Why would this matter? The preprocessor could easily be its own thing since it doesn't actually need to know any C, just the C preprocessor language. Then, the assembler could choose to use it or not depending on the input file, and all's good.
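The "use it or not depending on the input file" idea can be sketched in a few lines of Python. This assumes the convention followed by GCC and Clang, where an uppercase `.S` extension means the file is run through the C preprocessor and a lowercase `.s` file is assembled as-is. The names `needs_preprocessing` and `plan_stages` are hypothetical and do not refer to anything in Zig or Aro.

```python
from pathlib import PurePath


def needs_preprocessing(path: str) -> bool:
    """True for `.S` (uppercase) files, which by convention are preprocessed first."""
    return PurePath(path).suffix == ".S"


def plan_stages(path: str) -> list[str]:
    """Return the (hypothetical) pipeline stages for one assembly input file."""
    stages = ["assemble"]
    if needs_preprocessing(path):
        # The preprocessor only rewrites text, so it can run as a separate step.
        stages.insert(0, "preprocess")
    return stages
```

Here `plan_stages("crt0.S")` gives `["preprocess", "assemble"]` and `plan_stages("crt0.s")` gives `["assemble"]`, which is why the assembler itself does not need to know whether the preprocessor is a library or a separate binary.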

@alexrp
Member

alexrp commented Dec 24, 2024

It matters because Aro (and its preprocessor) is not going to keep being an in-tree dependency.

@Slackadays

So let's assume Aro is no longer an in-tree dependency. Then it is now a separate repo, which doesn't change anything because Aro's preprocessor can be its own binary or library, say zigcpp for Zig C PreProcessor. At this point, whether the preprocessor is a binary or a library is merely an implementation detail because it doesn't change the end result. But since the preprocessor isn't something users typically run on their own, it might be simpler to just have it as a separate library.

@andrewrk
Member Author

I've just pushed the sans-aro branch. I hope that helps to provide guidance to this discussion.

@alexrp
Member

alexrp commented Dec 24, 2024

Thanks, that's helpful. Seems like a reasonable direction.

@andrewrk
Member Author

andrewrk commented Dec 24, 2024

Assemblers can start as independent processes (lib/compiler/foo.zig) and then we can determine how to integrate them into new inline assembly (#10761).

They should parse into MIR and use the common MIR lowering code because that will be the method of integration with the compiler.

Instruction data (i.e. arch/x86_64/encodings.zig) should take advantage of ZON as soon as possible (#20271) since it will provide a faster and more memory efficient representation than a large zig source file with the same data.

@alexrp
Member

alexrp commented Dec 26, 2024

> They should parse into MIR and use the common MIR lowering code because that will be the method of integration with the compiler.

Hmm, I don't know if I agree that MIR is at the right level of abstraction for this - at least as it is today.

For inline assembly, it's probably fine, since we likely don't want to allow a lot of the nonsense that you can get away with in GCC-style inline assembly. I imagine that for #10761, for the most part, we will want to limit inline assembly to just machine code and data embedded directly in between instructions.

But for a full assembler, you're kind of in crazy land. You can be emitting machine code in a function and then do .pushsection into some completely unrelated section, emit whatever into it, do .popsection, and go right back to emitting machine code where you were previously. And of course, you can manipulate symbol state like ELF visibility at any point. (You might enjoy reading this page.)

As I understand it, MIR currently has a function view, but a full assembler really needs a whole-object view, and it doesn't seem to me like MIR is the right tool for the job.
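The whole-object point can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of MIR or of any real assembler: `split_by_section` is a hypothetical helper, the embedded assembly is a made-up fragment, and only `.text`, `.data`, `.pushsection`, and `.popsection` are recognized. It shows that statements written inside one function body can land in different sections of the output object.

```python
from collections import defaultdict

# A made-up fragment: data is emitted into .rodata in the middle of a function.
SOURCE = """\
.text
my_func:
    nop
.pushsection .rodata
msg: .ascii "hi"
.popsection
    ret
"""


def split_by_section(source: str) -> dict[str, list[str]]:
    """Group each statement under the section that is current when it appears."""
    sections: dict[str, list[str]] = defaultdict(list)
    current = ".text"      # assumed default section for this sketch
    stack: list[str] = []  # sections saved by .pushsection
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        word, _, arg = line.partition(" ")
        if word == ".pushsection":
            stack.append(current)  # remember where to return to
            current = arg.strip()
        elif word == ".popsection":
            current = stack.pop()  # resume the interrupted section
        elif word in (".text", ".data"):
            current = word
        else:
            sections[current].append(line)
    return dict(sections)
```

Running `split_by_section(SOURCE)` puts `my_func:`, `nop`, and `ret` under `.text` and the `msg` line under `.rodata`, even though the data statement sits between two instructions of the same function. A per-function representation has no obvious place to record that second section, which is the concern raised above.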
