FASM
The flat assembler (FASM) is a fast and efficient x86 assembler for DOS, Windows and Unix-like (Linux, BSD, macOS etc.) operating systems. Currently it supports all x86 and x86-64 instructions together with the MMX, 3DNow!, SSE up to SSE4, AVX, AVX2 and XOP extensions, and it can produce output in binary, MZ, PE, COFF or ELF format. It includes a powerful but easy-to-use macro language and assembles in multiple passes to optimize the instruction encodings for size. The flat assembler can assemble itself, and the full source code is included.
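As a minimal sketch of what a FASM source file looks like, the following program for 32-bit Linux writes a message and exits. It assumes the classic `int 0x80` system call interface; the labels `start`, `msg` and `msg_size` are illustrative names. Note that `msg` and `msg_size` are used before they are defined: such forward references are resolved over the assembler's multiple passes.

```asm
format ELF executable 3         ; ELF executable, OS ABI 3 = Linux
entry start

segment readable executable

start:
        mov     eax, 4          ; sys_write
        mov     ebx, 1          ; file descriptor 1 = stdout
        mov     ecx, msg        ; forward reference to the message
        mov     edx, msg_size   ; forward reference to its length
        int     0x80

        mov     eax, 1          ; sys_exit
        xor     ebx, ebx        ; exit status 0
        int     0x80

segment readable writeable

msg db 'Hello world!', 0xA
msg_size = $ - msg              ; $ is the current address
```

Assuming the file is saved as `hello.asm`, it can be assembled with `fasm hello.asm hello`. No separate linker step is needed because the `format` directive makes FASM emit the executable directly.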
There is also a version called FASM-ARM that generates AArch32 and AArch64 instructions. It runs on x86, which makes it a cross-assembler, but it uses the same macro infrastructure as FASM.
The next generation of FASM is called FASMG. It has an even more sophisticated macro infrastructure and uses macros to describe the instructions themselves, so it can be used to generate output for practically any target (macro definitions are available for x86, AArch64, Z80, MOS 6502, WebAssembly, Java and Dalvik bytecode etc.).
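To illustrate the idea of describing instructions with macros, here is a minimal FASMG sketch that defines two x86 instructions by hand instead of loading the ready-made x86 macro packages. The selection of instructions is purely illustrative; a real instruction set definition is far more elaborate.

```asm
; FASMG has no built-in instruction set: each mnemonic is a macro
; that emits bytes. A trailing ? makes the name case-insensitive.

macro nop?
        db 0x90                 ; opcode of x86 NOP
end macro

macro int? number
        db 0xCD, number         ; opcode of x86 INT followed by imm8
end macro

        nop                     ; emits 90
        int     0x80            ; emits CD 80
```

Assuming the file is saved as `demo.asm`, running `fasmg demo.asm demo.bin` should produce a three-byte raw binary, since plain binary is the output when no format macros are used.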