User:MessiahAndrw/LLVM OS Specific Toolchain

These instructions cover building an OS Specific Toolchain using LLVM and Clang instead of GCC.

THIS IS A WORK IN PROGRESS - DON'T FOLLOW THESE YET!

Note that LLVM builds cross-compilers for all targets by default; the right target simply has to be activated (so why is there an LLVM Cross-Compiler page?). These instructions add your operating system as a new target. By following them, you should hopefully end up with an LLVM-based toolchain for building applications that run under your OS!

Note that LLVM is developed in C++11 and uses many of the language's modern features, and there have been reported difficulties compiling some components on other compilers (such as GCC). If you run into trouble, you can try compiling LLVM with Clang.

Tools

These are the tools (built on LLVM) that we will be building and using:

clang - A C/C++/Objective-C compiler frontend.
libc++ - A standard C++ library with all of the C++11 bells and whistles.
libc++abi - The portable ABI behind libc++.
lld - A linker.
LLVM-as - An assembler that comes with LLVM.
LLVM Developers' Meeting - Presentation about porting LLVM to a new OS (2016).
osquery-toolchain - The script in this repository builds the LLVM/Clang toolchain that the osquery project uses to create portable binaries.

Examples

Porting llvm+clang - Article about porting LLVM to a new OS
llvm LF OS changes - Git diff adding minimal support for a new OS
Kolibri OS LLVM - Example of porting llvm+clang+lld to Kolibri OS (in development)

Checking out

First we must check out the source code and create the basic structure. We'll assume we want two directories: 'llvm-project', containing the source code, and a 'build' directory inside it, containing the toolchain we build.

Check out the LLVM+Clang source code:

git clone https://github.com/llvm/llvm-project.git

Make a build directory:

mkdir llvm-project/build
cd llvm-project/build

CMake files

TODO: This section needs to be rewritten for CMake.

Toolchain: add your OS to clang/lib/Driver/CMakeLists.txt:

 ...
 ToolChains/Hexagon.cpp
 ToolChains/Hurd.cpp
 ToolChains/Linux.cpp
 ToolChains/Myos.cpp #add this
 ToolChains/MipsLinux.cpp
 ToolChains/MinGW.cpp
 ToolChains/Minix.cpp
 ...

Custom linker

Add your OS to lld/tools/lld/CMakeLists.txt:

 ...
 lldMachO2
 lldMinGW
 lldWasm
 lldMYOS
 ...
 )
 ...

Add your OS to lld/CMakeLists.txt:

 ...
 add_subdirectory(MachO)
 add_subdirectory(MinGW)
 add_subdirectory(wasm)
 add_subdirectory(MYOS)
 ...

Modifying LLVM

The first step is to get LLVM to recognize your OS as a platform. Like the other tutorial, we'll assume your OS is called MyOS. Of course, you'd replace MyOS with your own operating system's name (unless your OS is called MyOS).

llvm/include/llvm/ADT/Triple.h

In the enum called OSType (around line 120), add your OS:

 enum OSType {
   ...
   MyOS,
   ...
 };

llvm/lib/Support/Triple.cpp

In the function Triple::getOSTypeName (around line 135) add your OS:

 case MyOS: return "myos";

In the function parseOS (around line 326), add your OS:

 .StartsWith("myos", Triple::MyOS)

Around line 414 is getDefaultFormat that returns the default executable format type for a platform. The fallback is ELF, but if you want to use PE or MachO (or maybe your own) you can stick it here.

llvm/lib/Support/* Notes

There is some platform specific stuff in llvm/lib/Support/* (particularly Hosts.cpp and the subdirectories), but they appear to be support files for the platform the compiler runs on, not targets.

Modifying LLD

lld/tools/lld/lld.cpp

   ...
   .CasesLower("wasm", "ld-wasm", Wasm)
   .CasesLower("myos", "ld-myos", MyOS)
   .CaseLower("link", WinLink)
   ...
   case Wasm:
     return !lld::wasm::link(args, exitEarly, stdoutOS, stderrOS);
   case MyOS:
     return !myos::link(args, exitEarly, stdoutOS, stderrOS);
   ...
   die("lld is a generic driver.\n"
       "Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld"
       " (WebAssembly), myos-ld (MyOS) instead");
   ...
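The generic lld driver picks a linker "flavor" from a name and then dispatches to that flavor's link function. As a rough illustration of the name-matching half, here is a self-contained sketch; the Flavor enum and getFlavor function below are simplified stand-ins, not the real code in lld.cpp, and the real driver does more than this.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Stand-in for the Flavor enum in lld.cpp; MyOS is the new entry.
enum class Flavor { Invalid, Gnu, Wasm, MyOS };

// Lowercase a copy so that matching is case-insensitive, like CasesLower.
inline std::string toLower(std::string S) {
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  return S;
}

// Map a flavor name to a Flavor value.
inline Flavor getFlavor(const std::string &Name) {
  const std::string N = toLower(Name);
  if (N == "ld" || N == "ld.lld" || N == "gnu") return Flavor::Gnu;
  if (N == "wasm" || N == "ld-wasm")            return Flavor::Wasm;
  if (N == "myos" || N == "ld-myos")            return Flavor::MyOS;
  return Flavor::Invalid;
}
```

An unrecognized name yields Flavor::Invalid, which is the case where the generic driver prints its "lld is a generic driver" message.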

lld/include/lld/Common/Driver.h

  ...
  namespace myos {
    bool link(llvm::ArrayRef<const char *> args, bool canExitEarly,
         llvm::raw_ostream &stdoutOS, llvm::raw_ostream &stderrOS);
  }
  ...

TODO: Add an example from the Kolibri OS LLD port.

If your operating system uses its own executable format, you can find the relevant code under llvm/projects/lld/lib/ReaderWriter/, but this is a much more difficult job than porting using a common executable format like ELF, MachO, or PE.

NOTE: I am a different contributor (not MessiahAndrw), and my OS does use a different executable format. I will document my progress on that later.

Modifying Clang

clang/lib/Basic/Targets/OSTargets.h

We need to create a target so Clang knows a little bit about the platform it's compiling for, so we will create a TargetInfo object called MyOSTargetInfo. You can override some compiler internals here (such as setting the size of long ints); look at what the other targets do for examples.

Somewhere in this file, above AllocateTarget, create your target object:

... //other targets
 // MyOS target
template<typename Target>
class LLVM_LIBRARY_VISIBILITY MyOSTargetInfo : public OSTargetInfo<Target> {
 protected:
  void getOSDefines(const LangOptions &Opts, const llvm::Triple &Triple,
                    MacroBuilder &Builder) const override {
    Builder.defineMacro("_MYOS");
  }

 public:
   MyOSTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
       : OSTargetInfo<Target>(Triple, Opts) {
     this->WIntType = TargetInfo::UnsignedInt;
     switch (Triple.getArch()) {
      default:
       break;
        /*case llvm::Triple::mips:
        case llvm::Triple::mipsel:
        case llvm::Triple::mips64:
        case llvm::Triple::mips64el:
        case llvm::Triple::ppc:
        case llvm::Triple::ppcle:
        case llvm::Triple::ppc64:
        case llvm::Triple::ppc64le:
        this->MCountName = "_mcount";
          break;*/
        case llvm::Triple::x86:
        case llvm::Triple::x86_64:
         this->HasFloat128 = true;
         break;
     }
   }
   const char *getStaticInitSectionSpecifier() const override {
    return ".text.startup";
   }
};
... //other targets
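The pattern used by MyOSTargetInfo is a template that derives from whatever architecture target it is given and layers OS-specific behaviour on top. The sketch below shows that layering in isolation; FakeX86_64Target and FakeMyOSTarget are hypothetical stand-ins invented for this example and do not exist in Clang.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an architecture-level TargetInfo.
struct FakeX86_64Target {
  bool HasFloat128 = false;
  virtual ~FakeX86_64Target() = default;
  virtual void getTargetDefines(std::vector<std::string> &Macros) const {
    Macros.push_back("__x86_64__");
  }
};

// OS layer: wraps any architecture target and appends the OS macro.
template <typename Target>
struct FakeMyOSTarget : Target {
  FakeMyOSTarget() { this->HasFloat128 = true; } // OS-specific override
  void getTargetDefines(std::vector<std::string> &Macros) const override {
    Target::getTargetDefines(Macros); // architecture macros first
    Macros.push_back("_MYOS");        // then the OS macro
  }
};

// Collect the macros that the combined target would predefine.
inline std::vector<std::string> collectMacros() {
  FakeMyOSTarget<FakeX86_64Target> TI;
  std::vector<std::string> Macros;
  TI.getTargetDefines(Macros);
  return Macros;
}
```

The same OS layer can be instantiated over a different architecture class without being rewritten, which is why one MyOSTargetInfo template can serve both 32-bit and 64-bit x86.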

clang/lib/Basic/Targets.cpp

In AllocateTarget, you'll need to add your OS in switch(Triple.getArch()):

switch (Triple.getArch()) {
   ...
   case llvm::Triple::x86: // and/or llvm::Triple::x86_64
      ...
      switch (os) {
         ...

        case llvm::Triple::MyOS:
            return new MyOSTargetInfo<X86_32TargetInfo>(Triple, Opts); // or MyOSTargetInfo<X86_64TargetInfo> under llvm::Triple::x86_64
         ...
    ...
 }

clang/lib/Driver/ToolChains/Myos.h

Example: https://raw.githubusercontent.com/lexasub/llvm-project-kos/main/clang/lib/Driver/ToolChains/Kolibri.h

Next, we have to create a toolchain object that Clang uses to figure out how to connect to the other toolchain components (namely the linker and assembler) for our target.

Add this somewhere:

 class LLVM_LIBRARY_VISIBILITY MyOS : public Generic_ELF {
 public:
   MyOS(const Driver &D, const llvm::Triple &Triple,
        const llvm::opt::ArgList &Args);

    /*bool HasNativeLLVMSupport() const override;

    void
    AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                              llvm::opt::ArgStringList &CC1Args) const override;
    void addLibStdCxxIncludePaths(
        const llvm::opt::ArgList &DriverArgs,
        llvm::opt::ArgStringList &CC1Args) const override;
    void AddCudaIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                            llvm::opt::ArgStringList &CC1Args) const override;
    void AddHIPIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                           llvm::opt::ArgStringList &CC1Args) const override;
    void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
                             llvm::opt::ArgStringList &CC1Args) const override;
    CXXStdlibType GetDefaultCXXStdlibType() const override;
    bool
    IsAArch64OutlineAtomicsDefault(const llvm::opt::ArgList &Args) const override;
    bool isPIEDefault() const override;
    bool isNoExecStackDefault() const override;
    bool IsMathErrnoDefault() const override;
    SanitizerMask getSupportedSanitizers() const override;
    void addProfileRTLibs(const llvm::opt::ArgList &Args,
                          llvm::opt::ArgStringList &CmdArgs) const override;
    std::string computeSysRoot() const override;

    std::string getDynamicLinker(const llvm::opt::ArgList &Args) const override;

    void addExtraOpts(llvm::opt::ArgStringList &CmdArgs) const override;

    std::vector<std::string> ExtraOpts;

    llvm::DenormalMode getDefaultDenormalModeForType(
        const llvm::opt::ArgList &DriverArgs, const JobAction &JA,
        const llvm::fltSemantics *FPType = nullptr) const override;
    */
 protected:
   Tool *buildAssembler() const override;
   Tool *buildLinker() const override;
   /*Tool *buildStaticLibTool() const override;
   std::string getMultiarchTriple(const Driver &D,
                                 const llvm::Triple &TargetTriple,
                                 StringRef SysRoot) const override;*/
 };

Note that we're inheriting from the Generic_ELF toolchain, but you can look at some other examples (Windows, Mac OS) for alternatives.

clang/lib/Driver/ToolChains/Myos.cpp

Example: https://raw.githubusercontent.com/lexasub/llvm-project-kos/main/clang/lib/Driver/ToolChains/Kolibri.cpp

Here's the code for the toolchain object, insert it somewhere in this file:

 /// MyOS tool chain which can call as(1) and ld(1) directly.

MyOS::MyOS(const Driver &D, const llvm::Triple& Triple, const ArgList &Args)
  : Generic_ELF(D, Triple, Args) {
   // Fill this in with your default library paths one day..
   //getFilePaths().push_back(getDriver().Dir + "/../lib");
   //getFilePaths().push_back("/usr/lib");
}

Tool *MyOS::buildAssembler() const {
  return new tools::myos::Assemble(*this);
}

Tool *MyOS::buildLinker() const {
  return new tools::myos::Link(*this);
}

Note that in the constructor we have the ability to add default library search paths, which are used when locating files for our assembler and linker. We'll leave them commented out for now so the toolchain doesn't automatically pick up our host system's libraries when we compile code for our OS.

clang/lib/Frontend/InitHeaderSearch.cpp

In the function InitHeaderSearch::AddDefaultCIncludePaths, add this somewhere, so we don't automatically add /usr/local/include as an include path:

  case llvm::Triple::MyOS:
    // Fill this in with your default include paths...
    // AddPath("/usr/local/include", System, false);
    break;

If you want your target to automatically add default include paths, you can customize this file. Add your target under AddDefaultCIncludePaths, AddDefaultCPlusPlusIncludePaths, AddDefaultIncludePaths, etc.

clang/lib/Driver/Driver.cpp

We need to make Clang use our toolchain object when it targets our OS. Include the new header, and in Driver::getToolChain add your OS to switch (Target.getOS()):

...
#include "ToolChains/Hurd.h"
#include "ToolChains/Myos.h"
#include "ToolChains/Lanai.h"
...
    case llvm::Triple::MyOS:
      /*if (Target.getArch() == llvm::Triple::hexagon)
          TC =
            std::make_unique<toolchains::HexagonToolChain>(*this, Target, Args);
        else if ((Target.getVendor() == llvm::Triple::MipsTechnologies) &&
               !Target.hasEnvironment())
          TC = std::make_unique<toolchains::MipsLLVMToolChain>(*this, Target,
                                                             Args);
        else if (Target.isPPC())
          TC = std::make_unique<toolchains::PPCLinuxToolChain>(*this, Target,
                                                             Args);
        else if (Target.getArch() == llvm::Triple::ve)
          TC = std::make_unique<toolchains::VEToolChain>(*this, Target, Args);
        else*/
      TC = std::make_unique<toolchains::MyOS>(*this, Target, Args);
      break;
...
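Stripped of the Clang specifics, the driver change is a factory: a switch on the target OS that returns an owning pointer to the matching toolchain object. A minimal self-contained sketch of that shape follows; every type in it is a hypothetical stand-in rather than a Clang class.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the OS part of a target triple.
enum class TargetOS { Linux, MyOS };

// Hypothetical stand-in for the driver's ToolChain base class.
struct ToolChain {
  virtual ~ToolChain() = default;
  virtual std::string name() const = 0;
};

struct LinuxToolChain : ToolChain {
  std::string name() const override { return "linux"; }
};

struct MyOSToolChain : ToolChain {
  std::string name() const override { return "myos"; }
};

// Pick the toolchain for the requested OS, as the switch in
// Driver::getToolChain does for the real classes.
inline std::unique_ptr<ToolChain> getToolChain(TargetOS OS) {
  std::unique_ptr<ToolChain> TC;
  switch (OS) {
  case TargetOS::Linux:
    TC = std::make_unique<LinuxToolChain>();
    break;
  case TargetOS::MyOS:
    TC = std::make_unique<MyOSToolChain>();
    break;
  }
  return TC;
}
```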

clang/lib/Driver/Tools.h

TODO: Work out which file this corresponds to in current LLVM; it may need to be replaced with a custom LLD tool.

In here, we define the Assemble and Link classes that our toolchain object references:

/// myos -- Directly call GNU Binutils assembler and linker
namespace myos {
  class LLVM_LIBRARY_VISIBILITY Assemble : public GnuTool  {
  public:
    Assemble(const ToolChain &TC) : GnuTool("myos::Assemble", "assembler",
                                         TC) {}

    bool hasIntegratedCPP() const override { return false; }

    void ConstructJob(Compilation &C, const JobAction &JA,
                      const InputInfo &Output,
                      const InputInfoList &Inputs,
                      const llvm::opt::ArgList &TCArgs,
                      const char *LinkingOutput) const override;
  };
  class LLVM_LIBRARY_VISIBILITY Link : public GnuTool  {
  public:
    Link(const ToolChain &TC) : GnuTool("myos::Link", "linker", TC) {}

    bool hasIntegratedCPP() const override { return false; }
    bool isLinkJob() const override { return true; }

    void ConstructJob(Compilation &C, const JobAction &JA,
                      const InputInfo &Output,
                      const InputInfoList &Inputs,
                      const llvm::opt::ArgList &TCArgs,
                      const char *LinkingOutput) const override;
  };
} // end namespace myos

TODO: We're inheriting from GnuTool, use LLD and LLVM-as.

clang/lib/Driver/Tools.cpp

TODO: Work out which file this corresponds to in current LLVM; it may need to be replaced with a custom LLD tool.

Here's the code for our Assemble and Link classes. They invoke 'as' and 'ld'.

 void myos::Assemble::ConstructJob(Compilation &C, const JobAction &JA,
                                   const InputInfo &Output,
                                   const InputInfoList &Inputs,
                                   const ArgList &Args,
                                   const char *LinkingOutput) const {
  ArgStringList CmdArgs;

  Args.AddAllArgValues(CmdArgs, options::OPT_Wa_COMMA, options::OPT_Xassembler);

  CmdArgs.push_back("-o");
  CmdArgs.push_back(Output.getFilename());

  for (const auto &II : Inputs)
    CmdArgs.push_back(II.getFilename());

  const char *Exec = Args.MakeArgString(getToolChain().GetProgramPath("as"));
  C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs));
 }

 void myos::Link::ConstructJob(Compilation &C, const JobAction &JA,
                               const InputInfo &Output,
                               const InputInfoList &Inputs,
                               const ArgList &Args,
                               const char *LinkingOutput) const {
  const Driver &D = getToolChain().getDriver();
  ArgStringList CmdArgs;

  if (Output.isFilename()) {
    CmdArgs.push_back("-o");
    CmdArgs.push_back(Output.getFilename());
  } else {
    assert(Output.isNothing() && "Invalid output.");
  }

  /* if (!Args.hasArg(options::OPT_nostdlib) &&
      !Args.hasArg(options::OPT_nostartfiles)) {
      CmdArgs.push_back(Args.MakeArgString(getToolChain().GetFilePath("crt1.o")));
      CmdArgs.push_back(Args.MakeArgString(getToolChain().GetFilePath("crti.o")));
      CmdArgs.push_back(Args.MakeArgString(getToolChain().GetFilePath("crtbegin.o")));
      CmdArgs.push_back(Args.MakeArgString(getToolChain().GetFilePath("crtn.o")));
  }*/

  Args.AddAllArgs(CmdArgs, options::OPT_L);
  Args.AddAllArgs(CmdArgs, options::OPT_T_Group);
  Args.AddAllArgs(CmdArgs, options::OPT_e);

  AddLinkerInputs(getToolChain(), Inputs, Args, CmdArgs);

  addProfileRT(getToolChain(), Args, CmdArgs);

  if (!Args.hasArg(options::OPT_nostdlib) &&
      !Args.hasArg(options::OPT_nodefaultlibs)) {
    if (D.CCCIsCXX()) {
      getToolChain().AddCXXStdlibLibArgs(Args, CmdArgs);
      CmdArgs.push_back("-lm");
    }
  }

  // We already have no stdlib...
  /*if (!Args.hasArg(options::OPT_nostdlib) &&
      !Args.hasArg(options::OPT_nostartfiles)) {
    if (Args.hasArg(options::OPT_pthread))
      CmdArgs.push_back("-lpthread");
    CmdArgs.push_back("-lc");
    CmdArgs.push_back("-lCompilerRT-Generic");
    CmdArgs.push_back("-L/usr/pkg/compiler-rt/lib");
    CmdArgs.push_back(
         Args.MakeArgString(getToolChain().GetFilePath("crtend.o")));
  }*/

  const char *Exec = Args.MakeArgString(getToolChain().GetLinkerPath());
  C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs));
 }

TODO: Replace with LLD and LLVM-as.
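What a ConstructJob implementation does is translate driver options into an argument list for the external tool. The sketch below reproduces the order used by the Link job above (output, library directories, inputs, then C++ libraries) with plain standard-library types. LinkRequest and buildLinkArgs are hypothetical names for this example, and "-lc++" is an assumed choice of C++ library; the real AddCXXStdlibLibArgs decides that itself.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of the driver arguments relevant to linking.
struct LinkRequest {
  std::string Output;                   // empty means "no -o"
  std::vector<std::string> LibraryDirs; // values of -L
  std::vector<std::string> Inputs;      // object files to link
  bool IsCXX = false;                   // driver invoked in C++ mode
  bool NoStdLib = false;                // -nostdlib or -nodefaultlibs given
};

// Build the linker argument list in the same order as the Link job.
inline std::vector<std::string> buildLinkArgs(const LinkRequest &R) {
  std::vector<std::string> Cmd;
  if (!R.Output.empty()) {
    Cmd.push_back("-o");
    Cmd.push_back(R.Output);
  }
  for (const auto &Dir : R.LibraryDirs)
    Cmd.push_back("-L" + Dir);
  for (const auto &In : R.Inputs)
    Cmd.push_back(In);
  if (R.IsCXX && !R.NoStdLib) {
    Cmd.push_back("-lc++"); // assumption: libc++ is the C++ library
    Cmd.push_back("-lm");
  }
  return Cmd;
}
```

Keeping the argument construction in one pure function like this makes it easy to check the resulting command line before wiring it to a real linker.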

Compiling the toolchain

   cmake -G <generator> [options] ../llvm
   preferred: cmake -G Ninja [options] ../llvm
   Some common build system generators are:
       Ninja --- for generating Ninja build files. Most llvm developers use Ninja.
       Unix Makefiles --- for generating make-compatible parallel makefiles.
       Visual Studio --- for generating Visual Studio projects and solutions.
       Xcode --- for generating Xcode projects.

Some Common options:

-DLLVM_ENABLE_PROJECTS='...' --- semicolon-separated list of the LLVM sub-projects you'd like to additionally build. Can include any of: clang, clang-tools-extra, libcxx, libcxxabi, libunwind, lldb, compiler-rt, lld, polly, or debuginfo-tests.

For example, to build LLVM, Clang, libcxx, and libcxxabi, use -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi".

-DCMAKE_INSTALL_PREFIX=directory --- Specify for directory the full path name of where you want the LLVM tools and libraries to be installed (default /usr/local).

-DCMAKE_BUILD_TYPE=type --- Valid options for type are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is Debug.

-DLLVM_ENABLE_ASSERTIONS=On --- Compile with assertion checks enabled (default is Yes for Debug builds, No for all other build types).

   cmake --build . [-- [options] <target>] or your build system specified above directly.

The default target (i.e. ninja or make) will build all of LLVM.

The check-all target (i.e. ninja check-all) will run the regression tests to ensure everything is in working order.

CMake will generate targets for each tool and library, and most LLVM sub-projects generate their own check-<project> target.

Running a serial build will be slow. To improve speed, try running a parallel build. That's done by default in Ninja; for make, use the option -j NNN, where NNN is the number of parallel jobs, e.g. the number of CPUs you have.

Installation

sudo cmake --build . --target install

Compiling your first program for your OS

Let's test out the compiler! Create a simple C file:

 int do_something(int a, int b) {
   return a * b;
 }

And compile it with the Clang you just built. (Use --target=x86_64-myos if you added your OS as a 64-bit platform.) Note that the architecture component for 32-bit x86 is spelled i386 through i686 in a triple, not x86.

clang --target=i686-myos -c -o test.o test.c

TODO: Finish the linker. Until then, the -c option tells Clang to compile without linking.

This is great, but you will notice you can't just start including <stdio.h>, because you'll need to port a C library first. If you don't want a C library, you can stop here, but note that you won't be able to use libc++ or many of the fancier C++11 features that require runtime support.

Porting a C Library

At some point you'll likely want a C Library. Libc++ depends on a functioning C library.

TODO: Any Clang specific stuff here.

Hosting on our OS

This section is for compiling the LLVM toolchain to run under your OS, rather than just build programs for your OS.

TODO: I haven't gotten this far yet!