Cross-Porting Software

Difficulty: Advanced

This page contains notes on how to port software to your operating system by cross-compiling it. Many core packages use an autoconf-generated ./configure script, which provides a convenient interface for our purposes. This assumes your operating system is somewhat Unix-like. There are many subtle semantics that first-time porters often get wrong, and particular packages occasionally misbehave. This tutorial assumes you are porting a package using a conventional configure script (generated with autoconf), but the principles can be adapted to other build systems.

Prerequisites

This is an advanced matter and your operating system needs to have an established user-space and a dedicated toolchain:

System Root

Please read this article's section on sysroots.

Your build process needs to involve a system root, a fake root directory for your operating system. Your cross-compiler must be set up to search this directory tree for libraries and headers. I recommend that your build system go through each of your subprojects and make install them into this system root, then do the same for ports, and finally extract the desired files from the system root and make a bootable image. We will be installing ports into this directory tree. Further ports may depend on previous ones, so it's crucial that the system root is searched.
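
A minimal sketch of such a build process, assuming a $SYSROOT environment variable and hypothetical subproject and port directories:

# Install your own subprojects into the system root first.
for project in libc libm init sh; do
  make -C "$project" DESTDIR="$SYSROOT" install
done
# Then install ports the same way (the per-port build is described below).
for port in ports/*; do
  make -C "$port" DESTDIR="$SYSROOT" install
done
# Finally, assemble a bootable image from the system root, e.g. with a script of yours.
./mkimage "$SYSROOT" myos.iso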

Prefixes

The package's build system installs the software somewhere. This is normally in ${PREFIX}/bin, ${PREFIX}/lib and so on. Packages often default to /usr/local, but this is generally used for site-specific files rather than vendor-provided files (you are a vendor now). Note that if you want to install packages into /bin and /lib, you set the prefix to the empty string, not a single slash. A prefix is not a directory path in its own right, but something that is added in front of a real directory path.
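
For example, with a hypothetical package (all options besides --prefix omitted), the difference looks like this:

# Installs into /usr/bin, /usr/lib and so on.
./configure --prefix=/usr
# Installs into /bin, /lib and so on (note the empty string, not a slash).
./configure --prefix=
# Wrong: the prefix is prepended to /bin, giving the path //bin.
./configure --prefix=/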

Dependencies

Before porting a package, you need to port its dependencies first (and transitively their dependencies as well). You can often find a list of dependencies in the documentation (look for README or INSTALL files). Also try running ./configure --help; this will often list --with-foo options if the package depends on libfoo, and it will also list which dependencies are optional. See also guides like Beyond Linux from Scratch, which contain useful dependency information.

You should be able to construct a nice directed acyclic graph of packages and their dependencies (some edges being optional) and use it to decide the order in which packages are built, as in the sketch below. If you are not interested in optional dependencies, you can skip them, but beware: a later package may hard-depend on a package you skipped, or may assume a library was built with support for the package you skipped.
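
If you keep the dependency edges in a small text file, the standard tsort utility can compute such a build order for you; the package names here are hypothetical:

# deps.txt contains one "dependency dependent" pair per line.
cat > deps.txt << 'EOF'
zlib libpng
libpng libsdl2_image
zlib libfreetype
libfreetype harfbuzz
EOF
# tsort prints the packages in an order where dependencies come first.
tsort deps.txt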

Patches

It is very likely that you will need to patch the packages you port. You need to set yourself up so that you can patch packages easily and with little effort. You can help other people by hosting your patch collection somewhere public. It is often nicer to actually fix misbehaving packages than to work around the problem in your build system in complex ways.
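
A minimal sketch of producing and applying such a patch with the standard diff and patch tools, using hypothetical file names:

# Produce a patch by comparing a pristine tree against your modified tree.
diff -ur libfoo-4.2.orig libfoo-4.2 > patches/libfoo-4.2-myos.patch
# Apply it after extracting a fresh copy of the source.
patch -d libfoo-4.2 -p1 < patches/libfoo-4.2-myos.patch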

Source Code

You need to find the source code for the package you wish to port. This is rather obvious. Generally, it's best to find the latest stable tarball of the package and use that. This is preferable to using a git checkout of the package. It's a good idea to save a copy of the original tarball.

It's advisable to check whether this is a real release or whether it has been maliciously modified. Many projects provide hash values of their releases or otherwise sign their releases. Configure scripts are highly convenient places to hide malware (if you have the autoconf skills, you can regenerate the configure script and other files and see whether they match). Man-in-the-middle attacks on insecure http/ftp downloads of a tarball are trivial; you can mitigate this danger by downloading from multiple networks and verifying against what other people have.
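
A sketch of such a verification, assuming the project publishes a SHA-256 checksum file and a detached GPG signature (the file names are hypothetical):

sha256sum -c libfoo-4.2.tar.xz.sha256
gpg --verify libfoo-4.2.tar.xz.sig libfoo-4.2.tar.xz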

You also definitely want to read the license for the software and check whether it even permits the efforts we are taking here.

You extract the source code somewhere appropriate:

# Use --extract --file if you have a hard time surviving xkcd 1168.
tar -xf libfoo-4.2.tar.xz

pkg-config

Libraries increasingly provide pkg-config files that describe where the headers are installed and how to link against the library (and private library dependencies if statically linked). Working with pkg-config is preferable to fighting it (see below on packages rolling their own foo-config program); it nicely supports system roots and is cross-compile aware. It's possible to compile a custom cross-pkg-config, or you can simply wrap your system one. Make an x86_64-myos-pkg-config executable shell script and put it somewhere in your path for the duration of the cross-compilation:

#!/bin/sh
# Fill these in appropriately:
export PKG_CONFIG_SYSROOT_DIR=$MYOS_SYSROOT
export PKG_CONFIG_LIBDIR=$MYOS_SYSROOT/usr/lib/pkgconfig
# TODO: If it works this should probably just be set to the empty string.
export PKG_CONFIG_PATH=$PKG_CONFIG_LIBDIR
# Use --static here if your OS only has static linking.
# TODO: Perhaps it's a bug in the libraries if their pkg-config files don't
#       record that only static libraries were built.
exec pkg-config --static "$@"

You then set PKG_CONFIG=x86_64-myos-pkg-config so packages use your custom pkg-config instead, and PKG_CONFIG_FOR_BUILD=pkg-config so packages that wish to compile local programs use the system pkg-config.
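
For instance, when configuring a package as described in the next section, the variables can be passed like this (the triplet and prefix follow the examples in this article):

PKG_CONFIG=x86_64-myos-pkg-config \
PKG_CONFIG_FOR_BUILD=pkg-config \
./configure --host=x86_64-myos --prefix=/usr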

Build

This section lists the steps that are sufficient for an ideal port, but see below.

First you want to find the config.sub file. The GNU coding conventions usually place it at build-aux/config.sub to avoid clutter in the main directory. You need to add your operating system's target name to it, as you did in the OS Specific Toolchain article. This change is required, or otherwise the --host value will be rejected. This simple fact means that you likely have to carry a small patch for every single port.
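
You can check whether the shipped config.sub already accepts your target; it prints the canonicalized triplet on success and fails otherwise (the path and triplet follow the examples above):

./build-aux/config.sub x86_64-myos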

Secondly, you cross-build the package in this manner:

# Potentially unset CC and such here (see configure --help) to prevent local
# tools from being mistakenly used as cross-tools. Alternatively, set them to
# your cross-tools unconditionally. Note also the existence of CC_FOR_BUILD in
# the case of packages that need to build local tools to build themselves.

./configure --host=x86_64-myos --prefix=/usr
make
make DESTDIR=$SYSROOT install

This will cross-compile the software and install it under /usr on your system, using your system root as the temporary installation location. Note that many packages remember --prefix and use it at runtime to locate their data files. You must not pass --prefix=$SYSROOT/usr, as that would mean libfoo looks for files in /home/myuser/myos/sysroot/usr/share/libfoo while running on your operating system, instead of /usr/share/libfoo. DESTDIR acts as a second prefix for the purpose of installation; it's not revealed to the package before the install step, so the package won't mistakenly remember it.

Alternatively, instead of setting DESTDIR to the system root, you can set it to a temporary location, create an installable binary package from it, and then install the binary package into the system root.
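
A sketch of that approach, using a hypothetical package name and plain tarballs as binary packages:

# Install into a scratch directory instead of the system root.
make DESTDIR="$PWD/pkgroot" install
# Create a binary package from the scratch directory.
tar -C pkgroot -cJf libfoo-4.2-x86_64-myos.tar.xz .
# Later, install the binary package into the system root.
tar -C "$SYSROOT" -xJf libfoo-4.2-x86_64-myos.tar.xz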

Porting

The make command likely didn't succeed. You'll need to investigate the build errors and improve your standard library, or improve the package if it is the one being unreasonable.

Runtime

After you have cross-compiled the package and installed it onto your operating system, it's time to try it out. If you implemented your operating system and standard library well, it should now work. It likely won't, though, especially if you wrote your own custom standard library and these are your first real ports. You now need to learn to debug programs you know nothing about on your new operating system. Have fun! Suddenly it'll work and you have something to talk about in the 'Aww, Yeah!' forum thread.

Problems

Unfortunately, this is the real world and some packages misbehave in ways that break our naive cross-compilation. This is a bug in such packages; the practice described so far should be sufficient for well-made packages. Fortunately, community members likely know of such issues and generally work to upstream fixes; see the patch collections below. This section lists common problems and solutions.

When it is a bug in the package, it's advisable to patch the package to fix the bug and upstream the bugfix (or report the issue), rather than poorly working around the issue in your build system (in ways that tend to get more and more complex). Generally, the best approach is to make the packages implement the interface we rely on above.

libtool .la files

Many libraries use libtool and install .la files into the system library directory. These work somewhat like pkg-config files, except they are often semantically wrong because they are not sysroot-aware (do research the --with-sysroot option that libtool-aware configure scripts provide, though it might save the system root path in the .la files, which is also wrong). The files can cause the build system to add library directories to the link command that aren't prefixed with the system root, making the link pick up the /usr/lib files from your local operating system.

It's entirely safe to delete these files, so set up your post-install build steps to delete any .la files on sight. The libraries generally install pkg-config files as well, and the programs generally use pkg-config to locate libraries anyway (if not, it's arguably a bug).
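
A one-line post-install step is enough; this sketch assumes the $SYSROOT variable used throughout this article:

find "$SYSROOT" -name '*.la' -delete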

Dumb pkg-config use

Some packages simply invoke pkg-config as the raw pkg-config command, rather than using the PKG_CONFIG and PKG_CONFIG_FOR_BUILD variables. That is a bug. The correct logic is to use the variables if they exist and fall back on the raw command otherwise. The shell expression ${PKG_CONFIG:=pkg-config} is useful here: if the variable is set and non-empty, it is used; otherwise, pkg-config is used.
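
A sketch of the correct fallback logic in a shell-based build script (the libfoo module name is hypothetical):

# Respect PKG_CONFIG if the caller set it, otherwise fall back to plain pkg-config.
: "${PKG_CONFIG:=pkg-config}"
FOO_CFLAGS=$("$PKG_CONFIG" --cflags libfoo)
FOO_LIBS=$("$PKG_CONFIG" --libs libfoo)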

foo-config

Some packages (like libfreetype, libpng, libsdl, libxml2, and more) install a custom program like freetype-config into the bin directory. The idea is that packages depending on libfreetype can run freetype-config --cflags --libs to get the compiler options needed to use the library. This scheme fails horribly in practice: the bin directory of your operating system is not in your PATH (nor should it be!), so the freetype-config program of your distribution is used instead. The programs are not even sysroot-aware, so the system root is not added in front of the compiler options. Suddenly your otherwise nicely cross-compiled program gets linked with a Linux libfreetype against the Linux version of the headers.

The solution is to nuke these programs on sight. They're broken and should never be used or installed on your operating system. Tell the upstream developers to stop providing them and to provide pkg-config files, and tell projects using them to use pkg-config instead. This is supposedly why pkg-config was created: to do this once and for all in a proper way. If you see programs using the foo-config programs, patch the configure script so they use pkg-config instead.
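
As a sketch, a configure script line that shells out to freetype-config can be patched to query pkg-config instead (freetype2 is the name of FreeType's own pkg-config file; verify it in your sysroot):

# Before (silently picks up your distribution's freetype-config):
FREETYPE_CFLAGS=$(freetype-config --cflags)
# After (sysroot- and cross-aware through the PKG_CONFIG wrapper):
FREETYPE_CFLAGS=$("${PKG_CONFIG:-pkg-config}" --cflags freetype2)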

Running Cross-Compiled Programs

Some packages are not cross-compile aware and use the standard compiler (your cross-compiler) to build a local program that generates parts of the package, and then run that program. This doesn't work: it runs a cross-compiled executable, which won't run on your local operating system. The results vary from execve returning an error, to infinite loops, to mysterious crashes. The bug is that the package should have used a variable like CC_FOR_BUILD to compile programs meant to run on the local system.

This occasionally also takes the form of configure tests that are mistakenly run (not just compiled and linked) when cross-compiling. A while ago, a lot of packages checked for Japanese locales by unconditionally running such tests when cross-compiling, leading to fun results. The correct behavior is to attempt running the test and, if we are cross-compiling, to assume something reasonable instead (or delay the test until the program is actually run).

Assuming the Worst

Some packages take portability to the level where they want to support broken operating systems and do compatibility magic in these cases. This is often to work around a bug in a particular release of a particular operating system. This compatibility occasionally takes the form of a configure test that needs to be executed. In the event of cross-compilation, these tests need to assume something. The developers made the error of assuming unknown systems are terribly broken, instead of just using a heuristic that FooBSD (before release 5) is broken and assuming the best about all other systems.

The correct solution is for packages to assume the best about unknown operating systems. This shifts the punishment from the good systems to the bad systems (so they break and they have the opportunity to fix bugs). This is not a problem for existing systems, because if the developers care about FooBSD, that's not an unknown system.
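
Until such packages are fixed, you can often force the optimistic answer yourself by pre-seeding autoconf's cache on the configure command line. This sketch uses the cache variable behind autoconf's AC_FUNC_FORK check; the variables a given package consults differ, so treat the name as an example:

./configure --host=x86_64-myos --prefix=/usr ac_cv_func_fork_works=yes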

Gnulib

The GNU portability layer takes the form of a collection of files that everyone copies into their packages and then neglects to update often. These files are often deeply integrated into the package (i.e. hard to disable properly). The principle of replacing standard library functions that are broken or missing is not too terribly bad - but Gnulib mixes it with a huge paranoia that the host system is terribly broken and assumes the very worst when cross-compiling. The result is that when you cross-compile these ports, huge amounts of compatibility code get compiled in, and much of this compatibility code does not even work on unknown platforms. Of particular fun is code that needs to integrate deeply into stdio internals or that replaces your printf. The result is that as you port packages, you often find yourself fixing the same gnulib code over and over (each time subtly different depending on when it was forked). When you improve your stdio implementation to be more standards-compliant, you find yourself fixing all those gnulib stdio-internals-aware files all over again, because some silly internal changed.

The solution is to scream in horror at how troublesome and unnecessary this scheme is as obviously you are capable of implementing a correct operating system. This racketeering scheme has #error statements that tell you to upstream preprocessor conditionals for your operating system, so they can relish in even more complexity that didn't need to exist in the first place.

Sortie has developed a gnulib policy for his OS that describes how to handle gnulib; it has a long list of secret autoconf variables that make gnulib assume everything is perfect.

Custom Configure Script

Some packages consider autoconf to be hellish and refuse to use it. They occasionally replace an autoconf-generated configure script with a hand-written one. This is fine, but they unfortunately often fail to implement the same interface completely and don't support cross-compilation. You can attempt to patch the configure script so it implements the same interface, or do whatever else gets the job done. You probably want to yell at the developers.

No DESTDIR

The package might not support the DESTDIR variable. You should patch the Makefile (perhaps Makefile.in) to support it and tell the upstream developers. Perhaps it's called something else like INSTALLROOT; you can rename it in the Makefile or add INSTALLROOT ?= $(DESTDIR).
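
If you would rather not patch at all, such a variable can also be overridden on the make command line; the INSTALLROOT name is just the hypothetical example from above:

make INSTALLROOT="$SYSROOT" install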

Packages Containing Dependencies

Sometimes packages contain copies of their dependencies as subdirectories. If you are really unlucky, they get used instead of the real deal. That means you have to port a two-year-old version all over again. If you are more unlucky, you get something that actually compiles on your operating system but contains unfixed bugs and security issues, and this gets used silently. Extra fun is when they fixed bugs or enhanced the library and the changes are not in the upstream version. Tell upstream projects to get their act together if you see this; it's your responsibility to provide the dependencies, for a good reason.
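
Many packages have a configure switch to prefer the system copy over the bundled one; check ./configure --help for the exact spelling. A sketch modelled on the common --with-system-zlib switch:

./configure --host=x86_64-myos --prefix=/usr --with-system-zlib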

Exotic problems

As you start porting packages that rarely get ported or cross-compiled, you'll likely start finding some seriously messed up packages. For instance, I once found a library that stored the path to the compiler and the compiler options in a header, so other libraries and programs depending on it could locate the compiler that way (and they did) - despite the fact that it also had a full and generated configure script. At some point, you should reconsider whether this is actually software you want to port.

Upstreaming Local Patches

As you port more packages, you gradually build up a patch collection. Many of these patches likely work around issues in your system, but other patches solve general issues in the packages. Ideally, you should not need to patch packages at all. Your operating system is not terribly important to support in the upstream version, though, and any such patches will likely go stale as your operating system improves. However, the patches not related to your operating system are likely of value upstream (or to the other osdevers that also port stuff). You should send such patches upstream or file bug reports. In the long run, it decreases the size of your patch collection and makes it more maintainable.

Patch Collections

It is probable that other community members have already ported a particular package. Their patches likely contain insight into how troublesome it is to do a port and what needs to be done. You can often find a collection of all their local patches in a central location.