Porting Newlib

From OSDev Wiki

This page is under construction! This page is a work in progress and may thus be incomplete. Its content may be changed in the near future.

Difficulty level
Advanced

Newlib is a C library intended for use on embedded systems and available under a free software license. It is known for being simple to port to new operating systems, although its coding practices are allegedly sometimes questionable. This tutorial follows OS Specific Toolchain and completes it using newlib rather than another C library such as your own.

Porting newlib is one of the easiest ways to get a simple C library into your operating system without an excessive amount of effort. As an added bonus, once complete you can port the toolchain (GCC/binutils) to your OS - and who wouldn't want to do that?

Note: This tutorial is outdated and needs to be updated and reformatted. Some of the advice here seems naive and blatantly wrong.


Introduction

After an incredibly difficult week of trying to get newlib ported to my own OS, I decided to write a tutorial that outlines the requirements for porting newlib and how to actually do it. I'm assuming you can already load binaries from somewhere and that these binaries are compiled C code. I also assume you already have a syscall interface set up. Why wait? Let's get cracking!

Preparation

Download newlib source (I'm using 1.15.0) from this ftp server.

Note: That's probably an old release.

System Calls

First of all, you need to support a set of system calls that act as 'glue' between newlib and your OS; the list below has 18 functions plus the environ variable. These are the typical "_exit", "open", "read"/"write", "execve" (et al.). The text below is taken straight from the Red Hat newlib C library documentation:

Note: This text is likely out of date and some of the function prototypes here violate the standard.

Note: Can we legally include the text here? It's not CC0. Is this fair use?

_exit
    Exit a program without cleaning up files.
    If your system doesn't provide this, it is best to avoid linking with subroutines that
    require it (exit, system).

close
    Close a file.

    Minimal implementation:

    int close(int file){
        return -1;
    }

environ
    A pointer to a list of environment variables and their values.
    For a minimal environment, this empty list is adequate:

    char *__env[1] = { 0 };
    char **environ = __env;

execve
    Transfer control to a new process.

    Minimal implementation (for a system without processes):

    #include <errno.h>
    #undef errno
    extern int errno;
    int execve(char *name, char **argv, char **env){
      errno=ENOMEM;
      return -1;
    }

fork
    Create a new process.

    Minimal implementation (for a system without processes):

    #include <errno.h>
    #undef errno
    extern int errno;
    int fork() {
      errno=EAGAIN;
      return -1;
    }

fstat
    Status of an open file.

    For consistency with other minimal implementations in these examples, all files are regarded as 
    character special devices.

    The `sys/stat.h' header file is distributed in the `include' subdirectory for this C library.

    #include <sys/stat.h>
    int fstat(int file, struct stat *st) {
      st->st_mode = S_IFCHR;
      return 0;
    }

getpid
    Process-ID;

    This is sometimes used to generate strings unlikely to conflict with other processes.       
    
    Minimal implementation, for a system without processes:

    int getpid() {
      return 1;
    }

isatty
    Query whether output stream is a terminal.

    For consistency with the other minimal implementations, which only support output to stdout,
    this minimal implementation is suggested:

    int isatty(int file){
       return 1;
    }

kill
    Send a signal.

    Minimal implementation:

    #include <errno.h>
    #undef errno
    extern int errno;
    int kill(int pid, int sig){
      errno=EINVAL;
      return(-1);
    }

link
    Establish a new name for an existing file.

    Minimal implementation:

    #include <errno.h>
    #undef errno
    extern int errno;
    int link(char *old, char *new){
      errno=EMLINK;
      return -1;
    }

lseek
    Set position in a file.

    Minimal implementation:

    int lseek(int file, int ptr, int dir){
        return 0;
    }

open
    Open a file. Minimal implementation:

    int open(const char *name, int flags, int mode){
        return -1;
    }

read
    Read from a file. Minimal implementation:

    int read(int file, char *ptr, int len){
        return 0;
    }

sbrk
    Increase program data space.
    As malloc and related functions depend on this, it is useful to have a working implementation.
    
    The following suffices for a standalone system;
    it exploits the symbol end automatically defined by the GNU linker. 	

    caddr_t sbrk(int incr){
      extern char end;		/* Defined by the linker */
      static char *heap_end;
      char *prev_heap_end;
     
      if (heap_end == 0) {
        heap_end = &end;
      }
      prev_heap_end = heap_end;
      /* stack_ptr is not declared in this excerpt; your port must supply the current stack pointer */
      if (heap_end + incr > stack_ptr)
        {
          write (1, "Heap and stack collision\n", 25);
          abort ();
        }

      heap_end += incr;
      return (caddr_t) prev_heap_end;
    }

stat
    Status of a file (by name).
    
    Minimal implementation:

    int stat(const char *file, struct stat *st) {
      st->st_mode = S_IFCHR;
      return 0;
    }

times
    Timing information for current process.

    Minimal implementation:
     	
    clock_t times(struct tms *buf){
      return -1;
    }

unlink
    Remove a file's directory entry.

    Minimal implementation:

    #include <errno.h>
    #undef errno
    extern int errno;
    int unlink(char *name){
      errno=ENOENT;
      return -1; 
    }

wait
    Wait for a child process.

    Minimal implementation:

    #include <errno.h>
    #undef errno
    extern int errno;
    int wait(int *status) {
      errno=ECHILD;
      return -1;
    }

write
    Write a character to a file.

    `libc' subroutines will use this system routine for output to all files, 
    including stdout---so if you need to generate any output, for example to a serial port for
    debugging, you should make your minimal write capable of doing this. 

    The following minimal implementation is an incomplete example; it relies on a writechar
    subroutine (not shown; typically, you must write this in assembler from examples provided
    by your hardware manufacturer) to actually perform the output.

    int write(int file, char *ptr, int len){
        int todo;
      
        for (todo = 0; todo < len; todo++) {
            writechar(*ptr++);
        }
        return len;
    }
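
A slightly fuller sketch of the write stub above may help. It assumes a hypothetical writechar routine, stubbed here with an in-memory buffer so the example runs on a host; on real hardware it would talk to a serial port or console. The name myos_write and the choice to reject descriptors other than 1 and 2 with EBADF are assumptions made for illustration, not newlib requirements.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical output sink standing in for a serial port or console. */
static char console_buf[256];
static size_t console_len;

/* Hypothetical writechar: on real hardware this would write to a device. */
static void writechar(char c)
{
    if (console_len < sizeof console_buf - 1) {
        console_buf[console_len++] = c;
        console_buf[console_len] = '\0';
    }
}

/* Sketch of a write stub: only stdout (1) and stderr (2) are accepted. */
int myos_write(int file, const char *ptr, int len)
{
    int done;

    if (file != 1 && file != 2) {
        errno = EBADF;          /* unknown descriptor */
        return -1;
    }
    for (done = 0; done < len; done++) {
        writechar(ptr[done]);
    }
    return len;
}
```

Once real file descriptors exist, the descriptor check would be replaced by a lookup in whatever open-file table your kernel maintains.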

According to the documentation, you should also disable Newlib's macro definition for errno:

#include <errno.h>
#undef errno
extern int errno;

Re-entrant versions of these are a bit harder and are outlined in the documentation.
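
The sbrk example in the list above depends on the linker symbol end and on a stack_ptr variable that the excerpt does not define. A hedged, self-contained variant that runs on a host is sketched below. It carves the heap out of a fixed static array (heap_arena, HEAP_SIZE and myos_sbrk are names invented for this sketch) and reports exhaustion with ENOMEM and (void *)-1 instead of calling abort().

```c
#include <errno.h>
#include <stddef.h>

#define HEAP_SIZE 4096  /* assumed arena size, purely illustrative */

/* Hypothetical stand-in for the region between `end` and the stack. */
static char heap_arena[HEAP_SIZE];
static size_t heap_used;  /* offset of the current program break */

/* Sketch of sbrk over a fixed arena: returns the previous break, or
 * (void *)-1 with errno set to ENOMEM if the request does not fit. */
void *myos_sbrk(ptrdiff_t incr)
{
    ptrdiff_t used = (ptrdiff_t)heap_used;
    char *prev_break = heap_arena + heap_used;

    /* Reject shrinking below the arena start or growing past its end. */
    if (incr < -used || incr > (ptrdiff_t)HEAP_SIZE - used) {
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_used = (size_t)(used + incr);
    return prev_break;
}
```

In a real port the arena bounds would come from the linker script or from a kernel call that maps more pages, rather than from a static array.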

Note: Is this errno hack really what should be recommended rather than doing it properly?

My kernel exposes all the system calls on interrupt 0x80 (128 decimal), so I just had to put a bit of inline assembly into each stub. How you implement the stubs is up to you and depends on your kernel.
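
As a rough sketch of such a stub, the code below traps into the kernel with int 0x80 and converts a negative return value into errno. The register convention (call number in EAX, arguments in EBX, ECX and EDX), the SYS_CLOSE number, and the "negative value means -errno" return convention are all assumptions borrowed from common practice, not something newlib or this tutorial mandates. The non-i386 fallback exists only so the file compiles on other hosts. Note that it assigns errno through the ordinary <errno.h> definition.

```c
#include <errno.h>

#define SYS_CLOSE 6  /* hypothetical syscall number for this sketch */

/* Issue a three-argument system call through interrupt 0x80. */
static long myos_syscall3(long num, long a1, long a2, long a3)
{
#if defined(__i386__)
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(num), "b"(a1), "c"(a2), "d"(a3)
                      : "memory");
    return ret;
#else
    /* Fallback for non-i386 hosts: pretend the kernel lacks the call. */
    (void)num; (void)a1; (void)a2; (void)a3;
    return -ENOSYS;
#endif
}

/* Translate the assumed "-errno on failure" convention into errno. */
static long syscall_result(long ret)
{
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}

/* Example glue function built from the two helpers above. */
int myos_close(int file)
{
    return (int)syscall_result(myos_syscall3(SYS_CLOSE, file, 0, 0));
}
```

Keeping the trap and the errno translation in two small helpers means each remaining glue function becomes a one-line wrapper.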

Porting Newlib

config.sub

Same as for binutils in OS Specific Toolchain.

configure.host

Tell newlib which system-specific directory to use for our particular target. In the section starting 'Get the source directories to use for the host ... case "${host}" in', add a section:

i[3-7]86-*-myos*)
    sys_dir=myos
    ;;

libc/sys/configure.in

Tell the newlib build system that it also needs to configure our myos-specific host directory. In the case ${sys_dir} in list, simply add

  myos) AC_CONFIG_SUBDIRS(myos) ;;

Note: After this, you need to run autoconf in the libc/sys directory.

libc/sys/myos

This is a directory that we need to create, where we put our OS-specific extensions to newlib. We need to create a minimum of 4 files. You can easily add more files to this directory to define your own OS-specific library functions, if you want them to be included in libc.a (and so linked into every application by default).

libc/sys/myos/crt0.S

This file creates crt0.o, which is included in every application. It should define the symbol _start, and then call the main() function, possibly after setting up process-space segment selectors and pushing argc and argv onto the stack. A simple implementation is:

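.global _start
.extern main
.extern exit
_start:
	call main	/* this minimal version sets up no argc/argv */
	push %eax	/* pass main's return value to exit() (i386 cdecl) */
	call exit
.wait:			/* exit() should not return; spin if it does */
	hlt
	jmp .wait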
It is also worth mentioning that crt0 can be written in C instead of Assembly on some platforms. There are a couple of reasons why you may want to do so, including that you would be able to properly find the entry point to programs written in C++, without having to worry about name mangling or using C linkage. It is also easier (marginally) to handle the argc and argv parameters in C.

Note: Probably not best to recommend people do this in C rather than assembly.

Note: This crt0.S isn't entirely right, see Creating a C Library for better examples.

libc/sys/myos/syscalls.c

This file should contain implementations for each of the system calls that newlib depends on. There is a list on the newlib website but I believe it to be slightly out of date as my version had some extra ones not documented there. Generally, each of these system calls should trigger an interrupt or use sysenter/syscall to run a kernel-space system call. As such, they are heavily OS-specific. A non-exhaustive list is:

/* note these headers are all provided by newlib - you don't need to provide them */
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/fcntl.h>
#include <sys/times.h>
#include <sys/errno.h>
#include <sys/time.h>
#include <stdio.h>
 
void _exit();
int close(int file);
char **environ; /* pointer to array of char * strings that define the current environment variables */
int execve(char *name, char **argv, char **env);
int fork();
int fstat(int file, struct stat *st);
int getpid();
int isatty(int file);
int kill(int pid, int sig);
int link(char *old, char *new);
int lseek(int file, int ptr, int dir);
int open(const char *name, int flags, ...);
int read(int file, char *ptr, int len);
caddr_t sbrk(int incr);
int stat(const char *file, struct stat *st);
clock_t times(struct tms *buf);
int unlink(char *name);
int wait(int *status);
int write(int file, char *ptr, int len);
int gettimeofday(struct timeval *p, struct timezone *z);

Note: Again, this list of function prototypes violates the standard.

libc/sys/myos/configure.in

Configure script for our system directory.

AC_PREREQ(2.59)
AC_INIT([newlib], [NEWLIB_VERSION])
AC_CONFIG_SRCDIR([crt0.S])
AC_CONFIG_AUX_DIR(../../../..)
NEWLIB_CONFIGURE(../../..)
AC_CONFIG_FILES([Makefile])
AC_OUTPUT

libc/sys/myos/Makefile.am

A Makefile template for this directory:

AUTOMAKE_OPTIONS = cygnus
INCLUDES = $(NEWLIB_CFLAGS) $(CROSS_CFLAGS) $(TARGET_CFLAGS)
AM_CCASFLAGS = $(INCLUDES)
 
noinst_LIBRARIES = lib.a
 
if MAY_SUPPLY_SYSCALLS
extra_objs = $(lpfx)syscalls.o
else
extra_objs =
endif
 
lib_a_SOURCES =
lib_a_LIBADD = $(extra_objs)
EXTRA_lib_a_SOURCES = syscalls.c crt0.S
lib_a_DEPENDENCIES = $(extra_objs)
lib_a_CCASFLAGS = $(AM_CCASFLAGS)
lib_a_CFLAGS = $(AM_CFLAGS)
 
if MAY_SUPPLY_SYSCALLS
all: crt0.o
endif
 
ACLOCAL_AMFLAGS = -I ../../..
CONFIG_STATUS_DEPENDENCIES = $(newlib_basedir)/configure.host

Note: After this, you need to run autoconf in the libc/sys/ directory, and autoreconf in the libc/sys/myos directory.

Note: autoconf and autoreconf will only run with automake version <= 1.12 and exactly autoconf version 2.64 (this applies to newlib source pulled from the git repository on July 31, 2013).

Signal handling

Newlib has two different mechanisms for dealing with UNIX signals (see the man pages for signal()/raise()). In the first, it provides its own emulation, where it maintains a table of signal handlers in a per-process manner. If you use this method, then you will only be able to respond to signals sent from within the current process. In order to support it, all you need to do is make sure your crt0 calls '_init_signal' before it calls main, which sets up the signal handler table.
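
To make the first mechanism concrete, here is a simplified model of a per-process handler table. It is not newlib's actual code: emu_signal, emu_raise, EMU_NSIG and the table layout are invented for this sketch, which only illustrates why such signals can be delivered solely from inside the process that owns the table.

```c
#include <errno.h>
#include <stddef.h>

#define EMU_NSIG 32  /* assumed number of signals for this sketch */

typedef void (*emu_handler_t)(int);

#define EMU_SIG_DFL ((emu_handler_t)0)
#define EMU_SIG_ERR ((emu_handler_t)-1)

/* One slot per signal; lives in the process's own data segment. */
static emu_handler_t handler_table[EMU_NSIG];

/* Install a handler and return the previous one. */
emu_handler_t emu_signal(int sig, emu_handler_t handler)
{
    emu_handler_t old;

    if (sig < 0 || sig >= EMU_NSIG) {
        errno = EINVAL;
        return EMU_SIG_ERR;
    }
    old = handler_table[sig];
    handler_table[sig] = handler;
    return old;
}

/* Deliver a signal by calling the handler directly, in-process. */
int emu_raise(int sig)
{
    if (sig < 0 || sig >= EMU_NSIG) {
        errno = EINVAL;
        return -1;
    }
    if (handler_table[sig] != EMU_SIG_DFL) {
        handler_table[sig](sig);
    }
    /* A real implementation would apply the default action here. */
    return 0;
}

/* Example handler used to observe delivery. */
static volatile int last_signal;
static void record_signal(int sig) { last_signal = sig; }
```

Because the table is ordinary process memory, another process has no way to reach it, which is the limitation described above.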

Alternatively, you can provide your own implementation. To do this, you need to define your own version of signal() in syscalls.c. A typical implementation would register the handler somewhere in kernel space, so that issuing a signal from another process causes the corresponding function to be called in the receiving process (this will also require some nifty stack-playing in the receiving process, as you are basically interrupting the program flow in the middle). You then need to provide a kill() function in syscalls.c which actually sends signals to another process. Newlib will still define a raise() function for you, but it is just a stub which calls kill() with the current process id. To switch newlib to this mode, you need to #define the SIGNAL_PROVIDED macro when compiling. A simple way to do this is to add the line:

newlib_cflags="${newlib_cflags} -DSIGNAL_PROVIDED"

to your host's entry in configure.host. It would probably also make sense to provide sigaction(), and provide signal() as a wrapper for it. Note that the Open Group's definition of sigaction states that 1) sigaction supersedes signal, and 2) an application shouldn't use both to manipulate the same signal.
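
A signal() built on top of sigaction() could look roughly like the sketch below. It assumes a working POSIX sigaction() is available (on your OS that would be your own system call; here the host's implementation is used so the example runs). The names my_signal, sig_handler_t and on_signal are made up for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict -std modes */
#endif
#include <signal.h>

typedef void (*sig_handler_t)(int);

/* signal() expressed in terms of sigaction(); returns the old handler. */
sig_handler_t my_signal(int sig, sig_handler_t handler)
{
    struct sigaction act, old;

    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);   /* block no extra signals in the handler */
    act.sa_flags = 0;
    if (sigaction(sig, &act, &old) < 0) {
        return SIG_ERR;
    }
    return old.sa_handler;
}

/* Example handler used to observe delivery. */
static volatile sig_atomic_t got_signal;
static void on_signal(int sig) { got_signal = sig; }
```

Routing everything through sigaction() leaves a single code path into the kernel, which matches the advice that the two interfaces should not both manipulate the same signal.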

Compiling

You can build newlib in this manner:

cd $HOME/src
mkdir build-newlib
cd build-newlib
../newlib-x.y.z/configure --prefix=/usr --target=i686-myos
make all
make DESTDIR=${SYSROOT} install

However, if you try linking a previously written program with newlib, you'll get undefined references everywhere: you haven't put your 'glue' into newlib yet.

According to Jeff Johnston on the newlib mailing list:

So, you get the majority of the C library from newlib and the rest (syscalls) is usually in libgloss. Using an ld script makes life easy for the end-user as all they have to do is specify -Txxxx.ld. Inside the ld script you can specify all the libraries needed, where the entry point is, etc.... The libgloss library is a separate library and you name it whatever you want. The ld script handles all of this internally and the user doesn't need to know just what libraries there are out there.

Basically, in the libgloss directory you will find 17 files, all of which are the syscalls we wrote earlier. Put your code into these files, configure, build, and then you'll have another library. Typically you would rename this library to something like "youros.a" (in my case "mattise.a") and tell all programmers to link with the linker script you write. An added bonus of this is that every executable for your OS uses a linker script you've created.

Testing

I suggest writing a simple test program; the following will suffice:

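#include <stdint.h>

int main(void)
{
    /* Assumes VGA text memory is mapped at 0xB8000 in this process */
    *((volatile uint16_t*) 0xB8000) = 0x7020; // put a gray block in the top left corner
    return 0;
}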

Put a little bit of code into your system call interface to print the function number that has been called and look for any possible calls.
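
One possible shape for that debugging aid is sketched below: a kernel-side dispatcher that records every syscall number before routing it. The table layout, the names syscall_dispatch and sys_getpid_impl, the number 20, and the -ENOSYS return for unknown numbers are assumptions made for this example. Printing is replaced by a small in-memory log so the sketch runs anywhere.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_SYSCALL 32  /* assumed size of the syscall table */
#define LOG_SIZE    16  /* how many recent call numbers to keep */

typedef long (*syscall_fn)(long, long, long);

/* Hypothetical kernel-side handler for getpid. */
static long sys_getpid_impl(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return 1;
}

static syscall_fn syscall_table[MAX_SYSCALL] = {
    [20] = sys_getpid_impl,  /* hypothetical syscall number */
};

/* Ring buffer of the most recent syscall numbers. */
static long call_log[LOG_SIZE];
static size_t call_count;

/* Log the number, then dispatch; unknown calls yield -ENOSYS. */
long syscall_dispatch(long num, long a, long b, long c)
{
    call_log[call_count % LOG_SIZE] = num;  /* a kernel would print this */
    call_count++;

    if (num < 0 || num >= MAX_SYSCALL || syscall_table[num] == NULL) {
        return -ENOSYS;
    }
    return syscall_table[num](a, b, c);
}
```

Numbers that show up in the log but are missing from the table are the calls newlib is making that the kernel does not yet implement.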

One note about the above: as I've already said, I assume that you can load an executable binary. Without being able to load an external binary, a port of newlib is useless, unless you decide to link it into your kernel and use its features (but you still need to write the glue layer, and with different function names from the glue functions).

Conclusion

Well, you've done it. You've ported newlib to your OS! This is a really simple approach to the port, suited to those who just want to get it done and are happy to handle special cases later (see newlib/libc/sys/linux for an example of a special case). There are heaps of resources out there that can help you out.

There is one obvious advantage to porting newlib: you can now port the toolchain and run binutils and GCC on your own OS. Almost self-hosting, how do you feel?

Good luck!

