User:H0bby1
The factual accuracy of this article or section is disputed.
how to prevent dependence on build environment in C
When a program is compiled into executable machine code, parts of the build process are not described by the C language. They are left to the build chain, which generates the executable code for a particular architecture.
which parts of the build can be environment specific
- calling convention and data types: A calling convention is the way functions are called. It specifies how the return address and the parameters are passed to the function, and how the return value is passed back to the caller. Most compilers recognize a few common calling conventions, but these are CPU dependent, and a compiler might not use them by default in all cases. Some compilers also offer their own conventions, for example passing parameters in registers instead of on the stack. The way a compiler lays out structures can also vary, regarding the alignment and packing of members. Some compilers also provide specific data type declarations, for example for 64 bit integers, or a specific 16 bit char type for Unicode strings.
- the C runtime (crt, libgcc for gcc on linux): These are helper functions that are not part of the C standard, often written in assembler for the specific target CPU. The compiler emits calls to them to perform an operation through an internal routine specific to the compiler and to the target architecture. The runtime is also used to perform system specific application initialization before the 'main' function is entered, and system specific process shutdown after 'main' returns.
- the standard C library (libc, glibc for linux): The C library provides a set of standard functions that C programs use to access system specific resources. It is part of the build environment for a particular target: a C compiler that wants programs to use the standard C functions must be paired with an implementation of the C library for each target it produces executables for. This includes the string and memory functions (strcpy, memcpy), conversion functions (strtol, and the common but non-standard itoa), memory allocation functions (malloc, calloc, realloc, free), floating point math functions, and i/o functions (printf, sscanf, fopen, fread, fwrite, fclose). Many implementations also ship system interfaces that are outside the C standard, such as the socket functions. The C library is generally built on top of the kernel, and provides a standard interface so that C programs do not depend on kernel specific functions.
Generally only the C runtime is linked statically, because it is small and its functions are entirely specific to the compiler. The C library is generally linked dynamically, and has to be present on the target system and loaded at runtime.
The C library can also be linked statically into the executable, in which case the executable depends only on the lower level system API or kernel that the C library was written for. Because the purpose of the C library is to avoid direct kernel dependencies in the executable and to provide a standard API to C programs, it is better suited to dynamic linking, which lets the executable run on different systems that implement the set of standard C functions. When it is linked statically, the executable depends on every lower level function that the C library uses internally.
- On windows before visual studio 2005, the C library of visual studio (msvcrt) was shipped with windows by default, so you didn't have to worry about it. Since vc2005, applications that use the visual studio C library have to distribute it themselves, either as a dll or linked statically into the application.
- On linux and bsd, applications are generally compiled and linked on the host platform before being used, which ensures that all function definitions are compiled according to the host configuration.
different configurations
programming a regular application
If you are programming an application, you want to take advantage of a particular development environment: use as many compiler specific features as you need, and rely entirely on the C build suite to compile optimized code for the platform targeted by the build. The application needs to be linked statically with the compiler's C runtime. It must also either be distributed with the C library used to build it, or rely on the C library present on the target system being compatible with the definitions of the C functions from the C library used to build it.
programming a library or framework
If you are programming a library or a framework, the 'guest' C program (the application using the library) exchanges shared data structures with the framework as function arguments. You then need to make sure that the guest compiler uses the specified calling convention to call the functions of the framework, and identical structure packing wherever shared structures are involved.
You may also want to avoid the standard C functions altogether, so that the client application does not have to take on the C library used to compile the framework as a dependency.
If the framework has to be linked statically, its C runtime can conflict with the C runtime of the application's compiler, because both the framework and the application need their respective compiler's version of the runtime linked statically into the executable.
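The structure packing concern for shared structures can be checked at compile time. The following is only a sketch: the `os_msg` structure and the `OS_STATIC_CHECK` macro are hypothetical names, and it assumes the compiler provides the freestanding headers stddef.h and stdint.h. The members use fixed-width types and are ordered so that natural alignment leaves no padding, which should give the same layout on the common i386 and x64 ABIs.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* fixed-width integer types */

/* Hypothetical structure shared between a framework and its clients. */
typedef struct os_msg {
    uint32_t id;       /* expected at offset 0 */
    uint16_t kind;     /* expected at offset 4 */
    uint16_t flags;    /* expected at offset 6 */
    uint64_t payload;  /* expected at offset 8 */
} os_msg;

/* Compile-time check that works before C11: the array size becomes -1,
   and the build fails, when the condition is false. */
#define OS_STATIC_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

OS_STATIC_CHECK(os_msg_size_is_16, sizeof(os_msg) == 16);
OS_STATIC_CHECK(os_msg_kind_at_4, offsetof(os_msg, kind) == 4);
OS_STATIC_CHECK(os_msg_payload_at_8, offsetof(os_msg, payload) == 8);
```

If both the framework and the guest application include the same header with these checks, a compiler or option that changes the layout stops the build instead of producing a silent mismatch.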
programming a kernel
A kernel generally cannot use the standard C library of the compiler, but some of its functions can be linked statically or implemented fully in the kernel code. This is usually not a problem, because the kernel's C routines are unlikely to be compiled with a different build environment than the kernel itself.
However, if those kernel functions have to be used as 'functions of the standard C library' by other programs, you need to make sure that the declarations in the standard C headers (stddef.h, string.h, math.h) used to build the client program are consistent with how the functions are defined in the kernel.
If the client program's compiler has to use kernel specific declarations of the C library functions, make sure it treats calls to those functions according to the explicit declarations in the headers used. Compilers often expect a specific implementation of the standard C library functions, so check that the compiler follows the explicit declarations present in the header files and not the implicit declarations it might expect.
programming an operating system
Applications have to use some data types and functions defined in the os source code from their own source code. The calling convention attributes, the way structures are aligned and packed, and the way parameters and the return address are passed to the functions of the os need to be specified explicitly in the shared function and data type declarations.
The compiler used to build the 'guest' application expects its own version of the standard C library to be used at runtime on the target system. So either the os has to load the application's version of the C library to run it, or it has to assume that the definitions of the C functions used by the application are consistent with the C library used at runtime.
If specific declarations of the C library functions are used, you need to be sure the guest compiler uses those definitions to compile the application. Compilers can sometimes override the definitions of the standard C functions with the types they expect from the standard C library they were built for, because they have specific ways to optimize or handle calls to those functions.
The configuration for an operating system is similar to that of a framework, but it needs extra care about compiler specifics: several versions of a framework can be distributed to fit particular build environments, but it is much harder to ship a version of the OS for each build environment that could be used to build an application or a component/module for it.
how C compilers deal with calls to the standard C library
As mentioned previously, compilers sometimes behave in a specific way when they encounter a call to a function of the standard C library, and do not always do exactly what the function declaration suggests. This behavior is very compiler specific and depends on the compilation options, but the compiler can inline the code entirely, replace the call with specific functions of its runtime, or even optimize successive calls at function or whole program level.
For example, given the following code:
float c,s,a;
a=1.0;
c=cos(a);
s=sin(a);
The compiler can optimize this into:
fld [a]
fsincos
fstp [c]
fstp [s]
It then uses the single fsincos instruction instead of two successive calls to the standard C sin and cos functions.
The compiler can inline an assembler routine to optimize calls at function or whole program level, and/or insert calls to its own C runtime library. It can use its implicit (built-in) declaration of a standard C function instead of the one explicitly defined in the header, or issue warnings if some of its internal functions are redefined. With gcc, the -fno-builtin and -ffreestanding options reduce this special treatment. If the compiled program has to be binary compatible across compilers with the OS implementation of the C library, it is safer to use functions whose names differ from those of the compiler's standard C library.
Many problems can arise at the assembly level from what a compiler does when it encounters a call to a function of the C library. What the compiler expects of the C library code is not always defined explicitly, and this can break compatibility between an application and the host system when the C library used at runtime on the host is incompatible with what the compiler expects, even if the application uses function declarations consistent with the definitions of the functions in the host system.
how to avoid problems
always using the headers and C library specific to a build environment to build everything
The system contains a single C library, and every program compiled for it must use the corresponding C headers to declare the functions of the C library present in the system.
- pro
- Can use the C library and headers of the building environment.
- Can take full advantage of compiler and building options.
- The compiler will be able to optimize the code that use functions of the C Library.
- con
- C library headers often contain compiler specific directives, so you most likely need the same compiler to use the same header files to declare the functions of the C library.
- Errors or inconsistencies can happen if two programs that need the same C library use different C library headers, or use the same headers but are compiled by different compilers or with different options.
- You need to make sure the C library used at runtime is compatible with the one used to build the program.
not using the build environment specific headers and C library nor runtime
No function or data type declaration that depends on the build environment specific implementation of the standard C functions is used.
- pro
- All programs can be compiled with any compiler without inconsistencies, and can include the same cross-compiler headers to declare the equivalents of the C functions.
- Executable files will be smaller because they don't need to incorporate the runtime.
- Two programs compiled with different compilers can be linked statically with each other.
- You control entirely the way the compiler processes calls to the C library functions.
- con
- The compiler can't optimize the calls to the functions of the C library.
- You need to rewrite the functions of the C library that the program uses, and possibly part of the runtime of each compiler you want to be compatible with, or make explicit calls to specific functions to perform an operation instead of using the compiler runtime library implicitly.
- A compiler's own C library is generally more optimized, and can be adapted automatically by the compiler depending on compilation options.
- Porting applications can be a bit harder, but most of the job can be done using preprocessor directives, and many 'portable' applications and libraries give an easy way to use system specific functions in place of the standard C library.
the make up of a typical C Library
Even if you want to get rid of the dependence on the build environment, you still want to implement functions with identical behavior, so that applications using the standard C library can be ported easily to your os, but without depending on anything specific to the build environment for declaring, implementing and compiling them.
It is also preferable to declare the replacements for the standard C functions in header files whose names differ from the standard library include files (stdlib.h, string.h, math.h etc), because the wrong file could be selected if the include paths are not configured properly. Using a different name both for the header files and for the C functions themselves makes sure the compiler handles the functions as defined explicitly in their declarations, and that they do not conflict with the compiler's internal/built-in definitions.
the standard C library
stddef.h
This file mainly defines the basic types that other function definitions use. It should be included first, as the declarations of the C standard library depend on them.
It often includes compiler and system detection using preprocessor conditional compilation.
Each compiler automatically defines a certain number of preprocessor macros, and code can be compiled conditionally depending on the target architecture, the operating system and the version of the compiler, as reported by macros that the compiler or the build environment is expected to set. This way, the same header files can be used to compile for different platforms.
The exact interpretation of the C library headers depends on many macros that are expected to be set by the compiler and that are specific to each compiler.
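The detection described above might look like the sketch below. `_MSC_VER`, `__clang__`, `__GNUC__`, `__x86_64__`, `_M_X64`, `__i386__` and `_M_IX86` are macros predefined by the respective compilers; the `OS_COMPILER_NAME` and `OS_ARCH_BITS` macros and the two functions are hypothetical names used only for illustration.

```c
/* Identify the compiler through its predefined macros.
   clang also defines __GNUC__, so it is tested first. */
#if defined(_MSC_VER)
#define OS_COMPILER_NAME "msvc"
#elif defined(__clang__)
#define OS_COMPILER_NAME "clang"
#elif defined(__GNUC__)
#define OS_COMPILER_NAME "gcc"
#else
#define OS_COMPILER_NAME "unknown"
#endif

/* Identify the x86 target width, with both gcc-style and msvc-style macros. */
#if defined(__x86_64__) || defined(_M_X64)
#define OS_ARCH_BITS 64
#elif defined(__i386__) || defined(_M_IX86)
#define OS_ARCH_BITS 32
#else
#define OS_ARCH_BITS 0 /* not an x86 target, or not detected */
#endif

const char *os_compiler_name(void) { return OS_COMPILER_NAME; }
int os_arch_bits(void) { return OS_ARCH_BITS; }
```

A header built this way can select compiler specific syntax in one place, so the remaining declarations stay identical for every build environment.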
malloc.h
This file contains declarations for functions related to memory allocation, such as malloc, calloc, realloc, and free. malloc.h is not part of the C standard and is not present in every C library. The standard declares these functions in stdlib.h, which is the header that should be included instead.
string.h
This file contains declarations for functions related to string and memory manipulation, such as memcpy, strcpy, memset, strcat etc. (The conversion function strtol is declared in stdlib.h, and itoa is a common extension that is not part of the C standard.)
Most C libraries use a very complex syntax to declare these functions, because the right declaration depends on macros set for a specific build, and contains many compiler specific directives that are often hard to use with other compilers. These directives can toggle debugging features or features of the compiler's C runtime. This makes it hard to keep standard C function declarations consistent across C headers from different implementations of the C library, and each build environment will most likely require its own version of the C function declarations to work properly with all possible compilation options.
This is an example from the string.h of android glibc2.7-4.6:
#define strcpy(dest, src) \
((__ssp_bos (dest) != (size_t) -1) \
? __builtin___strcpy_chk (dest, src, __ssp_bos (dest)) \
: __strcpy_ichk (dest, src))
static inline __attribute__((__always_inline__)) char *
__strcpy_ichk (char *__restrict__ __dest, const char *__restrict__ __src)
{
return __builtin___strcpy_chk (__dest, __src, __ssp_bos (__dest));
}
math.h
This file contains declarations for math functions that operate on floating point numbers, such as the trigonometric functions.
stdio.h
This file contains declarations for i/o functions, mostly used to deal with files and text formatting, such as fopen, fread, fwrite, fclose, printf, sscanf etc
the C runtime
It often takes the form of a static library that is linked into the generated executable file. Any function that the compiler needs in order to generate executable code for a particular platform, and that is not part of the set of standard C functions, is part of the compiler's C runtime.
Features of C compilers that will require some functions from the C runtime:
- memory and stack runtime security check
- performance profiling
- RTTI (run time type information for c++ class using virtual functions)
- 64 bit integer arithmetic or logical operation on 32 bit arch
- system specific functions to initialize and terminate the application before and after the execution of the 'main' function.
To make an executable compiler independent, either:
- Disable those features.
- Write an implementation of the compiler's runtime. The source of the runtime is often shipped with the compiler, so you can copy the code of the runtime functions you use, and either link them statically into the exe/shared lib or resolve references to them at runtime as imported functions.
- Write your own functions that replace the compiler specific runtime functions, so that the compiler does not have to generate any call to its runtime during compilation.
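As an illustration of the last option, take 64 bit division on a 32 bit target, where gcc normally emits a call to a libgcc helper such as `__udivdi3`. The sketch below performs the division with an explicit shift-and-subtract loop instead. `os_udiv64` is a hypothetical name, and the sketch assumes the compiler inlines 64 bit shifts and subtractions on the target, which should be verified for each compiler by checking for unresolved symbols at link time.

```c
#include <stdint.h>

/* Unsigned 64 bit division by binary long division, written without the
   '/' and '%' operators so that no runtime division helper is needed.
   The caller must ensure d != 0. If rem is not null, it receives the remainder. */
uint64_t os_udiv64(uint64_t n, uint64_t d, uint64_t *rem)
{
    uint64_t q = 0;
    uint64_t r = 0;
    int i;

    for (i = 63; i >= 0; i--) {
        /* Remember the bit shifted out of r, so divisors above 2^63 work. */
        uint64_t carry = r >> 63;
        r = (r << 1) | ((n >> i) & 1u);
        if (carry || r >= d) {
            r -= d;                 /* correct modulo 2^64 even when carry is set */
            q |= (uint64_t)1 << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```

Code that would otherwise write `a / b` on 64 bit operands calls `os_udiv64(a, b, &r)` explicitly, which trades some speed for independence from the compiler's runtime.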
For an executable or a shared library, it is not absolutely necessary to avoid dependence on the C compiler's runtime, as the code is linked statically with it. The runtime can however reference system specific functions, especially in the code that handles application initialization and termination, and it adds some overhead to the exe file. Some of the runtime functions may be acceptable, but it is hard to know which ones the compiler will use unless you prevent it from linking its runtime into the exe, in which case the references to the runtime show up as unresolved symbols at link time. You might also not want to deal with the specific initialization code that the compiler's runtime may need to run the executable.
For a static library, avoiding the runtime is necessary if the library has to be linked with a program compiled with a different compiler, because the runtime of the library's compiler will conflict with the runtime of the other program's compiler.
how to write build environment independent code
Most programs written in C use some functions of the standard C library. The policy can be one of the following:
- Make it easy to port the dependence on the functions of the C library in existing applications. You then need to implement a library that is as compatible as possible with other C libraries, but without using anything specific to the build environment to implement it.
- Provide a totally different internal api to manipulate memory, strings, math and files. Application code that uses the standard functions then has to be rewritten to be ported.
- Provide a specific internal api, and also provide functions compliant with the standard C library.
The simplest approach is to declare functions with the same parameters and return types as those of the standard C library, but with different names. Adding a prefix or suffix to the function name is generally convenient; applications then have to use this name instead of the standard C function name.
Those functions should be declared in header files that have the same organisation as the standard C library, to make porting easier.
The headers used to compile the applications are then not the ones the C compiler expects as the C library headers, but an os specific version containing the declarations of the replacement functions. All dependences on the standard C library in the application are switched to this specific library, which the compiler does not recognize as the standard C library.
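A minimal sketch of this naming scheme is shown below, with the declarations and the implementations in one listing for brevity. `os_memcpy` and `os_strlen` are hypothetical names that follow the `os_` prefix used in the rest of this article; the only header included is stddef.h, for `size_t`, which compilers provide themselves.

```c
#include <stddef.h> /* size_t only */

/* Declarations, as they would appear in an os_string.h:
   same parameters and return types as string.h, but prefixed names. */
void  *os_memcpy(void *dest, const void *src, size_t n);
size_t os_strlen(const char *s);

/* Implementations, as they would appear in an os_string.c. */
void *os_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;   /* plain byte copy; the regions must not overlap */
    return dest;
}

size_t os_strlen(const char *s)
{
    const char *p = s;

    while (*p)
        p++;           /* walk to the terminating NUL */
    return (size_t)(p - s);
}
```

Because the signatures match the standard ones, porting an application is mostly a matter of renaming the calls.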
intel calling conventions
Each CPU architecture has its own set of calling conventions. Most compilers targeting Intel CPUs support the following in a standard manner:
- for intel i386
- __cdecl : arguments are passed on the stack, and the stack is restored by the caller after the call; integer return value in eax
- __stdcall : arguments are passed on the stack, and the stack is restored by the called function; integer return value in eax
__cdecl is the more flexible of the two: because the caller restores the stack, the function does not need to know how many arguments were passed, which is what makes variadic functions such as printf possible. __stdcall gives slightly smaller code, since the stack cleanup is emitted once in the function instead of at every call site. On i386 linux, gcc uses a calling convention identical to __cdecl by default.
- for intel x64
- sysv : the default calling convention on 64 bit sysV platforms; the first integer arguments are passed in rdi, rsi, rdx, rcx, r8 and r9.
- microsoft : the calling convention used by microsoft 64 bit compilers and 64 bit windows; the first four integer arguments are passed in rcx, rdx, r8 and r9.
creating simple header file to replace the standard C library declarations for i386
So this declaration of strcpy, from the string.h of the gcc C library headers:
#define strcpy(dest, src) \
((__ssp_bos (dest) != (size_t) -1) \
? __builtin___strcpy_chk (dest, src, __ssp_bos (dest)) \
: __strcpy_ichk (dest, src))
static inline __attribute__((__always_inline__)) char *
__strcpy_ichk (char *__restrict__ __dest, const char *__restrict__ __src)
{
return __builtin___strcpy_chk (__dest, __src, __ssp_bos (__dest));
}
would become this in a build independent os_string.h:
char *os_strcpy (char *dest, const char *src);
adding compiler specific directive
The declaration still needs to specify function attributes, such as the calling convention and whether the function is local, imported or exported, in a way that every compiler will understand and apply. For this you need the equivalent of stddef.h.
os_def.h
When compiling for x64, an x64 calling convention needs to be used instead of __cdecl. gcc also supports the Microsoft calling convention, so it is the preferable choice, as Microsoft compilers do not support the sysV calling convention.
#ifdef _MSC_VER
#define OS_API_CALL __cdecl
#endif
#ifdef __GNUC__
#define OS_API_CALL __attribute__((cdecl))
#endif
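For the x64 case, a possible variant of os_def.h is sketched below. It assumes gcc's `ms_abi` function attribute, which selects the Microsoft convention on x86-64 targets; the function `os_add` is a hypothetical example and not part of the library described here.

```c
/* os_def.h variant for x64 (sketch) */
#if defined(_MSC_VER)
/* 64 bit MSVC has a single calling convention, so nothing is needed. */
#define OS_API_CALL
#elif defined(__GNUC__) && defined(__x86_64__)
/* Ask gcc for the Microsoft x64 convention instead of its sysV default. */
#define OS_API_CALL __attribute__((ms_abi))
#else
/* Other targets: fall back to the compiler default. */
#define OS_API_CALL
#endif

/* Hypothetical function, declared with the macro between the return type
   and the name, the position both compilers accept. */
int OS_API_CALL os_add(int a, int b);

int OS_API_CALL os_add(int a, int b)
{
    return a + b;
}
```

With this definition, code built by gcc and code built by a Microsoft compiler pass arguments to `os_add` in the same registers.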
The declarations of the C functions then use these preprocessor macros to generate the code specific to the compiler used.
os_string.h
char * OS_API_CALL os_strcpy (char *dest, const char *src);
implementing the function
Compiler specific attributes do not need to be repeated in the function implementation. Because the header containing the declaration is included, the compiler treats the function as the implementation of the one declared in the header: if the return type, name and argument types of the definition match the declaration, it automatically applies all the attributes of the declaration to the implementation.
os_string.c
#include "os_def.h"
#include "os_string.h"
char * os_strcpy (char *dest, const char *src)
{
char *d = dest;
/* copy every byte up to and including the terminating NUL */
while ((*d++ = *src++) != '\0') { }
return dest;
}
using it in an application
main.c
#include "os_def.h"
#include "os_string.h"
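int main(int argc,char **argv)
{
char my_string[32];
(void)argc;
/* sketch only: os_strcpy does no bounds check, so argv[0] must fit in 32 bytes */
os_strcpy(my_string,argv[0]);
return 0;
}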
This makes sure the compiler uses the right declaration of os_strcpy, and any specific syntax the compiler needs for the particular build is handled in os_def.h.
The program should then be compiled with '-nodefaultlibs' with gcc ('-nostdlib' also leaves out the startup files), or with 'NODEFAULTLIB' (ignore all default libraries, in the linker settings) with visual studio, so that the build does not attempt to link the runtime or any build specific library into the generated executable. This makes the executable as independent as possible from the build environment, and explicitly dependent on your os specific implementation of the C library functions.
making port easier
Preprocessor directives can be used to automatically transform calls to the functions of the standard C library into calls to the os specific functions.
char * OS_API_CALL os_strcpy (char *dest, const char *src);
#define strcpy os_strcpy
This way, the preprocessor transforms any direct call to strcpy into a call to os_strcpy before the compiler starts to parse the file. All references to standard C functions can thus be translated to the os specific functions, which makes it easier to port applications that call the functions of the C library directly.
Using a preprocessor definition is not always reliable, because some libraries or other programs redefine the C functions as macros. The compiler should normally issue a warning when that happens; then either the redefinition in the library should be ignored, or those macros should be defined to use the os functions.
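One way to make the mapping robust against such earlier macro definitions is to remove them before defining your own. The sketch below includes string.h only to stand in for any header that might already define strcpy as a macro; `os_copy_name` is a hypothetical helper added to show the expansion.

```c
#include <string.h> /* stands in for any header that may define strcpy as a macro */

char *os_strcpy(char *dest, const char *src)
{
    char *d = dest;

    /* copy every byte up to and including the terminating NUL */
    while ((*d++ = *src++) != '\0') { }
    return dest;
}

/* Drop any earlier macro definition so that the definition below is
   accepted without a redefinition warning. Undefining a name that is
   not a macro is allowed and has no effect. */
#undef strcpy
#define strcpy os_strcpy

/* From this point on, strcpy(...) expands to os_strcpy(...). */
char *os_copy_name(char *out, const char *name)
{
    return strcpy(out, name);
}
```

The order matters: the `#undef` and `#define` pair has to come after every header that could define its own strcpy macro.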
handling import/export
Imports and exports are not primarily handled by the compiler. They are handled by the linker and depend on the ABI used, but compilers often have a way to tell the linker how the functions should be resolved at runtime.
When imported functions are used, the os is responsible for resolving the addresses of all imported symbols at runtime.
If the C library is to be compiled as a shared library, its functions need to be declared as exported when compiling the C library, and as imported in all the programs that will be linked with it.
os_def.h
#ifdef _MSC_VER
#define OS_API_CALL __cdecl
#define OS_SHARED __declspec(dllimport)
#define OS_EXPORT __declspec(dllexport)
#endif
#ifdef __GNUC__
#define OS_API_CALL __attribute__((cdecl))
/*
All symbols are exported by default in elf, and symbols that are not defined are just left unresolved at link time, so these are empty.
*/
#define OS_SHARED
#define OS_EXPORT
#endif
#ifndef OS_LIBC_FUNC
#define OS_LIBC_FUNC OS_SHARED
#endif
os_string.h
OS_LIBC_FUNC char * OS_API_CALL os_strcpy (char *dest, const char *src);
#define strcpy os_strcpy
os_string.c
#define OS_LIBC_FUNC OS_EXPORT
#include "os_def.h"
#include "os_string.h"
char * os_strcpy (char *dest, const char *src)
{
char *d = dest;
/* copy every byte up to and including the terminating NUL */
while ((*d++ = *src++) != '\0') { }
return dest;
}
main.c
The source of the application does not change, except that the standard C function name can now be used thanks to the preprocessor definition.
#include "os_def.h"
#include "os_string.h"
int main(int argc,char **argv)
{
char my_string[32];
(void)argc;
/* expands to os_strcpy; no bounds check, so argv[0] must fit in 32 bytes */
strcpy(my_string,argv[0]);
return 0;
}