Java For Starters

From OSDev Wiki

The factual accuracy of this article or section is disputed.
Please see the relevant discussion on the talk page.

Introduction

Java is easy to learn and is used in a large number of projects by millions of programmers, so it seems worthwhile to show a path into OS development for Java developers and for other people interested in the language. Before you start, familiarize yourself with the basic OS concepts described in this wiki.

The problem with OS development in Java

The main problem with writing an OS in a language that lacks flexible native tooling is exactly that lack of tools. In addition, a Java program needs a Java Runtime Environment (JRE) to run, but no JRE exists during OS startup, so we must somehow run Java code without one. Both tooling and runtime support therefore have to be provided. More information on using different languages in OS development is present here.

How do you travel without a car?

Well, it's simple: you can use your legs. It is not very convenient, though, and some kind of vehicle helps you pick up speed. That is why it is important to understand which tools we have.

First, it is possible to combine tools written in C or assembly with some source code written in Java syntax and another part of the code written in C or assembly. This is the way many people prefer, and here is the Pascal example, which shows this approach. Note, however, that only a small part of the job is done in Pascal there: without the tools written in other languages, and without the code in other languages (mostly assembly), there would be no Pascal OS.

Second, it is possible to write absolutely everything in the preferred language. Of course, this means somebody has to write the "everything". Because many languages have no native tools available, it is the OS developer who ends up writing it.

The third way is the most realistic one. An OS can be written using as many tools in the target language as are available, with the rest borrowed from other languages and environments. It sits somewhere between the first and the second ways.

Here are some tools available for Java OS developers.

First, of course, there is the standard tool set as described here. It supports the first way of doing things: a Java flavor on top of the same C and assembly based environment. That flavor can be compiled by, for example, the GNU Compiler for Java (GCJ). GCJ is written in C and supports only the standard C related conventions. If you prefer this way, note that this article is not intended to be of much help with it.

The second option (everything in Java) is described below, although it introduces some tools that are already written in Java. The tools are used here only as an example, and nothing forces you to use them. The rest of this article shows how to create a simple bootloader in Java. The bootloader code ends with a jump, and your whole potential Java OS can start once that jump is performed. So your OS in Java can start with this article and end up in some very interesting form, unseen until you have built it.

A bit of theory

An OS must be loaded before its code can be executed. Since we are talking about Java, it is a good idea to understand first how Java can load itself (or an OS written in Java). There are a number of ways.

  • First, you can use tools like GRUB and forget about loading the OS yourself.
  • Second, you can write a bootloader in a language other than Java, such as assembly.
  • And third, you can write a bootloader in Java.

The third option is described here. When the bootloader is written in Java, there is a subtle question about the syntax involved, and somebody may say "it's not Java". Well, the answer is both yes and no, and now you have the opportunity to decide for yourself.

First we have to look deeper into how software is executed. For software to run, there must be a processor that understands its instructions. The code you write in most languages is not directly understandable by the processor, so there is a translation layer between the source code and the executable software. That translation has taken many forms. First came assembly: people wrote short textual commands, and a program translated them into the form the processor understands. Writing software as short textual commands proved tedious and time consuming, so high level languages were born, and Java is one of them. The translation layer between Java and the code that actually executes hides a lot of details, and for a reason: it lets a developer concentrate on the goal instead of the gory details. The price of reaching the goal faster is, unfortunately, some of the flexibility that low level commands offer. Eventually people managed to mix the high level with the low level in a consistent manner, and the result was called "inline assembly". It is not ideal, and there is still room for better compilers that understand such a mixture without requiring the programmer to jump between two languages. Unfortunately, such compilers are still far away for Java developers. The idea of inline assembly, however, works perfectly well even in Java.

Now we can look at how Java can inline assembly. First, of course, there has to be an assembler for Java. The one shown here is not the only possible implementation, but other variants are far too different from assembly syntax. Because assembly syntax is widely used in hardware documentation and software examples, it is a good idea for the inline version to stay as close to the generally accepted syntax as possible.

Since, as mentioned above, there may be voices saying "it's not Java!", we have to clear the subject up a bit. A program passes through intermediate steps before it is executed: the translation phase, which hides low level details and produces an executable file. If we forget about the translation, it can look as if we write a program and, after some magic is applied, it just runs. We can ignore the translation and be happy with the running program, but OS development is exactly the kind of work that requires going into the low level details. Once we do, it becomes obvious that there is no longer "the program", only some bytes produced by tools and executed by a processor. The actual execution details can be very different from what we expect when looking at the initial program. The following information (hopefully) shows how things work between the layer of programming language abstraction and the hardware layer. This is not a "Java or not Java" debate; it is just a way of doing things in Java. The same is true for every C or assembly program: there must be some tools for the translation phase to work, and those tools can be written in C, in assembly or (yes!) in Java.

How it works.

First, it is still a Java program, so Java syntax must be respected. Java syntax allows a lot of flexibility, however, and that flexibility is used as follows: every assembly instruction is defined in a Java class, and an instance of that class is held in a field of the AssemblerProgram class whose name matches the assembly mnemonic. As a result, in a successor of AssemblerProgram you can write a standard Java method and use assembly instructions in it.

This is not yet the actual "inline" assembly, though. What the inline assembly project does is translate the assembly and connect the translated code with Java code. Here we see only the translation phase, or more precisely the step that collects assembly instructions so that they can later be translated into binary form (with the help of the getProgram() method). The way the translated assembly instructions are connected with Java code is outlined at the end of the article.

The collection of assembly instructions can look like this:

mov.x32(EAX,EDX);
  • Here "mov" is the name of a field of the class.
  • "x32" is a Java method, but it also shows the instruction's operand size (32 bits here).
  • "EAX" and "EDX" are also fields of the ancestor class and represent registers well known from assembly syntax.

The line above describes a 32 bit move instruction with its operands in the EAX (destination) and EDX (source) registers. Hopefully any developer with assembly knowledge can understand this code. Java adds type safety on top: an IDE such as Eclipse can highlight errors like a wrong register name for the selected operand size right as you edit the code.
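
To make the "instruction as a field" idea more concrete, here is a minimal sketch of how such a class could be built. It is not the real AssemblerProgram: the names TinyAssembler, Reg32 and Mov are hypothetical, only register-to-register moves are handled, and 32-bit code is assumed so that no operand-size prefix is needed.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical miniature of the "instruction as a field" idea; not the real AssemblerProgram. */
class TinyAssembler {
    /** A few 32-bit general purpose registers with their x86 encoding numbers. */
    enum Reg32 {
        EAX(0), ECX(1), EDX(2), EBX(3);
        final int code;
        Reg32(int code) { this.code = code; }
    }

    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    /** The field name plays the role of the assembly mnemonic. */
    final Mov mov = new Mov();

    final class Mov {
        /** Emits "mov dst, src" for two 32-bit registers (opcode 0x89 /r). */
        void x32(Reg32 dst, Reg32 src) {
            out.write(0x89);
            // ModRM byte: mod=11 (register direct), reg=source, rm=destination
            out.write(0xC0 | (src.code << 3) | dst.code);
        }
    }

    /** Returns the machine code collected so far. */
    byte[] getProgram() {
        return out.toByteArray();
    }
}
```

Calling mov.x32(Reg32.EAX, Reg32.EDX) on an instance collects the two bytes 0x89 0xD0. Because x32 accepts only Reg32 values, passing a register of another size would be rejected by the compiler, which is the kind of type safety described above.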

Next we need to understand how the instruction above can be executed. That requires the translation phase, but before translating it is a good idea to have an actual program. The program is just a sequence of lines in the form described above. It is a good idea to separate code fragments with different purposes into separate methods, like this:

protected void writeBootCodeInitialization(short stackAddress, short bootLoaderAddress)
{
	eflags.cld(); // eflags here is the common place for a number of flag manipulation related instructions
	mov.x16(AX,(short)((bootLoaderAddress&0xffff)>>>4)); // mov WORD AX, immediate; the mask avoids sign extension of the signed short
	mov.x16(DS,AX); // ordinary 16 bit mov instruction
	xor.x16(AX,AX);
	mov.x16(ES,AX);
	mov.x16(SS,AX);
	mov.x16(SP,stackAddress);
}
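
The shift by four in the method above comes from real-mode addressing, where a linear address equals segment * 16 + offset. The following is a small sketch of that arithmetic; the class and method names are hypothetical. It also shows why a Java short should be masked before shifting: short is signed, so addresses at or above 0x8000 would otherwise be sign-extended.

```java
/** Hypothetical helper for real-mode address arithmetic: linear = segment * 16 + offset. */
class RealModeAddress {
    /** Segment value for a 16-byte aligned linear address below 1 MiB. */
    static int segmentOf(int linearAddress) {
        return (linearAddress >>> 4) & 0xFFFF;
    }

    /** Same computation for an address stored in a (signed) Java short. */
    static int segmentOfShort(short address) {
        // Mask first, otherwise a negative short is sign-extended to 0xFFFFxxxx.
        return (address & 0xFFFF) >>> 4;
    }

    /** Linear address referenced by a segment:offset pair. */
    static int linear(int segment, int offset) {
        return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF);
    }
}
```

For example, the conventional boot sector load address 0x7C00 can be expressed either as 0x0000:0x7C00 or as 0x07C0:0x0000.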

And the class, where code fragments are supposed to be, can look like this:

public class AnAssemblyWriter extends AssemblerProgram
{
	public AnAssemblyWriter(Mode mode, DebugStream debugStream)
	{ super(mode, debugStream); }
	
	... // your code fragment methods
}

Here we see the class's constructor with two parameters. The mode parameter can be one of the following:

 	CODE16 // Generate 16-bit code
 	CODE32 // Generate 32-bit code
 	CODE64 // Generate 64-bit code

The debugStream is just a wrapper around an arbitrary OutputStream. It can be used to print the resulting assembly program in textual form; the actual printing happens during AssemblerProgram's getProgram() call.
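
One thing a code-generation mode has to influence on x86 is the operand-size prefix: 16-bit code defaults to 16-bit operands, 32-bit and 64-bit code default to 32-bit operands, and the 0x66 prefix selects the other size. The sketch below illustrates that rule only. It is an assumption about what such a mode could control, not a description of how AssemblerProgram is implemented, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: how a code-generation mode could decide on the 0x66 operand-size prefix. */
class OperandSizePrefix {
    enum Mode { CODE16, CODE32, CODE64 }

    /** True when a 16 or 32 bit operand differs from the mode's default operand size. */
    static boolean needsPrefix(Mode mode, int operandBits) {
        if (operandBits != 16 && operandBits != 32) {
            throw new IllegalArgumentException("only 16 and 32 bit operands are sketched here");
        }
        // 16-bit code defaults to 16-bit operands; 32-bit and 64-bit code default to 32-bit operands.
        int defaultBits = (mode == Mode.CODE16) ? 16 : 32;
        return operandBits != defaultBits;
    }

    /** Encodes a register-to-register mov (AX from DX, or EAX from EDX) for the given mode. */
    static byte[] movAccFromDx(Mode mode, int operandBits) {
        byte opcode = (byte) 0x89; // MOV r/m, r
        byte modrm = (byte) 0xD0;  // mod=11, reg=DX/EDX, rm=AX/EAX
        return needsPrefix(mode, operandBits)
                ? new byte[] { 0x66, opcode, modrm }
                : new byte[] { opcode, modrm };
    }
}
```

With this rule, a 32-bit move inside a 16-bit bootloader costs one extra prefix byte, while the same move in 32-bit code does not.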

Next you can add code that lets the bootloader actually load something. You can write a method like this:

protected void loadImageUsingInt13x42(String problemLabel, String diskAddressPacketLabel, int imageSizeInDiskSectors)
{
	String startTransfer="startTransfer", done="doneInt13x42";
	if (imageSizeInDiskSectors>BootstrapConstants.int13x42NumberOfBlocksLimit)
	{
		mov.x16(AX,(short)(imageSizeInDiskSectors-BootstrapConstants.int13x42NumberOfBlocksLimit));
		mov.x16(SI,diskAddressPacketLabel);
		label(startTransfer);
		push.x16(AX);
	}
	else mov.x16(SI,diskAddressPacketLabel);
	mov.x8(AH,(byte)0x42);
	mov.x8(DL,(byte)0x80);
	Int.x((byte)0x13);
	jcc.jc(problemLabel);
	if (imageSizeInDiskSectors>BootstrapConstants.int13x42NumberOfBlocksLimit)
	{
		mov.x16(SI,diskAddressPacketLabel);
		add.x16(i(SI,6),(short)(BootstrapConstants.int13x42NumberOfBlocksLimit<<5));
		add.x32(i(SI,8),BootstrapConstants.int13x42NumberOfBlocksLimit);
		pop.x16(AX);
		sub.x16(AX,(short)BootstrapConstants.int13x42NumberOfBlocksLimit);
		jcc.jae(startTransfer);
		add.x16(AX,(short)BootstrapConstants.int13x42NumberOfBlocksLimit);
		jcc.jz(done);
		mov.x16(i(SI,2),AX);
		xor.x16(AX,AX);
		jmp.near(startTransfer);
		label(done);
	}
}
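
The loop that this method emits can be hard to follow in assembly form. The sketch below expresses the same chunking in plain Java: the first call transfers as many blocks as the limit allows, further full-size calls follow while enough blocks remain, and a final call transfers the remainder. The class name is hypothetical and the limit is passed as a parameter, because the actual value of BootstrapConstants.int13x42NumberOfBlocksLimit is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of the transfer loop generated by loadImageUsingInt13x42. */
class Int13ChunkPlan {
    /** Returns the block count of each successive INT 13h, AH=42h call. */
    static List<Integer> chunks(int imageSizeInSectors, int blocksLimit) {
        List<Integer> plan = new ArrayList<>();
        if (imageSizeInSectors <= blocksLimit) {
            plan.add(imageSizeInSectors); // a single call is enough
            return plan;
        }
        int remaining = imageSizeInSectors - blocksLimit; // the value the generated code keeps in AX
        plan.add(blocksLimit);                            // first call, packet count equals the limit
        while (remaining >= blocksLimit) {                // corresponds to sub + jae startTransfer
            plan.add(blocksLimit);
            remaining -= blocksLimit;
        }
        if (remaining > 0) {                              // corresponds to the final partial transfer
            plan.add(remaining);
        }
        return plan;
    }
}
```

With an assumed limit of 127 blocks, an image of 300 sectors would be read in three calls of 127, 127 and 46 blocks.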

The final program (its main method) can look like this:

private void createBootCode(int imageStartAbsoluteBlockNumber, int imageSizeInDiskSectors,
		short memoryAddressToPlaceBootstrapImageAt, short bootLoaderNewAddress,
		short bootLoaderInitialAddress, short bootLoaderSize)
{
	// define labels in one place to avoid copying the same text in many places
	String diskReadProblemLabel="diskProblem", diskAddressPacketLabel="diskAddressPacket";

	// call a few methods to include separate code fragments in the resulting program
	writeBootCodeInitialization(memoryAddressToPlaceBootstrapImageAt,bootLoaderNewAddress);
	loadImageUsingInt13x42(diskReadProblemLabel, diskAddressPacketLabel, imageSizeInDiskSectors);

	mov.x16(AX,memoryAddressToPlaceBootstrapImageAt); // address to jump to
	// this line is for demonstration only: the actual address must include the segment, and developers should keep that in mind
	jmp.near16(AX); // jump to the offset in AX and start loaded image code

	writeProceduresAndData(diskReadProblemLabel,diskAddressPacketLabel,
			imageSizeInDiskSectors,imageStartAbsoluteBlockNumber,memoryAddressToPlaceBootstrapImageAt);
}

And here you can see how helper procedures and data structures are written together with the actual bootloader's code.

private void writeProceduresAndData(String diskReadProblemLabel, String diskAddressPacketLabel,
		int imageSizeInDiskSectors, int imageStartAbsoluteBlockNumber, short memoryAddressToPlaceBootstrapImageAt)
{
	String printMsgProcLabel="printMsgLabel", diskErrorMsgLabel="diskErrorMsg";

	label(diskReadProblemLabel); // assembly label analog
	// it's inline because it's not supposed to be called, only jumps here are expected
	inline_printMessage_haltIfSecondZero(diskErrorMsgLabel,printMsgProcLabel);

	writePrintMessageProcedure(printMsgProcLabel); // it is supposed to ret in the end

	label(diskErrorMsgLabel);
	// the zeros at the end denote the end of string and the halt after shown flag
	writeString("Disk error\0\0"); // just write the string as a sequence of bytes

	label(diskAddressPacketLabel); // the INT 13h, AH=42h disk address packet follows here
	int imageSize=(imageSizeInDiskSectors>int13x42NumberOfBlocksLimit?int13x42NumberOfBlocksLimit:imageSizeInDiskSectors);
	byte bs[]=new byte[16]; // data structure buffer
	int memoryAddressToPlaceImageAt_offset=memoryAddressToPlaceBootstrapImageAt&0xf;
	int memoryAddressToPlaceImageAt_segment=(memoryAddressToPlaceBootstrapImageAt&0xffff)>>>4; // mask first: short is signed in Java
	@SuppressWarnings("resource") // this stream requires no close() call
	LittleEndianOutputStream leos=new LittleEndianOutputStream(bs); // it can write bytes to the bs array
	leos.w8(0x10); // write byte
	leos.w8(0);
	leos.w16(imageSize); // write short (16 bit)
	leos.w16(memoryAddressToPlaceImageAt_offset);
	leos.w16(memoryAddressToPlaceImageAt_segment);
	leos.w32(imageStartAbsoluteBlockNumber); // write int (32 bit)
	leos.w32(0);
	writeBytes(bs); // now add the bs buffer to the actual program
}
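
For readers without the LittleEndianOutputStream class at hand, the same 16-byte disk address packet can be sketched with the standard java.nio.ByteBuffer. The class and method names below are hypothetical; the field layout follows the order written above (size, reserved byte, block count, buffer offset, buffer segment, 64-bit starting block number).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical builder for the 16-byte INT 13h, AH=42h disk address packet. */
class DiskAddressPacket {
    static byte[] build(int blockCount, int bufferOffset, int bufferSegment, long startBlock) {
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 0x10);               // byte 0: packet size in bytes
        buf.put((byte) 0);                  // byte 1: reserved, must be zero
        buf.putShort((short) blockCount);   // bytes 2-3: number of blocks to transfer
        buf.putShort((short) bufferOffset); // bytes 4-5: offset part of the target buffer
        buf.putShort((short) bufferSegment);// bytes 6-7: segment part of the target buffer
        buf.putLong(startBlock);            // bytes 8-15: starting absolute block number
        return buf.array();
    }
}
```

The byte positions match the offsets the loader code patches at run time: the block count at SI+2, the buffer segment at SI+6 and the starting block number at SI+8.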

Once all the inline assembly instructions are in place, first call the methods containing the instructions so that they write their binary representation, and then call AssemblerProgram's getProgram() method to get the machine code in the returned byte array. The resulting bytes can be written to the boot sector using appropriate tools.
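
As a sketch of that last step, the byte array can be padded to a full 512-byte sector ending with the 0x55 0xAA signature that a PC BIOS expects in a boot sector. The class name is hypothetical, and the sketch ignores any partition table, which would leave less room for code.

```java
import java.util.Arrays;

/** Hypothetical helper that turns generated machine code into a 512-byte boot sector image. */
class BootSectorImage {
    static final int SECTOR_SIZE = 512;

    static byte[] toBootSector(byte[] code) {
        if (code.length > SECTOR_SIZE - 2) {
            throw new IllegalArgumentException("boot code does not fit: " + code.length + " bytes");
        }
        byte[] sector = Arrays.copyOf(code, SECTOR_SIZE); // copies the code and pads with zeros
        sector[SECTOR_SIZE - 2] = 0x55;                   // boot signature, first byte
        sector[SECTOR_SIZE - 1] = (byte) 0xAA;            // boot signature, second byte
        return sector;
    }
}
```

The returned array could then be saved to a disk image file, for example with java.nio.file.Files.write, and tried in an emulator before touching a real disk.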

Of course, this example should not be considered a good bootloader. It is just a very simple bootloader, written in Java, that is able to load a small image of 16-bit code into memory and jump to it (do not forget the offset register values).

What's next.

For the code above to work, some code has to be downloaded. The general direction of the next step, though, is towards defining the OS logic as a more familiar Java program. Inline assembly should be used only when there is no other way to define the required logic. To get there we need the translation phase. It is possible to compile Java code with the GCC Java compiler (GCJ), but that leads into the territory of C programmers, with its code formats, calling conventions and a lot of compiler settings. The second option is to invent your own compiler. This is not so hard, and in its simplest form it can be done by translating every Java bytecode into assembly form. An example of such a simple compiler is included in the download mentioned above. Besides the translation, the compiler allows you to inline assembly code and invoke it from a pure Java program. It is also worth noting that the AssemblerProgram class is able to produce 32 and 64 bit code; the package org.jembryos.boot.image._native.x86 and the BootRecordCode class (part of the jEmbryoBootstrap project) contain examples that can help you understand the tool better.
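
To give a feel for what "translating every Java bytecode into assembly form" can mean, here is a very small sketch. It is not the compiler from the download: the class name is hypothetical, only four JVM opcodes are handled (iconst_0 to iconst_5, bipush, iadd, ireturn), the JVM operand stack is mapped directly onto the x86 stack, and ireturn is reduced to "pop eax; ret" without any frame handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-to-one translator from a few JVM opcodes to 32-bit x86 assembly text. */
class NaiveBytecodeTranslator {
    static List<String> translate(byte[] code) {
        List<String> asm = new ArrayList<>();
        for (int pc = 0; pc < code.length; pc++) {
            int op = code[pc] & 0xFF;
            if (op >= 0x03 && op <= 0x08) {        // iconst_0 .. iconst_5
                asm.add("push " + (op - 0x03));
            } else if (op == 0x10) {               // bipush <signed byte operand>
                asm.add("push " + code[++pc]);
            } else if (op == 0x60) {               // iadd: pop two values, push their sum
                asm.add("pop ebx");
                asm.add("pop eax");
                asm.add("add eax, ebx");
                asm.add("push eax");
            } else if (op == 0xAC) {               // ireturn: result goes to EAX
                asm.add("pop eax");
                asm.add("ret");
            } else {
                throw new UnsupportedOperationException("opcode 0x" + Integer.toHexString(op));
            }
        }
        return asm;
    }
}
```

Feeding it the bytecode of "return 1 + 40" (iconst_1, bipush 40, iadd, ireturn) yields a short push/pop sequence. The result is far from efficient, but it shows why such a translator is not hard to write.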