Plain English Programming

From OSDev Wiki

This page relates to Plain English Programming as developed by the Osmosian Order of Plain English Programmers. Notes primarily pertain to the CAL-4700 standalone IDE and compiler. There is at least one other version, Folds/english on Github, 'an authorized "dynamic fork"'. (The major differences are in UI, with Folds/english offering the standard Win32 UI elements.) Language differences, if any, should be noted below.

Español Llano is a Spanish-language equivalent.

The language is very much like pseudocode. It allows a small degree of arbitrary word choice, but the code understood by the compiler must precisely describe the program's operation. It's strongly typed and has a tiny runtime of less than 2KB. It natively targets only 32-bit Windows, but is so simple that retargeting is likely no harder than any other aspect of OS development.

Far from "Hello, world," the first thing a user learns to do with CAL-4700 is recompile CAL itself. Then, the user is walked through making an app which takes a string, downloads an image from Google Image Search, and renders it in the style of famous painter Monet. It's a rather broad introduction to the language's features.

Note that a sense of humor will be an advantage when reading CAL's instructions.

Language

The language and its documentation are almost entirely free of jargon and symbols other than a subset of English punctuation. Only the period, comma, colon, and semicolon are commonly used. The minus sign functions as a unary minus for negating variables.

The Order's blog makes a case for the language being no more verbose than C or C++. This works if you're a good typist, finding whole words no more trouble to type than symbols. (Having a lot of practice in typing English, I (eekee) find it much less trouble to type than other languages. I find it hard to omit the spaces in both CamelCase and underscore-joined words, and I dislike symbols as I find using the shift key to be uncomfortable. CAL plain English only requires me to use 4 shifted symbols: the colon, double quotes, and the two parentheses.)

"Not" and the suffix "n't" are recognized and understood as you'd expect.

Some words are ignored. Development of the language started with the realization that human infants ignore a lot of words when they're starting to learn to understand language.

Specifying parameters is more verbose than in human English. Example: a number and another number and a third number and a fourth number. It could be shortened, but typically, you set up a structure and pass that to a routine. For instance, you set up a box and then "Draw the box with the black color." Routines with more than 2 parameters are rare in CAL-4700.

Wording tends to differ from English in some other ways too. For example, "Draw the box with the black color", rather than "Draw the box in black." FIXME: check if "in/into/to" is interpreted differently from "with". If not, the latter example is possible.

Definition order doesn't matter. (The Order likes to sort definitions by name and rely on incremental search to find them.)

Types

It has a very strict type system, surprisingly strict for a language intended to be plain English. "A buffer is a string", the compiler is told, and you can "Read the file into a buffer", but you can't "Read the file into a string." (The instructions are wrong on this point. eekee intends to raise the issue with the Order.) This example could be worked around with a record containing "a buffer and a string at the buffer," similar to the example of a box below.

Numbers are 32-bit signed integers. Unsigned numbers are not well-supported, but appear in ratios.

Floating-point support is not included for philosophical reasons. (Something to do with the philosopher Kronecker.)

Ratios are pairs of numbers, dividend/divisor.
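
To show how far a dividend/divisor pair can go without floating point, here is a minimal Python sketch of integer-only ratio arithmetic. It is an illustration only, not CAL's implementation: the function names are made up for this example, and reducing to lowest terms after every operation is an assumption.

```python
from math import gcd

def reduce_ratio(dividend, divisor):
    """Reduce a (dividend, divisor) pair to lowest terms with a positive divisor."""
    if divisor == 0:
        raise ZeroDivisionError("ratio with zero divisor")
    if divisor < 0:
        # Keep the sign in the dividend so equal ratios compare equal.
        dividend, divisor = -dividend, -divisor
    common = gcd(dividend, divisor)
    return dividend // common, divisor // common

def add_ratios(a, b):
    """a/b + c/d = (a*d + c*b) / (b*d), using integers only."""
    return reduce_ratio(a[0] * b[1] + b[0] * a[1], a[1] * b[1])

def multiply_ratios(a, b):
    """(a/b) * (c/d) = (a*c) / (b*d), using integers only."""
    return reduce_ratio(a[0] * b[0], a[1] * b[1])
```

Python integers never overflow, so a port to 32-bit numbers would also have to watch the intermediate products.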

Pointers are supported, with arithmetic permitted.

Strings are referenced with start and end pointers. Derived types include substrings and buffers. Riders are a more complex type which works on these. Memory allocation is fully automatic.

Records do the work of structs and unions. For example:

A box has
  a left coord, a top coord, a right coord, a bottom coord,
  a left-top spot at the left, and a right-bottom spot at the right.

Note the union of a 4-number structure with 2 "spots" -- pairs of numbers. Not shown: Each spot has an x and a y.
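
For readers more used to C-style unions, the overlay can be approximated with Python's ctypes module. This is a sketch under assumptions: the class names are invented, coords are taken to be 32-bit numbers, and each spot is assumed to store its x before its y so that x aliases left and y aliases top.

```python
import ctypes

class Spot(ctypes.Structure):
    # Assumed layout: x first, then y.
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

class Coords(ctypes.Structure):
    _fields_ = [("left", ctypes.c_int32), ("top", ctypes.c_int32),
                ("right", ctypes.c_int32), ("bottom", ctypes.c_int32)]

class Spots(ctypes.Structure):
    _fields_ = [("left_top", Spot), ("right_bottom", Spot)]

class Box(ctypes.Union):
    # Both views share the same 16 bytes; anonymous members let us
    # write box.left or box.left_top directly.
    _anonymous_ = ("coords", "spots")
    _fields_ = [("coords", Coords), ("spots", Spots)]

box = Box()
box.left, box.top, box.right, box.bottom = 10, 20, 110, 220
```

After these assignments, box.left_top and box.right_bottom read back the same four numbers as two spots, with no copying.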

Things are records with implicit 'previous' and 'next' pointers for use in doubly-linked lists. These lists may be iterated over with very little code. You write, "get a <type> from the list", and later "Loop."
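
As a rough picture of what the implicit pointers buy you, here is a small Python sketch of a doubly-linked list of "things". The class and method names are hypothetical and do not correspond to noodle routines. The generator plays the part of "get a <type> from the list" followed by "Loop."

```python
class Thing:
    """A record with the two implicit link fields."""
    def __init__(self, name):
        self.name = name
        self.previous = None
        self.next = None

class ThingList:
    def __init__(self):
        self.first = None
        self.last = None

    def append(self, thing):
        thing.previous = self.last
        thing.next = None
        if self.last is None:
            self.first = thing
        else:
            self.last.next = thing
        self.last = thing

    def remove(self, thing):
        # Unlink in constant time; no search is needed because the
        # thing carries its own previous and next pointers.
        if thing.previous is None:
            self.first = thing.next
        else:
            thing.previous.next = thing.next
        if thing.next is None:
            self.last = thing.previous
        else:
            thing.next.previous = thing.previous
        thing.previous = thing.next = None

    def __iter__(self):
        thing = self.first
        while thing is not None:
            yield thing
            thing = thing.next
```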

Many more types are built from these, especially for the GUI.

There is no syntax for general-purpose arrays as the language designers felt this would not be plain English. This did not prevent them from implementing strings as character arrays using pointers and pointer math. The many string routines are good examples, showing how simple this is, though they only cover single-byte pointer math. (Remember the size of numbers and pointers could change if you re-target the compiler.) Memory allocation is built on the routines "To assign a pointer given a byte count", "To reassign a pointer given a byte count", and "To unassign a pointer".

Routines vs. Functions

Like C functions, routines may take parameters and return values, but unlike C functions, they may not be part of an expression. You call one routine per line.

Functions may be part of expressions. Function calls may even look like references to structure elements. In that regard, they seem similar to the methods of a pure OO language, but the instructions recommend using them sparingly.

Loops and Conditionals

Nesting of loops or conditionals is not supported. Conditionals may appear within loops, but may not be nested within each other. This was chosen for clarity.

In place of nested loops, testing the remainder from a division is commonly used. Alternatively, the inner loop may be put into another definition.
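
The remainder trick is easier to see in code. The Python sketch below lays items out in rows using one flat loop; the function name and the row-building task are made up for illustration. Testing the remainder of the index tells the loop when a new row starts, which is the job an outer loop would otherwise do.

```python
def grid_lines(items, columns):
    """Split items into rows of `columns` entries using a single loop."""
    lines = []
    for index, item in enumerate(items):
        # A remainder of zero marks the first item of a new row.
        if index % columns == 0:
            lines.append([])
        lines[-1].append(item)
    return lines
```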

Conditionals generally call a routine for the true case. Multiple routines may be called, but good style is to fit the conditional onto one line, with just one or two calls. There is no "else" for the false case. Use the construct in the next paragraph.

Switches and "else" are handled with "if" and "exit":

If <decider 1>, <call 1>; exit.
If <decider 2>, <call 2>; exit.
If <decider 3>, <call 3>; exit.
<default case code>.

A decider is a routine which returns a boolean. These are built in a similar way to switches, but the return syntax is unique:

If <other decider 1>, say yes.
If <other decider 2>, say yes.
Say no.
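
If the pattern looks unfamiliar, it is the same thing as early returns in other languages. Here is a loose Python analogue, with invented names: the first function stands in for a decider ("say yes" / "say no"), and the second for a flat switch built from "if ... exit" lines.

```python
def is_vowel(letter):
    """Decider: each test that succeeds answers immediately."""
    if letter in ("a", "e", "i", "o", "u"):
        return True   # say yes
    return False      # say no

def classify(character):
    """Switch: one flat list of tests, each leaving the routine when it matches."""
    if character.isdigit():
        return "digit"        # <call 1>; exit
    if is_vowel(character.lower()):
        return "vowel"        # <call 2>; exit
    if character.isalpha():
        return "consonant"    # <call 3>; exit
    return "other"            # default case
```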

Cross-Compiling with CAL

The compiler, text editor, "finder", and a document editor are all integrated into a single binary. In its original form, compilation is triggered by the Run menu entry in the text editor (Ctrl-R). All the files without an extension in the current directory are combined and compiled. The resultant Windows executable is then run. To compile OS code, CAL must be modified. CAL's subsystems are, on the whole, simple and easy to modify, so this will likely be no harder than many other elements of OS development. However, there are quite a number of different areas to change.

The compiler relies only on its own internals and files in the single source directory. There is no include search path, so there is very little to go wrong. Of course, you may wish to add path search for your own project as the present arrangement makes it hard to maintain a consistent version of the library. Folds/english may have useful code for this; it searches subfolders if a certain folder name is present.

Memory allocation

You will need some sort of allocator to use many of the language's features. The routines are, "To assign a pointer given a byte count", "To reassign a pointer given a byte count", and "To unassign a pointer". These call Win32 functions which need to be replaced. Additionally, initialization calls Win32 function GetProcessHeap, assigning the returned value to the global variable, "heap pointer".
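
The simplest thing that could stand in for the Win32 heap early in a port is a bump allocator. The Python sketch below models one over a fixed arena to show the three operations side by side. Everything in it is an assumption made for illustration: the class name, the 4-byte alignment, and the choice never to reuse freed space. A real replacement would be written in Plain English or machine code and would eventually want a free list.

```python
ALIGNMENT = 4  # assumed: keep every block aligned for 32-bit numbers

class BumpAllocator:
    def __init__(self, size):
        self.memory = bytearray(size)  # stands in for the arena
        self.next_free = 0
        self.sizes = {}                # address -> byte count, for reassign

    def assign(self, byte_count):
        """Hand out the next aligned block, like 'assign a pointer'."""
        start = (self.next_free + ALIGNMENT - 1) & ~(ALIGNMENT - 1)
        if start + byte_count > len(self.memory):
            raise MemoryError("arena exhausted")
        self.next_free = start + byte_count
        self.sizes[start] = byte_count
        return start

    def reassign(self, address, byte_count):
        """Allocate a new block and copy the old contents across."""
        old_count = self.sizes[address]
        new_address = self.assign(byte_count)
        keep = min(old_count, byte_count)
        self.memory[new_address:new_address + keep] = \
            self.memory[address:address + keep]
        self.unassign(address)
        return new_address

    def unassign(self, address):
        """Forget the block; a bump allocator never reclaims the space."""
        del self.sizes[address]
```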

The assembly language issue

CAL doesn't include an assembler, nor does it work with an external assembler. Rather, hex strings of machine code are entered in-place, together with comments showing the assembly language. It looks like this:

To add a number to a pointer;
To add a number to another number:
Intel $8B8508000000. \ mov eax,[ebp+8] \ the number
Intel $8B00. \ mov eax,[eax]
Intel $8B9D0C000000. \ mov ebx,[ebp+12] \ the other number
Intel $0103. \ add [ebx],eax

The good news is that there are only 447 "Intel" lines totalling 1841 bytes of machine code. They're all in the standard library, called the noodle.

The authors of CAL hand-assemble this code, but there are alternatives. You could write a script to assemble a file with raw output format and hexdump the result: write each snippet to the file and run the script. Or, you could write an inline assembler. There's little need to cover the entire instruction set of your CPU; CAL-4700 only uses 36 different instructions with a small number of addressing modes. Using all the instructions is for full-complexity compiler projects, the sort which aren't likely to leave you with any time for your OS.
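
The script approach might look something like the Python sketch below. It is only a sketch: the function names are invented, and assemble() assumes NASM is installed and on the PATH (any assembler with a flat binary output mode would do). Note that an assembler may choose shorter encodings than the hand-assembled noodle code, so the bytes will not necessarily match the originals.

```python
import subprocess
import tempfile
from pathlib import Path

def intel_line(machine_code, assembly):
    """Format raw machine code as a noodle-style line with an assembly comment."""
    return "Intel $" + machine_code.hex().upper() + ". \\ " + assembly

def assemble(assembly):
    """Assemble one 32-bit snippet to raw bytes. Assumes nasm is on the PATH."""
    with tempfile.TemporaryDirectory() as folder:
        source = Path(folder) / "snippet.asm"
        output = Path(folder) / "snippet.bin"
        source.write_text("bits 32\n" + assembly + "\n")
        # -f bin asks NASM for flat binary output with no headers.
        subprocess.run(
            ["nasm", "-f", "bin", "-o", str(output), str(source)],
            check=True,
        )
        return output.read_bytes()

def snippet_to_intel_line(assembly):
    """Convenience wrapper: assemble, then format."""
    return intel_line(assemble(assembly), assembly)
```

For example, intel_line(bytes.fromhex("0103"), "add [ebx],eax") reproduces the last line of the routine shown above.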

Binary format

To target x86-32-ELF, such as for a bootloader, modify the compiler to output the required format.

ABI

To target ABIs which pass arguments in registers, the minimum necessary change is to alter only the Call statement. Internal calls will still use the stack for argument passing.

Alternatively, you could modify the compiler so all calls pass arguments in registers. In CAL-4700, the compiler only emits 2 instructions on its own, to push and pop eax. Some work may be needed around those.

CPU architecture

First, decide if you're changing the ABI. Calling convention is baked into machine code in the noodle.

If you're keeping the ABI, the compiler may only need 4 lines changed; 2 each to push and pop a register. You may also want to check over other binary code emitted by the compiler for endianness and arch-dependent ABI details.

Changing "intel" to another name can be done with a simple search-and-replace on the compiler, replacing 12 instances on 11 lines.

The remaining work is to rewrite the 447 machine-code instructions in the noodle.

Graphics

CAL-4700 includes a basic GUI and, almost separately, turtle graphics. How easy would it be to use them in your OS?

The GUI depends on quite a number of Windows calls, from font rendering to roundy boxes. Obviously, these would need to be replaced or redesigned.

The turtle graphics are much simpler, needing only a line drawing routine.

A 96-character font is implemented in the turtle graphics. It looks better at some PPI values than others, but may be useful for debug messages or as a very retro stylistic choice. (To find the code, start with the routine "to write a string" in the noodle. The routine "to draw a string" uses Windows font routines.)

Coordinate system

The coordinate system works entirely with integers, but is not tied to pixels. The basic unit is the "twip" which is 1/20th of a printer's point. Inches and other units are converted to twips, which in turn are multiplied by a number derived from the PPI (pixels per inch) setting to produce pixel coordinates. Rotation may be specified in degrees or fractions of a circle.
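
A rough Python sketch of the unit conversions, using integers only. The names are hypothetical, and two things are assumptions: that a point is 1/72 inch (giving 1440 twips per inch), and that the PPI scaling is a plain multiply-then-divide. The noodle may derive its scale factor differently.

```python
TWIPS_PER_POINT = 20
POINTS_PER_INCH = 72                                  # assumed point size
TWIPS_PER_INCH = TWIPS_PER_POINT * POINTS_PER_INCH    # 1440

def inches_to_twips(inches):
    return inches * TWIPS_PER_INCH

def points_to_twips(points):
    return points * TWIPS_PER_POINT

def twips_to_pixels(twips, ppi):
    # Multiply before dividing so no precision is lost to integer division.
    # Python's // rounds toward negative infinity for negative coordinates.
    return twips * ppi // TWIPS_PER_INCH
```

With these assumptions, 2 inches at 96 PPI comes out as 192 pixels, and a 12-point measure as 16 pixels.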

Porting CAL

Porting CAL itself to your OS might seem like a quick way to get a file manager and text editor, but its drawing code relies on some relatively complex operations. At minimum, you'd have to provide routines to draw fonts and lines clipped to boxes, and convert the "roundy boxes" to square. It doesn't help that CAL's native vector font has entirely different interface routines from TrueType fonts.

Porting The Noodle

An odd feature of the compiler makes this relatively easy, initially. The compiler ignores routines which aren't called. Thus, the noodle's many Win32 calls can be replaced piecemeal. For example, to postpone dealing with sockets, comment out the lines to initialize and finalize winsock in the start up and shut down routines. On the other hand, this feature makes cleaning up harder.
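
One way to picture the effect is as reachability over a call graph: only routines reachable from the entry point end up in the binary. The Python sketch below illustrates the idea with a made-up call graph; it does not show how CAL's compiler actually decides what to drop, and every routine name in it is hypothetical.

```python
def reachable_routines(calls, entry):
    """Return the set of routines reachable from `entry`.

    `calls` maps each routine name to the names it calls.
    """
    seen = set()
    pending = [entry]
    while pending:
        routine = pending.pop()
        if routine in seen:
            continue
        seen.add(routine)
        pending.extend(calls.get(routine, ()))
    return seen

# Hypothetical call graph: with the winsock call commented out of
# "start up", nothing reaches the socket routines any more.
calls = {
    "start up": ["initialize the heap"],
    "initialize winsock": ["call wsastartup"],
    "main": ["start up", "draw a line"],
}
kept = reachable_routines(calls, "main")
```

Here kept contains "main", "start up", "initialize the heap" and "draw a line", but neither of the winsock routines, so their Win32 imports would not need replacing yet.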

CAL looks like an OS...

...with its own UI design and standards, but it's not. :) GUI code is merged, with several routines in the desktop file having code specific to the individual apps. This works well for CAL as a program, but it's an impossible framework for an OS unless you want the most inconvenient kind of Megalithic Kernel.

License

CAL-4700 is not under an open-source license. Each source file begins with a single-line copyright giving only years, "the osmosian order" and a version number. This unfortunately includes all the code which might be called the standard library, and the GUI code which you may (or may not) wish to use. It would be best to contact the Osmosian Order, tell them what you want to do (the programs and systems you want to create, and the license you would like to release them under), and ask how their copyright will affect it.

Operating systems are amongst the things the Osmosian Order would like to see written in Plain English. This is stated in the Osmosian Manifesto. (PDF) The manifesto also gives an email address for contact.

Links

  • The Osmosian Order's blog — Introduces the language, presents examples and answers many questions.
  • Download - search the blog for "http" to get the latest link.
  • osmosian.com has very little content, just a semi-humorous slideshow and a link to the manifesto. (This server also hosts downloads.)
  • Folds/english on Github — An older version with an open-source licence. Its GUI is more deeply integrated with Windows.
  • Instructions (PDF) for a deeper look at the language. It's a little out of date. The up to date instructions are included in the CAL download.
  • The Osmosian Manifesto (PDF) explains what the Order would like to see (including operating systems) and gives a contact email.

People and OSs using PEP

  • eekee says, "I'm very happy to find a language which circumvents my problems with jargon and symbols. I can just read CAL's code without getting confused. I'm also happy with its simplicity and extensibility."