ISO 9660

From OSDev Wiki

ISO 9660 is the standard file system for CD-ROMs. It is also widely used on DVD and BD media and may also be present on USB sticks or hard disks. Its specification is freely available under the name ECMA-119.

Overview and caveats

ISO 9660 is not a complex file system, but it has a few quirks that are worth remembering. Some operating systems create non-compliant CDs, so beware! The main example of this is the character set that is available for file names. Strictly, filenames may only consist of uppercase letters A-Z, digits, dots, and underscores, plus a semicolon which separates the visible file name from its version number suffix. Many operating systems also allow lowercase letters and other characters. Linux, for example, displays lowercase filenames to the user even though the names recorded on the CD are uppercase.

Sector size

An ISO 9660 sector is normally 2 KiB long. Although the specification allows for alternative sector sizes, you will rarely find anything other than 2 KiB.

Numerical formats

Another quirk of the system is that it has several numbering formats, and multi-byte numbers are often recorded in both-endian format. The ISO 9660 standard specifies three ways to encode 16-bit and 32-bit integers: little-endian (least-significant byte first), big-endian (most-significant byte first), or a combination of both (little-endian followed by big-endian). Both-endian (LSB-MSB) fields are therefore twice as wide. For this reason, 32-bit LBAs often appear as 8-byte fields. Where a both-endian field is present, code running on a little-endian architecture such as x86 can read the first (little-endian) copy directly and ignore the big-endian copy.

Encoding Description
int8 Unsigned 8-bit integer.
sint8 Signed 8-bit integer.
int16_LSB Little-endian encoded unsigned 16-bit integer.
int16_MSB Big-endian encoded unsigned 16-bit integer.
int16_LSB-MSB Little-endian followed by big-endian encoded unsigned 16-bit integer.
sint16_LSB Little-endian encoded signed 16-bit integer.
sint16_MSB Big-endian encoded signed 16-bit integer.
sint16_LSB-MSB Little-endian followed by big-endian encoded signed 16-bit integer.
int32_LSB Little-endian encoded unsigned 32-bit integer.
int32_MSB Big-endian encoded unsigned 32-bit integer.
int32_LSB-MSB Little-endian followed by big-endian encoded unsigned 32-bit integer.
sint32_LSB Little-endian encoded signed 32-bit integer.
sint32_MSB Big-endian encoded signed 32-bit integer.
sint32_LSB-MSB Little-endian followed by big-endian encoded signed 32-bit integer.
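
The both-endian layout can be illustrated with a short Python sketch. The function names `read_u16_both` and `read_u32_both` are made up for this example; comparing the two copies is an optional sanity check that a reader may choose to skip, not something the format requires.

```python
import struct

def read_u16_both(buf, offset=0):
    """Decode an int16_LSB-MSB field (4 bytes: little-endian copy, then big-endian copy)."""
    little = struct.unpack_from("<H", buf, offset)[0]
    big = struct.unpack_from(">H", buf, offset + 2)[0]
    if little != big:
        raise ValueError("both-endian copies disagree")
    return little

def read_u32_both(buf, offset=0):
    """Decode an int32_LSB-MSB field (8 bytes: little-endian copy, then big-endian copy)."""
    little = struct.unpack_from("<I", buf, offset)[0]
    big = struct.unpack_from(">I", buf, offset + 4)[0]
    if little != big:
        raise ValueError("both-endian copies disagree")
    return little
```

For example, the 8 bytes `00 08 00 00 00 00 08 00` hold the value 2048 twice, first LSB-first and then MSB-first.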


Date/time format

The date/time format used in the Primary Volume Descriptor is denoted as dec-datetime and uses ASCII digits to represent the main parts of the date/time:

Offset Size Datatype Description
0 4 strD Year from 1 to 9999.
4 2 strD Month from 1 to 12.
6 2 strD Day from 1 to 31.
8 2 strD Hour from 0 to 23.
10 2 strD Minute from 0 to 59.
12 2 strD Second from 0 to 59.
14 2 strD Hundredths of a second from 0 to 99.
16 1 sint8 Time zone offset from GMT in 15-minute intervals, recorded as a signed (two's-complement) value from -48 (west) to +52 (east). A value of -48 therefore means GMT-12 hours, 0 means GMT, and +52 means GMT+13 hours.

All fields except for the offset from GMT are in ASCII digits. When the date and time is not specified, all string fields are ASCII '0' (for a total of 16 ASCII zeroes) and the last field is binary zero.
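
A minimal Python sketch of decoding this 17-byte field follows. The name `parse_dec_datetime` is hypothetical, and the sketch assumes the final byte is a signed count of 15-minute intervals, which is how ECMA-119 describes 8-bit signed values; it returns `None` for the "not specified" case.

```python
from datetime import datetime, timedelta, timezone

def parse_dec_datetime(field):
    """Parse a 17-byte dec-datetime field; return an aware datetime, or None if unset."""
    if len(field) != 17:
        raise ValueError("dec-datetime fields are exactly 17 bytes long")
    digits = field[:16].decode("ascii")
    if digits == "0" * 16:
        return None  # date and time not specified
    # Assumption: the last byte is a signed number of 15-minute intervals from GMT.
    quarter_hours = int.from_bytes(field[16:17], "little", signed=True)
    tz = timezone(timedelta(minutes=15 * quarter_hours))
    return datetime(
        int(digits[0:4]), int(digits[4:6]), int(digits[6:8]),
        int(digits[8:10]), int(digits[10:12]), int(digits[12:14]),
        int(digits[14:16]) * 10000,  # hundredths of a second -> microseconds
        tzinfo=tz,
    )
```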

String format

Character strings are encoded with ASCII encoding. The specification does not permit all characters. It defines two sets of characters: 'a-characters' and 'd-characters'. You will see these terms used in the descriptor tables throughout this article. The character sets are:

a-characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ 
              ! " % & ' ( ) * + , - . / : ; < = > ?
d-characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _

Encoding Description
strA String with only ASCII a-characters, padded to the right with spaces.
strD String with only ASCII d-characters, padded to the right with spaces.

Note that not all CDs strictly adhere to the character sets specified in ISO 9660.

Filenames

Filenames must use d-character encoding (strD), plus a dot and a semicolon, each of which has to occur exactly once per filename. A filename is composed of a File Name, a dot, a File Name Extension, a semicolon, and a version number in decimal digits. The semicolon and version number are usually not displayed to the user.

There are three Levels of Interchange defined. Level 1 allows filenames with a File Name length of 8 and an extension length of 3 (like MS-DOS). Levels 2 and 3 allow File Name and File Name Extension to have a combined length of up to 30 characters.

The ECMA-119 Directory Record format can hold composed names of up to 222 characters. Names this long violate the specification, but a robust reader of the filesystem should nevertheless handle them.
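
The strict naming rules above can be sketched in Python as follows. `split_file_identifier` and `is_level1` are names invented for this illustration; a real reader would usually be more lenient than this, since many discs break the rules.

```python
D_CHARACTERS = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_")

def split_file_identifier(identifier):
    """Split 'NAME.EXT;1' into (name, extension, version), enforcing the strict rules."""
    if identifier.count(".") != 1 or identifier.count(";") != 1:
        raise ValueError("expected exactly one '.' and one ';'")
    stem, version = identifier.split(";")
    name, extension = stem.split(".")  # raises ValueError if the dot follows the semicolon
    if not (version.isascii() and version.isdigit()):
        raise ValueError("version must be decimal digits")
    if not set(name + extension) <= D_CHARACTERS:
        raise ValueError("only d-characters are allowed")
    return name, extension, int(version)

def is_level1(name, extension):
    """Level 1 of Interchange: at most 8 characters of name and 3 of extension."""
    return len(name) <= 8 and len(extension) <= 3
```

With these definitions, `split_file_identifier("STAGE2.BIN;1")` yields the three parts `"STAGE2"`, `"BIN"` and `1`, and that name also satisfies Level 1.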

Size Limitations

ISO 9660 filesystems can have up to 2^32 blocks, i.e. 8 TiB with 2 KiB blocks. Normally they will be restricted to the size of optical media. (Currently up to 100 GiB with 4-layer BD-R.)

The maximum size of data files depends on the Level of Interchange that is intended for the ISO filesystem. Levels 1 and 2 allow for 4 GiB - 1 bytes, because a single Directory Record can claim up to that number of bytes. Level 3 allows multiple consecutive Directory Records with the same name, whose extents are concatenated to form a single data file. This means that a single data file can fill nearly the full 8 TiB of image size.

System Area

An ISO 9660 filesystem begins with a 32 KiB System Area which may be used for arbitrary data. This is often used to store boot information for the case that the ISO 9660 filesystem is not stored on optical media, but rather on a hard-disk-like device, e.g. on a USB stick.

So be prepared to find at that location a Master Boot Record (MBR, for BIOS), a GUID Partition Table (GPT, for EFI), or an Apple Partition Map (APM).

Volume Descriptors

When preparing to mount a CD, your first action will be reading the volume descriptors (specifically, you will be looking for the Primary Volume Descriptor).

Since sectors 0x00-0x0F of the CD are reserved as System Area, the Volume Descriptors can be found starting at sector 0x10 (16). The format of the volume descriptors is as follows:

Offset Length (bytes) Field name Datatype Description
0 1 Type int8 Volume Descriptor type code (see below).
1 5 Identifier strA Always 'CD001'.
6 1 Version int8 Volume Descriptor Version (0x01).
7 2041 Data - Depends on the volume descriptor type.

Each volume descriptor is therefore one sector (2 KiB) long.

Volume Descriptor Type Codes

The Volume Descriptor Type field specifies the type of Volume Descriptor:

Value Description
0 Boot Record
1 Primary Volume Descriptor
2 Supplementary Volume Descriptor
3 Volume Partition Descriptor
4-254 Reserved
255 Volume Descriptor Set Terminator
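
As a rough Python sketch of walking the descriptor set, the following generator reads descriptors from sector 16 onwards until it meets the terminator. It assumes the whole image is available in memory as a `bytes` object and that sectors are 2 KiB; `iter_volume_descriptors` is a name chosen for this example.

```python
SECTOR_SIZE = 2048            # assumed sector size
FIRST_DESCRIPTOR_SECTOR = 16  # sectors 0x00-0x0F are the System Area

TYPE_NAMES = {
    0: "Boot Record",
    1: "Primary Volume Descriptor",
    2: "Supplementary Volume Descriptor",
    3: "Volume Partition Descriptor",
    255: "Volume Descriptor Set Terminator",
}

def iter_volume_descriptors(image):
    """Yield (sector, type_code, descriptor_bytes) up to and including the terminator."""
    sector = FIRST_DESCRIPTOR_SECTOR
    while True:
        start = sector * SECTOR_SIZE
        descriptor = image[start:start + SECTOR_SIZE]
        if len(descriptor) < SECTOR_SIZE or descriptor[1:6] != b"CD001":
            raise ValueError("no valid volume descriptor at sector %d" % sector)
        yield sector, descriptor[0], descriptor
        if descriptor[0] == 255:
            return
        sector += 1
```

A caller looking for the Primary Volume Descriptor would simply pick the first yielded descriptor whose type code is 1.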

When starting out with a basic CD, we are going to be interested in the Primary Volume Descriptor, which points us to the root directory and the path tables, either of which allows us to find any file on the CD. Using the path table is ideal for minimal implementations which do not wish to search the directory hierarchy node by node. A linear search of the path table can be slower (string comparisons across the entire file system) but is easier to implement.

The Boot Record

The first type of Volume Descriptor is the "Boot Record". The descriptor format is as follows:

Offset Length (bytes) Field name Datatype Description
0 1 Type int8 Zero indicates a boot record.
1 5 Identifier strA Always "CD001".
6 1 Version int8 Volume Descriptor Version (0x01).
7 32 Boot System Identifier strA ID of the system which can act on and boot the system from the boot record.
39 32 Boot Identifier strA Identification of the boot system defined in the rest of this descriptor.
71 1977 Boot System Use - Custom - used by the boot system.

The most common Boot System Use specification is El Torito. It stores the block address of the El Torito Boot Catalog in bytes 71 to 74 as a little-endian 32-bit number. This catalog lists the available boot images, which serve as the starting points for booting.
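
Extracting the Boot Catalog address might look like the sketch below. The function name `boot_catalog_lba` is hypothetical. The check for the identifier string "EL TORITO SPECIFICATION" comes from the El Torito specification rather than from the tables in this article, so treat it as an assumption of the example.

```python
import struct

def boot_catalog_lba(boot_record):
    """Return the El Torito Boot Catalog LBA stored in a Boot Record descriptor."""
    if boot_record[0] != 0 or boot_record[1:6] != b"CD001":
        raise ValueError("not a Boot Record volume descriptor")
    # Assumption: El Torito marks its Boot Records with this Boot System Identifier.
    if not boot_record[7:39].startswith(b"EL TORITO SPECIFICATION"):
        raise ValueError("Boot Record is not an El Torito record")
    # Bytes 71..74: little-endian 32-bit block address of the Boot Catalog.
    return struct.unpack_from("<I", boot_record, 71)[0]
```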

The Primary Volume Descriptor

This is a lengthy descriptor, but it contains some very useful information for reading the rest of the file system.

Offset Length (bytes) Field name Datatype Description
0 1 Type Code int8 Always 0x01 for a Primary Volume Descriptor.
1 5 Standard Identifier strA Always 'CD001'.
6 1 Version int8 Always 0x01.
7 1 Unused - Always 0x00.
8 32 System Identifier strA The name of the system that can act upon sectors 0x00-0x0F for the volume.
40 32 Volume Identifier strD Identification of this volume.
72 8 Unused Field - All zeroes.
80 8 Volume Space Size int32_LSB-MSB Number of Logical Blocks in which the volume is recorded.
88 32 Unused Field - All zeroes.
120 4 Volume Set Size int16_LSB-MSB The size of the set in this logical volume (number of disks).
124 4 Volume Sequence Number int16_LSB-MSB The number of this disk in the Volume Set.
128 4 Logical Block Size int16_LSB-MSB The size in bytes of a logical block. NB: This means that a logical block on a CD could be something other than 2 KiB!
132 8 Path Table Size int32_LSB-MSB The size in bytes of the path table.
140 4 Location of Type-L Path Table int32_LSB LBA location of the path table. The path table pointed to contains only little-endian values.
144 4 Location of the Optional Type-L Path Table int32_LSB LBA location of the optional path table. The path table pointed to contains only little-endian values. Zero means that no optional path table exists.
148 4 Location of Type-M Path Table int32_MSB LBA location of the path table. The path table pointed to contains only big-endian values.
152 4 Location of Optional Type-M Path Table int32_MSB LBA location of the optional path table. The path table pointed to contains only big-endian values. Zero means that no optional path table exists.
156 34 Directory entry for the root directory - Note that this is not an LBA address, it is the actual Directory Record, which contains a single byte Directory Identifier (0x00), hence the fixed 34 byte size.
190 128 Volume Set Identifier strD Identifier of the volume set of which this volume is a member.
318 128 Publisher Identifier strA The volume publisher. For extended publisher information, the first byte should be 0x5F, followed by the filename of a file in the root directory. If not specified, all bytes should be 0x20.
446 128 Data Preparer Identifier strA The identifier of the person(s) who prepared the data for this volume. For extended preparation information, the first byte should be 0x5F, followed by the filename of a file in the root directory. If not specified, all bytes should be 0x20.
574 128 Application Identifier strA Identifies how the data are recorded on this volume. For extended information, the first byte should be 0x5F, followed by the filename of a file in the root directory. If not specified, all bytes should be 0x20.
702 37 Copyright File Identifier strD Filename of a file in the root directory that contains copyright information for this volume set. If not specified, all bytes should be 0x20.
739 37 Abstract File Identifier strD Filename of a file in the root directory that contains abstract information for this volume set. If not specified, all bytes should be 0x20.
776 37 Bibliographic File Identifier strD Filename of a file in the root directory that contains bibliographic information for this volume set. If not specified, all bytes should be 0x20.
813 17 Volume Creation Date and Time dec-datetime The date and time of when the volume was created.
830 17 Volume Modification Date and Time dec-datetime The date and time of when the volume was modified.
847 17 Volume Expiration Date and Time dec-datetime The date and time after which this volume is considered to be obsolete. If not specified, then the volume is never considered to be obsolete.
864 17 Volume Effective Date and Time dec-datetime The date and time after which the volume may be used. If not specified, the volume may be used immediately.
881 1 File Structure Version int8 The directory records and path table version (always 0x01).
882 1 Unused - Always 0x00.
883 512 Application Used - Contents not defined by ISO 9660.
1395 653 Reserved - Reserved by ISO.
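
The fields that a minimal reader usually needs can be pulled out of the table above with a few `struct` calls. This is only a sketch: `parse_pvd` and the dictionary keys are names invented here, and for both-endian fields it reads just the little-endian half.

```python
import struct

def parse_pvd(pvd):
    """Extract a few commonly needed fields from a 2048-byte Primary Volume Descriptor."""
    if pvd[0] != 1 or pvd[1:6] != b"CD001":
        raise ValueError("not a Primary Volume Descriptor")
    return {
        "volume_id": pvd[40:72].decode("ascii", "replace").rstrip(" "),
        "volume_space_size": struct.unpack_from("<I", pvd, 80)[0],   # in logical blocks
        "logical_block_size": struct.unpack_from("<H", pvd, 128)[0],
        "path_table_size": struct.unpack_from("<I", pvd, 132)[0],    # in bytes
        "l_path_table_lba": struct.unpack_from("<I", pvd, 140)[0],
        "m_path_table_lba": struct.unpack_from(">I", pvd, 148)[0],   # stored big-endian
        "root_record": bytes(pvd[156:190]),                          # 34-byte Directory Record
    }
```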

Volume Descriptor Set Terminator

The Volume Descriptor Set Terminator does not currently define bytes 7-2047 of its Volume Descriptor. This means that the only fields in use for the volume set terminator are the type code (255), the standard identifier ('CD001') and the descriptor version (0x01).

Offset Length (bytes) Field name Datatype Description
0 1 Type int8 255 indicates a Volume Descriptor Set Terminator.
1 5 Identifier strA Always "CD001".
6 1 Version int8 Volume Descriptor Version (0x01).

The Path Table

The Path Table contains a well-ordered sequence of records describing every directory extent on the CD. There is one caveat: the Path Table can only contain 65536 records, due to the width of the "Parent Directory Number" field. If there are more directories than this on the disc, some CD authoring software will ignore the limit and create a non-compliant CD (this applies to some earlier versions of Nero, for example). If your file system driver uses the path table, you should be aware of this possibility. Windows uses the Path Table and will fail with such non-compliant CDs (additional nodes exist but appear as zero-byte). Linux, which uses the directory tables, is not affected by this issue.

The location of the path tables can be found in the Primary Volume Descriptor. There are two table types - the L-Path table (relevant to x86) and the M-Path table. The only difference between these two tables is that multi-byte values in the L-Table are LSB-first and the values in the M-Table are MSB-first.

The structure of a Path Table Entry is as follows:

Offset Size Description
0 1 Length of Directory Identifier
1 1 Extended Attribute Record Length
2 4 Location of Extent (LBA). This is in a different format depending on whether this is the L-Table or M-Table (see explanation above).
6 2 Directory number of parent directory (an index in to the path table). This is the field that limits the table to 65536 records.
8 (variable) Directory Identifier (name) in d-characters.
(variable) 1 Padding Field - contains a zero if the Length of Directory Identifier field is odd, not present otherwise. This means that each table entry will always start on an even byte number.

The path table is in ascending order of directory level and is alphabetically sorted within each directory level.
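
The entry layout above can be decoded with a loop like the following Python sketch for the L-Table. The function names are made up for this example. It relies on two details taken from ECMA-119 rather than from the table: directory numbers are 1-based, and the root directory is entry 1 and is its own parent.

```python
import struct

def parse_l_path_table(table):
    """Return a list of (name_bytes, extent_lba, parent_number) from an L-Table."""
    entries = []
    pos = 0
    while pos < len(table):
        name_len = table[pos]
        if name_len == 0:
            break  # zero padding after the last entry, if whole sectors were passed in
        lba, parent = struct.unpack_from("<IH", table, pos + 2)
        name = bytes(table[pos + 8:pos + 8 + name_len])
        entries.append((name, lba, parent))
        pos += 8 + name_len + (name_len & 1)  # padding byte only after odd-length names
    return entries

def directory_path(entries, number):
    """Build the full path of directory `number` (1-based) by following parent links."""
    parts = []
    while number != 1:  # assumption: entry 1 is the root directory
        name, _, parent = entries[number - 1]
        parts.append(name.decode("ascii"))
        number = parent
    return "/" + "/".join(reversed(parts))
```

For an M-Table the same loop applies with the format string `">IH"`.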

Directories

At some point when reading from an ISO 9660 CD, you will need a directory record to locate a file, even if you generally use the path table to locate the directory initially. Unlike the path tables, there is only one version of each directory table, and multi byte numbers are in both-endian format. Every directory will start with 2 special entries: an empty string, describing the "." entry, and the string "\1" describing the ".." entry. A directory record is laid out as follows:

Offset Size Type Description
0 1 int8 Length of Directory Record.
1 1 int8 Extended Attribute Record length.
2 8 int32_LSB-MSB Location of extent (LBA) in both-endian format.
10 8 int32_LSB-MSB Data length (size of extent) in both-endian format.
18 7 see format below Recording date and time.
25 1 see below File flags.
26 1 int8 File unit size for files recorded in interleaved mode, zero otherwise.
27 1 int8 Interleave gap size for files recorded in interleaved mode, zero otherwise.
28 4 int16_LSB-MSB Volume sequence number - the volume that this extent is recorded on, in 16 bit both-endian format.
32 1 int8 Length of file identifier (file name). For files, the identifier ends with a ';' character followed by the file version number in ASCII coded decimal (usually '1').
33 (variable) strD File identifier.
(variable) 1 -- Padding field - zero if length of file identifier is even, otherwise, this field is not present. This means that a directory entry will always start on an even byte number.
(variable) (variable) -- System Use - The remaining bytes up to the maximum record size of 255 may be used for extensions of ISO 9660. The most common one is the System Use Sharing Protocol (SUSP) and its application, the Rock Ridge Interchange Protocol (RRIP).

Even if a directory spans multiple sectors, the directory entries are not permitted to cross the sector boundary (unlike the path table). Where there is not enough space to record an entire directory entry at the end of a sector, that sector is zero-padded and the next consecutive sector is used.

Some of the above fields need explanation. Unfortunately, the date/time format is different from that used in the Primary Volume Descriptor. The Date/Time format is:

Offset Size Description
0 1 Number of years since 1900.
1 1 Month of the year from 1 to 12.
2 1 Day of the month from 1 to 31.
3 1 Hour of the day from 0 to 23.
4 1 Minute of the hour from 0 to 59.
5 1 Second of the minute from 0 to 59.
6 1 Offset from GMT in 15 minute intervals from -48 (West) to +52 (East).

This is quite a contrast to the PVD which contains ASCII encoded decimal values, but this format is presumably used to save disc space over a large number of entries.

The other field that needs some explanation is the File Flags field. This is represented by one bit flags as follows:

Bit Description
0 If set, the existence of this file need not be made known to the user (basically a 'hidden' flag).
1 If set, this record describes a directory (in other words, it is a subdirectory extent).
2 If set, this file is an "Associated File".
3 If set, the extended attribute record contains information about the format of this file.
4 If set, owner and group permissions are set in the extended attribute record.
5 & 6 Reserved
7 If set, this is not the final directory record for this file (for files spanning several extents, for example files over 4 GiB long).
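
Putting the record layout, the 7-byte date format and the flag bits together, a directory extent could be walked as in this Python sketch. The names are invented for the example; it assumes 2 KiB sectors and reads the GMT offset as a signed byte. The date is returned as a plain tuple so that records with an unset date do not raise an error.

```python
import struct

SECTOR_SIZE = 2048  # assumed sector size
FLAG_HIDDEN = 0x01
FLAG_DIRECTORY = 0x02
FLAG_MULTI_EXTENT = 0x80

def parse_directory_record(rec):
    """Decode one Directory Record; rec[0] holds its total length."""
    name_len = rec[32]
    gmt_quarter_hours = int.from_bytes(rec[24:25], "little", signed=True)
    return {
        "extent_lba": struct.unpack_from("<I", rec, 2)[0],    # little-endian half only
        "data_length": struct.unpack_from("<I", rec, 10)[0],
        # (year, month, day, hour, minute, second, GMT offset in 15-minute units)
        "recorded": (1900 + rec[18], rec[19], rec[20], rec[21], rec[22], rec[23],
                     gmt_quarter_hours),
        "hidden": bool(rec[25] & FLAG_HIDDEN),
        "is_directory": bool(rec[25] & FLAG_DIRECTORY),
        "more_extents": bool(rec[25] & FLAG_MULTI_EXTENT),
        "identifier": bytes(rec[33:33 + name_len]),
    }

def iter_directory(extent):
    """Yield every record in a directory extent, skipping zero padding at sector ends."""
    pos = 0
    while pos < len(extent):
        length = extent[pos]
        if length == 0:
            # Records never cross a sector boundary: jump to the next sector.
            pos = (pos // SECTOR_SIZE + 1) * SECTOR_SIZE
            continue
        yield parse_directory_record(extent[pos:pos + length])
        pos += length
```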

Locating Data on the CD

By now, you should be able to see that there are two main ways to navigate to a file record. You can either search the path table, or you can search the full directory structure. You may find it more convenient and faster to cache the path table, loading directories only when necessary.

Searching the Path Table

If you are using the Path Table method, you will still need to know about Directory Records to find the file you are looking for. Basically, you search the path in a reverse order, following the "Parent Directory" links in the Path Table. Once you have located the directory containing the file you want, load that Directory and scan it for the appropriate file name.

Recursing from the Root Directory

Alternatively, you can ignore the Path Table and just cache the root directory from the Primary Volume Descriptor. You then load each directory in turn. For example, for the path '/BOOT/MYLOADER/STAGE2.BIN'

  1. Read the PVD in to memory. Bytes 156-189 contain the root directory entry.
  2. Load the root directory by reading the LBA and Length values in this root directory entry.
  3. Scan the directory entry identifiers for 'BOOT'. Directory identifiers carry no ';1' version suffix, so also check that the directory flag is set.
  4. If found, use the LBA and length values to load the 'BOOT' directory in to memory.
  5. Repeat steps 3 and 4 for the directory identifier 'MYLOADER'.
  6. Scan the 'MYLOADER' directory for 'STAGE2.BIN;1'. If found, you can now use the LBA value to load your file in to memory.
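
The steps above can be sketched in Python as follows. This is an illustration under several assumptions, not a complete driver: the image is held in memory as `bytes`, the Primary Volume Descriptor is the first descriptor (sector 16), sectors are 2 KiB, file names are matched with a ';1' suffix, and directory names are matched without one and with the directory flag set. `find_file` and `_records` are hypothetical names.

```python
import struct

SECTOR_SIZE = 2048  # assumed sector size

def _records(extent):
    """Yield (identifier, lba, size, flags) for each record in a directory extent."""
    pos = 0
    while pos < len(extent):
        length = extent[pos]
        if length == 0:  # zero padding: continue at the next sector
            pos = (pos // SECTOR_SIZE + 1) * SECTOR_SIZE
            continue
        rec = extent[pos:pos + length]
        lba = struct.unpack_from("<I", rec, 2)[0]
        size = struct.unpack_from("<I", rec, 10)[0]
        yield bytes(rec[33:33 + rec[32]]), lba, size, rec[25]
        pos += length

def find_file(image, path):
    """Return (lba, size) for a path such as '/BOOT/MYLOADER/STAGE2.BIN'."""
    pvd = image[16 * SECTOR_SIZE:17 * SECTOR_SIZE]
    if pvd[0] != 1 or pvd[1:6] != b"CD001":
        raise ValueError("Primary Volume Descriptor not found at sector 16")
    block_size = struct.unpack_from("<H", pvd, 128)[0]
    root = pvd[156:190]  # step 1: the root Directory Record
    lba = struct.unpack_from("<I", root, 2)[0]
    size = struct.unpack_from("<I", root, 10)[0]
    parts = [p for p in path.split("/") if p]
    for depth, part in enumerate(parts):
        is_last = depth == len(parts) - 1
        wanted = (part + ";1" if is_last else part).encode("ascii")
        extent = image[lba * block_size:lba * block_size + size]  # steps 2 and 4
        for ident, child_lba, child_size, flags in _records(extent):
            # Intermediate components must be directories, the last one a file.
            if ident == wanted and bool(flags & 0x02) == (not is_last):
                lba, size = child_lba, child_size
                break
        else:
            raise FileNotFoundError(path)
    return lba, size
```

The returned LBA and size are then enough to read the file contents from the image.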

Rock Ridge and Joliet

There are two enhancements for ISO 9660 which make it more suitable for the worlds of Unix and of MS-Windows. Both can be combined in the same filesystem. So the reader often has the choice between three file name spaces: Plain ISO, Rock Ridge, Joliet.

ISO and Rock Ridge will show the same tree of files but with different names. Joliet can show a completely different tree than ISO.

Rock Ridge allows file names of up to 255 8-bit characters. Only the zero byte and the slash ("/") may not be used. It also adds the file attributes which are specified by POSIX (owner, group, permissions, ...) and allows symbolic links.

Rock Ridge is an application of SUSP. It may be accompanied by other SUSP applications like zisofs (compression of data files, Linux specific), Apple ISO 9660 Extensions, Amiga AS entries, or Arbitrary Attribute Interchange Protocol (AAIP: Extended Attributes and ACLs). A reader of SUSP entries shall simply ignore all entry types which it does not expect.

Joliet was defined by Microsoft to allow filenames of up to 64 UCS-2 characters (16 bits each). It is implemented as a separate tree of Directory Records which starts from a root record in a Supplementary Volume Descriptor. That descriptor is similar to a Primary Volume Descriptor, but has a Type Code of 2.

See Also

Articles

  • El-Torito, a standard for creating bootable CD-ROMs
  • Mkisofs, about ISO 9660 producing programs: mkisofs, genisoimage, xorriso
  • Optical Drive, an overview about how to operate optical drives and media
