User:Oros/FAT

From OSDev Wiki
Jump to: navigation, search
Filesystems
Virtual Filesystems

VFS

Disk Filesystems
CD/DVD Filesystems
Network Filesystems
Flash Filesystems

The File Allocation Table (FAT) file system was introduced with DOS v1.0 (and possibly CP/M). Supposedly written by Bill Gates, FAT is a very simple file system which is, at its most basic level, nothing more than a singular linked list of clusters. FAT file systems use very little memory and is one of, if not the, most basic file system in use today.

Contents

Overview

There are several different versions of the FAT file system. Each version was designed for a different size of storage media.

Name Introduced Max File Size Max Cluster Count Max Volume Size Common Medium
FAT12 <1980 4GB - 1 Byte 4,077 32MB Floppies
FAT16 Nov. 1987 4GB - 1 Byte 65,517 2GB, 4GB Portable storage (SD, MMC, ect...)
FAT32 Aug. 1996 4GB - 1 Byte 268,435,437 2TB, 8TB Hard Drives

Implementation Details

The FAT file system views the storage media as a flat array of clusters. If the physical media does not address its data as a flat list of sectors (really old hard disks and floppy disks) then the cluster numbers will need to be translated before being sent to the disk. The storage media is organized into three basic areas.

  • The boot record
  • The File Allocation Table (FAT)
  • The directory and data area

Boot Record

The boot record occupies one sector, and is always placed in logical sector number zero of the storage media. (which is physically cylinder 0, head 0, sector 1) This is the easiest sector on the disk for the computer to locate when it begins running. If the storage media is a hard disk partition, the boot sector will begin at the start of the partition, not the start of the disk.

Standard Boot Record

Byte Offset Length (bytes) Description
0 3 The first three bytes 6B 3C 90 disassemble to JMP SHORT 3C NOP. The reason for this is to jump over the disk format information. Since the first sector of the disk is loaded into ram at location 0x0000:0x7C00 and executed, without this jump, the processor would attempt to execute data that isn't code.
3 8 OEM Name (padded with spaces). This value determines in which system disk was formatted. MS-DOS checks this field to determine which other parts of the boot record can be relied on. Common values are IBM 3.3 (with two spaces between the "IBM" and the "3.3"), MSDOS5.0, MSWIN4.1 and mkdosfs.
11 2 The number of Bytes per sector (remember, all numbers are in the little-endian format).
13 1 Number of sectors per cluster.
14 2 Number of reserved sectors.
16 1 Number of File Allocation Tables (FAT's) on the storage media. Often this value is 2.
17 2 Number of directory entries (must be set so that the root directory occupies entire sectors).
19 2 The total sectors in the logical volume. If this value is 0, it means there are more than 65535 sectors in the volume, and the actual count is stored in "Large Sectors (bytes 32-35).
21 1 This Byte indicates the media descriptor type.
22 2 Number of sectors per FAT.
24 2 Number of sectors per track.
26 2 Number of heads or sides on the storage media.
28 4 Number of hidden sectors.
32 4 Large amount of sector on media. This field is set if there are more than 65,535 sectors in the volume.

Extended Boot Record

The extended boot record comes right after the standard boot record. It contains different information depending on whether this is a FAT 12, FAT 16, or FAT 32 storage media.

FAT 12 and FAT 16
Byte Offset Length (bytes) Description
36 1 Physical drive number. The values here are identical to the values returned by the BIOS interrupt 0x13. 0x00 for a floppy disk and 0x80 for hard disks.
37 1 Reserved field. In Windows NT bit 0 is a dirty flag to request chkdsk at boot time. bit 1 requests surface scan too.
38 1 Signature (must be 0x28 or 0x29).
39 4 ID (serial number) Used for tracking disks between computers. (For most purposes just ignore this field).
43 11 Volume label, padded with blanks (0x20).
54 8 System identifier string. This field is a string representation of the FAT file system type. It is padded with spaces.
FAT 32
Byte Offset Length (bytes) Meaning
36 4 The size of the File Allocation Table in bytes.
40 2 Flags.
42 1 Signature (must be 0x28 or 0x29).
39 2 FAT version number. The high byte is the major version and the low byte is the minor version. FAT drivers should respect this field.
44 4 The cluster number of the root directory. Often this filed is set to 2.
48 2 The cluster number of the FSInfo structure.
50 2 The cluster number of the backup boot sector.
52 12 Reserved. When the volume is formated these bytes should be zero.
64 1 Drive number. The values here are identical to the values returned by the BIOS interrupt 0x13. 0x00 for a floppy disk and 0x80 for hard disks.
65 1 Flags in windows NT. Reserved otherwise.
66 1 Signature (must be 0x28 or 0x29).
67 4 VolumeID 'Serial' number. Used for tracking volumes between computers. You can ignore this if you want.
71 11 Volume label string. This field is padded with spaces.
82 8 System identifier string. Always "FAT32 ".

File Allocation Table

The File Allocation Table (FAT) is a table stored on the storage media that indicates the status and location of all data clusters that are on the disk. It can be considered the "table of contents" of a disk. The cluster may be available for use, it may be reserved by the operating system, it may be unavailable due to a bad sector on the disk, or it may be in use by a file. The clusters of a file need not be right next to each other on the disk. In fact it is likely that they are scattered widely throughout the disk. The FAT allows the operating system to follow the "chain" of clusters in a file.

FAT 12

FAT 12 uses 12 bits to address the clusters on the disk. Each 12 bit entry in the FAT points to the next cluster of a file on the disk. Given a valid cluster number, here is how you extract the value of the next cluster in the cluster chain:

unsigned char FAT_table[cluster_size];
unsigned int fat_offset = active_cluster + (active_cluster / 2);// multiply by 1.5
unsigned int fat_sector = first_fat_sector + (fat_offset / cluster_size);
unsigned int ent_offset = fat_offset % cluster_size;

//at this point you need to read from sector "fat_sector" on the disk into "FAT_table".

unsigned short table_value = *(unsigned short*)&FAT_table[ent_offset];

if(current_cluster & 0x0001)
   table_value = table_value >> 4;
else
   table_value = table_value & 0x0FFF;

//the variable "table_value" now has the information you need about the next cluster in the chain.

If "table_value" is greater than or equal to (>=) 0xFF8 then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0xFF7 then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.

FAT 16

FAT 16 uses 16 bits to address the clusters on the disk. Because of this, it is much easier to extract the values out of a 16 bit File Allocation Table. Here is how it is done:

unsigned char FAT_table[cluster_size];
unsigned int fat_offset = active_cluster * 2;
unsigned int fat_sector = first_fat_sector + (fat_offset / cluster_size);
unsigned int ent_offset = fat_offset % cluster_size;

//at this point you need to read from sector "fat_sector" on the disk into "FAT_table".

unsigned short table_value = *(unsigned short*)&FAT_table[ent_offset];

//the variable "table_value" now has the information you need about the next cluster in the chain.

If "table_value" is greater than or equal to (>=) 0xFFF8 then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0xFFF7 then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.

FAT 32

FAT 32 uses 28 bits to address the clusters on the disk. Yes, that is right. FAT 32 only uses 28 of it's 32 bits. The highest 4 bits are reserved. This means that they should be ignored when read and unchanged when written. Besides this small detail, extracting a value from a 32 bit FAT is almost identical to the same operation on a 16 bit FAT:

unsigned char FAT_table[cluster_size];
unsigned int fat_offset = active_cluster * 4;
unsigned int fat_sector = first_fat_sector + (fat_offset / cluster_size);
unsigned int ent_offset = fat_offset % cluster_size;

//at this point you need to read from sector "fat_sector" on the disk into "FAT_table".

//remember to ignore the high 4 bits.
unsigned int table_value = *(unsigned int*)&FAT_table[ent_offset] & 0x0FFFFFFF;

//the variable "table_value" now has the information you need about the next cluster in the chain.

If "table_value" is greater than or equal to (>=) 0x0FFFFFF8 then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0x0FFFFFF7 then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.

Directories

A directory entry simply stores the information needed to know where a file's data or a folder's children are stored on the disk. It also holds information such as the entry's name, size, and creation time. There are two types of directories in a FAT file system. Standard 8.3 directory entries, which appear on all FAT file systems, and Long File Name directory entries which are optionally present to allow for longer file names.

Standard 8.3 format

Byte Offset Length Description
0x00 8 DOS file name (padded with spaces)

The first byte can have the following special values:

0x00 Entry is available and no subsequent entry is in use
0x05 Initial character is actually 0xE5. 0x05 is a valid kanji lead byte, and is used for support for filenames written in kanji.
0x2E 'Dot' entry; either '.' or '..'
0xE5 Entry has been previously erased and is available. File undelete utilities must replace this character with a regular character as part of the undeletion process.
0x08 3 DOS file extension (padded with spaces)
0x0b 1 File Attributes
Bit Mask Description
0 0x01 Read Only
1 0x02 Hidden
2 0x04 System
3 0x08 Volume Label
4 0x10 Subdirectory
5 0x20 Archive bit
6 0x40 Device (internal use only, never found on disk)
7 0x80 Unused

An attribute value of 0x0F is used to designate a long file name entry.

0x0c 1 Reserved; two bits are used by NT and later versions to encode case information (see below); otherwise 0.
0x0d 1 Create time, fine resolution: 10ms units, values from 0 to 199.
0x0e 2 Create time. The hour, minute and second are encoded according to the following bitmap:
Bits Description
15-11 Hours (0-23)
10-5 Minutes (0-59)
4-0 Seconds/2 (0-29)

Note that the seconds is recorded only to a 2 second resolution. Finer resolution for file creation is found at offset 0x0d.

0x10 2 Create date. The year, month and day are encoded according to the following bitmap:
Bits Description
15-9 Year (0 = 1980, 127 = 2107)
8-5 Month (1 = January, 12 = December)
4-0 Day (1 - 31)
0x12 2 Last access date; see offset 0x10 for description.
0x14 2 EA-Index (used by OS/2 and NT) in FAT12 and FAT16, High 2 bytes of first cluster number in FAT32
0x16 2 Last modified time; see offset 0x0e for description.
0x18 2 Last modified date; see offset 0x10 for description.
0x1a 2 First cluster in FAT12 and FAT16. Low 2 bytes of first cluster in FAT32. Entries with the Volume Label flag, subdirectory ".." pointing to root, and empty files with size 0 should have first cluster 0.
0x1c 4 File size in bytes. Entries with the Volume Label or Subdirectory flag set should have a size of 0.

Long File Names

Long file name entries always have a regular 8.3 entry to which they belong. The long file name entries are always placed immediately before their 8.3 entry. Here is the format of a long file name entry.

Offset (in bytes) Length (in bytes) Meaning
0 1 The order of this entry in the sequence of long file name entries. This value helps you to know where in the file's name the characters from this entry should be placed.
1 10 The first 5, 2-byte characters of this entry.
11 1 Attribute. Always equals 0x0F. (the long file name attribute)
12 1 Long entry type. Zero for name entries.
13 1 Checksum.
14 12 The next 6, 2-byte characters of this entry.
26 2 Always zero.
28 4 The final 2, 2-byte characters of this entry.

Here is an example of what a regular 8.3 entry with one long file name entry preceding it might look like in a hex editor:

41 62 00 69 00 6E 00 00 00 FF FF 0F 00 7F FF FF FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF
42 49 4E 20 20 20 20 20 20 20 20 10 00 00 F7 01 D5 38 D5 38 00 00 F7 01 D5 38 03 00 00 00 00 00

And in a text editor:

Ab.i.n..........................
BIN        ......8.8.....8......

The first line is the long file name entry (the second line is the regular 8.3 entry). The very first byte (41) tells us two important pieces of information. First, the one (01) tells us that this is the first long file name entry for the regular 8.3 entry. Second the forty (40) part tells us that this is also the last long file name entry for this regular 8.3 entry. The next 10 bytes spell out the first part of the long file name. In this case they read:

b 00 i 00 n 00 00 00 FF FF

Notice that each character is two bytes long and that the name is null terminated. The two FF's at the end are the padding at the end of the long file name. This is also what the other FF's in the long file name entry are. The final important thing to notice about the long file name entry is it's attribute byte at offset 11. the 0x0F attribute allows us to verify that this is indeed a long file name entry.

Programming Guide

This section is intended to give you information about common functions that are preformed on a FAT file system.

Reading the Boot Sector

The Boot Sector is always placed at logical sector number zero. You can either read the boot sector into an array and access it's members that way or you can read it into a structure and access it through the structure. Either way, the values in the boot sector need to be readily available in order to do much of anything with a FAT file system.

Here is an example of some boot sector structures in C.

typedef struct fat_extBS_32
{
	//extended fat32 stuff
	unsigned int		table_size_32;
	unsigned short		extended_flags;
	unsigned short		fat_version;
	unsigned int		root_cluster;
	unsigned short		fat_info;
	unsigned short		backup_BS_sector;
	unsigned char 		reserved_0[12];
	unsigned char		drive_number;
	unsigned char 		reserved_1;
	unsigned char		boot_signature;
	unsigned int 		volume_id;
	unsigned char		volume_label[11];
	unsigned char		fat_type_label[8];

}__attribute__((packed)) fat_extBS_32_t;

typedef struct fat_extBS_16
{
	//extended fat12 and fat16 stuff
	unsigned char		bios_drive_num;
	unsigned char		reserved1;
	unsigned char		boot_signature;
	unsigned int		volume_id;
	unsigned char		volume_label[11];
	unsigned char		fat_type_label[8];
	
}__attribute__((packed)) fat_extBS_16_t;

typedef struct fat_BS
{
	unsigned char 		bootjmp[3];
	unsigned char 		oem_name[8];
	unsigned short 	        bytes_per_sector;
	unsigned char		sectors_per_cluster;
	unsigned short		reserved_sector_count;
	unsigned char		table_count;
	unsigned short		root_entry_count;
	unsigned short		total_sectors_16;
	unsigned char		media_type;
	unsigned short		table_size_16;
	unsigned short		sectors_per_track;
	unsigned short		head_side_count;
	unsigned int 		hidden_sector_count;
	unsigned int 		total_sectors_32;
	
	//this will be cast to it's specific type once the driver actually knows what type of FAT this is.
	unsigned char		extended_section[54];
	
}__attribute__((packed)) fat_BS_t;

Important pieces of information that can be extracted from the boot sector include:


The size of the root directory (unless you have FAT32, in which case the size will be 0):

root_dir_sectors = ((fat_boot->root_entry_count * 32) + (fat_boot->bytes_per_sector - 1)) / fat_boot->bytes_per_sector);

This calculation will round up. 32 is the size of a FAT directory in bytes.


The first data sector (that is, the first sector in which directories and files may be stored):

first_data_sector = fat_boot->reserved_sector_count + (fat_boot->table_count * fat_boot->table_size_16);


The first sector in the File Allocation Table:

first_fat_sector = fat_boot->reserved_sector_count;


The total number of data sectors:

data_sectors = fat_boot->total_sectors_16 - (fat_boot->reserved_sector_count + (fat_boot->table_count * fat_boot->table_size_16) + root_dir_sectors);


The total number of clusters:

total_clusters = data_sectors / fat_boot->sectors_per_cluster;

This rounds down.


The FAT type of this file system (12, 16, or 32):

if(total_clusters < 4085) 
{
   fat_type = 12;
} 
else 
{
   if(total_clusters < 65525) 
   {
      fat_type = 16;
   } 
   else 
   {
      fat_type = 32;
   }
}

Reading Directories

The first step in reading directories is finding and reading the root directory. On a FAT 12 or FAT 16 volumes the root directory is at a fixed position immediately after the File Allocation Tables in the first data sector:

root_cluster_12_or_16 = first_data_sector;

For FAT 32 the cluster of the root directory is given in the extended boot record:

root_cluster_32 = extBS_32->root_cluster;

All cluster numbers for directories that are given in the file system will be relative to the first data cluster. For instance, "extBS_32->root_cluster" often reports that the root cluster is at cluster 2. Here is how you can find the absolute cluster when given the relative cluster.

absolute_cluster = relative_cluster - 2 + first_data_sector;

Note: Although the absolute cluster should be used to read data from the disk, the relative cluster number should be used when reading values from the File Allocation Table.

After the correct cluster has been loaded into memory, the next step is to read and parse all of the entries in it. Each entry is 32 bytes long. For each 32 byte entry this is the flow of execution:

  1. Does the entry even exist? If the first byte of the entry is equal to (==) zero or 0xE5 then the entry does _not_ exist. Otherwise it does exist. Yes, goto number 2. No, goto number 7
  2. Is this entry a long file name entry? If the 11'th byte of the entry equals (==) 0x0F (or in other words, it's attribute says that it is a long file name entry), then it is a long file name entry. Otherwise, it is not. Yes, goto number 3. No, goto number 4
  3. Read the portion of the long filename into a temporary buffer. goto 7
  4. Parse the data for this entry using the table from further up on this page. It would be a good idea to save the data for later. Possibly in a virtual file system structure. goto number 5
  5. Is there a long file name in the temporary buffer? Yes, goto number 6. No, goto 7
  6. Apply the long file name to the entry that you just read and clear the temporary buffer. goto number 7
  7. Increment pointers and/or counters and check the next entry. (goto number 1)

This process should be repeated until all of the entries have been read from the cluster. You should then check to see if there is another cluster following this one in the cluster chain or if this is the last cluster in the chain. See the section called following cluster chains for more information. You should do the above process for each cluster in the chain, following it until there are no more clusters left in the chain. Then you can check if any of the entries that you just read are directories. If the are they should each be read in the same way starting with their first cluster number which is stored in the entry.

Following Cluster Chains

There are two basic steps to following cluster chains. The first step is to find out if there is another "link" (cluster) following the current one in the chain. The second step is to actually use the value read from the FAT to read the next sector. Here is the basic idea:

  1. Extract the value from the FAT for the _current_ cluster. (Use the previous section on the File Allocation Table for details on how exactly to extract the value.) goto number 2
  2. Is this cluster marked as the last cluster in the chain? (again, see the above section for more details) Yes, goto number 4. No, goto number 3
  3. Read the cluster represented by the extracted value and return for more directory parsing.
  4. The end of the cluster chain has been found. Our work here is finished. :)

See Also

Threads

  • from raw bits to directory listing (code posted) in the forum
  • Public Domain FAT32 code in the forum

External Links

Personal tools
Namespaces
Variants
Actions
Navigation
About
Toolbox