
Windows (NT) File System


Chapter 22, Section 22.5 in Operating System Concepts.

Not covered in OSTEP.

Background

The Windows file system is called NTFS. It was introduced with the first release of Windows NT (version 3.1) and is the standard file system on Windows 2000 and later systems, such as Windows XP. Its goal was to solve the size, performance, reliability, and flexibility limitations of the DOS file system (also known as the "FAT" file system).

It has a general similarity to the FAT file system in that all files are described in a single table, called the Master File Table (MFT). However, it has more modern characteristics in that all components are files, including:

  • Master File Table
  • data files
  • directories
  • free list (bitmap)
  • boot images
  • recovery logs

The file system also has features to support redundancy and transactions, which we will not discuss. A great reference for details is the book: Inside the Windows NT File System by Helen Custer, published by (not surprisingly) Microsoft Press.


Disk Layout

Disks are divided into fixed-size regions:

  • Each region is called a volume.

  • Each volume can contain a different kind of file system, such as NTFS, FAT, or even a Unix file system.

  • Since each volume is a separate file system, it has its own root directory.

  • Multiple volumes allow for fixed limits on the growth of a particular file tree, such as limiting the size of temporary file space.

  • Multiple volumes also allow a single disk to contain multiple, separate bootable operating systems.
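To make the idea of fixed regions concrete, here is a minimal sketch of a volume table. The `Volume` record, the labels, and the block counts are hypothetical, and this is not the real on-disk partition-table format; it only shows that each region has a fixed extent and its own file system type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Volume:
    label: str        # hypothetical volume label
    fs_type: str      # file system stored in this region
    first_block: int  # first physical disk block of the region
    num_blocks: int   # fixed size, chosen when the disk is divided

def volume_for_block(volumes: list[Volume], block: int) -> Volume:
    """Return the volume whose fixed region contains the given disk block."""
    for v in volumes:
        if v.first_block <= block < v.first_block + v.num_blocks:
            return v
    raise ValueError(f"block {block} is not inside any volume")

# One disk, three independent file systems, each with its own root directory.
disk = [
    Volume("C:", "NTFS", 0, 800_000),
    Volume("T:", "FAT", 800_000, 100_000),  # capped temporary space
    Volume("X:", "UFS", 900_000, 100_000),  # a Unix file system
]

print(volume_for_block(disk, 850_000).label)  # prints T:
```

Because T: is capped at 100,000 blocks in this example, temporary files can never spill into C:, which is the kind of fixed growth limit the list above describes.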


Master File Table (MFT)

Clusters are the key element to allocation:

  • Logically, the disk consists of allocation units called clusters.

  • A cluster is a power-of-two multiple of the physical disk block size. The cluster size is set when the disk is formatted. A small cluster provides a finer granularity of allocation, but may require more space to describe the file and more separate operations to transfer data to or from memory.

  • The free list is a bitmap, each of whose bits describes one cluster.

  • Clusters on the disk are numbered starting from zero to the maximum number of clusters (minus one). These numbers are called logical cluster numbers (LCN) and are used to name blocks (clusters) on disk.
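The cluster bookkeeping above can be sketched in a few lines. This is an illustrative model, not NTFS code: the 512-byte block size, the first-fit `allocate_run` policy, and all function names are assumptions. It shows LCNs as plain indices into the volume, the free list as one bit per cluster, and the granularity trade-off of choosing a cluster size.

```python
BLOCK_SIZE = 512  # assumed physical block size in bytes

def cluster_size(blocks_per_cluster: int) -> int:
    """A cluster is a power-of-two multiple of the block size."""
    if blocks_per_cluster <= 0 or blocks_per_cluster & (blocks_per_cluster - 1):
        raise ValueError("blocks per cluster must be a power of two")
    return blocks_per_cluster * BLOCK_SIZE

def lcn_to_byte_offset(lcn: int, csize: int) -> int:
    """LCNs count clusters from zero, so cluster n starts n * csize bytes into the volume."""
    return lcn * csize

def clusters_needed(file_bytes: int, csize: int) -> int:
    """Round up: a partly used cluster is still allocated whole."""
    return (file_bytes + csize - 1) // csize

class ClusterBitmap:
    """Free list: bit i is 1 when the cluster with LCN i is in use."""

    def __init__(self, num_clusters: int):
        self.num_clusters = num_clusters
        self.bits = bytearray((num_clusters + 7) // 8)

    def in_use(self, lcn: int) -> bool:
        return bool(self.bits[lcn // 8] & (1 << (lcn % 8)))

    def mark(self, lcn: int, used: bool) -> None:
        if used:
            self.bits[lcn // 8] |= 1 << (lcn % 8)
        else:
            self.bits[lcn // 8] &= ~(1 << (lcn % 8)) & 0xFF

    def allocate_run(self, count: int):
        """First fit: find `count` contiguous free clusters, mark them used,
        and return the starting LCN (or None if no such run exists)."""
        run_start, run_len = 0, 0
        for lcn in range(self.num_clusters):
            if self.in_use(lcn):
                run_start, run_len = lcn + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for c in range(run_start, run_start + count):
                    self.mark(c, True)
                return run_start
        return None

# Granularity trade-off for a 10,000-byte file.
for bpc in (1, 8):
    cs = cluster_size(bpc)
    n = clusters_needed(10_000, cs)
    print(cs, n, n * cs - 10_000)  # cluster size, clusters used, wasted bytes
```

With 512-byte clusters the file needs 20 clusters and wastes 240 bytes; with 4,096-byte clusters it needs only 3 clusters (fewer to describe and fewer separate transfers) but wastes 2,288 bytes.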

The MFT is the major, and in some ways, the only data structure on disk:

  • All files, and therefore all objects stored on disk, are described by the MFT.

  • All files are logically stored in the MFT and, for small files, are physically stored within the bounds of the MFT. In this sense, the MFT is the file system.

  • The MFT logically can be described as a table with one row per file.

  • The first rows in the table describe important configuration files, including one for the MFT itself.
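Because the MFT is an array of fixed-size records, a file number is simply a row index. The sketch below assumes the common 1,024-byte record size and lists the reserved low-numbered records as they are commonly documented for NTFS 3.x (Windows 2000 and later); treat the table as illustrative rather than exhaustive, and the helper name as hypothetical.

```python
MFT_RECORD_SIZE = 1024  # assumed: a common default record size in bytes

# Reserved low-numbered records, as commonly documented for NTFS 3.x.
RESERVED_RECORDS = {
    0: "$MFT",      # the MFT itself
    1: "$MFTMirr",  # backup copy of the first MFT records
    2: "$LogFile",  # recovery log
    3: "$Volume",   # volume name and version
    4: "$AttrDef",  # attribute definitions
    5: ".",         # root directory
    6: "$Bitmap",   # cluster free list
    7: "$Boot",     # boot image
    8: "$BadClus",  # clusters marked bad
    9: "$Secure",   # shared security descriptors
    10: "$UpCase",  # uppercase table used to compare names
    11: "$Extend",  # directory of optional extensions
}

def record_offset(file_number: int, record_size: int = MFT_RECORD_SIZE) -> int:
    """Byte offset of a row within the MFT file (not necessarily a disk offset,
    since the MFT itself may be stored in several extents)."""
    return file_number * record_size

print(RESERVED_RECORDS[0], record_offset(0))  # prints: $MFT 0
print(RESERVED_RECORDS[5], record_offset(5))  # prints: . 5120
```

Record 0 describes the MFT itself: its data attribute lists the clusters that hold the MFT, which is how the rest of the table is located.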

Figure: Master File Table Overview


MFT Entries

As stated previously, each row or entry in the MFT (called a record) describes a file and logically contains the file. In the case of small files, the entry actually contains the contents of the file.

Each entry consists of (attribute, value) pairs. While the conceptual design of NTFS allows this set of pairs to be extended with user-defined attributes, the current version of NTFS has a fixed set. The main attributes are:

  • Standard information: This attribute includes the information that was standard in the MS-DOS world:
      • read/write permissions,
      • creation time,
      • last modification time,
      • count of how many directories point to this file (hard link count).

  • File Name: This attribute describes the file's name in the Unicode character set. Multiple file names are possible, such as when:
      • the file has multiple links, or
      • the file has an MS-DOS short name.

  • Security Descriptor: This attribute lists which user owns the file and which users can access it (and how they can access it).

  • Data: This attribute either contains the actual file data in the case of a small file or points to the data (or points to the objects that point to the data) in the case of larger files.

Figure: Simplified View of MFT Entry

Very interesting fact: For small files, this design is extremely efficient. By looking no further than the MFT entry, you have the complete contents of the file.
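The record layout and the small-file shortcut can be modeled as a dictionary of (attribute, value) pairs. This is a hypothetical in-memory sketch, not the on-disk format: the 1,024-byte record size, the 400 bytes reserved for the header and other attributes, and the function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

RECORD_SIZE = 1024    # assumed MFT record size in bytes
OTHER_OVERHEAD = 400  # hypothetical space taken by the header and other attributes

@dataclass
class MFTRecord:
    attributes: dict = field(default_factory=dict)  # attribute name -> value

def make_record(names: list[str], owner: str, data: bytes, created: int = 0) -> MFTRecord:
    rec = MFTRecord()
    rec.attributes["STANDARD_INFORMATION"] = {
        "read_only": False,
        "created": created,
        "modified": created,
        "link_count": 1,  # in this sketch, one directory points to the file
    }
    rec.attributes["FILE_NAME"] = list(names)  # e.g. long name plus MS-DOS short name
    rec.attributes["SECURITY_DESCRIPTOR"] = {"owner": owner}
    if len(data) <= RECORD_SIZE - OTHER_OVERHEAD:
        rec.attributes["DATA"] = ("resident", data)   # contents stored in the record
    else:
        rec.attributes["DATA"] = ("nonresident", [])  # would hold a list of extents
    return rec

def read_file(rec: MFTRecord) -> bytes:
    kind, value = rec.attributes["DATA"]
    if kind == "resident":
        return value  # everything came from the one MFT record
    raise NotImplementedError("non-resident data must be read through its extents")

small = make_record(["Quarterly Report.txt", "QUARTE~1.TXT"], "alice", b"hello")
print(read_file(small))  # prints b'hello'
```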

However, the Data field gets more complicated in the common case where the data contained in the file is larger than an MFT entry. When dealing with large data, the Data attribute contains pointers to the data, rather than the data itself.

  • The pointers to data are actually pointers to sequences of logical clusters on the disk.

  • Each sequence is identified by three parts:
      • the starting cluster in the file, called the virtual cluster number (VCN),
      • the starting logical cluster number (LCN) of the sequence on disk,
      • the length, counted as the number of clusters.

  • The run of clusters is called an extent, following the terminology developed by IBM in the 1960s.

  • NTFS allocates new extents as necessary. When there is no more space left in the MFT entry, then another MFT entry is allocated. This design is effectively a list of extents, rather than the Unix or DEMOS tree of extents.
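The extent list can be sketched as a sorted list of (VCN, LCN, length) triples. Translating a file-relative cluster to a disk cluster is a search for the run that covers it, and growing a file either lengthens the last run (when the new clusters are adjacent on disk) or appends a new one. The names are hypothetical, the sketch keeps every run in one Python list rather than spilling into a second MFT entry, and it ignores the compact encoding NTFS uses for runs on disk.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional

class Extent(NamedTuple):
    vcn: int     # first cluster of the run, counted within the file
    lcn: int     # first cluster of the run on disk
    length: int  # number of clusters in the run

def vcn_to_lcn(extents: list[Extent], vcn: int) -> Optional[int]:
    """Map a file-relative cluster to a disk cluster; `extents` is sorted by VCN."""
    i = bisect_right([e.vcn for e in extents], vcn) - 1
    if i < 0:
        return None
    e = extents[i]
    if vcn < e.vcn + e.length:
        return e.lcn + (vcn - e.vcn)
    return None  # beyond the clusters allocated to the file

def append_run(extents: list[Extent], lcn: int, length: int) -> None:
    """Grow the file by `length` clusters that start at disk cluster `lcn`."""
    if extents:
        last = extents[-1]
        if last.lcn + last.length == lcn:
            # New clusters follow the last run on disk: lengthen it.
            extents[-1] = last._replace(length=last.length + length)
            return
        next_vcn = last.vcn + last.length
    else:
        next_vcn = 0
    extents.append(Extent(next_vcn, lcn, length))

file_runs = [Extent(0, 1000, 4), Extent(4, 5000, 2)]
print(vcn_to_lcn(file_runs, 5))  # prints 5001
```

A list keeps translation simple (one search over a short, sorted sequence), at the cost of a longer list for badly fragmented files.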

Figure: MFT Entry with Extents


Directories

As with other modern file systems, a directory in NTFS is a file whose data contains a collection of name/file mappings.

  • A directory entry contains the name of the file and a file reference. The file reference identifies the file on this volume; in other words, it is an internal name for the file.

A reference is a (file number, sequence number) pair. The file number is the index of the file's entry in the MFT. It is similar to the Unix i-number (inode number).

  • The list of file names in a directory is not stored as a simple list, but rather as a lexicographically sorted tree, called a B+ tree (this will be familiar to those with a database background). The data structure is called an index in NTFS (again, following the terminology from databases).

  • The NTFS design specifies that an index can be constructed for any attribute, but currently only file name indices are supported.

  • The name for a file appears both in its directory entry and in the MFT entry for the file itself.

  • As with regular files, if the directory is small enough, it can fit entirely within the MFT entry.
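A file reference can be pictured as one 64-bit value: NTFS uses 48 bits for the file number and 16 bits for the sequence number. The sequence number is what makes a reference safe to keep: NTFS bumps an entry's sequence number when the entry is freed for reuse, so an old reference no longer matches. The `ToyMFT` class below is a hypothetical model of just that check, not an NTFS interface.

```python
FILE_NUMBER_BITS = 48
SEQUENCE_BITS = 16
FILE_NUMBER_MASK = (1 << FILE_NUMBER_BITS) - 1

def make_reference(file_number: int, sequence: int) -> int:
    return (sequence << FILE_NUMBER_BITS) | file_number

def split_reference(ref: int) -> tuple[int, int]:
    return ref & FILE_NUMBER_MASK, ref >> FILE_NUMBER_BITS

class ToyMFT:
    """Hypothetical model: tracks only whether each entry is in use and its sequence number."""

    def __init__(self, num_entries: int):
        self.sequence = [1] * num_entries
        self.in_use = [False] * num_entries

    def create(self, file_number: int) -> int:
        self.in_use[file_number] = True
        return make_reference(file_number, self.sequence[file_number])

    def delete(self, file_number: int) -> None:
        self.in_use[file_number] = False
        # Change the sequence number so references to the old file go stale.
        self.sequence[file_number] = (self.sequence[file_number] + 1) % (1 << SEQUENCE_BITS)

    def resolve(self, ref: int) -> int:
        file_number, seq = split_reference(ref)
        if not self.in_use[file_number] or self.sequence[file_number] != seq:
            raise LookupError("stale file reference")
        return file_number

mft = ToyMFT(64)
old_ref = mft.create(40)
mft.delete(40)
new_ref = mft.create(40)     # same row, new sequence number
print(mft.resolve(new_ref))  # prints 40; mft.resolve(old_ref) would raise LookupError
```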

Figure: NTFS Directory, Simple Version

If the directory is larger, then the top part of (the B+ tree of) the directory is in the MFT entry, which points to extents that contain the rest of the name/file mappings.

Figure: NTFS Directory, with B+ tree extents
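A lookup in such a directory descends from the top of the tree, kept in the directory's MFT entry, to a node stored in an extent, then searches that node's sorted names. The sketch below is a simplified two-level model with hypothetical class names; it compares names with plain Python string ordering, whereas real NTFS applies its own file-name collation, and it omits insertion and node splitting.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class IndexNode:
    """A node stored in an extent outside the MFT entry."""
    names: list[str]  # sorted file names
    refs: list[int]   # file reference for each name

@dataclass
class IndexRoot:
    """Top of the tree, kept inside the directory's MFT entry."""
    keys: list[str]             # every name in children[i + 1] is >= keys[i]
    children: list[IndexNode]   # len(children) == len(keys) + 1

def lookup(root: IndexRoot, name: str) -> Optional[int]:
    node = root.children[bisect_right(root.keys, name)]  # pick the child to descend into
    i = bisect_left(node.names, name)                     # binary search within the node
    if i < len(node.names) and node.names[i] == name:
        return node.refs[i]
    return None

directory = IndexRoot(
    keys=["m.txt"],
    children=[
        IndexNode(["a.txt", "c.txt"], [101, 102]),
        IndexNode(["m.txt", "z.txt"], [103, 104]),
    ],
)
print(lookup(directory, "z.txt"))  # prints 104
```

Because names are kept sorted, a lookup touches one node per level instead of scanning every entry, and listing a directory in name order is just an in-order walk of the nodes.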


Copyright © 2013, 2018, 2020 Barton P. Miller

Non-University of Wisconsin students and teachers are welcome to print these notes for their personal use. Further reproduction requires permission of the author.