COSI 31a: Spring 2009
Programming Assignment 3

Contents

Introduction

You are to design and implement a simple file system on top of a simulated disk. Your file system will provide a standard seek/read/write interface and use inodes and single-, double-, and triple-indirection similar to the classic UNIX file system model we learned in class.

Quick Reference

What to create

You must create the following class:

You must finish the following class:

You may not modify the following files:

You may modify other supplied files and create your own classes and files at your discretion.

What it must do

A non-exhaustive list of requirements are as follows:

This list is not a substitute for reading the entire assignment and asking questions.

API reference

You may wish to reference the API docs for the code supplied with this project. Reading the documentation is not a subsitute for studying the code.

Working in groups

You may choose to work in groups of 2.

Simulated Disk

The simulated disk uses a UNIX file named DISK to simulate a disk with NUM_BLOCKS blocks of BLOCK_SIZE bytes per block. It supports three methods:

/**
 * Read a block from the disk into a buffer.
 */
void read(int blockNum, byte[] buffer);

/**
 * Write a block from a buffer onto the disk.
 */
void write(int blockNum, byte[] buffer);

/**
 * Stop the disk and report how many read and writes took place.
 * If removeFile is true, it will also delete the DISK file.
 */
int stop(boolean removeFile);

In each case blockNum is required to be in range 0..Disk.NUM_BLOCKS-1 and buffer should be a byte array of exactly Disk.BLOCK_SIZE bytes. (There are also overloaded versions of read and write as described below.) Note that blocks must be read and written as a whole. If you need to read part of a block, you must read in the entire block and ignore the part you're not interested in. If you need to write part of a block, you must read in the whole block, modify the portion of interest, and write the whole block back out. The constructor looks for a file named DISK in the current directory. If it does not exist, the program will create it and initialize the first block to all zeros. Any other block must be written at least once before it can be read. The stop method prints statistics. It has an optional argument (default true) to indicate whether to remove the DISK file. Since the DISK file can be quite large, you should be sure to remove it before logging off.

You will need to implement files on the simulated disk and some way of allocating disk blocks. You will use an adaptation of the method used by UNIX. (In fact, the scheme described below is very similar to the one used by the original version of UNIX-the so-called "Sixth Edition" version for the PDP-11).

Super Block

Block 0 of the disk is the so-called "super block" , which contains information about the rest of the disk. You will want to keep a copy of this block in memory at all times. It should be read in when the file system starts up, and written back out before shutting down. The super block should hold the following variables:

class SuperBlock {
    public int size;      // Size of file system (in blocks)
    public int msize;     // number of blocks used by the free space map
    public int isize;     // number of inode blocks
    public byte freeMap[] // first bits of free map
}

The size of the file system is recorded in the super block to allow the file system to use less than the whole disk and to support various disk sizes. In all the data structures on disk, a "pointer" to a disk block is an integer in the range 1..NUM_BLOCKS-1. Since block 0 is treated specially, you can use a block number of zero to represent a null pointer.

Remember that the super block (block 0), the free map blocks (1..msize), and the inode blocks (blocks msize+1..msize+isize) are at the beginning of your disk. The rest of the blocks are all data blocks. The contents of the data blocks will be determined by the write commands you issue in your program. When you go to read a block, you will have to know beforehand what type of block it is in order to read it correctly.

Free Space

You will manage free space by keeping track of which blocks are free (not a part of any file) in a bitmap. A bitmap is simply an array of bits; in our case, each bit corresponds to a block on the disk. We will call this the free map. If the bit corresponding to page n is 1, then page n is currently free. If the bit corresponding to page n is 0, then page n is not free.

You cannot use the standard assignment and indexing operators to which you are accustomed in Java when you want to access bits; the smallest unit you can directly address is a byte. So, you must use bitwise operators on bytes to test the values of bits. For your assignment, the bitmap will be stored as a byte array (byte[] freeMap). You may want to brush up on bitwise operators and their use in java. An important point to remember is that in Java, all types are signed, even bytes. You may have to take the sign bit into account depending on what operations you use for manipulating the bytes in the bitmap.

The freeMap instance variable in class SuperBlock is an Array object that will point to a chunk of bytes in memory. On disk, you should pack as many bytes of the the free map into the super block as you can (that is, the first bits of the free map array should be stored next to the other super block properties in the freeMap field of SuperBlock). If the free map is too large to fit entirely in the super block, then you must allocate enough blocks on disk to hold the rest of the free map.

The free space bitmap should be stored in blocks 1..msize. For convenience, a free map block can be read into the following structure:

public class FreeMapBlock {
    public byte[] freeMap;
}

You may not use java.util.BitSet or any other bit-manipulation classes available on Internet. You also may not "simulate" a bitmap using an array of booleans. You must write your own code for reading and writing the bitmap, and it must pack the free status for 8 blocks into each byte.

File Structure

Each file in the system is described by an index node (inode for short)1.

class Inode {
    public final static int SIZE = 64; // size in bytes
    public int flags;
    public int owner;
    public int size;
    public int ptr[] = new int[13];
}

If the flags field is zero, the index block is unused. In a real file system, the bits of this int distinguish different types of file (directories, data files, etc.) and indicate permissions. You do not have to implement these features. Similarly, you may ignore the owner field. The size field indicates the current size of the file, in bytes.

Block 0 of the disk is the super block. Blocks msize+1..msize+isize are packed with inodes1.

class InodeBlock {
    public Inode node[] = new Inode[Disk.BLOCK_SIZE/Inode.SIZE];
}

The remaining blocks may be allocated as direct or indirect blocks, or placed on the free map. They are collectively known as data blocks.

The data blocks that contain the contents of the files (or free map) are called direct blocks. The ptr fields in an inode point (directly or indirectly) to these blocks. The first 10 pointers point to the first 10 direct blocks. The 11th pointer (ptr[10]) points to an indirect block. This block contains pointers to the next Disk.BLOCK_SIZE/4 direct blocks of the file1.

class IndirectBlock {
    public int ptr[] = new int[Disk.BLOCK_SIZE/4];
}

(Note that the blocks on the free map have the same format). Pointer ptr[11] points to a "doubly indirect" block. It is filled with pointers to indirect blocks, each of which contains pointers to data blocks. Similarly, the final pointer points to a "triply indirect" block. The size of the file is determined by the size field of the inode, not by the pointers.

Holes

A null pointer (either in the inode or in one of the indirect blocks) may indicate a hole in the file. For example, if the size field indicates that there should be five blocks, but ptr[2]==0, then the third block constitutes a hole. Similarly, if the file is large enough and ptr[10]==0, then blocks 11 through POINTERS_PER_BLOCK+10 are all holes. Attempts to read from a hole act as if the hole were filled with zeros; an attempt to write into a hole causes the hole to be "filled in": Blocks are allocated as necessary and added to the file. Holes are created by seeking beyond the end of the file and then writing.

Inodes are numbered consecutively starting at 1 (not zero!). Since inodes are packed into blocks, each Inode block contains Disk.BLOCK_SIZE/Inode.SIZE Inodes (remember that Inode 1 will be located in the first Inode block, which is found after the last Free Map block). In this system, files are referenced by these numbers (called "index numbers", or inumbers for short). In a real file system, directory files are used to translate mnemonic names to inumbers, but for this project, we will use inumbers directly.

Other Disk Operations

The data structures SuperBlock, InodeBlock, and IndirectBlock are all the same size (i.e., all blocks are the same size), so any one of them can be written to or read from any disk block. For your convenience, we have added four overloaded versions of read and write to the Disk interface. These versions allow read and write to super blocks, Inode blocks, Indirect blocks, and Free Map blocks.

class Disk {
    public Disk() {
    public void read(int blocknum, byte[] buffer) {}
    public void read(int blocknum, SuperBlock block) {}
    public void read(int blocknum, InodeBlock block) {}
    public void read(int blocknum, IndirectBlock block) {}
    public void read(int blocknum, FreeMapBlock block) {}
    public void write(int blocknum, byte[] buffer) {}
    public void write(int blocknum, SuperBlock block) {}
    public void write(int blocknum, InodeBlock block) {}
    public void write(int blocknum, IndirectBlock block) {}
    public void write(int blocknum, FreeMapBlock block) {}
    public void stop() {}
}

Operations

You must implement the interface FileSystem in a class called MyFileSystem. This interface contains the following methods:

interface FileSystem {
    public int formatDisk(int size, int isize);
    public int shutdown();
    public int create();
    public int inumber(int fd);
    public int open(int inumber);
    public int read(int fd, byte[] buffer);
    public int write(int fd, byte[] buffer);
    public int seek(int fd, int offset, Whence whence);
    public int close(int fd);
    public int delete(int inumber);
}

In the tradition of C programming, each method returns an integer value, with -1 meaning "error" and a non-negative value (0 unless specified otherwise) meaning "success"2.

File Table

This code is already provided in FileTable.java. However, make sure you understand how to use it from FileSystem. It is a data structure to keep track of open files. For each file, you will need a pointer to an in-memory copy of its inode, its inumber, and the current seek pointer. Understand the methods for allocating and freeing slots in this table, determining whether a file descriptor is valid, and accessing the data associated with a file descriptor. FileTable.java has two classes: FileDescriptor and FileTable.

The FileDescriptor is a abstract data type (ADT). It contains private variables (inode, inumber, current seek pointer) and a series of get() and set() methods to either retrieve or update the object's private variables.

class FileDescriptor {
    public FileDescriptor(Inode newInode, int newInumber){}
    public Inode getInode(){}    
    public int getInumber(){}
    public int getSeekPointer(){}
    public void setSeekPointer(int i){}
    public void setFileSize(int newSize){} 
}

The file table uses a byte array called fileMap tells whether the specified index has a file open. fileMap is an array of 0's or 1's. 1 denoting an open file and 0 denoting an open slot. So if no files are open, then fileMap is an array of all zeros.  The FileDescriptor array (fdArray) contains FileDescriptor objects. The index in fileMap corresponds to the fd object in fdArray with the same index (denoted by parameter fd).

Each FileDescriptor object is an instance of the class FileDescriptor that maintains information about the specified file.

class FileTable {
    public FileTable(){}

    /**
     * Returns the index in the file table. Returns -1 if the
     * table is full.
     */
    public int allocate(){}
    
    /**
     * Add an entry to the table. This method will overwrite.
     *
     * Returns 0 for success and -1 for error
     */
    public int add(Inode inode, int inumber, int fd){}

    /**
     * Free an entry in the table.
     */
    public void free(int fd){}

    /**
     * Returns true if the file descriptor is valid, returns
     * false otherwise.
     */
    public boolean isValid(int fd){}

    /**
     * Given a file descriptor, this method returns the
     * corresponding inode.
     *
     * Returns null if no inode.
     */
    public Inode getInode(int fd){}

    /**
     * Given a file descriptor, this method returns the
     * inumber given a file descriptor or 0 if error.
     */
    public int getInumber(int fd){}

    /**
     * Given a file descriptor, this method returns the
     * corresponding seek pointer.
     */
    public int getSeekPointer(int fd){}

    /**
     * Updates the seek pointer given a file descriptor.
     *
     * Returns 1 if update is a success or 0 if failure.
     */
    public int setSeekPointer(int fd, int newPointer) {}
    
    /**
     * Updates the file size given a file descriptor.
     *
     * Returns 1 if update is a success and zero if failure.
     */
    public int setFileSize(int fd, int newFileSize){}

    /**
     * Given an inumber it returns a file descriptor
     * or -1 if failure.
     */
    public int getFDfromInumber(int inumber){ }
}

Implementation Hints

Although this is a large project, it should be manageable if you break it down into small pieces. Here is one way (but not the only possible way!) to decompose the problem. The tasks are listed roughly in the order they are needed, although in some cases they are inter-dependent.

Free-space management.
Write methods for allocating and freeing disk blocks. Also write the portion of formatDisk that builds the free map in the first place.
Block access within a file.
Write a method that takes an Inode and a block-offset within the file and returns a pointer to (the block number of) the corresponding block. The method should have a Boolean argument fillHole which specifies what to do if the indicated block does not exist. If the block does not exist and fillHole is false, the method should simply return 0; if fillHole is true, the method should allocate a block, add it to the file, and return its block number. The first version is appropriate for read and the other is appropriate for write. You might want to first write and debug this method for "small" files (no indirect blocks) and then modify it to handle large files as well. For large files, if fillHole is true, you may need to allocate one or two indirect blocks and add them to the file.
Accessing inodes.
You will need methods to read a specific inode from disk (given its inumber) and writing back a modified inode. Remember that you can only read and write whole blocks, so to write an inode, you will have to read the block containing it, modify the specific inode, and then write the block back out. You will also need a method to allocate inodes.
Reading and writing arbitrary ranges.
At this point, implementing read and write should not be too hard. An individual read request may touch parts of several blocks, so you will need a loop that reads each of the blocks and copies the appropriate portion of it into the appropriate part of the buffer argument of the read call. The implementation of write is slightly more complicated because if a block is only partially modified, you have to read its old value, copy data from the client's buffer into the appropriate portion of the block, and then write it back out.
Miscellaneous.
Fill in the remaining methods of class FileSystem. The only non-trivial remaining piece is delete, which must return all the data and indirect blocks to the free map and clear the flags field of the inode. It must also check that the file is not currently open.

When you get done, you should thoroughly test all the ten required functions, including creating, reading, writing, closing, reopening, and clearing all sorts of files (small, large, filled with holes, etc.) You should also test the error checking in your code. The main program we supply should be very handy in helping you to do this.

Code Provided to You

We have provided several files to help you get started. You may download them in the following zip file: pa3-src.zip. The API docs are included in this zip file, and also available here.

Tests

  • The files test_data/test*.data are test scripts for testing your program.
  • The Command Interpreter

    The main method in class Shell implements a simple command interpreter. You can either use it interactively by invoking the program as:

    java Shell

    or you can have it run a test script by typing, for example:

    java Shell test_data/test1.data

    Input lines starting with "/*" or "//" are ignored (the latter are echoed to the output). Other lines have the format:

    [ var = ] command [ args ]

    The optional prefix var = causes the result of the command to be assigned to a variable. In any case, the result of the command is printed. The there is one command for each of the ten methods of class FileSystem as well as three additional commands: help, vars, and quit. The help command prints a list of commands, the vars command lists the current values of all interpreter variables that have been assigned values, and the quit command terminates the program. With the exception of the second argument to write, each argument can be either an integer or the name of a variable. The command:

    write fd pattern size

    writes size bytes to the indicated file at the current offset. The data is generated by repeating pattern over and over the required number of times.

    How to submit

    Including a README file

    You must prepare a report describing the design and structure of your directory and file system. The report should be not more than two typewritten pages. You should carefully describe all design decisions you made and explain how these decisions affect the performance of your file/directory system. You may assume that this handout is part of the documentation of your program. Thus you need not repeat information that is in this handout.

    Contents of the readme must include:

    1. names of members of the group
    2. email address of each group member
    3. If you have written additional java files, document them well and explain how they are linked to the given files.
    4. Indicate any quirks and bugs of your program. (for example, "this operation does not work that's why test3.data failed").

    Footnotes

    1There is also an artifact of Java here that would not be present in a real operating system. In Java, the Inode structure is stored in memory as three integers followed by a pointer to an array of thirteen more integers. There would also be additional information to indicate the type of the Inode structure and the size of the array. On disk, however, the Inode structure is simply 16 integers in a row, like this C structure:

    struct inode {
        int flags;
        int owner;
        int size;
        int ptr[13];
    };
    

    Unfortunately, there's no easy way to create exactly this structure in memory in Java, but fortunately, you will probably never notice the difference. Similar remarks apply to InodeBlock and IndirectBlock.

    2A real system would need some way to indicate what sort of error occurred. In UNIX, the nature of the error is indicated by an integer error code placed in a global variable called errno. For this project, you can just print an error message. A more "Java-like" design would use exceptions to indicate errors.

    3In UNIX, this array is split into three parts. Each process has its own table of open files. There is a single system-wide table of so-called "in-core inodes" shared among all processes. Each entry in this table has a reference count so that it can be removed when the last process closes the file. Seek pointers are kept in yet another system-wide table so that there can be multiple seek pointers into the same file, and multiple processes can share a seek pointer. For this project, you can combine all this information into one table.

    4In UNIX, the argument is a pathname. The file system uses the directories to translate this name into an inumber.

    5In UNIX, deletion is delayed until all processes that have the file open close it.