Brandeis Computer Science Department

COSI 146a - Fundamentals of Computer Systems December 04, 2007

Quiz 3 Review


B. Fault-tolerance

  • Goal: design modules that compute correct results in the presence of local failures.
  • failure: deviation of the actual behavior of a system from the specified behavior
  • error: the event that causes a failure
  • fault: the cause of an error
  • mean time to failure (MTTF), mean time to repair (MTTR) , MTBF : MTTF + MTTR
  • availability: MMTF/MTBF

    Design Strategy

  • fail-fast : Each module should either operate correctly or stop immediately (also called fail-stop).
  • retry : a technique to make systems fault tolerant, works for transient failures.
  • redundancy : technique to make systems fault tolerant by replicating modules and xor-ing or voting on their outputs (duplex, triplex, pair and spare, duplex plus repair, and triplex plus repair). incremental redundancy - RAID.

    Atomicity of programs

  • Goal: designing storage systems that retain data over power failures or operator shutdown while running active applications.
  • volatile storage: storage medium in which data may disappear after after a power failure of a computer, or the operator signals a shutdown.
  • nonvolatile storage : storage medium in which data is retained after after a power failure of a computer, or the operator signals a shutdown.
  • hard disk : physical device that provides huge amount of non-volatile storage for a low price. Unfortunately, accessing a byte on disk is slow;
  • replication can be used to increase the performance of a disk system but multiple disks increase the probability of having a failed disk so need redundancy to insure fault tolerance (e.g. RAID).

    Atomic operation

  • Problem : make operations atomic both from the perspective of failure and concurrency. Hard: have to think about both at the same time. We focus on failure.
  • atomic operation : an operation is atomic if there is no way to way that an operation is composed from suboperations. In particular, from the perspective of concurrency and failure recovery.
  • atomic disk : a disk that can tolerate partial writes (e.g., atomic disk algorithm : simple algorithm to make disk atomic)
  • Data consistency problem : problem due to the fact that multiple data structures on disk have to be updated atomically.

    Transactions

  • transactions : a stylized form of atomic operations: It consists of precommit operations, a commit point, and postcommit operations, marked by begin-transaction and end-transaction.
  • serializability : a correctness condition for concurrent transactions: if several transactions run at once, the result must be identical to some serial schedule of these transactions, i.e. insures atomicity in presence of concurrency.
  • Three key transaction implementation ideas that achieve atomicity in the presence of failures and concurrency : (1) use versions of variables to control when changes are visible; (2) logs contain authorative information; all other information (e.g., cached database in main memory) can be reconstructed from information stored in the log. (3) construct long high-level atomic operations from short low-level atomic operations (e.g., atomic disk) .
  • log : a stable append-only datastructure containing transaction history
  • Write ahead undo/redo log : a set of rules for implementing transactions with respect to failure recovery using a log

    Reliable storage systems

  • file system : a storage system offering files (linear array of bytes), directories, and file attributes. e.g. Unix FFS, provides atomicity for directory ops but not file ops, recovers with fsck; contrast to recovery in a database.
  • database : a storage system offering records, accessed through queries. Support for high-degree of concurrency and fine-grained access, high performance. Tolerates crash and media failures, provides long-term archival storage. e.g. system R
  • information retrieval system : a system offering full-indexed access and navigation, most read operations, weak update consistency e.g. web

    Security

  • Definitions (Privacy, Security, Authentication, Authorization, Confidentiality)
  • Safety net approach
  • Passive vs active attacks
  • Open vs closed system design
  • TCB (what is inside vs outside)
  • Key-based mechanisms (unbreakable one-time pad, shared-secret keys (DES) and public (RSA) keys)
  • Encryption for confidentiality
  • Encryption for Authentication (Authenticity of message origin and authenticity of content)
  • MACs (message authentication codes) and digital signatures
  • Key distribution (Hierarchy of central authority, decentralized Web of trust)

    Authorization

  • Guard model
  • Ticket system
  • List system
  • Flow-control model (Technique: no reads up, no writes down, please read 11-F)
    Unix security model
  • "Who is who": principles, guards, objects, SETUID.

    Complexity revisited

  • Lampson system design hints applied/ missapplied/ignored in systems we studied (Worse is better, End-to-End, Therac-25, others..)
  • Possibly in the (modest) case of the projects you were assigned in cs146a.


    Go to CS146a Home Page Questions or Comments: liuba@cs.brandeis.edu