{:check ["true"],
 :rank ["review_of_fs" "versioned_fs" "operations" "branches"]}

Index

Versioned Filesystem

https://en.wikipedia.org/wiki/Versioning_file_system

A versioning file system is any computer file system which allows a computer file to exist in several versions at the same time. Thus it is a form of revision control.

- Wikipedia

Review Of Fs

Review of filesystem

  • We can think of a filesystem as a tree.

    • Directories
    • Files
  • Each node has several attributes known as its metadata:

    • Name
    • Permission
    • Modified date
  • Example

    
    <root>
      |
      +-- a
      |   |
      |   +-- b "hello world"
      |   |
      |   +-- c "int main() { ... }"
      |
      +-- d "282349234345"
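
The tree above can be modelled in a few lines of Python. This is only a sketch: the `Node` class, its field names, the default permission values, and the `lookup` helper are illustrative assumptions, not part of any real filesystem API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of the tree: a directory or a file, plus its metadata."""
    name: str
    permission: int = 0o644           # metadata: permission bits (assumed default)
    modified: float = 0.0             # metadata: modification time in seconds
    content: Optional[str] = None     # file content; None marks a directory
    children: Dict[str, "Node"] = field(default_factory=dict)

    def is_dir(self) -> bool:
        return self.content is None

    def add(self, child: "Node") -> "Node":
        """Attach a child under this directory and return the child."""
        self.children[child.name] = child
        return child


def lookup(root: Node, path: str) -> Node:
    """Walk from the root along a slash-separated path."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node.children[part]
    return node


# Build the example tree from the diagram.
root = Node("<root>", permission=0o755)
a = root.add(Node("a", permission=0o755))
a.add(Node("b", content="hello world"))
a.add(Node("c", content="int main() { ... }"))
root.add(Node("d", content="282349234345"))
```

Here `lookup(root, "/a/b")` returns the file node holding "hello world", while `lookup(root, "/a")` returns a directory node.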
    

Versioned Fs

Versioned Filesystem

A versioned filesystem (VFS) extends a normal filesystem by keeping multiple versions of the whole tree, arranged along one or more timelines.

$t=0$
 |
 |
(*) --- $\mathbf{FS}(t=1)$ <root>
 |           |
 |           +-- a
 |
 |
 |
(*) --- $\mathbf{FS}(t=2)$ <root>
 |           |
 |           +-- a
 |           |   |
 |           |   +-- b
 |           |
 |           +-- c
 |
 $\vdots$

Why do we need versioning?

  • Technology is about trial and error. How do we reliably recover from mistakes?

    Versioning allows reliable recovery to an older version of the FS.

  • Q: how often has assignment_1.py been modified?

    Versioning tracks each version of the file, so we can easily look up the modification history of individual files.

Terminologies

  • Commit: a saved version of the filesystem at one point on a timeline.

  • Head: the latest commit on a timeline.

Operations

Operations in a versioned filesystem

We have looked at the filesystem operations in UNIX. In this section, we will discuss the operations on versioned filesystems.

Creating a versioned filesystem

init: Create a timeline with an initial commit.

            timeline

=====>      (*)
init         |
             |

Modify the latest version using commit

commit: Create a new commit along the timeline.

  • Like normal filesystems, we should be allowed to make modifications to a particular version of the filesystem.

  • Let's focus on the latest commit, aka the head of the timeline.

  • Let $T_k$ be the head of the filesystem. We can modify it as we do with a normal UNIX filesystem using one or more of the commands cp, rm, mv, etc. It is important to note that we do not change the FS $T_k$ itself. Instead, the resulting FS forms a new head $T_{k+1}$.

t=0  *                                  * t=0
     |                                  |
t=1  *                                  * t=1
     |                                  |
t=2 (*)  -----+                         * t=2
              | cp                      |
              | rm                      |
              | mv                      |
              | vim     commit          |
              +------>  =======>       (*) t=3
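
The init and commit operations can be sketched as follows. For brevity the sketch assumes a snapshot is a flat dictionary from path to file content rather than a real tree, and the `Timeline` class and its method names are hypothetical. The point it illustrates is that a commit never changes $T_k$; it appends $T_{k+1}$.

```python
from typing import Dict, List

Snapshot = Dict[str, str]   # assumed representation: path -> file content


class Timeline:
    """A single timeline of snapshots; the last one is the head."""

    def __init__(self) -> None:
        # init: the timeline starts with one (empty) initial commit.
        self.commits: List[Snapshot] = [{}]

    @property
    def head(self) -> Snapshot:
        return self.commits[-1]

    def commit(self, working: Snapshot) -> int:
        """Store a copy of the working state as the new head; return its index."""
        self.commits.append(dict(working))
        return len(self.commits) - 1


tl = Timeline()

# Edit a copy of the head (the effect of cp, vim, ...), then commit.
work = dict(tl.head)
work["/a/b"] = "hello world"
k1 = tl.commit(work)

# A second round of edits (the effect of rm plus a new file), then commit.
work = dict(tl.head)
del work["/a/b"]
work["/a/c"] = "int main() { ... }"
k2 = tl.commit(work)
```

After the second commit, `tl.commits[k1]` still contains `/a/b`: the older version is untouched and remains recoverable.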

Working with a specific version with checkout

checkout: Copy a chosen commit into the working directory, where it can be examined and modified.

t=0  * 
     |           +-----------+ 
     |           | working   | 
t=1  *           | directory | 
     |           +-----------+ 
     |              ↑
t=2  *              |
     |              |
     |   checkout   |
t=3 (*) ============+
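
A minimal sketch of checkout, assuming commits are stored as a list of path-to-content dictionaries (a simplification of the tree). The `checkout` function name mirrors the operation above, but its signature is an assumption made for illustration.

```python
import copy
from typing import Dict, List

Snapshot = Dict[str, str]   # assumed representation: path -> file content


def checkout(commits: List[Snapshot], k: int) -> Snapshot:
    """Return an independent copy of commit k to serve as the working directory."""
    return copy.deepcopy(commits[k])


commits: List[Snapshot] = [
    {},
    {"/a": "v1"},
    {"/a": "v2", "/b": "new"},
]

working = checkout(commits, 1)
working["/a"] = "edited"    # changes stay in the working directory
```

Because the working directory is a copy, editing it leaves the stored commit unchanged until a new commit is made.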

Examine history

In a versioned FS, a path alone no longer uniquely identifies a file. We also need to specify a commit, for example:

  • latest copy of /home/joe/assignment_1/main.c
  • previous commit of /home/joe/assignment_1/main.c
  • the first commit of /home/joe/assignment_1/main.c

We should also be able to examine differences between different versions of files.

  • What is the difference between /home/joe/assignment_1/main.c $@\mathrm{latest}$ and /home/joe/assignment_1/main.c $@t=4$?

  • What is the difference in the tree structure between filesystem $@t=4$ and $@t=5$?
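
Both kinds of questions can be answered with a small sketch. It assumes each commit is a dictionary from path to file content. The helper names `tree_diff` and `content_diff` are made up for this example, while `difflib.unified_diff` is from the Python standard library.

```python
import difflib
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]   # assumed representation: path -> file content


def tree_diff(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Compare two commits; return (added, removed, changed) paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed


def content_diff(old_text: str, new_text: str) -> List[str]:
    """Line-by-line unified diff between two versions of one file."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="old", tofile="new", lineterm="",
    ))


path = "/home/joe/assignment_1/main.c"
commits: List[Snapshot] = [
    {path: "int x = 1;\nreturn x;"},
    {path: "int x = 2;\nreturn x;", "/home/joe/README": "notes"},
]

added, removed, changed = tree_diff(commits[0], commits[1])
lines = content_diff(commits[0][path], commits[1][path])
```

A file version is addressed by a (commit, path) pair, here `commits[k][path]`, which is exactly the extra piece of information a plain path lacks.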

Undo

Undo discards the most recent batch of changes.
It deletes the latest commit from the timeline and makes the previous commit the new head.

       *            *
       |   undo     |
       *   ----->   *
       |            |
       *           (*)
       |      
      (*)     
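
On a single timeline stored as a list, undo amounts to dropping the last element. The sketch below assumes that representation. Refusing to undo the initial commit is a design choice made here, not something the notes prescribe.

```python
from typing import Dict, List

Snapshot = Dict[str, str]   # assumed representation: path -> file content


def undo(commits: List[Snapshot]) -> Snapshot:
    """Remove the head commit and return it; the previous commit becomes the head."""
    if len(commits) < 2:
        raise ValueError("nothing to undo: only the initial commit exists")
    return commits.pop()


timeline: List[Snapshot] = [{}, {"/a": "v1"}, {"/a": "v2"}]
discarded = undo(timeline)
```

After the call, the head is the commit that holds `"/a": "v1"`, and the discarded snapshot is no longer part of the timeline.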

Branches

Branches

Branching occurs when updates are made to a non-head commit.

 t=0    *
        |
        |
 t=1    *
        |
        |
 t=2    *
        |
        |
 t=3   (*)

Let's make changes to the commit $@t=1$ using the standard commands cp, mv, rm, ...

  1. We don't want to lose anything, so we cannot overwrite commit $@t=2$.
  2. We need to create a branch at $@t=1$.
  3. The new commit should not be $@t=4$ because it is not derived from $@t=3$. Let's call it $@t=1.1$ to indicate that it came from $@t=1$.

 t=0    *
        |
        |
 t=1    *
        |\
        | \
 t=2    * (*) t=1.1
        |
        |
 t=3   (*)
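
Once branches exist, a list is no longer enough: each commit needs to remember its parent. The sketch below keeps only the parent links, using the labels from the figure. The dictionary layout and the helper names are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Each commit label maps to the label of the commit it was derived from.
parents: Dict[str, Optional[str]] = {
    "t=0": None,
    "t=1": "t=0",
    "t=2": "t=1",
    "t=3": "t=2",
}


def commit_on(base: str, label: str) -> None:
    """Record a new commit derived from `base`, never overwriting an old one."""
    if label in parents:
        raise ValueError(f"commit {label} already exists")
    if base not in parents:
        raise KeyError(f"unknown base commit {base}")
    parents[label] = base


def children(label: str) -> List[str]:
    """All commits derived directly from `label`."""
    return sorted(c for c, p in parents.items() if p == label)


def heads() -> List[str]:
    """Commits that no other commit is derived from."""
    used = set(parents.values())
    return sorted(c for c in parents if c not in used)


# Updating the non-head commit t=1 creates a sibling of t=2.
commit_on("t=1", "t=1.1")
```

Commit `t=1` now has two children, which is what makes the history a tree with two heads rather than a single line.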

Working Branch

In most situations, we need to focus on one timeline at a time. This means that we focus on one working branch.

  1. Branches are given names.
  2. Branching is considered a temporary divergence from the main branch. Traditionally the main branch is called the master branch, while all other branches are named by the user.
  3. One of the branches is designated as the working branch.

 t=0    *
        |
        |
 t=1    *--
        |  \
        |   \
 t=2    *    *  t=1.1
        |    |
        |    |
 t=3   (*)  (*) t=1.2

    master  dev
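
Named branches and a working branch can be sketched as a mapping from branch name to head commit. The data layout, the `commit` helper, and the label `t=1.3` are hypothetical; the sketch only shows that a commit advances the head of the working branch and leaves the other branches alone.

```python
from typing import Dict, Optional

# Parent links for the commits in the figure.
parents: Dict[str, Optional[str]] = {
    "t=0": None, "t=1": "t=0", "t=2": "t=1", "t=3": "t=2",
    "t=1.1": "t=1", "t=1.2": "t=1.1",
}

# A branch name points at the head commit of its timeline.
branches: Dict[str, str] = {"master": "t=3", "dev": "t=1.2"}

# Exactly one branch is the working branch.
working = "dev"


def commit(label: str) -> None:
    """Add a commit on top of the working branch's head and move that head."""
    parents[label] = branches[working]
    branches[working] = label


commit("t=1.3")
```

After the commit, `dev` points at `t=1.3` while `master` still points at `t=3`.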

Merging Branches

Working with diverging branches is difficult. Eventually, we want to modify the heads of the branches so that they become identical. This is called a merge.

  1. During a merge, we need to resolve all differences between two commits. In Figure Branches, we need to resolve differences between $\mathrm{master}@t=3$ and $\mathrm{dev}@t=1.2$.

  2. A new commit needs to be created which will be an update of both $\mathrm{master}@t=3$ and $\mathrm{dev}@t=1.2$.

 t=0    *
        |
        |
 t=1    *--
        |  \
        |   \
 t=2    *    *  t=1.1
        |    |
        |    |
 t=3    *    *  t=1.2
        |   /
        |  /
 t=4    *--
    master  dev
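
One common way to resolve the differences is a three-way merge, which compares both heads against their common ancestor (here $@t=1$). The notes do not prescribe this technique. The sketch below is an assumption, works on flat path-to-content dictionaries, and only reports conflicts instead of resolving them.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]   # assumed representation: path -> file content


def merge(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """Three-way merge; returns (merged snapshot, conflicting paths)."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:          # both sides agree (including both deleted)
            pick = o
        elif o == b:        # only their side changed the file
            pick = t
        elif t == b:        # only our side changed the file
            pick = o
        else:               # both sides changed it differently
            conflicts.append(path)
            continue
        if pick is not None:    # None means the file was deleted
            merged[path] = pick
    return merged, conflicts


base = {"/a": "1", "/b": "x"}                    # common ancestor
master = {"/a": "2", "/b": "x"}                  # master changed /a
dev = {"/a": "1", "/b": "y", "/c": "new"}        # dev changed /b, added /c

merged, conflicts = merge(base, master, dev)
```

The merged snapshot would be committed with both heads as its parents. Any path listed in `conflicts` has to be resolved by hand first.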

Naming commits

  • So far, we have been naming commits by a fictitious timestamp (e.g. $t=1$, $t=2$, etc.).

  • While this makes sense in a single timeline, it quickly becomes confusing when we are dealing with multiple branches.

  • In practice, commits are named by a unique hash code generated from the filesystem tree associated with that commit.
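
A minimal sketch of hash-based naming, assuming the tree is a dictionary from path to content. The choice of SHA-256, the JSON serialisation, and including the parent names in the hash are all assumptions of this example; real systems differ in the details.

```python
import hashlib
import json
from typing import Dict, Sequence

Snapshot = Dict[str, str]   # assumed representation: path -> file content


def commit_id(tree: Snapshot, parents: Sequence[str] = ()) -> str:
    """Derive a commit name from the tree contents and the parent names."""
    # sort_keys makes the serialisation independent of insertion order.
    payload = json.dumps({"tree": tree, "parents": list(parents)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


root_id = commit_id({})
first_id = commit_id({"/a/b": "hello world"}, parents=[root_id])
```

Equal trees with equal parents always get the same name, and any change to a file's content yields a different name, so the name works across branches without relying on a timestamp.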