https://en.wikipedia.org/wiki/Versioning_file_system
A versioning file system is any computer file system which allows a computer file to exist in several versions at the same time. Thus it is a form of revision control.
- Wikipedia
We can think of a filesystem as a tree.
Each node has several attributes, known as its metadata.
Example
<root>
|
+-- a
| |
| +-- b "hello world"
| |
| +-- c "int main() { ... }"
|
+-- d "282349234345"
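The example tree can be sketched in Python as nested dictionaries. This representation is only an illustrative assumption: a directory is a dict from names to children, a file is a str holding its contents, and metadata is left out. The helper walk is hypothetical and simply lists the path of every file.

```python
from typing import Iterator, Union

# Assumption: a directory is a dict (name -> child) and a file is a str
# holding its contents. Metadata is omitted from this sketch.
Node = Union[dict, str]

fs: dict = {
    "a": {
        "b": "hello world",
        "c": "int main() { ... }",
    },
    "d": "282349234345",
}

def walk(node: Node, prefix: str = "") -> Iterator[str]:
    """Yield the path of every file below `node`, depth first, in sorted order."""
    if isinstance(node, str):  # a file: report its path
        yield prefix
        return
    for name in sorted(node):  # a directory: recurse into each child
        yield from walk(node[name], f"{prefix}/{name}")

for path in walk(fs):
    print(path)
# /a/b
# /a/c
# /d
```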
A versioned filesystem (VFS) extends a normal filesystem by allowing multiple versions over one or more timelines.
$t=0$
|
|
(*) --- $\mathbf{FS}(t=1)$ <root>
| |
| +-- a
|
|
|
(*) --- $\mathbf{FS}(t=2)$ <root>
| |
| +-- a
| | |
| | +-- b
| |
| +-- c
|
$\vdots$
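A single timeline can be sketched as a list of snapshots, where the index is the time t. The simplifications here are our own assumptions: each version $\mathbf{FS}(t)$ is stored as an immutable frozenset of paths (file contents omitted), and $\mathbf{FS}(t=0)$ is taken to be the empty filesystem.

```python
# timeline[t] holds the snapshot FS(t). Snapshots are frozensets of paths,
# so a version cannot be changed after it is recorded.
timeline: list[frozenset[str]] = [
    frozenset(),                      # FS(t=0): assumed empty
    frozenset({"/a"}),                # FS(t=1)
    frozenset({"/a", "/a/b", "/c"}),  # FS(t=2)
]

def fs_at(t: int) -> frozenset[str]:
    """Return the version of the filesystem at time t."""
    return timeline[t]

print(sorted(fs_at(1)))  # ['/a']
print(sorted(fs_at(2)))  # ['/a', '/a/b', '/c']
```

Because older snapshots are never overwritten, every earlier version stays reachable by its index.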
Technology is about trial and error. How do we reliably recover from mistakes?
Versioning allows reliable recovery to an older version of the FS.
Q: How often has assignment_1.py been modified?
Versioning tracks each version of the file, so we can easily look up the modification history of individual files.
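A minimal sketch of such a lookup, assuming each commit is a dict mapping paths to contents and commits are listed oldest first. The function name modification_history and the sample contents are hypothetical. It reports every commit in which the file was created, edited, or deleted.

```python
def modification_history(history: list[dict[str, str]], path: str) -> list[int]:
    """Return the commit indices t at which `path` changed.

    A change is a creation, an edit, or a deletion relative to the
    previous commit; None stands for "file absent".
    """
    changes = []
    previous = None  # the file does not exist before the first commit
    for t, snapshot in enumerate(history):
        current = snapshot.get(path)
        if current != previous:
            changes.append(t)
        previous = current
    return changes

history = [
    {"assignment_1.py": "print(1)"},                       # t=0: created
    {"assignment_1.py": "print(1)", "notes.txt": "todo"},  # t=1: untouched
    {"assignment_1.py": "print(2)"},                       # t=2: edited
    {"assignment_1.py": "print(3)"},                       # t=3: edited
]
print(modification_history(history, "assignment_1.py"))  # [0, 2, 3]
```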
Commit
Head
We have looked at the filesystem operations in UNIX. In this section, we will discuss the operations on versioned filesystems.
init
: Create a timeline with an initial commit.
timeline
=====> (*)
init |
|
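As a sketch only, init could be modeled like this, assuming a timeline is a Python list of commits (oldest first) and each commit is a read-only mapping from paths to contents, built with the standard library's types.MappingProxyType. Starting from an empty initial commit is our own assumption.

```python
from types import MappingProxyType

def init() -> list:
    """Create a new timeline holding a single initial commit."""
    initial_commit = MappingProxyType({})  # read-only, empty snapshot
    return [initial_commit]

timeline = init()
head = timeline[-1]   # the only commit is also the head
print(len(timeline))  # 1
```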
commit
: Create a new commit along the timeline.
As with a normal filesystem, we should be able to modify a particular version of the filesystem.
Let's focus on the latest commit, aka the head of the timeline.
Let $T_k$ be the head of the timeline. We can modify it as we would a normal UNIX filesystem, using one or more commands such as cp, rm, mv, etc. It is important to note that we do not change the FS $T_k$ itself; rather, the resulting FS forms a new head $T_{k+1}$.
t=0 * * t=0
| |
t=1 * * t=1
| |
t=2 (*) -----+ * t=2
| cp |
| rm |
| mv |
| vim commit |
+------> =======> (*) t=3
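The rule that $T_k$ is never changed can be sketched as follows. Assumptions: commits are read-only MappingProxyType snapshots of a path-to-contents dict, and edits are made on a mutable copy of the head. The file names are hypothetical, and each dict operation only imitates the UNIX command named in its comment.

```python
from types import MappingProxyType

def commit(timeline: list, working: dict) -> None:
    """Record a snapshot of `working` as the new head T_{k+1}.

    dict(working) copies the files, so later edits to `working`
    cannot leak into the stored commit, and T_k is left untouched.
    """
    timeline.append(MappingProxyType(dict(working)))

timeline = [MappingProxyType({"a.txt": "hello", "b.txt": "bye"})]  # head T_k

working = dict(timeline[-1])             # mutable copy of T_k
working["c.txt"] = working["a.txt"]      # like: cp a.txt c.txt
del working["b.txt"]                     # like: rm b.txt
working["d.txt"] = working.pop("a.txt")  # like: mv a.txt d.txt
working["d.txt"] = "hello world"         # like: editing d.txt in vim
commit(timeline, working)

print(dict(timeline[0]))   # {'a.txt': 'hello', 'b.txt': 'bye'}
print(dict(timeline[-1]))  # {'c.txt': 'hello', 'd.txt': 'hello world'}
```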
t=0 *
| +-----------+
| | working |
t=1 * | directory |
| +-----------+
| ↑
t=2 * |
| |
| checkout |
t=3 (*) ============+
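Reading the diagram as "checkout copies a commit into a mutable working directory", a minimal sketch might look like this. The representation (a list of read-only snapshots) and the sample file contents are assumptions for illustration.

```python
from types import MappingProxyType

def checkout(timeline: list, t: int) -> dict:
    """Copy commit t into a fresh, mutable working directory."""
    return dict(timeline[t])

# Hypothetical history: four commits of a single file.
timeline = [MappingProxyType({"main.c": f"version {t}"}) for t in range(4)]

working = checkout(timeline, 3)          # check out the head at t=3
working["main.c"] = "work in progress"   # edits touch only the working directory
print(timeline[3]["main.c"])             # version 3
```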
In a versioned FS, a path alone is no longer sufficient to uniquely identify a file; we also need to specify a commit. For example, the same path /home/joe/assignment_1/main.c can refer to a different version of the file in each commit.
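In code, this means a file is looked up by the pair (commit, path) instead of by path alone. The sketch assumes commits are stored in a list indexed by t, with index -1 standing for the latest commit; the file contents and the helper read_file are hypothetical.

```python
from types import MappingProxyType

PATH = "/home/joe/assignment_1/main.c"

# Hypothetical contents of the same path in three different commits.
timeline = [
    MappingProxyType({PATH: "int main() { return 2; }"}),  # t=0
    MappingProxyType({PATH: "int main() { return 1; }"}),  # t=1
    MappingProxyType({PATH: "int main() { return 0; }"}),  # t=2
]

def read_file(t: int, path: str) -> str:
    """Resolve a file by (commit, path); the path alone is ambiguous."""
    return timeline[t][path]

print(read_file(0, PATH))   # int main() { return 2; }
print(read_file(-1, PATH))  # int main() { return 0; }
```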
We should also be able to examine differences between different versions of files.
What is the difference between /home/joe/assignment_1/main.c$@\mathrm{latest}$ and /home/joe/assignment_1/main.c$@t=4$?
What is the difference in the tree structure between the filesystems $@t=4$ and $@t=5$?
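Both questions can be answered with a short sketch, assuming each commit is a dict from paths to contents. File contents are compared line by line with the standard library's difflib.unified_diff; tree structure is compared as set differences over paths. The helper names and the sample trees are hypothetical.

```python
import difflib

def file_diff(old: str, new: str, old_label: str, new_label: str) -> list[str]:
    """Line-by-line unified diff between two versions of one file."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))

def tree_diff(old: dict, new: dict) -> dict:
    """Paths added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }

fs4 = {"/a/b": "hello world", "/a/c": "int main() { return 1; }", "/d": "282349234345"}
fs5 = {"/a/b": "hello world", "/a/c": "int main() { return 0; }", "/e": "new file"}

print(tree_diff(fs4, fs5))
# {'added': ['/e'], 'removed': ['/d'], 'changed': ['/a/c']}
for line in file_diff(fs4["/a/c"], fs5["/a/c"], "/a/c@t=4", "/a/c@t=5"):
    print(line)
```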
Undo involves forgetting the most recent batch of changes.
It deletes the latest commit from the timeline and makes the previous commit the new head.
* *
| undo |
* -----> *
| |
* (*)
|
(*)
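A minimal sketch of undo, assuming the timeline is a Python list of commits with the head last. Refusing to undo the initial commit is our own assumption; the text does not say what should happen in that case.

```python
def undo(timeline: list) -> None:
    """Forget the latest commit; the previous commit becomes the head."""
    if len(timeline) < 2:
        raise ValueError("nothing to undo: only the initial commit remains")
    timeline.pop()

# Strings stand in for full snapshots here.
timeline = ["t=0", "t=1", "t=2", "t=3"]
undo(timeline)
print(timeline[-1])  # t=2
```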
Branching occurs when updates are made to a non-head commit.
t=0 *
|
|
t=1 *
|
|
t=2 *
|
|
t=3 (*)
Let's make changes to the commit $@t=1$ using the standard commands cp, mv, rm, ...
t=0 *
|
|
t=1 *
|\
| \
t=2 * (*) t=1.1
|
|
t=3 (*)
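Branching falls out naturally if every commit records its parent instead of living at a fixed list position. In this sketch (an assumption, not a prescribed design), the dict parents maps each commit name to its parent, and a head is any commit that is nobody's parent. Committing on top of t=1 then yields two heads, as in the diagram. The helper names are hypothetical.

```python
from typing import Optional

# Each commit name maps to its parent (None for the first commit).
parents: dict[str, Optional[str]] = {
    "t=0": None, "t=1": "t=0", "t=2": "t=1", "t=3": "t=2",
}

def commit_on(base: str, name: str) -> None:
    """Create commit `name` on top of `base`, which need not be a head."""
    parents[name] = base

def heads() -> list[str]:
    """A head is a commit that no other commit uses as its parent."""
    used = {p for p in parents.values() if p is not None}
    return sorted(c for c in parents if c not in used)

def ancestry(name: Optional[str]) -> list[str]:
    """Follow parent links from `name` back to the first commit."""
    chain = []
    while name is not None:
        chain.append(name)
        name = parents[name]
    return chain

commit_on("t=1", "t=1.1")  # update a non-head commit
print(heads())             # ['t=1.1', 't=3']
print(ancestry("t=1.1"))   # ['t=1.1', 't=1', 't=0']
```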
In most situations, we focus on one timeline at a time, that is, on one working branch. The default timeline is called the master branch, while all other branches are named by the user.
t=0 *
|
|
t=1 *--
| \
| \
t=2 * * t=1.1
| |
| |
t=3 (*) (*) t=1.2
master dev
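Branch names can be sketched as labels that point at head commits. The class Repo and its methods are hypothetical illustrations. The sketch replays the history in the figure: three commits on master, then a dev branch starting at t=1 with two commits of its own.

```python
from typing import Optional

class Repo:
    """Hypothetical repository: commits with parent links plus named branches."""

    def __init__(self) -> None:
        self.parents: dict[str, Optional[str]] = {"t=0": None}
        self.branches: dict[str, str] = {"master": "t=0"}  # branch -> head commit
        self.current = "master"                            # the working branch

    def commit(self, name: str) -> None:
        """Add a commit on the working branch and advance its head."""
        self.parents[name] = self.branches[self.current]
        self.branches[self.current] = name

    def branch(self, name: str, at: str) -> None:
        """Create a new branch whose head is the existing commit `at`."""
        self.branches[name] = at

    def switch(self, name: str) -> None:
        """Make `name` the working branch."""
        self.current = name

repo = Repo()
for t in ("t=1", "t=2", "t=3"):
    repo.commit(t)
repo.branch("dev", at="t=1")
repo.switch("dev")
repo.commit("t=1.1")
repo.commit("t=1.2")
print(repo.branches)  # {'master': 't=3', 'dev': 't=1.2'}
```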
Working with many branches is difficult. Eventually, we want to modify the heads of the branches so that they become identical. This is called a merge.
During a merge, we need to resolve all differences between two commits. In Figure Branches, we need to resolve differences between $\mathrm{master}@t=3$ and $\mathrm{dev}@t=1.2$.
To do so, we create a new commit that is an update of both $\mathrm{master}@t=3$ and $\mathrm{dev}@t=1.2$; that is, the new commit has both of them as parents.
t=0 *
|
|
t=1 *--
| \
| \
t=2 * * t=1.1
| |
| |
t=3 * * t=1.2
| /
| /
t=4 *--
master dev
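The notes do not fix a particular merge algorithm. One common strategy, which we assume here, is a three-way merge: compare both heads against their common ancestor ($t=1$ in the figure) and take whichever side changed a path. If both sides changed the same path differently, the difference cannot be resolved automatically and is reported as a conflict. Trees are assumed to be dicts from paths to contents, and the sample trees are hypothetical.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict) -> tuple[dict, list[str]]:
    """Merge two trees against their common ancestor `base`.

    A missing path means the file is absent. Returns the merged tree
    and the list of conflicting paths (provisionally resolved to `ours`).
    """
    merged, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (this includes both deleting it)
            result = o
        elif o == b:      # only `theirs` changed this path
            result = t
        elif t == b:      # only `ours` changed this path
            result = o
        else:             # both changed it differently: a conflict
            conflicts.append(path)
            result = o
        if result is not None:
            merged[path] = result
    return merged, conflicts

base = {"README": "v1", "main.c": "v1"}                                # t=1
master = {"README": "v1", "main.c": "v2-master", "Makefile": "all:"}  # t=3
dev = {"README": "v1-dev", "main.c": "v2-dev"}                         # t=1.2

merged, conflicts = three_way_merge(base, master, dev)
print(merged)     # {'Makefile': 'all:', 'README': 'v1-dev', 'main.c': 'v2-master'}
print(conflicts)  # ['main.c']
```

After the conflict in main.c is resolved by hand, the merged tree would be recorded as the new commit $@t=4$ with both $\mathrm{master}@t=3$ and $\mathrm{dev}@t=1.2$ as its parents.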
So far, we have been naming commits by a fictitious timestamp (e.g. $t=1$, $t=2$ etc).
While this makes sense in a single timeline, it quickly becomes confusing when we are dealing with multiple branches.
In practice, commits are named by a unique hash code computed from the filesystem tree associated with that commit.
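A minimal sketch of naming a commit by a hash of its tree, assuming the tree is a dict from paths to contents. We serialize it canonically (sorted keys) with json and hash it with SHA-256 from hashlib; both choices are ours, not a prescribed format. Real systems differ in detail; Git, for instance, also folds the parent commits, author, and message into a commit's hash.

```python
import hashlib
import json

def tree_hash(tree: dict[str, str]) -> str:
    """Hex SHA-256 digest of a tree, independent of dict insertion order."""
    canonical = json.dumps(tree, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"/a/b": "hello world", "/d": "282349234345"}
b = {"/d": "282349234345", "/a/b": "hello world"}   # same tree, other order
c = {"/a/b": "hello world!", "/d": "282349234345"}  # one file edited

print(tree_hash(a) == tree_hash(b))  # True
print(tree_hash(a) == tree_hash(c))  # False
print(tree_hash(a)[:12])             # a short prefix, handy as a commit name
```

One consequence of hashing the tree alone is that two commits with identical contents would receive the same name, which is one reason real systems also include the parents in the hash.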