{:check ["true"], :rank ["intro_to_filesystem" "abstraction" "paths" "absolute_path" "relative_path" "directory_file" "shell"]}
In this section, we will be exploring the working environment of the UNIX system.
In particular, we will introduce the basics of its filesystem and how we can interact with it through a command-line interface known as the SHELL.
"In Unix and operating systems inspired by it, the file system is considered a central component of the operating system."
Wikipedia, https://en.wikipedia.org/wiki/Unix_filesystem
A filesystem is a component of an operating system that is responsible for storing and managing data in persistent storage.
Many modern filesystems borrow ideas from the original UNIX filesystem. Google Cloud Storage buckets and Amazon S3 are cloud-based solutions for Internet-scale data storage, and they too build on principles we discuss in this course.
We will cover the abstraction behind filesystems, the concept of paths, and finally the UNIX commands that help us navigate the UNIX filesystem.
The filesystem is a tree of nodes.
There are a few exceptions where the filesystem is not strictly a tree.
Here is a tree with nodes: $A, B, C, \dots, J$
A
├── B
│   ├── C
│   ├── D
│   ├── E
│   └── F
│       ├── G
│       └── H
├── I
│   └──
└── J
Nodes $A, B, F, I$ are intermediate. They are nodes that can have children.
Note: $I$ is an intermediate node with no children.
Nodes $C, D, E, G, H, J$ are leaf nodes.
Each node has information known as its metadata. At a minimum, each node has a name. In the example above, all the names are unique, but this is not necessarily so in general.
Consider the following tree:
<root>
├── A
│   ├── B
│   └── A
│       ├── B
│       └── A
└── B
    └──
We note the following properties:
The nodes are labeled only by either `A` or `B`.
Important: The node label is not sufficient to uniquely identify the node.
- There are three distinct nodes all labeled `A`.
- There are three distinct nodes all labeled `B`.
Definition: Path
A path is a sequence of node labels that can be used to identify one or more nodes in a tree.
Definition: Absolute Path
An absolute path is a sequence of node labels from the root, separated by `/`.
Absolute paths are used to identify nodes in a tree. The node corresponding to an absolute path is called the location of the path, and we say the path resolves to that location.
Example: Absolute Path
<root> (0)
├── A (1)
│   ├── B (2)
│   └── A (3)
│       ├── B (4)
│       └── A (5)
└── B (6)
    └──
Here are examples of absolute paths.
- `/A/A/B` resolves to location (4).
- `/` resolves to the root node (0).
It's possible that an absolute path does not resolve to any location in the tree.
Example: the following absolute paths do not resolve to any location.
- `/C` (the root has no child labeled `C`)
- `/A/B/A` (node (2) has no children)
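The resolution rule can be tried out on a real filesystem. The following is only a sketch under some assumptions: a temporary directory (held in the hypothetical variable `root`) stands in for the root node, every node is created as a directory, and `resolves` is a made-up helper name rather than a standard command.

```shell
#!/bin/sh
# Scratch directory standing in for <root> (an assumption for this sketch).
root=$(mktemp -d)

# Build the example tree; every node is created as a directory.
mkdir -p "$root/A/B" "$root/A/A/B" "$root/A/A/A" "$root/B"

# Hypothetical helper: report whether an absolute path resolves to a node.
resolves() {
    if [ -e "$root$1" ]; then
        echo "$1 resolves"
    else
        echo "$1 does not resolve"
    fi
}

resolves /A/A/B
resolves /
resolves /C
resolves /A/B/A
```

The first two calls should report paths that resolve, and the last two should report paths that do not. The scratch directory can be deleted with `rm -r "$root"` afterwards.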
We want to describe situations where one starts navigating a tree from some initial node that is not necessarily the root.
Definition: Current Location
Given a tree, the current location is any intermediate node in the tree that is the starting point of navigation.
Example: consider that the current location is `/A/A`.
<root> (0)
├── A (1)
│   ├── B (2)
│   └── A (3) <-- current location
│       ├── B (4)
│       └── A (5)
└── B (6)
    └──
A relative path is a path that describes the navigation relative to a current location. A relative path is a sequence of steps separated by `/`.
Each step can be one of the following:
- a node label: traverse to the child node with that label.
- `..`: traverse to the parent node.
- `.`: stay in the current node.
Example:
<root> (0)
├── A (1)
│   ├── B (2)
│   └── A (3) <-- current location
│       ├── B (4)
│       └── A (5)
└── B (6)
    └──
Current location | Relative Path | Final location |
---|---|---|
(3) | `.` | (3) |
(3) | `..` | (1) |
(3) | `../..` | (0) |
(3) | `./B` | (4) |
(3) | `./B/../A` | (5) |
(3) | `../../B` | (6) |
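The rows of this table can be reproduced with `cd` and `pwd`. This is a rough sketch, assuming that a temporary directory (the hypothetical variable `root`) stands in for the root node and that every node is a directory so `cd` can enter it. The helper name `follow` is made up for illustration.

```shell
#!/bin/sh
# Scratch directory standing in for <root>; -P gives its physical path.
root=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$root/A/B" "$root/A/A/B" "$root/A/A/A" "$root/B"

# Hypothetical helper: start at node (3), i.e. /A/A, follow a relative
# path, and print the final location as a path from the root.
follow() {
    final=$(cd "$root/A/A" && cd "$1" && pwd -P) || return 1
    final=${final#"$root"}    # drop the scratch prefix
    echo "${final:-/}"        # an empty remainder means the root itself
}

for rel in . .. ../.. ./B ./B/../A ../../B; do
    echo "$rel -> $(follow "$rel")"
done
```

The printed locations `/A/A`, `/A`, `/`, `/A/A/B`, `/A/A/A`, and `/B` correspond to nodes (3), (1), (0), (4), (5), and (6) respectively.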
The UNIX filesystem is (almost) a tree. The intermediate nodes are called directories, and the leaf nodes are called files.
Note:
- Sometimes, directories are also referred to as folders.
- Some programming languages (such as Java and Python) use the term file to refer to a node in the filesystem.
Directories are used to store zero or more sub-directories and files. Using directories one can form any tree structure.
Files are the actual units of data storage. Images, program source code, and data are all stored as files.
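As an illustration, the first example tree (nodes $A$ through $J$) can be built from directories and files. This is a sketch under assumptions: it works inside a temporary directory (the hypothetical variable `scratch`), it treats $J$ as a leaf, and the leaf files are left empty.

```shell
#!/bin/sh
# Scratch directory to hold the example (an assumption for this sketch).
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Intermediate nodes A, B, F, I become directories.
mkdir -p A/B/F A/I

# Leaf nodes C, D, E, G, H, J become (empty) files.
touch A/B/C A/B/D A/B/E A/B/F/G A/B/F/H A/J

# List directories and files separately, sorted for stable output.
dirs=$(find A -type d | sort)
files=$(find A -type f | sort)
echo "Directories:"; echo "$dirs"
echo "Files:";       echo "$files"
```

Note that `A/I` shows up as a directory even though it is empty, matching the earlier remark that an intermediate node may have no children.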
The SHELL is a program that allows the user to interact with the UNIX system.
+------+ command +-------+ +-----------+
| user | ----------> | SHELL | <--> | Operating |
| | | | | System |
| | output | | | |
| | <--------- | | | |
| | | | +-----------+
+------+ +-------+
At all times, the SHELL tracks a current location known as the present working directory, which can be examined using the `pwd` command. There is also an environment variable `$PWD` that tracks the present working directory.
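A short sketch of the two ways to inspect the working directory follows. The temporary directory (the hypothetical variable `scratch`) is only there to give the example a known starting point.

```shell
#!/bin/sh
# Move to a scratch directory, then inspect the working directory two ways.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

pwd            # ask the shell to print the present working directory
echo "$PWD"    # read the same information from the environment variable

# Moving to the parent directory updates both.
cd ..
pwd
echo "$PWD"
```

Each pair of lines printed should be identical, since `pwd` and `$PWD` report the same location.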
# List the current directory
ls
# List directory elsewhere
ls <path>
# Change the working directory; path can be relative or absolute
cd <path>
# Creating a directory
mkdir <path>
The parent directory of `<path>` must already exist.
# Moving file to new location
mv <old_path> <new_path>
This moves the file at `<old_path>` to a new location given by `<new_path>`.
# Remove files given by path
rm <path> <path> ...
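The commands above can be combined into a short session. The following is a sketch rather than a recipe: it runs inside a temporary directory (the hypothetical variable `scratch`), the names `notes`, `missing`, `draft.txt`, and `final.txt` are invented for illustration, and it uses `touch`, which is not covered above, only to create an empty file.

```shell
#!/bin/sh
# Work in a scratch directory so nothing else is touched (an assumption).
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mkdir notes                 # works: the parent (the scratch directory) exists
# The next mkdir is expected to fail, since the parent "missing" does not exist.
mkdir missing/child 2>/dev/null || echo "mkdir failed: parent does not exist"

touch notes/draft.txt       # create an empty file to work with
mv notes/draft.txt notes/final.txt   # move (rename) the file
ls notes                    # lists: final.txt

cd notes                    # relative path
rm final.txt                # remove the file
cd "$scratch"               # absolute path
ls notes                    # prints nothing: the directory is now empty
```

Working in a scratch directory is a deliberate choice here, because `rm` deletes files without asking for confirmation.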