Data Transformation

(require '[clojure.pprint :refer [pprint]])
nil

Immutable data structures

Lists as data structure

'(1 2 3)
(1 2 3)
(quote (1 2 3))
(1 2 3)
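
To illustrate why the quote matters, the following sketch contrasts a quoted list with one built by the list function. The values are illustrative only.

```clojure
;; Quoting stops evaluation, so the inner form stays as data.
'(1 (+ 1 1) 3)
;; => (1 (+ 1 1) 3)

;; list is an ordinary function: its arguments are evaluated first.
(list 1 (+ 1 1) 3)
;; => (1 2 3)

;; Both forms produce the same list when every element is a literal.
(= '(1 2 3) (list 1 2 3))
;; => true
```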

Lists are immutable

(def courses '("csci 3055u"
               "csci 4050u"
               "csci 3050u"
               "csci 4020u"
               "csci 3020u"))
#'user/courses
  • Once a list is constructed, it cannot be changed. Such a structure is called an immutable data structure.

  • Functional programming promotes immutability because immutable programs are much easier to reason about.

  • In-place updates are replaced by data transformations: each operation returns a new collection and leaves the original untouched.

Conjoin (conj) is an operation that adds a new element to a collection at the
position that is most efficient for that collection type. For a list, that
position is the head.

```clojure
(conj <collection> <element>)
```
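
What "most efficient" means depends on the collection type. The sketch below, using small illustrative collections, shows where conj places the new element in a list, a vector, and a set.

```clojure
;; For a list, the cheapest place to add is the head.
(conj '(1 2 3) 0)
;; => (0 1 2 3)

;; For a vector, the cheapest place to add is the tail.
(conj [1 2 3] 4)
;; => [1 2 3 4]

;; For a set, position has no meaning, and an existing element is not duplicated.
(= (conj #{1 2 3} 3) #{1 2 3})
;; => true
```
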
;;
;; Add another course to the list in the most efficient way.
;;
(conj courses "csci 2000u")
("csci 2000u" "csci 3055u" "csci 4050u" "csci 3050u" "csci 4020u" "csci 3020u")
;;
;; Add a course to the list, but use cons.
;;
(cons "csci 2000u" courses)
("csci 2000u" "csci 3055u" "csci 4050u" "csci 3050u" "csci 4020u" "csci 3020u")
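
Because lists are immutable, neither conj nor cons changes the list it is given. A minimal sketch, using a shortened course list defined only for this example:

```clojure
;; A shortened, illustrative course list.
(def courses '("csci 3055u" "csci 4050u"))

(conj courses "csci 2000u")
;; => ("csci 2000u" "csci 3055u" "csci 4050u")

(cons "csci 2000u" courses)
;; => ("csci 2000u" "csci 3055u" "csci 4050u")

;; The original list still has exactly the elements it was defined with.
courses
;; => ("csci 3055u" "csci 4050u")
```

To keep the extended list, bind the result to a new name with def or let.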

Drop elements from the list

;;
;; Drop the first course in the list
;;
(drop 1 courses)
("csci 4050u" "csci 3050u" "csci 4020u" "csci 3020u")
;;
;; Drop the last course in the list.
;; Warning: linear complexity
;;
(drop-last 1 courses)
("csci 3055u" "csci 4050u" "csci 3050u" "csci 4020u")

Sorting the list

;;
;; Sort the courses
;;
(sort courses)
("csci 3020u" "csci 3050u" "csci 3055u" "csci 4020u" "csci 4050u")
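
sort also accepts a comparator as its first argument. The following is a small sketch of a descending sort, using a shortened course list that is an assumption of this example:

```clojure
;; A shortened, illustrative course list.
(def courses '("csci 3055u" "csci 4050u" "csci 3050u"))

;; Swap the arguments to compare to reverse the order.
(sort (fn [a b] (compare b a)) courses)
;; => ("csci 4050u" "csci 3055u" "csci 3050u")

;; sort returns a new sequence; the original order is preserved.
courses
;; => ("csci 3055u" "csci 4050u" "csci 3050u")
```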

Composition of transformations

;;
;; composed data transformation
;;
;; - add a course
;; - sort the list
;;
(sort (conj courses "csci 2000u"))
("csci 2000u" "csci 3020u" "csci 3050u" "csci 3055u" "csci 4020u" "csci 4050u")
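
Nested calls read inside-out, which gets harder to follow as more steps are added. One common alternative is the thread-first macro ->, sketched here on a shortened course list that is an assumption of this example:

```clojure
;; A shortened, illustrative course list.
(def courses '("csci 3055u" "csci 4050u" "csci 3050u"))

;; Nested form: read from the innermost call outwards.
(sort (conj courses "csci 2000u"))
;; => ("csci 2000u" "csci 3050u" "csci 3055u" "csci 4050u")

;; Threaded form: each step receives the previous result as its first argument.
(-> courses
    (conj "csci 2000u")
    sort)
;; => ("csci 2000u" "csci 3050u" "csci 3055u" "csci 4050u")
```

Both forms expand to the same expression, so the choice is purely about readability.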

Getting elements

;;
;; getting the first course
;;
(first courses)
"csci 3055u"
;;
;; getting a specific element from the list
;;
(nth courses 4)
"csci 3020u"
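
nth is zero-based, and it throws an exception when the index is out of range unless a fallback value is supplied. A brief sketch with an illustrative three-element list:

```clojure
;; An illustrative list, not the course list above.
(def items '("a" "b" "c"))

;; Indexing starts at 0.
(nth items 0)
;; => "a"

;; A third argument is returned when the index is out of range.
(nth items 10 :not-found)
;; => :not-found

;; second and last are shortcuts for common positions.
(second items)
;; => "b"
(last items)
;; => "c"
```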

Counting

(count courses)
5

Merging collections

(concat courses ["hello" "world"])
("csci 3055u" "csci 4050u" "csci 3050u" "csci 4020u" "csci 3020u" "hello" "world")

into adds every element of a collection to a list in the most efficient way, one conj at a time. Each element is therefore added at the head of the list, so the new elements end up in reverse order.

(into courses ["hello" "world"])
("world" "hello" "csci 3055u" "csci 4050u" "csci 3050u" "csci 4020u" "csci 3020u")
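
The reversed order follows from into behaving like repeated conj. The sketch below, on a shortened course list assumed for this example, compares into with reduce conj and shows the same call on a vector target:

```clojure
;; A shortened, illustrative course list.
(def courses '("csci 3055u" "csci 4050u"))

;; Each element is conj-ed onto the head, so "world" ends up first.
(into courses ["hello" "world"])
;; => ("world" "hello" "csci 3055u" "csci 4050u")

;; The same result, spelled out as a reduction with conj.
(reduce conj courses ["hello" "world"])
;; => ("world" "hello" "csci 3055u" "csci 4050u")

;; With a vector as the target, conj appends, so the order is kept.
(into ["csci 3055u" "csci 4050u"] ["hello" "world"])
;; => ["csci 3055u" "csci 4050u" "hello" "world"]
```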

Hash-map data structure

Modeling using hash-map

(def kens-teaching {"csci 3055u" "Programming Languages"
                    "csci 4050u" "Machine Learning"
                    "csci 3050u" "Database Systems"
                    "csci 4020u" "Compilers"
                    "csci 3020u" "Algorithms"})
#'user/kens-teaching

Retrieving the value

;;
;; gets the course title of an existing course
;;
(get kens-teaching "csci 3050u")
"Database Systems"
;;
;; get the course title of a non-existing course
;;
(get kens-teaching "csci xxxxu")
nil
;;
;; a hash-map is also a function of its keys
;;
(kens-teaching "csci 3055u")
"Programming Languages"
(kens-teaching "csci xxxxu")
nil
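
A nil result cannot distinguish a missing key from a key whose value is nil. Two common remedies, sketched on a one-entry map assumed for this example, are a default value and contains?:

```clojure
;; A one-entry, illustrative map.
(def teaching {"csci 3055u" "Programming Languages"})

;; get accepts a default that is returned when the key is absent.
(get teaching "csci xxxxu" "Unknown course")
;; => "Unknown course"

;; The map-as-function form accepts the same default.
(teaching "csci xxxxu" "Unknown course")
;; => "Unknown course"

;; contains? tests for the presence of a key.
(contains? teaching "csci 3055u")
;; => true
(contains? teaching "csci xxxxu")
;; => false
```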

Updating an existing key

(pprint (assoc kens-teaching "csci 4050u" "Machine Learning: Theory and Practice"))
{"csci 3055u" "Programming Languages",
 "csci 4050u" "Machine Learning: Theory and Practice",
 "csci 3050u" "Database Systems",
 "csci 4020u" "Compilers",
 "csci 3020u" "Algorithms"}
nil

Adding a new entry

(pprint (assoc kens-teaching "csci 2000u" "Scientific Data Analysis"))
{"csci 3055u" "Programming Languages",
 "csci 4050u" "Machine Learning",
 "csci 3050u" "Database Systems",
 "csci 4020u" "Compilers",
 "csci 3020u" "Algorithms",
 "csci 2000u" "Scientific Data Analysis"}
nil

Deleting an entry

(pprint (dissoc kens-teaching "csci 3020u"))
{"csci 3055u" "Programming Languages",
 "csci 4050u" "Machine Learning",
 "csci 3050u" "Database Systems",
 "csci 4020u" "Compilers"}
nil
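
As with lists, assoc and dissoc return new maps and leave the original unchanged. A minimal sketch on a two-entry map assumed for this example:

```clojure
;; A two-entry, illustrative map.
(def teaching {"csci 4020u" "Compilers"
               "csci 3020u" "Algorithms"})

(def with-new (assoc teaching "csci 2000u" "Scientific Data Analysis"))
(def without-algorithms (dissoc teaching "csci 3020u"))

;; Each result is a separate value.
(count with-new)
;; => 3
(count without-algorithms)
;; => 1

;; The original map still has both of its entries.
(count teaching)
;; => 2

;; Removing a key that is not present returns an equal map.
(= teaching (dissoc teaching "csci xxxxu"))
;; => true
```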

Updating the value of a key

The update syntax is functional: it expects a function, followed by any additional arguments to that function.

(update <hash-map> <key> <update-fn> <rest-of-arguments...>)

This updates the value associated with the <key> with the return value of

(<update-fn> <old-value> <rest-of-arguments...>)
;;
;; The updated hashmap will have:
;;
;; {
;;   ...
;;   "csci 4050u"  (str "Machine Learning" ": Theory and Applications")
;;   ...
;; }
;;
(pprint (update kens-teaching "csci 4050u" str ": Theory and Applications"))
{"csci 3055u" "Programming Languages",
 "csci 4050u" "Machine Learning: Theory and Applications",
 "csci 3050u" "Database Systems",
 "csci 4020u" "Compilers",
 "csci 3020u" "Algorithms"}
nil
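
The same pattern works with any function, not just str. The sketch below uses a hypothetical enrolment map (the name and the numbers are assumptions of this example) and also shows what happens when the key is missing:

```clojure
;; Hypothetical data: course code -> number of students.
(def enrolment {"csci 3055u" 40})

;; Evaluates (+ 40 5) and stores the result under the key.
(update enrolment "csci 3055u" + 5)
;; => {"csci 3055u" 45}

;; With no extra arguments, the function receives only the old value.
(update enrolment "csci 3055u" inc)
;; => {"csci 3055u" 41}

;; For a missing key the old value is nil; fnil substitutes 0 before inc runs.
(update enrolment "csci 4050u" (fnil inc 0))
;; => {"csci 3055u" 40, "csci 4050u" 1}
```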

The convenience of Clojure

(pprint
 (assoc kens-teaching 
       "math 1000u" "Calculus"
       "math 2000u" "Linear Algebra"
       "math 3000u" "Optimization"))
{"csci 3055u" "Programming Languages",
 "csci 4050u" "Machine Learning",
 "csci 3050u" "Database Systems",
 "csci 4020u" "Compilers",
 "csci 3020u" "Algorithms",
 "math 1000u" "Calculus",
 "math 2000u" "Linear Algebra",
 "math 3000u" "Optimization"}
nil
(pprint
 (dissoc kens-teaching
        "csci 3020u"
        "csci 3050u"
        "csci 3055u"))
{"csci 4050u" "Machine Learning", "csci 4020u" "Compilers"}
nil
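
Two related core functions cover the same ground from a different angle: merge adds the entries of another map, and select-keys keeps only the listed keys instead of removing them. A sketch on a three-entry map assumed for this example:

```clojure
;; A three-entry, illustrative map.
(def teaching {"csci 4050u" "Machine Learning"
               "csci 4020u" "Compilers"
               "csci 3020u" "Algorithms"})

;; merge gives the same result as assoc with several key/value pairs.
(= (merge teaching {"math 1000u" "Calculus"
                    "math 2000u" "Linear Algebra"})
   (assoc teaching
          "math 1000u" "Calculus"
          "math 2000u" "Linear Algebra"))
;; => true

;; select-keys names the entries to keep rather than the ones to drop.
(select-keys teaching ["csci 4050u" "csci 4020u"])
;; => {"csci 4050u" "Machine Learning", "csci 4020u" "Compilers"}
```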

Hash-maps are list-like

(first kens-teaching)
["csci 3055u" "Programming Languages"]
(drop 3 kens-teaching)
(["csci 4020u" "Compilers"] ["csci 3020u" "Algorithms"])
(count kens-teaching)
5
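
Since a hash-map can be treated as a sequence of key/value pairs, the usual sequence functions apply to it. The sketch below uses a two-entry map assumed for this example. Small maps like this one happen to keep insertion order, but the order of entries in a hash-map should not be relied on in general.

```clojure
;; A two-entry, illustrative map.
(def teaching {"csci 4020u" "Compilers"
               "csci 3020u" "Algorithms"})

;; keys and vals return the two halves of the entries.
(keys teaching)
;; => ("csci 4020u" "csci 3020u")
(vals teaching)
;; => ("Compilers" "Algorithms")

;; Each entry can be destructured as a [key value] pair.
(map (fn [[code title]] (str code ": " title)) teaching)
;; => ("csci 4020u: Compilers" "csci 3020u: Algorithms")

;; into {} turns a sequence of pairs back into a map.
(into {} (filter (fn [[code _]] (= code "csci 4020u")) teaching))
;; => {"csci 4020u" "Compilers"}
```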

Nested data modeling

(def kens-teaching-history
    {"csci 3055u" {:title "Programming Languages"
                   :years '(2021 2020 2019 2018 2017)}
     "csci 4050u" {:title "Machine Learning"
                   :years '(2021 2020 2019)}
     "csci 3050u" {:title "Database Systems"
                   :year '(2015 2016)}
     "csci 4020u" {:title "Compilers"
                   :years '(2020 2019 2018)}
     "csci 3020u" {:title "Algorithms"
                   :years '(2015)}})
#'user/kens-teaching-history

Decomposition

;;
;; title of this course.
;;

(get (get kens-teaching-history "csci 3055u") :title)
"Programming Languages"
;;
;; using hash-map as functions
;;
((kens-teaching-history "csci 3055u") :title)
"Programming Languages"
;;
;; Clojure supports lookup by path
;;
(get-in kens-teaching-history ["csci 3055u" :title])
"Programming Languages"
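
get-in returns nil as soon as any step of the path is missing, and it accepts a default just like get. A short sketch on a one-course history map assumed for this example, which also shows that keywords can be called as functions:

```clojure
;; A one-course, illustrative history map.
(def history {"csci 3055u" {:title "Programming Languages"
                            :years '(2021 2020)}})

(get-in history ["csci 3055u" :years])
;; => (2021 2020)

;; A missing step anywhere in the path yields nil rather than an error.
(get-in history ["csci xxxxu" :title])
;; => nil

;; An optional third argument supplies a default.
(get-in history ["csci xxxxu" :title] "Unknown course")
;; => "Unknown course"

;; A keyword looks itself up in the map it is applied to.
(:title (history "csci 3055u"))
;; => "Programming Languages"
```
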
;;
;; adding a new course
;;
(pprint (assoc kens-teaching-history "csci 2000u" {:title "Scientific Data Analysis"
                                                   :years nil}))
{"csci 3055u"
 {:title "Programming Languages", :years (2021 2020 2019 2018 2017)},
 "csci 4050u" {:title "Machine Learning", :years (2021 2020 2019)},
 "csci 3050u" {:title "Database Systems", :year (2015 2016)},
 "csci 4020u" {:title "Compilers", :years (2020 2019 2018)},
 "csci 3020u" {:title "Algorithms", :years (2015)},
 "csci 2000u" {:title "Scientific Data Analysis", :years nil}}
nil
;;
;; updating a course
;;
(pprint (update kens-teaching-history
                "csci 3020u"
                update
                :years
                conj 2022))
{"csci 3055u"
 {:title "Programming Languages", :years (2021 2020 2019 2018 2017)},
 "csci 4050u" {:title "Machine Learning", :years (2021 2020 2019)},
 "csci 3050u" {:title "Database Systems", :year (2015 2016)},
 "csci 4020u" {:title "Compilers", :years (2020 2019 2018)},
 "csci 3020u" {:title "Algorithms", :years (2022 2015)}}
nil
;;
;; deep update
;;
(pprint (update-in kens-teaching-history
                   ["csci 3020u" :years]
                   conj
                   2022 2023 2024))
{"csci 3055u"
 {:title "Programming Languages", :years (2021 2020 2019 2018 2017)},
 "csci 4050u" {:title "Machine Learning", :years (2021 2020 2019)},
 "csci 3050u" {:title "Database Systems", :year (2015 2016)},
 "csci 4020u" {:title "Compilers", :years (2020 2019 2018)},
 "csci 3020u" {:title "Algorithms", :years (2024 2023 2022 2015)}}
nil
;;
;; keep only the two most recent years for "csci 3055u"
;; using deep update and an anonymous function
;;
(pprint (update-in kens-teaching-history
                   ["csci 3055u" :years]
                   (fn [old-years] (take 2 old-years))))
{"csci 3055u" {:title "Programming Languages", :years (2021 2020)},
 "csci 4050u" {:title "Machine Learning", :years (2021 2020 2019)},
 "csci 3050u" {:title "Database Systems", :year (2015 2016)},
 "csci 4020u" {:title "Compilers", :years (2020 2019 2018)},
 "csci 3020u" {:title "Algorithms", :years (2015)}}
nil
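
When the new value does not depend on the old one, assoc-in is the path-based counterpart of assoc. The sketch below, on a one-course history map assumed for this example, also shows the #(...) shorthand for anonymous functions and how several nested changes can be chained. The replacement title is made up for illustration.

```clojure
;; A one-course, illustrative history map.
(def history {"csci 3020u" {:title "Algorithms"
                            :years '(2015)}})

;; assoc-in replaces the value at the end of the path.
(assoc-in history ["csci 3020u" :title] "Algorithms and Data Structures")
;; => {"csci 3020u" {:title "Algorithms and Data Structures", :years (2015)}}

;; #(take 1 %) is shorthand for (fn [old-years] (take 1 old-years)).
(update-in history ["csci 3020u" :years] #(take 1 %))
;; => {"csci 3020u" {:title "Algorithms", :years (2015)}}

;; Several nested transformations can be threaded one after another.
(-> history
    (assoc-in ["csci 3020u" :title] "Algorithms and Data Structures")
    (update-in ["csci 3020u" :years] conj 2022))
;; => {"csci 3020u" {:title "Algorithms and Data Structures", :years (2022 2015)}}
```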