{:draft ["true"]}

Index

Programming with Sequences

Seq

Everything is a sequence

Clojure's concept of seq applies to a surpriingly wide range of data types.

  • List
  • Vector
  • Hash-map
  • Regular expression matching
  • File I/O
  • Infinite streams

Lazy Sequences

In particular, Clojure provides facility of lazy evaluation.

A lazy expression is an expression that is not evaluated until it is needed.

Clojure's support for lazy sequences allows one to build representation of infinite sequences using lazy list construction.

An eager sequence builder

First, let's look at an example of a normal sequence.

In [50]:
(defn slow-list-builder [n]
    (if (zero? n) 
        nil
        (do (Thread/sleep 500)
            (println "conj" n)
            (conj (slow-list-builder (dec n)) n))))
Out[50]:
#'user/slow-list-builder
In [51]:
(def my-list (slow-list-builder 20))
conj 20
conj 19
conj 18
conj 17
conj 16
conj 15
conj 14
conj 13
conj 12
conj 11
conj 10
conj 9
conj 8
conj 7
conj 6
conj 5
conj 4
conj 3
conj 2
conj 1
Out[51]:
#'user/my-list
In [52]:
(time (take 3 my-list))
"Elapsed time: 0.022789 msecs"
Out[52]:
(20 19 18)

We say that slow-list-builder is an eager function because it has built the entire list from 1 to n regardless of how many elements are required.

A lazy sequence builder

  • If we might only need a few elements, we can build a lazy sequence, thus potentially saving us a lot of computational cost.
  • Instead of (conj xs element), we protect it with the lazy-seq form:
(lazy-seq (conj <list-expr> element))

This will ensure that <list-expr> is evaluated only when more elements from it are needed by some computation.

In [64]:
(defn lazy-list-builder [n]
    (if (zero? n) 
        nil
        (do (Thread/sleep 500)
            (println (str ">>" n "<<"))
            (lazy-seq (cons n (lazy-list-builder (dec n)))))))
Out[64]:
#'user/lazy-list-builder
In [65]:
(time (def my-list (lazy-list-builder 10)))
>>10<<
"Elapsed time: 501.305758 msecs"
Out[65]:
#'user/my-list

Note:

  • It only took 100ms to build the "entire" list.
  • This is because it only built one element. The rest are in a lazy expression, and will only be evaluated if we need more elements.
In [66]:
(take 3 my-list)
Out[66]:
(>>9<<
>>8<<
10 >>7<<
9 8)
In [67]:
(take 4 my-list)
Out[67]:
(10 9 >>6<<
8 7)

for is partially lazy

In [102]:
(take 3 (for [i (range 10000)]
            (do (println (str ">>" i "<<"))
                i)))
Out[102]:
(>>0<<
>>1<<
>>2<<
>>3<<
>>4<<
>>5<<
>>6<<
>>7<<
>>8<<
>>9<<
>>10<<
>>11<<
>>12<<
>>13<<
>>14<<
>>15<<
>>16<<
>>17<<
>>18<<
>>19<<
>>20<<
>>21<<
>>22<<
>>23<<
>>24<<
>>25<<
>>26<<
>>27<<
>>28<<
>>29<<
>>30<<
>>31<<
0 1 2)

doall: forces lazy seq to materialize

(doall <seq-expression>)
  • forces the sequence to be fully computed.
  • Be careful: infinite sequences will cause your program to eat up all the memory.
  • Most programs are so fast that eager sequences are lazy sequeces are equivalent in performance.
  • Lazy sequences are designed to save memory.

Regular Expressions in Clojure

  • Clojure can build regular expression patterns fast: #"\d+.\d".
  • Clojure comes with built-in functions:
    • re-matches
    • re-find

re-matches

(re-matches pattern string)

  • returns nil if pattern does not match string.

If the pattern contains groups:

  • returns a vector if pattern matches the entire string.
  • The first element is the entire match.
  • Subsequent elements correspond to groups.

If the pattern does not have groups:

  • returns string if pattern matches the entire string.
In [68]:
(re-matches #"(\d+)(.\d+)?" "hello")
Out[68]:
nil
In [70]:
(re-matches #"(\d+)(\.\d+)?" "123")
Out[70]:
["123" "123" nil]
In [72]:
(re-matches #"(\d+)(\.\d+)?" "3.1415")
Out[72]:
["3.1415" "3" ".1415"]
In [77]:
(re-matches #"\d+\.\d+" "3.1415")
Out[77]:
"3.1415"

re-seq

(re-find pattern string)

  • returns a sequence matching results
  • the pattern is used to match non-overlapping substrings in string.
In [81]:
(re-seq #"\d+\.\d+" "Gold 6.7, Oil 8.9")
Out[81]:
("6.7" "8.9")
In [82]:
(re-seq #"(\d+)(\.\d+)" "Gold 6.7, Oil 8.9")
Out[82]:
(["6.7" "6" ".7"] ["8.9" "8" ".9"])

File I/O

  • (clojure.java.io/reader <filename>) converts filename to a reader.
  • (line-seq <reader>) constructs a lazy sequence of lines from a reader.
In [84]:
;
; Let's import clojure.java.io with a shorter alias.
;
(require '[clojure.java.io :as io])
Out[84]:
nil
In [92]:
(let [r (io/reader "weather_madrid.csv")]
    (clojure.pprint/pprint (take 3 (line-seq r))))
("CET,Max TemperatureC,Mean TemperatureC,Min TemperatureC,Dew PointC,MeanDew PointC,Min DewpointC,Max Humidity, Mean Humidity, Min Humidity, Max Sea Level PressurehPa, Mean Sea Level PressurehPa, Min Sea Level PressurehPa, Max VisibilityKm, Mean VisibilityKm, Min VisibilitykM, Max Wind SpeedKm/h, Mean Wind SpeedKm/h, Max Gust SpeedKm/h,Precipitationmm, CloudCover, Events,WindDirDegrees"
 "1997-1-1,7,4,2,5,3,2,100,95,76,1010,1008,1004,10,9,4,13,6,,0.00,6,,229"
 "1997-1-2,7,3,0,6,3,0,100,92,71,1007,1003,997,10,9,4,26,8,47,0.00,5,Rain,143")
Out[92]:
nil

Programming with infinite (lazy) sequences

range

In [93]:
;
; (range n) is a finite lazy sequence
;
(range 10)
Out[93]:
(0 1 2 3 4 5 6 7 8 9)
In [99]:
;
; Let's time how long it takes to construct all one million elements
;
(time (take 3 (doall (range 1e6))))
"Elapsed time: 41.893853 msecs"
Out[99]:
(0 1 2)
In [103]:
;
; Now, we just take the three elements
;
(time (take 3 (range 1e6)))
"Elapsed time: 0.054014 msecs"
Out[103]:
(0 1 2)
In [104]:
;
; (range) returns an infinite sequence
;
(take 3 (range))
Out[104]:
(0 1 2)
  • (repeat n <expr>): sequence with copies of <expr>
  • (repeat <expr>): infinite sequence
In [106]:
(repeat 10 :a)
Out[106]:
(:a :a :a :a :a :a :a :a :a :a)
In [107]:
(take 3 (repeat :a))
Out[107]:
(:a :a :a)

(iterate f x): returns an infinite with:

  • x
  • (f x)
  • (f (f x))
  • ...
In [110]:
(take 5 (iterate inc 0))
Out[110]:
(0 1 2 3 4)
In [111]:
(take 5 (iterate (partial * 2) 1))
Out[111]:
(1 2 4 8 16)

(cycle seq): infinite sequence that cycles through the seq

In [120]:
(take 20 (cycle ["." "-" "o"]))
Out[120]:
("." "-" "o" "." "-" "o" "." "-" "o" "." "-" "o" "." "-" "o" "." "-" "o" "." "-")

(interpose separator seq)

  • inserting separator between elements in seq.
  • works with infinite sequences.
In [123]:
(interpose " | " (take 4 (cycle ["math" "physics" "chemistry"])))
Out[123]:
("math" " | " "physics" " | " "chemistry" " | " "math")
In [122]:
;
; Rember the `(apply f ...)` can be used to pass elements of sequence
; as arguments to a function.
;
(apply str (interpose " | " (take 4 (cycle ["math" "physics" "chemistry"]))))
Out[122]:
"math | physics | chemistry | math"

(interleave seq1 seq2 ...)

  • interleaves all the sequences into one sequence.
In [124]:
(interleave [:a :b :c] [1 2 3 4 5 6])
Out[124]:
(:a 1 :b 2 :c 3)
In [131]:
;
; Using interleave like interpose
; Oops, there is an extra element to be removed by (but-last)
;
(interleave (take 4 (cycle ["math" "physics" "chemistry"]))
            (repeat " | "))
Out[131]:
("math" " | " "physics" " | " "chemistry" " | " "math" " | ")

Threading Macros

Threading macros are a different syntax to express computation that can be described as a pipeline of transformations.

    f      g       h
x ---> x1 ---> x2 --->

We call x, x1, x2 the threaded value.

In [138]:
(defn f [x] (inc x))
(defn g [x] (/ x 2.))
(defn h [x] (str "result = " x))
Out[138]:
#'user/h
In [139]:
(-> 100 f g h)
Out[139]:
"result = 50.5"

But the threading macro can support multi-input functions as well. There are three variations:

  • -> prepend the threaded value to the begining of the argument list.
  • ->> appends the threaded value to the end of the argument list.
  • as-> binds the threaded value to a symbol for the duration of the pipeline.
In [140]:
(-> 100
    (inc)
    (/ 2.)
    (str " is the result."))
Out[140]:
"50.5 is the result."
In [142]:
(->> 100
    (inc)
    (- 2)
    (str "Result is "))
Out[142]:
"Result is -99"
In [144]:
(as-> 100 $
    (inc $)
    (- $ 2)
    (if (even? $)
        (str "Result is even: " $)
        (str "Result is odd: " $)))
Out[144]:
"Result is odd: 99"

Threading is useful in sequence processing

In [9]:
(->> (range)
     (filter even?)
     (map #(format "%02d" %))
     (interpose "|")
     (take 50)
     (apply str))
Out[9]:
"00|02|04|06|08|10|12|14|16|18|20|22|24|26|28|30|32|34|36|38|40|42|44|46|48|"

Boolean sequences

  • (every? pred seq)
  • (some pred seq)
In [10]:
(every? even? [1 2 3 4])
Out[10]:
false
In [11]:
(every? even? (map (partial * 2) [1 2 3 4]))
Out[11]:
true
In [12]:
(some even? [1 2 3 4])
Out[12]:
true
In [13]:
(some even? (map #(+ (* 2 %) 1) [1 2 3 4]))
Out[13]:
nil

Implementing some interesting functions

In [14]:
(defn divisible? [m n]
    (zero? (rem m n)))
Out[14]:
#'user/divisible?
In [15]:
(divisible? 10 5)
Out[15]:
true
In [17]:
(divisible? 10 3)
Out[17]:
false
In [27]:
(defn prime? [m]
    (not (some (partial divisible? m) (range 2 m))))
Out[27]:
#'user/prime?
In [28]:
(prime? 10)
Out[28]:
false
In [29]:
(prime? 13)
Out[29]:
true
In [62]:
(defn next-prime [n]
    (first (filter prime? (iterate inc (inc n)))))
Out[62]:
#'user/next-prime
In [63]:
(next-prime 17)
Out[63]:
19
In [ ]: