02

Higher-order procedures

Function as object (that is, being able to manipulate functions as data) as opposed to the more familiar view of function as process, in which there is a sharp distinction between program and data.

Using functions as arguments.

;;;;; In file cs61a/lectures/1.3/general.scm
(define pi 3.141592654)
(define (square-area r) (* r r))
(define (circle-area r) (* pi r r))
(define (sphere-area r) (* 4 pi r r))
(define (hexagon-area r) (* (sqrt 3) 1.5 r r))

;;; Có thể khái quát hóa thành
(define (area shape r) (* shape r r))

(define square 1)
(define circle pi)
(define sphere (* 4 pi))
(define hexagon (* (sqrt 3) 1.5))

;;; Và gọi kiểu này
(area square 5)

Unnamed functions.

Lambda is a special form; the formal parameter list obviously isn’t evaluated, but the body isn’t evaluated when we see the lambda, either—only when we invoke the function can we evaluate its body.

> (sum (lambda (x) (* (sin x) (sin x))) 5 8)
2.408069916229755

First-class data types.

the value of a variable (i.e., named)
an argument to a function
the return value from a function
a member of an aggregate (Cái này là sao?) -> Trong Scheme thì Function cũng được coi là một First-class data type

Functions as return values.

(define (compose f g) (lambda (x) (f (g x))))
(define (twice f) (compose f f))

Let

But you should remember that let does not provide any new capabilities; it’s merely an abbreviation for a lambda and an invocation of the unnamed function.

;;;;; In file cs61a/lectures/1.3/roots.scm
(define (roots a b c)
    (let ((d (sqrt (- (* b b) (* 4 a c)))))
        (se (/ (+ (- b) d) (* 2 a))
            (/ (- (- b) d) (* 2 a)) )))

Recursion and iteration

We want to know about the efficiency of algorithms, not of computer hardware. So instead of measuring runtime in microseconds or whatever, we ask about the number of times some primitive (fixed-time) operation is performed.

To estimate the efficiency of this algorithm, we can ask, “if the argument has N numbers in it, how many multiplications do we perform?” The answer is that we do one multiplication for each number in the argument, so we do N altogether. ->The amount of time needed should roughly double if the number of numbers doubles.

;;;;; In file cs61a/lectures/1.2/growth.scm
(define (sort sent)
    (if (empty? sent)
        ’()
        (insert (first sent)
                (sort (bf sent)) )))
(define (insert num sent)
    (cond ((empty? sent) (se num sent))
        ((< num (first sent)) (se num sent))
        (else (se (first sent) (insert num (bf sent)))) ))

If there are N numbers, how many comparisons do we do? - If there are K numbers in the argument to insert, how many comparisons does it do? K of them (Phải xử lý insert cho K số, không biết là "số" ở đây để chỉ số arguments hay số lượng số trong arguments =))). - How many times do we call insert? N times (Do có N số cần được sắp xếp) Và mỗi lần gọi insert thì độ dài của sentence (Tức là cái se ấy) đều khác nhau.

The difference between a linear recursive process and an iterative process.

About memory (space) efficiency. linear-recursive structure: has to keep track of N pending computations during the processing:

during the processing:
(count ’(i want to hold your hand))
(+ 1 (count ’(want to hold your hand)))
(+ 1 (+ 1 (count ’(to hold your hand))))
(+ 1 (+ 1 (+ 1 (count ’(hold your hand)))))
(+ 1 (+ 1 (+ 1 (+ 1 (count ’(your hand))))))
(+ 1 (+ 1 (+ 1 (+ 1 (+ 1 (count ’(hand)))))))
(+ 1 (+ 1 (+ 1 (+ 1 (+ 1 (+ 1 (count ’())))))))
(+ 1 (+ 1 (+ 1 (+ 1 (+ 1 (+ 1 0))))))
(+ 1 (+ 1 (+ 1 (+ 1 (+ 1 1)))))
(+ 1 (+ 1 (+ 1 (+ 1 2))))
...

iterative structure: does not need extra memory to remember all the unfinished tasks during the computation

(count ’(i want to hold your hand))
(iter ’(i want to hold your hand) 0)
(iter ’(want to hold your hand) 1)
(iter ’(to hold your hand) 2)
(iter ’(hold your hand) 3)
(iter ’(your hand) 4)
(iter ’(hand) 5)
(iter ’() 6)
6

For the purposes of this course (CS61A), you should generally use the simpler linear-recursive structure and not try for the more complicated iterative structure; the efficiency saving is not worth the increased complexity.

More is less: non-obvious efficiency improvements.

Trong phần này thì có hai ví dụ (Đọc ghi chú tuần 3 của CS61A), ví dụ đầu tiên thì đơn giản nhưng time complexity lên tới $Θ(2^n)$ nhưng chương trình số 2 phức tạp hơn nhiều, nhưng chỉ sử dụng: $Θ(n^2)$

Moral: When it really matters, think hard about your algorithm instead of trying to fine-tune a few microseconds off the obvious algorithm

1.2 Procedures and the Processes They Generate⁠

To become experts, we must learn to visualize the processes generated by various types of procedures. Only after we have developed such a skill can we learn to reliably construct programs that exhibit the desired behavior. (Section 1.2)

A procedure is a pattern for the local evolution of a computation process: how one stage is built on the previous stage.
The global behavior of a computational process is much harder to reason about.
Processes governed by different types of procedures generate different “shapes.”
Computational processes consume two important resources: time and space.

1.2.1 Linear Recursion and Iteration⁠

The factorial of N is defined as the product of the integers on the interval [1,N].
The naive recursive implementation creates a curved shape:

(factorial 4)
(* 4 (factorial 3))
(* 4 (* 3 (factorial 2)))
(* 4 (* 3 (* 2 (factorial 1))))
(* 4 (* 3 (* 2 1)))
(* 4 (* 3 2))
(* 4 6)
24

The iterative implementation maintains a running product and multiplies the numbers from 1 to N. This creates a shape with a straight edge:

(factorial 4)
(fact-iter 1 1 4)
(fact-iter 1 2 4)
(fact-iter 2 3 4)
(fact-iter 6 4 4)
(fact-iter 24 5 4)
24

Both compute the same mathematical function, but the computational processes evolve very differently.
The first one is a linear recursive process. The chain of deferred operations causes an expansion (as operations are added) and a contraction (as operations are performed).
- The interpreter must keep track of all these operations.
- It is a linear recursive process because the information it must keep track of (the call stack) grows linearly with N.
The second is a linear iterative process. It does not grow and shrink.
- It is summarized by a fixed number of state variables and a rule to describe how they should update and when the process should terminate.
- It is a linear iterative process because the number of steps grows linearly with N.
In the iterative process, the variables provide a complete description of the state of the process at any point. In the recursive process, there is “hidden” information that makes it impossible to resume the process midway through.
The longer the chain of deferred operations, the more information must be maintained (in a stack, as we will see).
A recursive procedure is simply a procedure that refers to itself directly or indirectly.
A recursive process refers to the evolution of the process described above.
A recursive procedure can generate an iterative process in Scheme thanks to tail-call optimization. In other languages, special-purpose looping constructs are needed.

1.2.2 Tree Recursion⁠

With tree recursion, the procedure invokes itself more than once, causing the process to evolve in the shape of a tree.
The naive Fibonacci procedure calls itself twice each time it is invoked, so each branch splits into two at each level.

In general, the number of steps required by a tree-recursive process will be proportional to the number of nodes in the tree, while the space required will be proportional to the maximum depth of the tree. (Section 1.2.2)

The iterative Fibonacci procedure is vastly more efficient in space and in time.

Example: Counting change

Let f(A,N) represent the number of ways of changing the amount A using N kinds of coins. If the first kind of coin has denomination N, then f(A,N)=f(A,N−1)+f(A−D,N). In words, there are two situations: where you do not use any of the first kind of coin, and when you do. The value of f(A,N−1) assumes we don’t use the first kind at all; the value of f(A−D,N) assumes we use one or more of the first kind.

That rule and a few degenerate cases are sufficient to describe an algorithm for counting the number of ways of changing amounts of money. We can define it with the following piecewise function:

f(A,N)={1,if A=0,0,if A<0 or N=0,f(A,N−1)+f(A−D,N),if A>0 and N>0.

Like Fibonacci, the easy tree-recursive implementation involves a lot of redundancy. Unlike it, there is no obvious iterative solution (it is possible, just harder). One way to improve the performance of the tree-recursive process is to use memoization.

1.2.3 Orders of Growth⁠

Different processes consume different amounts of computational resources.
We compare this using order of growth, a gross measure of the resources required by a process as the inputs becomes larger.
Let n be a parameter that measures the size of a problem—it could be the input itself, the tolerance, the number of rows in the matrix, etc.
Let R(n) be the amount of resources the process requires for a problem of size n. This could be time, space (amount of memory), number of registers used, etc.
We say that R(n) has order of growth Θ(f(n)), or R(n)=Θ(f(n)), if there are positive constants A and B independent of n such that Af(n)≤R(n)≤Bf(n) for any sufficiently large value of n.
The value R(n) is sandwiched between Af(n) and Bf(n).
The linear recursive process for computing factorials had Θ(n) time and Θ(n) space (both linear), whereas the linear iterative process had Θ(1) space (constant).
The order of growth is a crude description of the behavior of a process.
Its importance is allowing us to see the change in the amount of resources required when you, say, increment n or double n.

1.2.4 Exponentiation⁠

One way to calculate b to the nth power is via the following recursive definition:

b0=1,bn=b∗bn−1.

A faster method is to use successive squaring:

bn={(bn/2)2,if n is even,b∗bn−1,if n is odd.

1.2.5 Greatest Common Divisors⁠

The GCD of integers a and b is the largest integer that divides both a and b with no remainder. For example, gcd⁡(16,28)=4.
An efficient algorithm uses gcd⁡(a,b)=gcd⁡(b,a mod b).
For example, we can reduce (gcd 206 40) as follows:

(gcd 206 40)
(gcd 40 6)
(gcd 6 4)
(gcd 4 2)
(gcd 2 0)
2

This always works: you always get a pair where the second number is zero, and the other number is the GCD of the original pair.
This is called Euclid’s Algorithm.
Lamé’s Theorem: If Euclid’s Algorithm requires k steps to compute the GCD of some pair (a,b), then min⁡{a,b}≥Fib(k).

1.2.6 Example: Testing for Primality⁠

Searching for divisors

One way to test for primality is to find the number’s divisors.
A number is prime if and only if it is its own smallest divisor.

The Fermat test

The Fermat test is a Θ(log(n)) primality test based on Fermat’s Little Theorem:

If n is a prime number and a is any positive integer less than n, then a raised to the nth power is congruent to a modulo n. (Section 1.2.6)

The test works like this:

Given a number n, pick a random number a<n and calculate an mod n.
Fail: If the result is not equal to a, then n is not prime.
Pass: If the result is equal to a, then n is likely prime.
Repeat. The more times the number passes the test, the more confident we are that n is prime. If there is a single failure, n is certainly not prime.

Probabilistic methods

Most familiar algorithms compute an answer that is guaranteed to be correct.
Not so with the Fermat test. If n passes the Fermat test for one random value of a, all we know is that there is a better than 50% chance of n being prime.
A probabilistic algorithm does not always give a correct result, but you can prove that the chance of error becomes arbitrarily small.
We can make the probability error in our primality test as small as we like simply by running more Fermat tests—except for Carmichael numbers.

Highlights

Numbers that fool the Fermat test are called Carmichael numbers, and little is known about them other than that they are extremely rare. … In testing primality of very large numbers chosen at random, the chance of stumbling upon a value that fools the Fermat test is less than the chance that cosmic radiation will cause the computer to make an error in carrying out a “correct” algorithm. Considering an algorithm to be inadequate for the first reason but not for the second illustrates the difference between mathematics and engineering. (Footnote 1.47)

mk12.github.io SICP Section 1.3 Notes 7–9 minutes 1.3 Formulating Abstractions with Higher-Order Procedures⁠

We have seen that procedures are, in effect, abstractions that describe compound operations on numbers independent of the particular numbers. (Section 1.3)

One of the things we should demand from a powerful programming language is the ability to build abstractions by assigning names to common patterns and then to work in terms of the abstractions directly. (Section 1.3)

Procedures operating on numbers are powerful, but we can go further.
To abstract more general programming patterns, we need to write procedures that take other procedures as arguments and return new procedures.
These are called higher-order procedures.

1.3.1 Procedures as Arguments⁠

Procedures that compute a sum all look the same:

(define (name a b) (if (> a b) 0 (+ (term a) (name (next a) b))))

This is a useful abstraction, just as sigma notation is useful in math because the summation of a series is so common.

The power of sigma notation is that it allows mathematicians to deal with the concept of summation itself rather than only with particular sums. (Section 1.3.1)

1.3.2 Constructing Procedures Using Lambda⁠

lambda creates anonymous procedures. They are just like the procedures created by define, but without a name: (lambda (formal-parameters) body).

A lambda expression can be used as the operand in a combination. It will be evaluated to a procedure and applied to the arguments (the evaluated operands). The name comes from the λ-calculus, a formal system invented by Alonzo Church. Using let to create local variables

We often need local variables in addition to the formal parameters. We can do this with a lambda expression that takes the local variables as arguments, but this is so common that there is a special form let to make it easier.

The general form of a let-expression is:

(let ((var1 exp1) (var2 exp2) ... (varn expn)) body)

This is just syntactic sugar for:

((lambda (var1 var2 ... varn) body) exp1 exp2 ... expn)

The scope of a variable in a let-expression is the body. This allows variables to be bound as locally as possible.
The variables in the let-expression are parallel and independent. They cannot refer to each other, and their order does not matter.
You can use let-expressions instead of internal definitions.

1.3.3 Procedures as General Methods⁠

So far, we have seen:

Compound procedures that abstract patterns of numerical operators (mathematical functions), independent of the particular numbers.
Higher-order procedures that express a more powerful kind of abstraction, independent of the procedures involved.

Now we will take it a bit further. Finding roots of equations by the half-interval method

The half-interval method : a simple but powerful technique for solving f(x)=0.
Given f(a)<0<f(b), there must be at least one zero between a and b.
To narrow it down, we let x be the average of a and b, and then replace either the left bound or the right bound with it.

(define (search f neg-point pos-point) (let ((midpoint (average neg-point pos-point))) (if (close-enough? neg-point pos-point) midpoint (let ((test-value (f midpoint))) (cond ((positive? test-value) (search f neg-point midpoint)) ((negative? test-value) (search f midpoint pos-point)) (else midpoint))))))

(define (close-enough? x y) (< (abs (- x y)) 0.001))

(define (half-interval-method f a b) (let ((a-value (f a)) (b-value (f b))) (cond ((and (negative? a-value) (positive? b-value)) (search f a b)) ((and (negative? b-value) (positive? a-value)) (search f b a)) (else (error "values are not of opposite sign" a b)))))

We can use half-interval-method to approximate π as a root of sin⁡x=0:

(half-interval-method sin 2.0 4.0) → 3.14111328125

Finding fixed points of functions

A number x is a fixed point of a function if f(x)=x.
In some cases, repeatedly applying the function to an arbitrary initial guess will converge on the fixed point.
The procedure we made earlier for finding square roots is actually a special case of the fixed point procedure.

(define tolerance 0.00001) (define (fixed-point f first-guess) (define (close-enough? v1 v2) (< (abs (- v1 v2)) tolerance)) (define (try guess) (let ((next (f guess))) (if (close-enough? guess next) next (try next)))) (try first-guess))

We can use fixed-point to approximate the fixed point of the cosine function:

(fixed-point cos 1.0) → 0.7390822985224023

1.3.4 Procedures as Returned Values⁠

Passing procedures as arguments gives us expressive power. Returning procedures from functions gives us even more. For example, we can write a procedure that creates a new procedure with average damping:

(define (average-damp f) (lambda (x) (average x (f x))))

If we use average-damp on square, we get a procedure that calculates the sum of the numbers from 1 to n:

((average-damp square) 10) → 55

(+ 1 2 3 4 5 6 7 8 9 10) → 55

Highlights

In general, there are many ways to formulate a process as a procedure. Experienced programmers know how to choose procedural formulations that are particularly perspicuous, and where useful elements of the process are exposed as separate entities that can be reused in other applications. (Section 1.3.4)

Newton’s method

The square-root procedure we wrote earlier was a special case of Newton’s method. Given a function f(x), the solution to f(x)=0 is given by the fixed point of

x↦x−f(x)f′(x).

Newton’s method converges very quickly—much faster than the half-interval method in favorable cases. We need a procedure to transform a function into its derivative (a new procedure). We can use a small dx for this:

f′(x)=f(x+dx)−f(x)dx.

This translates to the following procedure:

(define (deriv f) (lambda (x) (/ (- (f (+ x dx)) (f x)) dx)))

Now we can do things like this:

(define (cube x) (* x x x)) (define dx 0.00001) ((deriv cube) 5) → 75.00014999664018

Abstractions and first-class procedures

Compound procedures let us express general methods of computing as explicit elements in our programming language.
Higher-order procedures let us manipulate methods to create further abstractions.
We should always be on the lookout for underlying abstractions that can be brought out and generalized. But this doesn’t mean we should always program in the most abstract form possible; there is a level appropriate to each task.
Elements with the fewest restrictions are first-class:
    They may be named by variables.
    They may be passed as arguments to procedures.
    They may be returned as the results of procedures.
    They may be included in data structures.
In Lisp, procedures have first-class status. This gives us an enormous gain in expressive power.