Go notes

Wed Oct 18 2023 · 55 min read
notes · go

03 - Basic Types

In Go there are only 25 keywords.

In an interpreted language like Python, when you do a = 2, a is an object that represents or masquerades as a number in the interpreter. Eventually the interpreter (a program written in C) uses the underlying hardware to actually do the math - it has to turn it into binary numbers at some point.

a = 2 -> a object -> interpreter -> cpu -> ram

In Go, when we do a := 2, a is purely the address of a memory location in the machine - no interpreter, no JVM - it’s direct.

int is the default type for integers in Go.

Don’t use floating point for monetary calculations - use pkg like go money

Simple declarations:

var a int

var (
	b = 2
	f = 2.01
)

Only inside funcs: c := 2 - the short declaration operator declares the var and gives it a type implicitly (from the value).

a := 2
b := 2.4
fmt.Printf("a: %8T %v\n", a, a)
fmt.Printf("b: %8T %[1]v\n", b)

The [1] syntax reuses the previous argument. The 8 is for spacing (minimum field width).

We can’t assign an int to a float - we have to use an explicit type conversion.
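A tiny sketch of the conversion (reusing a and b from above):

a := 2         // int
b := 2.4       // float64

// b = a       // compile error: mismatched types
b = float64(a) // explicit conversion required
a = int(b)     // and back the other way (truncates)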

Special types

inits

There are no uninitialised vars in Go - either you initialise them or Go will do it for you (the zero value).

constants

Only numbers, strings, and booleans can be constants (immutable). It’s good to have immutable constants in a concurrent language.

sample

We can redirect numbers into stdin, or pipe a file into stdin:

❯ go run . < nums.txt
The avg is 5.75
❯ cat nums.txt | go run .
The avg is 5.75
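The actual sample code isn’t in these notes; a minimal sketch of an averaging program with this behaviour, reading numbers from stdin until EOF, might look like:

package main

import (
	"fmt"
	"os"
)

func main() {
	var sum float64
	var n int

	for {
		var x float64
		// Fscanln stops at EOF or bad input
		if _, err := fmt.Fscanln(os.Stdin, &x); err != nil {
			break
		}
		sum += x
		n++
	}

	if n > 0 {
		fmt.Printf("The avg is %g\n", sum/float64(n))
	}
}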

04 - Strings

rune is the Go equivalent of a character (a wide char) - physically an int32. Strings are the UTF-8 encoding of Unicode characters.

s := "élite"

fmt.Printf("%8T %[1]v\n", s)

// cast to sequence of runes
fmt.Printf("%8T %[1]v\n", []rune(s))
// this prints [233 108 105 116 101]
// ASCII chars fit into 0 - 127, so when this gets encoded to UTF-8, there
// is an expansion

// cast to bytes - now there is one extra byte. 233 is represented by 2 bytes.
// that's just how UTF-8 encodes unicode
b := []byte(s)
fmt.Printf("%8T %[1]v\n", b)

// get the length too (6)
fmt.Printf("%8T %[1]v %d\n", b, len(b))

// len of s?
fmt.Printf("%8T %[1]v %d\n", s, len(s))

// So the length of a string is the length of the byte string that's
// necessary to encode the string in UTF-8
// Logically it's 5 chars, physically it's 6 bytes in UTF-8 encoding.

//! Len of string is num of bytes required to represent the unicode
// chars, not the number of unicode chars

s:= "hello, world s is a “descriptor” - describes something, has a pointer in it. The pointer points to the actual location in memory where the bytes are, and it has extra info like length (num of bytes that make up the string).

hello := s[:5] - this points to the same bytes in memory, but has a new descriptor; these strings are immutable, so they can share storage.
world := s[7:] - a substring of s that reuses the memory in s.
t := s - t will be a new descriptor with a pointer to the same bytes and the same length.

s := "the quick brown fox" d := s[:4] + "slow" + s[9:] - this points to some completely different memory. s[5] = 'a' - we can’t do this. s += "es" - (copies) - new chunk of memory made. s points to this new chunk now. original s still exists as other vars reference it so in can’t pe garbage collected

s = strings.ToUpper(s) - new piece of memory, s points to this new memory.

05 - Arrays, Slices, and Maps

[4]int // array
[]int // slice
map[string]int // map of string to int

Arrays

Slice

var a []int	      // nil, no storage
var b = []int{1,2}    // initialized

a = append(a, 1)      // append to nil OK
b = append(b, 3)      // []int{1,2,3}

a = b		      // overwrites a (`b` descriptor gets copied into `a` descriptor)
		      // points at same bytes

d := make([]int, 5)   // []int{0,0,0,0,0}
e := a		      // same storage (alias) e and a descriptors are the same.
e[0] == b[0]	      // true

off by one

Slices are indexed like [8:11] (read as the starting element and one past the ending element, so 11 - 8 = 3 elements in the slice): 8, 9, 10.
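For example (illustrative values):

x := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
y := x[8:11]

fmt.Println(y, len(y)) // [8 9 10] 3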

Slice

Array

var w = [...]int{1,2,3}
var x = []int{0,0,0}

func do(a [3]int, b []int) []int {
  a = b		  // syntax error
  a[0] = 4	  // w unchanged (bytes copied, changing just local var)
  b[0] = 3	  // x changes

  c := make([]int, 5)
  c[4] = 42
  copy(c,b)	  // copies only 3 els

  return c
}

y := do(w,x) // w=[1,2,3], x=[3,0,0], y=[3,0,0,0,42]

Map

var m map[string]int // nil, no storage
p := make(map[string]int) // non-nil but empty

a := p["the"] // returns 0 (it's the "nil" value for int)
b := m["the"] // same thing as above

m["and"] = 1 // PANIC - nil map
m = p
m["and"]++    // OK, same map as p now
c := p["and"] // returns 1

m has no hash table behind it, so trying to insert crashes; m points to nothing, while p points to an actual hash table.

m = p copies the p descriptor into the m descriptor; m now points to the same hash table that p points to.

Can declare a map inline:

var m = map[string]int{
  "and": 1,
  "the": 1,
  "or": 2,
}

var n map[string]int // nil
b := m == n // syntax error
c := n == nil // true
d := len(m) // 3
e := cap(m) // type mismatch (capacity)

Maps have a special two-result lookup form. The second variable tells you whether the key was there.

p := map[string]int{}	// non-nil but empty

a := p["the"]	// return 0 but is it 0 cause key not in map, or value is 0???
b, ok := p["and"] // 0, false

p["the"]++
c, ok := p["the"] // 1, true

if w, ok := p["the"]; ok {
  // we know w is not the default value here
  // ...
}

Making nil useful: nil is a kind of zero - it indicates the absence of something. In Go the builtins like len, cap, and range are safe to use with nil: you can take the len of a nil slice, read from a nil map, range over a nil slice, etc.

“Make the zero value useful” - Rob Pike. https://www.youtube.com/watch?v=ynoY2xz-F8s

Look at word-sort project

06 - Control Statements; Declarations & Types

When using range

for i := range myArray {
  fmt.Println(i, myArray[i])
}

VS

// here, we copy value out of array into v. Don't do this if value is big
// bad performance, use first version.
for i, v := range myArray {
  fmt.Println(i, v)
}

What makes a good package?

package os

func Create(name string) (*File, error)
func Open(name string) (*File, error)

func (f *File) Read(b []byte) (n int, err error)
func (f *File) Write(b []byte) (n int, err error)
func (f *File) Close() error

short declarations have some gotchas! (:=)

func Bad(f *os.File, buf []byte) error {
  var err error

  for {
    n, err := f.Read(buf) // shadows err above
    if err != nil {
      break // causes return of WRONG value
    }

    foo(buf)
  }

  return err // will always be nil
}
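A sketch of a fixed version (same shape as Bad above, foo as before): use plain assignment so the outer err is the one that gets set.

func Good(f *os.File, buf []byte) error {
  var err error

  for {
    var n int
    n, err = f.Read(buf) // assignment, not :=, so no shadowing
    if err != nil {
      break
    }

    foo(buf[:n])
  }

  return err // now reports the real error (at least io.EOF)
}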

07 - Formatted & File I/O

Unix has the notion of 3 standard I/O streams: standard input (stdin), standard output (stdout), and standard error (stderr).

These are normally mapped to the console/terminal but can be redirected:

find . -name "*.go" | xargs grep -n "rintf" > print.txt

// Always os.Stdout
fmt.Println(...interface{}) (int, error)
fmt.Printf(string, ...interface{}) (int, error)

// Print to anything that has the correct Write() method
fmt.Fprintln(io.Writer, ...interface{}) (int, error)
fmt.Fprintf(io.Writer, string, ...interface{}) (int, error)

// Return a string
fmt.Sprintln(...interface{}) string
fmt.Sprintf(string, ...interface{}) string

In UNIX, a run of bytes is a file. Every file is just bytes.

08 - Functions, Parameters & Defer

Functions in Go are “first class” objects.

The signature of a fn is the order & type of its params and return values. It does not depend on the names of those params or returns.

A function declaration lists FORMAL parameters: func do(a, b int) int {...}

A function call has ACTUAL params (aka arguments): result := do(1, 2)

A param is passed by VALUE if the func gets a copy. The caller can’t see changes to the copy

A param is passed by REFERENCE if the function can modify the actual param such that the caller sees the changes. The function gets a pointer to it.

By value: numbers, bool, arrays, structs.
By reference: things passed by pointer (&x), strings (immutable), slices, maps, channels.

// Two different types of references

/*
Here the map descriptor gets copied, but we have reference to original hash table.
*/
func do(m1 map[int]int) {
	m1[3] = 0
	m1 = make(map[int]int)
	// m1 is a totally different map local to this fn.
	// No relation to m at all.
	m1[4] = 4
	fmt.Printf("m1: %v\n", m1)
}

/*
Here we pass the ADDRESS of the map descriptor. We change the map descriptor
in this func, as well as in main - it gets overwritten.
*/
func doWithPointer(m1 *map[int]int) {
	// dereference pointer
	(*m1)[3] = 0
	(*m1) = make(map[int]int)
	(*m1)[4] = 4
	fmt.Printf("m1: %v\n", *m1)
}

func main() {
	m := map[int]int{4: 1, 7: 2, 8: 3}
	// do(m)
	// pass in address of m

	fmt.Printf("m: %v\n", m)
	doWithPointer(&m)
	fmt.Printf("m: %v\n", m)
}

So the descriptors of slices, maps are copied, the underlying arrays/hash tables are not.

Defer

The defer statement captures a function call to run later.

Don’t always defer by default. For example, if you loop over filenames from os.Args, open a file, then defer file.Close() at the end of each iteration, it won’t close at the end of the iteration but at the end of the whole func. So you might end up opening files and never closing them, exhausting resources. One fix is sketched below.
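A sketch of that fix (process is a hypothetical helper): wrap the loop body in a function so the deferred Close runs every iteration.

for _, name := range os.Args[1:] {
	func() {
		f, err := os.Open(name)
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close() // runs when this anonymous func returns, i.e. once per iteration

		process(f) // hypothetical work on the file
	}()
}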

Unlike a closure, defer copies arguments into the deferred call.

func main() {
  a := 10
  defer fmt.Println(a)
  a = 11
  fmt.Println(a)
}
// prints 11, 10

The param a gets copied at the defer statement (not a reference)

func do() (a int) {
  defer func() {
    a = 2
  }()

  a = 1
  return
}
// returns 2

09 - Closures

Funcs that live inside funcs and refer to the enclosing func’s data.

Scope vs lifetime:
Scope is static, based on the code at compile time.
Lifetime depends on program execution (runtime).
The lifetime of a var can exceed the scope in which it’s declared.

func do() *int {
  var b int
  // ...
  return &b
}

b ends up on the heap (escape analysis decides this).

In interpreted languages all allocations happen on the heap - not efficient. Go allocates as much as possible on the stack; when the lifetime exceeds the stack frame, it uses the heap.

What is a closure? When a func inside another func “closes over” one or more local variables of the outer func.

func fib() func() int {
  a, b := 0, 1

  return func() int {
    a, b = b, a + b
    return b
  }
}
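A usage sketch: each call advances the sequence, because the closure keeps a reference to a and b.

f := fib()
fmt.Println(f(), f(), f(), f()) // 1 2 3 5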

The inner func gets a reference to the outer func’s vars. Those vars may end up with a much longer lifetime than expected - as long as there’s a reference to the inner func.

Example with common misunderstanding https://youtu.be/US3TGA-Dpqo?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6

s := make([]func(), 4)

for i := 0; i < 4; i++ {
	i := i // here's how to fix that issue - create local copy that cannot mutate again
	s[i] = func() {
		fmt.Printf("%d @ %p\n", i, &i)
	}
}

for i := 0; i < 4; i++ {
	s[i]()
}

// Without the i := i copy, we'd get the same location and value for i (4)
// for every closure - they'd all share a reference to the single loop
// variable i, so each call would print 4.

10 - Slices in Detail

![[Pasted image 20231007112434.png]] ![[Pasted image 20231007112449.png]]

Why do we care?

![[Pasted image 20231007112636.png]]

When we check whether a slice is empty, the best way is if len(s) == 0 {}, because a slice can be “empty” in 2 cases - nil, or non-nil with zero length - and checking s == nil only catches the first.

Common gotcha - a slice made with 0 length and 5 capacity (make([]int, 0, 5)) is fine to append to (appends fill from the front), but you can’t index s[0] until it has length; a slice made with len and cap 5 (make([]int, 5)) can be indexed right away, but append will add the new values after the 5 zeros…
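A small sketch of the difference:

a := make([]int, 0, 5) // len 0, cap 5
a = append(a, 1)       // [1] - fills from the front
// a[1] = 2            // would panic: index out of range (len is still 1)

b := make([]int, 5)    // len 5, cap 5: [0 0 0 0 0]
b[0] = 1               // indexing is fine
b = append(b, 2)       // [1 0 0 0 0 2] - appended after the zeros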

It is also okay to append to a nil slice.

Another common gotcha is not using d := a[0:1:1] (a full slice expression that sets both length and capacity). With the plain 2-index slicing operator, you inherit the capacity of the original underlying array, which is not intuitive. ![[Pasted image 20231007120537.png]]

![[Pasted image 20231007120521.png]]

If we append first, we get new memory.
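A sketch of both behaviours (illustrative values):

a := []int{1, 2, 3, 4, 5}

b := a[0:2]       // len 2, cap 5 - shares a's backing array
b = append(b, 99) // fits in the spare capacity...
fmt.Println(a)    // [1 2 99 4 5] - ...so it overwrites a[2]

c := a[0:2:2]     // full slice expression: len 2, cap 2
c = append(c, 42) // cap exceeded, so append allocates new memory
fmt.Println(a)    // [1 2 99 4 5] - a is untouched this time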

11 - Homework #2

Depth first search

12 - Structs, Struct tags & JSON

A struct is an aggregate, like a DB record.

type Employee struct {
	Name   string
	Number int
	Boss   *Employee
	Hired  time.Time
}

func main() {
	// Always do maps of strings to struct POINTERS - issues otherwise (https://youtu.be/0m6iFd9N_CY?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6&t=796)
	c := map[string]*Employee{}

	c["lamine"] = &Employee{"Lamine", 2, nil, time.Now()}

	c["matt"] = &Employee{
		Name:   "Matt",
		Number: 1,
		Hired:  time.Now(),
		Boss:   c["lamine"],
	}

	fmt.Printf("%T %+[1]v\n", c["lamine"])
	fmt.Printf("%T %+[1]v\n", c["matt"])
}
func main() {
	var album1 = struct {
		title string
	}{
		"The white album",
	}
	var album2 = struct {
		title string
	}{
		"The black album",
	}

	// we can do this - duck typing
	album1 = album2
	fmt.Println(album2, album1)
}
type album1 struct {
	title string
}

type album2 struct {
	title string
}

func main() {
	var a1 = album1{
		"The white album",
	}
	var a2 = album2{
		"The  black album",
	}

	// We can't do this - not the same named types.
	a1 = a2
	// They are convertible
	a1 = album1(a2)
	fmt.Println(a2, a1)
}

Two struct types are compatible (convertible) if their fields have the same names and types, in the same order (struct tags are ignored for conversion).

A struct may be copied or passed as a param in its entirety. A struct is comparable if all its fields are comparable. The zero value for a struct is “zero” for each field in turn.

Passing structs

func soldAnother(a album) {
  // oops
  a.copies++
}
func soldAnother(a *album) {
  a.copies++
  // same as (*a).copies
}

struct{} is a singleton!

Struct tags

type Response struct {
	Page int `json:"page"`
	// if lowercase, not exported. Private fields of struct not encoded.
	Words []string `json:"words,omitempty"`
}

func main() {
	r := &Response{Page: 1, Words: []string{"up", "in", "out"}}
	j, _ := json.Marshal(r)
	fmt.Printf("j: %v\n", string(j))
	fmt.Printf("r: %#v\n", r)

	var r2 Response
	_ = json.Unmarshal(j, &r2)
	fmt.Printf("r2: %#v\n", r2)
}

13 - Regular Expressions & Search

// TODO later

14 - Reference & Value Semantics

When to use pointers vs when to use values?
Pointers - shared, not copied.
Values - copied, not shared.

Sometimes not sharing is safer (concurrency). Pointer semantics may be more efficient.

Why use pointers?

Any struct with a mutex MUST be passed by reference.

Any small struct under about 64 bytes probably should be copied. For example, passing around descriptors is done by value - they’re small, 16 or 24 bytes.

If a thing is to be shared, always pass a pointer. ![[Pasted image 20231008101229.png]]

For loop:

for _, thing := range things {
	// thing is a copy
}

// Use the index if you need to mutate the element
for i := range things {
	things[i].foo = bar
}

Slice safety: any time a func mutates a slice that’s passed in, it must return the resulting slice (append may have moved it to new memory), and the caller must use the returned value.

func update(things []thing) []thing {
...
	things = append(things, x)
	return things
}

![[Pasted image 20231008103746.png]]

items := [][2]byte{{1, 2}, {3, 4}, {5, 6}}
a := [][]byte{}

// WRONG!
for _, item := range items {
	// We are creating slice with same underlying array.
	// Effectively appending a slice that references the same memory location
	// as item.
	a = append(a, item[:])
}

// CORRECT
for _, item := range items {
	i := make([]byte, len(item))
	copy(i, item[:])
	a = append(a, i)
}

fmt.Printf("items: %v\n", items)

// by the time we print a, item has been updated to {5, 6} and all
// slices in 'a' reference this same memory location
fmt.Printf("a: %v\n", a)

Another example ![[Pasted image 20231008105421.png]]

![[Pasted image 20231008105436.png]]

Common problems in closure, loop variables, goroutines.

15 - Networking with HTTP

![[Pasted image 20231008111618.png]]

Go doesn’t have classes, but Go allows methods to be put on any user-declared type. You can even put methods on a function - why not? It’s just an object.

16 - Homework #3

Two programs -

  1. Read all comics into a file
  2. Search those comics for a keyword

17 - Go does OOP

Essentials of OOP: abstraction, encapsulation, polymorphism, inheritance.

Sometimes the last two items are combined or confused. Go’s approach to OOP is similar but different.

Abstraction
Decoupling behaviour from implementation details. The Unix file system API is a good example - 5 basic funcs hide all the messy details: open, close, read, write, ioctl.

Encapsulation
Hiding implementation details from misuse. Controlling the visibility of names (private vars).

Polymorphism
“Many shapes” - multiple types behind a single interface. 3 main types:

  1. Ad-hoc - function/operator overloading
  2. Parametric: “generic programming” - generics
  3. Subtype: subclasses substituting for superclasses

“Protocol-oriented” programming uses explicit interface types, now supported in many langs. (an ad-hoc method) Behaviour is completely separate from implementation, which is good for abstraction.

Inheritance
It has conflicting meanings.

Inheritance gets overused.

It injects a dependence on the superclass into the subclass

Not having inheritance means better encapsulation & isolation

“Interfaces will force you to think in term of communication between objects”

See also “Composition over inheritance” and “Inheritance tax”

OO in Go
Go offers 4 main supports for OOP.

Go does not offer inheritance or substitutability based on types.

Substitutability is based only on interfaces: purely a function of abstract behaviour.

Go allows defining methods on any user defined type, rather than only a “class”

Go allows any object to implement the method(s) of an interface, not just a “subclass”.

18 - Methods and Interfaces

An interface specifies abstract behaviour in terms of methods

type Stringer interface {
  String() string
}

Concrete types offer methods that satisfy the interface

A method is a special type of func. It has a receiver param before the func name.

type IntSlice []int

// is is the receiver
func (is IntSlice) String() string {
  // ...
}

Why interfaces?
Without them, we’d have to write many funcs for many concrete types, possibly coupled to them:

func OutputToFile(f *File, ...) {...}
func OutputToBuffer(b *Buffer, ...) {...}
func OutputToSocket(s *Socket, ...) {...}

Better - we want to define our funcs in terms of abstract behaviour

type Writer interface {
  Write([]byte) (int, error)
}

func OutputTo(w io.Writer, ...) {...}

Takes any obj that provides a write method that allows me to write bytes to it.

An interface specifies required behaviour as a method set. Any type that implements that method set satisfies the interface. This is known as “duck” typing: no type declares itself to implement an interface explicitly. We think of interfaces from the consumer’s side.

We don’t need a struct to declare methods. Any user-declared (named) type.

A method may take a pointer or value receiver, but not both

type Point struct {
  X, Y float64
}

func (p Point) Offset(x, y float64) Point {
  return Point{p.X + x, p.Y + y}
}

func (p *Point) Move(x, y float64) {
  p.X += x
  p.Y += y
}

Taking a pointer allows the method to change the receiver.

type ByteCounter int

func (b *ByteCounter) Write(p []byte) (int, error) {
	l := len(p)
	*b += ByteCounter(l)
	return l, nil
}

func main() {
	var c ByteCounter

	f1, _ := os.Open("a.txt")
	f2 := &c

	n, _ := io.Copy(f2, f1)

	fmt.Println("copied", n, "bytes")
	fmt.Println(c)
}

All the methods must be present to satisfy the interface. So it pays to keep interfaces small. ![[Pasted image 20231009084212.png]]

The receiver must be the right type (pointer or value) ![[Pasted image 20231009084607.png]]

io.ReadWriter is actually defined by Go as the composition of two interfaces:

type Reader interface {
	Read(p []byte) (n int, err error)
}

type Writer interface {
	Write(p []byte) (n int, err error)
}

type ReadWriter interface {
	Reader
	Writer
}

Small interfaces with composition where needed are more flexible

Interface declarations: all methods for a given type must be declared in the same package where the type is defined.

We can always extend the type in a new package through embedding

type Bigger struct {
	my.Big // get all Big methods via promotion
}
func (b Bigger) Do() {...} // add one more method here.

19 - Composition

The fields of an embedded struct are PROMOTED to the level of the embedding structure.

type Pair struct {
  Path string
  Hash string
}

type PairWithLength struct {
  Pair
  Length int
}

pl := PairWithLength{Pair{"/urs", "arstaars"}, 121}
fmt.Println(pl.Path, pl.Length) // not pl.Pair.Path - fields are promoted

Promotion - the fields of Pair appear at the same level as the fields of PairWithLength. Methods get promoted too…

type Pair struct {
	Path string
	Hash string
}

func (p Pair) String() string {
	return fmt.Sprintf("Hash of %s is %s", p.Path, p.Hash)
}

type PairWithLength struct {
	Pair
	Length int
}

// this will be used instead of Pair. Not called override - no inheritance.
func (p PairWithLength) String() string {
	return fmt.Sprintf("Hash of %s is %s with length %d", p.Path, p.Hash, p.Length)
}

func (p Pair) Filename() string {
	return filepath.Base(p.Path)
}

type Filenamer interface {
	Filename() string
}

func main() {
	p := Pair{"/usr", "0xfdfe"}
	// exception to promotion - have to use Pair struct here.
	// pl := PairWithLength{Pair{"/usr", "0xdead"}, 133}
	fmt.Println(p)

	var fn Filenamer = PairWithLength{Pair{"/usr", "0xdead"}, 133}
	fmt.Printf("fn: %v\n", fn)
}

We can do this assignment to Filenamer! The Filename method was on Pair, not PairWithLength, but that method was promoted into PairWithLength.

Interfaces are how we get around this issue - we had a method on the concrete type Pair, and PairWithLength is a different concrete type. They are both Filenamers because of promotion.

Composition with pointer types A struct can embed a pointer to another type; promotion of its fields and methods works the same way.

type Fizgig struct {
  *PairWithLength
  Broken bool
}

// we pass in address of something we created with PairWithLength - allocated on
// heap, and we get pointer to it.
fg := Fizgig{
  &PairWithLength{Pair{"/usr", "0xfdfe"}, 121},
  false,
}
fmt.Println(fg)
// Hash of /usr is 0xfdfe with length 121
// Still allows promotion even if embedding pointers!!

Sorting: look at composition Organs.go ![[Pasted image 20231011220101.png]]

![[Pasted image 20231011220015.png]]

Make nil useful ![[Pasted image 20231011220208.png]]

We don’t see the data field of the StringStack struct here. It’s lowercase - not exported, so it’s encapsulated. Instead, Push and Pop are exposed. For Pop, if we try to pop from an empty stack, we use panic with a custom error message. We use data []string so that we have a good default “nil” value - we can straight away append to a nil slice!
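A minimal sketch of what that StringStack might look like (assumed shape, not the exact slide code):

type StringStack struct {
	data []string // unexported; the zero value (nil) is immediately usable
}

func (s *StringStack) Push(x string) {
	s.data = append(s.data, x) // appending to a nil slice just works
}

func (s *StringStack) Pop() string {
	if l := len(s.data); l > 0 {
		x := s.data[l-1]
		s.data = s.data[:l-1]
		return x
	}

	panic("pop from empty stack") // caller's programming error
}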

Nothing in Go prevents calling a method with a nil receiver.

It’s up to us to handle. ![[Pasted image 20231011220859.png]]

20 - Interfaces & Methods in Detail

An interface variable is nil until initialized. It really has 2 parts: a pointer to the concrete type (its method table) and a pointer to the value.

var r io.Reader // nil until initialised
var b *bytes.Buffer // ditto
r = b // r is no longer nil! but it has a nil pointer to a Buffer.

An interface is nil only if both parts are!

If both those pointers are nil, the interface itself is nil! ![[Pasted image 20231012081239.png]] r1 has a nil buffer pointer! So after we do the assignment, the interface is not nil.

Error is really an interface! We called error a special type, but it’s really an interface!

type error interface {
  Error() string
}

It’s an interface because we sometimes wish to have more in there than just an error string!

We can compare it to nil, unless we made a mistake. The mistake is to store a nil pointer to a concrete type in the error variable.

type errFoo struct {
	err  error
	path string
}

// to make errFoo compatible with err type.
func (e errFoo) Error() string {
	return fmt.Sprintf("%s: %s", e.path, e.err)
}

// FIX: change to return error
func XYZ(a int) *errFoo {
	return nil
}

func main() {
	// err := XYZ(1) // err would be *errFoo
	var err error = XYZ(1)
	fmt.Printf("err: %v\n", err) // BAD: interface gets a nil concrete pointer.

	if err != nil {
		fmt.Println("oops!!!!")
	} else {
		fmt.Println("OK!!!!")
	}
}

/*
XYZ returns a nil pointer to a concrete value, and that gets copied into an
interface, and that interface is no longer nil!
It's not nil, even though it has a nil pointer inside!

Fix is to make XYZ return the type error; when it returns nil, it's not returning
a nil pointer to a concrete type, it's returning a nil interface value!

Actually not easy to differentiate between the interface being nil and the
interface having a nil pointer!

Make sure a function that can fail returns the interface type error.
*/

A method can be defined on a pointer to a type.

type Point struct {
  x, y float32
}

// Point gets changed
func (p *Point) Add(x, y float32) {
  p.x, p.y = p.x + x, p.y + y
}

// Point does not get changed!
func (p Point) OffsetOf(p1 Point) (x float32, y float32) {
  x, y = p.x - p1.x, p.y - p1.y
  return
}

The same method name may not be bound to both T and *T

Pointer methods may be called on non-pointers and vice versa. Go will automatically use * or & as needed.

p1 := new(Point) // *Point, at (0,0)
p2 := Point{1,1}

p1.OffsetOf(p2) // same as (*p1).OffsetOf(p2) - go auto dereference the pointer.
p2.Add(3, 4)	// same as (&p2).Add(3,4) - compiler auto takes the address of p2.

Catch: & only works on objects that are addressable.

Compatibility between objs and receiver types

                    Pointer    L-Value    R-Value
  pointer receiver  OK         OK         NOT OK
  value receiver    OK *       OK         OK

A method requiring a pointer receiver may only be called on an addressable obj.

var p Point

p.Add(1, 2) // OK, &p
Point{1,1}.Add(2,3) // NOT OK - can't take the address - a Point literal is a value, not a variable (a place)
// not addressable.

If one method of a type takes a pointer receiver, then all its methods should take pointers (except for other reasons :D)

And in general objects of that type are probably not safe to copy!

type Buffer struct {
  buf	[]byte
  off	int
}

// not safe to make copy of buffer - has slice. So all method receivers will
// be pointers.
func (b *Buffer) ReadString(delim byte) (string, error) {
  ...
}

A method value with a value receiver copies the receiver. If it has a pointer receiver, it copies a pointer to the receiver.

func (p *Point) Distance(q Point) float64 {
  return math.Hypot(q.X-p.X, q.Y-p.Y)
}

p := Point{1,2}
q := Point{4, 6}

distanceFromP := p.Distance
p = Point{3,4}
fmt.Println(distanceFromP(q)) // uses the "new" value of p

Interfaces in practice

  1. let consumers define interfaces. (what minimal behavior do they require?)
  2. Re-use standard interfaces wherever possible.
  3. Keep interface declarations small. (The bigger the interface, the weaker the abstraction)
  4. Compose one-method interfaces into larger interfaces (if needed)
  5. Avoid coupling interfaces to particular types/implementations
  6. Accept interfaces, but return concrete types (let the consumer of the return type decide how to use it.)

“Be liberal in what you accept, be conservative in what you return”

Exception - returning error. Return the error interface.

The interface{} type has no methods. So it is satisfied by anything!

Empty interfaces are commonly used; they’re how the formatted I/O routines can print any type. Bit like a void ptr in C.

func fmt.Printf(f string, args ...interface{}) // any!

This is dynamic typing territory - we have to use reflection (or type assertions) to determine what the concrete type is.

21 - Homework #4

done

22 - What is Concurrency?

Some definitions of concurrency

Partial order:

       -> 2a -> 2b ->
  1 --<               >-- 4
       -> 3a -> 3b ->

Some possible orders -

  1. 1,2a,2b,3a,3b,4
  2. 1,2a,3a,2b,3b,4
  3. 1,2a,3a,3b,2b,4
  4. 1,3a,3b,2a,2b,4

We don’t mean different results, but a different trace of execution.

Subroutines are subordinate, while coroutines are co-equal:

  program -> subroutine -> program -> subroutine -> program

vs

  program ->
    \-> coroutine ->
      \-> coroutine ->

Definition of concurrency: Parts of the program may execute independently in some non-deterministic (partial) order

Parallelism

But concurrency brings problems…

Race Condition

It’s the possibility that our out-of-order, non-deterministic execution may get something wrong. It’s a bug.

“System behaviour depends on the (non-deterministic) sequence or timing of parts of the program executing independently, where some possible behaviours (orders of execution) produce invalid results”.

  Read      |                |
  Modify    | Shared Account | Read
  Write     |                | Modify
            |                | Write

Each read-modify-write group needs to be ATOMIC - it can’t be spread apart. Whichever group happens first must be seen in full by the second one.

Solutions?

In the last case, we’re adding more sequential order to our operations. This reduces concurrency!

23 - CSP, Goroutines, and Channels

CSP - Communicating Sequential Processes.

Channels

The channel is the communicating part of CSP - it acts as a buffer or synchronization point, allowing the individually sequential processes to be concurrent as a group.

CSP provides a model for thinking about concurrency that makes it less hard (take the program apart and make the pieces talk to each other). It lets us write async code in a synchronous style.

Goroutine

To start one, put go in front of a function call.

The trick is knowing how the goroutine will stop. Otherwise we end up with a memory leak - a goroutine that’s become orphaned: stuck, never finishing, holding onto its resources. Especially bad if you’re building a web server.

Need to make sure it doesn’t get blocked by mistake.

A goroutine is not a thread.

Channel

“Don’t communicate by sharing memory; instead, share memory by communicating”
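A tiny sketch of that idea: the result travels over a channel instead of through shared memory.

func main() {
	ch := make(chan int)

	go func() {
		ch <- 6 * 7 // the goroutine sends its result...
	}()

	fmt.Println(<-ch) // ...and the receive both delivers it and synchronizes
}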

24 - Select

Select allows any one “ready” alternative among several channel operations to proceed.

Most often select runs in a loop so we keep trying.

func TickMain() {
	log.Print("start")
	const tickRate = 2 * time.Second

	stopper := time.After(5 * tickRate)
	ticker := time.NewTicker(tickRate).C

loop:
	for {
		select {
		case <-ticker:
			log.Println("tick")
		case <-stopper:
			log.Println("STOPPING")
			break loop // breaking out of for loop, not select.
		}
	}
}

In a select block, the default case is always ready and will be chosen if no other case is.

Don’t use default inside a loop - the select will busy wait and waste CPU.

BOOK: https://www.amazon.co.uk/Concurrency-Go-Katherine-Cox-buday/dp/1491941197

25 - Context

Context package offers a common method to cancel requests

We get two controls: explicit cancellation, and cancellation by timeout/deadline.

The error value tells whether the request was cancelled or timed out. We often use the channel from Done() in a select block.

Contexts form an immutable tree structure. (Goroutine-safe; changes to a context don’t affect its ancestors.)

Cancellation or timeout applies to the current context and its subtree; the same goes for values.

We’ve got to think in terms of trees and subtrees. We don’t modify a context; we add a subtree.

type result struct {
	url     string
	err     error
	latency time.Duration
}

func get(ctx context.Context, url string, ch chan<- result) {
	var r result

	start := time.Now()
	ticker := time.NewTicker(1 * time.Second).C
	req, _ := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)

	if resp, err := http.DefaultClient.Do(req); err != nil {
		// put result on a var, dont put it on chan immediately.
		r = result{url, err, 0}
	} else {
		t := time.Since(start).Round(time.Millisecond)
		r = result{url, nil, t}
		resp.Body.Close()
	}

	for {
		select {
		case ch <- r:
			return
		case <-ticker:
			log.Println("tick", r)
		}
	}
}

// we start multiple requests, get one response, make the other ones cancel.
func first(ctx context.Context, urls []string) (*result, error) {
	results := make(chan result)
	// results := make(chan result, len(urls)) // buffer to avoid leaking, solves bug.
	ctx, cancel := context.WithCancel(ctx)

	defer cancel()

	for _, url := range urls {
		go get(ctx, url, results)
	}

	select {
	// we get the first response by listening to the results channel.
	// What happens to the other go routines?
	case r := <-results:
		return &r, nil // we do deferred cancellation
	// we need this case because we had a ctx provided from above! What if there
	// was some timeout above? We gotta handle parent context like this.
	case <-ctx.Done():
		return nil, ctx.Err()
	}

}

func ParallelGetV2() {
	urls := []string{
		"https://amazon.com",
		"https://nytimes.com",
		"https://wsj.com",
		"https://facebook.com",
		"https://google.com",
		"http://localhost:8080/wait",
	}

	// give me the first response.
	r, _ := first(context.Background(), urls)

	if r.err != nil {
		log.Printf("%-20s %s\n", r.url, r.err)
	} else {
		log.Printf("%-20s %s\n", r.url, r.latency)
	}

	time.Sleep(9 * time.Second)
	log.Println("Quit anyway...", runtime.NumGoroutine(), "still running")

}

The program above has one bug in it: leaking goroutines. Leaking goroutines (and the sockets they hold open) are the main culprit of memory leaks.

select {
case r := <-results:
    return &r, nil // we do deferred cancellation

Here in first(), we first start some goroutines, then get the first result by listening to the results channel. So what happens to the other goroutines? We cancel their HTTP operations, but the goroutines themselves get hung up.

This is because we made the results channel with no buffer.

In an unbuffered channel, if someone wants to send, somebody else has to be able to receive (they happen roughly at same time.)

So if there’s nobody receiving, the sender blocks until there’s somebody ready to receive!!

So after we get the first res, the other goroutines that want to write to this channel are gonna get stuck!! They can’t write, as there is nobody to read.

So what we need to do, is to buffer to avoid leaking. If the channel is buffered, it means it already has a certain amount of space. So people can store their results on the channel even if there’s nobody ready to receive.

Unbuffered channel: the sender can’t send unless a receiver is ready to receive.
Buffered channel: as long as there is space in the buffer, the sender can send, and the receiver will receive later.

results := make(chan result, len(urls)) // buffer to avoid leaking, solves bug.

So now the goroutines are not going to get stuck ticking!

Values
Context values should be data specific to a request, like a request or trace ID.

AVOID using the context to carry “optional” parameters.

Use package specific, private context key type (not string) to avoid collisions.

26 - Channels in Detail

Channel state

Channels block unless ready to read or write.

A channel is ready to write if it has buffer space available, or at least one reader is ready to read (rendezvous).

A channel is ready to read if it has unread data in its buffer, or at least one writer is ready to write (rendezvous), or it is closed.

Channels are unidirectional, but have two ends (which can be passed separately as params)

func get(url string, ch chan<- result) {} // write-only end
func collect(ch <-chan result) map[string]int // read-only end

We are constraining it here by only providing the read end or the write end. Useful - it makes it clear exactly what the channel will be doing in the func.

Closed chans

Channel reading is a bit like reading from a map: reading a nil map returns the zero value. When reading from a channel, we also have access to a second var, ok.

func main() {
  ch := make(chan int, 1)

  ch <- 1

  b, ok := <-ch // 1 true
  close(ch)
  c, ok := <-ch // 0 false
}

If we get rid of the buffer when making the channel (ch := make(chan int)) and run this, we get a crash: all goroutines are asleep - deadlock! A deadlock means none of the goroutines can make progress because they’re all waiting for something. Go has a built-in deadlock detector.

In a concurrent program, we worry about race conditions. So we use sync tools like channels to prevent them, but those tools can cause problems themselves.

A channel can only be closed once (else it will panic)

One of the main issues with working with goroutines is ending them.

Nil channels

Reading or writing a channel that is nil always blocks (*). But a nil channel in a select block is ignored.

This can be useful

Use only when needed, can cause some issues and not always super clear.

State          Receive           Send            Close
Nil            Block*            Block*          Panic
Empty          Block             Write           Close
Partly full    Read              Write           Readable until empty
Full           Read              Block           Readable until empty
Closed         Default value**   Panic           Panic
Receive-only   OK                Compile error   Compile error
Send-only      Compile error     OK              OK

* Select ignores a nil channel since it would always block.
** Reading a closed channel returns (zero value, !ok).

Unbuffered channels (rendezvous model) (default)

https://i.imgur.com/rEMwX8i.png

Analogy: delivering a package - the delivery driver waits for you to sign for the package. Buffered is like a mailbox: somebody puts letters in the mailbox for you to pick up later.

The sender and receiver come together, exchange the data, and separate. Whoever comes first has to wait for the other one.

THE SEND DOES NOT SIMPLY HAPPEN BEFORE OR AFTER THE RECEIVE - THE TWO OVERLAP.

What happens is

  1. Sender starts to send
  2. Receiver starts to receive
  3. Receiver finishes receiving
  4. Send finishes.

So when the send is done, the sender knows that the receiver has received!

Even if the receiver starts first, the sender will always finish last: the receive returns, then the send returns, so the sender knows the receive has happened.

Buffered channels

https://imgur.com/lyKNnyi

Buffering

Allows the sender to send without waiting

func main() {
  // make a chan with buffer that holds 2 items
  messages := make(chan string, 2)

  // now we can send twice without getting blocked
  messages <- "buffered"
  messages <- "channel"

  // and then receive them both as usual
  fmt.Println(<-messages)
  fmt.Println(<-messages)
}

With a buffer of size 1 (or no buffer at all), this example would deadlock on a send!!

type T struct {
	i byte
	b bool
}

func send(i int, ch chan<- *T) {
	t := &T{i: byte(i)}
	ch <- t

	// RACE CONDITION
	// once you give a var to a channel, you renounce ownership of it.
	t.b = true // UNSAFE AT ANY SPEED
}

func main() {
	vs := make([]T, 5)
	// unbuffered chan, rendezvous behaviour
	ch := make(chan *T)

	for i := range vs {
		go send(i, ch)
	}

	time.Sleep(1 * time.Second) // all goroutines guaranteed to have started

	// copy quickly
	for i := range vs {
		// read chan which has pointer, immediately dereference pointer and copy.
		vs[i] = *<-ch
	}

	// print later
	for _, v := range vs {
		fmt.Println(v)
	}
}

Here, the results print all falses. Why?

If we change this to be a buffered channel, make(chan *T, 5), we get trues: now the sends are non-blocking, so by the time we read the values they have already been modified!

Why buffer?

Don’t buffer until it’s needed. Buffering may hide a race condition

Special use: Counting semaphore pattern. It limits work in progress (or occupancy)

Once it’s full, only one unit of work can enter for each one that leaves

We model this with a buffered channel:
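A sketch of the pattern (jobs, Job, and process are hypothetical placeholders):

sem := make(chan struct{}, 4) // at most 4 units of work in flight

for _, job := range jobs {
	sem <- struct{}{} // acquire a slot; blocks while 4 are already running

	go func(j Job) {
		defer func() { <-sem }() // release the slot when done
		process(j)
	}(job)
}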

27 - Concurrent File Processing

https://youtu.be/SPD7TykYy5w?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6 Turning sequential program into concurrent.

Problem:

Finding duplicate files based on their content Use a secure hash, because the names / dates may differ.

Using MapReduce paradigm Bulk of work is reading the files and calculating the hashes.

Approach 1: concurrent (like map-reduce).

Use a fixed worker pool of goroutines and a collector and channels

graph LR;
    TreeWalk-->|paths|Workers;
    Workers-->|pairs|Collector;
    Workers-->TreeWalk;
    Collector-->|results|TreeWalk;

How do we know we’re done? Close the paths channel (that the Workers are reading). Back to the idea of “the last person out turns off the light” - there’s no practical way in this model for the workers to coordinate who the last one is, so it’s left up to the main program.

The main program will start the workers and feed them; when done feeding, it will close the paths channel and wait for the workers to finish. The workers signal they’re done, then main closes the pairs channel, the collector finishes, and we get the results.

Approach 2: add a goroutine for each directory, using sync.WaitGroup.

Approach 3: use a goroutine for every dir and file hash.

What can go wrong: we’ll run out of threads. GOMAXPROCS doesn’t limit threads blocked on syscalls (all our disk I/O). We need to limit the number of active goroutines instead (the ones making syscalls).

We will use Counting semaphores

https://imgur.com/a/gAif3DS

package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"log"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

type pair struct {
	hash string
	path string
}
type fileList []string
type results map[string]fileList

func hashFile(path string) pair {
	file, err := os.Open(path)

	if err != nil {
		log.Fatal(err)
	}

	defer file.Close()

	hash := md5.New()

	if _, err := io.Copy(hash, file); err != nil {
		log.Fatal(err)
	}

	return pair{fmt.Sprintf("%x", hash.Sum(nil)), path}
}

func processFile(path string, pairs chan<- pair, wg *sync.WaitGroup, limits chan bool) {
	defer wg.Done()

	limits <- true

	defer func() {
		<-limits
	}()

	pairs <- hashFile(path)
}

func collectHashes(pairs <-chan pair, result chan<- results) {
	hashes := make(results)

	for p := range pairs {
		hashes[p.hash] = append(hashes[p.hash], p.path)
	}

	result <- hashes
}

func searchTree(dir string, pairs chan<- pair, wg *sync.WaitGroup, limits chan bool) error {
	defer wg.Done()

	visit := func(path string, info os.FileInfo, err error) error {
		if err != nil && err != os.ErrNotExist {
			return err
		}

		if info.Mode().IsDir() && path != dir {
			wg.Add(1)
			go searchTree(path, pairs, wg, limits)
			return filepath.SkipDir
		}

		if info.Mode().IsRegular() && info.Size() > 0 {
			wg.Add(1)
			go processFile(path, pairs, wg, limits)
		}

		return nil
	}

	limits <- true

	defer func() {
		<-limits
	}()

	return filepath.Walk(dir, visit)
}

func run(dir string) results {
	workers := 2 * runtime.GOMAXPROCS(0)
	limits := make(chan bool, workers)
	pairs := make(chan pair)
	result := make(chan results)
	wg := new(sync.WaitGroup)

	// need goroutine so we don't block
	go collectHashes(pairs, result)

	// multithread walk of dir tree; we need a waitGroup cause we don't know
	// how many to wait for
	wg.Add(1)

	err := searchTree(dir, pairs, wg, limits)

	if err != nil {
		log.Fatal(err)
	}

	// close the paths channel so the workers stop
	wg.Wait()

	close(pairs)

	return <-result

}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("provide dir name")
	}

	if hashes := run(os.Args[1]); hashes != nil {
		for hash, files := range hashes {
			if len(files) > 1 {
				fmt.Println(hash[len(hash)-7:], len(files))

				for _, file := range files {
					fmt.Println(" ", file)
				}

			}
		}
	}
}

This is probably the best approach to this problem. Check code at walks/

Check out Amdahl’s Law.

Conclusions: we don’t need to limit goroutines; we need to limit contention for shared resources (disk/network).

28 - Conventional Synchronization

These are the things Go’s CSP model is built on: Mutex, Once, Pool, RWMutex, WaitGroup.

Mutual exclusion

What if multiple goroutines must read & write some data?

We must make sure only one of them can do so at any instant (“critical section”)

We accomplish this with some kind of lock.

func do() int {
	var n int64
	var w sync.WaitGroup

	for i := 0; i < 1000; i++ {
		w.Add(1)

		go func() {
			n++ // DATA RACE
			w.Done()
		}()
	}

	w.Wait()
	return int(n)
}

This code demonstrates the problem - we will almost never get n to 1000.

We can solve it with a counting semaphore of size 1.

m := make(chan bool, 1)
// ....
go func() {
	m <- true
	n++
	<-m
	w.Done()
}()

This behaves exactly like a lock/mutex. In Go we can use sync.Mutex instead:

var m sync.Mutex
// ....
go func() {
	m.Lock()
	n++
	m.Unlock()
	w.Done()
}()

Mutex is better designed for this purpose.

type SafeMap struct { // original map in go not goroutine safe.
  sync.Mutex  // not safe to copy
  m map[string]int
}

// so methods must take a pointer not a value
func (s *SafeMap) Incr(key string) {
  s.Lock()
  defer s.Unlock()

  // only one goroutine can exec this code at the same time, guaranteed
  s.m[key]++
}

Sometimes we need to read more than write, so use sync.RWMutex. Multiple readers are allowed.

For the best performance, use the sync/atomic package, which is more primitive and works closer to the hardware level.
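A sketch of the same counter using sync/atomic instead of a mutex:

var n int64
var w sync.WaitGroup

for i := 0; i < 1000; i++ {
	w.Add(1)

	go func() {
		atomic.AddInt64(&n, 1) // hardware-level atomic increment, no lock
		w.Done()
	}()
}

w.Wait() // n is now reliably 1000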

Only-once execution
Can ensure a fn runs only once:

var once sync.Once
var x *singleton

// note: can't be named init - that identifier is reserved and can't be referenced
func initialize() {
  x = NewSingleton()
}

func handle(w http.ResponseWriter, r *http.Request) {
  once.Do(initialize)
  // ...
}
// Can't check if x == nil - UNSAFE (data race)

Pool
Provides safe & efficient reuse of objects, but it’s a container of interface{}:

var bufPool = sync.Pool{
  New: func() interface{} {
    return new(bytes.Buffer)
  },
}

// ...
b := bufPool.Get().(*bytes.Buffer) // type assertion to downcast from interface{}
b.Reset()
// write to it
w.Write(b.Bytes())
bufPool.Put(b)

29 - Homework #5 (h/w #4 part deux)

Revisit server hw4, add in

type database struct {
	mu sync.Mutex
	db map[string]dollars
}

// ....
func (d *database) list(w http.ResponseWriter, req *http.Request) {
	// make it concurrency safe
	d.mu.Lock()
	defer d.mu.Unlock()
	// ....

Can run the server with the -race flag (race detector).

Need to pass in a pointer to the db struct so we have a pointer to the mutex - you can’t copy mutexes.

30 - Concurrency Gotchas

The fundamental problems with concurrency are:

  1. Race conditions, where unprotected reads and writes overlap.
  2. Deadlock, where no goroutine can make progress.

Go detects some deadlocks automatically. With -race it can find some data races.

  3. Goroutine leaks

Most common memory leak problem is hanging goroutines / hanging sockets.

When you start a goroutine, always know how/when it will end!

  4. Channel errors
  5. Other
ch := make(chan bool)
go func(ok bool) {
	fmt.Println("START")

	if ok {
		ch <- ok
	}
}(true)
<-ch
fmt.Println("DONE")

Here we’d deadlock if ok were false: we’d be reading from ch, but nothing would ever be sent on it, and it’s not buffered so it expects rendezvous behaviour.

In this example, a timeout leaves the goroutine hanging forever. The correct solution is to make a buffered channel

func finishReq(timeout time.Duration) *obj {
  ch := make(chan obj)

  go func() {
    // ...          // work that takes too long
    ch <- fn()      // blocking send
  }()

  select {
  case rslt := <-ch:
    return &rslt
  case <-time.After(timeout):
    return nil
  }
}

WaitGroup: always, always, always call Add before go or Wait.

Closure capture
A goroutine shouldn’t capture a mutating variable.

for i := 0; i < 10; i++ {     // WRONG
  go func() {
    fmt.Println(i)
  }()
}

// Instead, pass the variable's value as a param!!!
for i := 0; i < 10; i++ {
  // or do i := i
  go func(i int) {
    fmt.Println(i)
  }(i)
}

Select

Conclusions

  1. Don’t start a goroutine without knowing how it will stop
  2. Acquire locks/semaphores as late as possible; release in reverse order (defer does that already)
  3. Don’t wait for non-parallel work that you could do yourself.
  4. Simplify, Review, Test

31 - Odds & Ends

There are no enumerated types in Go. You can make an almost-enum type using a named type and constants (typically with iota).
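A sketch of the pattern (a named type plus iota constants):

type Weekday int

const (
	Sunday Weekday = iota // 0
	Monday                // 1
	Tuesday               // 2
)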

Can use … to denote variadic params and also unpack slices and such

Probably not going to use this much, but here’s the vid: https://youtu.be/oTtYtrFv3gw

Go has goto, which probably shouldn’t be used much.

32 - Error Handling

https://youtu.be/oIxXp0OgK_0

Most of the time, errors are just strings.

Errors in Go are objects satisfying the error interface.

Any concrete type with an Error() string method can be used as an error.

errors.Is operates on an error value; errors.As operates on an error type (downcasting).
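A small sketch of the difference (ErrNotFound and PathError are made-up examples):

package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

type PathError struct{ Path string }

func (e *PathError) Error() string { return "bad path: " + e.Path }

func main() {
	err := fmt.Errorf("lookup failed: %w", ErrNotFound) // wrapped error

	if errors.Is(err, ErrNotFound) { // matches a specific error VALUE in the chain
		fmt.Println("not found")
	}

	var pe *PathError
	if errors.As(err, &pe) { // matches a specific error TYPE (not present here)
		fmt.Println("path was", pe.Path)
	}
}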

Normal error

Result from input or external conditions (e.g. file not found).

Go code handles this case by returning the error type.

Abnormal errors

Result from invalid program logic (e.g. a nil pointer dereference). For program logic errors, Go code panics:

func (d *digest) checkSum() [Size]byte {
  // finish writing checksum
  ...
  if d.nx != 0 { // panic if theres data leftover
    panic("d.nx != 0")
  }
}

This is a fault of logic that’s doing the processing here.

When your program has a logic bug, FAIL HARD, FAIL FAST

Why?

If your server crashes, it will get immediate attention.

We want evidence of the failure as close as possible in time and space to the original defect in the code

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! In a distributed system, crash failures are the safest type to handle.

It’s the best to just crash.

When should we panic??

Only when the error was caused by our own programming defect, e.g

PANIC SHOULD ONLY BE USED WHEN OUR ASSUMPTIONS OF OUR OWN PROGRAMMING DESIGN OR LOGIC ARE WRONG.

These cases might use an “assert” in other programming langs.

Define errors out of existence

Error (edge) cases are one of the primary sources of complexity.

The best way to deal with many errors is to make them impossible

Design abstractions so that most (or all) operations are safe:

Try to reduce edge cases that are hard to test or debug (or even think about)

Proactively prevent problems

Every piece of data in program should start life in a valid state.

Every transformation should leave it in a valid state.

In Go, we handle errors right in place. This is good, as it gives visibility - it might be verbose, but it’s explicit.

33 - Reflection

https://youtu.be/T2fqLam1iuk?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6 - for json type matching

It’s the notion of a program looking at itself.

To extract a specific type, we can use a type assertion (downcasting)

var w io.Writer = os.Stdout
f, ok := w.(*os.File) // success
b, ok := w.(*bytes.Buffer) // failure

We can also switch on a type (as opposed to value.)

func Println(args ...interface{}) {
  buf := make([]byte, 0, 80)

  for _, arg := range args {
    switch a := arg.(type) {
    case string:
      buf = append(buf, a...)
    case Stringer:
      buf = append(buf, a.String()...)
    }
  }
}

34 - Mechanical Sympathy

https://youtu.be/7QLoOd9HinY?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6

Go tends to be used in cloud. When we build in cloud, we make some concessions on perf. Deliberate choice to accept some overhead

We have to tradeoff perf against:

Need to optimise where we can given those choices. We still want simplicity, readability, maintainability of code.

Optimization

Top-down refinement

  1. Architecture: latency, cost of comms
  2. Design: algos, concurrency, layers
  3. Implementation: programming lang, memory use

Mechanical sympathy plays role in implementation.

Interpreted langs may cost 10x more to operate due to their inefficiency.

Around 2005, processors stopped getting much faster.

Consider the gap between CPU and memory (DRAM) performance: CPUs have gotten faster, but memory not as much.

Realities

Also software got much more bloated. Eating up CPU and capacity. (like interpreted langs)

Two competing realities

The only way to do that is:

Memory caching

“computational cost” is often dominated by memory access cost

Caching takes advantage of access patterns to keep frequently used code and data “close” to the CPU to reduce access time.

To get best perf, we want to

Access Patterns

A little bit of copying is better than lots of pointer chasing!!!

A slice of objects beats a list with pointers!

A struct with contiguous fields beats a class with pointers. (cost of pointer chasing)

Calling lots of short methods via dynamic dispatch is very expensive. The cost of calling a fn should be directly proportional to the work it does

Bad design to have method call that goes to other obj to do method call that goes to another obj etc etc, “passing the buck”. Too many layers of abstraction.

We take a little piece of work and magnify its cost by all these dynamically dispatched method calls.

Avoid short method calls. Harder to read if little methods spread all over. Also massive perf issues.

Sync cost - False sharing - cores fight over a cache line for different vars.

Only other big cost we can control is GC.

Go gives you lots of choices. How to Optimize?

Can choose

Go doesn’t get between you and the machine

Good code in Go doesn’t hide the costs involved We want to make logic and cost both explicit and clear!

There are only 3 optimizations:

  1. Do less
  2. Do it less often
  3. Do it faster

“The largest gains come from 1, but we spend all our time on 3.”

35 - Benchmarking

https://youtu.be/nk4rALKLQkc?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6

List vs slice False sharing

Few things to consider

36 - Profiling

A good way to test is using pprof: run some traffic on the server and check whether there are more goroutines than expected hanging about (a leak).

Prometheus metrics

37 - Static Analysis

Aka linting. Tools that inspect code and help see issues before running program.

Some other tools: goconst, gosec, ineffassign, gocyclo, deadcode, unused, varcheck, unconvert, gosimple.

One tool to rule them all: golangci-lint.

38 - Testing

https://youtu.be/PIPfNIWVbc8?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6

Testing culture “Tests are the contract about what your software does and does not do. Unit tests at the pkg level should lock in the behaviour of the pkg’s API. They describe in code what the pkg promises to do. If there is a unit test for each input permutation, you have defined the contract for what the code will do in code, not documentation”

“This is a contract you can assert as simply as typing go test. At any stage, you can know with a high degree of confidence that the behaviour people relied on before your changes continues to function after your change.”

Should make the assumption that your code does not work, unless you have tests proving otherwise.

Work is not done until we’ve added or updated the tests. Basic code hygiene. Start clean, stay clean.

“The hardest bugs are those where your mental model of the situation is just wrong, so you can’t see the problem at all.”

Developers test to show that things are working and done according to their understanding of the problem & solution.

Most difficulties are failures of imagination.

There are 8 levels of correctness in order of increasing difficulty of achievement.(Gries & Conway)

  1. it compiles (and passes static analysis)
  2. it has no bugs that can be found just running the program
  3. it works for some hand picked data
  4. it works for typical reasonable input
  5. it works with test data chosen to be difficult
  6. it works for all input that follows the spec
  7. it works for all valid input and likely error cases
  8. it works for all input.

“it works” means it produces the desired behaviour or fails safely.

There are 4 different types of errors. (Gries & Conway)

  1. errs in understanding the problem reqs.
  2. errs in understanding the programming lang.
  3. errs in understanding the underlying algos.
  4. errs where u knew better but simply slipped up

Should aim for 75 - 85% code coverage.

Devs must be responsible for the quality of their code.

Reality check: good, fast, cheap - pick any two.

You can’t have all three in the real world.

39 - Code Coverage

https://youtu.be/HfCsfuVqpcM?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6

go test -cover
Can use -coverprofile to generate a file with coverage counts.
Can see it visually with go tool cover -html=coverage.out.
Can use -covermode=count to see it as a heatmap.

40 - Go Modules

Go module support solves several problems

41 - Building Go Programs

https://youtu.be/rXgUP_BNyaI?list=PLoILbKo9rG3skRCj37Kn5Zj803hhiuRK6

Can build a “pure” go program

A “pure” program can be put into a “from-scratch” container. All you need is a go binary.

Not being dynamically linked is a good thing.

Go can cross-compile.

Project layout

Root/
  README
  Makefile
  (build/   -> Dockerfile)
  cmd/      -> programs
  (deploy/  -> K8s files)
  go.mod
  go.sum
  pkg/      -> libraries
  (scripts/)
  test/     -> integration tests (most test files live next to the file they test)
  (vendor/  -> modules)

Documentation - README.MD should be full of info.

“Just read the source code” is not enough.

Reasons we might need a Makefile

Versioning the executable
Keep track of a version string (set compile-time vars for versioning).
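A minimal sketch (the variable name and version value are illustrative): a package-level string var can be overwritten at build time with -ldflags.

package main

import "fmt"

var version = "dev" // overwritten at build time

func main() {
	fmt.Println("version:", version)
}

// build with:
//   go build -ldflags "-X main.version=v1.2.3"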

Docker
We can use Docker to build as well as to run.

The result is a small Docker container built for Linux. You can run it without a Go install - great for CI/CD environments.

42 - Parametric Polymorphism

Generics in Go

Use type params to replace dynamic typing with static typing: interface{} + v.(T) -> type MyType[T any] struct{...}

Continue to use (non-empty) interfaces wherever possible Performance should not be main reason for using generics.

Don’t be clever with it.

type Vector[T any] []T

func (v *Vector[T]) Push(x T) {
	*v = append(*v, x)
}

func Map[F, T any](s []F, f func(F) T) []T {
	r := make([]T, len(s))

	for i, v := range s {
		r[i] = f(v)
	}

	return r
}

func main() {
	s := Vector[int]{}

	s.Push(1)
	s.Push(2)

	t1 := Map(s, strconv.Itoa)
	t2 := Map([]int{1, 2, 3}, strconv.Itoa)

	fmt.Printf("t1: %#v\n", t1)
	fmt.Printf("t2: %#v\n", t2)
}

Writing a Map function isn’t very “Go-ish”. Just use a plain old for loop.

43 - Parting Thoughts

“Build your software so it’s obviously correct.”

Good presentations

“Never stop learning”