diff options
Diffstat (limited to 'docs/lists.md')
| -rw-r--r-- | docs/lists.md | 909 |
1 files changed, 909 insertions, 0 deletions
diff --git a/docs/lists.md b/docs/lists.md new file mode 100644 index 00000000..e605eea1 --- /dev/null +++ b/docs/lists.md @@ -0,0 +1,909 @@ +# Lists + +Tomo supports lists as a container type that holds a list of elements of any +type in a compact format similar to a C-style array. Lists are immutable by +default, but use copy-on-write semantics to efficiently mutate in place when +possible. **Lists are 1-indexed**, which means the first item in the list has +index `1`. + +## Syntax + +Lists are written using square brackets and a list of comma-separated elements: + +```tomo +nums := [10, 20, 30] +``` + +Each element must have the same type (or be easily promoted to the same type). If +you want to have an empty list, you must specify what type goes inside the list +like this: + +```tomo +empty : [Int] = [] +``` + +For type annotations, a list that holds items with type `T` is written as `[T]`. + +### List Comprehensions + +Lists can also use comprehensions, where you specify how to dynamically create +all the elements by iteration instead of manually specifying each: + +```tomo +>> [i*10 for i in (3).to(8)] += [30, 40, 50, 60, 70, 80] +>> [i*10 for i in (3).to(8) if i != 4] += [30, 50, 60, 70, 80] +``` + +Comprehensions can be combined with regular items or other comprehensions: + +```tomo +>> [-1, i*10 for i in (3).to(8), i for i in 3] += [-1, 30, 40, 50, 60, 70, 80, 1, 2, 3] +``` + +## Length + +List length can be accessed by the `.length` field: + +```tomo +>> [10, 20, 30].length += 3 +``` + +## Indexing + +List values are accessed using square bracket indexing. Since lists are +1-indexed, the index `1` corresponds to the first item in the list. Negative +indices are used to refer to items from the back of the list, so `-1` is the +last item, `-2` is the second-to-last, and so on. + +```tomo +list := [10, 20, 30, 40] +>> list[1] += 10 + +>> list[2] += 20 + +>> list[-1] += 40 + +>> list[-2] += 30 +``` + +If a list index of `0` or any value larger than the length of the list is +used, it will trigger a runtime error that will print what the invalid list +index was, the length of the list, and a stack trace. As a performance +operation, if list bounds checking proves to be a performance hot spot, you +can explicitly disable bounds checking by adding `list[i; unchecked]` to the +list access. + +## Iteration + +You can iterate over the items in a list like this: + +```tomo +for item in list: + ... + +for i, item in list: + ... +``` + +List iteration operates over the value of the list when the loop began, so +modifying the list during iteration is safe and will not result in the loop +iterating over any of the new values. + +## Concatenation + +Lists can be concatenated with the `++` operator, which returns a list that +has the items from one appended to the other. This should not be confused with +the addition operator `+`, which does not work with lists. + +```tomo +>> [1, 2] ++ [3, 4] += [1, 2, 3, 4] +``` + +## Implementation Details + +Under the hood, lists are implemented as a struct that contains a pointer to a +contiguous chunk of memory storing the elements of the list and some other +metadata. Since Tomo has datatypes with different sizes, like `Bool`s which +take one byte and `struct`s which can take up many bytes, it's worth noting +that lists store the elements compactly and inline, without the need for each +list cell to hold a pointer to where the data actually lives. + +The other metadata stored with a list includes its length as well as the +_stride_ of the list. The stride is not exposed to the user, but it's the gap +in bytes between each element in the list. The reason this is mentioned is +that it is possible to create immutable slices of lists in constant time by +creating a new struct that points to the appropriate starting place for the +list items and has the appropriate stride. The upshot is that a method like +`list.reversed()` does not actually copy the list, it simply returns a struct +that points to the back of the list with a negative stride. Lists adhere to +copy-on-write semantics, so we can cheaply create many read-only references to +the same data, and only need to do copying if we plan to modify data. After +doing a modification, future modifications can be done in-place as long as +there is only one reference to that data. + +Internally, we also take advantage of this inside of tables, which compactly +store all of the key/value pairs in a contiguous list and we can return an +immutable slice of that list showing only the keys or only the values by +choosing the right starting point and stride. + +## Copy on Write + +Lists can be thought of as values that have copy-on-write semantics that use +reference counting to perform efficient in-place mutations instead of copying +as a performance optimization when it wouldn't affect the program's semantics. +Without getting too deep into the details, suffice it to say that when you +create a list, that list can be thought of as a singular "value" in the same +way that `123` is a value. That variable's value will never change unless you +explicitly perform an assignment operation on the variable or call a method on +the variable. + +Because it would be tedious to require users to write all list operations as +pure functions like `list = list.with_value_at_index(value=x, index=i)`, Tomo +provides the familiar imperative syntax for modifying lists, but keeps the +semantics of the pure functional style. Writing `list[i] = x` is +_semantically_ equivalent to `list = list.with_value_at_index(value=x, +index=i)`, but much more readable and easy to write. Similarly, +`list.insert(x)` is semantically equivalent to `list = +list.with_value_inserted(x)`. We implement these mutating methods as functions +that take a pointer to a list variable, which then either mutate the list's +data in-place (if this is the only thing referencing that data) or construct a +new list and store its value in the memory where the list variable is stored. + +When there is only a single reference to a list value, we can perform these +modifications in-place (lists typically have a little bit of spare capacity at +the end, so appending usually doesn't trigger a reallocation). When there are +shared references, we must create a copy of the list's data before modifying +it so the other references don't see the effects of the mutation. Here are some +simple examples: + +```tomo +nums := [10, 20, 30, 39] + +// Efficient in-place mutation because data references are not shared: +nums[4] = 40 + +// Constant time operation, but increments the reference count: +tmp := nums +>> tmp += [10, 20, 30, 40] + +// Now, a mutation will trigger a copy-on-write, +// which resets the reference count to zero: +nums[4] = 999 +>> nums += [10, 20, 30, 999] + +// Because of the copy-on-write, `tmp` is unchanged: +>> tmp += [10, 20, 30, 40] + +// Since the reference count has been reset, we can do more +// mutations without triggering another copy-on-write: +nums[4] = -1 +>> nums += [10, 20, 30, -1] +``` + +List reference counting is _approximate_, but will only ever err on the side +of correctness at the expense of performance, not the other way around. +Occasionally, unnecessary copying may occur, but you should never experience an +list value changing because of some operation performed on a different list +value. + +## List Pointers + +Since the normal case of lists is to treat them like immutable values, what do +we do if we actually want to have a shared reference to a list whose contents +change over time? In that case, we want to use the `@` operator to create a +pointer to a heap-allocated list and pass that pointer around. This is the same +behavior that you get in Python when you create a `list`: + +```tomo +nums := @[10, 20, 30] +tmp := nums + +nums.insert(40) +>> tmp += @[10, 20, 30, 40] +``` + +Having multiple pointers to the same heap-allocated list does not cause the +list's reference count to increase, because there is only one "value" in play: +the one stored on the heap. It's only when we store the "value" in multiple +places that we need to increment the reference count: + +```tomo +// Increment the reference count, because `value` now has to hold +// whatever data was at the pointer's location at this point in time: +value := nums[] +``` + +The TL;DR is: you can cheaply modify local variables that aren't aliased or +`@`-allocated lists, but if you assign a local variable list to another +variable or dereference a heap pointer, it may trigger copy-on-write behavior. + +## List Methods + +- [`func binary_search(list: [T], by: func(x,y:&T->Int32) = T.compare -> Int)`](#binary_search) +- [`func by(list: [T], step: Int -> [T])`](#by) +- [`func clear(list: @[T] -> Void)`](#clear) +- [`func counts(list: [T] -> {T=Int})`](#counts) +- [`func find(list: [T], target: T -> Int?)`](#find) +- [`func first(list: [T], predicate: func(item:&T -> Bool) -> Int)`](#first) +- [`func from(list: [T], first: Int -> [T])`](#from) +- [`func has(list: [T] -> Bool)`](#has) +- [`func heap_pop(list: @[T], by: func(x,y:&T->Int32) = T.compare -> T?)`](#heap_pop) +- [`func heap_push(list: @[T], item: T, by=T.compare -> Void)`](#heap_push) +- [`func heapify(list: @[T], by: func(x,y:&T->Int32) = T.compare -> Void)`](#heapify) +- [`func insert(list: @[T], item: T, at: Int = 0 -> Void)`](#insert) +- [`func insert_all(list: @[T], items: [T], at: Int = 0 -> Void)`](#insert_all) +- [`func pop(list: &[T], index: Int = -1 -> T?)`](#pop) +- [`func random(list: [T], random: func(min,max:Int64->Int64)? = none -> T)`](#random) +- [`func remove_at(list: @[T], at: Int = -1, count: Int = 1 -> Void)`](#remove_at) +- [`func remove_item(list: @[T], item: T, max_count: Int = -1 -> Void)`](#remove_item) +- [`func reversed(list: [T] -> [T])`](#reversed) +- [`func sample(list: [T], count: Int, weights: [Num]? = ![Num], random: func(->Num)? = none -> [T])`](#sample) +- [`func shuffle(list: @[T], random: func(min,max:Int64->Int64)? = none -> Void)`](#shuffle) +- [`func shuffled(list: [T], random: func(min,max:Int64->Int64)? = none -> [T])`](#shuffled) +- [`func slice(list: [T], from: Int, to: Int -> [T])`](#slice) +- [`func sort(list: @[T], by=T.compare -> Void)`](#sort) +- [`sorted(list: [T], by=T.compare -> [T])`](#sorted) +- [`to(list: [T], last: Int -> [T])`](#to) +- [`unique(list: [T] -> {T})`](#unique) + +### `binary_search` +Performs a binary search on a sorted list. + +```tomo +func binary_search(list: [T], by: func(x,y:&T->Int32) = T.compare -> Int) +``` + +- `list`: The sorted list to search. +- `by`: The comparison function used to determine order. If not specified, the + default comparison function for the item type will be used. + +**Returns:** +Assuming the input list is sorted according to the given comparison function, +return the index where the given item would be inserted to maintain the sorted +order. That is, if the item is found, return its index, otherwise return the +place where it would be found if it were inserted and the list were sorted. + +**Example:** +```tomo +>> [1, 3, 5, 7, 9].binary_search(5) += 3 + +>> [1, 3, 5, 7, 9].binary_search(-999) += 1 + +>> [1, 3, 5, 7, 9].binary_search(999) += 6 +``` + +--- + +### `by` +Creates a new list with elements spaced by the specified step value. + +```tomo +func by(list: [T], step: Int -> [T]) +``` + +- `list`: The original list. +- `step`: The step value for selecting elements. + +**Returns:** +A new list with every `step`-th element from the original list. + +**Example:** +```tomo +>> [1, 2, 3, 4, 5, 6].by(2) += [1, 3, 5] +``` + +--- + +### `clear` +Clears all elements from the list. + +```tomo +func clear(list: @[T] -> Void) +``` + +- `list`: The mutable reference to the list to be cleared. + +**Returns:** +Nothing. + +**Example:** +```tomo +>> my_list.clear() +``` + +--- + +### `counts` +Counts the occurrences of each element in the list. + +```tomo +func counts(list: [T] -> {T=Int}) +``` + +- `list`: The list to count elements in. + +**Returns:** +A table mapping each element to its count. + +**Example:** +```tomo +>> [10, 20, 30, 30, 30].counts() += {10=1, 20=1, 30=3} +``` + +--- + +### `find` +Finds the index of the first occurrence of an element (if any). + +```tomo +func find(list: [T], target: T -> Int?) +``` + +- `list`: The list to search through. +- `item`: The item to find in the list. + +**Returns:** +The index of the first occurrence or `!Int` if not found. + +**Example:** +```tomo +>> [10, 20, 30, 40, 50].find(20) += 2 : Int? + +>> [10, 20, 30, 40, 50].find(9999) += none : Int? +``` + +--- + +### `first` +Find the index of the first item that matches a predicate function (if any). + +```tomo +func first(list: [T], predicate: func(item:&T -> Bool) -> Int) +``` + +- `list`: The list to search through. +- `predicate`: A function that returns `yes` if the item should be returned or + `no` if it should not. + +**Returns:** +Returns the index of the first item where the predicate is true or `!Int` if no +item matches. + +**Example:** +```tomo +>> [4, 5, 6].find(func(i:&Int): i.is_prime()) += 5 : Int? +>> [4, 6, 8].find(func(i:&Int): i.is_prime()) += none : Int? +``` + +--- + +### `from` +Returns a slice of the list starting from a specified index. + +```tomo +func from(list: [T], first: Int -> [T]) +``` + +- `list`: The original list. +- `first`: The index to start from. + +**Returns:** +A new list starting from the specified index. + +**Example:** +```tomo +>> [10, 20, 30, 40, 50].from(3) += [30, 40, 50] +``` + +--- + +### `has` +Checks if the list has any elements. + +```tomo +func has(list: [T] -> Bool) +``` + +- `list`: The list to check. + +**Returns:** +`yes` if the list has elements, `no` otherwise. + +**Example:** +```tomo +>> [10, 20, 30].has(20) += yes +``` + +--- + +### `heap_pop` +Removes and returns the top element of a heap or `none` if the list is empty. +By default, this is the *minimum* value in the heap. + +```tomo +func heap_pop(list: @[T], by: func(x,y:&T->Int32) = T.compare -> T?) +``` + +- `list`: The mutable reference to the heap. +- `by`: The comparison function used to determine order. If not specified, the + default comparison function for the item type will be used. + +**Returns:** +The removed top element of the heap or `none` if the list is empty. + +**Example:** +```tomo +>> my_heap := [30, 10, 20] +>> my_heap.heapify() +>> my_heap.heap_pop() += 10 +``` + +--- + +### `heap_push` +Adds an element to the heap and maintains the heap property. By default, this +is a *minimum* heap. + +```tomo +func heap_push(list: @[T], item: T, by=T.compare -> Void) +``` + +- `list`: The mutable reference to the heap. +- `item`: The item to be added. +- `by`: The comparison function used to determine order. If not specified, the + default comparison function for the item type will be used. + +**Returns:** +Nothing. + +**Example:** +```tomo +>> my_heap.heap_push(10) +``` + +--- + +### `heapify` +Converts a list into a heap. + +```tomo +func heapify(list: @[T], by: func(x,y:&T->Int32) = T.compare -> Void) +``` + +- `list`: The mutable reference to the list to be heapified. +- `by`: The comparison function used to determine order. If not specified, the + default comparison function for the item type will be used. + +**Returns:** +Nothing. + +**Example:** +```tomo +>> my_heap := [30, 10, 20] +>> my_heap.heapify() +``` + +--- + +### `insert` +Inserts an element at a specified position in the list. + +```tomo +func insert(list: @[T], item: T, at: Int = 0 -> Void) +``` + +- `list`: The mutable reference to the list. +- `item`: The item to be inserted. +- `at`: The index at which to insert the item (default is `0`). Since indices + are 1-indexed and negative indices mean "starting from the back", an index of + `0` means "after the last item". + +**Returns:** +Nothing. + +**Example:** +```tomo +>> list := [10, 20] +>> list.insert(30) +>> list += [10, 20, 30] + +>> list.insert(999, at=2) +>> list += [10, 999, 20, 30] +``` + +--- + +### `insert_all` +Inserts a list of items at a specified position in the list. + +```tomo +func insert_all(list: @[T], items: [T], at: Int = 0 -> Void) +``` + +- `list`: The mutable reference to the list. +- `items`: The items to be inserted. +- `at`: The index at which to insert the item (default is `0`). Since indices + are 1-indexed and negative indices mean "starting from the back", an index of + `0` means "after the last item". + +**Returns:** +Nothing. + +**Example:** +```tomo +list := [10, 20] +list.insert_all([30, 40]) +>> list += [10, 20, 30, 40] + +list.insert_all([99, 100], at=2) +>> list += [10, 99, 100, 20, 30, 40] +``` + +--- + +### `pop` +Removes and returns an item from the list. If the given index is present in +the list, the item at that index will be removed and the list will become one +element shorter. + +```tomo +func pop(list: &[T], index: Int = -1 -> T?) +``` + +- `list`: The list to remove an item from. +- `index`: The index from which to remove the item (default: the last item). + +**Returns:** +`none` if the list is empty or the given index does not exist in the list, +otherwise the item at the given index. + +**Example:** +```tomo +>> list := [10, 20, 30, 40] + +>> list.pop() += 40 +>> list += &[10, 20, 30] + +>> list.pop(index=2) += 20 +>> list += &[10, 30] +``` + +--- + +### `random` +Selects a random element from the list. + +```tomo +func random(list: [T], random: func(min,max:Int64->Int64)? = none -> T) +``` + +- `list`: The list from which to select a random element. +- `random`: If provided, this function will be used to get a random index in the list. Returned + values must be between `min` and `max` (inclusive). (Used for deterministic pseudorandom number + generation) + +**Returns:** +A random element from the list. + +**Example:** +```tomo +>> [10, 20, 30].random() += 20 +``` + +--- + +### `remove_at` +Removes elements from the list starting at a specified index. + +```tomo +func remove_at(list: @[T], at: Int = -1, count: Int = 1 -> Void) +``` + +- `list`: The mutable reference to the list. +- `at`: The index at which to start removing elements (default is `-1`, which means the end of the list). +- `count`: The number of elements to remove (default is `1`). + +**Returns:** +Nothing. + +**Example:** +```tomo +list := [10, 20, 30, 40, 50] +list.remove_at(2) +>> list += [10, 30, 40, 50] + +list.remove_at(2, count=2) +>> list += [10, 50] +``` + +--- + +### `remove_item` +Removes all occurrences of a specified item from the list. + +```tomo +func remove_item(list: @[T], item: T, max_count: Int = -1 -> Void) +``` + +- `list`: The mutable reference to the list. +- `item`: The item to be removed. +- `max_count`: The maximum number of occurrences to remove (default is `-1`, meaning all occurrences). + +**Returns:** +Nothing. + +**Example:** +```tomo +list := [10, 20, 10, 20, 30] +list.remove_item(10) +>> list += [20, 20, 30] + +list.remove_item(20, max_count=1) +>> list += [20, 30] +``` + +--- + +### `reversed` +Returns a reversed slice of the list. + +```tomo +func reversed(list: [T] -> [T]) +``` + +- `list`: The list to be reversed. + +**Returns:** +A slice of the list with elements in reverse order. + +**Example:** +```tomo +>> [10, 20, 30].reversed() += [30, 20, 10] +``` + +--- + +### `sample` +Selects a sample of elements from the list, optionally with weighted +probabilities. + +```tomo +func sample(list: [T], count: Int, weights: [Num]? = ![Num], random: func(->Num)? = none -> [T]) +``` + +- `list`: The list to sample from. +- `count`: The number of elements to sample. +- `weights`: The probability weights for each element in the list. These + values do not need to add up to any particular number, they are relative + weights. If no weights are given, elements will be sampled with uniform + probability. +- `random`: If provided, this function will be used to get random values for + sampling the list. The provided function should return random numbers + between `0.0` (inclusive) and `1.0` (exclusive). (Used for deterministic + pseudorandom number generation) + +**Errors:** +Errors will be raised if any of the following conditions occurs: +- The given list has no elements and `count >= 1` +- `count < 0` (negative count) +- The number of weights provided doesn't match the length of the list. +- Any weight in the weights list is negative, infinite, or `NaN` +- The sum of the given weights is zero (zero probability for every element). + +**Returns:** +A list of sampled elements from the list. + +**Example:** +```tomo +>> [10, 20, 30].sample(2, weights=[90%, 5%, 5%]) += [10, 10] +``` + +--- + +### `shuffle` +Shuffles the elements of the list in place. + +```tomo +func shuffle(list: @[T], random: func(min,max:Int64->Int64)? = none -> Void) +``` + +- `list`: The mutable reference to the list to be shuffled. +- `random`: If provided, this function will be used to get a random index in the list. Returned + values must be between `min` and `max` (inclusive). (Used for deterministic pseudorandom number + generation) + +**Returns:** +Nothing. + +**Example:** +```tomo +>> list.shuffle() +``` + +--- + +### `shuffled` +Creates a new list with elements shuffled. + +```tomo +func shuffled(list: [T], random: func(min,max:Int64->Int64)? = none -> [T]) +``` + +- `list`: The list to be shuffled. +- `random`: If provided, this function will be used to get a random index in the list. Returned + values must be between `min` and `max` (inclusive). (Used for deterministic pseudorandom number + generation) + +**Returns:** +A new list with shuffled elements. + +**Example:** +```tomo +>> [10, 20, 30, 40].shuffled() += [40, 10, 30, 20] +``` + +--- + +### `slice` +Returns a slice of the list spanning the given indices (inclusive). + +```tomo +func slice(list: [T], from: Int, to: Int -> [T]) +``` + +- `list`: The original list. +- `from`: The first index to include. +- `to`: The last index to include. + +**Returns:** +A new list spanning the given indices. Note: negative indices are counted from +the back of the list, so `-1` refers to the last element, `-2` the +second-to-last, and so on. + +**Example:** +```tomo +>> [10, 20, 30, 40, 50].slice(2, 4) += [20, 30, 40] + +>> [10, 20, 30, 40, 50].slice(-3, -2) += [30, 40] +``` + +--- + +### `sort` +Sorts the elements of the list in place in ascending order (small to large). + +```tomo +func sort(list: @[T], by=T.compare -> Void) +``` + +- `list`: The mutable reference to the list to be sorted. +- `by`: The comparison function used to determine order. If not specified, the + default comparison function for the item type will be used. + +**Returns:** +Nothing. + +**Example:** +```tomo +list := [40, 10, -30, 20] +list.sort() +>> list += [-30, 10, 20, 40] + +list.sort(func(a,b:&Int): a.abs() <> b.abs()) +>> list += [10, 20, -30, 40] +``` + +--- + +### `sorted` +Creates a new list with elements sorted. + +```tomo +sorted(list: [T], by=T.compare -> [T]) +``` + +- `list`: The list to be sorted. +- `by`: The comparison function used to determine order. If not specified, the + default comparison function for the item type will be used. + +**Returns:** +A new list with sorted elements. + +**Example:** +```tomo +>> [40, 10, -30, 20].sorted() += [-30, 10, 20, 40] + +>> [40, 10, -30, 20].sorted(func(a,b:&Int): a.abs() <> b.abs()) += [10, 20, -30, 40] +``` + +--- + +### `to` +Returns a slice of the list from the start of the original list up to a specified index (inclusive). + +```tomo +to(list: [T], last: Int -> [T]) +``` + +- `list`: The original list. +- `last`: The index up to which elements should be included. + +**Returns:** +A new list containing elements from the start up to the specified index. + +**Example:** +```tomo +>> [10, 20, 30, 40, 50].to(3) += [10, 20, 30] + +>> [10, 20, 30, 40, 50].to(-2) += [10, 20, 30, 40] +``` + +--- + +### `unique` +Returns a Set that contains the unique elements of the list. + +```tomo +unique(list: [T] -> {T}) +``` + +- `list`: The list to process. + +**Returns:** +A set containing only unique elements from the list. + +**Example:** +```tomo +>> [10, 20, 10, 10, 30].unique() += {10, 20, 30} +``` |
