(221 lines)

Lists

Tomo supports lists as a container type that holds a list of elements of any type in a compact format similar to a C-style array. Lists are immutable by default, but use copy-on-write semantics to efficiently mutate in place when possible. Lists are 1-indexed, which means the first item in the list has index 1.

Syntax

Lists are written using square brackets and a list of comma-separated elements:

nums := [10, 20, 30]

Each element must have the same type (or be easily promoted to the same type). If you want to have an empty list, you must specify what type goes inside the list like this:

empty : [Int] = []

For type annotations, a list that holds items with type T is written as [T].

List Comprehensions

Lists can also use comprehensions, where you specify how to dynamically create all the elements by iteration instead of manually specifying each:

assert [i*10 for i in (3).to(8)] == [30, 40, 50, 60, 70, 80]
assert [i*10 for i in (3).to(8) if i != 4] == [30, 50, 60, 70, 80]

Comprehensions can be combined with regular items or other comprehensions:

nums := [-1, i*10 for i in (3).to(8), i for i in 3]
assert nums == [-1, 30, 40, 50, 60, 70, 80, 1, 2, 3]

Length

List length can be accessed by the .length field:

assert [10, 20, 30].length == 3

Indexing

List values are accessed using square bracket indexing. Since lists are 1-indexed, the index 1 corresponds to the first item in the list. Negative indices are used to refer to items from the back of the list, so -1 is the last item, -2 is the second-to-last, and so on.

list := [10, 20, 30, 40]
assert list[1] == 10?

assert list[2] == 20?

assert list[999] == none

assert list[-1] == 40?

assert list[-2] == 30?

If a list index of 0 or any value larger than the length of the list is used, a none value will be returned.

Iteration

You can iterate over the items in a list like this:

for item in list
    ...

for i, item in list
    ...

List iteration operates over the value of the list when the loop began, so modifying the list during iteration is safe and will not result in the loop iterating over any of the new values.

Concatenation

Lists can be concatenated with the ++ operator, which returns a list that has the items from one appended to the other. This should not be confused with the addition operator +, which does not work with lists.

assert [1, 2] ++ [3, 4] == [1, 2, 3, 4]

Implementation Details

Under the hood, lists are implemented as a struct that contains a pointer to a contiguous chunk of memory storing the elements of the list and some other metadata. Since Tomo has datatypes with different sizes, like Bools which take one byte and structs which can take up many bytes, it's worth noting that lists store the elements compactly and inline, without the need for each list cell to hold a pointer to where the data actually lives.

The other metadata stored with a list includes its length as well as the stride of the list. The stride is not exposed to the user, but it's the gap in bytes between each element in the list. The reason this is mentioned is that it is possible to create immutable slices of lists in constant time by creating a new struct that points to the appropriate starting place for the list items and has the appropriate stride. The upshot is that a method like list.reversed() does not actually copy the list, it simply returns a struct that points to the back of the list with a negative stride. Lists adhere to copy-on-write semantics, so we can cheaply create many read-only references to the same data, and only need to do copying if we plan to modify data. After doing a modification, future modifications can be done in-place as long as there is only one reference to that data.

Internally, we also take advantage of this inside of tables, which compactly store all of the key/value pairs in a contiguous list and we can return an immutable slice of that list showing only the keys or only the values by choosing the right starting point and stride.

Copy on Write

Lists can be thought of as values that have copy-on-write semantics that use reference counting to perform efficient in-place mutations instead of copying as a performance optimization when it wouldn't affect the program's semantics. Without getting too deep into the details, suffice it to say that when you create a list, that list can be thought of as a singular "value" in the same way that 123 is a value. That variable's value will never change unless you explicitly perform an assignment operation on the variable or call a method on the variable.

Because it would be tedious to require users to write all list operations as pure functions like list = list.with_value_at_index(value=x, index=i), Tomo provides the familiar imperative syntax for modifying lists, but keeps the semantics of the pure functional style. Writing list[i] = x is semantically equivalent to list = list.with_value_at_index(value=x, index=i), but much more readable and easy to write. Similarly, list.insert(x) is semantically equivalent to list = list.with_value_inserted(x). We implement these mutating methods as functions that take a pointer to a list variable, which then either mutate the list's data in-place (if this is the only thing referencing that data) or construct a new list and store its value in the memory where the list variable is stored.

When there is only a single reference to a list value, we can perform these modifications in-place (lists typically have a little bit of spare capacity at the end, so appending usually doesn't trigger a reallocation). When there are shared references, we must create a copy of the list's data before modifying it so the other references don't see the effects of the mutation. Here are some simple examples:

nums := [10, 20, 30, 39]

// Efficient in-place mutation because data references are not shared:
nums[4] = 40

// Constant time operation, but increments the reference count:
tmp := nums
assert tmp == [10, 20, 30, 40]

// Now, a mutation will trigger a copy-on-write,
// which resets the reference count to zero:
nums[4] = 999
assert nums == [10, 20, 30, 999]

// Because of the copy-on-write, `tmp` is unchanged:
assert tmp == [10, 20, 30, 40]

// Since the reference count has been reset, we can do more
// mutations without triggering another copy-on-write:
nums[4] = -1
assert nums == [10, 20, 30, -1]

List reference counting is approximate, but will only ever err on the side of correctness at the expense of performance, not the other way around. Occasionally, unnecessary copying may occur, but you should never experience an list value changing because of some operation performed on a different list value.

List Pointers

Since the normal case of lists is to treat them like immutable values, what do we do if we actually want to have a shared reference to a list whose contents change over time? In that case, we want to use the @ operator to create a pointer to a heap-allocated list and pass that pointer around. This is the same behavior that you get in Python when you create a list:

nums := @[10, 20, 30]
tmp := nums

nums.insert(40)
assert tmp == @[10, 20, 30, 40]

Having multiple pointers to the same heap-allocated list does not cause the list's reference count to increase, because there is only one "value" in play: the one stored on the heap. It's only when we store the "value" in multiple places that we need to increment the reference count:

// Increment the reference count, because `value` now has to hold
// whatever data was at the pointer's location at this point in time: 
value := nums[]

The TL;DR is: you can cheaply modify local variables that aren't aliased or @-allocated lists, but if you assign a local variable list to another variable or dereference a heap pointer, it may trigger copy-on-write behavior.

API

API documentation

   1 # Lists
   2 
   3 Tomo supports lists as a container type that holds a list of elements of any
   4 type in a compact format similar to a C-style array. Lists are immutable by
   5 default, but use copy-on-write semantics to efficiently mutate in place when
   6 possible. **Lists are 1-indexed**, which means the first item in the list has
   7 index `1`.
   8 
   9 ## Syntax
  10 
  11 Lists are written using square brackets and a list of comma-separated elements:
  12 
  13 ```tomo
  14 nums := [10, 20, 30]
  15 ```
  16 
  17 Each element must have the same type (or be easily promoted to the same type). If
  18 you want to have an empty list, you must specify what type goes inside the list
  19 like this:
  20 
  21 ```tomo
  22 empty : [Int] = []
  23 ```
  24 
  25 For type annotations, a list that holds items with type `T` is written as `[T]`.
  26 
  27 ### List Comprehensions
  28 
  29 Lists can also use comprehensions, where you specify how to dynamically create
  30 all the elements by iteration instead of manually specifying each:
  31 
  32 ```tomo
  33 assert [i*10 for i in (3).to(8)] == [30, 40, 50, 60, 70, 80]
  34 assert [i*10 for i in (3).to(8) if i != 4] == [30, 50, 60, 70, 80]
  35 ```
  36 
  37 Comprehensions can be combined with regular items or other comprehensions:
  38 
  39 ```tomo
  40 nums := [-1, i*10 for i in (3).to(8), i for i in 3]
  41 assert nums == [-1, 30, 40, 50, 60, 70, 80, 1, 2, 3]
  42 ```
  43 
  44 ## Length
  45 
  46 List length can be accessed by the `.length` field:
  47 
  48 ```tomo
  49 assert [10, 20, 30].length == 3
  50 ```
  51 
  52 ## Indexing
  53 
  54 List values are accessed using square bracket indexing. Since lists are
  55 1-indexed, the index `1` corresponds to the first item in the list. Negative
  56 indices are used to refer to items from the back of the list, so `-1` is the
  57 last item, `-2` is the second-to-last, and so on.
  58 
  59 ```tomo
  60 list := [10, 20, 30, 40]
  61 assert list[1] == 10?
  62 
  63 assert list[2] == 20?
  64 
  65 assert list[999] == none
  66 
  67 assert list[-1] == 40?
  68 
  69 assert list[-2] == 30?
  70 ```
  71 
  72 If a list index of `0` or any value larger than the length of the list is
  73 used, a `none` value will be returned.
  74 
  75 ## Iteration
  76 
  77 You can iterate over the items in a list like this:
  78 
  79 ```tomo
  80 for item in list
  81     ...
  82 
  83 for i, item in list
  84     ...
  85 ```
  86 
  87 List iteration operates over the value of the list when the loop began, so
  88 modifying the list during iteration is safe and will not result in the loop
  89 iterating over any of the new values.
  90 
  91 ## Concatenation
  92 
  93 Lists can be concatenated with the `++` operator, which returns a list that
  94 has the items from one appended to the other. This should not be confused with
  95 the addition operator `+`, which does not work with lists.
  96 
  97 ```tomo
  98 assert [1, 2] ++ [3, 4] == [1, 2, 3, 4]
  99 ```
 100 
 101 ## Implementation Details 
 102 
 103 Under the hood, lists are implemented as a struct that contains a pointer to a
 104 contiguous chunk of memory storing the elements of the list and some other
 105 metadata. Since Tomo has datatypes with different sizes, like `Bool`s which
 106 take one byte and `struct`s which can take up many bytes, it's worth noting
 107 that lists store the elements compactly and inline, without the need for each
 108 list cell to hold a pointer to where the data actually lives.
 109 
 110 The other metadata stored with a list includes its length as well as the
 111 _stride_ of the list. The stride is not exposed to the user, but it's the gap
 112 in bytes between each element in the list. The reason this is mentioned is
 113 that it is possible to create immutable slices of lists in constant time by
 114 creating a new struct that points to the appropriate starting place for the
 115 list items and has the appropriate stride. The upshot is that a method like
 116 `list.reversed()` does not actually copy the list, it simply returns a struct
 117 that points to the back of the list with a negative stride. Lists adhere to
 118 copy-on-write semantics, so we can cheaply create many read-only references to
 119 the same data, and only need to do copying if we plan to modify data. After
 120 doing a modification, future modifications can be done in-place as long as
 121 there is only one reference to that data.
 122 
 123 Internally, we also take advantage of this inside of tables, which compactly
 124 store all of the key/value pairs in a contiguous list and we can return an
 125 immutable slice of that list showing only the keys or only the values by
 126 choosing the right starting point and stride.
 127 
 128 ## Copy on Write
 129 
 130 Lists can be thought of as values that have copy-on-write semantics that use
 131 reference counting to perform efficient in-place mutations instead of copying
 132 as a performance optimization when it wouldn't affect the program's semantics.
 133 Without getting too deep into the details, suffice it to say that when you
 134 create a list, that list can be thought of as a singular "value" in the same
 135 way that `123` is a value. That variable's value will never change unless you
 136 explicitly perform an assignment operation on the variable or call a method on
 137 the variable.
 138 
 139 Because it would be tedious to require users to write all list operations as
 140 pure functions like `list = list.with_value_at_index(value=x, index=i)`, Tomo
 141 provides the familiar imperative syntax for modifying lists, but keeps the
 142 semantics of the pure functional style. Writing `list[i] = x` is
 143 _semantically_ equivalent to `list = list.with_value_at_index(value=x,
 144 index=i)`, but much more readable and easy to write. Similarly,
 145 `list.insert(x)` is semantically equivalent to `list =
 146 list.with_value_inserted(x)`. We implement these mutating methods as functions
 147 that take a pointer to a list variable, which then either mutate the list's
 148 data in-place (if this is the only thing referencing that data) or construct a
 149 new list and store its value in the memory where the list variable is stored.
 150 
 151 When there is only a single reference to a list value, we can perform these
 152 modifications in-place (lists typically have a little bit of spare capacity at
 153 the end, so appending usually doesn't trigger a reallocation). When there are
 154 shared references, we must create a copy of the list's data before modifying
 155 it so the other references don't see the effects of the mutation. Here are some
 156 simple examples:
 157 
 158 ```tomo
 159 nums := [10, 20, 30, 39]
 160 
 161 // Efficient in-place mutation because data references are not shared:
 162 nums[4] = 40
 163 
 164 // Constant time operation, but increments the reference count:
 165 tmp := nums
 166 assert tmp == [10, 20, 30, 40]
 167 
 168 // Now, a mutation will trigger a copy-on-write,
 169 // which resets the reference count to zero:
 170 nums[4] = 999
 171 assert nums == [10, 20, 30, 999]
 172 
 173 // Because of the copy-on-write, `tmp` is unchanged:
 174 assert tmp == [10, 20, 30, 40]
 175 
 176 // Since the reference count has been reset, we can do more
 177 // mutations without triggering another copy-on-write:
 178 nums[4] = -1
 179 assert nums == [10, 20, 30, -1]
 180 ```
 181 
 182 List reference counting is _approximate_, but will only ever err on the side
 183 of correctness at the expense of performance, not the other way around.
 184 Occasionally, unnecessary copying may occur, but you should never experience an
 185 list value changing because of some operation performed on a different list
 186 value.
 187 
 188 ## List Pointers
 189 
 190 Since the normal case of lists is to treat them like immutable values, what do
 191 we do if we actually want to have a shared reference to a list whose contents
 192 change over time? In that case, we want to use the `@` operator to create a
 193 pointer to a heap-allocated list and pass that pointer around. This is the same
 194 behavior that you get in Python when you create a `list`:
 195 
 196 ```tomo
 197 nums := @[10, 20, 30]
 198 tmp := nums
 199 
 200 nums.insert(40)
 201 assert tmp == @[10, 20, 30, 40]
 202 ```
 203 
 204 Having multiple pointers to the same heap-allocated list does not cause the
 205 list's reference count to increase, because there is only one "value" in play:
 206 the one stored on the heap. It's only when we store the "value" in multiple
 207 places that we need to increment the reference count:
 208 
 209 ```tomo
 210 // Increment the reference count, because `value` now has to hold
 211 // whatever data was at the pointer's location at this point in time: 
 212 value := nums[]
 213 ```
 214 
 215 The TL;DR is: you can cheaply modify local variables that aren't aliased or
 216 `@`-allocated lists, but if you assign a local variable list to another
 217 variable or dereference a heap pointer, it may trigger copy-on-write behavior.
 218 
 219 # API
 220 
 221 [API documentation](../api/lists.md)