Add serialization docs

author: Bruce Hill <bruce@bruce-hill.com> 2024-11-29 19:36:17 -0500
committer: Bruce Hill <bruce@bruce-hill.com> 2024-11-29 19:36:17 -0500
commit: 0d6ef67a014f231b4e24b71bb85ec1e4df5b6319 (patch)
tree: 9ee243a7722fb3ef6b08a34bdfeae330bca967c1 /docs
parent: 6d2017d5b811826ac84e8d1df6dba84381cf6d2d (diff)
1 files changed, 69 insertions, 0 deletions
diff --git a/docs/serialization.md b/docs/serialization.md
new file mode 100644
index 00000000..6f0749ca
--- /dev/null
+++ b/docs/serialization.md
@@ -0,0 +1,69 @@
+# Serialization
+
+Data serialization and deserialization is notoriously difficult to do correctly
+and tedious to implement. In order to make this process easier, Tomo comes with
+built-in support for serialization and deserialization of most built-in types,
+as well as user-defined structs and enums. Serialization is a process that
+takes Tomo values and converts them to bytes, which can be saved in a file or
+sent over a network. Serialized bytes can the be deserialized to retrieve the
+original value.
+
+## Serializing
+
+To serialize data, simply call the method `:serialize()` on any value and it
+will return an array of bytes that encode the value's data:
+
+```tomo
+value := Int64(5)
+>> serialized := value:serialize()
+= [0x0A] : [Byte]
+```
+
+Serialization produces a fairly compact representation of data as a flat array
+of bytes. In this case, a 64-bit integer can be represented in a single byte
+because it's a small number.
+
+## Deserializing 
+
+To deserialize data, you must provide its type explicitly. The current syntax
+is a placeholder, but it looks like this:
+
+```tomo
+i := 123
+bytes := i:serialize()
+
+roundtripped := DESERIALIZE(bytes):Int
+>> roundtripped
+= 123 :Int
+```
+
+## Pointers
+
+In the case of pointers, deserialization creates a new heap-allocated region of
+memory for the values. This means that if you serialize a pointer, it will
+store all of the memory contents of that pointer, but not the literal memory
+address of the pointer, which may not be valid memory when deserialization
+occurs. The upshot is that you can easily serialize datastructures that rely on
+pointers, but pointers returned from deserialization will point to new memory
+and will not point to the same memory as any pre-existing pointers.
+
+One of the nice things about this process is that it automatically handles
+cyclic datastructures correctly, enabling you to serialize cyclic structures
+like circularly linked lists or graphs:
+
+```tomo
+struct Cycle(name:Text, next=NONE:@Cycle)
+
+c := @Cycle("A")
+c.next = @Cycle("B", next=c)
+>> c
+= @Cycle(name="A", next=@Cycle(name="B", next=@~1))
+>> serialized := c:serialize()
+= [0x02, 0x02, 0x41, 0x01, 0x04, 0x02, 0x42, 0x01, 0x02] : [Byte]
+>> roundtrip := DESERIALIZE(serialized):@Cycle
+= @Cycle(name="A", next=@Cycle(name="B", next=@~1)) : @Cycle
+```
+
+The deserialized version of the data correctly preserves the cycle
+(`roundtrip.next.next == roundtrip`). The representation is also very compact:
+only 9 bytes for the whole thing!
author	Bruce Hill <bruce@bruce-hill.com>	2024-11-29 19:36:17 -0500
committer	Bruce Hill <bruce@bruce-hill.com>	2024-11-29 19:36:17 -0500
commit	0d6ef67a014f231b4e24b71bb85ec1e4df5b6319 (patch)
tree	9ee243a7722fb3ef6b08a34bdfeae330bca967c1 /docs
parent	6d2017d5b811826ac84e8d1df6dba84381cf6d2d (diff)