aboutsummaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorBruce Hill <bruce@bruce-hill.com>2025-01-23 15:33:56 -0500
committerBruce Hill <bruce@bruce-hill.com>2025-01-23 15:33:56 -0500
commitf93dde14496ef784df6b7b3e1de1030a868be985 (patch)
treee4f5bcc1852d13e2f2d853a1f6590ccdd93e18a2 /docs
parentc60ea2079fb230213308904cd0966e5481d2d994 (diff)
Overhaul of Text implementation to be more like Cords and have much
better performance for long sequences of repeated concatenation.
Diffstat (limited to 'docs')
-rw-r--r--docs/text.md22
1 files changed, 16 insertions, 6 deletions
diff --git a/docs/text.md b/docs/text.md
index 0384aa41..c8a105d5 100644
--- a/docs/text.md
+++ b/docs/text.md
@@ -9,12 +9,22 @@ etc.).
## Implementation
-Internally, Tomo text's implementation is based on [Raku's
-strings](https://docs.raku.org/language/unicode). Strings store their grapheme
-cluster count and either a compact array of 8-bit ASCII characters (for ASCII
-text), an array of 32-bit normal-form grapheme cluster values (see below), or a
-flat subarray of multiple texts that are either ASCII or graphemes (the
-structure is not arbitrarily nested).
+Internally, Tomo text's implementation is based on [Raku/MoarVM's
+strings](https://docs.raku.org/language/unicode) and [Boehm et al's
+Cords](https://www.cs.tufts.edu/comp/150FP/archive/hans-boehm/ropes.pdf).
+Strings store their grapheme cluster count and either a compact array of 8-bit
+ASCII characters (for ASCII text), an array of 32-bit normal-form grapheme
+cluster values (see below), or a (roughly) balanced binary tree concatenation
+of two texts. The upside is that repeated concatenations are typically a
+constant-time operation, which will occasionally require a small rebalancing
+operation. Index-based text operations (like retrieving an arbitrary index or
+slicing) are very fast: typically a constant-time operation for arbitrary
+unicode text, but in the worst case scenario (text built from many
+concatenations), `O(log(n))` time with very generous constant factors typically
+amounting to only a handful of steps. Since concatenations use shared
+substructures, they are very memory-efficient and can be used efficiently for
+applications like implementing a text editor that stores a full edit history of
+a large file's contents.
### Normal-Form Graphemes