| author | Bruce Hill <bruce@bruce-hill.com> | 2025-08-09 15:28:50 -0400 |
|---|---|---|
| committer | Bruce Hill <bruce@bruce-hill.com> | 2025-08-09 15:28:50 -0400 |
| commit | 98cadc2135be65abcf9dff53d87af9ea549757a2 (patch) | |
| tree | 0c21743e63f763d736923f5e72a6a064b559602b /docs/text.md | |
| parent | 57154250c71aee7d7827afd3c608ba876f51206a (diff) | |
Wording
Diffstat (limited to 'docs/text.md')
| -rw-r--r-- | docs/text.md | 29 |
1 files changed, 16 insertions, 13 deletions
diff --git a/docs/text.md b/docs/text.md
index bff6ee4e..2df27811 100644
--- a/docs/text.md
+++ b/docs/text.md
@@ -13,19 +13,22 @@ Internally, Tomo text's implementation is based on [Raku/MoarVM's
 strings](https://docs.raku.org/language/unicode) and [Boehm et al's
 Cords/Ropes](https://www.cs.tufts.edu/comp/150FP/archive/hans-boehm/ropes.pdf).
 Texts store their grapheme cluster count and either a compact list of 8-bit
-ASCII characters (for ASCII text), a list of 32-bit normal-form grapheme
-cluster values (see below), a compressed form of grapheme clusters with a
-lookup table, or a (roughly) balanced binary tree representing a concatenation.
-The upside of this approach is that repeated concatenations are typically a
-constant-time operation, which will occasionally require a small rebalancing
-operation. Text is stored in a format that is highly memory-efficient and
-index-based text operations (like retrieving an arbitrary index or slicing) are
-very fast: typically a constant-time operation for arbitrary unicode text, but
-in the worst case scenario (text built from many concatenations), `O(log(n))`
-time with very generous constant factors typically amounting to only a handful
-of steps. Since concatenations use shared substructures, they are very
-memory-efficient and can be used efficiently for applications like implementing
-a text editor that stores a full edit history of a large file's contents.
+ASCII characters (for ASCII text), a list of 32-bit normal-form grapheme cluster
+values (see below), a compressed form of grapheme clusters with a lookup table,
+or a (roughly) balanced binary tree representing a concatenation. The upside of
+this approach is that repeated concatenations are typically a constant-time
+operation, which will occasionally require a small rebalancing operation. Text
+is stored in a format that is highly memory-efficient and index-based text
+operations (like retrieving an arbitrary index or slicing) are very fast:
+typically a constant-time operation for arbitrary unicode text, but in the worst
+case scenario (text built from many concatenations), `O(log(n))` time with very
+generous constant factors typically amounting to only a handful of steps. Since
+concatenations use shared substructures, they are very memory-efficient for
+applications where you want to store many versions of a large text with
+modifications. For example, if you are implementing a text editor, you can
+naively store the full text contents of a file at each point in its edit
+history, and it will only have a small memory footprint because of shared
+substructures in the text.
 
 ### Normal-Form Graphemes
 
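
The shared-substructure claim in the reworded paragraph can be illustrated with a small rope sketch. This is not Tomo's implementation: it is written in Python, and the names `Leaf`, `Concat`, `concat`, `index`, and `to_str` are hypothetical, chosen only for this example. It also leaves out rebalancing, grapheme normalization, and the compact ASCII and lookup-table representations the documentation mentions. It only shows why concatenation can allocate a single node and why two versions of a large text can point at the same unchanged data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """A flat run of characters (stands in for an ASCII or grapheme array)."""
    chars: str

    @property
    def length(self) -> int:
        return len(self.chars)


@dataclass(frozen=True)
class Concat:
    """A concatenation node that points at, but never copies, its children."""
    left: "Rope"
    right: "Rope"
    length: int  # cached so indexing never has to walk the whole tree


Rope = Union[Leaf, Concat]


def concat(a: Rope, b: Rope) -> Rope:
    """Join two ropes by allocating one new node (no rebalancing here)."""
    return Concat(a, b, a.length + b.length)


def index(rope: Rope, i: int) -> str:
    """Return the character at position i, descending one tree level per step."""
    if not 0 <= i < rope.length:
        raise IndexError(i)
    while isinstance(rope, Concat):
        if i < rope.left.length:
            rope = rope.left
        else:
            i -= rope.left.length
            rope = rope.right
    return rope.chars[i]


def to_str(rope: Rope) -> str:
    """Flatten a rope back into a plain string (for display or testing)."""
    if isinstance(rope, Leaf):
        return rope.chars
    return to_str(rope.left) + to_str(rope.right)


# Two "versions" in an edit history: only the short title leaf differs,
# and both versions reference the same large body leaf.
body = Leaf("x" * 1_000_000)
v1 = concat(Leaf("Title\n"), body)
v2 = concat(Leaf("New title\n"), body)
```

Here `v1.right is body` and `v2.right is body` both hold, so keeping both versions costs one extra small leaf and one extra node rather than a second copy of the million-character body. Without rebalancing, a long chain of concatenations in this sketch would degrade `index` toward linear time, which is presumably what the occasional rebalancing described in the docs is there to prevent.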
