tomo/docs/langs.md

106 lines
3.2 KiB
Markdown
Raw Normal View History

2024-08-19 11:41:04 -07:00
# Domain-Specific Languages
Tomo supports defining different flavors of text that represent specific
languages, with type safety guarantees that help prevent code injection. Code
injection occurs when you insert untrusted user input into a string without
properly escaping the user input. Tomo's `lang` feature addresses this issue by
letting you define custom text types that automatically escape interpolated
values and give type checking errors if you attempt to use one type of string
where a different type of string is needed.
```tomo
lang HTML:
convert(t:Text -> HTML):
2024-09-16 13:18:01 -07:00
t = t:replace_all({
$/&/ = "&",
$/</ = "&lt;",
$/>/ = "&gt;",
$/"/ = "&quot",
$/'/ = "&#39;",
2024-09-16 13:18:01 -07:00
})
return HTML.from_text(t)
2024-08-19 11:41:04 -07:00
func paragraph(content:HTML -> HTML):
2024-08-19 11:41:04 -07:00
return $HTML"<p>$content</p>"
```
In this example, we're representing HTML as a language and we want to avoid
situations where a malicious user might set their username to something like
`<script>alert('pwned')</script>`.
```
>> username := Text.read_line("Choose a username: ")
= "<script>alert('pwned')</script>"
page := $HTML"
<html><body>
Hello $username! How are you?
</body></html>
"
say(page.text)
2024-08-19 11:41:04 -07:00
```
What we _don't_ want to happen is to get a page that looks like:
```html
<html><body>
Hello <script>alert('pwned')</script>! How are you?
</body></html>
```
Thankfully, Tomo handles automatic escaping and gives you a properly sanitized
result:
```html
<html><body>
Hello &lt;script&gt;alert(&#39;pwned&#39;)&lt;/script&gt;! How are you?
</body></html>
```
This works because the compiler checks for a function in the HTML namespace
2025-02-19 15:50:50 -08:00
that was defined with the name `HTML` that takes a `Text` argument and returns
an `HTML` value (a constructor). When performing interpolation, the
2024-08-19 11:41:04 -07:00
interpolation will only succeed if such a function exists and it will apply
that function to the value before concatenating it.
If you have a function that only accepts an `HTML` argument, you cannot use a
`Text` value, you must produce a valid `HTML` value instead. The same is true
for returning a value for a function that returns an `HTML` value or assigning
to a variable that holds `HTML` values.
Languages can also be built around a namespace-based method call API, instead
of building a global function API that takes language arguments. For example,
instead of building a global function called `execute()` that takes a
`ShellScript` argument, you could instead build something like this:
```tomo
lang Sh:
convert(text:Text -> Sh):
return Sh.from_text("'" ++ text:replace($/'/, "''") ++ "'")
2024-08-19 11:41:04 -07:00
func execute(sh:Sh -> Text):
2024-08-19 11:41:04 -07:00
...
2024-09-16 13:18:01 -07:00
dir := ask("List which dir? ")
2024-08-19 11:41:04 -07:00
cmd := $Sh@(ls -l @dir)
result := cmd:execute()
```
## Conversions
You can define your own rules for converting between types using the `convert`
keyword. Conversions can be defined either inside of the language's block,
another type's block or at the top level.
```tomo
lang Sh:
convert(text:Text -> Sh):
return Sh.from_text("'" ++ text:replace($/'/, "''") ++ "'")
struct Foo(x,y:Int):
convert(f:Foo -> Sh):
return Sh.from_text("$(f.x),$(f.y)")
convert(texts:[Text] -> Sh):
return $Sh" ":join([Sh(t) for t in texts])
```