Galene's Blog -- Advocates for the Rights of Characters (ARC)

Delim System -- 2021-02-15

In programming languages, strings are often written using double quotes like this: "Hello, world!". To include double quotes in the string, you have to escape them: "\"Hello, world!\"". HTML is similar: to include "<", ">", and "&", you have to use special escape codes, "&lt;", "&gt;", and "&amp;".

The Problem

This quickly becomes complicated when embedding data that contains embedded data down multiple levels:

# Level 0
print("Hello, world!")

# Level 1
eval("print(\"Hello, world!\")")

# Level 2
eval("eval(\"print(\\\"Hello, world!\\\")\")")

# ...
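
To make the growth concrete, here is a minimal sketch that generates each level from the one before it. The helper names `quote` and `nest` are made up for this illustration, and `quote` only handles the two escapes used above (backslash and double quote), not a full string-literal syntax.

```python
def quote(source: str) -> str:
    """Wrap source in double quotes, escaping backslashes and double quotes."""
    # Backslashes must be doubled first, otherwise the backslashes added
    # for the quotes would themselves get doubled.
    escaped = source.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def nest(code: str, levels: int) -> str:
    """Wrap code in `levels` layers of eval("...")."""
    for _ in range(levels):
        code = "eval(" + quote(code) + ")"
    return code


for level in range(3):
    print(nest('print("Hello, world!")', level))
```

Every extra level roughly doubles the number of backslashes in front of the innermost quotes, which is why hand-written nesting gets unreadable so fast.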

In some cases, it's not even possible to embed down multiple levels. For example, in some programming languages where /* */ are used to delimit comments, it's not possible to have a comment inside a comment.

Proposed Solution

I propose the "delim system" as a solution to this problem. There are 3 "delim pairs": (), [], and {}. In the delim system, they must always be matched correctly, and cannot be escaped. (Hello, world!) is a valid delim, and so is (Hello [[world]]!), but (Hello]) is not, and neither are (Hello\]) and (Hello[).
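
The matching rule can be checked with a small stack. This is only a sketch: the name `is_balanced` is hypothetical, and it deliberately ignores raw delims, so it covers just the plain case described above.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def is_balanced(text: str) -> bool:
    """Return True if every delim in text is correctly matched and nested."""
    stack = []  # closing delims we still expect, innermost last
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in CLOSERS:
            # A preceding backslash changes nothing: there are no escapes.
            if not stack or stack.pop() != ch:
                return False
    return not stack


print(is_balanced("(Hello [[world]]!)"))
print(is_balanced("(Hello\\])"))
```

Note that the backslash in the second call gets no special treatment, which is exactly the "cannot be escaped" part of the rule.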

Instead of escapes, we use raw delims. Raw delims look like this:

# Simple raw delim
`(Hello, world!)` # contents: "Hello, world!"

# Raw delim with pattern
`pattern(Hello, world!)pattern` # contents: "Hello, world!"

# Raw delim with unmatched square bracket
`(Hello, world]!)` # contents: "Hello, world]!"

# Raw delim with pattern to disambiguate
`xyz( `(Hello, world!)` )xyz` # contents: " `(Hello, world!)` "

# Crazy raw delim
`a( )` )b` `a()a` # contents: " )` )b` `a("

Anything is allowed inside a raw delim, except its closing pattern. Since raw delims are started by backticks, backticks are not allowed inside the delim system unless they start or end a raw delim, or are inside a raw delim.
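
One way to read a raw delim is sketched below. It assumes (this is not spelled out above) that the pattern is whatever sits between the opening backtick and the first opening delim, and that the closing pattern is the matching closing delim, then the same pattern, then a backtick. The function name `parse_raw_delim` is made up for the example.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}


def parse_raw_delim(text: str, start: int = 0) -> tuple[str, int]:
    """Parse the raw delim at text[start]; return (contents, index after it)."""
    if text[start] != "`":
        raise ValueError("a raw delim must start with a backtick")
    # The pattern runs from the backtick up to the opening delim.
    open_at = start + 1
    while open_at < len(text) and text[open_at] not in PAIRS:
        if text[open_at] in "`)]}":
            raise ValueError("malformed raw delim opener")
        open_at += 1
    if open_at == len(text):
        raise ValueError("missing opening delim")
    pattern = text[start + 1:open_at]
    closer = PAIRS[text[open_at]] + pattern + "`"
    # Everything up to the first occurrence of the closer is contents,
    # with no interpretation at all.
    close_at = text.find(closer, open_at + 1)
    if close_at == -1:
        raise ValueError("unterminated raw delim")
    return text[open_at + 1:close_at], close_at + len(closer)


contents, end = parse_raw_delim("`a( )` )b` `a()a`")
print(repr(contents))
```

Because the parser only ever searches for one fixed closer, it never needs to look inside the contents, which is what makes raw delims safe to nest.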

This leaves us with the following rules for a language to follow the delim system:

Any language that follows these simple rules can be embedded in any other, without using any escapes.

Aside: Single-Line Comments

Many languages support comments that go until the end of line. But what if your comment contains an unmatched delim?

# This is a comment with an unmatched delim :(

This is not allowed in the delim system, so instead the comment keeps going until the delim is closed. But even after it's closed, it will still keep going, until the end of the line. The delim ends here ->), and the comment ends here, at the newline ->

This is somewhat strange, but it's necessary to keep the property that any language can be embedded in any other. When you need a comment to contain an unmatched delim, you'll have to use a raw delim.

#`[ This is a comment with an unmatched delim (: ]`

Also, if a comment is inside a delim, it ends at the closing delim.

print("Hello, world!" # This is a comment, and it ends here->);
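
Here is a rough sketch of how a lexer might find where a line comment ends under these rules. The name `comment_end` is hypothetical, and the sketch assumes the comment contains no raw delims, so the raw-delim workaround is not handled.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def comment_end(text: str, start: int) -> int:
    """Return the index just past the comment that begins at text[start]."""
    stack = []  # delims opened inside the comment, innermost last
    for i in range(start, len(text)):
        ch = text[i]
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in CLOSERS:
            if not stack:
                return i  # this closes a delim opened outside the comment
            if stack.pop() != ch:
                raise ValueError(f"mismatched delim at index {i}")
        elif ch == "\n" and not stack:
            return i  # a newline only ends the comment at depth zero
    if stack:
        raise ValueError("unclosed delim in comment")
    return len(text)


src = 'print("Hello, world!" # This is a comment, and it ends here->);'
start = src.index("#")
print(src[start:comment_end(src, start)])
```

The two `return i` branches correspond to the two ways a comment can end: at a newline outside any delim, or at the closing delim of whatever contains it.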

Aside: Strings

Using parentheses for strings can get kind of ugly. For this reason, it's useful to support strings in double quotes. However, the strings must still follow the delim rules. They may not contain unmatched delims, and like comments, they cannot end in the middle of a delim. See some examples:

# Simple examples
"A string" # contents: [A string]
"A string()" # contents: [A string()]

# Double quotes inside a delim do not close the string
"A string(")" # contents: [A string(")]

")" # Invalid string

Some strings cannot be represented this way, so there will also need to be another way to make strings. I think ~s(A string) and ~s`(A string)` would work well.
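
The double-quoted string rules could be implemented along these lines. This is a sketch with a made-up function name, `parse_string`, and it assumes the string contains no raw delims and no escape codes.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def parse_string(text: str, start: int = 0) -> str:
    """Return the contents of the double-quoted string at text[start]."""
    if text[start] != '"':
        raise ValueError("a string must start with a double quote")
    stack = []  # delims opened inside the string, innermost last
    for i in range(start + 1, len(text)):
        ch = text[i]
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in CLOSERS:
            if not stack or stack.pop() != ch:
                raise ValueError("unmatched delim inside string")
        elif ch == '"' and not stack:
            return text[start + 1:i]  # a quote only closes at depth zero
    raise ValueError("unterminated string")


print(parse_string('"A string(")"'))
```

A double quote inside an open delim is treated as ordinary contents, and a closing delim with nothing to close is rejected, matching the examples above.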

Aside: Escape codes

Escape codes are not allowed in a language that follows the delim system. However, a language that follows the delim system can contain a sub-delim which itself uses escape codes. For example, it can support strings with normal escape codes like \n.

The rule is this: given a delim language and a language that has escape codes, the delim language can contain the escape language, but the escape language cannot contain the delim language.

A string sub-delim can be embedded in any delim language without issue. In the below example, the escape codes of the string cannot conflict with the list delim.

[string(Hello!\n), string(\tHi!)]

"&#40;" produces an open parenthesis, so the html below does not actually follow the delim rules, even though someone who didn't know html would think it did. This is why html is not a delim language.

html(<delimLang>&#40; [1, 2, 3]</delimLang>)

Motivation

My main motivation for the delim system is to design a programming language for making languages, similar to the Racket Programming Language. Being able to embed languages arbitrarily makes this simpler.

This would also be useful in data formats like xml, json, and html, to simplify escaping. All you'd need to do is generate a raw pattern which is not contained in the data. This is much simpler, and the result would be much easier to read, too. You only need to take a glimpse at the html version of this page to see what I mean:

This line:
you have to use special escape codes, "&lt;", "&gt;", and "&amp;".

When escaped, it becomes:
you have to use special escape codes, &quot;&amp;lt;&quot;, &quot;&amp;gt;&quot;, and &quot;&amp;amp;&quot;.

When escaped again, it becomes:
you have to use special escape codes, &amp;quot;&amp;amp;lt;&amp;quot;, &amp;quot;&amp;amp;gt;&amp;quot;, and &amp;quot;&amp;amp;amp;&amp;quot;.

With the delim system, it would look like this:

This line:
you have to use special escape codes, "&lt;", "&gt;", and "&amp;".

When escaped, it becomes:
(you have to use special escape codes, "&lt;", "&gt;", and "&amp;".)

When escaped again, it becomes:
((you have to use special escape codes, "&lt;", "&gt;", and "&amp;".))

Or, for a fairer comparison:

This line:
* Outside of raw delims, backticks "`" are only allowed to start and end a raw delim.

When escaped, it becomes:
`(* Outside of raw delims, backticks "`" are only allowed to start and end a raw delim.)`

When escaped again, it becomes:
`a(`(* Outside of raw delims, backticks "`" are only allowed to start and end a raw delim.)`)a`

Even in the fairer comparison, the delim system gives a result which is much easier to read!
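
Generating a raw pattern that is not contained in the data takes only a few lines. The sketch below is one possible approach, not a fixed part of the proposal: it always wraps with parentheses, tries the empty pattern first, then "a", "b", and so on, and the names `candidate_patterns` and `wrap_raw` are invented for the example.

```python
from itertools import count, product
from string import ascii_lowercase


def candidate_patterns():
    """Yield "", "a", "b", ..., "z", "aa", "ab", ... without end."""
    yield ""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def wrap_raw(data: str) -> str:
    """Wrap data in a raw delim whose closing pattern does not occur in data."""
    for pattern in candidate_patterns():
        closer = ")" + pattern + "`"
        if closer not in data:
            return "`" + pattern + "(" + data + closer


line = '* Outside of raw delims, backticks "`" are only allowed to start and end a raw delim.'
once = wrap_raw(line)
twice = wrap_raw(once)
print(once)
print(twice)
```

The data itself is never modified, only surrounded, so unwrapping is just a matter of stripping the opener and closer.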

Update 2021-02-16: Raw Line Delims

Since line comments will probably be a common feature, and #`[]` is an ugly syntax for comments containing unmatched delims, I'm adding these additional rules to the delim system:

Relatedly, I've added a wiki entry for the delim system.

Footer

Date: 2021-02-15

Latest Edit: 2021-02-16

Author: Galene
