Block-based Configuration Language

1.Introduction

Software depends on a variety of languages for its configuration files. While each program used to have its own configuration language, it has become increasingly common to rely on data formats such as JSON (RFC 8259), YAML or TOML.

The wide availability of tools and libraries for these formats has facilitated their adoption, but their limitations are frustrating: designed to map to a small number of data structures (atomic values, arrays and dictionaries), they lead to very verbose documents when configuration data gets complex.

XKCD 927 — Standards

We present BCL, the Block-based Configuration Language, as an alternative to these minimalist data formats. BCL aims to be expressive, making configuration files simpler and easier to maintain.

Note
This specification is a draft that will evolve until it reaches a first stable version. In the meantime, feel free to send feedback to nicolas@n16f.net.

2.Documents

A BCL configuration file is a document, a container where settings are defined hierarchically. When a program loads a BCL file, the result is a document data structure which can be used to look up settings.

2.1.Defining settings

Configuring software is about assigning values to settings. With BCL it is as simple as adding new entries to the configuration file, each entry having a name identifying the setting and an optional list of values.

For example:

bind "localhost" 8080
connect_timeout 30.0
drop_inactive_connections

Each entry extends to the end of the line. A long entry can be split across multiple lines by ending each line but the last with a backslash character:

user "Bob" 1000 1000 "Bob Howard" \
     "/home/bob" "/usr/bin/zsh"

2.1.1.Multiple values

Being able to specify multiple values for a setting is one of the superpowers of BCL: used correctly, it is more compact and easier to follow than having multiple settings. For example:

bind "localhost" 8080

instead of:

bind_host "localhost"
bind_port 8080

Of course it is easy to abuse this feature: remember that the meaning of each value should be obvious.

2.1.2.Value types

BCL supports five fundamental value types:

Strings
UTF-8 encoded character strings, surrounded by double quote characters. Backslash and double quote characters can be included in a string by escaping them with a backslash character.
Integers
64-bit signed integers.
Floats
Double precision IEEE 754 floating point numbers.
Booleans
Either true or false.
Symbols
Non-quoted sequences of characters generally used to represent enumeration values.

Aggregate types —usually lists and dictionaries in other configuration formats— do not exist in BCL. When the configuration requires a list of elements, one uses either multiple values in a single entry:

groups "admins" "ops" "users"

or simply multiple entries:

group "admins"
group "ops"
group "users"

The more convenient and readable version depends on the configuration data you are representing. This is something to keep in mind when designing your configuration scheme.

For what would be represented as a dictionary in other formats, BCL groups settings into blocks.

2.2.Grouping settings into blocks

Settings are grouped into hierarchical blocks. These structures let you organize your configuration files in a logical way. Top-level entries are part of an implicit block called the top-level block.

account {
  name "bob"
  home "/home/bob"

  contact {
    email_address "bob@example.com"
    email_address "bob@home.example.com"
  }
}

account {
  name "alice"
  disabled

  contact {
    email_address "alice@example.com"
  }
}

Blocks can be named so that they can be distinguished and referenced. With block names, the previous example can be simplified:

account "bob" {
  home "/home/bob"

  contact {
    email_address "bob@example.com"
    email_address "bob@home.example.com"
  }
}

account "alice" {
  disabled

  contact {
    email_address "alice@example.com"
  }
}

3.Syntax

3.1.Content

BCL content is represented as a sequence of characters, each character being a Unicode code point. BCL content is always encoded using UTF-8 (see RFC 3629).

3.1.1.Lines

A physical line is a sequence of characters ending with an end-of-line sequence: a newline character (U+000A, "\n") optionally preceded by a carriage return character (U+000D, "\r").

A logical line is a sequence of one or more physical lines where all physical lines but the last end with a backslash character (U+005C, "\").

For example the following document is made of 3 physical lines that are interpreted as 2 logical lines:

match path "/private"
reply 401 \
       "access denied"

Lines are significant in BCL content: an entry is defined by a single logical line, i.e. one or more physical lines.
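
As an illustration, the folding of physical lines into logical lines could be sketched in Python as follows. The function name logical_lines is hypothetical and not part of BCL. The sketch assumes that the backslash and the end-of-line sequence are simply dropped when lines are joined, and it does not account for comments.

```python
def logical_lines(text: str) -> list[str]:
    """Fold physical lines into logical lines (illustrative sketch only)."""
    result: list[str] = []
    parts: list[str] = []

    physical_lines = text.split("\n")
    # A final newline yields an empty last item; it is not a line of its own.
    if physical_lines and physical_lines[-1] == "":
        physical_lines.pop()

    for line in physical_lines:
        # The newline may be preceded by a carriage return.
        if line.endswith("\r"):
            line = line[:-1]
        if line.endswith("\\"):
            # Continuation: keep the text before the backslash and go on.
            parts.append(line[:-1])
        else:
            parts.append(line)
            result.append("".join(parts))
            parts = []

    if parts:
        # Assumption: input ending on a continuation line is kept as is.
        result.append("".join(parts))
    return result
```

Applied to the three physical lines of the example above, this function returns two logical lines.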

3.1.2.Whitespace

Whitespace is defined as a sequence of one or more space (U+0020, " ") or tabulation (U+0009, "\t") characters.

Whitespace is not significant beyond separating tokens: in the following example, all three lines are equivalent:

reply 200 "ok"
reply   200	"ok"
	  reply		  200   "ok"

3.1.3.Comments

Comments start with a number sign (U+0023, "#") character and end at the end of the current physical line.

Comments are not significant: they have no impact on the structure of a document or its elements. In particular it means that comments do not affect logical lines: in the following example, the 3 physical lines form a single logical line:

reply 200 \
      # a first comment
      "ok" # another comment

3.1.4.Tokens

Non-whitespace characters are aggregated into tokens. The following tokens are defined:

Opening brackets
  • The opening delimiter for blocks (U+007B, "{").

Closing brackets
  • The closing delimiter for blocks (U+007D, "}").

Symbols
  • Sequences of alphanumerical characters (U+0030-U+0039, "0" to "9", and U+0061-U+007A, "a" to "z") and underscore characters (U+005F, "_") starting with an alphabetical character (U+0061-U+007A, "a" to "z"). E.g. foo, bar_baz_42.

Strings
  • Sequences of characters starting and ending with a double quote character (U+0022), optionally preceded by a sigil. Double quote characters (U+0022) and backslash characters (U+005C, "\") can be included in strings provided that they are escaped with a backslash character. The sigil starts with a tilde character (U+007E, "~") followed by one or more alphanumerical characters (U+0030-U+0039, "0" to "9", and U+0061-U+007A, "a" to "z"). E.g. "foo", "a \"b\" c", ~re"^ab{1,3}c?".

Integers
  • Decimal integers optionally preceded by a sign character. E.g. 42, -123, +456.

Floats
  • Double precision floating point numbers. E.g. 1.0, -2.345, 0.7e-89.
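
The token definitions above can be approximated with regular expressions. The following Python sketch is one possible reading, not a reference implementation: the token kind names and the tokenize function are hypothetical, the input is assumed to be a single logical line, and true and false are reported as plain symbols.

```python
import re

# One alternative per token kind; floats are tried before integers so that
# "1.0" is not split into an integer followed by garbage.
TOKEN_RE = re.compile(r'''
    (?P<ws>[ \t]+)
  | (?P<comment>\#[^\n]*)
  | (?P<open>\{)
  | (?P<close>\})
  | (?P<string>(?:~[a-z0-9]+)?"(?:[^"\\]|\\.)*")
  | (?P<float>[+-]?(?:0|[1-9][0-9]*)\.[0-9]+(?:[eE][+-]?(?:0|[1-9][0-9]*))?)
  | (?P<integer>[+-]?(?:0|[1-9][0-9]*))
  | (?P<symbol>[a-z][a-z0-9_]*)
''', re.VERBOSE)


def tokenize(line: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character {line[pos]!r} at offset {pos}")
        if match.lastgroup not in ("ws", "comment"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Because the string alternative is tried before the comment alternative, a number sign inside a string does not start a comment. The sketch does not check that adjacent tokens are separated by whitespace.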

3.2.Elements

A document contains a sequence of elements, each element being either a block or an entry.

3.2.1.Blocks

A block is a grouping construct starting with a symbol identifying the type of the block, optionally followed by a string (the name of the block), then an opening bracket, a sequence (possibly empty) of elements and a closing bracket.

Examples
# An empty named block
account "bob" {
}

# An unnamed block
storage {
  path "/var/lib/example"
}

3.2.2.Entries

An entry is made of a symbol identifying the type of the entry, optionally followed by a list of values, each value being either a symbol, a boolean, a string, an integer or a float.

Examples
# An entry with no value
log_debug_messages

# An entry with multiple values
match path "/app"

3.2.3.Values

3.2.3.1.Strings

Strings are used to represent arbitrary textual data.

Strings can be annotated with a sigil, a marker signaling that the string can be interpreted with different semantics. For example the ~re sigil could be used to indicate that the string is the textual representation of a regular expression. Implementations must not assign meaning to specific sigils: applications are free to interpret them as they see fit, including ignoring them altogether.
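
To make the division of responsibilities concrete, here is a minimal Python sketch. The function names parse_string and interpret are hypothetical. The first function plays the role of an implementation: it separates the sigil from the content and resolves only the \" and \\ escapes, leaving any other backslash sequence untouched. The second plays the role of an application that happens to treat ~re strings as regular expressions.

```python
import re
from typing import Optional

STRING_RE = re.compile(r'(?:~(?P<sigil>[a-z0-9]+))?"(?P<body>(?:[^"\\]|\\.)*)"')


def parse_string(token: str) -> tuple[Optional[str], str]:
    """Split a string token into (sigil or None, unescaped content)."""
    match = STRING_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"invalid string token: {token!r}")
    # Resolve \" and \\ in a single left-to-right pass.
    body = re.sub(r'\\(["\\])', r'\1', match.group("body"))
    return match.group("sigil"), body


def interpret(token: str):
    """Application-level choice: compile ~re strings, keep others as text."""
    sigil, text = parse_string(token)
    if sigil == "re":
        return re.compile(text)
    return text
```

An application that does not care about sigils would call only parse_string and discard the first element of the result.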

3.2.3.2.Integers

Integer values are the decimal representation of signed 64-bit integers.
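
In a language with arbitrary precision integers, the 64-bit bound has to be checked explicitly. The Python sketch below assumes that an implementation rejects literals outside of that range; the function name parse_integer is hypothetical.

```python
import re

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# Optional sign, then either a single zero or a number without leading zeros.
INTEGER_RE = re.compile(r"[+-]?(?:0|[1-9][0-9]*)")


def parse_integer(text: str) -> int:
    """Convert an integer token, enforcing the signed 64-bit range."""
    # Check the syntax first: int() alone would also accept forms such as
    # "1_000" or " 42 ".
    if INTEGER_RE.fullmatch(text) is None:
        raise ValueError(f"invalid integer: {text!r}")
    value = int(text, 10)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"integer out of 64-bit range: {text}")
    return value
```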

3.2.3.3.Floats

Float values are the decimal representation of double precision floating point numbers as specified by IEEE 754.
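
A float token can be validated and converted along the same lines. This Python sketch uses the hypothetical name parse_float and assumes that literals too large for a double are rejected rather than silently turned into infinity, a point on which this specification does not take a position.

```python
import math
import re

# Integer part, mandatory fraction, optional exponent.
FLOAT_RE = re.compile(
    r"[+-]?(?:0|[1-9][0-9]*)\.[0-9]+(?:[eE][+-]?(?:0|[1-9][0-9]*))?"
)


def parse_float(text: str) -> float:
    """Convert a float token to a double precision value."""
    if FLOAT_RE.fullmatch(text) is None:
        raise ValueError(f"invalid float: {text!r}")
    value = float(text)
    # Python returns infinity for decimal literals that overflow a double.
    if math.isinf(value):
        raise ValueError(f"float out of range: {text}")
    return value
```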

3.2.3.4.Booleans

Boolean values are specified as the true and false symbols.

3.2.3.5.Symbols

Symbols are used to represent fixed values, e.g. enumeration values or constants.

3.2.4.Grammar

The following ABNF grammar (see RFC 5234) is the authoritative description of the BCL language.

document        = *element
element         = block / entry
block           = block-type [ block-name ] "{" *element "}"
block-type      = symbol
block-name      = string
entry           = entry-name *value
entry-name      = symbol
value           = symbol / boolean / string / integer / float
symbol          = %x61-7A *( %x61-7A / %x30-39 / %x5F )                   ; /[a-z][a-z0-9_]*/
boolean         = "true" / "false"
string          = [ sigil ] %x22 *character %x22
character       = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFF
character      =/ %x5C ( %x22 / %x5C )                                    ; \" \\
character      =/ %x5C ( %x61 / %x62 / %x74 / %x6E / %x76 / %x66 / %x72 ) ; \a \b \t \n \v \f \r
sigil           = %x7E 1*( %x61-7A / %x30-39 )                            ; /~[a-z0-9]+/
integer         = [ sign ] ( %x30 / ( %x31-39 *%x30-39 ) )                ; /[+-]?(?:0|[1-9][0-9]*)/
float           = integer "." fraction [ exponent ]
fraction        = 1*%x30-39                                               ; /[0-9]+/
exponent        = ( "e" / "E" ) integer
sign            = "+" / "-"
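
To show how the block and entry rules fit together, here is a small line-oriented parser sketched in Python. It is an illustration under several assumptions and not a reference implementation: the Entry and Block classes and the parse function are hypothetical, an opening bracket is assumed to end its line and a closing bracket to stand alone on its line, values and block names are kept as raw token text, and line continuations are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Strings (with optional sigil), comments, brackets, then any other word.
TOKEN_RE = re.compile(r'(?:~[a-z0-9]+)?"(?:[^"\\]|\\.)*"|#.*|[{}]|[^\s{}"#]+')


@dataclass
class Entry:
    name: str
    values: list


@dataclass
class Block:
    type: str
    name: Optional[str] = None
    elements: list = field(default_factory=list)


def parse(text: str) -> Block:
    """Parse a document into a tree rooted at an implicit top-level block."""
    top = Block("top-level")
    stack = [top]
    for number, line in enumerate(text.split("\n"), start=1):
        tokens = [t for t in TOKEN_RE.findall(line) if not t.startswith("#")]
        if not tokens:
            continue  # blank line or comment only
        if tokens == ["}"]:
            if len(stack) == 1:
                raise ValueError(f"line {number}: unexpected closing bracket")
            stack.pop()
        elif tokens[-1] == "{":
            if len(tokens) not in (2, 3):
                raise ValueError(f"line {number}: invalid block header")
            name = tokens[1] if len(tokens) == 3 else None
            block = Block(tokens[0], name)
            stack[-1].elements.append(block)
            stack.append(block)
        else:
            stack[-1].elements.append(Entry(tokens[0], tokens[1:]))
    if len(stack) != 1:
        raise ValueError("unterminated block")
    return top
```

The stack mirrors the nesting of blocks: each opening bracket pushes a new block, each closing bracket pops it, and entries are always attached to the block on top of the stack.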

4.Ecosystem

4.1.Specification

The specification is publicly available at https://n16f.net/bcl. And yes, it is hosted by a Boulevard server!

4.2.Implementations

Go

4.3.Syntax highlighting

Emacs