Configuration File

busshi edited this page Nov 18, 2021 · 10 revisions

Like any HTTP server, our webserv needs a configuration file to operate.

Configuration file format

Example configuration

For that purpose, we decided to replicate the nginx configuration style, which is basically made of key-value pairs and blocks.

Here is a minimal example configuration:

server {
    listen 80;
    index index.html;

    location / {
        root /var/www/html;
    }

    location /blog {
        root /var/www/html/blog;
    }
}

File format rules

As there is no official specification for the file format nginx uses, our version may differ slightly, which is not an issue: webserv is not expected to reproduce the exact same configuration language anyway.

Lexer rules

The lexer is responsible for validating the grammatical structure of the configuration file.

Here are some of the most common rules enforced by our lexer:

  • A configuration option which is not a block must always have a key and a value, and then end with a semicolon character ;.
autoindex;     # error: no value
autoindex on   # error: key-value pair but no ending semicolon
autoindex off; # valid!
  • A configuration option which IS a block must have a key, but not necessarily a value. If a value is provided, it ends at the first { encountered. The block scope is delimited by a pair of curly braces {}, which can contain more specific configuration options. A semicolon after the closing brace is not allowed and produces a lexing error. If either the opening or the closing brace is missing, the resulting error message may not make much sense, because the error cascades through the remainder of the configuration file.
# error: anonymous blocks make no sense
{
}

# error: semicolon at the end
server {
    autoindex on;
};

# ok
server {}

# ok
server { autoindex on; }

# ok
server
{
}

# ok: 2 empty server blocks, and one that contains two location blocks
server {} server {} server {
    location / {}
    location /blog {
        root /var/www/html/blog;
    }
}
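
The rules listed above can be approximated by a small structural check. The following is only a sketch under stated assumptions, not the project's lexer: tokenize and checkStructure are hypothetical names, the error strings are invented for illustration, and the check knows nothing about individual directives (for instance how many values each one accepts) or about the line on which a semicolon appears.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` into "{", "}", ";" and
// whitespace-separated words. Text from '#' to the end of the line is skipped.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string word;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        // A virtual trailing newline flushes the last pending word.
        const char c = i < text.size() ? text[i] : '\n';
        const bool punct = (c == '{' || c == '}' || c == ';');
        if (punct || c == '#' || std::isspace(static_cast<unsigned char>(c))) {
            if (!word.empty()) { tokens.push_back(word); word.clear(); }
            if (punct) tokens.push_back(std::string(1, c));
            if (c == '#') {
                while (i < text.size() && text[i] != '\n') ++i;
            }
        } else {
            word += c;
        }
    }
    return tokens;
}

// Hypothetical helper: returns an empty string when the tokens obey the
// key/value/semicolon and brace rules, or a description of the first violation.
std::string checkStructure(const std::vector<std::string>& tokens) {
    int depth = 0;
    std::size_t i = 0;
    while (i < tokens.size()) {
        const std::string& key = tokens[i];
        if (key == "}") {
            if (depth == 0) return "unexpected '}'";
            --depth;
            ++i;
            continue;
        }
        if (key == "{") return "anonymous block";
        if (key == ";") return "unexpected ';'";
        // `key` is a directive name: count value words up to ';' or '{'.
        std::size_t values = 0;
        ++i;
        while (i < tokens.size() && tokens[i] != ";" &&
               tokens[i] != "{" && tokens[i] != "}") {
            ++values;
            ++i;
        }
        if (i == tokens.size() || tokens[i] == "}") {
            return "missing ';' after '" + key + "'";
        }
        if (tokens[i] == ";" && values == 0) return "no value for '" + key + "'";
        if (tokens[i] == "{") ++depth;
        ++i;
    }
    return depth == 0 ? "" : "missing '}'";
}
```

Calling checkStructure(tokenize(text)) on each of the snippets above yields an empty string for the ones marked ok and a non-empty message for the ones marked error. Because this sketch has no notion of directive arity, a missing semicolon followed by another directive is silently absorbed as extra values, which a real lexer would need to handle differently.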

Configuration file is free form

As demonstrated above, there are almost no requirements on the form or style of the configuration file. Lexically, most tokens can be separated by any amount of spaces and newlines without causing any issue. The only real exception is that, for a non-block configuration item, the terminating semicolon must be on the same line as the value.

For example:

server     {
      autoindex

           on       ;
 }

is perfectly valid, while:

server {
    autoindex
    on
    ;
}

is not, because the semicolon is not on the same line as the value.
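
One rough way to detect this situation is to look for a semicolon that is the first non-blank character of its line. The sketch below does only that; firstDanglingSemicolon is a hypothetical name, and the actual lexer may enforce the rule differently (for example by comparing token line numbers).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the 1-based number of the first line on which
// a ';' is not preceded by any other non-blank character, or 0 if no such
// line exists. Comments (from '#' to the end of the line) are ignored.
std::size_t firstDanglingSemicolon(const std::string& text) {
    std::size_t line = 1;
    bool seenContent = false;  // has a non-blank character appeared on this line?
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\n') {
            ++line;
            seenContent = false;
        } else if (c == '#') {
            // Skip the comment, stopping just before the newline.
            while (i + 1 < text.size() && text[i + 1] != '\n') ++i;
        } else if (c == ';') {
            if (!seenContent) return line;
        } else if (!std::isspace(static_cast<unsigned char>(c))) {
            seenContent = true;
        }
    }
    return 0;
}
```

Applied to the two examples above, the function returns 0 for the first one and the line number of the lone semicolon for the second one.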

Comments

When the # character is encountered, the remaining characters on the line are skipped.
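
Taken literally, this rule amounts to truncating each line at its first # character. A minimal sketch, where stripComment is a hypothetical helper name and no quoting or escaping mechanism is assumed:

```cpp
#include <string>

// Hypothetical helper: returns the part of `line` that precedes the first
// '#', or the whole line when it contains no '#'.
std::string stripComment(const std::string& line) {
    const std::string::size_type pos = line.find('#');
    return pos == std::string::npos ? line : line.substr(0, pos);
}
```

Whitespace left in front of the # is kept by this sketch; it is harmless since tokens may be separated by any amount of spaces.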

Environment variables

Environment variables are accepted as part of directive values. They are referenced with the standard shell notation, a $ followed by the variable name:

server {
  listen 80;
  server_name $SERVER_NAME;
  root $HOME/.local/var/www/html;

  location / {
  }
}

Environment variables are NOT expanded in any other context.
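
To illustrate what such an expansion involves, here is a minimal sketch built on std::getenv. expandEnv is a hypothetical name, and several behaviours are assumptions that the text above does not specify: a variable name is made of letters, digits and underscores, an unset variable expands to an empty string, and a $ that is not followed by a name is kept as-is.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: replaces every $NAME found in `value` with the
// content of the environment variable NAME (see assumptions above).
std::string expandEnv(const std::string& value) {
    std::string out;
    std::string::size_type i = 0;
    while (i < value.size()) {
        if (value[i] != '$') {
            out += value[i++];
            continue;
        }
        // Find the end of the variable name that follows the '$'.
        std::string::size_type end = i + 1;
        while (end < value.size() &&
               (std::isalnum(static_cast<unsigned char>(value[end])) ||
                value[end] == '_')) {
            ++end;
        }
        if (end == i + 1) {  // lone '$': keep it literally
            out += value[i++];
            continue;
        }
        const std::string name = value.substr(i + 1, end - i - 1);
        const char* env = std::getenv(name.c_str());
        if (env != NULL) out += env;  // unset variable: expands to nothing
        i = end;
    }
    return out;
}
```

With HOME set to /home/user, expandEnv("$HOME/.local/var/www/html") gives /home/user/.local/var/www/html.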

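One convenient way to start webserv with a given set of environment variables is to put the assignments in front of the command, without any semicolon between them, so that they are passed to the webserv process:

ROOT_S1=/var/www/html/portfolio ROOT_S2=/var/www/html/blog ./webserv ./asset/config/example1.conf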

How parsing is done, and data actually used

Data representation

Each configuration item (block or non-block) shares the same underlying structure, and is represented as a simple C++ class:

struct ConfigItem {
    std::string name;
    std::string value;
    std::vector<ConfigItem*> children;
    BlockType type;
};

This is a highly simplified version; please see the corresponding header for more information.

As the structure above shows, nesting is implemented by storing a vector of ConfigItem pointers in each ConfigItem. If a ConfigItem has at least one child, it is necessarily a block, because only blocks can hold other items. When the item is a block, type indicates which kind of block it is.
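
As an illustration, the tree for a very small configuration can be assembled by hand. This is a sketch on top of the simplified structure, not the parser's code: the BlockType enumerators BLOCK_GLOBAL, BLOCK_SERVER and BLOCK_LOCATION are borrowed from the directive table of this page, while NOT_A_BLOCK, the constructor, the destructor, add and buildExampleTree are hypothetical additions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// NOT_A_BLOCK is a hypothetical enumerator for non-block items.
enum BlockType { NOT_A_BLOCK, BLOCK_GLOBAL, BLOCK_SERVER, BLOCK_LOCATION };

struct ConfigItem {
    std::string name;
    std::string value;
    std::vector<ConfigItem*> children;
    BlockType type;

    ConfigItem(const std::string& n, const std::string& v, BlockType t)
        : name(n), value(v), type(t) {}

    // Each item owns its children (an assumption of this sketch).
    // Copying a ConfigItem is not supported here.
    ~ConfigItem() {
        for (std::size_t i = 0; i < children.size(); ++i) delete children[i];
    }

    // Appends `child` and returns it, which makes nesting easy to write.
    ConfigItem* add(ConfigItem* child) {
        children.push_back(child);
        return child;
    }
};

// Builds the tree for:
//   server { listen 80; location / { root /var/www/html; } }
// The caller owns the returned global scope and must delete it.
ConfigItem* buildExampleTree() {
    ConfigItem* global = new ConfigItem("", "", BLOCK_GLOBAL);
    ConfigItem* server = global->add(new ConfigItem("server", "", BLOCK_SERVER));
    server->add(new ConfigItem("listen", "80", NOT_A_BLOCK));
    ConfigItem* location =
        server->add(new ConfigItem("location", "/", BLOCK_LOCATION));
    location->add(new ConfigItem("root", "/var/www/html", NOT_A_BLOCK));
    return global;
}
```

Note that an empty block such as server {} has no children, which is why the type field, rather than the size of children, is the reliable way to tell blocks and non-blocks apart.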

Using the data

In reality, ConfigItem is a considerably more complex class that bundles several methods to easily access its child or parent elements.

First, the configuration is parsed from a file, using a ConfigParser object:

ConfigParser cfgp;

ConfigItem* global = cfgp.loadConfig("/path/to/config");

During the load process, a Lexer::LexerException or a ConfigParser::ParserException may be thrown if there is something wrong with the lexing or parsing processes.

The returned ConfigItem* is a pointer to the global scope of the configuration, which therefore holds the whole configuration.

findBlocks

Helper method that returns a vector of ConfigItem* holding all the blocks of a given type that are direct children of the current block. The ConfigItem on which this method is called must therefore be a block; otherwise, an exception is thrown.

In the following example, findBlocks is used to process each server block from the global config scope, then each location block of each server block.

std::vector<ConfigItem*> serverBlocks = global->findBlocks("server");

for (std::vector<ConfigItem*>::const_iterator ite = serverBlocks.begin();
     ite != serverBlocks.end(); ++ite) {
    std::vector<ConfigItem*> locationBlocks = (*ite)->findBlocks("location");
    // do something with each location of this server block
}
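
For readers curious about what such a helper involves, here is one possible shape for it, written as a free function over the simplified structure shown earlier. It is a sketch, not the project's implementation: NOT_A_BLOCK is a hypothetical enumerator, and std::logic_error is an arbitrary choice since this page does not say which exception type is thrown.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// NOT_A_BLOCK is a hypothetical enumerator for non-block items.
enum BlockType { NOT_A_BLOCK, BLOCK_GLOBAL, BLOCK_SERVER, BLOCK_LOCATION };

struct ConfigItem {
    std::string name;
    std::string value;
    std::vector<ConfigItem*> children;
    BlockType type;
};

// Collects the direct children of `block` that are blocks named `name`.
// Throws when `block` is not itself a block.
std::vector<ConfigItem*> findBlocks(const ConfigItem& block,
                                    const std::string& name) {
    if (block.type == NOT_A_BLOCK) {
        throw std::logic_error("findBlocks called on a non-block item");
    }
    std::vector<ConfigItem*> found;
    for (std::size_t i = 0; i < block.children.size(); ++i) {
        ConfigItem* child = block.children[i];
        // Only direct children are inspected: no recursion into sub-blocks.
        if (child->type != NOT_A_BLOCK && child->name == name) {
            found.push_back(child);
        }
    }
    return found;
}
```

Since only direct children are inspected, asking the global scope for location blocks returns an empty vector even when server blocks contain some, which matches the two-level loop of the example above.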

findAtomInBlock

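Retrieves the ConfigItem* with the given name directly inside the block on which this method is called. As the example below shows, the returned pointer must be checked before use, because the item may be absent from that block.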

ConfigItem* globalAutoindex = global->findAtomInBlock("autoindex");

if (globalAutoindex) {
    std::cout << globalAutoindex->name() << " = " << globalAutoindex->value() << "\n";
} else {
    std::cout << "Did not find autoindex directly inside global scope\n";
}

findNearestAtom

Retrieve the nearest configuration item which may apply to the one on which this method is called.

Given the following configuration:

some_config 2;

server {
    some_config 1;
    location / {
        some_config 0;
    }
}

Assuming that locationBlock refers to the location configuration item as described by the configuration above:

ConfigItem* someConfig = locationBlock->findNearestAtom("some_config");

std::cout << someConfig->value() << "\n";

This snippet prints 1, because that is the nearest value when looking outward from the scope that contains the location item (here, the server block).

If we edit the configuration file like so:

some_config 2;

server {
    location / {
        some_config 0;
    }
}

The printed value is then 2, because this is now the nearest value: some_config is no longer defined in the server block. This is handy for configuration options that can be defined at multiple scope levels.
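
The lookup can be pictured as a walk up the tree of scopes. The sketch below is an assumption-laden illustration, not the real method: the parent back-pointer and the constructor are hypothetical, both helpers are written as free functions, and the name-only match does not check that the result is a non-block item. Both examples above print a value found outside the location block, so the sketch starts in the scope that contains the item and then moves outward; that starting point is inferred from the examples only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct ConfigItem {
    std::string name;
    std::string value;
    std::vector<ConfigItem*> children;
    ConfigItem* parent;  // hypothetical back-pointer, NULL for the global scope

    // Registers the new item as a child of `p` when a parent is given.
    ConfigItem(const std::string& n, const std::string& v, ConfigItem* p)
        : name(n), value(v), parent(p) {
        if (parent != NULL) parent->children.push_back(this);
    }
};

// Looks for `key` among the direct children of `block` only.
ConfigItem* findAtomInBlock(const ConfigItem& block, const std::string& key) {
    for (std::size_t i = 0; i < block.children.size(); ++i) {
        if (block.children[i]->name == key) return block.children[i];
    }
    return NULL;
}

// Starts in the scope that contains `item`, then walks up towards the
// global scope until `key` is found. Returns NULL when no scope defines it.
ConfigItem* findNearestAtom(const ConfigItem& item, const std::string& key) {
    for (const ConfigItem* scope = item.parent; scope != NULL;
         scope = scope->parent) {
        ConfigItem* hit = findAtomInBlock(*scope, key);
        if (hit != NULL) return hit;
    }
    return NULL;
}
```

Because the result may be NULL when no enclosing scope defines the key, callers of such a function should test the pointer before reading its value.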

Available directives

Directive name        Syntax                                        Context         Type
listen                listen [<IPv4>:]port                          BLOCK_SERVER    value
autoindex             autoindex on|off                              ANY             value
upload_max_size       upload_max_size <number>[k[b] | m[b]]         ANY             value
client_body_max_size  client_body_max_size <number>[k[b] | m[b]]    ANY             value
server                server                                        BLOCK_GLOBAL    block
location              location <path>                               BLOCK_LOCATION  block
root                  root <path>                                   BLOCK_SERVER    value
method                method GET|POST|PUT|DELETE                    BLOCK_SERVER    block
file_upload_dir       file_upload_dir <path>                        ANY             value
default_error_file    default_error_file <path>                     ANY             value
index                 index <filename-1> <filename-2> <filename-x>  BLOCK_SERVER    value
log_level             log_level DEBUG|INFO|WARNING|ERROR            BLOCK_GLOBAL    value
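
The size syntax shared by upload_max_size and client_body_max_size, <number>[k[b] | m[b]], can be read with a few lines of code. The sketch below is hypothetical (parseSize is not a name taken from the project) and rests on assumptions this page does not confirm: k stands for 1024 bytes, m for 1024 * 1024 bytes, suffixes are lowercase only, and overflow is not checked.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: parses "<number>[k[b] | m[b]]" and stores the byte
// count in `bytes`. Returns false when `text` does not match that syntax,
// in which case `bytes` is left untouched.
bool parseSize(const std::string& text, unsigned long& bytes) {
    std::size_t i = 0;
    unsigned long number = 0;
    while (i < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[i]))) {
        number = number * 10 + static_cast<unsigned long>(text[i] - '0');
        ++i;
    }
    if (i == 0) return false;  // the number is mandatory

    unsigned long unit = 1;    // no suffix: plain bytes
    if (i < text.size() && (text[i] == 'k' || text[i] == 'm')) {
        unit = (text[i] == 'k') ? 1024UL : 1024UL * 1024UL;  // assumed units
        ++i;
        if (i < text.size() && text[i] == 'b') ++i;  // optional trailing 'b'
    }
    if (i != text.size()) return false;  // unexpected trailing characters

    bytes = number * unit;
    return true;
}
```

Under these assumptions, 8k and 8kb both yield 8192 bytes, while inputs such as 10x, 1b or k alone are rejected.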