PyYAML automatically converting certain keys to boolean values

yaml.load takes a second argument, a loader class (historically yaml.loader.Loader by default; PyYAML 6 and later require you to pass it explicitly). The predefined loader is a mash-up of a number of other classes:

class Loader(Reader, Scanner, Parser, Composer, Constructor, Resolver):

    def __init__(self, stream):
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        Constructor.__init__(self)
        Resolver.__init__(self)

The Constructor class is the one that maps YAML data types to Python objects. One (kludgy, but fast) way to override the boolean conversion could be:

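from yaml.constructor import Constructor

def add_bool(self, node):
    # return the scalar's text unchanged instead of converting it to a bool
    return self.construct_scalar(node)

# replace the handler for every node tagged as a YAML boolean
Constructor.add_constructor('tag:yaml.org,2002:bool', add_bool)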

which overrides the function that the constructor uses to turn boolean-tagged data into Python booleans. What we're doing here is just returning the string, verbatim.

This affects every load that goes through that constructor, though, because you are changing its behaviour globally. A cleaner approach is to create a new class derived from Constructor, and a new Loader class that uses your custom constructor.
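A minimal sketch of that more contained approach. Rather than deriving from Constructor directly, it subclasses yaml.SafeLoader and registers the override on the subclass, which has the same effect with less boilerplate. The names StringBoolLoader and construct_bool_as_str are made up for illustration:

```python
import yaml


class StringBoolLoader(yaml.SafeLoader):
    """Hypothetical loader that keeps boolean-tagged scalars as strings."""


def construct_bool_as_str(loader, node):
    # Return the scalar text verbatim instead of converting it to a bool.
    return loader.construct_scalar(node)


# add_constructor copies the constructor table onto the subclass first,
# so yaml.SafeLoader itself is left untouched.
StringBoolLoader.add_constructor('tag:yaml.org,2002:bool', construct_bool_as_str)

doc = "off:\n    yes: Flavor text for yes\n    no: Flavor text for no\n"

data = yaml.load(doc, Loader=StringBoolLoader)
print(data)

# The stock loader still applies the YAML 1.1 boolean rules.
unchanged = yaml.safe_load(doc)
print(unchanged)
```

As with the global override, true and false also come back as strings with this loader, since every boolean-tagged scalar is returned verbatim.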


PyYAML is YAML 1.1 conformant for parsing and emitting, and for YAML 1.1 this is at least partly documented behavior, so no idiosyncrasy at all, but conscious design.

In YAML 1.2 (which in 2009 superseded the 1.1 specification from 2005) this usage of Off/On/Yes/No was dropped, among other changes.
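To see the YAML 1.1 resolution in action, here is a small sketch using PyYAML's yaml.safe_load on a document with off/yes/no as keys. Quoting a scalar in the input is the usual way to opt out for a single value:

```python
import yaml

doc = "off:\n    yes: Flavor text for yes\n    no: Flavor text for no\n"

# Unquoted off/yes/no are resolved to booleans under YAML 1.1 rules.
data = yaml.safe_load(doc)
print(data)  # {False: {True: 'Flavor text for yes', False: 'Flavor text for no'}}

# A quoted scalar is always loaded as a string.
quoted = yaml.safe_load('"off": 1')
print(quoted)  # {'off': 1}
```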

In ruamel.yaml (disclaimer: I am the author of that package), the round_trip_loader is a safe_loader that defaults to YAML 1.2 behaviour:

import ruamel.yaml as yaml

yaml_str = """\
off:
    yes: "Flavor text for yes"  # quotes around value dropped
    no: "Flavor text for no"
"""

data = yaml.round_trip_load(yaml_str)
assert 'off' in data
print(yaml.round_trip_dump(data, indent=4))

Which gives:

off:
    yes: Flavor text for yes    # quotes around value dropped
    no: Flavor text for no

Since the quotes around the nested mapping's scalar values are unnecessary, they are not emitted on writing out.

If your output needs to be version 1.1 compatible, you can dump with an explicit version=(1, 1).


If you need to do this with PyYAML, rewrite the (global) rules it uses for boolean recognition:

import yaml
from yaml.resolver import Resolver

yaml_str = """\
off:
    yes: "Flavor text for yes"  # quotes around value dropped
    no: "Flavor text for no"
"""

# remove resolver entries for On/Off/Yes/No
for ch in "OoYyNn":
    if len(Resolver.yaml_implicit_resolvers[ch]) == 1:
        del Resolver.yaml_implicit_resolvers[ch]
    else:
        Resolver.yaml_implicit_resolvers[ch] = [x for x in
                Resolver.yaml_implicit_resolvers[ch] if x[0] != 'tag:yaml.org,2002:bool']

data = yaml.load(yaml_str, Loader=yaml.Loader)  # Loader is required in PyYAML 6+
print(data)
assert 'off' in data
print(yaml.dump(data))

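Which gives (PyYAML 5.1 and later emit the nested mapping in block style rather than the flow style shown here):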

{'off': {'yes': 'Flavor text for yes', 'no': 'Flavor text for no'}}
off: {no: Flavor text for no, yes: Flavor text for yes}

This works because PyYAML keeps a global dict (Resolver.yaml_implicit_resolvers) that maps first letters to a list of (tag, compiled regular expression) pairs. For o, O, y and Y there is only one such pattern (and the entry can be deleted), but n/N can also start null/Null, so there you have to remove only the bool pattern.

After that removal, yes, no, on and Off are no longer recognised as bool, but True and False still are.
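If changing the global table is undesirable, the same filtering can be confined to a loader subclass. This is a sketch under the assumption that the entries are (tag, regexp) pairs as described above; NoYesNoLoader is a made-up name:

```python
import yaml

BOOL_TAG = 'tag:yaml.org,2002:bool'
FIRST_LETTERS = set("OoYyNn")


class NoYesNoLoader(yaml.SafeLoader):
    """Hypothetical loader: only true/false still resolve to bool."""


# Build a filtered copy of the resolver table on the subclass only,
# leaving yaml.SafeLoader and the shared Resolver table unchanged.
NoYesNoLoader.yaml_implicit_resolvers = {
    first: [(tag, regexp) for tag, regexp in resolvers
            if not (tag == BOOL_TAG and first in FIRST_LETTERS)]
    for first, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

doc = "off:\n    yes: a\n    no: b\nflag: true\nnothing: null\n"

data = yaml.load(doc, Loader=NoYesNoLoader)
print(data)

# The stock loader keeps the YAML 1.1 behaviour.
stock = yaml.safe_load("yes")
print(stock)
```

Only loads that pass Loader=NoYesNoLoader are affected. Dumping still uses the stock resolver, so strings such as yes and no are quoted on output unless a matching dumper is set up as well.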