In Ruby, how can I be warned about duplicate keys in hashes when loading a YAML document?

Using Psych, you can traverse the parsed AST to find duplicate keys. I use the following helper method in my test suite to check that my i18n translations contain no duplicate keys:

def duplicate_keys(file_or_content)
  yaml = file_or_content.is_a?(File) ? file_or_content.read : file_or_content
  duplicate_keys = []

  validator = ->(node, parent_path) do
    if node.is_a?(Psych::Nodes::Mapping)
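      # In a Mapping, children alternate: key node, value node, key node, value node, ...
      children = node.children.each_slice(2)
      # Only scalar keys respond to #value; skip complex keys (mappings, sequences, aliases).
      scalar_keys = children.map { |key_node, _value_node| key_node }.select { |key_node| key_node.respond_to?(:value) }
      duplicates = scalar_keys.group_by(&:value).select { |_value, nodes| nodes.size > 1 }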

      duplicates.each do |key, nodes|
        duplicate_key = {
          file: (file_or_content.path if file_or_content.is_a?(File)),
          key: parent_path + [key],
          occurrences: nodes.map { |occurrence| "line: #{occurrence.start_line + 1}" },
        }.compact

        duplicate_keys << duplicate_key
      end

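      # Plain Ruby replacement for ActiveSupport's #try, so the helper also works outside Rails.
      children.each { |key_node, value_node| validator.call(value_node, parent_path + [(key_node.value if key_node.respond_to?(:value))].compact) }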
    else
      node.children.to_a.each { |child| validator.call(child, parent_path) }
    end
  end

  ast = Psych.parse_stream(yaml)
  validator.call(ast, [])

  duplicate_keys
end
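
To show why walking the AST is needed at all, here is a minimal sketch (the sample YAML is made up for illustration). Loading a document with a duplicate key succeeds silently and keeps the last value, while the parsed tree still holds both key nodes:

```ruby
require "yaml"

# Hypothetical sample document with a duplicated "greeting" key.
yaml = <<~YAML
  en:
    greeting: Hello
    greeting: Hi
YAML

# Loading raises no error and prints no warning; the last value wins.
loaded = YAML.load(yaml)
puts loaded["en"]["greeting"]

# The AST keeps every key node, so duplicates are still visible there.
document = Psych.parse(yaml)   # Psych::Nodes::Document
root = document.root           # top-level Psych::Nodes::Mapping
_en_key, en_mapping = root.children

# Mapping children alternate key, value, key, value, ...
keys = en_mapping.children.each_slice(2).map { |key_node, _value_node| key_node.value }
duplicated = keys.select { |key| keys.count(key) > 1 }.uniq
puts duplicated.inspect
```

The helper above does the same thing recursively and also records the path and line number of each occurrence.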

One thing I do to help maintain my YAML files is to write code that initially generates them from a known structure in Ruby. That gets me started.

Then, I'll write a little snippet that loads it and outputs what it parsed using either PrettyPrint or Awesome Print so I can compare that to the file.

I also sort the fields as necessary to make it easy to look for duplicates.
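
A rough sketch of that workflow, assuming a small made-up `translations` hash and a hypothetical `deep_sort` helper (neither comes from a real project):

```ruby
require "yaml"
require "pp"

# Hypothetical known structure used to generate the initial YAML.
translations = {
  "en" => { "greeting" => "Hello", "farewell" => "Goodbye" }
}

# Hypothetical helper: recursively sort hash keys so duplicates would sit next to each other.
def deep_sort(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }.map { |key, child| [key, deep_sort(child)] }.to_h
  when Array
    value.map { |child| deep_sort(child) }
  else
    value
  end
end

# Generate the YAML from the sorted structure.
yaml = deep_sort(translations).to_yaml
puts yaml

# Load it back and pretty-print what was parsed, to compare against the file.
reloaded = YAML.load(yaml)
pp reloaded
```

Since a Ruby hash cannot hold the same key twice, YAML generated this way starts out free of duplicates; the comparison step is what catches keys added twice by hand later.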


There is a solution involving a linter, but I'm not sure it will be relevant to you since it's not a 100% Ruby solution. I'll post it anyway since I don't know of any way to do this in Ruby:

You can use the yamllint command-line tool:

pip install --user yamllint

Specifically, it has a rule key-duplicates that detects duplicated keys:

$ cat test.yml
{ one: 1, one: 2 }

$ yamllint test.yml
test.yml
  1:11      error    duplication of key "one" in mapping  (key-duplicates)
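
If you want to call the linter from a Ruby test suite, something like the following might work. It is only a sketch: `yamllint_duplicate_keys` is a hypothetical helper name, and it assumes yamllint's `-f parsable` output format and an inline `-d` configuration that enables only the `key-duplicates` rule. Check both against your installed yamllint version.

```ruby
require "open3"

# Hypothetical helper: runs yamllint on the given path with only the
# key-duplicates rule enabled and returns the matching report lines.
# Returns nil when the yamllint executable is not installed.
def yamllint_duplicate_keys(path)
  config = "{rules: {key-duplicates: enable}}" # assumed inline config
  output, _status = Open3.capture2e("yamllint", "-f", "parsable", "-d", config, path)
  output.lines.map(&:chomp).select { |line| line.include?("(key-duplicates)") }
rescue Errno::ENOENT
  nil
end

report = yamllint_duplicate_keys("test.yml")
if report.nil?
  puts "yamllint is not installed"
else
  puts report
end
```

An empty array means yamllint reported no duplicated keys for that file.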
