Check whether equal string literals are stored at the same address

The tacklelib C++11 library has a macro, backed by the tmpl_string class, that holds a string literal as a template class instance. tmpl_string contains a static string with the same content, which guarantees the same address for the same template class instance.

https://sourceforge.net/p/tacklelib/tacklelib/HEAD/tree/trunk/include/tacklelib/tackle/tmpl_string.hpp

Tests:

https://sourceforge.net/p/tacklelib/tacklelib/HEAD/tree/trunk/src/tests/unit/test_tmpl_string.cpp

Example:

const auto s = TACKLE_TMPL_STRING(0, "my literal string");

I've used it in another macro to conveniently and consistently extract the begin/end of a string literal:

#include <tacklelib/tackle/tmpl_string.hpp>
#include <tacklelib/utility/string_identity.hpp>

//...

std::vector<char> xml_arr;

xml_arr.insert(xml_arr.end(), UTILITY_LITERAL_STRING_WITH_BEGINEND_TUPLE("<?xml version='1.0' encoding='UTF-8'?>\n"));

https://sourceforge.net/p/tacklelib/tacklelib/HEAD/tree/trunk/include/tacklelib/utility/string_identity.hpp


Although C++ does not seem to offer any way to do this with string literals themselves, there is an ugly but somewhat workable way around the problem if you don't mind rewriting your string literals as character sequences.

template <typename T, T...values>
struct static_array {
  static constexpr T array[sizeof...(values)] { values... };
};

template <typename T, T...values>
constexpr T static_array<T, values...>::array[];

template <char...values>
using str = static_array<char, values..., '\0'>;

int main() {
  return str<'a','b','c'>::array != str<'a','b','c'>::array;
}

This program is required to return zero. The compiler has to ensure that even if multiple translation units instantiate str<'a','b','c'>, those definitions get merged, so you only ever end up with a single array.

You would need to make sure you don't mix this with string literals, though. Any string literal is guaranteed not to compare equal to any of the template instantiations' arrays.
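As a rough sketch of what that means in practice, the snippet below repeats the template from above and compares two independent uses of the same instantiation. The helper names abc_from_here and abc_from_there are made up for this illustration; the point is only that both return the same pointer, while a comparison against the literal "abc" has to go through the content (strcmp), not the address.

```cpp
#include <cassert>
#include <cstring>

template <typename T, T... values>
struct static_array {
  static constexpr T array[sizeof...(values)] = { values... };
};

// Out-of-class definition, needed before C++17 when the member is odr-used.
template <typename T, T... values>
constexpr T static_array<T, values...>::array[];

template <char... values>
using str = static_array<char, values..., '\0'>;

// Hypothetical helpers: two unrelated places that name the same instantiation.
inline const char* abc_from_here()  { return str<'a', 'b', 'c'>::array; }
inline const char* abc_from_there() { return str<'a', 'b', 'c'>::array; }

// Content comparison, the only safe way to compare against a string literal.
inline bool same_content(const char* a, const char* b) {
  return std::strcmp(a, b) == 0;
}
```

Calling abc_from_here() == abc_from_there() yields true, and same_content(abc_from_here(), "abc") is true as well, even though the array and the literal are different objects.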


Is there any macro, in any C++ implementation, but mainly g++ and clang, whose definition guarantees that several equal string literals are stored at the same address?

  • gcc has the -fmerge-constants option (this is not a guarantee):

Attempt to merge identical constants (string constants and floating-point constants) across compilation units.

This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.

  • Visual Studio has String Pooling (the /GF option, "Eliminate Duplicate Strings"):

String pooling allows what were intended as multiple pointers to multiple buffers to be multiple pointers to a single buffer. In the following code, s and t are initialized with the same string. String pooling causes them to point to the same memory:

char *s = "This is a character buffer";
char *t = "This is a character buffer";

Note: although MSDN uses char* for the string literals, const char* should be used.

  • clang apparently also has a -fmerge-constants option, but I can't find much about it beyond the --help output, so I'm not sure it really is the equivalent of gcc's:

Disallow merging of constants


Anyway, how string literals are stored is implementation-dependent (many implementations store them in a read-only section of the program).
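To make the "not a guarantee" part concrete, here is a minimal sketch with two hypothetical helper functions (the names are mine, not from any library). The address comparison may come out either way depending on the compiler and its options; only the content comparison can be relied upon.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: reports whether two identical literals happen to
// share storage. The result is unspecified and may change with the
// compiler, the optimization level, or options such as -fmerge-constants
// or /GF.
inline bool literals_share_address() {
  const char* s = "This is a character buffer";
  const char* t = "This is a character buffer";
  return s == t;
}

// Hypothetical helper: compares the same two literals by content,
// which is always well defined.
inline bool literals_have_equal_content() {
  const char* s = "This is a character buffer";
  const char* t = "This is a character buffer";
  return std::strcmp(s, t) == 0;
}
```

Code that needs identity semantics should not branch on literals_share_address(); literals_have_equal_content() is the only one of the two with a portable answer.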

Rather than building your library on implementation-dependent hacks, I can only suggest using std::string instead of C-style strings: they compare by content, so they will behave exactly as you expect.

You can construct your std::string objects in place in your containers with the emplace() methods:

    std::unordered_set<std::string> my_set;
    my_set.emplace("Hello");
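A slightly fuller sketch of the same idea, wrapped in a hypothetical function count_distinct (the name is only for illustration): because std::string compares by value, emplacing the same text twice leaves a single element in the set, regardless of where the literals are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: inserts three strings, two of them equal,
// and reports how many distinct elements the set ends up with.
inline std::size_t count_distinct() {
  std::unordered_set<std::string> my_set;
  my_set.emplace("Hello");  // constructed in place from the literal
  my_set.emplace("Hello");  // equal by value, so not inserted again
  my_set.emplace("World");
  return my_set.size();
}
```

Here count_distinct() returns 2, and that result does not depend on whether the compiler merged the two "Hello" literals.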