How can pattern matching be done on text?

If you want to pattern match on the head of a charlist, there's one small change you need to make to your second code snippet.

'a' is actually a charlist with one element, so comparing it with the head of a charlist (which is an integer) will always be false. A charlist is really a list of integer values:

iex> 'abcd' == [97, 98, 99, 100]
true

The character a corresponds to the integer 97. You can get the integer code of a character in Elixir by preceding it with a ?, so:

iex> ?a == 97
true
iex> ?a == hd('a')
true

So in your guard clause you'll want to compare head == ?a, or, more simply, match on ?a directly in the function head:

defmodule MatchStick do
    def doMatch([?a | _tail]), do: 1
    def doMatch(_), do: 0
end
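For comparison, the guard clause version mentioned above might look like the following minimal sketch. The module name MatchStickGuard and the snake_case function name do_match are hypothetical, and ~c"abc" is simply another way to write the charlist 'abc'.

```elixir
defmodule MatchStickGuard do
  # The head of a charlist is an integer code point, so compare it with ?a (97).
  def do_match([head | _tail]) when head == ?a, do: 1
  # Empty lists, other first characters, and non-list arguments all return 0.
  def do_match(_other), do: 0
end

MatchStickGuard.do_match(~c"abc")
# => 1
MatchStickGuard.do_match(~c"bac")
# => 0
```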

In Elixir, single quoted strings are quite different from double quoted strings. Single quoted strings are basically lists of integers, where each integer represents a character. Therefore, they are also called character lists. They are mainly used for compatibility with Erlang, because that's how Erlang strings work. You can use single quoted strings just like you would use lists:

iex> hd('a')
97

iex> [97 | rest] = 'abcd'
'abcd'
iex> rest
'bcd'

iex> 'ab' ++ rest = 'abcd'
'abcd'
iex> rest
'cd'
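The same prefix pattern also works in a function head. This is a minimal sketch with a hypothetical module CharlistPrefix that returns the remainder instead of a flag; ~c"ab" is the same charlist as 'ab'.

```elixir
defmodule CharlistPrefix do
  # In a pattern, a literal list on the left of ++ matches a prefix and
  # binds the remaining elements to rest. The left operand must be a literal.
  def after_ab(~c"ab" ++ rest), do: {:ok, rest}
  # Any charlist that does not start with 'ab' (or any non-list) falls through.
  def after_ab(_other), do: :error
end

CharlistPrefix.after_ab(~c"abcd")
# returns {:ok, ~c"cd"}
CharlistPrefix.after_ab(~c"xyz")
# returns :error
```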

The match function for single quoted strings would look like this:

def match('a' ++ _rest), do: 1
def match(_), do: 0

When all of the integers represent printable characters, Elixir hides the list from you and displays it as a string. To trick Elixir into showing you the internal representation of a character list, you can append a 0, which is not a printable character:

iex> string = 'abcd'
'abcd'
iex> string ++ [0]
[97, 98, 99, 100, 0]
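Appending a 0 changes the data itself. As an alternative sketch, inspect/2 accepts a charlists: :as_lists option that prints any charlist as its integer list without modifying it:

```elixir
string = ~c"abcd"

# inspect/2 normally shows printable charlists as text; :as_lists forces
# the raw integer list without changing the data.
IO.puts(inspect(string, charlists: :as_lists))
# prints [97, 98, 99, 100]
```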

However, in Elixir you would typically use double quoted strings, because they handle UTF-8 correctly, are much easier to work with, and are what the standard library works with (for example the useful String module). Double quoted strings are binaries, so you can treat them like any other binary:

iex> <<97, 98, 99, 100>>
"abcd"
iex> <<1256 :: utf8>>
"Ө"

iex> <<97>> <> rest = "abcd"
"abcd"
iex> rest
"bcd"

iex> "ab" <> rest = "abcd"
"abcd"
iex> rest
"cd"

The match function for double quoted strings would look like this:

def match("a" <> _rest), do: 1
def match(_), do: 0
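Matching a literal prefix such as "a" is fine, but a fixed one-byte segment such as <<c>> only captures a whole character when that character is ASCII. To take the first character of arbitrary UTF-8 text, a utf8 segment can be used instead. This is a sketch with a hypothetical module FirstCodepoint:

```elixir
defmodule FirstCodepoint do
  # A one-byte segment like <<97>> only lines up with ASCII characters.
  # The utf8 type consumes one whole UTF-8 encoded code point (1 to 4 bytes)
  # and binds its integer value, so it also works for characters like "Ө".
  def split_first(<<first::utf8, rest::binary>>), do: {first, rest}
  # The empty string has no first code point.
  def split_first(""), do: :empty
end

FirstCodepoint.split_first("abcd")
# => {97, "bcd"}
FirstCodepoint.split_first("Өab")
# => {1256, "ab"}
```

Input that is not valid UTF-8 matches neither clause and raises a FunctionClauseError.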

Elixir will hide the internal representation of binary strings as well. To reveal it, you can again insert a 0:

iex> string = "abcd"
"abcd"
iex> string <> <<0>>
<<97, 98, 99, 100, 0>>
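Likewise, inspect/2 has a binaries: :as_binaries option that shows the bytes of a string without appending anything. A minimal sketch:

```elixir
# :as_binaries makes inspect/2 print the bytes in <<...>> form even when
# they are valid, printable UTF-8.
IO.puts(inspect("abcd", binaries: :as_binaries))
# prints <<97, 98, 99, 100>>
```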

Lastly, to convert between single quoted strings and double quoted strings you can use the functions to_string and to_charlist:

iex> to_string('abcd')
"abcd"
iex> to_charlist("abcd")
'abcd'
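One practical difference shows up with non-ASCII text: a charlist holds one integer code point per character, while a string holds UTF-8 bytes. A short sketch using the same Ө character as above:

```elixir
# Converting a string to a charlist yields one code point per character.
charlist = to_charlist("Өa")
# charlist == [1256, 97]

# Converting back re-encodes those code points as UTF-8 bytes.
string = to_string(charlist)
# string == "Өa"

IO.inspect(length(charlist))   # prints 2 (code points)
IO.inspect(byte_size(string))  # prints 3 (Ө takes two bytes in UTF-8)
```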

To detect them, you can use is_list and is_binary. These also work in guard clauses.

iex> is_list('abcd')
true
iex> is_binary('abcd')
false
iex> is_list("abcd")
false
iex> is_binary("abcd")
true

For example, to make the double quoted version also accept single quoted strings, convert them first:

def match(str) when is_list(str), do: match(to_string(str))
def match("a" <> _rest), do: 1
def match(_), do: 0

In case someone needs it: if you need to match a part of a string that sits at a known position in the middle, and you know its length, you can use binary matching:

iex(1)> <<"https://", locale::binary-size(2), ".wikipedia.com">> = "https://en.wikipedia.com"
"https://en.wikipedia.com"
iex(2)> locale
"en"

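The same fixed-size segment can be used in a function head. A minimal sketch, with a hypothetical WikiLocale module that returns :error when the URL does not have exactly that shape:

```elixir
defmodule WikiLocale do
  # binary-size(2) consumes exactly two bytes between the two literal parts,
  # so the pattern works in a function head as well as in a match.
  def locale(<<"https://", locale::binary-size(2), ".wikipedia.com">>), do: {:ok, locale}
  # Any URL without that exact shape (e.g. a longer subdomain) falls through.
  def locale(_url), do: :error
end

WikiLocale.locale("https://en.wikipedia.com")
# => {:ok, "en"}
WikiLocale.locale("https://simple.wikipedia.com")
# => :error
```

Only the last segment of a binary pattern may omit its size, so a middle part of unknown length cannot be captured this way; String.split/2 or a regular expression is the usual alternative in that case.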
defmodule MatchStick do
  def doMatch("a" <> _rest), do: 1
  def doMatch(_), do: 0
end

You need to use the string concatenation operator <>, which also works in patterns as long as its left operand is a literal string.

Example:

iex> "he" <> rest = "hello"
"hello"
iex> rest
"llo"

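When several prefixes are possible, each can get its own clause. A minimal sketch with a hypothetical Greeting module; because clauses are tried top to bottom, the longer prefix has to come first:

```elixir
defmodule Greeting do
  # "hello" must come before "he"; otherwise "he" would always win.
  def split_prefix("hello" <> rest), do: {:hello, rest}
  def split_prefix("he" <> rest), do: {:he, rest}
  # No known prefix: return the input unchanged.
  def split_prefix(other), do: {:none, other}
end

Greeting.split_prefix("hello world")
# => {:hello, " world"}
Greeting.split_prefix("help")
# => {:he, "lp"}
Greeting.split_prefix("hi")
# => {:none, "hi"}
```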