SPARQL Optional query

This question is old, but the existing answer is still hard to follow. Let me try in plain English, with thanks to SPARQL_Order_Matters.

When OPTIONAL clauses appear at the beginning of a query, they either

  • Don't match, and nothing happens
  • Do match, and their solutions become the starting set of results that the rest of the query must be compatible with

When OPTIONAL clauses appear after some pattern has already matched some data, they either

  • Don't match, and nothing happens
  • Do match, and new variable bindings are added to the existing results

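So the truly non-obvious behavior happens when an OPTIONAL comes first and matches something. Its variables (for example ?s) are now bound, every later pattern has to be compatible with those bindings, and so all query results are restricted to what that OPTIONAL matched.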
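The four cases above can be sketched with a toy model in which a result set is a list of variable bindings and OPTIONAL is a left join. This is only an illustrative Python sketch, not how any particular engine works; the helper names (`compatible`, `left_join`) and the sample bindings are assumptions made up for this example.

```python
def compatible(a, b):
    """Two solutions are compatible when every shared variable has the same value."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def left_join(left, right):
    """Keep every left solution, extended by each compatible right solution if any."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(merged if merged else [l])
    return result


# Hypothetical bindings: three subjects have a last name, only one has a nickname.
last_names = [
    {"s": "p1", "last": "Mutt"},
    {"s": "p2", "last": "Ellis"},
    {"s": "p3", "last": "Marshall"},
]
nicks = [{"s": "p1", "nick": "Dick"}]
empty_pattern = [{}]  # the implicit empty group that precedes a leading OPTIONAL

# OPTIONAL first, no match: the single empty solution survives unchanged.
optional_first_no_match = left_join(empty_pattern, [])

# OPTIONAL first, match: ?s is now bound, so later patterns must agree with it.
optional_first_match = left_join(empty_pattern, nicks)

# OPTIONAL after a matched pattern: every row is kept, and extended where possible.
optional_after = left_join(last_names, nicks)

print(optional_first_no_match)
print(optional_first_match)
print(optional_after)
```

In this toy data the leading OPTIONAL narrows everything down to `p1`, whereas the trailing OPTIONAL keeps all three rows and only decorates the one that has a nickname.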


The ordering is important here

The semantics of SPARQL queries are expressed via the SPARQL algebra and the two queries here produce very different algebra. I use the SPARQL Query Validator provided by the Apache Jena project (disclaimer - I am a committer on that project) to generate the algebra.

Your first query produces the following algebra:

(base <http://example/base/>
  (prefix ((ab: <http://learningsparql.com/ns/addressbook#>))
    (project (?first ?last)
      (leftjoin
        (leftjoin
          (bgp (triple ?s ab:lastName ?last))
          (bgp (triple ?s ab:nick ?first)))
        (bgp (triple ?s ab:firstName ?first))))))

And your second query produces the following algebra:

(base <http://example/base/>
  (prefix ((ab: <http://learningsparql.com/ns/addressbook#>))
    (project (?first ?last)
      (join
        (leftjoin
          (leftjoin
            (table unit)
            (bgp (triple ?s ab:nick ?first)))
          (bgp (triple ?s ab:firstName ?first)))
        (bgp (triple ?s ab:lastName ?last))))))

As you can see, the triple patterns in your queries appear in a different order and the operators differ. Importantly, your second query has a join, which only preserves compatible solutions from both sides, whereas the first query uses only leftjoin, which preserves LHS solutions as-is when there are no compatible solutions on the RHS.

So in the first query you first find things with an ab:lastName and then optionally add the ab:nick or ab:firstName if present, hence you get all the people in your data returned.

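In the second query you first find things with an ab:nick and then optionally add things with an ab:firstName, before requiring that everything has an ab:lastName. Because the first OPTIONAL has already bound ?s to the things that have an ab:nick, the final join can only keep those solutions. Therefore you can only get the person with a nickname (and a last name) returned.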
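To make the difference concrete, the two algebra expressions above can be evaluated by hand with a tiny evaluator. The following is a rough Python sketch under stated assumptions: the triples are hypothetical sample data (the data from the question is not reproduced here), and `bgp`, `join`, `left_join` and `project` are simplified stand-ins for the algebra operators, not Jena's API.

```python
def compatible(a, b):
    """Two solutions are compatible when every shared variable has the same value."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """Keep only merged pairs of compatible solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def left_join(left, right):
    """Like join, but a left solution with no compatible partner is kept as-is."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(merged if merged else [l])
    return result


def bgp(triples, predicate, subject_var, object_var):
    """Match the single pattern (?subject_var predicate ?object_var)."""
    return [{subject_var: s, object_var: o} for s, p, o in triples if p == predicate]


def project(solutions, variables):
    """Keep only the listed variables; unbound ones become None."""
    return [tuple(sol.get(v) for v in variables) for sol in solutions]


# Hypothetical address book: everyone has a first and last name, one has a nick.
TRIPLES = [
    ("richard", "ab:firstName", "Richard"),
    ("richard", "ab:lastName", "Mutt"),
    ("richard", "ab:nick", "Dick"),
    ("cindy", "ab:firstName", "Cindy"),
    ("cindy", "ab:lastName", "Marshall"),
    ("craig", "ab:firstName", "Craig"),
    ("craig", "ab:lastName", "Ellis"),
]

last = bgp(TRIPLES, "ab:lastName", "s", "last")
nick = bgp(TRIPLES, "ab:nick", "s", "first")
first = bgp(TRIPLES, "ab:firstName", "s", "first")
table_unit = [{}]

# First algebra: leftjoin(leftjoin(lastName, nick), firstName)
query1 = project(left_join(left_join(last, nick), first), ["first", "last"])

# Second algebra: join(leftjoin(leftjoin(table unit, nick), firstName), lastName)
query2 = project(
    join(left_join(left_join(table_unit, nick), first), last),
    ["first", "last"],
)

print(query1)
print(query2)
```

With this sample data the first expression yields one row per person, while the second yields only the row for the subject that has an ab:nick, because ?s is already fixed by the time ab:lastName is joined in.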

I thought the period in SPARQL query is the same as "and" operator.

No, it merely terminates a triple pattern. It may optionally follow other clauses (but is not required to do so), and it is not an "and" operator.

Adjacent basic graph patterns are joined, unless an alternative operator (e.g. leftjoin or minus) is implied by the presence of an OPTIONAL or MINUS clause.
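As a rough illustration of how the three operators treat the same inputs, here is a small Python sketch. The function names and the sample bindings are assumptions for this example only; the `minus` helper follows the usual SPARQL rule that a left solution is removed only if it is compatible with some right solution and shares at least one variable with it.

```python
def compatible(a, b):
    """Two solutions are compatible when every shared variable has the same value."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """Plain adjacency: keep only merged pairs of compatible solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def left_join(left, right):
    """OPTIONAL: keep every left solution, extended where a partner exists."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(merged if merged else [l])
    return result


def minus(left, right):
    """MINUS: drop left solutions that match a right solution on a shared variable."""
    return [
        l
        for l in left
        if not any(compatible(l, r) and bool(l.keys() & r.keys()) for r in right)
    ]


# Hypothetical bindings: two people, only one of whom has an email address.
people = [{"s": "p1"}, {"s": "p2"}]
emails = [{"s": "p1", "email": "p1@example.org"}]

joined = join(people, emails)          # { ?s ... . ?s ab:email ?email }
optional = left_join(people, emails)   # { ?s ... OPTIONAL { ?s ab:email ?email } }
removed = minus(people, emails)        # { ?s ... MINUS { ?s ab:email ?email } }

print(joined)
print(optional)
print(removed)
```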

Edit - What is table unit?

table unit is a special operator that corresponds to the empty graph pattern in a SPARQL query.

For example, SELECT * WHERE { } would produce the algebra (table unit).

It produces a single empty row (one solution with no variables bound). In the semantics of SPARQL an empty row is compatible with every other solution, so joining it to anything simply returns the other thing; in essence it acts as the join identity. A SPARQL engine can usually simplify the algebra to remove table unit, since in most cases it has no effect on the semantics of the query.

In your first query there is technically another join, between table unit and the first basic graph pattern, but in a normal join the presence of table unit has no effect (as it's the join identity), so it can be, and is, simplified out.

However, with an OPTIONAL the SPARQL specification requires that the algebra produced is a left join of whatever preceded the clause with the thing inside the clause. In your second query there is nothing before your first OPTIONAL (technically there is an implicit empty graph pattern there), so the first leftjoin generated has table unit on its left-hand side. Unlike in a normal join, the table unit has to be preserved in this case, because the semantics of leftjoin say that the results from the LHS are preserved if there are no compatible solutions from the RHS.

We can illustrate this with a more trivial query:

SELECT *
WHERE
{
  OPTIONAL { ?s a ?type }
}

This produces the algebra:

(base <http://example/base/>
  (leftjoin
    (table unit)
    (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type))))
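The role of table unit can be checked with a toy model in which it is a list containing one empty solution. This is a hedged Python sketch rather than anything taken from a real engine; `TABLE_UNIT`, `join`, `left_join` and the sample binding are names and data assumed for this illustration.

```python
def compatible(a, b):
    """Two solutions are compatible when every shared variable has the same value."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def join(left, right):
    """Keep only merged pairs of compatible solutions."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


def left_join(left, right):
    """Like join, but a left solution with no compatible partner is kept as-is."""
    result = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        result.extend(merged if merged else [l])
    return result


TABLE_UNIT = [{}]  # exactly one solution, with no variables bound

# Hypothetical matches for the pattern { ?s a ?type }.
types = [{"s": "ex:a", "type": "ex:Thing"}]

# In a normal join, table unit is the identity on either side.
join_left = join(TABLE_UNIT, types)
join_right = join(types, TABLE_UNIT)

# OPTIONAL { ?s a ?type } with matching data behaves like the plain pattern.
optional_with_matches = left_join(TABLE_UNIT, types)

# With no matching data the OPTIONAL version still returns one empty row,
# whereas the plain pattern returns no rows at all.
optional_without_matches = left_join(TABLE_UNIT, [])
plain_without_matches = join(TABLE_UNIT, [])

print(join_left == types, join_right == types)
print(optional_with_matches)
print(optional_without_matches)
print(plain_without_matches)
```

The last two results are the reason table unit cannot be dropped from the left side of a leftjoin: removing it would turn one empty row into zero rows.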

Tags:

Sparql