Converting SQL to SOQL

The answers so far have given examples of how you could approximate your given query in SOQL, but don't really go over how the two are connected. You've been given the answer, but nobody has shown their work. I think that this may end up making you more confused than you already are.

This is a very long answer, so get comfortable.

SQL vs SOQL

I don't know if I can explain it very well, but to me, SQL feels more like I'm looking over an entire forest, whereas SOQL only allows me to look at a few specific trees at a time. In a way, I guess that SOQL is closer to an Object-Relational Mapper (ORM) like Hibernate (Java) or Doctrine (PHP) than it is to SQL itself.

In SOQL, you can't join arbitrary tables on arbitrary conditions. A query in SOQL mostly revolves around just a single object. You can bring other tables (sObjects) into the mix, but there are restrictions.

Converting your query

Let's take your example SQL, and start working with it. First, I'll assume that there is a one-to-many relationship between A and B (i.e. there can be many B records related to a single A record, or in SQL parlance there is a foreign key to A on B)

Further, I'll assume that A and B are custom objects in Salesforce, meaning that we'll be using A__c and B__c in the SOQL.

first step

Now, A__c is the base sObject/table in your query, so we start off like this

[SELECT Id FROM A__c]

Even if you leave it off of the query, the Id of the base object is implicitly queried. To my knowledge, this is the only field that is implicitly queried. Everything else needs to be explicitly queried (no wildcards are allowed in the SELECT clause).

+edit 2018-10-18:

If you have record types defined for an SObject, the recordTypeId is also implicitly queried

second step

Next, tackling the join. In the SOQL way of thinking, B__c is a child object of A__c. One parent (A__c) has many children (B__c). SOQL allows you to query up to one level down the parent-child hierarchy (any more and the volume of the datasets returned would explode. Not a good thing when you have multiple companies sharing Salesforce's server resources).

The 'foreign key' I talked about on B__c will be one of two types of relationship fields, either a Lookup or a Master-Detail (it doesn't really matter which, for our purposes). The important bit to know is that this is a single field which contains the Id of the parent A__c record. For ease, let's call the 'foreign key' A_lookup__c. There is no such thing as a composite key in SOQL.

A relationship between objects in Salesforce also has a 'relationship name', which is typically just the plural of the child object, so Bs__r in our case.

To traverse down the relationship hierarchy, you use an inner select/subquery. This is what Salesforce calls a Left Outer Join (though LEFT, OUTER, and JOIN are not keywords in SOQL).

[SELECT Id, (SELECT Id, Field2__c, Field3__c FROM Bs__r) FROM A__c]

Assuming that Field1 in your example is the foreign key, notice that we don't explicitly join on that field. Salesforce takes care of that for us when we use the relationship name.

Unfortunately, SOQL cannot compare a field to another field, so we can't take care of the a.field2 = b.field2 part of your example SQL in the 'equivalent' SOQL. That type of filtering needs to be done in apex, outside of a query.

third step

The first part of the WHERE clause maps easily between SQL and SOQL. It ends up just being copy/paste. Your equivalent query is now

[SELECT Id, (SELECT Id, Field2__c, Field3__c FROM Bs__r) FROM A__c WHERE Field3__c < 100]

Because SOQL only allows one Object/table to appear in the FROM clause, there's no need to prefix the field with the object/table name. In fact, aliasing in general is pretty rare in SOQL.

final step

The second part of your WHERE clause can't be added to the WHERE clause for the query on A__c because the field you're referencing is on B__c, not A__c (and there are potentially many B__c records, so Salesforce wouldn't know which one to choose). Instead, you'll make this part of the WHERE clause in the inner query

[SELECT Id, 
    (SELECT Id, Field2__c, Field3__c FROM Bs__r WHERE Field3__c > 100)
FROM A__c 
WHERE Field3__c < 100]

An alternate method

What I've covered so far is for left outer joins in Salesforce, aka parent-child subqueries.

The rules are a bit different if you try to structure your query the other way around, using B__c as the base object.

The biggest changes are that instead of traversing down the relationship hierarchy (from parent to child), we'll be traversing up the relationship hierarchy (from child to parent). Salesforce allows you to traverse upwards up to 5 levels in the heirarchy (so from a child, to its parent, to its grandparent, to its great-grandparent, to its great-great-grandparent, to its great-great-great-grandparent).

To do so, we prefix the field name on one of the objects up the heirarchy with the full path from the child object to the target object.

[SELECT Id, A_Lookup__c, A_Lookup__r.Field2, A_Lookup__r.Field3 FROM B__c]

Yes, it's that verbose, and yes, you'd need to use the prefix for every field that you wanted to query on A__c

If you had objects A, B, C, and D, were querying object D, and wanted a field on object A, you'd use C_Lookup__r.B_Lookup__r.A_Lookup__r.Field2

You still can't compare a field to another field, but when traversing up a relationship hierarchy, you can include parent fields in the WHERE clause of the child

[SELECT Id, A_Lookup__c, A_Lookup__r.Field2__c, A_Lookup__r.Field3__c 
FROM B__c
WHERE Field3__c > 100 AND A_Lookup__r.Field3__c < 100]

some final notes

I was tossing suffixes like __c and __r around pretty freely, but never really went over them.

__c is the suffix that Salesforce automatically adds to the end of any custom sObject or custom field (on any sObject) that you (or anyone else) adds to Salesforce. I like to think the 'c' stands for 'custom'.

If you're traversing up or down a relationship hierarchy, the __c suffix becomes __r (for 'relationship', in my mind) until you finally get to the field that you're looking for (which still ends in __c if it's a custom field).

Not all fields, nor all objects, have these suffixes in them. In that case, the objects or fields are called 'Standard' (as in, they're automatically provided by Salesforce to everyone). Account is an example of a standard sObject, and Id is an example of a standard field.

querying standard fields just uses the field name

[SELECT Id, CreatedDate, OwnerId FROM Account]

OwnerId is an example of a standard relationship field (in most cases it points to a User).

Querying up a standard relationship, in most cases, requires that you simply drop the 'Id' from the standard relationship field

[SELECT Id, CreatedDate, Owner.Name FROM Account]

Querying down a standard relationship doesn't require __r, just the relationship name.

[SELECT Id, Name, (SELECT Id, Name FROM Accounts) FROM User]

I think it can be considered as Left Outer Join and SOQL will look like this

SELECT Field1, Field2, 
    (SELECT Field3 FROM table2 WHERE Field3 > 100) 
FROM table1
WHERE Field3 < 100

One more important point, (SELECT Field3 FROM table2 WHERE Field3 > 100) in the query part, table2 will be relationship which will be determined by table2__r

For more information refer A Deeper look at SOQL and Relationship Queries on Force.com


In SOQL, you can do this type of thing as a nested query. For the example above, I'll use Contact and Account as table1 and table 2

Select Id,Name, (Select ID,FName,LName from Contacts where MailingCity='Albany') From Account Where BillingState='NY'

This will return a list of Accounts in NY, and each Account record will have a sublist for Contacts in Albany that are linked to the Account.