Data Migration Salesforce to Salesforce with Talend - Best Practice for Self-References

For objects with self-reference (aka "Hierarchy") fields, the best method is to load in two steps:

  1. Load all records, leaving out the ParentId field (remove it from the Talend schema).
  2. Update all records to set the ParentId values (map only the Id or External Id, plus the ParentId).
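
To make the two passes concrete, here is a minimal sketch in Python of how the source rows could be split into an insert set and an update set. It only prepares the data and does not call Salesforce or Talend. The column names `ExtId` and `ParentExtId`, the function name, and the sample rows are all made up for illustration.

```python
def split_for_two_pass_load(records, id_field="ExtId", parent_field="ParentExtId"):
    """Return (insert_rows, update_rows) for a two-pass hierarchy load.

    Column names are hypothetical; adjust them to your own schema.
    """
    # Pass 1: every record, with the self-reference column removed.
    insert_rows = [
        {k: v for k, v in rec.items() if k != parent_field} for rec in records
    ]
    # Pass 2: only the key and the parent reference, for rows that have a parent.
    update_rows = [
        {id_field: rec[id_field], parent_field: rec[parent_field]}
        for rec in records
        if rec.get(parent_field)
    ]
    return insert_rows, update_rows


# Made-up 3-level hierarchy: A1 <- A2 <- A3
accounts = [
    {"ExtId": "A1", "Name": "Global HQ", "ParentExtId": None},
    {"ExtId": "A2", "Name": "EMEA", "ParentExtId": "A1"},
    {"ExtId": "A3", "Name": "Germany", "ParentExtId": "A2"},
]
insert_rows, update_rows = split_for_two_pass_load(accounts)
```

Because the first pass never references a parent, the order of the rows does not matter, and the hierarchy can be any depth.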

For two objects with cyclical references, the approach is similar:

  1. Load all records of Object-A, leaving out the Lookup-B field.
  2. Load all records of Object-B, including its Lookup-A field.
  3. Update all records of Object-A to set Lookup-B values.
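
The same idea for the cyclical case can be sketched as three prepared row sets, one per step. Again this is only an illustration in Python under assumed names (`ExtId`, `LookupA`, `LookupB`, and the function itself are hypothetical), not an actual Talend job.

```python
def plan_cyclic_load(a_rows, b_rows, a_id_field="ExtId", a_lookup_field="LookupB"):
    """Return the row sets for the three steps: insert A, insert B, update A."""
    # Step 1: Object-A without its lookup to Object-B.
    step1_insert_a = [
        {k: v for k, v in row.items() if k != a_lookup_field} for row in a_rows
    ]
    # Step 2: Object-B in full; its lookup to Object-A can resolve now.
    step2_insert_b = [dict(row) for row in b_rows]
    # Step 3: Object-A key plus the lookup that was held back in step 1.
    step3_update_a = [
        {a_id_field: row[a_id_field], a_lookup_field: row[a_lookup_field]}
        for row in a_rows
        if row.get(a_lookup_field)
    ]
    return step1_insert_a, step2_insert_b, step3_update_a


# Made-up sample: A1 and B1 point at each other.
a_rows = [{"ExtId": "A1", "Name": "Alpha", "LookupB": "B1"}]
b_rows = [{"ExtId": "B1", "Name": "Beta", "LookupA": "A1"}]
step1, step2, step3 = plan_cyclic_load(a_rows, b_rows)
```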

Note: Your suggestion also works, but only for 2-level hierarchies. You assume that parent records have no ParentId, but in a 3-level hierarchy the "middle" record is both a parent and a child, so filtering on "where ParentId = null" is not sufficient. The first method I recommended above works for any number of levels.

An alternative method takes only one loading step, but I don't like it because you can't use the Bulk API in parallel mode, the sorting gets hard for 3+ levels, and you are at the mercy of Salesforce honoring the order. But FYI, this method is:

  1. Sort the source data so parent records appear first, then the children.
  2. Load all data, including the ParentId.

Using the standard API (not Bulk/parallel), the data normally loads in the same order it appears in your source file, so the parents get created before the children that reference them.
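
If you do go this route, one way to get a parents-first order for any number of levels is to sort by depth in the hierarchy. The sketch below is a plain Python illustration with made-up column names (`ExtId`, `ParentExtId`); it assumes that a parent missing from the data set already exists in the target, and it raises an error if the data contains a loop.

```python
def sort_parents_first(records, id_field="ExtId", parent_field="ParentExtId"):
    """Return the records ordered so every parent precedes its children."""
    parent_of = {rec[id_field]: rec.get(parent_field) for rec in records}

    def depth(rec_id):
        # Count how many ancestors inside this data set sit above the record.
        levels = 0
        seen = {rec_id}
        current = parent_of.get(rec_id)
        while current and current in parent_of:
            if current in seen:
                raise ValueError(f"cycle detected at {current!r}")
            seen.add(current)
            levels += 1
            current = parent_of[current]
        return levels

    # sorted() is stable, so records at the same depth keep their source order.
    return sorted(records, key=lambda rec: depth(rec[id_field]))


# Made-up sample, deliberately out of order: A1 <- A2 <- A3
shuffled = [
    {"ExtId": "A3", "Name": "Germany", "ParentExtId": "A2"},
    {"ExtId": "A1", "Name": "Global HQ", "ParentExtId": None},
    {"ExtId": "A2", "Name": "EMEA", "ParentExtId": "A1"},
]
ordered = sort_parents_first(shuffled)
```

This only fixes the order of the file. It does nothing about the other drawbacks mentioned above.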

Your other questions belong in a separate question post. But regarding how to ensure that referred-to records are included in partial-data migrations: I always approach these scenarios by first building a list of the distinct Ids (or External Ids) that are needed, and then loading only the records whose Id is in that list (using inner joins).

For example, say you want to migrate a sample of Accounts, only those whose Name begins with 'A', while ensuring that all of their parent Accounts (regardless of Name) are also included. I would create a list of the Ids of all Accounts whose Name begins with 'A', then append to that list the ParentIds of those Accounts that are not already in the list. Finally, when loading the Accounts, I would inner-join Accounts to that list.

The overhead of building the list seems high for one object, but it pays off because the very same list can be inner-joined to drive which Contacts, Opportunities, Cases, etc. you load. Also note that this may be time-consuming to build in Talend, but it is very easy in a database with some SQL.
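
As a rough illustration of the Id-list approach, here is a small Python sketch. It stands in for the SQL or Talend joins and is not meant as the real implementation. The function names, the shortened Ids, and the sample rows are all invented for the example.

```python
def build_needed_ids(accounts, predicate, id_field="Id", parent_field="ParentId"):
    """Ids of the matching Accounts, plus their parents' Ids, without duplicates."""
    needed = [acc[id_field] for acc in accounts if predicate(acc)]
    seen = set(needed)
    for acc in accounts:
        parent_id = acc.get(parent_field)
        # Append the parent of a matching Account if it is not in the list yet.
        if predicate(acc) and parent_id and parent_id not in seen:
            needed.append(parent_id)
            seen.add(parent_id)
    return needed


def inner_join(rows, needed_ids, field):
    """Keep only the rows whose `field` value appears in the Id list."""
    wanted = set(needed_ids)
    return [row for row in rows if row.get(field) in wanted]


# Made-up sample data with shortened, illustrative Ids.
accounts = [
    {"Id": "001A", "Name": "Acme", "ParentId": "001Z"},
    {"Id": "001B", "Name": "Apex", "ParentId": None},
    {"Id": "001Z", "Name": "Zenith Holdings", "ParentId": None},
    {"Id": "001Q", "Name": "Quartz", "ParentId": None},
]
contacts = [
    {"Id": "003A", "AccountId": "001A"},
    {"Id": "003Q", "AccountId": "001Q"},
]

needed = build_needed_ids(accounts, lambda acc: acc["Name"].startswith("A"))
accounts_to_load = inner_join(accounts, needed, "Id")
contacts_to_load = inner_join(contacts, needed, "AccountId")
```

Note that this only pulls in direct parents, as described above. If those parents have parents of their own, the append step would need to be repeated until no new Ids turn up.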