Best data modelling approach to handle redundant foreign keys in a database about Surveys, Questions and Responses

As per my understanding of your specifications, your business environment involves a conceptual-level ternary relationship. In this regard, you need to define:

  1. the relationship (or association) type between the entity types Person and Survey;
  2. the relationship type between Survey and Question;
  3. the relationship type that establishes the connection between the two aforesaid relationship types and, as a consequence, between Person, Survey and Question, i.e., Response (a shorter name that simplifies interpretation, from my point of view).

So, I consider that you are on the right track with your Approach 1, although it requires some small (yet important) refinements in order to make it more accurate. I will detail such refinements and other relevant considerations in the following sections.

Business rules

Let us expand the applicable business rules a bit and reformulate them in the following way:

  • A Person registers in zero-one-or-many Surveys
  • A Survey gets the registration of zero-one-or-many Persons
  • A Survey is integrated by one-to-many Questions
  • A Question integrates zero-one-or-many Surveys
  • A Question receives zero-one-or-many Responses
  • A Response is provided by exactly-one Person in the context of exactly-one Survey

Expository IDEF1X diagram

Then, I have created the IDEF1Xa diagram that is presented in Figure 1, which synthesizes the business rules formulated above:

Fig.1 Simplified Survey IDEF1X


a Integration Definition for Information Modeling (IDEF1X) is a highly recommendable modeling technique that was established as a standard in December 1993 by the United States National Institute of Standards and Technology (NIST). It is solidly based on theoretical work authored by the sole founder of the relational model, i.e., Dr. E. F. Codd and also on the entity-relationship view developed by Dr. P. P. Chen.


The PersonSurvey relationship

As I see it, the PersonSurvey relationship is required to provide a means of authorization so that a Person can take part in a given Survey. In this way, once a certain Person has been registered in a specific Survey, he or she is authorized to provide Responses to the Questions that integrate the respective Survey.

The SurveyQuestion relationship

I assume that the property (or attribute) called suvery_question.question_number in your diagram is used to represent the Order of presentation of a given Question instance with respect to a particular Survey. As you can see, I have denoted such property as SurveyQuestion.PresentationOrder, and I think that you should prevent that (i) two or more Question.QuestionNumber values share (ii) the same PresentationOrder value in (iii) the same SurveyQuestion occurrence.

To portray that need, I have included a composite ALTERNATE KEY (AK) in the box representing this entity type, which is comprised of the combination of properties (SurveyNumber, QuestionNumber, PresentationOrder). As you are well aware, a composite AK can be declared in a logical DDL design with the aid of a multi-column UNIQUE constraint (as I exemplified in the SurveyQuestion table that is part of the expository DDL layout expounded a few sections below).

The Response entity type

Yes, with the Response entity type I am depicting a relationship between two other relationships; it may seem awkward at first glance but there is nothing wrong with this approach, as long as it (a) represents the features of the business context of interest accurately and (b) is represented properly in a logical-level layout.

Yes, you are totally correct, it would be an error to portray that part of the scenario at the logical level of abstraction by means of two Response.SurveyNumber (or, say, Response.SurveyId) values referenced from two different columns in the same Response row.

Derived logical SQL-DDL layout

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- As one would expect, you are free to make use of 
-- your preferred (or required) naming conventions.

CREATE TABLE Person (
    PersonId        INT      NOT NULL,
    FirstName       CHAR(30) NOT NULL,
    LastName        CHAR(30) NOT NULL,
    GenderCode      CHAR(3)  NOT NULL,
    BirthDate       DATE     NOT NULL,
    CreatedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Person_PK PRIMARY KEY (PersonId),
    CONSTRAINT Person_AK UNIQUE      (
        FirstName,
        LastName,
        GenderCode,
        BirthDate
    )
);

CREATE TABLE Survey (
    SurveyNumber    INT       NOT NULL,
    Description     CHAR(255) NOT NULL,
    CreatedDateTime DATETIME  NOT NULL,
    --
    CONSTRAINT Survey_PK PRIMARY KEY (SurveyNumber),
    CONSTRAINT Survey_AK UNIQUE      (Description)
);

CREATE TABLE PersonSurvey (
    PersonId           INT      NOT NULL,
    SurveyNumber       INT      NOT NULL,
    RegisteredDateTime DATETIME NOT NULL,
    --
    CONSTRAINT PersonSurvey_PK         PRIMARY KEY (PersonId, SurveyNumber),
    CONSTRAINT PersonSurveyToPerson_FK FOREIGN KEY (PersonId)
        REFERENCES Person (PersonId),
    CONSTRAINT PersonSurveyToSurvey_FK FOREIGN KEY (SurveyNumber)
        REFERENCES Survey (SurveyNumber)
);

CREATE TABLE Question (
    QuestionNumber  INT       NOT NULL,
    Wording         CHAR(255) NOT NULL,
    CreatedDateTime DATETIME  NOT NULL,
    --
    CONSTRAINT Question_PK PRIMARY KEY (QuestionNumber),
    CONSTRAINT Question_AK UNIQUE      (Wording)
);

CREATE TABLE SurveyQuestion (
    SurveyNumber       INT      NOT NULL,
    QuestionNumber     INT      NOT NULL,
    PresentationOrder  TINYINT  NOT NULL,
    IsMandatory        BIT      NOT NULL,
    IntegratedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT SurveyQuestion_PK PRIMARY KEY (SurveyNumber, QuestionNumber),
    CONSTRAINT SurveyQuestion_AK UNIQUE      (
        QuestionNumber,
        SurveyNumber,
        PresentationOrder
    ),
    CONSTRAINT SurveyQuestionToSurvey_FK   FOREIGN KEY (SurveyNumber)
        REFERENCES Survey   (SurveyNumber),
    CONSTRAINT SurveyQuestionToQuestion_FK FOREIGN KEY (QuestionNumber)
        REFERENCES Question (QuestionNumber)
);

CREATE TABLE Response (
    SurveyNumber     INT      NOT NULL,
    QuestionNumber   INT      NOT NULL,
    PersonId         INT      NOT NULL,
    Content          TEXT     NOT NULL,
    ProvidedDateTime DATETIME NOT NULL,
    --
    CONSTRAINT Response_PK                 PRIMARY KEY (SurveyNumber, QuestionNumber, PersonId),
    CONSTRAINT ResponseToPersonSurvey_FK   FOREIGN KEY (PersonId, SurveyNumber)
        REFERENCES PersonSurvey   (PersonId, SurveyNumber),
    CONSTRAINT ResponseToSurveyQuestion_FK FOREIGN KEY (SurveyNumber, QuestionNumber)
        REFERENCES SurveyQuestion (SurveyNumber, QuestionNumber)
);

Two composite FOREIGN KEYs in the Response table

This is, probably, the most important point to discuss: the references made from a given Response row to

  1. SurveyQuestion.SurveyNumber, and
  2. SurveyPerson.SurveyNumber

must have matching values. As far as I am concerned, the best option to enforce this condition in a declarative way is by making use of two composite FOREIGN KEYs (FKs).

As shown in the DDL design, the first FK is making a reference to the PersonSurvey table PRIMARY KEY (PK), i.e., (PersonId, SurveyNumber), and is conformed by the columns Response.PersonId and Response.SurveyNumber.

The second FK is pointing to the SurveyQuestion table PK, i.e., (SurveyNumber, QuestionNumber), and is, accordingly, made up of the columns Response.SurveyNumber and Response.QuestionNumber.

In this way, the Response.SurveyNumber column is quite instrumental since it is used as part of a FK reference in two different constraints.

With this method, one ensures database management system-guaranteed referential integrity from

  • (a) Response to the PersonSurvey;
  • (b) Response to the SurveyQuestion; and
  • (c) each of the tables representing an associative entity type to the tables standing for independent entity types, namely Person, Survey and Question.

Derived data to avoid update anomalies

I have noticed in your diagram two elements that I esteem are worth mentioning. These elements are related to two PersonSurvey columns that can (should) be derived.

In that regard, you can derive the PersonSurvey.IsStarted datum by querying if a given Person occurrence has provided one or more Responses to Questions that integrate an exact Survey via the SurveyQuestion table.

And you can also obtain the PersonSurvey.IsCompleted data point by determining if a given Person instance has supplied a Response to all the Questions that cointain a value of 'TRUE' in the IsMandatory column in a specific SurveyQuestion row.

By way of the derivation of these values, you are preventing some update anomalies that would have eventually arisen in case you had kept such values in the SurveyQuestion column.

Important consideration

As @Dave rightly points out in his comment, if you face a future requirement demanding the management of different kinds of responses that imply managing dates, numeric values, multiple choice, and another possible aspects, you will have to extend this database layout.