Best design to reference multiple tables from single column?

I personally don't like to use a multi-table schema for this purpose.

  • It's hard to ensure integrity.
  • It's hard to maintain.
  • It's difficult to filter results.

I've set up a dbfiddle sample.

My proposed table schema:

CREATE TABLE #Brands
(
BrandId int NOT NULL PRIMARY KEY,
BrandName nvarchar(100) NOT NULL 
);

CREATE TABLE #Clothes
(
ClothesId int NOT NULL PRIMARY KEY,
ClothesName nvarchar(100) NOT NULL 
);

-- Lookup table for known attributes
--
CREATE TABLE #Attributes
(
AttrId int NOT NULL PRIMARY KEY,
AttrName nvarchar(100) NOT NULL 
);

-- holds common properties: url, price, etc.
--
CREATE TABLE #BrandsClothes
(
BrandId int NOT NULL REFERENCES #Brands(BrandId),
ClothesId int NOT NULL REFERENCES #Clothes(ClothesId),
VievingUrl nvarchar(300) NOT NULL,
Price money NOT NULL,
PRIMARY KEY CLUSTERED (BrandId, ClothesId),
INDEX IX_BrandsClothes NONCLUSTERED (ClothesId, BrandId)
);

-- holds specific and unlimited attributes 
--
CREATE TABLE #BCAttributes
(
BrandId int NOT NULL REFERENCES #Brands(BrandId),
ClothesId int NOT NULL REFERENCES #Clothes(ClothesId),
AttrId int NOT NULL REFERENCES #Attributes(AttrId),
AttrValue nvarchar(300) NOT NULL,
PRIMARY KEY CLUSTERED (BrandId, ClothesId, AttrId),
INDEX IX_BCAttributes NONCLUSTERED (ClothesId, BrandId, AttrId)
);

Let me insert some data:

INSERT INTO #Brands VALUES 
(1, 'Brand1'), (2, 'Brand2');

INSERT INTO #Clothes VALUES 
(1, 'Pants'), (2, 'T-Shirt');

INSERT INTO #Attributes VALUES
(1, 'Color'), (2, 'Size'), (3, 'Shape'), (4, 'Provider'), (0, 'Custom');

INSERT INTO #BrandsClothes VALUES
(1, 1, 'http://mysite.com?B=1&C=1', 123.99),
(1, 2, 'http://mysite.com?B=1&C=2', 110.99),
(2, 1, 'http://mysite.com?B=2&C=1', 75.99),
(2, 2, 'http://mysite.com?B=2&C=2', 85.99);

INSERT INTO #BCAttributes VALUES
(1, 1, 1, 'Blue, Red, White'),
(1, 1, 2, '32, 33, 34'),
(1, 2, 1, 'Pearl, Black widow'),
(1, 2, 2, 'M, L, XL'),
(2, 1, 4, 'Levis, G-Star, Armani'),
(2, 1, 3, 'Slim fit, Regular fit, Custom fit'),
(2, 2, 4, 'G-Star, Armani'),
(2, 2, 3, 'Slim fit, Regular fit'),
(2, 2, 0, '15% Discount');

If you need to fetch common attributes:

SELECT     b.BrandName, c.ClothesName, bc.VievingUrl, bc.Price
FROM       #BrandsClothes bc
INNER JOIN #Brands b
ON         b.BrandId = bc.BrandId
INNER JOIN #Clothes c
ON         c.ClothesId = bc.ClothesId
ORDER BY   bc.BrandId, bc.ClothesId;

BrandName   ClothesName   VievingUrl                  Price
---------   -----------   -------------------------   ------
Brand1      Pants         http://mysite.com?B=1&C=1   123.99
Brand1      T-Shirt       http://mysite.com?B=1&C=2   110.99
Brand2      Pants         http://mysite.com?B=2&C=1    75.99
Brand2      T-Shirt       http://mysite.com?B=2&C=2    85.99

Or you can easily get Clothes by Brand:

Give me all clothes of Brand2

SELECT     c.ClothesName, b.BrandName, a.AttrName, bca.AttrValue
FROM       #BCAttributes bca
INNER JOIN #BrandsClothes bc
ON         bc.BrandId = bca.BrandId
AND        bc.ClothesId = bca.ClothesId
INNER JOIN #Brands b
ON         b.BrandId = bc.BrandId
INNER JOIN #Clothes c
ON         c.ClothesId = bc.ClothesId
INNER JOIN #Attributes a
ON         a.AttrId = bca.AttrId
WHERE      bca.BrandId = 2
ORDER BY   bca.ClothesId, bca.BrandId, bca.AttrId;

ClothesName   BrandName   AttrName   AttrValue
-----------   ---------   --------   ---------------------------------
Pants         Brand2      Shape      Slim fit, Regular fit, Custom fit
Pants         Brand2      Provider   Levis, G-Star, Armani
T-Shirt       Brand2      Custom     15% Discount
T-Shirt       Brand2      Shape      Slim fit, Regular fit
T-Shirt       Brand2      Provider   G-Star, Armani

But for me, one of the best features of this schema is that you can filter by attributes:

Give me all clothes that have the attribute: Size

SELECT     c.ClothesName, b.BrandName, a.AttrName, bca.AttrValue
FROM       #BCAttributes bca
INNER JOIN #BrandsClothes bc
ON         bc.BrandId = bca.BrandId
AND        bc.ClothesId = bca.ClothesId
INNER JOIN #Brands b
ON         b.BrandId = bc.BrandId
INNER JOIN #Clothes c
ON         c.ClothesId = bc.ClothesId
INNER JOIN #Attributes a
ON         a.AttrId = bca.AttrId
WHERE      bca.AttrId = 2
ORDER BY   bca.ClothesId, bca.BrandId, bca.AttrId;

ClothesName   BrandName   AttrName   AttrValue
-----------   ---------   --------   ----------
Pants         Brand1      Size       32, 33, 34
T-Shirt       Brand1      Size       M, L, XL

With a multi-table schema, any of the previous queries would have to deal with an unbounded number of tables, or with XML or JSON fields.

Another option with this schema is that you can define templates. For example, you could add a new table, BrandAttrTemplates. Every time you add a new record, you can use a trigger or a stored procedure to generate a set of predefined attributes for that brand.
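Here is a minimal sketch of that template idea. It uses Python's built-in sqlite3 module instead of SQL Server, so it runs anywhere. The table name BrandAttrTemplates comes from the suggestion above. Its columns (BrandId, AttrId, DefaultValue) and the trigger name trg_apply_templates are assumptions. The other tables are trimmed-down versions of the schema above, with no URL or price columns.

```python
import sqlite3

# Sketch in SQLite, not the SQL Server schema above.
# BrandAttrTemplates' columns and trg_apply_templates are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Brands     (BrandId INTEGER PRIMARY KEY, BrandName TEXT NOT NULL);
CREATE TABLE Clothes    (ClothesId INTEGER PRIMARY KEY, ClothesName TEXT NOT NULL);
CREATE TABLE Attributes (AttrId INTEGER PRIMARY KEY, AttrName TEXT NOT NULL);
CREATE TABLE BrandsClothes (
    BrandId   INTEGER NOT NULL REFERENCES Brands(BrandId),
    ClothesId INTEGER NOT NULL REFERENCES Clothes(ClothesId),
    PRIMARY KEY (BrandId, ClothesId)
);
CREATE TABLE BCAttributes (
    BrandId   INTEGER NOT NULL,
    ClothesId INTEGER NOT NULL,
    AttrId    INTEGER NOT NULL REFERENCES Attributes(AttrId),
    AttrValue TEXT    NOT NULL,
    PRIMARY KEY (BrandId, ClothesId, AttrId),
    FOREIGN KEY (BrandId, ClothesId) REFERENCES BrandsClothes(BrandId, ClothesId)
);
-- Hypothetical template table: the attributes every product of a brand starts with.
CREATE TABLE BrandAttrTemplates (
    BrandId      INTEGER NOT NULL REFERENCES Brands(BrandId),
    AttrId       INTEGER NOT NULL REFERENCES Attributes(AttrId),
    DefaultValue TEXT    NOT NULL,
    PRIMARY KEY (BrandId, AttrId)
);
-- Copy the brand's template rows whenever a new product is added.
CREATE TRIGGER trg_apply_templates AFTER INSERT ON BrandsClothes
BEGIN
    INSERT INTO BCAttributes (BrandId, ClothesId, AttrId, AttrValue)
    SELECT NEW.BrandId, NEW.ClothesId, t.AttrId, t.DefaultValue
    FROM BrandAttrTemplates AS t
    WHERE t.BrandId = NEW.BrandId;
END;
""")
conn.executescript("""
INSERT INTO Brands VALUES (1, 'Brand1'), (2, 'Brand2');
INSERT INTO Clothes VALUES (1, 'Pants'), (2, 'T-Shirt');
INSERT INTO Attributes VALUES (1, 'Color'), (2, 'Size'), (3, 'Shape'), (4, 'Provider');
INSERT INTO BrandAttrTemplates VALUES (2, 3, 'Regular fit'), (2, 4, 'G-Star');
INSERT INTO BrandsClothes VALUES (2, 1);  -- fires the trigger
""")
rows = conn.execute(
    "SELECT BrandId, ClothesId, AttrId, AttrValue FROM BCAttributes ORDER BY AttrId"
).fetchall()
print(rows)  # [(2, 1, 3, 'Regular fit'), (2, 1, 4, 'G-Star')]
```

A stored procedure that inserts the product and its template rows in a single transaction would work just as well. The trigger's advantage is that no insert path can skip the templates.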

I'm sorry. I'd like to extend my explanations, but I think the code is clearer than my English.

Update

My original answer should work on any RDBMS. Based on your comments, if you need to filter on attribute values, I'd suggest a few small changes.

Since SQL Server doesn't support arrays, I've set up a new sample. It keeps the same table schema but changes AttrValue to an array column type.

In fact, with PostgreSQL you can take advantage of this array by using a GIN index.

(I'll note that @EvanCarrol knows Postgres well, certainly better than I do. But let me add my bit.)

CREATE TABLE BCAttributes
(
BrandId int NOT NULL REFERENCES Brands(BrandId),
ClothesId int NOT NULL REFERENCES Clothes(ClothesId),
AttrId int NOT NULL REFERENCES Attrib(AttrId),
AttrValue text[],
PRIMARY KEY (BrandId, ClothesId, AttrId)
);

CREATE INDEX ix_attributes on BCAttributes(ClothesId, BrandId, AttrId);
CREATE INDEX ix_gin_attributes on BCAttributes using GIN (AttrValue);


INSERT INTO BCAttributes VALUES
(1, 1, 1, '{Blue, Red, White}'),
(1, 1, 2, '{32, 33, 34}'),
(1, 2, 1, '{Pearl, Black widow}'),
(1, 2, 2, '{M, L, XL}'),
(2, 1, 4, '{Levis, G-Star, Armani}'),
(2, 1, 3, '{Slim fit, Regular fit, Custom fit}'),
(2, 2, 4, '{G-Star, Armani}'),
(2, 2, 3, '{Slim fit, Regular fit}'),
(2, 2, 0, '{15% Discount}');

Now, you can additionally query using individual attributes values like:

Give me a list of all pants Size:33

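The key condition is bca.AttrId = 2 AND ARRAY['33'] && bca.AttrValue. Here && is PostgreSQL's array overlap operator, which is true when the two arrays share at least one element, and the GIN index can support it: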

SELECT     c.ClothesName, b.BrandName, a.AttrName, array_to_string(bca.AttrValue, ', ')
FROM       BCAttributes bca
INNER JOIN BrandsClothes bc
ON         bc.BrandId = bca.BrandId
AND        bc.ClothesId = bca.ClothesId
INNER JOIN Brands b
ON         b.BrandId = bc.BrandId
INNER JOIN Clothes c
ON         c.ClothesId = bc.ClothesId
INNER JOIN Attrib a
ON         a.AttrId = bca.AttrId
WHERE      bca.AttrId = 2
AND        ARRAY['33'] && bca.AttrValue
ORDER BY   bca.ClothesId, bca.BrandId, bca.AttrId;

This is the result:

clothes name   brand name   attribute   values
------------   ----------   ---------   ----------
Pants          Brand1       Size        32, 33, 34

What you are describing is, at least in part, a product catalog. You have several attributes which are common to all products. These belong in a well normalized table.

Beyond that, you have a series of attributes which are brand specific (and I expect could be product specific). What does your system need to do with these specific attributes? Do you have business logic that depends on the schema of these attributes or are you just listing them in a series of "label":"value" pairs?

Other answers suggest what is essentially a CSV approach, whether that is JSON, ARRAY, or something else. These approaches forgo regular relational schema handling by moving the schema out of the metadata and into the data itself.

There is a portable design pattern for this which fits relational databases very well. It is EAV (entity-attribute-value). I'm sure you've read in many, many places that "EAV is Evil" (and it is). However, there is one particular application where the problems with EAV are not important, and that is product attribute catalogs.

None of the usual arguments against EAV apply to a product feature catalog. Product feature values are generally only regurgitated into a list, or in the worst case into a comparison table.
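To illustrate the "comparison table" case, here is a minimal sketch in Python's sqlite3. The product_feature table and its rows are hypothetical. It pivots EAV rows into one column per requested feature using conditional aggregation. Feature names are passed as bound parameters, and only the generated column aliases are quoted into the SQL text.

```python
import sqlite3

# Hypothetical EAV table: one row per (product, feature).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_feature (
    product_id INTEGER NOT NULL,
    feature    TEXT    NOT NULL,
    value      TEXT    NOT NULL,
    PRIMARY KEY (product_id, feature)
);
INSERT INTO product_feature VALUES
    (1, 'Color', 'Blue'),  (1, 'Size', '32'),
    (2, 'Color', 'Black'), (2, 'Provider', 'G-Star');
""")

def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def comparison_table(conn, features):
    """One row per product, one column per requested feature (None if absent)."""
    if not features:
        raise ValueError("need at least one feature")
    cols = ", ".join(
        f"MAX(CASE WHEN feature = ? THEN value END) AS {quote_ident(f)}"
        for f in features
    )
    sql = (
        f"SELECT product_id, {cols} FROM product_feature "
        "GROUP BY product_id ORDER BY product_id"
    )
    return conn.execute(sql, list(features)).fetchall()

print(comparison_table(conn, ["Color", "Size", "Provider"]))
# [(1, 'Blue', '32', None), (2, 'Black', None, 'G-Star')]
```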

Using a JSON column type takes your ability to enforce any data constraints out of the database and forces it into your application logic. Also, using one attributes table for every brand has the following disadvantages:

  • It doesn't scale well if you eventually have hundreds of brands (or more).
  • If you change the allowable attributes on a brand you have to change a table definition instead of just adding or removing rows in a brand field control table.
  • You may still end up with sparsely populated tables if the brand has many potential features, only a small subset of which are known.
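Here is a rough sketch of the brand field control table mentioned above, in Python's sqlite3. Every table and column name is hypothetical. A composite foreign key restricts each brand to its own allowed features, and allowing a new feature is a plain row insert rather than a table change:

```python
import sqlite3

# brand_feature is a hypothetical "brand field control table".
conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when enabled
CREATE TABLE brand   (brand_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE brand_feature (
    brand_id   INTEGER NOT NULL REFERENCES brand(brand_id),
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    PRIMARY KEY (brand_id, feature_id)
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    brand_id   INTEGER NOT NULL REFERENCES brand(brand_id),
    UNIQUE (product_id, brand_id)        -- target for the composite FK below
);
CREATE TABLE product_feature_value (
    product_id INTEGER NOT NULL,
    brand_id   INTEGER NOT NULL,
    feature_id INTEGER NOT NULL,
    value      TEXT    NOT NULL,
    PRIMARY KEY (product_id, feature_id),
    FOREIGN KEY (product_id, brand_id) REFERENCES product(product_id, brand_id),
    FOREIGN KEY (brand_id, feature_id) REFERENCES brand_feature(brand_id, feature_id)
);
INSERT INTO brand   VALUES (1, 'Brand1');
INSERT INTO feature VALUES (1, 'Color'), (2, 'Size'), (3, 'Shape');
INSERT INTO brand_feature VALUES (1, 1), (1, 2);  -- Brand1: Color and Size only
INSERT INTO product VALUES (10, 1);
""")

def set_feature(product_id, brand_id, feature_id, value):
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO product_feature_value VALUES (?, ?, ?, ?)",
            (product_id, brand_id, feature_id, value),
        )

set_feature(10, 1, 1, "Blue")          # Color is allowed for Brand1

try:
    set_feature(10, 1, 3, "Slim fit")  # Shape is not (yet) allowed
    shape_rejected = False
except sqlite3.IntegrityError:
    shape_rejected = True

# Allowing a new feature is a row insert, not an ALTER TABLE.
with conn:
    conn.execute("INSERT INTO brand_feature VALUES (1, 3)")
set_feature(10, 1, 3, "Slim fit")

print(shape_rejected)  # True
```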

It is not especially difficult to retrieve data about a product with brand-specific features. It is arguably easier to build dynamic SQL using the EAV model than using the table-per-category model. In table-per-category, you need reflection (or your JSON) to find out what the feature column names are, and only then can you build a list of items for a WHERE clause. In the EAV model, WHERE X AND Y AND Z becomes INNER JOIN X INNER JOIN Y INNER JOIN Z. The query is a little more complicated, but the logic to build it is still entirely table-driven, and it will be more than scalable enough if you have the proper indexes built.
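Here is a minimal sketch of that table-driven query building in Python's sqlite3. The product and product_attr tables and the build_eav_query function are hypothetical names. Each (attribute, value) filter adds one INNER JOIN against the same value table. The values are bound as parameters, and an index on (attr, value, product_id) serves each join:

```python
import sqlite3

def build_eav_query(filters):
    """filters: list of (attribute, value) pairs; returns (sql, params)."""
    sql = ["SELECT p.product_id, p.name FROM product AS p"]
    params = []
    for i, (attr, value) in enumerate(filters):
        alias = f"v{i}"  # generated alias, never taken from user input
        sql.append(
            f"INNER JOIN product_attr AS {alias} "
            f"ON {alias}.product_id = p.product_id "
            f"AND {alias}.attr = ? AND {alias}.value = ?"
        )
        params += [attr, value]
    sql.append("ORDER BY p.product_id")
    return "\n".join(sql), params

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_attr (
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    attr       TEXT    NOT NULL,
    value      TEXT    NOT NULL,
    PRIMARY KEY (product_id, attr, value)
);
-- Supports the per-filter lookups done by each join.
CREATE INDEX ix_product_attr_lookup ON product_attr (attr, value, product_id);
INSERT INTO product VALUES (1, 'Pants'), (2, 'T-Shirt'), (3, 'Jeans');
INSERT INTO product_attr VALUES
    (1, 'Color', 'Blue'),  (1, 'Size', '33'),
    (2, 'Color', 'Blue'),  (2, 'Size', 'M'),
    (3, 'Color', 'Black'), (3, 'Size', '33');
""")

sql, params = build_eav_query([("Color", "Blue"), ("Size", "33")])
print(conn.execute(sql, params).fetchall())  # [(1, 'Pants')]
```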

There are a lot of reasons not to use EAV as a general approach. Those reasons don't apply to a product feature catalog so there is nothing wrong with EAV in this specific application.

To be sure, this is a short answer for a complex and controversial topic. I have answered similar questions before and gone into more detail about the general aversion to EAV. For example:

  • If EAV is evil, what to use for dynamic values?
  • Entity-Attribute-Value Table Design

I would say EAV is used less often lately than it used to be, for mostly good reasons. However, I think it is also not well understood.


Here's my problem: different brands of clothing require differing information. What is the best practice for dealing with a problem like this?

Using JSON and PostgreSQL

I think you're making this harder than it needs to be, and you'll get bitten by it later. You don't need the entity–attribute–value model unless you actually need EAV.

CREATE TABLE brands (
  brand_id     serial PRIMARY KEY,
  brand_name   text,
  attributes   jsonb
);
CREATE TABLE clothes (
  clothes_id   serial        PRIMARY KEY,
  brand_id     int           NOT NULL REFERENCES brands,
  clothes_name text          NOT NULL,
  color        text,
  price        numeric(5,2)  NOT NULL
);

There is absolutely nothing wrong with this schema.

INSERT INTO brands (brand_name, attributes)
VALUES
  ( 'Gucci', $${"luxury": true, "products": ["purses", "tawdry bougie thing"]}$$ ),
  ( 'Hugo Boss', $${"origin": "Germany", "known_for": "Designing uniforms"}$$ ),
  ( 'Louis Vuitton', $${"origin": "France", "known_for": "Designer Purses"}$$ ),
  ( 'Coco Chanel', $${"known_for": "Spying", "smells_like": "Banana", "luxury": true}$$ )
;

INSERT INTO clothes (brand_id, clothes_name, color, price) VALUES
  ( 1, 'Purse', 'orange', 100 ),
  ( 2, 'Underwear', 'Gray', 10 ),
  ( 2, 'Boxers', 'Gray', 10 ),
  ( 3, 'Purse with Roman Numbers', 'Brown', 10 ),
  ( 4, 'Spray', 'Clear', 100 )
;

Now you can query it using a simple join

SELECT *
FROM brands
JOIN clothes
  USING (brand_id);

And any of the JSON operators work in a where clause.

SELECT *
FROM brands
JOIN clothes
  USING (brand_id)
WHERE attributes->>'known_for' ILIKE '%Design%';

As a side note, don't put the URLs in the database. They change over time. Simply create a function that generates them from the id:

generate_url_brand( brand_id );
generate_url_clothes( clothes_id );

or whatever. If you're using PostgreSQL you can even use hashids.
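For illustration only, here is an application-side version of those helpers in Python. The function names come from the answer. The base URL and the /brands/ and /clothes/ path layout are made-up assumptions. The point is that changing the URL scheme means changing one function rather than rewriting stored rows.

```python
# Hypothetical base URL and path layout; only the function names come from the answer.
BASE_URL = "https://example.com"

def generate_url_brand(brand_id: int) -> str:
    """Derive a brand page URL from its key instead of storing it."""
    return f"{BASE_URL}/brands/{int(brand_id)}"

def generate_url_clothes(clothes_id: int) -> str:
    """Derive a clothes page URL from its key instead of storing it."""
    return f"{BASE_URL}/clothes/{int(clothes_id)}"

print(generate_url_clothes(5))  # https://example.com/clothes/5
```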

Also of special note: jsonb is stored in a binary format (hence the 'b'), and it is also indexable, or SARGable, or whatever else the cool kids are calling it these days: CREATE INDEX ON brands USING gin ( attributes );

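The difference here is in the simplicity of the queries. The last two assume that clothes also has its own attributes jsonb column, which the clothes table above does not define.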

Give me all clothes of Brand2

SELECT * FROM clothes WHERE brand_id = 2;

Give me all clothes that have the attribute: Size

SELECT * FROM clothes WHERE attributes ? 'size';

How about a different one:

Give me all clothes and attributes for any clothes available in large.

SELECT * FROM clothes WHERE attributes->>'size' = 'large';
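With an attributes JSON column on clothes, which is the column those last queries rely on, the same two filters can be sketched in SQLite's JSON functions through Python's sqlite3. This is not PostgreSQL jsonb. Here json_type(...) IS NOT NULL stands in for the ? key-exists operator, json_extract stands in for ->>, and the rows are sample data:

```python
import json
import sqlite3

# Sketch in SQLite (JSON functions), not PostgreSQL jsonb.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE clothes (
    clothes_id   INTEGER PRIMARY KEY,
    brand_id     INTEGER NOT NULL,
    clothes_name TEXT    NOT NULL,
    attributes   TEXT    -- JSON document (assumed column)
)""")
sample = [
    (1, 2, "Underwear", {"size": "large"}),
    (2, 2, "Boxers", {"size": "small"}),
    (3, 1, "Purse", {"strap": "long"}),
]
with conn:
    conn.executemany(
        "INSERT INTO clothes VALUES (?, ?, ?, ?)",
        [(i, b, n, json.dumps(a)) for i, b, n, a in sample],
    )

# PostgreSQL: attributes ? 'size'  (json_type returns NULL when the path is absent)
has_size = conn.execute(
    "SELECT clothes_name FROM clothes "
    "WHERE json_type(attributes, '$.size') IS NOT NULL ORDER BY clothes_id"
).fetchall()
# PostgreSQL: attributes->>'size' = 'large'
large = conn.execute(
    "SELECT clothes_name FROM clothes WHERE json_extract(attributes, '$.size') = 'large'"
).fetchall()
print(has_size)  # [('Underwear',), ('Boxers',)]
print(large)     # [('Underwear',)]
```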