Possible benefits of storing multiple values in one field of one row instead of as separate rows

To start with, the current Question title referring to "storing data as string instead of columns" is a little confusing. When speaking of storing data as strings instead of something else, that usually refers to serializing everything to a string format instead of a proper / strong datatype (e.g. INT or DATETIME). But if asking about storing data as multiple values in a single field as opposed to separate rows, that is a bit different. And to be fair, while concatenating values is most easily done with strings, it can also be done with INT and BINARY types as well, either by bit-masking or similarly reserving certain positions to have different meanings. Since the second interpretation is what is actually being asked about, based on the text of the Question, let's address that.

In a word: No. If you are storing actual data points then it will only bring pain (in terms of code and performance) as it is unnecessary complication. If it is a value that will only ever be stored as a single unit, updated as a single unit, and never disassembled within the database, then that could be ok as it is roughly analogous to storing an image or PDF. Otherwise, any attempt to parse the data will invalidate using any indexes (e.g. using LIKE '%something%', or CHARINDEX, or PATINDEX, or SUBSTRING, etc).

If you need to store separate values in a single field of a single row then there are more appropriate means of doing that: XML or JSON. These are parseable formats (XML / JSON) and XML can even be indexed. But ideally this data would be stored in properly-typed fields so that it can be truly useful.

And please do not forgot that the purpose of an RDBMS is to store data such that it can be retrieved and manipulated as efficiently as possible, within the constraints imposed by being ACID-compliant. Retrieving concatenated values is bad enough due to the need to parse the values first, and that is not indexable. But manipulating often means replacing the entire blob just to update a part of it (assuming that no pattern exists to use with a REPLACE function). The XML datatype at least allows for XML DML for simplistic updates, though those are still not as fast as a simple update of properly modeled data.

Also, given a scenario such as what is shown in the Question above, by concatenating all of the StateCodes together, you would be unable to Foreign Key (in either direction) those values.

And what if the business requirements change over time and you need to track additional properties of these items? In terms of "states", what about the capitals, or population, or a sort-order, or anything else? Stored properly as rows you can add more columns for additional properties. Sure, you can have multiple levels of parsable data, such as |StateCode,Capital,Population |StateCode,Capital,Populate|... but hopefully anyone can see the problem growing exponentially out of control. Of course, this particular issue is rather easily dealt with the XML and JSON formats, and that is their value as mentioned above. But you would still need a very good reason for using either of those as an initial means of modeling as neither will ever be as efficient as using discrete fields in separate rows.


I have actually used something like that for a very limited purpose. We created a table of headers for output files. They were specifically constructed and were mostly just the column headings but not quite. So the data looked something like

OutputType   OutputHeader
PersonalData Name|Address|City|State|Zip
JobInfo      Name|JobName|JobTitle

Essentially it looked like it was a delimited list. And in one way it was. But for our purposes it was a single long string.

That's the trick here. If you never plan on parsing the list then it's worth saving the list. If however you will or even might need to parse the list then it's worth the extra space & time to split it out and save it in separate rows.