How do long columns impact performance and disk usage?

The specific answer to your question (at least for Oracle and probably other databases) is that the length of the field doesn’t matter, only the length of the data. However, this shouldn’t be used as a determining factor concerning whether to set the field to its maximum allowable length or not. Here are some other issues you should consider before maxing out field sizes.

Formatting: Any client tool that formats its output based on the size of the fields will need special handling. Oracle’s SQL*Plus, for example, by default displays Varchar2 columns at their maximum declared width even if the data is only one character long. Compare…

-- same data, different declared widths
create table f1 (a varchar2(4000), b varchar2(4000));
create table f2 (a varchar2(5), b varchar2(5));
insert into f1 values ('a','b');
insert into f2 values ('a','b');
-- compare how the two result sets are displayed
select * from f1;
select * from f2;

Bad Data: Field length provides an additional mechanism to catch or prevent bad data. An interface shouldn’t attempt to insert 3000 characters into a 100-character field, but if that field is defined to be 4000 characters, it just might. The error wouldn’t be caught at the data entry stage, but the system may run into trouble further down the line when another application tries to process the data and chokes. As an example, if you later decide to index the field in Oracle, you could exceed the maximum key length (depending on block size and concatenation). See…

-- can fail with ORA-01450 (maximum key length exceeded), depending on block size
create index i1 on f1(a);

Memory: If the client application allocates memory based on the maximum field size, it will allocate significantly more memory than necessary. Avoiding this requires special handling in the application.
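To put rough numbers on this, here is a minimal sketch in Python. It assumes a hypothetical client that sizes its fetch buffer from the declared column widths; the function name, the 100-row batch size, and one byte per character are illustrative assumptions, not the behavior of any particular driver.

```python
def fetch_buffer_bytes(declared_widths, rows_per_fetch=100, bytes_per_char=1):
    """Bytes a hypothetical client reserves when it sizes its fetch
    buffer from declared column widths rather than from actual data."""
    return sum(declared_widths) * bytes_per_char * rows_per_fetch

# Column widths taken from the f1 and f2 tables defined earlier.
f1_bytes = fetch_buffer_bytes([4000, 4000])
f2_bytes = fetch_buffer_bytes([5, 5])

print(f1_bytes)  # 800000
print(f2_bytes)  # 1000
```

Under these assumptions the two tables hold identical data, yet the buffer for f1 is 800 times larger than the one for f2.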

Documentation: The size of the field provides another data point of documentation about the data. We could call all tables t1, t2, t3, etc. and all fields f1, f2, f3, etc., but by specifying meaningful names we better understand the data. For example, if an address table for a company with customers in the U.S. has a field called State that is two characters long, we expect the two-character state abbreviation to go in it. On the other hand, if the field is one hundred characters long, we might expect the full state name to go in it.


All that being said, it does seem prudent to be prepared for change. Just because all your product names today fit in 20 characters doesn’t mean they always will. Don’t go overboard and make it 1000, but do leave room for plausible expansion.


Here is a good starting point for you.

http://www.sqlskills.com/BLOGS/KIMBERLY/post/Disk-space-is-cheap.aspx

I may have misunderstood your original question. Let me see if I can find you a few other links for reference.

Here is a good reference on data type selection: http://sqlfool.com/2009/05/performance-considerations-of-data-types/

Changing from varchar(20) to varchar(30) may seem like a small change, but you need to understand more about how database structures work to be aware of the potential issues. For example, going to varchar(30) could push a row (should all 30 bytes get used) past the point where it fits in-row on a single page (at most 8,060 bytes). That can lead to an increase in disk space used, a decrease in performance, and some additional overhead in your transaction logs.
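A related effect shows up well below the 8,060-byte limit: wider rows mean fewer rows per page. Here is a rough sketch in Python, assuming the commonly documented SQL Server page layout (an 8,192-byte page, a 96-byte header, and a 2-byte slot-array entry per row). The 400- and 410-byte row sizes are made-up figures chosen only to show the effect, and the sketch ignores fill factor and row-overflow storage.

```python
PAGE_SIZE = 8192    # bytes in a SQL Server data page
PAGE_HEADER = 96    # bytes reserved for the page header
SLOT_ENTRY = 2      # bytes of row-offset (slot) array per row

def rows_per_page(row_size):
    """Whole rows of row_size bytes that fit on one data page."""
    usable = PAGE_SIZE - PAGE_HEADER
    return usable // (row_size + SLOT_ENTRY)

before = rows_per_page(400)   # hypothetical row size with varchar(20) full
after = rows_per_page(410)    # same row with ten more bytes in use
print(before, after)  # 20 19
```

With these assumed sizes, ten extra bytes per row costs one row per page, so the same table needs roughly 5% more pages.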

Here is a link for database structures: http://technet.microsoft.com/en-us/sqlserver/gg313756.aspx

Here is one for page splits and transaction logging: http://sqlskills.com/BLOGS/PAUL/post/How-expensive-are-page-splits-in-terms-of-transaction-log.aspx

HTH


I thought I'd share another interesting point, which I found in a Stack Overflow question.

Original answer by: Nick Kavadias

A reason NOT to use max or text fields is that you cannot perform online index rebuilds, i.e. REBUILD WITH (ONLINE = ON), even with SQL Server Enterprise Edition.

I would consider this to be a big disadvantage when adding n/varchar(max) columns arbitrarily, and according to the MS Site this restriction against doing online index rebuilds remains in SQL Server 2008, 2008 R2 and Denali; so it's not specific to SQL Server 2005.