Best practices on common person fields (Name, email, address, gender etc...)

I would tend to be very suspicious of any set of universal best practices because, for most of these fields, the devil is in the details. Just because the information is relatively common doesn't mean that your application uses the data in exactly the same way that other applications use it. That means your data model may need to be slightly different.

  • First & last name: Why are you capturing the name? If you have a requirement to capture a person's full legal name (e.g. you are preparing legal documents or birth certificates), you probably want to allow more space for people to type than you would if you're just asking for a person's name so you have something to call them in your new web app.
  • Address: What are you going to do with the address? What sort of addresses are you storing? If you are storing the address of a property in the United States that you're creating a mortgage on, you likely care very much about getting a fully standardized address in which case the data model will probably want to hew very closely to whatever your address standardization tool returns. If you just want people to be able to type in an address to deliver a product, a couple lines for freeform text is probably sufficient. The length of the lines there may depend on the requirements of the downstream processes that do things like print address labels.
  • State: Assuming you can identify the valid state values, it probably makes sense to create a STATE table and create a foreign key relationship between the STATE and ADDRESS tables. But the ability to identify the valid values implies that you're limiting the set of valid addresses at least to a particular set of countries. That's fine for many sites but then you've got to do a bit of work to support a new country.
  • City: If you are dealing with data where there are potentially city-level regulations in place (e.g. where different tax rates apply depending on the city), you may want to treat it much like the state and have a CITY table with the valid cities and a foreign key relationship between the CITY and ADDRESS tables. On the other hand, if you're just trying to get a product delivered and you don't much care if you have various versions of the same city in your table, letting the user enter free-form text is sufficient. Of course, if you are storing foreign keys, you'll have a fair amount of work to make sure that you have all the valid values. But there are products where the whole point is that the company has already done that work (e.g. sales tax databases).
  • Phone: What are you doing with phone numbers and why? Some applications will want to take in phone numbers in whatever format the user decides to enter them and preserve that formatting for all subsequent queries. This would be common if you are designing a personal address book where users have their own preferences for how phone numbers are stored and displayed. Other applications would want to ignore the formatting that is entered, extract only the numeric characters, and then format the data on retrieval so that all phone numbers have similar formatting. If you're catering to businesses, you may want a separate field for users to enter an extension. If you are trying to support an outbound calling process, you may want to store the area code and country code in separate columns because you'll want to make sure that you have time zone specific windows for calling people in different area codes (making a call to someone in the Eastern time zone at 10am is going to go over much better than making that same call to someone in the Pacific time zone where it is 7am).
  • Gender: For a great many applications, it's perfectly reasonable to store a gender code ('M' or 'F') in a table. On the other hand, there are cases when you may want additional options (Other, Intersex, Transgender) or where you need to store something like the gender at birth and the current gender.
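
To make the "extract only the numeric characters, then format on retrieval" option for phone numbers concrete, here is a minimal Python sketch. The function names and the US-style display format are my own assumptions for illustration; a real application would need rules for whichever countries it supports.

```python
import re

def normalize_phone(raw: str) -> str:
    """Keep only the ASCII digits the user typed, dropping spaces and punctuation."""
    return re.sub(r"[^0-9]", "", raw)

def format_us_phone(digits: str) -> str:
    """Display a 10-digit number as (AAA) EEE-NNNN; return anything else unchanged."""
    if len(digits) != 10:
        return digits
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

# Store the normalized digits; apply formatting only when displaying.
stored = normalize_phone("555.867.5309")
print(format_us_phone(stored))
```

If you instead need to preserve the user's own formatting (the address book case), you would store the raw string and skip the normalization step entirely.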

You may as well estimate field sizes from sample data and your expected audience. What is reasonable depends on where you and your users are located.

Some notes:

Addresses:

  • I don't have a state and it irritates me when I have to pick "Outside USA"
  • UK postal counties bear no resemblance to local government regions: see "Column type and size for international country subdivisions (states, provinces, territories etc)" on SO where I broke someone's assumptions

Names:

  • Don't forget Family name, Given Name, Known As, Tribal name, Patronymic, Generation Name etc
  • See "Falsehoods Programmers Believe About Names" and then consider crying
  • See "Last Name First" and just cry
  • SO "What is the longest human name you can expect?" for a worked example

Phone number: International code, length, mobile vs house, allow mobile as only number


In addition to the great answers above, don't forget to accept Unicode characters. Just because you are in the US doesn't mean that you don't want to accept foreign characters into your columns.

That said, I usually recommend 50 characters for names. 320 should be more than enough for an email address (you can check the relevant RFCs to be sure; RFC 5321 allows 64 characters for the local part and 255 for the domain). For address, err on the side of caution with 255 characters. While you'll probably never need an address that big, you might if you include C/O lines and the like. City should be pretty big; there are some pretty long city names out there. For state go with a child table, same with country. For zip code, don't forget about international postal codes, which can be longer than US zip codes. Even if you don't officially support international addresses, you may still end up storing them: plenty of US citizens live in other countries, including military personnel.
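
One thing to watch once you accept Unicode: a limit of "50 characters" is not the same as 50 bytes. Whether a column length counts characters or bytes depends on your database and column definition, so treat the following as a rough Python sketch only. The `MAX_LENGTHS` dictionary and `too_long` helper are hypothetical names, and the numbers are just the suggestions from this answer.

```python
# Hypothetical limits taken from the suggestions above; adjust to your requirements.
MAX_LENGTHS = {"name": 50, "email": 320, "address": 255}

def too_long(field: str, value: str) -> bool:
    """Return True if value exceeds the limit for field, counted in characters."""
    return len(value) > MAX_LENGTHS[field]

# "Zoë Müller", written with escapes so the accented letters are single code points.
name = "Zo\u00eb M\u00fcller"
print(len(name))                  # length in characters (code points)
print(len(name.encode("utf-8")))  # length in bytes when encoded as UTF-8
print(too_long("name", name))
```

The two lengths differ for this name because each accented letter takes two bytes in UTF-8, which matters if your column size is measured in bytes.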

Don't forget that state should be optional as many countries don't have states.
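
As a rough sketch of the "child table for state and country, with state optional" idea, here is one way it could look using Python's built-in sqlite3 module. The table names, column names, and sample rows are all made up for illustration, and your own database will have its own types and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE country (country_code TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE state (
        state_id     INTEGER PRIMARY KEY,
        country_code TEXT NOT NULL REFERENCES country(country_code),
        name         TEXT NOT NULL
    );
    CREATE TABLE address (
        address_id   INTEGER PRIMARY KEY,
        line1        TEXT NOT NULL,
        city         TEXT NOT NULL,
        state_id     INTEGER REFERENCES state(state_id),  -- nullable: not every country has states
        postal_code  TEXT,                                 -- text, since many postal codes contain letters
        country_code TEXT NOT NULL REFERENCES country(country_code)
    );
""")

conn.execute("INSERT INTO country VALUES ('US', 'United States')")
conn.execute("INSERT INTO country VALUES ('SG', 'Singapore')")
conn.execute("INSERT INTO state (state_id, country_code, name) VALUES (1, 'US', 'Oregon')")

insert_address = (
    "INSERT INTO address (line1, city, state_id, postal_code, country_code) "
    "VALUES (?, ?, ?, ?, ?)"
)
conn.execute(insert_address, ("1 Example Street", "Portland", 1, "97201", "US"))
conn.execute(insert_address, ("1 Example Road", "Singapore", None, "018956", "SG"))  # no state

rejected = False
try:
    # state_id 99 does not exist, so the foreign key should reject this row.
    conn.execute(insert_address, ("2 Example Street", "Nowhere", 99, "00000", "US"))
except sqlite3.IntegrityError:
    rejected = True
print("invalid state rejected:", rejected)
```

The foreign key keeps the state values clean where a state applies, while the nullable column means nobody has to pick a fake "Outside USA" entry.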