Where to start to understand an unknown database

Your best bet is to start by documenting your database using SQL Power Doc:

SQL Server & Windows Documentation Using Windows PowerShell

SQL Power Doc is a collection of Windows PowerShell scripts and modules that discover, document, and diagnose SQL Server instances and their underlying Windows OS & machine configurations. SQL Power Doc works with all versions of SQL Server from SQL Server 2000 through 2014, and all versions of Windows Server and consumer Windows Operating Systems from Windows 2000 and Windows XP through Windows Server 2012 R2 and Windows 8. SQL Power Doc is also capable of documenting Windows Azure SQL Databases.

Note: I have used it and it will give you a really good start in documenting and understanding your database server instance.


Three very quick steps to get you started:

1)

USE DatabaseName;

-- One row per index that has recorded usage (index_id 0 = heap, 1 = clustered index)
SELECT    [TableName] = OBJECT_NAME(object_id),
          index_id,
          last_user_update, last_user_seek, last_user_scan, last_user_lookup
FROM      sys.dm_db_index_usage_stats
WHERE     database_id = DB_ID('DatabaseName');

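This will tell you the last time each index was used, including the clustered index, so it at least gives you a flavor for which tables are being accessed (and which aren't). Keep in mind that these statistics are reset when the SQL Server service restarts, so they only cover the period since the last restart.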
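To also see the tables that have no recorded usage at all, the DMV can be joined against the catalog views. The following is a minimal sketch, assuming SQL Server 2005 or later and that it is run inside the database in question; the column aliases are my own choice:

```sql
-- List every index of every user table, with usage data where it exists.
-- Rows where all four last_user_* columns are NULL have no recorded
-- access since the usage statistics were last reset.
SELECT    s.name  AS SchemaName,
          t.name  AS TableName,
          i.name  AS IndexName,      -- NULL for a heap
          i.type_desc AS IndexType,
          us.last_user_seek,
          us.last_user_scan,
          us.last_user_lookup,
          us.last_user_update
FROM      sys.tables  AS t
JOIN      sys.schemas AS s ON s.schema_id = t.schema_id
JOIN      sys.indexes AS i ON i.object_id = t.object_id
LEFT JOIN sys.dm_db_index_usage_stats AS us
          ON  us.object_id   = i.object_id
          AND us.index_id    = i.index_id
          AND us.database_id = DB_ID()   -- current database only
ORDER BY  s.name, t.name, i.index_id;
```

A table whose rows are all NULL here is a candidate for "not used", not proof of it: a monthly job may simply not have run yet.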

2) Turn on an Extended Events session (or server-side Profiler trace if you're running pre-SQL 2012) for an hour or so while the app is being used. You can also ask a user to perform various actions in the application in a specific order so you can correlate it with the trace / session.

A helpful suggestion: if you can modify the connection string the app uses at all, append ";Application Name=AppNameGoesHere" so you can run a trace filtering on that particular Application Name. Good practice anyway.
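As a rough sketch of step 2, an Extended Events session filtered on the application name might look like the following. This assumes SQL Server 2012 or later; the session name AppTrace and the file name are placeholders I made up, and AppNameGoesHere must match whatever you put in the connection string:

```sql
-- Capture completed batches and RPC calls issued by one application only.
CREATE EVENT SESSION [AppTrace] ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.client_app_name, sqlserver.database_name, sqlserver.sql_text)
    WHERE  (sqlserver.client_app_name = N'AppNameGoesHere')
),
ADD EVENT sqlserver.rpc_completed (
    ACTION (sqlserver.client_app_name, sqlserver.database_name, sqlserver.sql_text)
    WHERE  (sqlserver.client_app_name = N'AppNameGoesHere')
)
-- Hypothetical file name; give a full path if you want control over the location.
ADD TARGET package0.event_file (SET filename = N'AppTrace.xel')
WITH (STARTUP_STATE = OFF);
GO

-- Start the session, let the users work for an hour or so, then stop it.
ALTER EVENT SESSION [AppTrace] ON SERVER STATE = START;
GO
-- ALTER EVENT SESSION [AppTrace] ON SERVER STATE = STOP;
-- DROP EVENT SESSION [AppTrace] ON SERVER;
```

The resulting .xel file can be opened in Management Studio and grouped by statement text to see which procedures and tables the application actually touches.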

3) Get a version of the application working on a non-production server. Develop a list of behavior-driven tests for the application ("When the user clicks the New Item button, it creates a new item for that user," etc.). Begin soft deleting objects you feel have no bearing on the tests by renaming them (I use a format like objectName_DEPRECATED_YYYYMMDD - with the date being the day I plan to actually delete it). Then reverify all of your tests.
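The soft delete in step 3 can be done with sp_rename. A minimal sketch, assuming a hypothetical table dbo.OldOrders and a planned deletion date 30 days from today (both are illustrations, not part of the original advice):

```sql
-- Build the new name: OldOrders_DEPRECATED_YYYYMMDD
-- (style 112 formats a date as yyyymmdd).
DECLARE @newname sysname =
    N'OldOrders_DEPRECATED_'
    + CONVERT(char(8), DATEADD(DAY, 30, GETDATE()), 112);

-- Rename the table; anything that still references dbo.OldOrders
-- will now fail, which is exactly what the tests should reveal.
EXEC sp_rename @objname = N'dbo.OldOrders', @newname = @newname;

-- To undo, rename it back (substitute the actual deprecated name):
-- EXEC sp_rename @objname = N'dbo.OldOrders_DEPRECATED_20250131',
--                @newname = N'OldOrders';
```

Only do this on the non-production copy, and keep a list of what you renamed so that a failing test can be traced back to a specific object.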

Through a combination of the Extended Events session, the index usage DMV, and your soft deleting, you should be able to identify the main objects being used by the application and get a good general sense of which object does what.

Good luck!


Since I was once in a similar situation, I can tell you that this will be a hard to impossible job. I only had the source code (>100k lines of code), the running service, the running database (~50 tables), no documentation, and no one to ask about it except a user of the application. I also had a copy of the database and services running in a test environment (which was a few version numbers ahead, but without source code). Another requirement was that the services had to run 24/7 because they were used externally by customers. The situation arose because most of the staff, including the developers, left at about the same time and the documentation vanished in the chaos. It took me more than 6 months to get a rough overview and documentation. There were many tables and functions which had no effect because they were meant for future use, never fully implemented, faulty, deprecated, or part of unreleased features. After the 6 months I had to rewrite the documentation because I had discovered new things or relationships between things, and some of my earlier assumptions had been wrong.

Why am I telling you this? First, because in such a situation it is sometimes easier and cheaper to start from scratch and write a new application that fulfills the requirements of the old one (or new requirements, if they changed over time or you want a new major release). Second, so that you know what to expect.

If you really want to reverse engineer it, I would recommend the following steps:

  • Make a backup of the whole system! (First: You will never know when you will need it. Second: You need it for the next step)
  • Recreate a copy of the system (services and database) to work with, and write down how you created it. You will surely have to do this several times in the coming months, because you will mess it up more than once while reverse engineering.
  • Create an ER diagram showing the dependencies between the tables.
  • View and document the dependencies of each table, stored procedure, ..., because these are mostly not included in ER diagrams.
  • Understand what the software should do by asking users and by using it yourself (preferably on the test system).
  • If the source code of the services is available: get an overview of it and of its calls to the DB, and document them (doxygen is a good tool for generating rough documentation with function call hierarchies).
  • Try to get a rough overview of the DB by looking at the table names and their columns.
  • Watch the database while using the application.
  • Using the previous 4 steps, divide the tables into 3 categories (these might differ for you depending on your application): static data (data which doesn't change while the server is running, like server configuration, enums that restrict valid values in other tables through foreign keys, ...), configuration data (data which rarely changes, like user settings, ...) and OLTP data (user messages in a chat server, posts in a forum, measurement values in a machine control system, battles in an online game, ...).
  • Repeat the previous 5 steps until you are satisfied or you give up.
  • Document and code as if the guy who ends up maintaining YOUR code/system/database will be a violent psychopath who knows where you live.
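
For the dependency steps in the list above, the catalog views already hold much of the raw material. The two queries below are a sketch, assuming SQL Server 2008 or later (sys.sql_expression_dependencies does not exist in earlier versions); the column aliases are my own:

```sql
-- 1) Table-to-table dependencies declared as foreign keys
--    (the input for an ER diagram).
SELECT    fk.name AS ForeignKeyName,
          OBJECT_SCHEMA_NAME(fk.parent_object_id) + N'.'
              + OBJECT_NAME(fk.parent_object_id)     AS ReferencingTable,
          OBJECT_SCHEMA_NAME(fk.referenced_object_id) + N'.'
              + OBJECT_NAME(fk.referenced_object_id) AS ReferencedTable
FROM      sys.foreign_keys AS fk
ORDER BY  ReferencingTable, ReferencedTable;

-- 2) Objects referenced by name inside views, procedures, functions
--    and triggers (the part an ER diagram usually does not show).
SELECT    OBJECT_SCHEMA_NAME(d.referencing_id) + N'.'
              + OBJECT_NAME(d.referencing_id) AS ReferencingObject,
          o.type_desc                         AS ReferencingType,
          d.referenced_schema_name,
          d.referenced_entity_name
FROM      sys.sql_expression_dependencies AS d
JOIN      sys.objects AS o ON o.object_id = d.referencing_id
ORDER BY  ReferencingObject, d.referenced_entity_name;
```

Neither query sees relationships that exist only in application code or in dynamic SQL, so treat the result as a starting point and compare it with what you observe while using the system.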

Wish you good luck ;)