
The Case of the Automated DBA

If you are thinking the title of this blog post sounds like an Erle Stanley Gardner book, that is where the inspiration comes from. Gardner wrote the books about Perry Mason and his cases that were the basis for the Raymond Burr television series. Sometimes as DBAs we wish we had a Perry Mason type figure to defend us, but we rarely ever get that.

In this story we'll pretend that we have a defense attorney, because the prosecution is making a case to put us in jail for life. Maybe it isn't quite that extreme, but for some people it feels that way. The prosecutor claims that the defendants (DBAs) can't keep the systems running and performing. Additionally, time and time again when they make changes, the changes are not done correctly and cause further impact. For all these charges the prosecution seeks the strictest punishment in the land: life in jail (AKA automate the DBA and fire the person).

I know many people out there have been in this position before; I have. At first I challenged the thought of automation. "How can they automate this?", "They are going to automate me out of a job!", and "No computer can do what I do!" were some of the thoughts and statements I made. The turning point for me came when I started seeing all the things that I couldn't get done: things like patching SQL Server, planning and executing migrations to newer versions of SQL Server, and sleeping at night, because I kept getting paged for something that was a five-second fix, muttering at 3am, "A monkey could do this, why am I getting paged?"

So, a show of hands: how many people have been in that place before? It is time to be realistic about what automation can do to help us and to dispel the fear of it replacing us.

So why should we automate?

  1. Consistency
  2. Remove Human Error
  3. Reduce alerts
  4. Reduce repetitive tasks
  5. Improve reliability
  6. Sleep easier

What should be automated?

  1. SQL Installs
  2. Patching
  3. Anything that triggers an Alert and is fixed with a simple script
  4. Deployments (The great white whale of the DBA)

I plan to dive into these topics in more depth over time. What I want to illustrate is that while the prosecution is going overboard asking to get rid of the DBA in favor of automation, DBAs are going overboard in refusing to consider it. We need to admit that there are things better suited to automation, and agree to a bargain with the prosecution that meets in the middle. In the end the goal should be letting the DBA focus on bigger issues and reducing how many times they get called at night.

My T-SQL Coding Guidelines

So a few years ago I compiled a list of Coding Guidelines, Tips and Tricks for T-SQL performance.  I am posting the content of that document here for general information.


Version 1.0

10/14/2013


This document is intended to provide a condensed summary of Best Practices and methods to improve code efficiency.  These will be periodically reviewed and updated as needed.  If you have any questions about any of these, please reach out to a member of the Database Administration team.


  1. Working with data inside a database is not like procedural programming. Avoid working with data one row at a time; design queries that will work with sets of data.
  2. When referring to objects in the database always use the schema name even if it is dbo.
  3. Ordering data takes longer. ORDER BY clauses should be reserved for the final display, or even better, let the application that is handling the returned data do the sorting.  If you need to sort in order to use TOP, consider first dropping the entire result set into a temp table and then taking the TOP off the temp table.
    1. Inside the database engine an ORDER BY adds a sort operation to the plan; the engine has to gather and order all the rows before it can return them, which costs extra memory and CPU (and can spill to tempdb) instead of simply streaming the data back as it is read.
  4. Keep queries as simple as possible. The more complex the logic in a single query, the harder it is for the optimizer to come up with a good plan.  It may not be a problem when there is a smaller amount of data in the database or table, but as that data grows it will make finding an efficient plan harder.
    1. Limit the number of joins and subqueries in a single query to 5 or fewer, and always make sure that inner joins are done first.  If you need to break down a large query, you can use a temp table to store the result of several inner joins first and then use that against the outer joins.
      1. If there are several joins and other criteria in the WHERE clause, any criteria that apply to inner joins should be included in the join. This will help the database engine find a better query plan faster.
      2. Remember that for inner joins the ON clause is a filtering clause; for outer joins the ON clause is a matching clause and all filtering happens in the WHERE clause.
    2. Joins are evaluated left to right, so the result of the first join is the input for the next join.
  5. If you are using a subquery to find certain values, check whether there is a list table that contains a smaller set of data but still has all the values the list can be pulled from, rather than doing a DISTINCT on a table with a large amount of data.
    1. Subqueries are also good candidates to be broken out into temp tables ahead of time; this is especially true if you are using the same subquery in multiple places.
  6. Avoid negative logic; this includes <>, !=, NOT IN, and NOT EXISTS. Negative logic causes the database engine to scan the entire column being used in the criteria to make sure the value is not in the data.  This makes it harder for the query plan to use indexes.
    1. One option for turning negative logic into positive logic is to load the possible values into a temp table and then delete the unwanted values from the temp table (a sketch of this pattern appears after this list).
    2. If it is not possible to eliminate the negative logic, try to apply it against as small a dataset or table as possible.
  7. If you are doing an IN statement with literal values and they will account for more than 1/3 of the total table, consider breaking it into a UNION where each statement of the union handles only one or two of the literals. This helps the database engine because each query of the union gets a separate plan and does not have to go through a large amount of data.
  8. Avoid using SELECT *; it returns more columns than you need, and if the underlying structure of the database changes it can break code.
    1. Also remember that the order of the columns in a table does not make a difference to the database engine. When columns are added they are added at the end of the table, which is why it is a best practice to always specify the precise columns in the order they are to be returned.
    2. On INSERT statements, always specify the columns being inserted into: INSERT INTO table (column1, column2) VALUES (1, 2).
  9. If parameters are being passed in that will be used as criteria on a table, make sure they are the same data type as the column they will be compared against.
    1. If a new table is created with a foreign key to another table, those columns need to have the same data types so that when they are joined the join is more efficient.
      1. If a column needs to be added, it should use the smallest data type possible, because this helps speed up queries.
  10. If table functions are used to return data to a query for joins, they should only return a small number of rows. The database engine cannot estimate how many rows the function might return, so if a large number of rows comes back performance will suffer.
    1. The same thing applies to table variables as well.
  11. NULLs are special cases. Whenever criteria are evaluated, the database engine evaluates each comparison to True, False, or Unknown (NULL), and Unknown is treated as false no matter what the operator is.  There is a setting (ANSI_NULLS) that changes the behavior of evaluating NULL = NULL, but changing it in the connection string goes against the ANSI standard and is a very bad practice. If there are NULLs in the columns used by a query's criteria, those must be handled as well unless the query only wants data with known values.
    1. One way to reduce the number of NULL values is to have applications, load processes, or in some cases the columns themselves supply default values when no other value is provided.
  12. If criteria use a LIKE comparison, it will perform better if the first part of the pattern is static, e.g. WHERE column1 LIKE 'aaaa%'. If there is an index on the column, the database engine can use the index to limit the amount of data it has to work through to complete the comparison.
    1. Avoid using functions on columns in criteria, because this can prevent the database engine from using indexes: LEFT(column1, 4) = 'aaaa' would not perform as well as WHERE column1 LIKE 'aaaa%'.
  13. If data is being loaded and some rows may be inserted while others are updated, it is more efficient to write a MERGE statement so that the database engine only has to read through the data once (a sketch of this pattern appears after this list).
    1. If the load also needs to delete rows, T-SQL provides a way to do that in the MERGE statement as well (consult a Sr. DBA if you need more info on this).
    2. If the data being loaded is coming from a file, loading it into a permanent staging table first is the recommended method. This allows for recoverability if issues arise during the load and allows useful indexes to already be established on the table.
  14. Avoid using deprecated features; this makes it more difficult to upgrade in the future. A full list of deprecated features for the current version of SQL Server and the previous 2 to 3 versions can be found at http://technet.microsoft.com/en-us/library/ms143729.aspx
  15. Avoid complex logic in triggers; a trigger should be a very quick bit of code.
  16. Encapsulate more complex tasks in stored procedures rather than views; this allows the use of temp tables to make the complex logic simpler.
    1. Some applications may not allow stored procedures to be used. Provided the database is SQL Server 2005 or later, writing queries that use Common Table Expressions (CTEs) gives similar benefits for breaking the complex logic into smaller chunks.
  17. Look at existing indexes; the development and testing environments should have the same structure as production unless a change is currently being developed. Looking at the existing indexes can help structure queries more efficiently.  For an index with multiple columns to be used, the query's criteria need to cover the leading columns of the index.  The first column should be the most selective column, meaning it has the fewest repeated values.  Columns that are not used as criteria but are returned in the SELECT should be added as included columns.
  18. Use of linked servers should be kept to a minimum, but if one is needed it is best to use a simple query to pull the data back from the linked server and store it in a temp table or a stage table rather than doing complex joins across the linked server.
  19. Always SET NOCOUNT ON as the first command in stored procedures.
  20. When writing stored procedures that issue DDL statements, always try to group the DDL statements first in the stored procedure and then execute the DML statements.
  21. Use BEGIN/END blocks for stored procedures, IF logic, and WHILE loops so that it is easier to identify code blocks.
  22. When calling stored procedures it is always best to use the parameter names in the call:

Exec dbo.myproc @var1 = 'test'
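
To make a couple of these guidelines concrete, here are two small sketches. The table and column names (dbo.Customer, dbo.DoNotContact, dbo.Product, dbo.Stage_Product) are hypothetical and only for illustration; adapt them to your own schema.

First, turning negative logic into positive logic with a temp table (guideline 6): instead of filtering with NOT IN, load the candidate keys, delete the unwanted ones, and join positively against what is left.

-- Hypothetical tables, for illustration only
CREATE TABLE #ContactList (CustomerID int PRIMARY KEY);

INSERT INTO #ContactList (CustomerID)
SELECT CustomerID
FROM dbo.Customer;

-- Delete the values we do not want instead of using NOT IN in the final query
DELETE cl
FROM #ContactList AS cl
INNER JOIN dbo.DoNotContact AS dnc
    ON dnc.CustomerID = cl.CustomerID;

SELECT c.CustomerID, c.CustomerName
FROM dbo.Customer AS c
INNER JOIN #ContactList AS cl
    ON cl.CustomerID = c.CustomerID;

DROP TABLE #ContactList;

Second, a MERGE upsert from a staging table (guideline 13), so the engine reads the data once and decides row by row whether to insert or update.

-- Hypothetical staging and target tables, for illustration only
MERGE dbo.Product AS tgt
USING dbo.Stage_Product AS src
    ON src.ProductID = tgt.ProductID
WHEN MATCHED THEN
    UPDATE SET tgt.ProductName = src.ProductName,
               tgt.ListPrice   = src.ListPrice
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, ProductName, ListPrice)
    VALUES (src.ProductID, src.ProductName, src.ListPrice);
-- A WHEN NOT MATCHED BY SOURCE THEN DELETE clause can handle deletes as well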


Doctor DBA in the Multiple Domains of Insanity

I don't know about you, but very few companies I have worked for in the last 10 years have had only 1 domain for me to support. At the very least you might get 2 or 3 to separate out prod, UAT, and dev. The companies I have worked for, though, have grown through acquisition, and rarely do the old domains ever go away. Sometimes you are lucky: they all trust each other and you only need 1 account to log onto each of those domains. Usually you have 1 or 2 that don't have any trust, and that means you have to RDP into the servers to do anything because you can't use Management Studio from your local machine.

That is not strictly true; barring firewalls preventing you from getting to the servers, I am going to show you how to trick SSMS into thinking it is someone else. If there is trust between the domains and your account in the other domain is allowed on your local machine, then it is easy: you Shift+right-click the SSMS icon and choose Run as different user.

Let's be honest though, when is it ever that easy? So the first thing we need is the command to use:

C:\Windows\System32\runas.exe /netonly /user:bz\bzugg "C:\Program Files (x86)\Microsoft SQL Server Management Studio 18\Common7\IDE\Ssms.exe"

You can see we are doing a runas command, but we are specifying the /netonly switch, which according to the help for runas means the credentials we supply are used for remote access only; the process runs locally as you, but anything that goes across the network uses the account you specify.

Next we give it the domain and the user account, followed by the path and exe to run. Notice we are not storing the password, which is a glorious sacrifice to the Info Sec gods and brings them much happiness. So where do we place this mystical command? I copy the SSMS shortcut to my desktop so that I don't mess up any of the standard shortcuts; the normal shortcut's target just points straight at Ssms.exe.

Once we have it copied to the desktop we can right click on it and go to properties.

Once we are in Properties, we are going to add this part to the beginning of what is already in the Target field. Be sure to leave a space after the user.

C:\Windows\System32\runas.exe /netonly /user:bz\bzugg

Once you have added the runas command, click OK. You will now see that the icon for this SSMS shortcut has changed.

And now when you run your edited shortcut it prompts you for your password.

Type the password for the account you specified in the runas command and hit Enter. I will warn you that it may take longer to load and connect to servers, but the trade-off may be worth it. For instance, if you can't copy and paste scripts via RDP, or you have to pass a script through several file-share servers to get it to all your target servers, this could be really helpful.

One other word of warning: once you have connected, it will look like you are connected with the account you logged into your local machine with.

But if you run:

select distinct login_name from sys.dm_exec_sessions

You will see that you are actually connected with the account you specified in the runas command.
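
If you want a little more detail than login_name, a few built-in functions report the security context of just your own session. This is only a small verification sketch using standard SQL Server built-ins; nothing about it is specific to the runas trick.

-- Report the security context of the current session.
-- When SSMS was launched with runas /netonly, current_login and original_login
-- should show the other-domain account you specified, even though the
-- connection dialog displayed your local account.
SELECT @@SPID           AS session_id,
       SUSER_SNAME()    AS current_login,
       ORIGINAL_LOGIN() AS original_login;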

I hope this helps calm some of the insanity that comes with the multiple domains we all deal with.