Sep 5 2008

DataBase Performance - Critical Hardware Configuration Tips

Posted by Mike Brunt at 3:25 PM
1 comments
- Categories: Default | ColdFusion | JRun-J2EE

In my opinion there are four critical and separate areas that govern the performance characteristics of a Database server, also known as RDBMs. In this article I will focus on hardware considerations and hard-drive selection, in particular. However it is good to mention all four areas and I see these as follows:

  1. Code: This covers the efficiency of in-line SQL. For instance, avoid Select * at all times, I see it as never justifiable and the difference in performance between explicitly naming every column in a table and using Select * is dramatic; Select * can levy a 10-12 times performance penalty causing full table scans every time the query runs. In addition carefully examine SQL containing joins and sub queries. Also, take great care when looping over code that will cause queries, make sure you know the maximum number of loops and the performance impact of that. Always use CFQUERYPARAM and where possible use Stored Procedures. This last two points have both performance and security benefits. There are many more coding related items and I am only listing the major ones here. 
  2. Data modeling - Database Design: One of the biggest problems I come across is applications performing badly because the initial data model did not consider application usage growth and/or reporting needs. This latter item was covered in a blog posting I did for Alagad which can be found here. The discovery phase of application design is critical particularly in relation to the data modeling. Try to make sure there is plenty of time to get this right, think of the database as a heart, if it falters or fails your application will suffer the same fate. 
  3. Ongoing Maintenance: As applications change so should the components needed to ensure ongoing efficiency. In a database (RDBMs) effective table indexing can bring about immense gains in performance. Start right, by establishing efficient indexes and keep right by constantly reviewing them, this not a one time job. Also look at things such as views and temporary tables, in case that is a more efficient way to process data. 
  4. Database Hardware – Hard Drives: One area which is often overlooked when selecting hardware to house our database(s) is the hard-drive configuration. When creating specification requirements we tend to think of CPU clock speed and RAM as the main points of consideration for a server. In my experience, however; how the hard-drives are configured is just as critical if not more so.

Before getting into the details, here is why good Database Administrators will home in on millisecond improvements. Let’s consider an operation that takes 10 extra milliseconds to complete and which needs to perform 100 times an hour, for a year. That seemingly insignificant 10 milliseconds turns into 2.4 hours. Let’s also assume that operation runs 1,000 times an hour and I can tell you from my experience that queries running 1,000 times per hour is very common, I have seen many clients where this could be tens of thousand times per hour. Even at a lowly 1,000 times per hour that 10 milliseconds becomes 24 hours of wasted time. Hopefully this example demonstrates why good DBA’s think in milliseconds and why selecting the correct hard drive configuration is very important.

In understanding the subject of optimal hard drive selection we should consider the different parts which make up an enterprise level RDBMs. These parts or operations fall into three areas of functionality; reading, writing and reading and writing. So the transaction logs (critical if ever up to the minute restoration is necessary) and the tempdb mainly run write operations. So for these operations where, writing to the hard drives in ongoing and frenetic we need the least number of physical hard drives that will give us adequate redundancy. Getting basic, if we only have one hard-drive and it fails we have a big problem. So we can turn to something called RAID (Redundant Array of Inexpensive Disks). There is a good article here which describes the different kinds of RAID arrays. In the case of heavy write operations such as those the transaction logs and tempdb need , we should ideally select RAID Level 10. In RAID Level 10 we have a minimum of two drives which are striped and mirrored and contain the same data. As we only need to write to two drives at once, operations that are heavily using writes work efficiently on RAID Level 10 and If one drive fails we still have service.

For the data itself we should use RAID Level 5 or 6. In a typical RDBMs the amount of data will continue to grow and the advent of Sarbanes Oxley means that deleting data is no longer a safe option. In addition, the data itself will be read from and written to frequently. So we need something that gives us plenty of storage capacity. RAID 5 or 6 arrays require a minimum of 3 hard-drives and one drive is always dedicated to maintaining parity for the array.  So if we have 5 x 200GB hard drives which total 1,000GB we can only actually use 800GM of that space. Again, data is mirrored across all drives so if one fails we still have service. This means that with each write operation we are actually writing to 5 drives in an array with 5 drives, obviously the more drives the more concurrent writing.  This multiple disk writing is why RAID Level 10 is a better option for heavy writing processes

There is one last part of the database (RDBMs) we have not yet covered, the engine itself, the binaries. I would recommend that these be on a separate RAID
Level 1 array to give us the basic redundancy that we need.

Finally, we should consider the item which allows data to flow to and from the hard drives within the server-hardware itself. These items are typically called “controllers”. At minimum, each drive array should have its own dedicated controller. So in the example above we actually have three RAID arrays.

  • RAID 10 for the transaction logs and tempdb
  • RAID 5 or 6 for the actual data itself
  • RAID 1 for the database engine

In this case we need a minimum of three hard-drive controllers. Ideally though a dedicated controller per hard drive is even better and that is what I would recommend. It is important to consider that a slow or failing database will cost far more than any number of controllers either in direct loss of revenue (eCommerce for instance) or a loss of clients and your good name which can be even costlier.

In my next article on this subject I will delve into some ways of quantifying hard drive requirements for an RDBMs.

 

Comments

Mike Brunt

Mike Brunt wrote on 09/07/08 6:45 AM

@Michael thanks for your comment and the points you make. In my experience even if you explicitly name each column in a query and run that under load doing the same thing by using Select * is inordinately slower.

Your point about more data than needed being sent is of course correct and efficient indexing is another very important factor in all of this.

Write your comment



(it will not be displayed)



Leave this field empty: