Code Sins: Coding Lonely, without a Map

One of the symptoms - or maybe sub-sins - of coding lonely is coding without requirements.  Let us start, as we so often do, with an analogy.

Road Trip!

We're planning a big road trip, you and I. We're going somewhere on vacation.  Here's what we've decided so far: we're taking my van, the SQLbus. We're going somewhere warm, we're taking your friend Steve, and we're splitting all the costs three ways.  Now, given this plan, how successful do you think we're going to be at achieving a decent vacation?  Nobody's even talked about specifically where we're headed - do we even agree on what "warm" means?? Nobody's talked about supplies, where or how we're sleeping and eating - campgrounds? motel? B&B? - the direction we're headed, how long we're staying, how much we're willing to spend, etc etc etc. 

With a vacation like that, you're going to head out in a random direction; drive around for four times longer than you want to in THE most boring, backwater, ugly places possible; spend TEN times what you wanted to, and get next to no value out of it; and argue with your roadmates the whole way through.

This, my friend, is how software is developed and sold in shops all over the world every day.  Make no mistake: there are shops that do it up right, with requirements, tech specs, and solid code. May their days be fruitful and their nights free of support calls.
  But we're not preaching to those shops. We're preaching to the poor lost souls who wander, nearly directionless, through the deep mists of scope creep.

Okay, okay, enough waxing poetic.

Without clear directions from the business owners, you can't create clear tech specs, without which you can't create a system that works. In a shop that works without real requirements, everyone is used to coding lonely.  A revolutionary idea like gathering and documenting business rules will most likely meet with a lot of resistance.  Even if it doesn't, getting real requirements is something of an art form. You can't just send the BA an email to say "We need your requirements".  You have to sit down in a room with the business folk and developers, talk it out, write it down, poke holes in the use cases.  As you move through the development lifecycle, you'll have to revisit and update those requirements.

This stuff isn't fun, which is why nobody does it.  But while you can just hit the road with a friend and a credit card, you can't pin the success of your business on the planning methodology of a college road trip.

Happy days,
Jen McCown
http://www.MidnightDBA.com

P.S. Yes, I know I should've photoshopped the image to read "Code Trip", but gimme a break...it's not my strong suit, and it's past midnight!


Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags:
Categories: Dev | SQLServerPedia Syndication

1 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Fun With TSQL - The Almost Question

Content rating: Beginner, tips

Right click to download the setup and example code here.

I had a neat little request come across my desk yesterday (we'll abstract the details a bit for separation of company and blog).  In essence, it was "find me all the salespeople this month that have met ALMOST all the requirements to get the Superdude bonus". They want to see who's meeting most, but not quite all, of their sales goals so that management can encourage them and send reminders. 

Let's say we already have a convenient rollup table with all the salespeople's monthly summary info in it, so we can focus.

Table 

Here's the rollup table:

create table salesRollup (
salesGuyID int,
SalesGuyName varchar(100),
salesPeriodMonth tinyint,
salesPeriodYear smallint,
bridgesSold int,
bridgeSales money,
totalSales money
)

Tip: By the way, I can't stress enough the importance of knowing and using your datatypes properly. Tinyint is only 1 byte, and holds integers from 0 to 255...why would I need int (4 bytes), or even smallint (2 bytes), to hold a month number? Save space where you can. (Now, I am using money instead of smallmoney, but hey, these guys are selling bridges and ladders to the moon....that's likely to rack up numbers higher than the $214,748 limit on smallmoney).

Requirements 

Let's say that the requirements for the superdude bonus stretch over three consecutive months, and are:

  • Total sales of over $100,000 in month 1
  • Total sales of over $150,000 in month 2
  • Total sales of over $200,000 in month 3
  • Sold at least 2 bridges in month 2
  • Sold at least 4 bridges in month 3
  • Bridge sales should be more than $90,000 in month 3

Hey, we don't make the rules; we just get them from the business owners, and confirm them very, very well.   

First step: Query to Meet All Sales Requirements 

I know we're looking for those guys who ALMOST qualify, but "almost" is harder than "does", so let's start working on "does qualify", and go from there. Now, all this data is coming from one table, but we're going to self-join that table to itself to get the different month's information, like so:

select month1.*,
   month2.*,
   month3.*
FROM salesRollup month1 -- Month1 = January 2010
INNER JOIN salesRollup month2 ON month2.salesGuyID = month1.salesGuyID
   AND month2.salesPeriodYear = month1.salesPeriodYear
   AND month2.salesPeriodMonth = month1.salesPeriodMonth + 1 -- Month2 = February 2010.
INNER JOIN salesRollup month3 ON  month3.salesGuyID = month1.salesGuyID
    AND month3.salesPeriodYear = month1.salesPeriodYear
    AND month3.salesPeriodMonth = month1.salesPeriodMonth + 2 -- Month3 = March 2010.
WHERE month1.salesPeriodMonth = 1
   AND month1.salesPeriodYear = 2010

Tip: I don't actually advocate the use of SELECT *, but while we're developing - on our DEVELOPMENT box, not on our PRODUCTION box! - it's just shorthand. 

See what we did there? "month1", "month2", and "month3" are aliases for our joined tables; this lets us join a table to itself, and makes it comprehensible. Now we can ACTUALLY think of month1, month2, and month3 as different tables, because the join for each one returns a different set of data. Next up, let's turn our requirements into TSQL:

  • Total sales of over $100,000 in month 1 => month1.totalSales > 100000
  • Total sales of over $150,000 in month 2 => month2.totalSales > 150000
  • Total sales of over $200,000 in month 3 => month3.totalSales > 200000
  • Sold at least 2 bridges in month 2 => month2.BridgesSold >=2
  • Sold at least 4 bridges in month 3 => month3.BridgesSold >=3
  • Bridge sales should be more than $90,000 in month 3 => month3.bridgeSales > 90000
We'll just tack that on to the WHERE clause from the query before (changes only affect the WHERE clause, so I'm cutting out the body of the query for now):
... 
WHERE month1.salesPeriodMonth = 1
  AND month1.salesPeriodYear = 2010
  AND month1.totalSales > 100000
  AND month2.totalSales > 150000
  AND month3.totalSales > 200000
  AND month2.BridgesSold >=2
  AND month3.BridgesSold >=3
  AND month3.bridgeSales > 90000

Next: Change to "Almost" Met Sales Requirements 

This query gets us the salespeople who DO qualify for the bonus. But what we need are the people who ALMOST qualify. Now, "almost" is going to be a business rule...do they want to see a report of people with enough total sales, but not enough bridge? People who qualify in month1 and 2, but not quite yet in 3? Find out what the business owners specifically mean by "almost", and then tailor your query. In our case, we've been asked for people who have month1 and 2 requirements, have 2 bridges sold in month 3, and are within 50% of their total and bridge sales in month 3. This changes the query thusly (ignoring the body of the query again, only WHERE clause changes):

... 
WHERE month1.salesPeriodMonth = 1
  AND month1.salesPeriodYear = 2010
  AND month1.totalSales > 100000
  AND month2.totalSales > 150000
  AND month3.totalSales >= CAST((.10 * 200000) AS int)
  AND month2.BridgesSold >=2
  AND month3.BridgesSold >=2
  AND month3.bridgeSales >= CAST((.10 * 90000) AS int)

Different "Almost"

If the business had decided another tack for "almost" - like, anyone who has met month 1 and month 2 requirements, and at least one month 3 requirement - we'd have a different WHERE:

... 
WHERE month1.salesPeriodMonth = 1
  AND month1.salesPeriodYear = 2010
  AND month1.totalSales > 100000
  AND month2.totalSales > 150000
  AND month2.BridgesSold >=2
  AND
     (month3.totalSales > 200000
      OR month3.BridgesSold >=3
      OR month3.bridgeSales > 90000)

No "Did Meet" Guys

You get the idea.  If we like, we can also eliminate those guys who DO meet all the criteria - after all, this is a report on potential bonus awardees:

... 
WHERE month1.salesPeriodMonth = 1
  AND month1.salesPeriodYear = 2010
  AND month1.totalSales > 100000
  AND month2.totalSales > 150000
  AND month2.BridgesSold >=2
  AND
     (month3.totalSales > 200000
      OR month3.BridgesSold >=3
      OR month3.bridgeSales > 90000)
  AND NOT
     (month3.totalSales > 200000
      AND month3.BridgesSold >=3
      AND month3.bridgeSales > 90000)

That last AND NOT says, if a guy also meets all thre month3 criteria, I don't want to see him.

Make it Readable 

There's one more thing I'd like to do to this query, and that is to make it easily readable/usable. So let's change our SELECT statement (we'll ignore the FROM, JOIN, and WHERE clauses for now; they stay the same).  We'll use this latest set of requirements...that is, all month1 and month2 criteria are met. So let's show them WHICH month3 criteria are being met:

select
  -- Total sales
  case WHEN month3.totalSales > 200000 THEN 'YES'
  ELSE 'no'
  END AS TotalSalesMet_ThisMonth,
  -- Bridges Sold
  case WHEN month3.BridgesSold >=3 THEN 'YES'
  ELSE 'no'
  END AS BridgesSoldMet_ThisMonth,
  -- Bridge sales
  case WHEN month3.bridgeSales > 90000 THEN 'YES'
  ELSE 'no'
  END AS BridgeSalesMet_ThisMonth,
  month3.salesGuyID,
  month3.SalesGuyName,
  month3.salesPeriodMonth,
  month3.salesPeriodYear,
  month3.bridgesSold CurrentBridgesSold,
  month3.bridgeSales CurrentBridgeSales,
  month3.totalSales CurrentTotalSales
... 

This will present the user with a lovely readout, something like this:

TotalSalesMet_ThisMonth BridgesSoldMet_ThisMonth BridgeSalesMet_ThisMonth salesGuyID SalesGuyName salesPeriodMonth salesPeriodYear CurrentBridgesSold CurrentBridgeSales CurrentTotalSales
YES no no 1 Dwight 3 2010 1 10000 300000
no YES YES 16 Pam 3 2010 3 260000 387000

 

Beautiful. Happy days, all!

-Jen McCown

http://www.MidnightDBA.com


Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags: , ,
Categories: Dev | SQLServerPedia Syndication | T-SQL

5 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Code Sin: Coding Lonely

“All things appear and disappear because of the concurrence of causes and conditions. Nothing ever exists entirely alone; everything is in relation to everything else.” -Hindu Prince Gautama Siddharta, the founder of Buddhism, 563-483 B.C.

Don't code lonely. Just don't do it...it's one of the major sins that will cause the gods of IT to unforgivingly smite you.  You don't want to be smitten smited smote smoten, do you?  Good then.

What do I mean by "code lonely"? I mean a lot of things, actually.  You code lonely when you code in a vacuum - without good interaction with the rest of your team. When we do this, we run a very high risk of missing an important element to the module, or slamming up against some other module in a way that breaks the application. (Go ahaed, ask me how I know...)  Talk to your team, attend meetings, send emails.  Don't code lonely.

I also mean coding maverick style: Sure, you may be one awesome Top Gun, but no one is immune from random mistakes and oversights. (Again, ask me how I know.)I'm remembering Michael Bolton in Office Space, "I probably put a decimal wrong or something.  S***, I'm always DOING that!"  Implement code reviews and stick with them.


This is not the scene I'm talking about. Still funny though.

If you're reading this blog, you likely read other blogs. I'd say you're probably on Twitter, maybe Facebook or LinkedIn...so I'm preaching to the choir on this point. But just in case the MidnightDBA Blog is your only link to the world of SQL Server professionals, understand my third meaning of "coding lonely": It is absolutely invaluable, imperative (and lots of other "i" words) that you CONNECT with your SQL community.  You have SO many options, and no excuses:

"We must all hang together, or most assuredly, we will all hang seperately." Ben Franklin had it right.

Happy days,

Jen McCown

http://www.midnightdba.com/


Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags:
Categories: Dev | SQLPASS | SQLSaturday | SQLServerPedia Syndication

3 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Correlating Data in SQL Trace, Part 1

I’ve been chasing down a series of issues with a system. One of  the symptoms we’ve been getting is massive lock timeouts, so I’ve been taking (server side) profiler traces to see what code is requesting the locks that are being timed out.  Today I landed on a lovely solution for correlating the lock timeouts I see in the trace file with the associated code.  Let’s talk setup, and then we’ll walk through the thinking that led to our solution.

Setup

Take a server side trace, be sure to get “existing connections”, RPC:Starting, RPC:Completed, and Lock Timeout (Timeout > 0) (see the previous blog “Lock Timeout Revelation on Twitter”).  For the RPCs and Timeouts,  get your basic measures, plus ObjectName and DatabaseID.

Once you’ve captured enough of a trace, save it to a table, preferably on your local box. 

Depending on how much time you’re going to spend digging through this data, you may want to index your trace table.  Here are the indexes I developed for this query:

CREATE INDEX ix_trace ON trace (eventclass, spid, rownumber, databaseID, objectName) INCLUDE (starttime, endTime)
CREATE INDEX ix_trace2 ON trace (spid, eventclass, rownumber, databaseID, objectName)

Developing

Once you save the trace into a table, you'll need to know the event id (event class names aren't displayed). (*See footnote about trace in tables...) 

Trace event 189 is Lock Timeout (timeout > 0).  So when you find a 189, it’s easy to go back up the trace rows until you find the closest one with a corresponding SPID, which will have an eventclass of 11: RPC:Started.  Rough pseudocode:

1.       Find a timeout eventclass = 189

2.       Find the closest row with rownumber < timeout.rownumber and spid = timeout.spid

Part 1 is easy:

SELECT TOP 10
        [Timeout].RowNumber ,
        [Timeout].EventClass ,
        [Timeout].DatabaseID ,
        [Timeout].spid ,
        [Timeout].starttime ,
        [Timeout].endtime ,
FROM    Trace AS [Timeout]       
WHERE   [Timeout].EventClass = 189

Note that I’m limiting the query to the TOP 10 for now, just so I don’t have to look through 10,000 rows at once.

So Part 2: for each timeout, find the closest previous RPC Started with the same spid.  On first look, I started falling into the cursor trap.  You say it to yourself procedurally: “for each timeout” is actually direct pseudocode for a cursor. But don’t go there….just don’t…

What do we know about the timeout and its associated code call?

  • They have the same spid
  • They’re on the same database
  • The statement will have an RPC start (eventclass = 11), but not necessarily an RPC complete (eventclass = 10)
  • The statement will precede the timeout, and therefore have an earlier row number

So we could say that for a timeout, the corresponding statement is

  • the MAX rownum less than the timeout rownum
  • where the spids and databases match,
  • and eventclass=11. 

Well partner, that’s a correlated subquery right there! Pretend for a moment that we only have one lock timeout, SPID = 52, rownumber = 11,304.  So to find the code that goes with it:

SELECT TOP 1 rownumber
      FROM trace
WHERE spid = 52         --Timeout spid
AND rownumber < 11304   -–Timeout rownumber
AND eventclass = 11
ORDER BY rownumber DESC 

That’s our subquery. Let’s plug it into the part 1 query, and correlate the thing! 

SELECT TOP 10
        [Timeout].RowNumber ,
        [Timeout].EventClass ,
        [Timeout].DatabaseID ,
        [Timeout].spid ,
        [Timeout].starttime ,
        [Timeout].endtime ,
        Statement.*
FROM    Trace AS [Timeout]
        INNER JOIN Trace AS [Statement] ON Statement.spid = [Timeout].spid
               AND Statement.spid = [Timeout].spid
               AND Statement.DatabaseID = [Timeout].DatabaseID
               AND statement.rownumber = ( SELECT TOP 1 rownumber
                                  FROM trace
                                  WHERE spid = [Timeout].spid
                                  AND rownumber < [Timeout].rownumber
                                  AND eventclass = 11
                                  ORDER BY rownumber DESC
                                  )
WHERE   [Timeout].EventClass = 189

And finally, let’s do better with our select than just Statement.*:

SELECT TOP 10
        [Timeout].RowNumber ,
        [Timeout].EventClass ,
        [Timeout].DatabaseID ,
        [Timeout].spid ,
        [Timeout].starttime ,
        [Timeout].endtime ,
'STATEMENT:  ' AS spacer,
        Statement.rownumber ,
        Statement.objectname ,
        Statement.textdata,
        Statement.loginname ,
        Statement.spid ,
        Statement.starttime ,
        Statement.endtime
FROM    Trace AS [Timeout]
        INNER JOIN Trace AS [Statement] ON Statement.spid = [Timeout].spid
               AND Statement.spid = [Timeout].spid
               AND Statement.DatabaseID = [Timeout].DatabaseID
               AND statement.rownumber = ( SELECT TOP 1 rownumber
                                  FROM trace
                                  WHERE spid = [Timeout].spid
                                  AND rownumber < [Timeout].rownumber
                                  AND eventclass = 11
                                  ORDER BY rownumber DESC
                                  )
WHERE   [Timeout].EventClass = 189

I like the “STATEMENT” spacer; it’s a nice break between your timeout columns, and your statement columns.

I found another great application for this, but we’ve gone a bit long here. So, come back tomorrow for Correlating Data in SQL Trace, Part 2!

-Jen McCown

http://www.MidnightDBA.com

 

*Footnote: Again, I'll have to come back later with a link, but I also learned today that when you port your trace results into a table, duration is stored in microseconds (1/1,000,000 s), instead of milliseconds (1/1,000).  This seems like it would be pretty good info to have.   Edit: Linky! Here's the Brian Moran article "Duration and CPU Values" that mentions this.


Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags: , ,
Categories: Admin | Dev | Performance | SQLServerPedia Syndication | T-SQL

134 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Brief Intro to Indexes and INCLUDE

Content rating: Beginner  

This is a companion blog to "Get Index Included Column Info".

What are indexes for?

We need to talk indexes first, before we get to INCLUDE.  When you index, you're creating a reference that your query can look at to find the WHERE and JOIN terms quickly.  So for example, for this query:

SELECT col1 ,
col2
FROM dbo.BORK
WHERE col3 = 6

You'd ideally want an index like this: CREATE INDEX IX_BORK_111 ON dbo.BORK (col3) This index will help the query find the rows that it needs, but a nonclustered index like this is a separate structure from the table itself. Each time the SQL engine finds a "6" in the index, it then has to hop over to the row in the table itself to get the values for col1 and co2...that's called a "bookmark lookup". Bookmark lookups are bad because they involve extra overhead - you have to read from the index AND the table, instead of just the index.

To fix this, you can create a covering index - an index that covers all the columns you'll need for your query: CREATE INDEX IX_BORK_222 ON dbo.BORK (col3, col1, col2)  Notice that I kept col3 as the lefthand index column, because it's our search column.  Covering indexes are wonderful, but they can start to take up a lot of room. Imagine if the select statement had been SELECT col1, col2, col3, col4 .... col10.  Covering a query like that will make a rather wide index.  INCLUDE can mitigate some of those space concerns.

Index structure made easy (I hope)

To understand INCLUDE, you first must understand (a little bit) the structure of an index. An index is organized in a b-tree hierarchy - each node of data (in the case of our index IX_BORK_111, col3) has two nodes beneath it - the left node higher in the sort range, and the right node lower, like this:

..........5........
...3..........7....
1...4......6...9..

3 and 7 are the child nodes of 5.  1 and 4 are the child nodes of 3.  The lowest level nodes are called leaf nodes, so 1,4,6,and 9 are the leaf nodes.  The SQL engine walks through the tree to find the values it needs...

If we create our covering index like this: CREATE INDEX IX_BORK_222 ON dbo.BORK (col3, col1, col2)  then each node in the tree will contain values for col3, col1, and col2.  That can take up a lot of space.  But in our query, we don't need to search based on col1 and col2....we only search based on col3.  The other two columns are just there to be returned in the SELECT statement.

About INCLUDE 

INCLUDE lets us make the index smaller, while still supplying our SELECTed columns.  First, here's the index declaration: CREATE INDEX IX_BORK_333 ON dbo.BORK (col3) INCLUDE (col1, col2)

Each node in the tree will contain the value for col3. But only the leaf level nodes will hold the values for col1 and col2.  That's how INCLUDE makes the index smaller - the rest of the tree doesn't contain extra, unused information.  The SQL engine performs its search based on col3, finds the leaf level node(s) that match, and there at the leaf level is the rest of the information we need.

Key Ideas Review

Key ideas mentioned:

  • Indexes let queries find data faster.
  • Nonclustered indexes are separate objects from tables.
  • A covering index is ideal - it gives the query everything it needs, without having to touch the table itself (that's called a bookmark lookup).
  • Nonclustered indexes take up space, so you don't necessarily want a lot of really wide covering indexes. Use with discretion.

One final point: Remember that indexes must be updated every time data is updated or inserted into the table. The more indexes, the longer your inserts and updates will take.  Indexing is a balance between supplying enough to support your read operations, and keeping insert/update overhead low. 

As always, please email or comment if you have questions or corrections.  Happy days!

Jen McCown

http://www.MidnightDBA.com


Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags: , ,
Categories: Dev | SQLServerPedia Syndication | T-SQL

25 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Get Index Included Column Info

Content rating: Beginner, tip

One of my pet peeves with SQL is that there isn't a quick and easy way to get all the information about indexes. Sure, you have sp_help tablename or sp_helpindex tablename to get the index name and keys for a table, but that's not even half the story.  I use a lot of indexes with INCLUDE - more on that in the next blog - and INCLUDEd columns don't show up in sp_help and sp_helpindex.

To get around this, you can keep this query on hand, or just turn it into a view:

SELECT OBJECT_NAME(i.[object_id]) TableName ,
i.[name] IndexName ,
c.[name] ColumnName ,
ic.is_included_column ,
i.index_id ,
i.type_desc ,
i.is_unique ,
i.data_space_id ,
i.ignore_dup_key ,
i.is_primary_key ,
i.is_unique_constraint
FROM sys.indexes i
JOIN sys.index_columns ic ON ic.object_id = i.object_id
AND i.index_id = ic.index_id
JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
ORDER BY tableName ,
ic.index_id ,
ic.index_column_id

That'll give you a lovely bit of information: the table, index, and column names, along with all your day to day index info needs.  Slap a WHERE i.[name] = 'indexname' in there, and you get information for a particular index.

Jen McCown

http://www.MidnightDBA.com


Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags: , ,
Categories: Dev | Performance | SQLServerPedia Syndication | T-SQL

82 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Deadlocks: Share/Insert Lock Deadlocks, With a Twist

Content rating: intermediate/tips.  We'll do an intro deadlocking blog sometime soon.

I had a bear of a deadlocking issue last week, and reached out to my SQL Tweeps for help. Jonathan Kehayias (blog, SQLSarg on twitter) responded with email, repros, and solutions.  I was trying to think of a funny way to say thank you, or thumbs up, in a MidnightDBA way...but I couldn't come up with anything better than "Thanks, Jonathan....that's one full moon up for you!"  And that's just lame, and a little bit surreal in the Yo Gabba Gabba way, so we'll leave it at, thanks SQLSarg!

Through profiler traces capturing deadlock graphs, I was able to narrow the deadlock down to a very, very specific piece of code - a stored procedure called UpdateOrder was deadlocking on an insert to the [Order] table.  The deadlock graph (images forthcoming) showed Owner Mode Range S-S, and Request Mode Range I-N locks on the primary key for that table.

Do a search for "Range I-N" in books online, and you'll get the article on Key-Range Locking. The first thing that leaps out at you is that these are associated with the serializable isolation level.  Now, my database uses the default READ COMMITTED isolation level, and the stored procedure itself wasn't specifically set for Serializable, so I was very confused.  The solution to the first mystery: this SP is called as part of a BizTalk process, and BizTalk not only applies Serializable to all the code it calls, but it encapsulates it in a transaction.  That'll be important a little bit later.

In the meantime, we're stuck with Serializable. That increases contention, but it shouldn't cause a deadlock by itself, so now we take a closer look at the code.  This is somewhat simplified:

UPDATE  @tmp   
SET     inOrder = 1   
FROM    @tmp tp    
INNER JOIN [Order] op ON op.OrderID = @BookingNumber 
                          
             
INSERT  INTO [Order]   
                ( PolicyName ,   
                  Text             
                )   
                SELECT DISTINCT   
                        @BookingNumber ,   
                        tp.[Message]   
                FROM    @tmp tp   
                WHERE   tp.inOrder= 0    
 

The RANGE I-N lock clearly comes from the INSERT statement, but what about the RANGE S-S? If we look closely, we'll see that the UPDATE statement doesn an inner join on the [Order] table, which is to say, we're SELECTing from [Order] via that join.  When we run two or more processes running this stored procedure, both procedures get shared locks for the JOIN [Order] statement, and they request insert locks (Range I-N) for the insert statement. Each proces is in one container transaction, so it can't let go of the shared lock before it also gets the insert lock.  Boom, deadlock.

The three suggested solutions, in order of preference, are

  1. Change the isolation level from serializable to Read Commited.  I can't do that, since BizTalk is enforcing Serializable. Next option....
  2. Move any pre-Insert SELECT (in my case, the update with the inner join) outside of the transaction that is performing the INSERT.  This doesn't apply to me, also courtesy of BizTalk. Remember that encapsulating transaction we can't get out of? Yeah. So what we gotta do is....
  3. Use an UPDLOCK hint.  To quote Jonathan, the UPDLOCK hint will "increase the lock type to an update lock from a shared lock which blocks the other executions of the stored procedure until the transaction commits". 

UPDATE  @tmp   
SET     inOrder = 1   
FROM    @tmp tp    
INNER JOIN [Order] op WITH (UPDLOCK) ON op.OrderID = @BookingNumber 
                          
...

Tah-dah! No more deadlocks.  The update lock prevents any other process from getting their mitts on [Order] long enough to finish this update and the insert.

-Jen McCown

http://www.MidnightDBA.com

 

 

 


Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags: ,
Categories: Dev | SQLServerPedia Syndication | T-SQL

64 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Remix! Optimized: Query by Hour, Day, Week, or Month

I originally posted this on 11/16/09. I repost here for the joy of TSQL Tuesday, and to add a very important change to the original solution.

Here's the accompanying video - same material, different format. The video doesn't include the change, by the way.

A solution is ever so much more elegant if you know the path that led to it.  Which is another way of saying, you only know how totally smart I am if you see how much work I had to do :)

I had to do a lot of work today - we have a stored procedure that's being called a few times a second, and maxing out 8 processors on our beefy test server during a moderate load test.  So I dug in and had some fun. When it came to the final fine tuning, I was looking at a query full of date maninpulation.  That sort of thing is performance death, so I stepped through it.  We started with something like this, a query that allows the user to get a SUM by hour, day, week, or month.  Here we'll just show week and month for berevity:

SELECT SUM(o.Sale) AS Sales
FROM Orders o
WHERE o.BizID = @BizID
AND (
-- Interval: Week
( @Interval = 'wk'
AND DATEPART(wk, o.OrderDate) = DATEPART(wk, GetDate())
AND DATEPART(yyyy, o.OrderDate) = DATEPART(yyyy, GetDate())
)
-- Interval: Month
OR ( @Interval = 'mm'
AND DATEPART(mm, o.OrderDate) = DATEPART(mm, GetDate())
AND DATEPART(yyyy, o.OrderDate) = DATEPART(yyyy, GetDate()) 
)
)
GROUP BY O.BizID

Principle #1: Free the Functions 

Right now we're looking disdainfully at all those DATEPARTs and GetDates().   Principle #1: Get functions out of the WHERE clause, if possible.  That's easy enough, we can declare @wk @mm and @yyyy variables and set them before our query. Let's also change this from OR to IF (for more on that, see my previous blog IF is better than OR), :

SELECT @wk =  DATEPART(wk, GetDate()) , @mm =  DATEPART(mm, GetDate()) , @yyyy =  DATEPART(yyyy, GetDate())

-- Interval: Week 
IF @Interval = 'wk'
SELECT SUM(o.Sale) AS Sales
FROM Orders o
WHERE o.BizID = @BizID
AND DATEPART(wk, o.OrderDate) = @wk
AND DATEPART(yyyy, o.OrderDate) = @yyyy
GROUP BY O.BizID
-- Interval: Month
ELSE IF @Interval = 'mm'
SELECT SUM(o.Sale) AS Sales
FROM Orders o
WHERE o.BizID = @BizID
AND DATEPART(mm, o.OrderDate) = @mm
AND DATEPART(yyyy, o.OrderDate) = @yyyy
GROUP BY O.BizID

Principle #2: Free the Optimizer 

That'll help a lot, but we're not through yet.  It occurrs to me that this still isn't an elegant solution.  The real brainstorm comes: If we compare date part to date part, we'll never get away from DATEPART.  What's better than that?  How about comparing the whole date to another whole date, allowing the optimizer to use that column's index?  Booya!  Principle #2, let the optimizer do its job (and that includes making indexes availble). Use @DateStart and @DateEnd instead...see:

        SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) -- This gives us a clean date with time set at midnight, 00:00:00
-- Interval: Week 
IF @Interval = 'wk'
BEGIN
                        SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,GETDATE()) ) + 1,@date)
                        SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)
SELECT SUM(o.Sale) AS Sales
FROM Orders o
WHERE o.BizID = @BizID
AND o.OrderDate BETWEEN @dateStart AND @DateEnd
GROUP BY O.BizID
END
-- Interval: Month
ELSE IF @Interval = 'mm'
BEGIN
               SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date)
               SELECT  @dateEnd = DATEADD(m, 1, @dateStart)
SELECT SUM(o.Sale) AS Sales
FROM Orders o
WHERE o.BizID = @BizID
AND o.OrderDate BETWEEN @dateStart AND @DateEnd 
GROUP BY O.BizID
END

Principle #3: Happify Your Code 

We'll go through all that date math here in a minute, but first things first. Notice we've gotten all the funky functions and date manipulation out of our WHERE clause, where it'd mess up our friend the SQL optimizer, and placed it where it belongs: as a one-shot above the query itself.  Oh and hey, see how the two SELECT statements are identical?  Let's pull those out of the IF/ELSE and make the code more compact, readable, and easy to manage (oh hai, principle #3!):

SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) -- This gives us a clean date with time set at midnight, 00:00:00
-- Interval: Week 
IF @Interval = 'wk'
BEGIN
                        SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,GETDATE()) ) + 1,@date)
                        SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)
END
-- Interval: Month
ELSE IF @Interval = 'mm'
BEGIN
               SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date)
               SELECT  @dateEnd = DATEADD(mm, 1, @dateStart)
END
SELECT SUM(o.Sale) AS Sales
FROM Orders o
WHERE o.BizID = @BizID
AND o.OrderDate BETWEEN @dateStart AND @DateEnd 
GROUP BY O.BizID

Now, let's sit back and have a plate of #bacon. Wait! We still have date math, and the big change...

Date Math 

We were originally just thinking of matching month to month and year to year, for our first example.  But that's not set based logic.  To compare a date to a date, we needed a range to compare the column value to.  In our example we're always going for the current date range (e.g., THIS month, THIS week, etc).  So start with a clean, no-time @date:

SET @date = CONVERT(VARCHAR(10), GETDATE(), 101) -- This gives us a clean date with time set at midnight, 00:00:00

Today that gives us 11/16/2009.  Pseudocode: Use that to get the first of the month: Today's date, minus number of days (16), plus one = @date - datepart(dd,@date) + 1 = @date + (-1 * datepart(dd,@date)) + 1 =

SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dd,@date) ) + 1,@date) 

And DateEnd is just midnight on the first day of next month, or DATEADD(m,1,@dateStart).  You can use this same kind of logic for hour:

                SELECT  @dateStart = DATEADD(hh, DATEPART(hh, GETDATE()),@date -- Clean date at midnight, plus the current hour

                SELECT  @dateEnd = DATEADD(hh, 1, @dateStart-- Start date plus 1 hour

And day is even simpler:

               SELECT  @dateStart = @date               
               SELECT  @dateEnd = DATEADD(dd, 1, @dateStart) 

For me, week was the tricky part, as we didn't wind up using the "week" date part at all. Pseudocode: start with the day of the week (Monday is 2, for today's example), and subtract that from today's date to get 0, add 1 to get Sunday of this week =   today - day of week(today) + 1 =

                SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,@date) ) + 1,@date)
                SELECT  @dateEnd = DATEADD(dd, 7, @dateStart)

Nice, huh?

One More Thing: The Big Change

That'd be perfect and beautiful and awesome, except for one thing. The way we have this set up, we'll get orders back for the start of the next interval.  For example, if we want orders for the 10:00 hour, the time for our end date is 11:00...if an order was placed at precisely 11:00:00.000, it'll return. We don't want that...we want rows for orders placed up to BUT NOT INCLUDING the start of the next hour.  To enforce this, we subtract 2 ms from our end date.

We don't subtract 1 ms from the time because of an interesting behavior of SQL Server: if you subtract 1 millisecond (SELECT DATEADD(ms,-1,'2009-01-01 12:00:00')), you'll notice that it returns '2009-01-01 12:00:00.000' instead of '2009-01-01 11:59:59.999'.  But if you subtract 2 milliseconds (SELECT DATEADD(ms,-2,'2009-01-01 12:00:00')), it returns '2009-01-01 11:59:59.997' instead of '2009-01-01 11:59:59.998'. 

The explanation can be found in Kimberly Tripp's article on Partitioned Tables and Indexes in SQL Server 2005:

"...datetime data does not guarantee precision to the millisecond. Instead, datetime data is precise within 3.33 milliseconds. In the case of 23:59:59.999, this exact time tick is not available and instead the value is rounded to the nearest time tick that is 12:00:00.000 A.M. of the following day."

So, our final solution just needs a minor modification to @endDate:

                SELECT  @dateStart = DATEADD(dd,-1 * ( DATEPART(dw,@date) ) + 1,@date)
                SELECT  @dateEnd = DATEADD(ms,-2,DATEADD(dd, 7, @dateStart))

The end.

Review

Principles:

  • Principle #1: Get functions out of the WHERE clause, if possible.  They're performance death.  
  • Principle #2: Let the optimizer do its job.  Make sure indexes are available, and the query can use them.
  • Principle #3: If all else remains equal, use fewer lines of code....make code more compact, readable, and easy to manage.
  • And don't forget to exclude the start of the next time interval with Dateadd (ms, -2, yourdate)

The major 2 things we did for this query were to get rid of the ORs (see IF is better than OR), and change our thinking from comparing date parts, to comparing the whole OrderDate to a range of dates. 

Eat some #bacon. The end. Really.

-Jen McCown, http://www.MidnightDBA.com


Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags: ,
Categories: Dev | Performance | SQLServerPedia Syndication | T-SQL

129 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Highly Amusing Index Frag Anecdote

I can't believe I haven't told this story from SQLPASS yet... and by the way, here's Paul Randal's blog on this same event/topic.

One of the few sessions I managed to attend was a really cool Grant Fritchey (blog, twitter) session on DMVs.  He was talking about the DMV that includes index fragmentation data (sys.dm_db_index_usage_stats) and mentioned briefly that below a certain page level, he doesn't bother defragmenting.  During the QnA (and Tweeting all the while) I asked Grant, "For you, how many pages does a table need to have before you start worrying about the fragmentation level?" 

He answered that his lower cutoff is in the single digits, because defragmenting a table that small will make no difference at all.  And "my [upper] cutoff is at about 100 pages, but Microsoft says 1,000 pages. But even a 30 page table can wreak havoc on a query if highly fragmented." 

Paul Randal (blog, twitter) was very active on Twitter during the summit, and saw the tweets. He replied immediately: "#sqlpass 1000 is a number I made up when back at MS, but it only applies if the table is in memory. Otherwise, sure, defrag."  I got to read that out while we were still in Grant's session, and it got a big laugh.

-Jen McCown

http://www.MidnightDBA.com


Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags: , , ,
Categories: Admin | Dev | SQLPASS | SQLServerPedia Syndication

80 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed

Restore Database WITH MOVE - A Review

Content rating: Beginner/review material.

The most common database restore I do is the WITH MOVE format.  I get DB backups from friends, book or presenter sites, and restore them on my box to play with.  The RESTORE syntax WITH MOVE, from BOL, is:

RESTORE DATABASE TestDB
FROM DISK = 'C:\AdventureWorks.bak'
WITH MOVE 'AdventureWorks_Data' TO 'C:\testdb.mdf',
MOVE 'AdventureWorks_Log' TO 'C:\testdb.ldf'
GO

See, database backups store information about the database they came from...namely, the logical file names, the source file locations, and so on.  When I restore a backup on my box, I'm very likely going to restore to a different filepath - I want to save my files in a folder on C:\, not F:\ (I don't even HAVE an F drive).  

Let's take a look.  First, locate the backup file itself - in this case, Brent Ozar's Twittercache database - and get the full filepath (e.g., is it saved in "c:\temp"?).  Then you can run this in SSMS:

RESTORE FILELISTONLY FROM DISK='C:\temp\TwitterCache.bak'

That'll get you the backup information:

TwitterCache F:\MSSQL\DATA\TwitterCache.mdf D PRIMARY 134217728 35184372080640 1 0 0 6DC5C93F-811F-40D3-9C8F-56692E098701 0 0 57147392 512 1 NULL 26000000004400037 B9297268-D556-44FE-9940-13A7B8550289 0 1
TwitterCache_log F:\MSSQL\DATA\TwitterCache_log.ldf L NULL 201138176 2199023255552 2 0 0 B5C0DCCF-9EBA-4A62-8B8A-6B09844B953C 0 0 0 512 0 NULL 0 00000000-0000-0000-0000-000000000000 0 1

All you really need are the LogicalNames. In this example, we get these logical names:

TwitterCache
TwitterCache_log

Now build your restore statement, using the Logical Names (in bold) and the filepath to your SQL data and log folder(s):

RESTORE DATABASE TwitterCache
FROM DISK='C:\temp\TwitterCache.bak'
WITH MOVE 'TwitterCache' TO 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\TwitterCache.mdf',
MOVE 'TwitterCache_log' TO 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\TwitterCache_log.ldf' 

The easiest part to mess up is the comma between the MOVE statements, so don't miss it.

The end.
-Jen McCown
http://www.MidnightDBA.com


Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5
Tags: ,
Categories: Admin | Dev | SQLServerPedia Syndication | T-SQL

88 Comments
Actions: E-mail | Permalink | Comment RSSRSS comment feed