Wednesday, March 28, 2012

Query with multiple tables Use of JOIN vs WHERE

I have inherited a database, written a few years back and the people who
designed the stored procedures seem to do things differently than I learned
and it seems to work faster, but I cannot figure out why. I check the
execution plans and they appear identical, but the statistics show a HUGE
(to me) difference in reads (147 for the first method and 40 for the second)
. The second method takes about 10 seconds less to return the results. (O
f
course the code is wrapped in a stored procedure - but I was curious about
this "new" way of doing things so I extracted the select statements to do
comparisons).
I'm sure that the explanation is simple and is just an area of SQL coding I
hadn't been exposed to yet.
Any info appreciated,
Nancy
CODE:
It is a simple query to get a count joining 3 tables
I would normally do the query using:
Select count(CODE1)
from Table1
JOIN x_hcfa_cpt
ON
Table1.CLAIMNO = Table2.CLAIM_NUMBER
JOIN x_hcfa_cpt_mas
ON
Table1.TERM_NUMBER = Table3.TERM_NUMBER
where
Table1.TERM_NUMBER = 'asdftcdww'
AND
(Table1.STATUS='false'
or
Table1.STATUS='true')
and
Table1.FLAG='false'
But in the code I inherited they used:
Select count(CODE1)
from Table1, Table2, Table3
where
Table1.X_TERMINAL_NUMBER = 'asdftcdww'
AND
Table1.CLAIMNO = Table2.CLAIM_NUMBER
and
Table1.TERM_NUMBER = Table3.TERM_NUMBER
and
(Table1.STATUS='false'
or
Table1.STATUS='true')
and
Table1.FLAG ='false'> Select count(CODE1)
> from Table1, Table2, Table3
This is non-standard code and should be avoided (especially for outer joins
due to non-conforming behavior, but for inner joins as well). My suggestion
is to re-write the code with JOIN statements, and to avoid ambiguity, I make
it a standard practice to include the type of JOIN, so I would use the INNER
keyword as well (even though it is the default).|||There should be no difference between SQL-92 JOINs (1st statement) and
the older style (2nd statement).
I'm guessing that x_hcfa_cpt and x_hcfa_cpt_mas are Table2 and Table3 in
the first query?
Also, in the first query, the where clause uses Table1.TERM_NUMBER
whereas the 2nd query uses Table1.X_TERMINAL_NUMBER. Could that be the
difference in performance?
Nancy Lytle wrote:

>I have inherited a database, written a few years back and the people who
>designed the stored procedures seem to do things differently than I learned
>and it seems to work faster, but I cannot figure out why. I check the
>execution plans and they appear identical, but the statistics show a HUGE
>(to me) difference in reads (147 for the first method and 40 for the second
)
>. The second method takes about 10 seconds less to return the results. (O
f
>course the code is wrapped in a stored procedure - but I was curious about
>this "new" way of doing things so I extracted the select statements to do
>comparisons).
>I'm sure that the explanation is simple and is just an area of SQL coding I
>hadn't been exposed to yet.
>Any info appreciated,
>Nancy
>CODE:
>It is a simple query to get a count joining 3 tables
>I would normally do the query using:
>Select count(CODE1)
>from Table1
>JOIN Table2
>ON
>Table1.CLAIMNO = Table2.CLAIM_NUMBER
>JOIN Table3
>ON
>Table1.TERM_NUMBER = Table3.TERM_NUMBER
>where
>Table1.TERM_NUMBER = 'asdftcdww'
>AND
> (Table1.STATUS='false'
> or
> Table1.STATUS='true')
>and
> Table1.FLAG='false'
>But in the code I inherited they used:
>Select count(CODE1)
>from Table1, Table2, Table3
>where
>Table1.X_TERMINAL_NUMBER = 'asdftcdww'
>AND
>Table1.CLAIMNO = Table2.CLAIM_NUMBER
>and
>Table1.TERM_NUMBER = Table3.TERM_NUMBER
>and
>(Table1.STATUS='false'
>or
>Table1.STATUS='true')
>and
>Table1.FLAG ='false'
>
>|||On Tue, 13 Sep 2005 16:37:16 -0400, Nancy Lytle wrote:

>I have inherited a database, written a few years back and the people who
>designed the stored procedures seem to do things differently than I learned
>and it seems to work faster, but I cannot figure out why.
(snip)
Hi Nacny,
I'll assume that the different table names and column names are a result
of you renaming some tables and columns when preparing the post, and the
code you actually tested this on didn't have these differences :-)
The difference in the queries are the two different styles of join
notation. In old versions of SQL Server (note that I'm talking real old
here - older than SQL Server 6.5), only the version with the
comma-delimited list of tables is allowed. The more verbose version with
infixed join operators was added later, to adhere to the ANSI standard.
For inner joins, there is absolutely no difference between the two
versions. They are both defined in the ANSI standard, both acccepted by
SQL Server and they will both return the same results. They'll also use
the same execution plan, so that there's no performance difference
either.
For outer joins, things are different. The "old-style notation" (that
uses =* and *= in the WHERE clause to define inner and outer tables) is
ambiguous. It's not defined in the ANSI standard. MS has announced that
it will drop support for =* and *= in a future version. In fact, I
recall reading somewhere that SQL Server 2005 will only accept =* and *=
in the backward compatibility mode.

> I check the
>execution plans and they appear identical, but the statistics show a HUGE
>(to me) difference in reads (147 for the first method and 40 for the second
)
>. The second method takes about 10 seconds less to return the results.
Did you run both tests on an empty cache? I suspect not - and that's
what causes the difference.
Test it like this:
DECLARE @.start datetime
DECLARE @.end datetime
-- Flush all dirty buffers to disk
CHECKPOINT
-- Remove all previously read pages from the data cache
DBCC DROPCLEANBUFFERS
-- Remove all previously compiled execution plans as well
DBCC FREEPROCCACHE
-- Now start the real test
SET @.start = CURRENT_TIMESTAMP
#### ####
#### YOUR QUERY GOES HERE ####
#### ####
SET @.end = CURRENT_TIMESTAMP
SELECT @.start AS StartTime,
@.end AS EndTime,
DATEDIFF(ms, @.Start, @.End) AS "Elapsed (ms)"
If the code that you are testing returns many rows, change the SELECT to
a SELECT ... INTO #temp_table to eliminate the speed of the network and
the display speed of your client from the equation.
I'd be VERY surprised if you still get significant differences if you
test the queries like this. (Small differences are to be expected,
especially if the server you're running this on has other things to do
as well).
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks, Hugo, Trey and Aaron!
I guess there is more sql code I am going to have to change, the SP's (and
they are all named sp_ !) are dotted with uses of comma delimited lists of
tables for joins and usage of *=, not to mention tons of select *'s, and the
sp_ naming convention.
I used Hugo query and that helped me see the real difference between the
two, which is actually very slight and leans toward the use of JOINs.
Thanks again, this is a great group!
Nancy
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:m1fei1paaf5g8qo9ef509q9cqocdmds466@.
4ax.com...
> On Tue, 13 Sep 2005 16:37:16 -0400, Nancy Lytle wrote:
>
> (snip)
> Hi Nacny,
> I'll assume that the different table names and column names are a result
> of you renaming some tables and columns when preparing the post, and the
> code you actually tested this on didn't have these differences :-)
> The difference in the queries are the two different styles of join
> notation. In old versions of SQL Server (note that I'm talking real old
> here - older than SQL Server 6.5), only the version with the
> comma-delimited list of tables is allowed. The more verbose version with
> infixed join operators was added later, to adhere to the ANSI standard.
> For inner joins, there is absolutely no difference between the two
> versions. They are both defined in the ANSI standard, both acccepted by
> SQL Server and they will both return the same results. They'll also use
> the same execution plan, so that there's no performance difference
> either.
> For outer joins, things are different. The "old-style notation" (that
> uses =* and *= in the WHERE clause to define inner and outer tables) is
> ambiguous. It's not defined in the ANSI standard. MS has announced that
> it will drop support for =* and *= in a future version. In fact, I
> recall reading somewhere that SQL Server 2005 will only accept =* and *=
> in the backward compatibility mode.
>
> Did you run both tests on an empty cache? I suspect not - and that's
> what causes the difference.
> Test it like this:
> DECLARE @.start datetime
> DECLARE @.end datetime
> -- Flush all dirty buffers to disk
> CHECKPOINT
> -- Remove all previously read pages from the data cache
> DBCC DROPCLEANBUFFERS
> -- Remove all previously compiled execution plans as well
> DBCC FREEPROCCACHE
> -- Now start the real test
> SET @.start = CURRENT_TIMESTAMP
> #### ####
> #### YOUR QUERY GOES HERE ####
> #### ####
> SET @.end = CURRENT_TIMESTAMP
> SELECT @.start AS StartTime,
> @.end AS EndTime,
> DATEDIFF(ms, @.Start, @.End) AS "Elapsed (ms)"
> If the code that you are testing returns many rows, change the SELECT to
> a SELECT ... INTO #temp_table to eliminate the speed of the network and
> the display speed of your client from the equation.
> I'd be VERY surprised if you still get significant differences if you
> test the queries like this. (Small differences are to be expected,
> especially if the server you're running this on has other things to do
> as well).
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||On Wed, 14 Sep 2005 08:59:40 -0400, Nancy Lytle wrote:

>Thanks, Hugo, Trey and Aaron!
>I guess there is more sql code I am going to have to change, the SP's (and
>they are all named sp_ !) are dotted with uses of comma delimited lists of
>tables for joins and usage of *=, not to mention tons of select *'s, and th
e
>sp_ naming convention.
>I used Hugo query and that helped me see the real difference between the
>two, which is actually very slight and leans toward the use of JOINs.
>Thanks again, this is a great group!
>Nancy
Hi Nancy,
I guess that the "very slight" difference you see falls within the
bounds of statistic inaccuracy. If you repeat the test a few times, you
should see that there really is no difference between the two.
As far as rewriting code, I'd say: find the right path between
religiously rewriting everything (costly, time-consuming, and will
introduce bugs, if only by typo's and copy/paste errors) on the one end,
and leaving working code untouched on the other end.
If you decide to start rewriting where it's needed most, then begin with
the use of =* and *= for outer joins, as they are on the deprecated
feature list. Next should be the sp_ prefix and the use of SELECT *
(both are performance killers in their own ways; both induce a risk of
unexpectedly breaking your code when some change is made elsewhere).
The join syntax for inner joins (i.e. the use of comma-delimited table
list without any =* or *=) should be last on your list, as this is only
a readability improvement (and not all experts would agree that it's an
improvement - there are a few SQL experts who think that the "old style"
join notation is often better, though I'm not one of them).
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks for the suggestions.
I did a complete search and the locations for the *= code is in dynamic sql
written in .asp pages. So, since I am changing the dynamic sql to a stored
procedure, I can get rid of the *= and select * and created proper procedure
naming all at the same time.
I will wait to modify the names of the sp_'s that are already written until
I have a chance to sit down with the other programmers, etc, and we come up
with a plan. My initial thought was to simply recreate the SPs changing
only the name, so we would have essentially 2 sp's that did the same thing,
just one sp_ and one usp_ names. Then we could start cutting over the names
in the code without breaking anything, that couldn't be fixed almost
immediately.
But this is my first time really taking on a task like this, does this sound
like a plan?
Do you have any articles or books to recommend to a new DBA/developer?
Thanks again,
Nancy
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:3rvgi15u21emupk0gfqe9pl5dpfk8kgl0s@.
4ax.com...
> On Wed, 14 Sep 2005 08:59:40 -0400, Nancy Lytle wrote:
>
> Hi Nancy,
> I guess that the "very slight" difference you see falls within the
> bounds of statistic inaccuracy. If you repeat the test a few times, you
> should see that there really is no difference between the two.
> As far as rewriting code, I'd say: find the right path between
> religiously rewriting everything (costly, time-consuming, and will
> introduce bugs, if only by typo's and copy/paste errors) on the one end,
> and leaving working code untouched on the other end.
> If you decide to start rewriting where it's needed most, then begin with
> the use of =* and *= for outer joins, as they are on the deprecated
> feature list. Next should be the sp_ prefix and the use of SELECT *
> (both are performance killers in their own ways; both induce a risk of
> unexpectedly breaking your code when some change is made elsewhere).
> The join syntax for inner joins (i.e. the use of comma-delimited table
> list without any =* or *=) should be last on your list, as this is only
> a readability improvement (and not all experts would agree that it's an
> improvement - there are a few SQL experts who think that the "old style"
> join notation is often better, though I'm not one of them).
> Best, Hugo
> --
> (Remove _NO_ and _SPAM_ to get my e-mail address)|||On Thu, 15 Sep 2005 09:51:15 -0400, Nancy Lytle wrote:

>Thanks for the suggestions.
>I did a complete search and the locations for the *= code is in dynamic sql
>written in .asp pages. So, since I am changing the dynamic sql to a stored
>procedure, I can get rid of the *= and select * and created proper procedur
e
>naming all at the same time.
Hi Nancy,
Wow, that's a major improvement - getting rid of two major pitfalls at
once!

>I will wait to modify the names of the sp_'s that are already written until
>I have a chance to sit down with the other programmers, etc, and we come up
>with a plan. My initial thought was to simply recreate the SPs changing
>only the name, so we would have essentially 2 sp's that did the same thing,
>just one sp_ and one usp_ names. Then we could start cutting over the name
s
>in the code without breaking anything, that couldn't be fixed almost
>immediately.
>But this is my first time really taking on a task like this, does this soun
d
>like a plan?
Discussing things with the developers is definitely a great idea. As
long as your modifications are invisible to them (such as replacing
dynamic =* crap with non-dynamic OUTER JOINs), you could do you work in
silence (though I'd even recommend communicating your actions in that
case). But if your changes are going toa ffect the developers (and they
will if you intend to eventually remove the badly named stored
procedures), they should be informed, and invited to participate.
But if you are renaming, then I'd just drop the prefix completely. I've
never managed to see the added value of
EXEC usp_MakeMonthlyReport
over
EXEC MakeMonthlyReport

>Do you have any articles or books to recommend to a new DBA/developer?
http://www.aspfaq.com/show.asp?id=2423
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)

No comments:

Post a Comment