Query Syntax: Query training -- Complex queries

Can anyone please suggest a book (or a training class) that gets into
the guts of complex queries? I need to understand, for example, the
internal differences between this update query:
Update PositionsEOM
Set MktVal = PM.Price * PositionsEOM.Quantity_ASD
From PositionsEOM Inner Join [Price Master] As PM On
PositionsEOM.Fund_ID = PM.Fund_ID And
PositionsEOM.Position_Date = PM.Price_Date
and this one:
Update PositionsEOM
Set MktVal = Derived.MktVal From
(Select PM.Price * PositionsEOM.Quantity_ASD As MktVal
From PositionsEOM Inner Join [Price Master] As PM On
PositionsEOM.Fund_ID = PM.Fund_ID And
PositionsEOM.Position_Date = PM.Price_Date) As Derived
The second example ran for 20 minutes and I cancelled it; the first one
finished in 2.5 minutes. You wouldn't KNOW that you could write "Set"
expressions like the first example, from ANY of the online
documentation. Even though the second example uses a derived table, I
don't see much of a logical difference between them, and would not
expect the run time to be different.
Also, I decided that when you mix inner and outer joins in a select,
which I do sometimes, if you change the order, the result set WILL be
different (not just the time it takes). (I think this is true...) When
you build complicated select statements in Enterprise Manager's view
designer window, it doesn't really take this into account (you can say
"Select all rows from tableA", but you might really be getting all rows
from the result of the inner join on tableA and tableB).
I found one online article that mentioned that if you mix inner and
outer joins in a query, you CAN control the order of the joins -- and it
shows various ways to do that, including parentheses and derived or
virtual tables. But only ONE online article. It's VERY rare to see any
of this documented, and SQL Books Online sure doesn't show
examples.......
The examples for Update queries usually show that you can set price to
price*1.05, and helpfully explains that this will increase all prices by
5 percent. That's trivial. Then they move on to other subjects...
I don't know where to come across such learning; SQL Books Online sure
doesn't go into these kinds of examples. Their syntax diagrams are
probably complete, but of course they are huge and somewhat hard to
understand.
Any help will be greatly appreciated.
David WalkerYou might want to check out _Advanced Transact-SQL For SQL Server_ by Itizik
Ben-Gan and Tom Moreau. Pretty good book.
As for your queries, the first one filters the rows to update with the JOIN.
The second one will update every single row in the PositionsEOM table once
for every row returned in the derived table, as there is no correlation
between the two tables (you have not JOINed them.)
"DW" <None> wrote in message news:u#t6Wj##DHA.3272@.TK2MSFTNGP09.phx.gbl...
> Can anyone please suggest a book (or a training class) that gets into
> the guts of complex queries? I need to understand, for example, the
> internal differences between this update query:
> Update PositionsEOM
> Set MktVal = PM.Price * PositionsEOM.Quantity_ASD
> From PositionsEOM Inner Join [Price Master] As PM On
> PositionsEOM.Fund_ID = PM.Fund_ID And
> PositionsEOM.Position_Date = PM.Price_Date
> and this one:
> Update PositionsEOM
> Set MktVal = Derived.MktVal From
> (Select PM.Price * PositionsEOM.Quantity_ASD As MktVal
> From PositionsEOM Inner Join [Price Master] As PM On
> PositionsEOM.Fund_ID = PM.Fund_ID And
> PositionsEOM.Position_Date = PM.Price_Date) As Derived
> The second example ran for 20 minutes and I cancelled it; the first one
> finished in 2.5 minutes. You wouldn't KNOW that you could write "Set"
> expressions like the first example, from ANY of the online
> documentation. Even though the second example uses a derived table, I
> don't see much of a logical difference between them, and would not
> expect the run time to be different.
> Also, I decided that when you mix inner and outer joins in a select,
> which I do sometimes, if you change the order, the result set WILL be
> different (not just the time it takes). (I think this is true...) When
> you build complicated select statements in Enterprise Manager's view
> designer window, it doesn't really take this into account (you can say
> "Select all rows from tableA", but you might really be getting all rows
> from the result of the inner join on tableA and tableB).
> I found one online article that mentioned that if you mix inner and
> outer joins in a query, you CAN control the order of the joins -- and it
> shows various ways to do that, including parentheses and derived or
> virtual tables. But only ONE online article. It's VERY rare to see any
> of this documented, and SQL Books Online sure doesn't show
> examples.......
> The examples for Update queries usually show that you can set price to
> price*1.05, and helpfully explains that this will increase all prices by
> 5 percent. That's trivial. Then they move on to other subjects...
> I don't know where to come across such learning; SQL Books Online sure
> doesn't go into these kinds of examples. Their syntax diagrams are
> probably complete, but of course they are huge and somewhat hard to
> understand.
> Any help will be greatly appreciated.
> David Walker|||DW (None) writes:
> Can anyone please suggest a book (or a training class) that gets into
> the guts of complex queries? I need to understand, for example, the
> internal differences between this update query:
> Update PositionsEOM
> Set MktVal = PM.Price * PositionsEOM.Quantity_ASD
> From PositionsEOM Inner Join [Price Master] As PM On
> PositionsEOM.Fund_ID = PM.Fund_ID And
> PositionsEOM.Position_Date = PM.Price_Date
> and this one:
> Update PositionsEOM
> Set MktVal = Derived.MktVal From
> (Select PM.Price * PositionsEOM.Quantity_ASD As MktVal
> From PositionsEOM Inner Join [Price Master] As PM On
> PositionsEOM.Fund_ID = PM.Fund_ID And
> PositionsEOM.Position_Date = PM.Price_Date) As Derived
> The second example ran for 20 minutes and I cancelled it; the first one
> finished in 2.5 minutes.
As Adam pointed out, the problem with the second query is that the derived
table is not correlated with the updated table, so you get a magnificent
cross join, and I'm not surprise that it did not complete in 20 minutes.
Since I don't know your tables, I cannot really a suggest a better way
to use the derived table. What you could write, is this:
Update PositionsEOM
Set MktVal = PositionsEOM.Quantity_ASD *
(Select PM.Price
FROM [Price Master] As PM
WHERE PositionsEOM.Fund_ID = PM.Fund_ID And
AND PositionsEOM.Position_Date = PM.Price_Date)
Here you don't have a derived table, but a correlated subquery. This is
actually the way you should do it, if you stick to ANSI. I need to
confess, though, that I prefer the syntax with FROM over the using
a correlated subquery.

> You wouldn't KNOW that you could write "Set" expressions like the
> first example, from ANY of the online documentation.
Nah, I would say that this is quite clear from the syntax diagram for
UPDATE in Books Online.

> Also, I decided that when you mix inner and outer joins in a select,
> which I do sometimes, if you change the order, the result set WILL be
> different (not just the time it takes). (I think this is true...)
Yes, this is correct. As long as you are only using inner joins, you
can have them in any order, but with left joins the table are turn.
Consider:
SELECT ...
FROM a
LEFT JOIN b ON a.col = b.col
JOIN c ON b.col2 = c.col2
What you probably want is:
SELECT
FROM a
LEFT JOIN (b JOIN c ON b.col2 = c.col2) ON a.col = b.col
In the first case, you take all rows in a, and if there is no matching
rows in b, you set all b:s colums to NULL. But then join the table
(a LEFT JOIN b) with c, and you will include rows that matches in c,
so all those NULL rows in b are now dropped.
In the second case you first join b with c, and then join this table
with a.

> I found one online article that mentioned that if you mix inner and
> outer joins in a query, you CAN control the order of the joins -- and it
> shows various ways to do that, including parentheses and derived or
> virtual tables. But only ONE online article. It's VERY rare to see any
> of this documented, and SQL Books Online sure doesn't show
> examples.......
It should be in the ANSI-92 standard for SQL, but it's hardly the kind
of book you read first.
The best book for reading about the various join techniques, is Joe
Celko's "SQL For Smarties", but of course it has the drawback that it
is not written for SQL Server. Tom Moreau's and Itzik Ben-Gan's book
is a better choice, if you want something which is more on target. And
quite some of these subjects are covered in Kalen Delaney's "Inside
SQL Server 2000". It was from her SQL7 book, that I learnt to use
paranthesis with JOIN.
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Erland Sommarskog <sommar@.algonet.se> wrote in
news:Xns949B671D1EA5Yazorman@.127.0.0.1:

> As Adam pointed out, the problem with the second query is that the
> derived table is not correlated with the updated table, so you get a
> magnificent cross join, and I'm not surprise that it did not complete
> in 20 minutes.
Thanks to both of you for the answers. I will check out those books on
Amazon.
Regarding this join:
Update PositionsEOM
Set MktVal = Derived.MktVal From
(Select PM.Price * PositionsEOM.Quantity_ASD As MktVal
From PositionsEOM Inner Join [Price Master] As PM On
PositionsEOM.Fund_ID = PM.Fund_ID And
PositionsEOM.Position_Date = PM.Price_Date) As Derived
Why is this a cross join? It has the phrase "Inner Join" in it (with
two conditions, hence the AND). I'm confused...
Thanks.
David Walker|||Hi David,
Thanks for you update.
First, I recommend the book I always use: <Inside SQL Server 2000> by Kalen
Delaney.
Second, let's take a little example, you could run in you Query Analyzer:
use pubs
go
if exists (select * from dbo.sysobjects where id =
object_id(N'[dbo].[authorsx]') and OBJECTPROPERTY(id, N'IsUserTable'
) = 1)
drop table [dbo].[authorsx]
GO
CREATE TABLE [dbo].[authorsx] (
[au_id] [id] NOT NULL ,
[au_lname] [varchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT N
ULL ,
[au_fname] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT N
ULL ,
[phone] [char] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[address] [varchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[city] [varchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[state] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[zip] [char] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[contract] [bit] NOT NULL
) ON [PRIMARY]
GO
insert into authorsx select * from authors
Then
SET SHOWPLAN_TEXT ON
go
update authors
set authors.state=authorsx.state
from authors inner join authorsx
on authors.au_id=authorsx.au_id
go
update authors
set authors.state=xxx.state
from
(select authors.state
from authors inner join authorsx
on authors.au_id=authorsx.au_id) as xxx
--execution plan of first one ( similar to your first query)
|--Clustered Index
Update(OBJECT

[pubs].[dbo].[authors].[UPKCL_auidind]),
SET

[authors].[state]=[authorsx].[state]))
|--Top(ROWCOUNT est 0)
|--Sort(DISTINCT ORDER BY

[Bmk1000] ASC))
|--Nested Loops(Inner Join, OUTER
REFERENCES

[authorsx].[au_id]))
|--Table Scan(OBJECT

[pubs].[dbo].[authorsx]))
|--Clustered Index
Seek(OBJECT

[pubs].[dbo].[authors].[UPKCL_auidind]),
SEEK

[authors].[au_id]=[authorsx].[au_id]) ORDERED FORWARD)
--execution plan of the second one (simular to your second query)
|--Clustered Index
Update(OBJECT

[pubs].[dbo].[authors].[UPKCL_auidind]),
SET

[authors].[state]=[authors].[state]))
|--Top(ROWCOUNT est 0)
|--Sort(DISTINCT ORDER BY

[Bmk1000] ASC))
|--Nested Loops(Inner Join)
|--Nested Loops(Inner Join, OUTER
REFERENCES

[authorsx].[au_id]))
| |--Table Scan(OBJECT

[pubs].[dbo].[authorsx]))
| |--Clustered Index
Seek(OBJECT

[pubs].[dbo].[authors].[UPKCL_auidind]),
SEEK

[authors].[au_id]=[authorsx].[au_id]) ORDERED FORWARD)
|--Index
Scan(OBJECT

[pubs].[dbo].[authors].[test_index]), ORDERED F
ORWARD)
go
You will notice that there are tow nested loops in your second query but
there is only one in the first query.
So, in you last query, when update one row in your table authors, you have
to scan the table 'authorsx' one time. this is just like how the cross join
works:
From the SQL Server Books Online:
A cross join that does not have a WHERE clause produces the Cartesian
product of the tables involved in the join. The size of a Cartesian product
result set is the number of rows in the first table multiplied by the
number of rows in the second table. This is an example of a Transact-SQL
cross join:
USE pubs
SELECT au_fname, au_lname, pub_name
FROM authors CROSS JOIN publishers
ORDER BY au_lname DESC
The result set contains 184 rows (authors has 23 rows and publishers has 8;
23 multiplied by 8 equals 184).
Hope this helps.
Best regards
Baisong Wei
Microsoft Online Support
----
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
Please reply to newsgroups only. Thanks.|||Thanks for the response. I'll have to study this to understand why, when I
say "Inner Join", I get a cross join. I believe you, I just have to figure
it out in order to understand it.
Thanks again to all for helping me with the detailed responses.
David Walker|||I think I can sum it up for you quite simply:
If you INNER JOIN but fail to provide a correlation (e.g. this thing in this
table = this thing in that table), the server has no way of knowing which
row in each table belongs with each row in the other table. So instead it
does the only option left: It pairs EVERY row of each table with EVERY row
of the other table.
Does that make sense?
"DW" <None> wrote in message news:uv8SpTI$DHA.1456@.TK2MSFTNGP09.phx.gbl...
> Thanks for the response. I'll have to study this to understand why, when
I
> say "Inner Join", I get a cross join. I believe you, I just have to
figure
> it out in order to understand it.
> Thanks again to all for helping me with the detailed responses.
> David Walker|||Could you guys, who have been so helpful, please explain how the cross join
comes about in the "derived" case?
It's clearly there, I just want to know how to align my thinking so that
it's obvious to me!!!
Thanks.
David Walker|||"Adam Machanic" <amachanic@.air-worldwide.nospamallowed.com> wrote in
news:eIwyUbI$DHA.392@.TK2MSFTNGP12.phx.gbl:

> I think I can sum it up for you quite simply:
> If you INNER JOIN but fail to provide a correlation (e.g. this thing
> in this table = this thing in that table), the server has no way of
> knowing which row in each table belongs with each row in the other
> table. So instead it does the only option left: It pairs EVERY row
> of each table with EVERY row of the other table.
> Does that make sense?
>
Yes, that helps. It's becoming clearer as I read these responses.
So where I have:
Set tblA.field = derivedtbl.field from (inner join) as derivedtbl
...doesn't "correlate" the Set statement?
David Walker|||"DW" <None> wrote in message news:#7CLN2I$DHA.2072@.TK2MSFTNGP11.phx.gbl...
> Set tblA.field = derivedtbl.field from (inner join) as derivedtbl
> ...doesn't "correlate" the Set statement?
No; roughly translated to English, that says ONLY: "Set the value of
tblA.field to the value of derivedtbl.field."
What you want it to say is: "Set the value of tblA.field to the value of
derivedtbl.field when the row to update in tblA is identified as a row that
corresponds to a row in derivedtbl." (or even more precisely, "Update the
value of tblA.field only when it is identified as a corresponding row to one
in derivedtbl, and use only the value from derivedtbl.field from that
corresponding row to update tblA.field.")
In other words, the first statement gives the database no indication
about how or in what cases to update tblA.field; so it does so for every row
of tblA, and it does so once for every value it sees in derivedtbl. It has
no other choice. What you want to do instead is something like:
UPDATE tblA
SET tblA.field = derivedtbl.field
FROM TblA
INNER JOIN (SELECT field, id FROM ...) derivedtbl ON TblA.id =
derivedtbl.id
In this case, you're saying that for every row in derivedtbl, the server
should look for the row (or those rows) in tblA where the value of the
column "id" is the same as the value of the column "id" in derivedtbl. So
now the server knows which rows match and can therefore update the correct
rows, only once, rather than every row, for every possible value.

Wednesday, March 21, 2012

Query training -- Complex queries

No comments:

Post a Comment

Query Syntax

Blog Archive

About Me