Hi all, Iam new to DBA. I have tables with created indexes on all most all of
the columns. millions of records exists in this table. when i run the below
query, it is taking nearly 3 minutes to execute. How can i minimise the query
time ?
SELECT TOP 100 count(*) as cntDuplicatePat, FIRST_NAME, SURNAME, ADDRESS_1,
POSTCODE, DATE_OF_BIRTH
From TBL_Patient
where Merged_Into_ID is NULL AND inactive='N' and
PATIENT_ID NOT IN
(SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS <> 2)
GROUP BY First_name, Surname, Address_1, Postcode, Date_of_birth
HAVING (COUNT(*) >=2)
Also i would like to know how a query time can be minimized (on what factors
the query time depends?)On Wed, 2 Aug 2006 06:02:02 -0700, Vikas
<Vikas@.discussions.microsoft.com> wrote:
>Hi all, Iam new to DBA. I have tables with created indexes on all most all of
>the columns. millions of records exists in this table. when i run the below
>query, it is taking nearly 3 minutes to execute. How can i minimise the query
>time ?
Creating indexes on most of the columns is not usually the answer.
Creating them on the right columns, and on the right sets of columns
(with the columns in the right order) requires understanding of what
queries will be doing.
>SELECT TOP 100 count(*) as cntDuplicatePat, FIRST_NAME, SURNAME, ADDRESS_1,
>POSTCODE, DATE_OF_BIRTH
> From TBL_Patient
> where Merged_Into_ID is NULL AND inactive='N' and
> PATIENT_ID NOT IN
> (SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS <> 2)
> GROUP BY First_name, Surname, Address_1, Postcode, Date_of_birth
> HAVING (COUNT(*) >=2)
>Also i would like to know how a query time can be minimized (on what factors
>the query time depends?)
Using TOP without ORDER BY makes little sense. I would either drop
the TOP 100, or add "ORDER BY cntDuplicatePat DESC".
I would not expect indexing of TBL_Patient to help much with this
query. The only way it might is if a very small percentage of
patients were active, or a very small percentage satisfied the "
Merged_Into_ID is NULL" test.
However, indexing of TBL_PATIENT_DUPLICATES is another matter. How
many rows are in TBL_PATIENT_DUPLICATES? How many columns? How many
satisfy the test STATUS <> 2? Does it have an index on PATIENT_ID_1?
You might try a non-clustered index on the column pair (PATIENT_ID_1,
STATUS).
It probably will not perform any differently, but you could also try a
NOT EXISTS test in place of the IN.
SELECT count(*) as cntDuplicatePat,
FIRST_NAME, SURNAME, ADDRESS_1, POSTCODE, DATE_OF_BIRTH
FROM TBL_Patient as P
WHERE Merged_Into_ID is NULL
AND inactive='N'
AND NOT EXISTS
(SELECT * FROM TBL_PATIENT_DUPLICATES as D
WHERE P.PATIENT_ID = D.PATIENT_ID_1
AND STATUS <> 2)
GROUP BY First_name, Surname, Address_1, Postcode, Date_of_birth
Roy Harvey
Beacon Falls, CT|||Thanks Roy!
I could get some stuff from your side. I also want to know how to
design a database so that query time is minimized and also the updates are
faster. can you prefer any e-book or any single url please.
Vikas
"Roy Harvey" wrote:
> On Wed, 2 Aug 2006 06:02:02 -0700, Vikas
> <Vikas@.discussions.microsoft.com> wrote:
> >Hi all, Iam new to DBA. I have tables with created indexes on all most all of
> >the columns. millions of records exists in this table. when i run the below
> >query, it is taking nearly 3 minutes to execute. How can i minimise the query
> >time ?
> Creating indexes on most of the columns is not usually the answer.
> Creating them on the right columns, and on the right sets of columns
> (with the columns in the right order) requires understanding of what
> queries will be doing.
> >SELECT TOP 100 count(*) as cntDuplicatePat, FIRST_NAME, SURNAME, ADDRESS_1,
> >POSTCODE, DATE_OF_BIRTH
> > From TBL_Patient
> > where Merged_Into_ID is NULL AND inactive='N' and
> > PATIENT_ID NOT IN
> > (SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS <> 2)
> > GROUP BY First_name, Surname, Address_1, Postcode, Date_of_birth
> > HAVING (COUNT(*) >=2)
> >
> >Also i would like to know how a query time can be minimized (on what factors
> >the query time depends?)
> Using TOP without ORDER BY makes little sense. I would either drop
> the TOP 100, or add "ORDER BY cntDuplicatePat DESC".
> I would not expect indexing of TBL_Patient to help much with this
> query. The only way it might is if a very small percentage of
> patients were active, or a very small percentage satisfied the "
> Merged_Into_ID is NULL" test.
> However, indexing of TBL_PATIENT_DUPLICATES is another matter. How
> many rows are in TBL_PATIENT_DUPLICATES? How many columns? How many
> satisfy the test STATUS <> 2? Does it have an index on PATIENT_ID_1?
> You might try a non-clustered index on the column pair (PATIENT_ID_1,
> STATUS).
> It probably will not perform any differently, but you could also try a
> NOT EXISTS test in place of the IN.
> SELECT count(*) as cntDuplicatePat,
> FIRST_NAME, SURNAME, ADDRESS_1, POSTCODE, DATE_OF_BIRTH
> FROM TBL_Patient as P
> WHERE Merged_Into_ID is NULL
> AND inactive='N'
> AND NOT EXISTS
> (SELECT * FROM TBL_PATIENT_DUPLICATES as D
> WHERE P.PATIENT_ID = D.PATIENT_ID_1
> AND STATUS <> 2)
> GROUP BY First_name, Surname, Address_1, Postcode, Date_of_birth
> Roy Harvey
> Beacon Falls, CT
>|||Sorry, there are lots of participants here with a catalog of links to
articles and books, but I'm not one of them.
The place to start in database design is normalization. Until you
understand that, and follow it, and do it by reflex, any other attempt
to design for minimized query time is probably going to create more
problems than it answers. Once you have that, then most of it is
proper indexing and queries.
Good luck!
Roy
On Wed, 2 Aug 2006 07:48:01 -0700, Vikas
<Vikas@.discussions.microsoft.com> wrote:
>Thanks Roy!
> I could get some stuff from your side. I also want to know how to
>design a database so that query time is minimized and also the updates are
>faster. can you prefer any e-book or any single url please.
>Vikas|||Vikas wrote:
> Hi all, Iam new to DBA. I have tables with created indexes on all most all of
> the columns. millions of records exists in this table. when i run the below
> query, it is taking nearly 3 minutes to execute. How can i minimise the query
> time ?
> SELECT TOP 100 count(*) as cntDuplicatePat, FIRST_NAME, SURNAME, ADDRESS_1,
> POSTCODE, DATE_OF_BIRTH
> From TBL_Patient
> where Merged_Into_ID is NULL AND inactive='N' and
> PATIENT_ID NOT IN
> (SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS <> 2)
> GROUP BY First_name, Surname, Address_1, Postcode, Date_of_birth
> HAVING (COUNT(*) >=2)
> Also i would like to know how a query time can be minimized (on what factors
> the query time depends?)
I'm curious about one part of your WHERE clause:
PATIENT_ID NOT IN
(SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS <> 2)
Isn't this the same as:
PATIENT_ID IN
(SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS = 2)
Written the first way, you're most likely looking at an index or table
scan, whereas the second method will, assuming "status = 2" is selective
enough, use an index seek.
Also, if the second method is true, you might consider doing this as an
INNER JOIN instead of an IN.
--
Tracy McKibben
MCDBA
http://www.realsqlguy.com|||On Thu, 03 Aug 2006 08:55:52 -0500, Tracy McKibben
<tracy@.realsqlguy.com> wrote:
>I'm curious about one part of your WHERE clause:
> PATIENT_ID NOT IN
> (SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS <> 2)
>Isn't this the same as:
> PATIENT_ID IN
> (SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS = 2)
>Written the first way, you're most likely looking at an index or table
>scan, whereas the second method will, assuming "status = 2" is selective
>enough, use an index seek.
>Also, if the second method is true, you might consider doing this as an
>INNER JOIN instead of an IN.
Suppose there is NO row in TBL_PATIENT_DUPLICATES with a specific
PATIEND_ID. With the original version the NOT IN will be satisfied.
With the alternate version it will not be satisfied.
Roy Harvey
BeacoN Falls, CT|||Roy Harvey wrote:
> On Thu, 03 Aug 2006 08:55:52 -0500, Tracy McKibben
> <tracy@.realsqlguy.com> wrote:
>> I'm curious about one part of your WHERE clause:
>> PATIENT_ID NOT IN
>> (SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS <> 2)
>> Isn't this the same as:
>> PATIENT_ID IN
>> (SELECT PATIENT_ID_1 FROM TBL_PATIENT_DUPLICATES WHERE STATUS = 2)
>> Written the first way, you're most likely looking at an index or table
>> scan, whereas the second method will, assuming "status = 2" is selective
>> enough, use an index seek.
>> Also, if the second method is true, you might consider doing this as an
>> INNER JOIN instead of an IN.
> Suppose there is NO row in TBL_PATIENT_DUPLICATES with a specific
> PATIEND_ID. With the original version the NOT IN will be satisfied.
> With the alternate version it will not be satisfied.
> Roy Harvey
> BeacoN Falls, CT
Doh! Makes perfect sense, thanks...
Tracy McKibben
MCDBA
http://www.realsqlguy.com
Monday, February 20, 2012
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment