entry product quality
1 A 80
2 A 70
3 A 80
4 B 60
5 B 90
6 C 80
7 D 80
8 A 50
9 C 70
I'm looking for a way to find the average "quality" value for a
SEQUENTIAL GROUPING of the same Product. For example, I need an
average of Entry 1+2+3 (because this is the first grouping of the same
product type), but I do NOT want that average to include row 8 (which is
also Product A, but in a different "group").
I'm sure it can be done (because I can describe it!), but I'll be a
monkey's uncle if I can figure out how. I would imagine it would
involve some sort of running tally that references the next record as
it goes... to see if the product type has changed. Perhaps use of a
temporary table?
Muchas gracias!!
Cy.|||Easy way ... cursor or loop through, as you stated.
WARNING - THE FOLLOWING IS AN UNTESTED, HALF-BAKED IDEA -
CONSUME AT YOUR OWN RISK
The set-oriented way would require the addition of a grouping column,
initially null and populated via update statements from a temp table.
Use something like this to generate a set of the grouping transition rows:
-- GENERATE GROUP IDS: ONE ROW PER GROUP, KEYED BY THE LAST ENTRY IN EACH GROUP
select
    identity(int, 1, 1) as groupid,
    a.product,
    a.entry
into #groupings
from mytable a
left join mytable b on b.entry = a.entry + 1
where b.entry is null          -- final row of the table ends a group
   or b.product != a.product   -- next row starts a different product
-- UPDATE THE BASE TABLE FOR THE MAX ENTRY IN EACH GROUP
update a
set groupid = g.groupid
from mytable a
join #groupings g on a.entry = g.entry
-- UPDATE THE PRIOR ENTRIES IN EACH GROUP (nearest transition after the row)
update a
set a.groupid = (select top 1 g.groupid
                 from #groupings g
                 where g.entry > a.entry
                 order by g.entry)
from mytable a
where a.groupid is null
-- QUERY TO RETURN THE RESULTS YOU ARE LOOKING FOR
select groupid, min(product), max(entry), min(entry),
       sum(quality), count(*), avg(quality)
from mytable
group by groupid
----
I am not so sure about the 2nd update here, as I am tired and going to bed
soon. You may also need to join to the grouping temp table on the product
and also put a not exists() in the where clause, but you may be covered by
the simple is null to prevent multiple updates.
Let me know how you make out, and if this points you in a good direction or
throws you off track.
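The "running tally" the original question describes can also be sketched procedurally outside the database. Here is an illustrative Python version of that idea (the list of tuples stands in for a cursor over mytable; none of these names come from the posts):

```python
# The "running tally": walk the rows in entry order, start a new group
# whenever the product changes, then average the quality per group.
# The data mirrors the sample table from the original post.
rows = [(1, 'A', 80), (2, 'A', 70), (3, 'A', 80), (4, 'B', 60), (5, 'B', 90),
        (6, 'C', 80), (7, 'D', 80), (8, 'A', 50), (9, 'C', 70)]

def sequential_group_averages(rows):
    groups = []                      # list of (product, [quality, ...])
    prev = None
    for entry, product, quality in rows:
        if product != prev:          # product changed -> new sequential group
            groups.append((product, []))
        groups[-1][1].append(quality)
        prev = product
    return [(p, sum(qs) / len(qs)) for p, qs in groups]

for product, avg in sequential_group_averages(rows):
    print(product, round(avg, 2))
```

This is exactly what a cursor/WHILE-loop solution would do row by row; the set-based SQL above just expresses the same transition detection declaratively.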
<cyrus.kapadia@.us.pm.com> wrote in message
news:1102562637.046747.292110@.c13g2000cwb.googlegr oups.com...
> [original message snipped]|||<cyrus.kapadia@.us.pm.com> wrote in message
news:1102562637.046747.292110@.c13g2000cwb.googlegr oups.com...
> [original message snipped]
CREATE TABLE ProductEntries
(
product_entry INT NOT NULL PRIMARY KEY,
product_code CHAR(1) NOT NULL,
product_quality INT NOT NULL
)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (1, 'A', 80)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (2, 'A', 70)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (3, 'A', 80)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (4, 'B', 60)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (5, 'B', 90)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (6, 'C', 80)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (7, 'D', 80)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (8, 'A', 50)
INSERT INTO ProductEntries (product_entry, product_code, product_quality)
VALUES (9, 'C', 70)
SELECT PR.product_code AS product_code,
PR.start_product_entry AS start_product_entry,
MAX(P.product_entry) AS end_product_entry,
AVG(CAST(P.product_quality AS DECIMAL)) AS avg_product_quality
FROM (SELECT MIN(PE.product_entry) AS start_product_entry,
PE.next_product_entry AS end_product_entry,
PE.product_code
FROM (SELECT P1.product_entry, P1.product_code,
MIN(P2.product_entry) AS next_product_entry
FROM ProductEntries AS P1
LEFT OUTER JOIN
ProductEntries AS P2
ON P2.product_entry > P1.product_entry AND
P2.product_code <> P1.product_code
GROUP BY P1.product_entry, P1.product_code) AS PE
GROUP BY PE.product_code, PE.next_product_entry) AS PR
INNER JOIN
ProductEntries AS P
ON P.product_code = PR.product_code AND
P.product_entry >= PR.start_product_entry AND
(PR.end_product_entry IS NULL OR
P.product_entry < PR.end_product_entry)
GROUP BY PR.product_code, PR.start_product_entry
ORDER BY start_product_entry
product_code start_product_entry end_product_entry avg_product_quality
A 1 3 76.666666
B 4 5 75.000000
C 6 6 80.000000
D 7 7 80.000000
A 8 8 50.000000
C 9 9 70.000000
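For anyone wanting to sanity-check the self-join logic above without a SQL Server instance handy, the same derivation (each group is bounded by the first later entry with a different product) can be replayed in Python. A rough sketch; the names are invented for illustration:

```python
# Replay of the self-join derivation: next_product_entry = first later
# entry whose product differs; rows sharing (product, boundary) form one
# group, reported as (product, start, end, avg_quality).
rows = [(1, 'A', 80), (2, 'A', 70), (3, 'A', 80), (4, 'B', 60), (5, 'B', 90),
        (6, 'C', 80), (7, 'D', 80), (8, 'A', 50), (9, 'C', 70)]

def group_report(rows):
    bounds = {}
    for e, p, _ in rows:
        later = [e2 for e2, p2, _ in rows if e2 > e and p2 != p]
        bounds[e] = min(later) if later else None   # None = NULL boundary
    groups = {}
    for e, p, q in rows:
        groups.setdefault((p, bounds[e]), []).append((e, q))
    report = []
    for (p, _), members in groups.items():
        entries = [e for e, _ in members]
        quals = [q for _, q in members]
        report.append((p, min(entries), max(entries), sum(quals) / len(quals)))
    return sorted(report, key=lambda r: r[1])   # order by start entry
```

Running this reproduces the six result rows shown above.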
--
JAG|||Sure, that may work as well.
"John Gilson" <jag@.acm.org> wrote in message
news:5zQtd.72060$Vk6.20781@.twister.nyc.rr.com...
> [quoted post snipped]|||That is too much work! Let's move the average calculation into a
scalar subquery that will be done last, after all the clusters are
found. The little-used = ALL predicate can replace a lot of your
logic. And we pull up the usual Sequence auxiliary table.
SELECT prod_code, MIN(start) AS start, finish,
(SELECT AVG(CAST(prod_quality AS DECIMAL(8,4)))
FROM ProductEntries AS P3
WHERE P3.prod_entry
BETWEEN MIN(start)
AND X.finish) AS avg_quality
FROM (SELECT P1.prod_code, S1.seq, MAX(S2.seq) AS finish
FROM ProductEntries AS P1, Sequence AS S1, Sequence AS S2
WHERE S1.seq <= S2.seq
AND S2.seq <= (SELECT MAX(prod_entry) FROM ProductEntries)
AND P1.prod_code
= ALL (SELECT P2.prod_code
FROM ProductEntries AS P2
WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq)
GROUP BY P1.prod_code, S1.seq)
AS X (prod_code, start, finish)
GROUP BY prod_code, finish;
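The = ALL predicate is doing the cluster test here: a span [S1.seq, S2.seq] survives only if every entry inside it carries the same product code. That condition in miniature, in Python (data from the sample table; the function name is invented):

```python
# prod_code per entry, taken from the sample data in the thread
products = {1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B',
            6: 'C', 7: 'D', 8: 'A', 9: 'C'}

def is_single_cluster(start, finish):
    # Mirrors: P1.prod_code = ALL (SELECT P2.prod_code FROM ProductEntries
    #          WHERE prod_entry BETWEEN start AND finish)
    return len({products[e] for e in range(start, finish + 1)}) == 1
```

The outer GROUP BY then keeps only the maximal such spans.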
Another version requires two sentinel values
--
INSERT INTO ProductEntries VALUES (0, '?', 0);
INSERT INTO ProductEntries VALUES (10, '?', 0);
SELECT DISTINCT P1.prod_code, S1.seq AS start, S2.seq AS finish,
(SELECT AVG(CAST(prod_quality AS DECIMAL(8,4)))
FROM ProductEntries AS P3
WHERE P3.prod_entry
BETWEEN S1.seq AND S2.seq) AS avg_quality
FROM ProductEntries AS P1,
(SELECT seq FROM Sequence
UNION ALL SELECT 0) AS S1, Sequence AS S2
WHERE S1.seq <= S2.seq
AND S2.seq <= (SELECT MAX(prod_entry) + 1 FROM ProductEntries)
AND P1.prod_code
<> (SELECT P3.prod_code
FROM ProductEntries AS P3
WHERE P3.prod_entry = S1.seq - 1)
AND P1.prod_code
<> (SELECT P4.prod_code
FROM ProductEntries AS P4
WHERE P4.prod_entry = S2.seq + 1)
AND P1.prod_code
= ALL (SELECT P2.prod_code
FROM ProductEntries AS P2
WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq);|||"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1102618417.146169.127100@.c13g2000cwb.googlegr oups.com...
> [quoted post snipped]
Less work? Debatable. Also, this won't work if the product_entry values
aren't consecutive.
--
JAG|||>> Less work? Debatable. <<
Fewer nesting levels should be a bit faster. But trying to find the
start and finish points is going to get really bad as the number of rows
increases.
>> Also, this won't work if the product_entry values
aren't consecutive. <<
It depends on the sequence of tests having no gaps.
This is one that might be better done with a cursor and a WHILE loop
that accumulates a count and total of each quality test to a working
table.
--CELKO--
Please post DDL in a human-readable format and not a machine-generated
one. This way people do not have to guess what the keys, constraints,
Declarative Referential Integrity, datatypes, etc. in your schema are.
Sample data is also a good idea, along with clear specifications.
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!|||"--CELKO--" <remove.jcelko212@.earthlink.net> wrote in message
news:1102630139.1f7348d9b8d0e1527f37d16587e3cecc@.t eranews...
> [quoted post snipped]
You could be right but bite your tongue!
--
JAG
|||Actually if this was an ongoing query: performance-wise it would be best to
have a trigger or "phase shift" grouping id set as part of the insert
operation.
"John Gilson" <jag@.acm.org> wrote in message
news:bo4ud.74477$Vk6.62153@.twister.nyc.rr.com...
> [quoted post snipped]|||>> Actually if this was an ongoing query: performance-wise it would be
best to have a trigger or "phase shift" grouping id set as part of the
insert operation. <<
My impulse is for a "cluster group number" column as each test is done.
Look to see if the current quality test is on the same product as the
most recent one, etc.
--CELKO--
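The "cluster group number" idea (compare each incoming quality test to the most recent row and bump the group id on a product change) is easy to prototype. A hypothetical sketch; the class and column names are invented, not from any post:

```python
from statistics import mean

# Assign a cluster group number at insert time, as Celko suggests:
# reuse the current group id while the product matches the most recent
# row, start a new group id when it changes.
class ProductLog:
    def __init__(self):
        self.rows = []            # (entry, product, quality, groupid)
        self._next_group = 0

    def insert(self, entry, product, quality):
        if not self.rows or self.rows[-1][1] != product:
            self._next_group += 1     # product changed -> new cluster
        self.rows.append((entry, product, quality, self._next_group))

log = ProductLog()
for e, p, q in [(1, 'A', 80), (2, 'A', 70), (3, 'A', 80), (4, 'B', 60),
                (5, 'B', 90), (6, 'C', 80), (7, 'D', 80), (8, 'A', 50),
                (9, 'C', 70)]:
    log.insert(e, p, q)

# With the group id stored, averaging is a plain GROUP BY on groupid.
by_group = {}
for e, p, q, g in log.rows:
    by_group.setdefault(g, []).append(q)
averages = {g: mean(qs) for g, qs in by_group.items()}
```

In SQL Server this would live in the INSERT trigger or procedure, so the expensive span-finding queries never need to run at report time.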