ramblings on PHP, SQL, the web, politics, ultimate frisbee and what else is on in my life
back

Interesting twist on the groupwise problem

The following is a little SQL challenge I have so far been unable to complte myself. Lets say I have a table with transactions for different products.

CREATE TABLE transactions (
  product_id int(11) NOT NULL,
  transaction_id int(11) NOT NULL,
  price decimal(18,2) NOT NULL
);

The transaction_id's increase with time. In reality there is a transaction_date column, but I wanted to keep the DDL as portable as possible for this example. Here is some sample data.

INSERT INTO transactions VALUES (1, 1, '2.00');
INSERT INTO transactions VALUES (1, 2, '3.00');
INSERT INTO transactions VALUES (1, 4, '3.00');
INSERT INTO transactions VALUES (1, 5, '1.00');
INSERT INTO transactions VALUES (2, 6, '3.00');
INSERT INTO transactions VALUES (2, 7, '5.00');
INSERT INTO transactions VALUES (2, 8, '3.00');
INSERT INTO transactions VALUES (2, 9, '3.00');

Now a query to get the last price for each product could look like this. Note I kinda like subqueries for readability. There are numerous ways of doing things. The bulk of which Jan compiled on his website.

SELECT product_id, transaction_id, price
FROM transactions
WHERE transaction_id
IN (
  SELECT MAX( transaction_id )
  FROM transactions
  GROUP BY product_id
)

I might also be interested in seeing how the market is developing so I want to fetch the second to last price per product_id as well. I did not really consider efficiency here yet, but the following works (tm) from a functional perspective.

SELECT product_id, transaction_id, price
FROM transactions
WHERE transaction_id
IN (
  SELECT MAX( transaction_id )
  FROM transactions
  WHERE transaction_id NOT IN (
    SELECT MAX( transaction_id )
    FROM transactions
    GROUP BY product_id
  )
  GROUP BY product_id
)

Next up I come to the interesting twist. What if I do not want the second to last price per product_id? What if I actually want the last price per product_id that is different from the last transaction_id for that product_id?

For a single product that is fairly easy if we know the product_id we are interested along with the last price:

SELECT price
FROM transactions
WHERE transaction_id
IN (
  SELECT MAX( transaction_id )
  FROM transactions
  WHERE product_id =1
  AND price != 1.00
)

What I want though is get this price for all product_id's in a single query, because doing multiple queries unsuprisingly gave me horrible performance. Any suggestions would be greatly appreciated and could lead to a reduction in open items of someones wishlist :-)

Comments



Re: Interesting twist on the groupwise problem

I'm not sure this is what you're looking for, and I'm sure you can collapse at least one of the subqueries, but as a quick'n'dirty solution this one might do (tested with pgsql):


SELECT product_id, transaction_id, price
FROM transactions A
WHERE A.transaction_id
IN 
(
  SELECT MAX( transaction_id )
    FROM transactions B
   WHERE A.product_id=B.product_id 
     AND B.price NOT IN (
        SELECT price
          FROM transactions C
         WHERE C.product_id = B.product_id
           AND transaction_id = (
        SELECT MAX( transaction_id )
              FROM transactions D
                 WHERE D.product_id = B.product_id
       )
     )
  GROUP BY product_id
)

Re: Interesting twist on the groupwise problem

Seems to do the trick perfectly!
Now thats its there it does not look like rocket science, but I was just not able to do the necessary steps on my own.

I might have to rewrite things away from the subquery entirely it seems if I want to retain MySQL 4.0 compatibility. Though I only need that because thats the MySQL version on our current online demo server. Might just update the beast.

Thx Lorenzo!

Re: Interesting twist on the groupwise problem

SELECT r.product_id, r.transaction_id, r.price as recentPrice,
p.transaction_id as previousTransaction, p.price AS previousPrice

FROM transactions r

INNER JOIN (SELECT product_id, MAX(transaction_id) AS transaction_id FROM transactions
GROUP BY product_id) AS t
ON t.product_id = r.product_id AND t.transaction_id = r.transaction_id

-- All previous transactions for this product at a differing price
INNER JOIN transactions p ON p.product_id = r.product_id
AND p.transaction_id < r.transaction_id
AND p.price != r.price

-- Want no transactions between the recent transaction and the most recent price change
LEFT JOIN transactions u ON u.product_id = r.product_id
AND u.transaction_id > p.transaction_id AND u.transaction_id < r.transaction_id
AND u.price != r.price

WHERE u.transaction_id IS NULL

I think.

Re: Interesting twist on the groupwise problem

Also a very nice solution. Still uses a subquery (but like I said I should just upgrade that demo server) but has the nice touch of also fetching the last price. Though in order to also fetch rows with no previous transaction the second INNER JOIN needs to be modified to be a LEFT JOIN.

In order to test this I added a new row to the table.

INSERT INTO `transactions` VALUES (3, 10, '2.00');

Re: Interesting twist on the groupwise problem

Think probably could factor out the subquery.. with a join and IS NULL to ensure no higher transaction_ids are available.

And yes, if want the rows without previous differing prices the inner should be a left.

Re: Interesting twist on the groupwise problem

To make things slightly more interesting I changed the last price for product_id to 6.00, so that there would be both increasing and dropping prices.

I then inlined the subquery. Since I do not need the transaction_id's things worked out as follows:

SELECT r.product_id, MAX(r.price) last, MAX(p.price) previous
FROM transactions r

LEFT JOIN transactions t ON r.product_id = t.product_id
AND r.transaction_id < t.transaction_id
AND r.price != t.price

LEFT JOIN transactions p ON p.product_id = r.product_id
AND p.transaction_id < r.transaction_id
AND p.price != r.price

LEFT JOIN transactions u ON u.product_id = r.product_id
AND u.transaction_id > p.transaction_id
AND u.transaction_id < r.transaction_id
AND u.price != r.price

WHERE u.product_id IS NULL AND t.product_id IS NULL
GROUP BY product_id

Re: Interesting twist on the groupwise problem

*sigh* .. it now seems that its impossible to come up with a solution that works both on MySQL 4.0.x and Oracle 8i.

Since the latter is more important (and usually harder to upgrade) I had to rewrite things from ANSI style joins to alternatively using Oracle style outer joins. However Oracle 8i seems to be limited in the number of outer joins per table per query. So I went in and rewrite 2 of the left outer joins to sub queries. Thereby I killed off MySQL 4.0.x support. So it goes.

The following queries also handle a nasty side effect which is caused by transactions like the following:

INSERT INTO transactions VALUES (4, 11, '18.00');
INSERT INTO transactions VALUES (4, 12, '16.00');
INSERT INTO transactions VALUES (4, 13, '15.00');
INSERT INTO transactions VALUES (4, 14, '15.00');
INSERT INTO transactions VALUES (4, 15, '18.00');

So here they are:

SELECT L.product_id, MAX( L.price ) last, MAX( O.price ) previous
FROM transactions L
LEFT JOIN transactions O ON O.product_id = L.product_id
AND O.transaction_id < L.transaction_id
AND O.price != L.price
AND O.transaction_id
IN (

SELECT MAX( B.transaction_id )
FROM transactions B
WHERE O.product_id = B.product_id
AND B.price NOT
IN (

SELECT C.price
FROM transactions C
WHERE O.product_id = C.product_id
AND C.transaction_id = (
SELECT MAX( D.transaction_id )
FROM transactions D
WHERE O.product_id = D.product_id )
)
GROUP BY O.product_id
)
WHERE L.transaction_id = (
SELECT MAX( E.transaction_id )
FROM transactions E
WHERE E.product_id = L.product_id )
GROUP BY L.product_id

Optionally using Oracle style joins is just a single if-statement in the code and should work out to an SQL query like the following:

SELECT L.product_id, MAX( L.price ) last, MAX( O.price ) previous
FROM transactions L, transactions O
WHERE O.product_id (+) = L.product_id
AND O.transaction_id < L.transaction_id
AND O.price != L.price
AND O.transaction_id
IN (

SELECT MAX( B.transaction_id )
FROM transactions B
WHERE O.product_id = B.product_id
AND B.price NOT
IN (

SELECT C.price
FROM transactions C
WHERE O.product_id = C.product_id
AND C.transaction_id = (
SELECT MAX( D.transaction_id )
FROM transactions D
WHERE O.product_id = D.product_id )
)
GROUP BY O.product_id
)
WHERE L.transaction_id = (
SELECT MAX( E.transaction_id )
FROM transactions E
WHERE E.product_id = L.product_id )
GROUP BY L.product_id