The following is a little SQL challenge I have so far been unable to complte myself. Lets say I have a table with transactions for different products.
CREATE TABLE transactions ( product_id int(11) NOT NULL, transaction_id int(11) NOT NULL, price decimal(18,2) NOT NULL );
The transaction_id's increase with time. In reality there is a transaction_date column, but I wanted to keep the DDL as portable as possible for this example. Here is some sample data.
INSERT INTO transactions VALUES (1, 1, '2.00'); INSERT INTO transactions VALUES (1, 2, '3.00'); INSERT INTO transactions VALUES (1, 4, '3.00'); INSERT INTO transactions VALUES (1, 5, '1.00'); INSERT INTO transactions VALUES (2, 6, '3.00'); INSERT INTO transactions VALUES (2, 7, '5.00'); INSERT INTO transactions VALUES (2, 8, '3.00'); INSERT INTO transactions VALUES (2, 9, '3.00');
Now a query to get the last price for each product could look like this. Note I kinda like subqueries for readability. There are numerous ways of doing things. The bulk of which Jan compiled on his website.
SELECT product_id, transaction_id, price FROM transactions WHERE transaction_id IN ( SELECT MAX( transaction_id ) FROM transactions GROUP BY product_id )
I might also be interested in seeing how the market is developing so I want to fetch the second to last price per product_id as well. I did not really consider efficiency here yet, but the following works (tm) from a functional perspective.
SELECT product_id, transaction_id, price FROM transactions WHERE transaction_id IN ( SELECT MAX( transaction_id ) FROM transactions WHERE transaction_id NOT IN ( SELECT MAX( transaction_id ) FROM transactions GROUP BY product_id ) GROUP BY product_id )
Next up I come to the interesting twist. What if I do not want the second to last price per product_id? What if I actually want the last price per product_id that is different from the last transaction_id for that product_id?
For a single product that is fairly easy if we know the product_id we are interested along with the last price:
SELECT price FROM transactions WHERE transaction_id IN ( SELECT MAX( transaction_id ) FROM transactions WHERE product_id =1 AND price != 1.00 )
What I want though is get this price for all product_id's in a single query, because doing multiple queries unsuprisingly gave me horrible performance. Any suggestions would be greatly appreciated and could lead to a reduction in open items of someones wishlist :-)
I'm not sure this is what you're looking for, and I'm sure you can collapse at least one of the subqueries, but as a quick'n'dirty solution this one might do (tested with pgsql):
SELECT product_id, transaction_id, price FROM transactions A WHERE A.transaction_id IN ( SELECT MAX( transaction_id ) FROM transactions B WHERE A.product_id=B.product_id AND B.price NOT IN ( SELECT price FROM transactions C WHERE C.product_id = B.product_id AND transaction_id = ( SELECT MAX( transaction_id ) FROM transactions D WHERE D.product_id = B.product_id ) ) GROUP BY product_id )
Seems to do the trick perfectly! Now thats its there it does not look like rocket science, but I was just not able to do the necessary steps on my own.
I might have to rewrite things away from the subquery entirely it seems if I want to retain MySQL 4.0 compatibility. Though I only need that because thats the MySQL version on our current online demo server. Might just update the beast.
Thx Lorenzo!
SELECT r.product_id, r.transaction_id, r.price as recentPrice, p.transaction_id as previousTransaction, p.price AS previousPrice
FROM transactions r
INNER JOIN (SELECT product_id, MAX(transaction_id) AS transaction_id FROM transactions GROUP BY product_id) AS t ON t.product_id = r.product_id AND t.transaction_id = r.transaction_id
-- All previous transactions for this product at a differing price INNER JOIN transactions p ON p.product_id = r.product_id AND p.transaction_id < r.transaction_id AND p.price != r.price
-- Want no transactions between the recent transaction and the most recent price change LEFT JOIN transactions u ON u.product_id = r.product_id AND u.transaction_id > p.transaction_id AND u.transaction_id < r.transaction_id AND u.price != r.price
WHERE u.transaction_id IS NULL
I think.
Also a very nice solution. Still uses a subquery (but like I said I should just upgrade that demo server) but has the nice touch of also fetching the last price. Though in order to also fetch rows with no previous transaction the second INNER JOIN needs to be modified to be a LEFT JOIN.
In order to test this I added a new row to the table.
INSERT INTO `transactions` VALUES (3, 10, '2.00');
Think probably could factor out the subquery.. with a join and IS NULL to ensure no higher transaction_ids are available.
And yes, if want the rows without previous differing prices the inner should be a left.
To make things slightly more interesting I changed the last price for product_id to 6.00, so that there would be both increasing and dropping prices.
I then inlined the subquery. Since I do not need the transaction_id's things worked out as follows:
SELECT r.product_id, MAX(r.price) last, MAX(p.price) previous FROM transactions r LEFT JOIN transactions t ON r.product_id = t.product_id AND r.transaction_id < t.transaction_id AND r.price != t.price LEFT JOIN transactions p ON p.product_id = r.product_id AND p.transaction_id < r.transaction_id AND p.price != r.price LEFT JOIN transactions u ON u.product_id = r.product_id AND u.transaction_id > p.transaction_id AND u.transaction_id < r.transaction_id AND u.price != r.price WHERE u.product_id IS NULL AND t.product_id IS NULL GROUP BY product_id
*sigh* .. it now seems that its impossible to come up with a solution that works both on MySQL 4.0.x and Oracle 8i.
Since the latter is more important (and usually harder to upgrade) I had to rewrite things from ANSI style joins to alternatively using Oracle style outer joins. However Oracle 8i seems to be limited in the number of outer joins per table per query. So I went in and rewrite 2 of the left outer joins to sub queries. Thereby I killed off MySQL 4.0.x support. So it goes.
The following queries also handle a nasty side effect which is caused by transactions like the following:
INSERT INTO transactions VALUES (4, 11, '18.00'); INSERT INTO transactions VALUES (4, 12, '16.00'); INSERT INTO transactions VALUES (4, 13, '15.00'); INSERT INTO transactions VALUES (4, 14, '15.00'); INSERT INTO transactions VALUES (4, 15, '18.00');
So here they are:
SELECT L.product_id, MAX( L.price ) last, MAX( O.price ) previous FROM transactions L LEFT JOIN transactions O ON O.product_id = L.product_id AND O.transaction_id < L.transaction_id AND O.price != L.price AND O.transaction_id IN ( SELECT MAX( B.transaction_id ) FROM transactions B WHERE O.product_id = B.product_id AND B.price NOT IN ( SELECT C.price FROM transactions C WHERE O.product_id = C.product_id AND C.transaction_id = ( SELECT MAX( D.transaction_id ) FROM transactions D WHERE O.product_id = D.product_id ) ) GROUP BY O.product_id ) WHERE L.transaction_id = ( SELECT MAX( E.transaction_id ) FROM transactions E WHERE E.product_id = L.product_id ) GROUP BY L.product_id
Optionally using Oracle style joins is just a single if-statement in the code and should work out to an SQL query like the following:
SELECT L.product_id, MAX( L.price ) last, MAX( O.price ) previous FROM transactions L, transactions O WHERE O.product_id (+) = L.product_id AND O.transaction_id < L.transaction_id AND O.price != L.price AND O.transaction_id IN ( SELECT MAX( B.transaction_id ) FROM transactions B WHERE O.product_id = B.product_id AND B.price NOT IN ( SELECT C.price FROM transactions C WHERE O.product_id = C.product_id AND C.transaction_id = ( SELECT MAX( D.transaction_id ) FROM transactions D WHERE O.product_id = D.product_id ) ) GROUP BY O.product_id ) WHERE L.transaction_id = ( SELECT MAX( E.transaction_id ) FROM transactions E WHERE E.product_id = L.product_id ) GROUP BY L.product_id