SQL Workshop - Subselects And Join |
Written by Nikos Vaggalis | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Thursday, 21 November 2013 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SQL Workshop is where we confront the type of problems SQL devs face on a day to day basis. The example here is based on a real problem encountered in Ingres - but it has been simplified for the purpose of illustration. What do you do when you can't use a subselect within a Join?
Sometimes you know how to do something but the database you are working with just wont co-operate. You can give up or exercise the brain and find a different way of doing the job. From time to time , there are certain SQL queries that are not supported by the DBMS’s current version, with support being promised TBA in future versions, or in worst case, never planning for it But we really do need to make it work now, thus we must discover other ways of going about it. The ProblemAs an example, let’s assume that there is a fictional reseller company that uses the tables bought_master to record invoices issued for the products purchased, bought_detail to keep track of their corresponding details and sold_detail to record invoice details issued for the products sold. To keep the example code short we won't use the corresponding sold_master table.
For a given product there must be entries in the bought table but not necessarily in sold; for example item 23001002 has 3 entries in the bought tables but none in sold, which in essence means that although the item has been bought, it has not generated a sale yet. On the contrary, item 1001002 has entries in both tables which means that it has been bought and subsequently sold. For comparison/statistical reasons we are asked to retrieve the average buying price per product, together with its latest selling price. The desired result set should look like this:
How do we reach to those results ? For product 1001002, bought_detail has two invoices: 1 with price $25 and 2 with price bought $35, so the average is (25+35)/2 =$30, while its latest price sold retrieved from invoice 301 of the corresponding entry in the sold_detail, is $40. Product 23001002 has invoice entries 1,3,4 with average value ($20+$40+$60)/3 = $60, while there is no entry in the sold table since the product has not generated a sale yet. Our first attempt at tackling it would be :
Which is translated into English as : For each item that has been bought and its invoice has been paid, get its average bought price and, where applicable, its latest sold price. Thus for each item we need all rows from the bought tables that have a status of paid, plus the row from the sold table that keeps the latest sold price. However, we also want to keep all entries from the bought table that do not have a corresponding entry in the sold table, therefore we'll use use a left join to satisfy this requirement. To get to the latest price sold we use the ALL operator and a correlated sub-query which will do a one by one comparison of each product’s row in the sold table with the rest of the rows of the same product in that same table (that is, doing a self join on the sold table), to get the price based on the latest invoice number ( which is produced in a sequential fashion). Alternatively the same would be achievable with the use of the max operator. However, running the query (on Ingres v9.2) fails with: E_PS03A0 Subselects are not supported inside join qualifications. The same stands still true, when replacing ALL with the max operator:
So it seems that we've hit a technical issue. Another WayBut there should be a way of bypassing this limitation, and there is, as long as we restructure the query into :
What happens is that we shift the aggregate test into the WHERE filtering clause, and add the or s.invoice IS NULL condition which takes advantage of the NULL values returned by the left join whenever there is no applicable invoice entry in the sold table, thus keeping all entries of the bought table together with the latest sold price wherever found This goes to show that there are times that we must resort into makeshift solutions to overcome technical limitations imposed by our DBMS, something deemed feasible by SQL’s expressiveness, staying loyal to the “there's more than one way to do it” principle.
Other SQL WorkshopsSQL Workshop - Selecting columns without including a non-aggregate column in the group by clause SQL Workshop - Subselects And Join More SQL Workshop soon! Related ArticlesSQL Server: Quickly get row counts for tables, heaps, indexes, and partitions The First Things I Look At On A SQL Server – Part 1 The First Things I Look At On A SQL Server – Part 2 A Generic SQL Server Compression Utility To be informed about new articles on I Programmer, install the I Programmer Toolbar, subscribe to the RSS feed, follow us on, Twitter, Facebook, Google+ or Linkedin, or sign up for our weekly newsletter.
Comments
or email your comment to: comments@i-programmer.info
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Last Updated ( Thursday, 19 December 2013 ) |