SQL Workshop - Subselects And Join

### New Book Reviews!

 SQL Workshop - Subselects And Join
Written by Nikos Vaggalis

SQL Workshop is where we confront the type of problems SQL devs face on a day to day basis. The example here is based on a real problem encountered in Ingres - but it has been simplified for the purpose of illustration. What do you do when you can't use a subselect within a Join?

Sometimes you know how to do something but the database you are working with just wont co-operate. You can give up or exercise the brain and find a different way of doing the job.

From time to time , there are certain SQL queries that are not supported by the DBMS’s current version, with support being promised TBA in future versions, or in worst case, never planning for it

But we really do need to make it work now, thus we must discover other ways of going about it.

## The Problem

As an example, let’s assume that there is a fictional reseller company that uses the tables bought_master to record invoices issued for the products purchased, bought_detail to keep track of their corresponding details and sold_detail to record invoice details issued for the products sold.  To keep the example code short we won't use the corresponding sold_master table.

year

invoice

status

1/1/2013

1

paid

1/1/2013

2

paid

1/1/2013

3

paid

1/1/2013

4

paid

1/1/2013

5

paid

## bought_detail

year

invoice

product_code

price

1/1/2013

1

1001002

$25 1/1/2013 1 23001002$20

1/1/2013

2

1001002

$35 1/1/2013 3 23001002$40

1/1/2013

4

23001002

1/1/2013

301

1001002

$40 For a given product there must be entries in the bought table but not necessarily in sold; for example item 23001002 has 3 entries in the bought tables but none in sold, which in essence means that although the item has been bought, it has not generated a sale yet. On the contrary, item 1001002 has entries in both tables which means that it has been bought and subsequently sold. For comparison/statistical reasons we are asked to retrieve the average buying price per product, together with its latest selling price. The desired result set should look like this: ## desired result set year product code latest price sold average price bought 1/1/2013 1001002$40

$30 1/1/2013 23001002 NULL$60

How do we reach to those results ?

For product 1001002, bought_detail  has two invoices: 1 with price  $25 and 2 with price bought$35, so the average is (25+35)/2 =$30, while its latest price sold retrieved from invoice 301 of the corresponding entry in the sold_detail, is$40.

Product 23001002 has invoice entries 1,3,4 with average value ($20+$40+$60)/3 =$60, while there is no entry in the sold table since the product has not generated a sale yet.

Our first attempt at tackling it would be :

Select d.product_code as "product code"  s.price as "latest price sold"  avg(d.price) as "average price bought"from bought_detail d inner join bought_master h   on d.year = h.year and   d.invoice = h.invoice left join sold_detail s   on d.product_code = s.product_code and     d.last_year = s.last_year and       s.invoice >= ALL        (select s1.invoice          from sold_detail s1          where s.year = s1.last_year and            s.product_code = s1.product_code)where h.status="paid" and  h.year='1/1/2013'group by d.product_code, s.price

Which is translated into English as :

For each item that has been bought and its invoice has been paid, get its average bought price and, where applicable, its latest sold price. Thus for each item we need all rows from the bought tables that have a status of paid, plus the row from the sold table that keeps the latest sold price. However, we also want to keep all entries from the bought table that do not have a corresponding entry in the sold table, therefore we'll use use a left join to satisfy this requirement.

To get to the latest price sold we use the ALL operator and a correlated sub-query which will do a one by one comparison of each product’s row in the sold table with the rest of the rows of the same product in that same table (that is, doing a self join on the sold table), to get the price based on the latest invoice number ( which is produced in a sequential fashion). Alternatively the same would be achievable with the use of the max operator.

However, running the query (on Ingres v9.2) fails with:

E_PS03A0 Subselects are not supported inside join qualifications.

The same stands still true, when replacing ALL with the max operator:

left join sold_detail s on d.product_code = s.product_code and  d.last_year = s.last_year and   s.invoice = (select max(s1.invoice)     from invoice_sold_detail s1       where s.year = s1.last_year and         s.product_code = s1.product_code)

So it seems that we've hit a technical issue.

## Another Way

But there should be a way of bypassing this limitation, and there is, as long as we restructure the query into :

Select d.product_code as "product code" s.price as "latest price sold" avg(d.price) as "average price bought"from bought_detail d  inner join bought_master h on d.year = h.year and   d.invoice = h.invoice left join sold_detail s  on d.product_code = s.product_code and        d.year=s.yearwhere h.status="paid" and   h.year='1/1/2013' and    (s.invoice = (select max(s1.invoice) from sold_detail s1     where s.product_code = s1.product_code and       s1.year = s1.year )    or s.invoice IS NULL)group by d.product_code, s.price

What happens is that we shift the aggregate test into the WHERE filtering clause, and add the or s.invoice IS NULL condition which takes advantage of the NULL values returned by the left join whenever there is no applicable invoice entry in the sold table, thus keeping all entries of the bought table together with the latest sold price wherever found

This goes to show that there are times that we must resort into makeshift solutions to overcome technical limitations imposed by our DBMS, something deemed feasible by SQL’s expressiveness, staying loyal to the “there's more than one way to do it” principle.

#### Other SQL Workshops

SQL Workshop - Selecting columns without including a non-aggregate column in the group by clause

SQL Workshop - Subselects And Join

More SQL Workshop soon!

 //No Comment - Approximate Edit Distance, Irrational Guards & DCT In 14 Additions07/02/2017• On Practical Accuracy of Edit Distance Approximation Algorithms • Irrational Guards are Sometimes Needed • DCT-like Transform for Image Compression Requires 14 Additions Only + Full Story JDK 9 is Feature Complete31/01/2017 JDK 9 is now feature complete, and the general availability date for Java 9 Standard Edition has been announced as July 27th 2017.  + Full Story More News

Last Updated ( Thursday, 19 December 2013 )