SQL Workshop - Subselects And Join
Written by Nikos Vaggalis   
Thursday, 21 November 2013

SQL Workshop is where we confront the type of problems SQL devs face on a day to day basis. The example here is based on a real problem encountered in Ingres - but it has been simplified for the purpose of illustration. What do you do when you can't use a subselect within a Join?

 

SQLW

 

Sometimes you know how to do something but the database you are working with just wont co-operate. You can give up or exercise the brain and find a different way of doing the job. 

From time to time , there are certain SQL queries that are not supported by the DBMS’s current version, with support being promised TBA in future versions, or in worst case, never planning for it

But we really do need to make it work now, thus we must discover other ways of going about it.

The Problem

As an example, let’s assume that there is a fictional reseller company that uses the tables bought_master to record invoices issued for the products purchased, bought_detail to keep track of their corresponding details and sold_detail to record invoice details issued for the products sold.  To keep the example code short we won't use the corresponding sold_master table.

 

bought_master

year

invoice

status

1/1/2013

1

paid

1/1/2013

2

paid

1/1/2013

3

paid

1/1/2013

4

paid

1/1/2013

5

paid

 

bought_detail

year

invoice

product_code

price

1/1/2013

1

1001002

$25

1/1/2013

1

23001002

$20

1/1/2013

2

1001002

$35

1/1/2013

3

23001002

$40

1/1/2013

4

23001002

$60

 

sold_detail

year

invoice

product_code

price

1/1/2013

300

1001002

$10

1/1/2013

301

1001002

$40

 

For a given product there must be entries in the bought table but not necessarily in sold; for example item 23001002 has 3 entries in the bought tables but none in sold, which in essence means that although the item has been bought, it has not generated a sale yet.

On the contrary, item 1001002 has entries in both tables which means that it has been bought and subsequently sold.

For comparison/statistical reasons we are asked to retrieve the average buying price per product,  together with its latest selling price.

The desired result set should look like this:

 

desired result set

year

product code

latest price sold

average price bought

1/1/2013

1001002

$40

$30

1/1/2013

23001002

NULL

$60

 

How do we reach to those results ?

For product 1001002, bought_detail  has two invoices: 1 with price  $25 and  2 with price bought $35, so the average is (25+35)/2 =$30, while its latest price sold retrieved from invoice 301 of the corresponding entry in the sold_detail, is $40.

Product 23001002 has invoice entries 1,3,4 with average value ($20+$40+$60)/3 = $60, while there is no entry in the sold table since the product has not generated a sale yet.

Our first attempt at tackling it would be :

Select d.product_code as "product code"
  s.price as "latest price sold"
  avg(d.price) as "average price bought"
from bought_detail d
 inner join bought_master h
   on d.year = h.year and
   d.invoice = h.invoice
 left join sold_detail s
   on d.product_code = s.product_code and
     d.last_year = s.last_year and
       s.invoice >= ALL
        (select s1.invoice
          from sold_detail s1
          where s.year = s1.last_year and
            s.product_code = s1.product_code)
where h.status="paid" and
  h.year='1/1/2013'
group by d.product_code, s.price

Which is translated into English as :

For each item that has been bought and its invoice has been paid, get its average bought price and, where applicable, its latest sold price. Thus for each item we need all rows from the bought tables that have a status of paid, plus the row from the sold table that keeps the latest sold price. However, we also want to keep all entries from the bought table that do not have a corresponding entry in the sold table, therefore we'll use use a left join to satisfy this requirement.

To get to the latest price sold we use the ALL operator and a correlated sub-query which will do a one by one comparison of each product’s row in the sold table with the rest of the rows of the same product in that same table (that is, doing a self join on the sold table), to get the price based on the latest invoice number ( which is produced in a sequential fashion). Alternatively the same would be achievable with the use of the max operator.

However, running the query (on Ingres v9.2) fails with:

E_PS03A0 Subselects are not supported inside join qualifications.

The same stands still true, when replacing ALL with the max operator:

left join sold_detail s
 on d.product_code = s.product_code and
  d.last_year = s.last_year and
   
s.invoice = (select max(s1.invoice)
     
from invoice_sold_detail s1
       where s.year = s1.last_year and
         s.product_code = s1.product_code)

So it seems that we've hit a technical issue.

Another Way

But there should be a way of bypassing this limitation, and there is, as long as we restructure the query into :

Select d.product_code as "product code"
 s.price as "latest price sold"
 avg(d.price) as "average price bought"
from bought_detail d
  inner join bought_master h
 on d.year = h.year and
   d.invoice = h.invoice
 
left join sold_detail s
  on d.product_code = s.product_code and
        d.year=s.year
where h.status="paid" and
   h.year='1/1/2013' and
    (s.invoice = (select max(s1.invoice)
 from sold_detail s1
     where s.product_code = s1.product_code and
       s1.year = s1.year )
    or s.invoice IS NULL)
group by d.product_code, s.price

What happens is that we shift the aggregate test into the WHERE filtering clause, and add the or s.invoice IS NULL condition which takes advantage of the NULL values returned by the left join whenever there is no applicable invoice entry in the sold table, thus keeping all entries of the bought table together with the latest sold price wherever found

This goes to show that there are times that we must resort into makeshift solutions to overcome technical limitations imposed by our DBMS, something deemed feasible by SQL’s expressiveness, staying loyal to the “there's more than one way to do it” principle.

 SQLW

Other SQL Workshops

SQL Workshop - Selecting columns without including a non-aggregate column in the group by clause

SQL Workshop - Subselects And Join

More SQL Workshop soon!

Related Articles

SQL Server: Quickly get row counts for tables, heaps, indexes, and partitions

The First Things I Look At On A SQL Server – Part 1

The First Things I Look At On A SQL Server – Part 2

A Generic SQL Server Compression Utility

Codd and his Rules      

 

To be informed about new articles on I Programmer, install the I Programmer Toolbar, subscribe to the RSS feed, follow us on, Twitter, FacebookGoogle+ or Linkedin,  or sign up for our weekly newsletter.

 

espbook

 

Comments




or email your comment to: comments@i-programmer.info

 

Banner


Lightbend Announces Akka 3
15/11/2024

Lightbend, the company that developed Akka, has announced Akka 3, and has changed its name to Akka. The company produces cloud-native microservices frameworks, and Akka is used for building distribute [ ... ]



pg_parquet - Postgres To Parquet Interoperability
28/11/2024

pg_parquet is a new extension by Crunchy Data that allows a PostgreSQL instance to work with Parquet files. With pg_duckdb, pg_analytics and pg_mooncake all of which can access Parquet files, is  [ ... ]


More News

Last Updated ( Thursday, 19 December 2013 )