Programmer's Python Async - Streams & Web Clients

Written by Mike James

Monday, 07 November 2022


When the server receives the GET request it finds the specified file and sends it to the client using the same socket connection. The first part of the message sent to the client is a set of headers which we need to read and process. The first line of any response is the status line, which for a successful request is:

HTTP/1.1 200 OK\r\n

which gives the HTTP version and the status code, which we can assume is going to be 200, i.e. no error. If you want to write a complete client you need to extract the status code and react to it. In our simple demonstration we can read it and ignore it:

    headers=""
    line = await reader.readline()
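For a client that does want to react to the status code, the status line can be split into its three parts. The following is a minimal sketch, not part of the original program; the function name parse_status_line is hypothetical and it assumes the line has already been read as bytes:

```python
def parse_status_line(raw: bytes) -> tuple[str, int, str]:
    """Split a status line such as b'HTTP/1.1 200 OK\\r\\n' into its parts."""
    text = raw.decode('ascii').rstrip('\r\n')
    # The reason phrase may contain spaces or be empty, so split at most twice
    parts = text.split(' ', 2)
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {text!r}")
    version = parts[0]
    status = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ''
    return version, status, reason

version, status, reason = parse_status_line(b'HTTP/1.1 200 OK\r\n')
print(version, status, reason)
```

A real client would then branch on status, for example treating anything other than 200 as a failure.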

Next we need to read the headers that the server has sent. These arrive one to a line and the end is marked by a blank line, just like the headers we sent to the server:

    while True:
        line = await reader.readline()
        line = line.decode('ascii')
        if line=="\r\n":
            break
        headers+=line

This loop reads each line in turn, converts it to a Python string using ASCII encoding and builds up a complete string of headers. The loop ends when we read a blank line.
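As an alternative to the line-by-line loop, StreamReader also has a readuntil method that reads up to a given separator. The sketch below assumes the whole header block fits within the reader's default buffer limit, and it creates a StreamReader by hand and feeds it a canned response purely as a stand-in for a real connection. The names read_header_block and demo are hypothetical:

```python
import asyncio

async def read_header_block(reader: asyncio.StreamReader) -> str:
    # readuntil returns everything up to and including the separator,
    # here the blank line that ends the headers
    raw = await reader.readuntil(b'\r\n\r\n')
    return raw.decode('ascii')

async def demo() -> str:
    # Stand-in for a real connection: feed a canned response to a reader
    reader = asyncio.StreamReader()
    reader.feed_data(b'HTTP/1.1 200 OK\r\n'
                     b'Content-Type: text/html; charset=UTF-8\r\n'
                     b'Content-Length: 5\r\n'
                     b'\r\n'
                     b'hello')
    reader.feed_eof()
    return await read_header_block(reader)

block = asyncio.run(demo())
print(block)
```

Note that the block returned this way still starts with the status line, which would have to be removed before the headers are parsed.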

We need to process the headers because the Content-Length header tells us how many bytes to read to get the content, i.e. the HTML that makes up the page. We need this because we cannot read data expecting an EOF signal, because there isn’t one. The socket stays open in case you have another request to send to the server. If you do wait for an EOF then you will usually wait a long time before the server times out.
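Knowing the byte count is only part of the job, because read(n) returns as soon as any data is available and so may deliver fewer than n bytes. The following sketch contrasts it with readexactly, which waits until all n bytes have arrived. It uses hand-built StreamReader objects fed in two chunks as a stand-in for a slow socket; the function name demo is hypothetical:

```python
import asyncio

async def demo() -> tuple[bytes, bytes]:
    # Only the first chunk has arrived when read is called
    short = asyncio.StreamReader()
    short.feed_data(b'hello ')
    partial = await short.read(11)

    # The second chunk arrives a little later, as it might from a socket
    full = asyncio.StreamReader()
    full.feed_data(b'hello ')
    asyncio.get_running_loop().call_later(0.01, full.feed_data, b'world')
    complete = await full.readexactly(11)
    return partial, complete

partial, complete = asyncio.run(demo())
print(len(partial), len(complete))
```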

We need to read the Content-Length header to find the number of bytes to read. We could use some simple string manipulation to extract the header we want, but there is a standard way to parse HTTP headers even if it is obscure because it is part of the email module. It turns out that HTTP headers use the same format as email headers and hence you can use email.message_from_string to parse them:

def parseHeaders(headers):
    message = email.message_from_string(headers)
    return dict(message.items())

This utility function returns all of the headers as a dictionary keyed on the header names with values of the strings they are set to. Now we can use this to get the Content-Length header:

  headers = parseHeaders(headers)
  length = int(headers["Content-Length"])
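One thing to be aware of is that HTTP header names are not case-sensitive, but the keys of a plain dictionary are, so a server that sends content-length in lower case would cause a KeyError. A possible variation, sketched here with the hypothetical name parse_headers_ci, is to lower-case the names as the dictionary is built:

```python
import email

def parse_headers_ci(headers: str) -> dict[str, str]:
    message = email.message_from_string(headers)
    # Lower-case the names so lookups do not depend on the server's capitalization
    return {name.lower(): value for name, value in message.items()}

# Sample headers invented for illustration
sample = ("content-type: text/html; charset=UTF-8\r\n"
          "Content-Length: 1256\r\n")
parsed = parse_headers_ci(sample)
length = int(parsed["content-length"])
print(length)
```

The Message object returned by email.message_from_string already does case-insensitive lookups, so another option is to return it directly rather than converting it to a dictionary.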

As we now know the number of bytes to read the rest of the procedure is simple:

    # readexactly waits for all length bytes; read could return fewer
    line = await reader.readexactly(length)
    line = line.decode('utf8')
    writer.close()
    await writer.wait_closed()
    return line

This time we decode the content using utf8 because this is what most modern web pages use for their content. To check, we should examine the Content-Type header which in this case reads:

Content-Type: text/html; charset=UTF-8

So the content is HTML and it is UTF-8 encoded.
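The email module can also extract the charset for us. The sketch below uses the Message method get_content_charset and falls back to UTF-8 when no charset is given; the function name body_charset and the fallback choice are assumptions made for illustration:

```python
import email

def body_charset(headers: str, default: str = 'utf-8') -> str:
    message = email.message_from_string(headers)
    # get_content_charset returns the charset parameter in lower case,
    # or None when the Content-Type header has no charset
    return message.get_content_charset() or default

headers = "Content-Type: text/html; charset=UTF-8\r\n"
charset = body_charset(headers)
body = "héllo".encode(charset)
print(charset, body.decode(charset))
```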

To demonstrate all of this we need a coroutine to start things off:

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
              download('http://www.example.com/'), 
              download('http://www.example.com/'))
    end = time.perf_counter()
    print((end-start)*1000)
    print(results[0][:25])

asyncio.run(main())

This creates two tasks to download the same page, starts them both off asynchronously and waits for them to complete. Whenever one of the tasks has to wait for data to be available it releases the main thread and the other gets a chance to run and so on. As a result main mostly has little to do and you can increase the number of downloads without increasing the time it takes by much. For example, adding an additional download on a test machine to the asynchronous program increases the time it takes by about 30 ms, whereas for a synchronous program it adds 220 ms. This means that downloading 100 pages takes about 3 seconds asynchronously, but 21 seconds doing the job synchronously.
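The effect can be seen without a network connection by replacing the download with a coroutine that simply waits. This is a simulation under the assumption that each download spends nearly all of its time waiting; the names fake_download and timed are hypothetical and the delay is arbitrary:

```python
import asyncio
import time

async def fake_download(delay: float) -> str:
    # asyncio.sleep stands in for waiting on the network
    await asyncio.sleep(delay)
    return "done"

async def timed(n: int, delay: float) -> float:
    start = time.perf_counter()
    # All n coroutines wait at the same time rather than one after another
    await asyncio.gather(*(fake_download(delay) for _ in range(n)))
    return time.perf_counter() - start

elapsed = asyncio.run(timed(20, 0.05))
print(f"20 simulated downloads took {elapsed*1000:.0f} ms")
```

Run one after another, the 20 waits would take at least a second in total, whereas run together they should take little more than a single wait.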

The complete program is:

import asyncio
import urllib.parse
import time
import email

def parseHeaders(headers):
    message = email.message_from_string(headers)
    return dict(message.items())

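async def download(url):
    url = urllib.parse.urlsplit(url)
    # Port 443 with ssl=True gives an HTTPS connection whatever the URL's scheme
    reader, writer = await asyncio.open_connection(
                           url.hostname, 443,ssl=True)

    request = (
        f"GET /index.html HTTP/1.1\r\n"
        f"Host: {url.hostname}\r\n"
        f"\r\n"
    )

    writer.write(request.encode('ascii'))
    # Wait until the request has been passed to the transport
    await writer.drain()
    headers = ""
    # Read and ignore the status line
    line = await reader.readline()
    # Collect the headers up to the blank line
    while True:
        line = await reader.readline()
        line = line.decode('ascii')
        if line == "\r\n":
            break
        headers += line

    headers = parseHeaders(headers)
    length = int(headers["Content-Length"])
    # readexactly waits for all length bytes; read could return fewer
    line = await reader.readexactly(length)
    line = line.decode('utf8')
    writer.close()
    await writer.wait_closed()
    return line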

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
                 download('http://www.example.com/'),
                 download('http://www.example.com/'))
    end = time.perf_counter()
    print((end-start)*1000)
    print(results[0][:25])

asyncio.run(main())
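The program above always connects to port 443 using SSL and always requests /index.html, whatever URL it is given. One way to generalize it, sketched here with the hypothetical helper connection_params, is to derive the host, port, SSL setting and path from the URL itself:

```python
import urllib.parse

def connection_params(url: str) -> tuple:
    parts = urllib.parse.urlsplit(url)
    use_ssl = parts.scheme == 'https'
    # Use an explicit port if the URL has one, else the scheme's default
    port = parts.port or (443 if use_ssl else 80)
    # An empty path means the root resource; keep any query string
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    return parts.hostname, port, use_ssl, path

print(connection_params('https://www.example.com/'))
```

The values returned could then be used in the call to asyncio.open_connection and in the GET line of the request in place of the fixed values.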

In chapter but not in this extract

Server
A Web Server
SSL Server
Using Streams
Converting Blocking To Non-blocking
Running in Threads
Why Not Just Use Threads?
CPU-Bound Tasks
Asyncio-Based Modules
Working With Other Event Loops – Tkinter
Subprocesses

Summary

The asyncio module makes network connections easy and asynchronous.
Network communication is via streams – StreamReader and StreamWriter – which work like more sophisticated Pipes.
Implementing a web client is easy, but there is no high-level function which downloads an HTML page. You have to work with the HTTP protocol.
The email module can be used to parse HTTP headers because they share the email header format.
Creating an SSL client is a matter of changing a single line in the program.
Creating a web server is only slightly more difficult in that you have to support multiple potential clients.
Converting the server to SSL requires the generation and installation of a certificate.
You can use raw sockets which do not support streams. The only reason for doing this is to implement a custom protocol.
To convert a blocking synchronous function into a non-blocking asynchronous function all you have to do is run it on another thread and release the original thread to service the event loop.
The asyncio module provides a function that allows you to run a function on another thread asynchronously.
You can use additional threads to run CPU-bound functions asynchronously.
There are additional modules that provide asynchronous versions of standard operations, usually by running them on an additional thread.
A particular problem is coexisting with modules that implement their own event loop such as tkinter. There are two approaches – to find an update function which can be called from an asyncio event loop or to use a separate thread to run each event loop.
The asyncio module provides a very easy way to run subprocesses without having to worry about blocking the thread or dealing with buffers.
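As an illustration of the points about running a blocking function on another thread, here is a minimal sketch using asyncio.to_thread, which is available from Python 3.9. The function blocking_task is a hypothetical stand-in that uses time.sleep to represent any blocking call:

```python
import asyncio
import time

def blocking_task(seconds: float) -> str:
    # time.sleep blocks its thread, as a synchronous I/O call would
    time.sleep(seconds)
    return "finished"

async def main() -> list:
    # Each call runs on a worker thread, leaving the event loop free
    return await asyncio.gather(
        asyncio.to_thread(blocking_task, 0.05),
        asyncio.to_thread(blocking_task, 0.05))

results = asyncio.run(main())
print(results)
```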

Programmer's Python:
Async
Threads, processes, asyncio & more

Is now available as a print book: Amazon

1) A Lightning Tour of Python

2) Asynchronous Explained

3) Process-Based Parallelism
Extract 1 - Process-Based Parallelism

4) Threads
Extract 1 - Threads

5) Locks and Deadlock
Extract 1 - Locks

6) Synchronization

7) Sharing Data
Extract 1 - Pipes & Queues
Extract 2 - Shared Memory

8) The Process Pool
Extract 1 - The Process Pool 1

9) Process Managers
Extract 1 - Process Manager

10) Subprocesses

11) Futures
Extract 1 - Futures

12) Basic Asyncio
Extract 1 - Basic Asyncio

13) Using asyncio
Extract 1 - Asyncio Web Client

14) The Low-Level API
Extract 1 - Streams & Web Clients

Appendix I Python in Visual Studio Code


Last Updated ( Tuesday, 08 November 2022 )
