Programmer's Python Data - Byte Manipulation
Written by Mike James   
Monday, 05 June 2023

Multibyte Shifts

When you have byte sequences to work with it is tempting to simply write a for loop that processes each byte in turn. However, notice that there is no easy way to implement a shift operation on a bytes or bytearray object as you need to arrange to move bits from one byte to another. On the other hand, implementing a shift on a bignum is a single operation. For example:

myBytes=bytes([0xFF,0xAA,0x55])
bits=int.from_bytes(myBytes,byteorder="big")
bits=bits>>4
print(bits.to_bytes(3,byteorder="big"))

displays:

b'\x0f\xfa\xa5'

which, as you can see, has shifted the low four bits of each byte into the high four bits of the next byte.
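A shift left works in the same way, with one extra consideration: a bignum simply grows as bits are shifted out of the top, so the result has to be masked back to the original width before it is converted. The following is a minimal sketch of this idea. The helper name shift_left is an illustrative choice and is not part of the standard library:

```python
def shift_left(data, n):
    """Shift a big-endian byte sequence left by n bits, keeping its length."""
    width = len(data) * 8
    bits = int.from_bytes(data, byteorder="big")
    # Mask off any bits that were shifted beyond the original width
    bits = (bits << n) & ((1 << width) - 1)
    return bits.to_bytes(len(data), byteorder="big")

shifted = shift_left(bytes([0xFF, 0xAA, 0x55]), 4)
print(shifted)
```

The result holds the bytes 0xFA, 0xA5 and 0x50. Without the mask, to_bytes would raise an OverflowError because the shifted value no longer fits into three bytes.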

Doing this without converting to bignums is a difficult task involving masking out and shifting the low-order bits of the previous byte to become the high-order bits of the current byte. For example, to implement a shift right of four bits:

myBytes1=bytearray([0xFF,0xAA,0x55])
myBytes2=bytearray(3)
for i in range(len(myBytes1)):
    myBytes2[i]=myBytes1[i]>>4
    if i>0:
        myBytes2[i]=myBytes2[i]|((myBytes1[i-1]<<4)&0xF0)
print(myBytes2)

In most cases it is preferable to convert to bignums.
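To see how the two approaches compare, the byte-by-byte loop can be generalized to any shift of between one and seven bits and checked against the bignum version. This is only a sketch under that assumption, and the function names are invented for the example:

```python
def shift_right_bytewise(data, n):
    """Shift right by n bits (0 < n < 8), working one byte at a time."""
    result = bytearray(len(data))
    for i in range(len(data)):
        result[i] = data[i] >> n
        if i > 0:
            # Bring the low n bits of the previous byte into the top of this one
            result[i] |= (data[i - 1] << (8 - n)) & 0xFF
    return bytes(result)

def shift_right_bignum(data, n):
    """Shift right by n bits using a single bignum operation."""
    bits = int.from_bytes(data, byteorder="big") >> n
    return bits.to_bytes(len(data), byteorder="big")

data = bytes([0xFF, 0xAA, 0x55])
agree = all(shift_right_bytewise(data, n) == shift_right_bignum(data, n)
            for n in range(1, 8))
print(agree)
```

Both functions give the same bytes, but the bignum version needs no masking and also works unchanged for shifts of eight bits or more.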

One-Time Pad

As an example of this approach to byte manipulation, consider the common task of XORing a set of random bits with a bit pattern. The reason you might want to do this is to encrypt the data. This is a very secure cipher, usually known as a “one-time pad”, provided the random bits are truly random, at least as long as the message and never reused. You can recover the original data by simply performing the XOR operation a second time, as (x ^ y) ^ y is x. This doesn’t sound very secure, but to decode it you need the same random bits to perform the XOR the second time. Without the one-time pad it is impossible to recover the original text.
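The fact that a second XOR undoes the first is easy to check on a single byte. In this small sketch the value of pad is fixed purely for illustration, where a real pad would be random:

```python
message = 0b10110010
pad = 0b01101100     # stands in for the random bits

encrypted = message ^ pad
recovered = encrypted ^ pad
print(bin(encrypted), recovered == message)
```

Each bit of the pad that is a one flips the corresponding message bit, and flipping it a second time restores it.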

Start with a suitable message as an ASCII string:

myBytes = b"Hello World Of Secrets"

which could have been in the form of a Unicode string converted to an ASCII string. Next we need a one-time pad:

oneTime = int.from_bytes(random.randbytes(len(myBytes)),
                                      byteorder="big")

To understand this you need to know that:

random.randbytes(len(myBytes))

generates the specified number of random bytes as a bytes object. We then use the from_bytes method to create the bignum oneTime with the same bit pattern. To XOR the message with the oneTime pad we need to convert the ASCII string to a bignum:

msg = int.from_bytes(myBytes,byteorder="big")

Now we have both bit patterns as bignums and so can perform the XOR:

crypt=msg ^ oneTime

To decrypt the message we just need to repeat the XOR:

decrypt=crypt ^ oneTime

and to see it we need to convert it back to an ASCII string:

decrypt=decrypt.to_bytes((decrypt.bit_length()+7)//8,byteorder="big")
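One caveat about working out the length from bit_length is that it ignores leading zero bits, so a message that starts with zero bytes comes back shorter than it went in. If the original length is known, it is safer to use it directly. The message below is made up to show the difference:

```python
myBytes = b"\x00\x00Hi"          # message with leading zero bytes
value = int.from_bytes(myBytes, byteorder="big")

# A length computed from bit_length drops the leading zero bytes
short = value.to_bytes((value.bit_length() + 7) // 8, byteorder="big")
# Using the original length preserves them
full = value.to_bytes(len(myBytes), byteorder="big")
print(short, full)
```

For a text message such as the one used here the first byte is never zero, so both approaches give the same result.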

Putting all this together, and adding some print instructions gives:

import random
myBytes=b"Hello World Of Secrets"
oneTime=int.from_bytes(random.randbytes(len(myBytes)),
                                      byteorder="big")
msg=int.from_bytes(myBytes,byteorder="big")
crypt=msg ^ oneTime
print(hex(crypt))
decrypt=crypt ^ oneTime
decrypt=decrypt.to_bytes((decrypt.bit_length()+7)//8,
                                      byteorder="big")
print(decrypt)

Of course, in a real application the one-time pad would have to be available at the other site as well, which means that it has to be shared securely in advance, usually a difficult task. The encoded message itself can then be sent over an open channel. The one-time pad may be uncrackable, but it isn’t convenient.
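A real application also needs to take care over how the pad is generated. The random module uses a pseudo random generator which its documentation describes as unsuitable for security purposes, and randbytes is only available from Python 3.9. A possible alternative, sketched here, is the token_bytes function in the standard secrets module:

```python
import secrets

myBytes = b"Hello World Of Secrets"
# token_bytes draws on the operating system's secure random source
pad = secrets.token_bytes(len(myBytes))
oneTime = int.from_bytes(pad, byteorder="big")
print(len(pad))
```

The rest of the program is unchanged as oneTime is still a bignum with the same number of bits as the message.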

How would you implement this using direct operations on the byte sequences? In this case msg and oneTime are the original bytes objects rather than bignums. The most obvious way to a programmer used to for loops in other languages would be to use a loop index:

crypt=bytearray(len(msg))
for i in range(len(msg)):
    crypt[i]=msg[i]^ oneTime[i]
print(crypt)

Notice that you need to use a bytearray and not a bytes object because of the need to modify it in-place.
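The difference between the two types is easy to demonstrate. In this short sketch the flag immutable is only there to record what happened:

```python
immutable = False
data = bytes(3)
try:
    data[0] = 0xFF           # bytes objects do not support item assignment
except TypeError as err:
    immutable = True
    print("bytes:", err)

buffer = bytearray(3)
buffer[0] = 0xFF             # a bytearray can be modified in place
print(buffer)
```

If an immutable result is needed at the end, the bytearray can be converted using bytes(buffer).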

A more Pythonic approach would be to use a comprehension:

crypt= bytes([a^b for a, b in zip(msg,oneTime)])

This is more compact and arguably easier to understand, but only if you are happy with comprehensions, the zip function, tuples, destructuring and the bytes constructor. In principle it also has the potential to be faster than the index loop approach, but this does depend on the quality of the compiler or interpreter in use.
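If speed matters then it is better to measure than to guess. The following sketch uses the standard timeit module to time the index loop, the comprehension and the bignum approach on the same data. The function names and the 1000-byte test size are arbitrary choices, and the timings will vary from machine to machine:

```python
import random
import timeit

msg = random.randbytes(1000)
oneTime = random.randbytes(1000)

def index_loop():
    crypt = bytearray(len(msg))
    for i in range(len(msg)):
        crypt[i] = msg[i] ^ oneTime[i]
    return bytes(crypt)

def comprehension():
    return bytes([a ^ b for a, b in zip(msg, oneTime)])

def bignum():
    value = int.from_bytes(msg, "big") ^ int.from_bytes(oneTime, "big")
    return value.to_bytes(len(msg), "big")

# Time 200 calls of each version and report the total in seconds
for func in (index_loop, comprehension, bignum):
    print(func.__name__, timeit.timeit(func, number=200))
```

All three functions return the same bytes, so whichever is fastest on your system can be used without changing the result.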

A complete program using comprehensions is:

import random
msg=b"Hello World Of Secrets"
oneTime=random.randbytes(len(msg))
crypt= bytes([a^b for a, b in zip(msg,oneTime)])
print(crypt)
decrypt= bytes([a^b for a, b in zip(crypt,oneTime)])
print(decrypt)
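One thing to watch out for with zip is that it stops at the end of the shorter sequence, so a pad that is too short silently produces a truncated result. A simple guard, sketched here with a hypothetical helper called xor_bytes, is to check the lengths first:

```python
def xor_bytes(data, pad):
    """XOR two byte sequences, refusing a pad of the wrong length."""
    if len(pad) != len(data):
        raise ValueError("pad must be the same length as the data")
    return bytes([a ^ b for a, b in zip(data, pad)])

msg = b"Hello World Of Secrets"
shortPad = bytes([0x5A] * 5)     # deliberately too short, for illustration only

truncated = bytes([a ^ b for a, b in zip(msg, shortPad)])
print(len(msg), len(truncated))  # zip silently stopped after five bytes

try:
    xor_bytes(msg, shortPad)
except ValueError as err:
    print("rejected:", err)
```

From Python 3.10 you can get a similar check by passing strict=True to zip, which raises a ValueError if the sequences differ in length.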

In chapter but not in this extract

  • The Array
  • Memoryview

Summary

  • Working with bit patterns is fundamental, but you generally have to work with bytes or some other larger unit of storage.

  • Working with a byte sequence is possible using the bytes object which is immutable or a bytearray which is mutable.

  • Both the bytes and bytearray objects can be thought of as ASCII strings and have many of the same methods as strings.

  • A bytes literal is distinguished from a string by a leading b and contains ASCII characters and escape codes for values above 127.

  • You can also create bytes objects and bytearrays using an iterable that provides integers in the correct range.

  • The encode method takes a Unicode string and converts it into a byte sequence using the specified encoding.

  • The decode method takes a byte sequence and converts it into a Unicode string using the specified encoding.

  • When trying to manipulate a byte sequence you can opt to convert it to a bignum and then use bitwise operators or you can work byte-by-byte in a for loop.

  • When working with bytes in groups it matters which order you take them in – big endian takes the most significant byte first and little endian takes the least significant byte first.

  • Multibyte shifts are difficult to implement because of the way the sign bit has to be treated.

  • Python has a basic array type in the array module. This supports arrays of basic C types.

  • The memoryview class provides a view into the buffer of any object that supports the buffer protocol.

  • A memoryview doesn’t make a copy of the original buffer – it simply provides access.

  • The object that the buffer belongs to can set the type and shape of the buffer in an attempt to make it easier for you to use.

  • If the object doesn’t set the type and shape of the buffer you can use the cast method to change or set it.
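A few of the points in the summary can be seen in action in a short sketch. The string and the values used are arbitrary examples:

```python
text = "H\u00e9llo"                   # contains one non-ASCII character
encoded = text.encode("utf-8")        # Unicode string to byte sequence
print(encoded, encoded.decode("utf-8") == text)

value = 0x0102
big = value.to_bytes(2, byteorder="big")        # most significant byte first
little = value.to_bytes(2, byteorder="little")  # least significant byte first
print(big, little)

buffer = bytearray(b"abcd")
view = memoryview(buffer)             # no copy is made
view[0] = ord("A")                    # writes through to the bytearray
print(buffer)
```

The final print shows that the change made via the memoryview has altered the original bytearray.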

 

This extract is from the book Programmer's Python: Everything is Data.

Last Updated ( Monday, 05 June 2023 )