Pythonic Code Automatically
Wednesday, 27 July 2022

Python is different in that its community has a strong sense of what makes code "pythonic". It means that beginners and interlopers are easy to spot and easy to put down - "dude your code is NOT Pythonic". Now a research team has automated the act of refactoring code to make it Pythonic.


It is possible to write code in any language as if it were another language. This happens often in Python because, while Python isn't like most class-based, object-oriented languages, you can still write Python as if it were Java or C++ or C# or ... and many programmers do just this.

Python insiders, however, tend to value code written in a style that makes use of the features that Python offers, i.e. they value Pythonic code. That said, exactly what makes code Pythonic remains something of a mystery. Now research from a team at the Australian National University and CSIRO's Data61, to be presented in November at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022), attempts to find out. As a by-product, the team shows how to automate the conversion of non-Pythonic to Pythonic code and has built a refactoring tool.

Based on an analysis of 7,638 Python repositories on GitHub, we find that non-idiomatic Python code that can be implemented with pythonic idioms occurs frequently and widely. Unfortunately, there is no tool for automatically refactoring such non-idiomatic code into idiomatic code. In this paper, we design and implement an automatic refactoring tool to make Python code idiomatic

But what exactly is a Python idiom?

We identify nine pythonic idioms by systematically contrasting the abstract syntax grammar of Python and Java

Well I did tell you that Python was not Java. So what are these objective idioms - if you are a Pythonista you might well guess:

  1. List Comprehension
     e.g. t = [i for i in range(10000)]
  2. Set Comprehension
     e.g. simpsons_set = {word for word in chars}
  3. Dict Comprehension
     e.g. b = {v: k for k, v in a.items()}
  4. Chain Comparison
     e.g. a <= b <= c <= d <= e <= f
  5. Truth Value Test
     e.g. if not a: pass
  6. Loop Else
     e.g. for x in range(2, n):
              if n % x == 0:
                  break
          else:
              pass
  7. Assign Multiple Targets
     e.g. a, b, c, d = 2, 3, 5, 7
  8. Star in Func Call
     e.g. s = sum(*values)
  9. For Multiple Targets
     e.g. for product, price, sold_units in sales:
              a = product, price, sold_units
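To make the contrast concrete, here is a minimal sketch that puts a Java-style version next to the idiomatic version for several of the idioms listed. The sample data and every variable and function name are invented for illustration; none of it is taken from the paper or its tool.

```python
# Hypothetical sample data, invented for this illustration.
words = ["homer", "marge", "bart", "homer"]
a = {"x": 1, "y": 2}
sales = [("apple", 0.5, 10), ("pear", 0.75, 4)]

# List comprehension vs. an explicit append loop
squares_loop = []
for i in range(5):
    squares_loop.append(i * i)
squares = [i * i for i in range(5)]

# Set comprehension vs. repeated add()
unique_loop = set()
for word in words:
    unique_loop.add(word)
unique = {word for word in words}

# Dict comprehension vs. item-by-item assignment
inverted_loop = {}
for k, v in a.items():
    inverted_loop[v] = k
inverted = {v: k for k, v in a.items()}

# Truth value test vs. an explicit length check
items = []
is_empty_verbose = len(items) == 0
is_empty = not items

# Assign multiple targets vs. one assignment per line
p = 2
q = 3
r, s = 2, 3


def add(x, y):
    """Tiny helper used to show argument unpacking."""
    return x + y


# Star in function call vs. indexing each argument
point = (3, 4)
total_verbose = add(point[0], point[1])
total = add(*point)

# For with multiple targets vs. indexing into each row
revenue_verbose = []
for row in sales:
    revenue_verbose.append(row[1] * row[2])
revenue = []
for _product, price, sold_units in sales:
    revenue.append(price * sold_units)
```

In each pair both versions compute the same value; the idiomatic one simply states the intent in fewer moving parts.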

Well, I think comprehensions are just one idiom, so I'd claim they only list seven, but it's a matter of opinion. I have to admit that I don't use all these idioms because I think some aren't as clear as the alternatives - idiom 4 can be hard to understand and idiom 6 isn't used often enough to be well understood - but yes, I recognize them all as being special to Python.
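For readers who rarely meet idioms 4 and 6, the following sketch spells out what they mean. The function names are hypothetical and the primality check is deliberately naive; it only serves to show when the else branch of a loop runs and how a chained comparison is evaluated.

```python
def is_prime_with_flag(n):
    """Java-style version: a flag records whether the loop hit break."""
    if n < 2:
        return False
    found_divisor = False
    for x in range(2, n):
        if n % x == 0:
            found_divisor = True
            break
    return not found_divisor


def is_prime_with_else(n):
    """Loop-else version: the else block runs only if no break occurred."""
    if n < 2:
        return False
    for x in range(2, n):
        if n % x == 0:
            break
    else:
        # Reached when the loop ran to completion, including zero iterations.
        return True
    return False


def in_range_chained(low, value, high):
    """Chained comparison: same as low <= value and value <= high."""
    return low <= value <= high


# A chain is not the same as a parenthesised comparison:
# 1 < 3 > 2 means (1 < 3) and (3 > 2), whereas (1 < 3) > 2 compares True with 2.
chained = 1 < 3 > 2
parenthesised = (1 < 3) > 2
```

The last two lines are the kind of case that makes chains hard to read: the chained form is true while the parenthesised form is false.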

So is using Pythonic code good? Well it seems to be faster. The researchers found that idiomatic code is 1.09 to 2.07 times faster than non-idiomatic code. It may be faster, but is it better? After looking at lots of Stack Overflow questions the researchers concluded that writing idiomatic code has its challenges. The number of questions concerning the idioms and the degree of activity strongly suggest that there is much to wonder about.
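If you want to check the speed claim on your own machine, a rough sketch using the standard timeit module might look like this. The workload, the sizes and the function names are arbitrary choices made here, not the benchmark used in the paper, and the ratio you see will vary with the Python version and hardware.

```python
import timeit


def squares_loop(n):
    """Non-idiomatic: build the list with an explicit append loop."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Idiomatic: build the same list with a list comprehension."""
    return [i * i for i in range(n)]


def compare(n=10_000, repeat=5, number=50):
    """Return (loop_time, comprehension_time), taking the best of several runs."""
    loop_time = min(timeit.repeat(lambda: squares_loop(n), repeat=repeat, number=number))
    comp_time = min(timeit.repeat(lambda: squares_comprehension(n), repeat=repeat, number=number))
    return loop_time, comp_time


if __name__ == "__main__":
    loop_time, comp_time = compare()
    print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes; it measures one idiom on one workload and says nothing about the others.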

To automate the refactoring of non-Pythonic code to Pythonic, the researchers determined the syntactic patterns of non-Pythonic code - anti-idiom code smells. AST rewriting is then used to transform the code into its Pythonic form. Then they tested it and concluded:

Our refactoring tool is robust and correct on real-world Python code. The limitation of Python static parsing and the complex program logic may result in a few rare detection and refactoring errors.
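To give a feel for what detecting a pattern and rewriting the AST involves, here is a toy sketch built on Python's standard ast module. It is not the researchers' tool and handles only one hypothetical pattern, rewriting `len(x) == 0` as `not x`; the class and function names are invented, and `ast.unparse` needs Python 3.9 or later. It also assumes that `x` is a built-in container, since objects with a custom `__bool__` can behave differently.

```python
import ast


class LenZeroToNot(ast.NodeTransformer):
    """Toy refactoring: rewrite `len(x) == 0` as `not x`."""

    def visit_Compare(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        left = node.left
        is_len_call = (
            isinstance(left, ast.Call)
            and isinstance(left.func, ast.Name)
            and left.func.id == "len"
            and len(left.args) == 1
            and not isinstance(left.args[0], ast.Starred)
            and not left.keywords
        )
        is_eq_zero = (
            len(node.ops) == 1
            and isinstance(node.ops[0], ast.Eq)
            and isinstance(node.comparators[0], ast.Constant)
            and type(node.comparators[0].value) is int
            and node.comparators[0].value == 0
        )
        if is_len_call and is_eq_zero:
            replacement = ast.UnaryOp(op=ast.Not(), operand=left.args[0])
            return ast.copy_location(replacement, node)
        return node


def refactor(source):
    """Parse source, apply the toy transformer and return the new source text."""
    tree = LenZeroToNot().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(refactor("if len(items) == 0:\n    print('empty')\n"))
```

Note that `ast.unparse` discards comments and original formatting, which is one reason a production refactoring tool needs considerably more machinery than this.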

Then they applied the only real test of whether this was at all useful - they ran the refactoring on some GitHub repositories and submitted the improvements as pull requests, 90 in all. Of those 34 were accepted, 28 were merged and only 23 were rejected. Some of the rejection comments echo what I think about some Pythonic code:

 “While your change is indeed feasible, I believe the original style is more readable”

 “I feel like asserting it to empty dict is more explicit and readable”

All this just goes to show that what is readable is in the eye of the beholder. The idioms may be objective, but their understandability is very subjective.
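The maintainer who preferred comparing against an empty dict has a technical point as well as a stylistic one, because the two tests are not interchangeable for every value. The sketch below is a hypothetical illustration with invented names, not the code from the rejected pull request.

```python
def is_empty_dict_explicit(value):
    """True only when value compares equal to an empty dict."""
    return value == {}


def is_empty_dict_truthy(value):
    """True for any falsy value, not just an empty dict."""
    return not value


if __name__ == "__main__":
    for candidate in [{}, {"k": 1}, None, 0, "", []]:
        print(
            repr(candidate),
            is_empty_dict_explicit(candidate),
            is_empty_dict_truthy(candidate),
        )
```

The two functions agree for dicts, but the truth value test also accepts None, 0, an empty string and an empty list, so the refactoring is only safe when the variable is known to hold a dict.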

If you think this is a good idea then you might be pleased to learn that this refactoring might well be coming to a linter near your IDE very soon - as this is the main suggestion for further work.

However, before closing I would remind you of these lines from the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.

So True.


  • Mike James is the author of the Programmer's Python: Something Completely Different series of books which set out to show how Python diverges from other programming languages and how its special features deserve our attention. The second volume Programmer's Python: Everything Is Data was published in May and last week saw the publication of Programmer's Python: Everything Is An Object, 2nd Ed. The third volume on Asynchronous and Concurrent Python is expected in the next few weeks.

More Information

Making Python Code Idiomatic by Automatic Refactoring Non-Idiomatic Python Code with Pythonic Idioms

Zejun Zhang, Zhenchang Xing, Xin Xia, Xiwei Xu, Liming Zhu

Related Articles

Python 3.11 Goes Faster

Python Is Everywhere - 2021 Survey Results


Last Updated ( Wednesday, 27 July 2022 )