Find Python Code On GitHub With Gistable
Written by Kay Ewbank   
Friday, 07 September 2018

Researchers have put together a database of Python code snippets on GitHub. Gistable lists over 10,000 Python code snippets, of which around half come with a Dockerfile to configure and execute them.

The database was developed on the basis of research carried out by a team from North Carolina State University, who were interested in the executable status of Python code snippets shared on GitHub.

The researchers wanted to know what percentage of code shared through GitHub's gist system would just work, and how much would require 'non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration'.

The problem, of course, is that code snippets can contain parse errors, or fail to execute because the environment lacks the dependencies they need.

The researchers found that 75.6% of gists require non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration. The study also suggests that:

"the natural assumption developers make about resource names when resolving configuration errors is correct less than half the time."

The researchers scraped gist URLs from the GitHub gist UI, and collected an initial dataset of 10,259 gists containing over 1,700 unique third-party library packages. Each gist was then cloned and executed inside a Docker container based on the official Python image for Docker, and categorized by its exit status.
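To give a feel for what "categorizing by exit status" can look like, here is a minimal sketch. It is not the researchers' code: it runs a snippet with the local Python interpreter rather than in a Docker container, and the function names (run_snippet, categorize) and category labels are hypothetical, chosen only for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def categorize(returncode: int, stderr: str) -> str:
    """Map a finished run to a coarse category (labels are illustrative)."""
    if returncode == 0:
        return "success"
    # For an uncaught exception, the last non-empty stderr line
    # normally starts with the exception type, e.g. "ImportError: ...".
    lines = [line for line in stderr.splitlines() if line.strip()]
    if not lines:
        return "other_failure"
    name = lines[-1].split(":", 1)[0].strip()
    if name == "ModuleNotFoundError":
        # Python 3 subclass of ImportError; fold it into the same bucket.
        return "ImportError"
    # Accept plain or dotted exception names; anything else is unclassified.
    if name.replace(".", "").replace("_", "").isalnum():
        return name
    return "other_failure"


def run_snippet(source: str, timeout: float = 10.0) -> str:
    """Run source in a fresh interpreter process and categorize the result.

    Assumption: a local subprocess stands in for the Docker container
    described in the article.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "snippet.py"
        script.write_text(source, encoding="utf-8")
        try:
            done = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return "timeout"
    return categorize(done.returncode, done.stderr)
```

Calling run_snippet on a gist's source would then yield a label such as "success", "ImportError" or "SyntaxError", which can be tallied across a dataset. Running untrusted snippets directly on a host is unsafe, which is one reason a container is the better choice in practice.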

Fewer than 25% of gists were executable by default, with over half failing due to ImportError in Python 2. Of the gists that initially failed with ImportError, attempts to infer an environment specification worked less than 50% of the time.

While the researchers were mainly interested in investigating the state of online code, their work also produced the Gistable database. The idea is that it provides an extensible framework that can be used for reproducible studies in software engineering. Gistable contains 10,259 code snippets, approximately 5,000 of which come with a Dockerfile that configures and executes them without an ImportError.


More Information

Gistable On GitHub

Research Abstract On Arxiv




Last Updated ( Monday, 10 September 2018 )