Find Python Code On GitHub With Gistable
Written by Kay Ewbank   
Friday, 07 September 2018

Researchers have put together a database of Python code snippets on GitHub. Gistable lists over 10,000 Python code snippets, of which around half come with a Dockerfile to configure and execute them.

The database was developed on the basis of research carried out by a team from North Carolina State University, who were interested in the executable status of Python code snippets shared on GitHub.

The researchers wanted to know what percentage of code shared through GitHub's gist system would just work, and how much would require 'non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration'.

The problem, of course, is that code snippets can contain parse errors, or fail to execute if the environment lacks the dependencies they need.

The researchers found that 75.6% of gists need this kind of non-trivial configuration before they will run. The study also suggests that:

"the natural assumption developers make about resource names when resolving configuration errors is correct less than half the time."

The researchers scraped gist URLs from the GitHub gist UI, and collected an initial dataset of 10,259 gists containing over 1,700 unique third-party library packages. Each gist was then cloned and executed inside a Docker container based on the official Python image for Docker, and categorized by its exit status.
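As a rough illustration of categorizing a snippet by its exit status, here is a minimal Python sketch. It is not the researchers' harness: it runs the snippet in a local subprocess rather than in a Docker container, and the function name `classify_snippet` and the labels it returns are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def classify_snippet(source: str, timeout: float = 10.0) -> str:
    """Run a snippet in a fresh interpreter and label the outcome.

    Assumption: a local subprocess stands in for the Docker container
    described in the article.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return "Timeout"

    if result.returncode == 0:
        return "Success"

    # For an uncaught exception, the last stderr line normally starts
    # with the exception name, e.g. "NameError: name 'x' is not defined".
    lines = result.stderr.strip().splitlines()
    last = lines[-1] if lines else ""
    name = last.split(":", 1)[0].strip()
    return name if name.isidentifier() else "OtherFailure"
```

Calling `classify_snippet` on each gist and counting the labels would give a breakdown of the kind the study reports, although running untrusted code outside a container is not something to do with real gists.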

Fewer than 25% of gists were executable by default, with over half failing due to ImportError in Python 2. Of the gists which initially failed with ImportError, attempts to infer an environment specification worked less than 50% of the time.
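To see why the "natural assumption" about names often fails, consider a sketch that reads the missing module name out of an ImportError message and guesses which package to install. The naive guess is that the package has the same name as the module. The function names and the small lookup table are illustrative only and are not taken from the study; the table lists a few well-known cases where the import name and the PyPI package name differ.

```python
import re

# Well-known cases where the import name differs from the PyPI package name.
KNOWN_MISMATCHES = {
    "yaml": "PyYAML",
    "cv2": "opencv-python",
    "bs4": "beautifulsoup4",
    "PIL": "Pillow",
    "sklearn": "scikit-learn",
}

# Matches both the Python 2 form ("No module named yaml")
# and the Python 3 form ("No module named 'yaml'").
_MISSING = re.compile(r"No module named '?([A-Za-z_][\w.]*)'?")


def missing_module(stderr: str):
    """Return the top-level module named in an import error, or None."""
    match = _MISSING.search(stderr)
    if match is None:
        return None
    return match.group(1).split(".")[0]


def guess_package(module: str) -> str:
    """Guess the package to install for a missing module.

    Falls back to the naive assumption that the package is named
    after the module when no known mismatch applies.
    """
    return KNOWN_MISMATCHES.get(module, module)
```

For example, `guess_package(missing_module("ImportError: No module named yaml"))` gives `PyYAML`, whereas the naive guess of `yaml` would not install the right package.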

While the researchers were mainly interested in investigating the state of online code, the work also produced the Gistable database. The idea is that this is an extensible framework that can be used for reproducible studies in software engineering. Gistable contains 10,259 code snippets, approximately 5,000 of which come with a Dockerfile to configure and execute them without import error.
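A Dockerfile of this kind can be quite short. The following sketch generates one from a list of packages; the layout, the `python:2.7` base image tag, the `snippet.py` file name and the function name `render_dockerfile` are all assumptions made for illustration, and the Dockerfiles shipped with Gistable may look different.

```python
def render_dockerfile(packages, base_image="python:2.7", script="snippet.py"):
    """Build Dockerfile text that installs packages and runs one script.

    Assumption: the base image tag and file layout are illustrative,
    not taken from the Gistable dataset.
    """
    lines = [f"FROM {base_image}", "WORKDIR /app"]
    if packages:
        # Sort so the output is stable regardless of input order.
        lines.append("RUN pip install " + " ".join(sorted(packages)))
    lines.append(f"COPY {script} /app/{script}")
    lines.append(f'CMD ["python", "{script}"]')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(["requests", "PyYAML"]))
```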


More Information

Gistable On GitHub

Research Abstract On Arxiv




Last Updated ( Monday, 10 September 2018 )