Find Python Code On GitHub With Gistable
Written by Kay Ewbank   
Friday, 07 September 2018

Researchers have put together a database of Python code snippets on GitHub. Gistable lists over 10,000 Python code snippets, of which around half come with a Dockerfile to configure and execute them.

The database was developed on the basis of research carried out by a team from North Carolina State University, who were interested in the executable status of Python code snippets shared on GitHub.

The researchers wanted to to know what percentage of code shared through GitHub's gist system would just work, and how much would require 'non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration'.

The problem, of course, is that code snippets can contain parse errors, or fail to execute if the environment contains unmet dependencies.

The researchers found that 75.6% of gists require non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration. The study also suggests that:

"the natural assumption developers make about resource names when resolving configuration errors is correct less than half the time."

The researchers scraped gist URLs from the GitHub gist UI, and collected an initial dataset of 10,259 gists containing over 1,700 unique third-party library packages. These were then cloned and executed inside of a Docker container based on the official Python image for Docker, categorizing the gist by its exit status.

Less than 25% of gists were executable by default, with over half failing due to ImportError in Python 2. Of the gists which initially failed withImportError, attempts to infer an environment specification worked less than 50% of the time.

While the researchers were mainly interested in investigating the state of online code, out of it they developed the database Gistable.  The idea is that this is an extensible framework that can be used for reproducible studies in software engineering. Gistable contains 10,259 code snippets, approximately 5,000 with a Dockerfile to configure and execute them without import error.

githubdeklogo

More Information

Gistable On GitHub

Research Abstract On Arxiv

Related Articles

GitHub Security Alerts For Python 

Python The Future Of Programming?

GitHub Adds Security Alerts 

GitHub For Unity Now Available

Microsoft Buys GitHub - Get Ready For a Bigger Devil

GitHub Marketplace Now Accepts Free Apps and Offers Free Trials

GitHub Enterprise Adds Team Discussions

 

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.

Banner


The Appeal of Google Summer of Code
21/03/2024

With the list of participating organizations now published, it is time for would-be contributors to select among them and apply for Google Summer of Code (GSoC). Rust has joined in the program fo [ ... ]



ACM Adopts Open Access Publishing Model
05/04/2024

ACM, the Association for Computing Machinery, the professional body for computer scientists, has relaunched Communications of the ACM, the organization’s flagship magazine, as a web-first  [ ... ]


More News

raspberry pi books

 

Comments




or email your comment to: comments@i-programmer.info

Last Updated ( Monday, 10 September 2018 )