Find Python Code On GitHub With Gistable
Written by Kay Ewbank
Friday, 07 September 2018
Researchers have put together a database of Python code snippets on GitHub. Gistable lists over 10,000 Python code snippets, of which around half come with a Dockerfile to configure and execute them.
The database was developed on the basis of research carried out by a team from North Carolina State University, who were investigating whether Python code snippets shared on GitHub can actually be executed.
The researchers wanted to know what percentage of code shared through GitHub's gist system would just work, and how much would require 'non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration'.
The problem, of course, is that code snippets can contain parse errors, or fail to execute because the environment lacks the dependencies they need.
The researchers found that 75.6% of gists require non-trivial configuration of this kind before they will run. The study also suggests that:
"the natural assumption developers make about resource names when resolving configuration errors is correct less than half the time."
The researchers scraped gist URLs from the GitHub gist UI and collected an initial dataset of 10,259 gists that between them use over 1,700 unique third-party library packages. Each gist was then cloned, executed inside a Docker container based on the official Python image for Docker, and categorized by its exit status.
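The study ran each gist in a container built from the official Python image and sorted it by exit status. The sketch below approximates that triage on a local machine, assuming the current interpreter stands in for the container. The function name `classify_snippet` and its category labels are hypothetical, not Gistable's own. Because a bare exit code cannot tell an ImportError apart from any other crash, the sketch also reads the last line of the traceback printed on stderr.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def classify_snippet(source: str, timeout: float = 10.0) -> str:
    """Run a snippet in a fresh interpreter and label it by outcome.

    The category names are illustrative assumptions, not Gistable's.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "snippet.py"
        script.write_text(source)
        try:
            # A separate process stands in for the Docker container here.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return "timeout"

    if result.returncode == 0:
        return "success"

    # The last non-empty line of a traceback names the exception type.
    lines = [line for line in result.stderr.splitlines() if line.strip()]
    last = lines[-1] if lines else ""
    if last.startswith(("ImportError", "ModuleNotFoundError")):
        return "import_error"
    if last.startswith(("SyntaxError", "IndentationError", "TabError")):
        return "syntax_error"
    return "other_error"
```

Run under Python 3, a Python 2 statement such as `print "hello"` is classified as `syntax_error`, while a snippet importing a package that is not installed lands in `import_error`.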
Fewer than 25% of gists were executable by default, and over half failed with an ImportError when run under Python 2. Of the gists that initially failed with ImportError, attempts to infer an environment specification succeeded less than 50% of the time.
While the researchers were mainly interested in investigating the state of online code, the work also produced the Gistable database, intended as an extensible framework for reproducible studies in software engineering. Gistable contains 10,259 code snippets, approximately 5,000 of which come with a Dockerfile that configures and executes them without an import error.
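Gistable pairs roughly half of its snippets with a Dockerfile. The sketch below shows what generating such a file might look like, assuming a list of pip package names is already known. The `make_dockerfile` helper, its defaults and the `python:3.11` base tag are hypothetical choices, not taken from Gistable's actual Dockerfiles.

```python
def make_dockerfile(
    requirements: list[str],
    python_tag: str = "3.11",
    entrypoint: str = "snippet.py",
) -> str:
    """Build Dockerfile text that installs dependencies and runs one snippet."""
    # Start from the official Python image, as the study's containers did.
    lines = [f"FROM python:{python_tag}", "WORKDIR /app"]
    if requirements:
        # Install dependencies before copying code so this layer can be cached.
        lines.append("RUN pip install --no-cache-dir " + " ".join(requirements))
    lines.append(f"COPY {entrypoint} .")
    lines.append(f'CMD ["python", "{entrypoint}"]')
    return "\n".join(lines) + "\n"
```

A Python 2 gist would need an older base image, for example `make_dockerfile(reqs, python_tag="2.7")`.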
Last Updated: Monday, 10 September 2018