A New Threat - Package Hallucination
Written by Sue Gee
Wednesday, 07 May 2025
The rise and rise of reliance on LLMs for code generation has resulted in a new threat to software supply chains. Dubbed "package hallucination", this occurs when LLMs generate references to non-existent packages.

Package hallucination is a novel phenomenon explored in a paper to be presented at the 2025 USENIX Security Symposium. The tendency of LLMs to "hallucinate", that is to invent bogus information that goes beyond their training data, is already well known. Now a study, led by Joseph Spracklen of the University of Texas at San Antonio, identifies a specific type of misinformation that could seriously compromise AI-generated code.

Akin to package confusion, or dependency confusion, "package hallucination" occurs when an LLM generates code that recommends or contains a reference to a package that does not actually exist. As the paper explains, an adversary can exploit package hallucinations, especially if they are repeated, by publishing a package containing malicious code or functionality to an open-source repository under the same name as the hallucinated package.

The study used 16 of the most widely used large language models to generate 576,000 code samples in Python and JavaScript and found that over 440,000 of the package dependencies they contained, almost 20% of all dependencies, were non-existent.
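To make the risk concrete, here is a minimal sketch, not taken from the paper, of how a build step might screen LLM-suggested dependencies against PyPI before installing them. The dependency names in the demo list are invented for illustration, and the only external service assumed is PyPI's public JSON API, which returns a 404 for package names that do not exist.

# Minimal sketch: flag dependencies in AI-generated code that do not exist on PyPI.
# The package names in the demo list below are illustrative, not from the study.
import urllib.request
import urllib.error

PYPI_URL = "https://pypi.org/pypi/{name}/json"  # public PyPI JSON API

def exists_on_pypi(name: str) -> bool:
    """Return True if PyPI serves metadata for this package name."""
    try:
        with urllib.request.urlopen(PYPI_URL.format(name=name), timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:      # unknown package: a candidate hallucination
            return False
        raise                    # other errors (rate limits, outages) need handling, not silence

if __name__ == "__main__":
    # Hypothetical dependency list extracted from an LLM-generated snippet.
    for pkg in ["requests", "numpy", "fastjson-parser-utils"]:
        status = "found" if exists_on_pypi(pkg) else "NOT on PyPI - verify before installing"
        print(f"{pkg}: {status}")

A check like this only tells you that a name is currently unregistered; an attacker who has already squatted a hallucinated name would still pass it, so it is a first filter rather than a defence in itself.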
With regard to LLMs, the temperature parameter is a numerical value used to adjust the degree of randomness of the generated responses: a lower temperature results in more predictable and deterministic outputs, while a higher temperature increases creativity and diversity in the responses. While the range for the temperature parameter is generally between 0 and 2 for commercial LLMs, Anthropic (not included in this study) limits it to between 0 and 1. The recommended range for a task such as generating package names is considered to be 0 to 0.3, while higher values are intended for storytelling, poetry and brainstorming, where creativity is to be encouraged.

For the purposes of this study temperature was varied between the minimum and maximum allowed, i.e. between 0 and 2 in the GPT models and between 0 and 5 in the open source LLMs. As the paper's graphs show, lower temperatures produced lower hallucination rates for package names, while higher temperatures significantly increased them. The range of values on the y-axis is much wider for the two open source models than for the OpenAI models, and whereas at maximum temperature GPT-3.5 had a hallucination rate of 31.8%, GPT-4's rate was only 8.9%.

Compared to temperature, employing the decoding strategies of top-p and top-k, which are intended to reduce the chances of a low-probability token being selected as part of a package name, on the assumption that lower-probability tokens correspond to higher probabilities of hallucination, resulted in only a slight increase in hallucination rates (1.16% on average).

The other factor that affected hallucination rates was the recency of the data. To evaluate whether the hallucination rate was correlated with topics and packages that emerged after a model was trained, the coding prompts were divided into two temporal datasets. A smaller difference between the rates for recent and all-time prompts would indicate better performance in handling questions falling outside the model's pre-training data and therefore a more generalizable model. The models tested were shown to be more likely to generate a package hallucination when responding to prompts that dealt with more recent topics, producing a hallucination rate that was 10% higher on average for recent prompts than for older ones.

There was also a clear correlation between the number of unique package names generated during testing and the rate of hallucination, demonstrating that the more verbose a model, the greater the incidence of invented package names. The chart in the paper also reveals the superiority of the GPT models, which sit in a cluster well below the regression line.
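To illustrate what the temperature parameter does during decoding, the following sketch, which is not code from the study and uses an invented set of logits for three candidate tokens, applies temperature scaling before softmax and prints the resulting probabilities. It shows how a low temperature concentrates probability on the most likely token, while a high temperature flattens the distribution so that unlikely tokens, such as fragments of an invented package name, become more probable.

# Illustrative sketch of temperature scaling in token sampling (not the paper's code).
# Lower temperature sharpens the distribution toward the most likely token;
# higher temperature flattens it, making unlikely tokens more probable.
import math
import random

def sample_with_temperature(logits, temperature, rng=random.Random(0)):
    """Sample an index from logits after temperature-scaled softmax; also return the probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    choice = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return choice, probs

if __name__ == "__main__":
    # Hypothetical logits for three candidate tokens.
    logits = [3.0, 1.0, 0.2]
    for t in (0.1, 1.0, 2.0):
        _, probs = sample_with_temperature(logits, t)
        print(f"temperature={t}: probabilities={[round(p, 3) for p in probs]}")

At temperature 0.1 virtually all of the probability mass sits on the first token, while at 2.0 the three options are much closer together, which is the mechanism behind the higher hallucination rates the study observed at high temperatures.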
More Information

We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs by Joseph Spracklen, University of Texas at San Antonio; Raveen Wijewickrama, University of Texas at San Antonio; AHM Nazmus Sakib, University of Texas at San Antonio; Anindya Maiti, University of Oklahoma; Bimal Viswanath, Virginia Tech; Murtuza Jadliwala, University of Texas at San Antonio
Related Articles

GitHub Copilot Provides Productivity Boost
GitHub Sees Exponential Rise In AI
GitHub Announces AI-Powered Changes
Last Updated ( Wednesday, 07 May 2025 )