Meta Releases AI Safety Tools
Written by Kay Ewbank   
Monday, 01 January 2024

Meta has released open source tools for checking the safety of generative AI models before they are used publicly. The interestingly named Purple Llama is an umbrella project of open trust and safety tools and evaluations that Meta says is meant to level the playing field for developers to responsibly deploy generative AI models and experiences in accordance with best practices.

The first tools being released are CyberSec Eval, a set of cybersecurity safety evaluation benchmarks for LLMs, and Llama Guard, a safety classifier for input/output filtering that is optimized for ease of deployment.

purple llama logo


If you're thinking Purple Llama has worrying overtones of Barney the Purple Dinosaur, relax. Meta believes that mitigating the challenges that generative AI presents means taking both attack (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks. So now you know.

CyberSecEval is a benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. It provides a thorough evaluation of LLMs in two crucial security domains: their propensity to generate insecure code and their level of compliance when asked to assist in cyberattacks. In a paper on the benchmark, Meta researchers described a case study involving seven models from the Llama 2, Code Llama, and OpenAI GPT large language model families, in which CyberSecEval pinpointed key cybersecurity risks along with practical insights for refining these models. A significant observation from the study was the tendency of more advanced models to suggest insecure code.
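The insecure-code half of the evaluation boils down to scanning model-generated snippets for known-dangerous constructs and reporting how often they appear. As a much-simplified sketch of that idea, here is a toy pattern matcher; the rules and the helper names are invented for illustration and are not CyberSecEval's actual rule set:

```python
import re

# Hypothetical, heavily simplified rules: each maps a regex over
# generated code to the weakness it suggests. The real benchmark
# uses a far larger, curated static-analysis rule set.
INSECURE_PATTERNS = {
    r"\bstrcpy\s*\(": "CWE-120: unbounded copy (buffer overflow)",
    r"\bgets\s*\(": "CWE-242: use of dangerous function gets()",
    r"\beval\s*\(": "CWE-95: eval on untrusted input",
    r"\bmd5\s*\(": "CWE-328: weak hash (MD5)",
}

def flag_insecure(code: str) -> list[str]:
    """Return the weaknesses matched in one generated snippet."""
    return [msg for pat, msg in INSECURE_PATTERNS.items()
            if re.search(pat, code)]

def insecure_rate(snippets: list[str]) -> float:
    """Fraction of snippets tripping at least one rule -- a toy
    version of the 'propensity to generate insecure code' metric."""
    flagged = sum(1 for s in snippets if flag_insecure(s))
    return flagged / len(snippets) if snippets else 0.0
```

Running a suite of generated snippets through `insecure_rate` gives a single score per model, which is the shape of result the case study compares across the seven models.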

Llama Guard, the second tool, is an LLM-based input-output safeguard model geared towards human-AI conversation use cases. The model incorporates a tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). Llama Guard is instruction-tuned on Meta's collected dataset. It functions as a language model, carrying out multi-class classification and generating binary decision scores.
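Because Llama Guard is itself a language model, its verdict arrives as generated text rather than a structured object: per the model card, the first line is "safe" or "unsafe", and for unsafe content a following line lists the violated category codes from its taxonomy (e.g. O3). A minimal sketch of how calling code might turn that into a structured result; the format follows the model card, but the helper itself is hypothetical:

```python
def parse_guard_verdict(text: str) -> dict:
    """Parse Llama Guard's generated text into a structured verdict.

    The model emits "safe", or "unsafe" followed by a line of
    comma-separated category codes (e.g. "O3"). Format per the
    Llama Guard model card; this helper is an illustrative sketch.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or lines[0].lower() == "safe":
        return {"safe": True, "categories": []}
    codes = lines[1].split(",") if len(lines) > 1 else []
    return {"safe": False, "categories": [c.strip() for c in codes]}
```

A deployment would run this check on both the user prompt and the model response, dropping or regenerating anything flagged unsafe.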

More Information

Welcome to Purple Llama

CyberSec Eval

Llama Guard

Related Articles

Meta Releases Buck2 Build System

Meta Builds AI Supercomputer

Meta Announces Conversational AI Project


Last Updated ( Monday, 01 January 2024 )