nsa

joined 1 year ago
 

DecodingTrust is the Adversarial GLUE Benchmark. DecodingTrust aims at providing a thorough assessment of trustworthiness in GPT models.

This research endeavor is designed to help researchers and practitioners better understand the capabilities, limitations, and potential risks involved in deploying these state-of-the-art Large Language Models (LLMs).
This project is organized around the following eight primary perspectives of trustworthiness, including:

  • Toxicity
  • Stereotype and bias
  • Adversarial robustness
  • Out-of-Distribution Robustness
  • Privacy
  • Robustness to Adversarial Demonstrations
  • Machine Ethics
  • Fairness

Paper: https://arxiv.org/abs/2306.11698
Repo: https://github.com/AI-secure/DecodingTrust

 

Here's some preliminary work from Microsoft from 2022 that incorporates OpenAI's Codex model to make NPCs that can interact with the player using natural language instructions. It works by defining an API of functions the bot can use, then having Codex generate function calls in response to the player's instructions.

Paper: https://aclanthology.org/2022.wordplay-1.3/
Repo: https://github.com/microsoft/interactive-minecraft-npcs
Videos: Introductory Demo, Escape Room Demo

 
view more: ‹ prev next ›