Skip to content

Latest commit

 

History

History
51 lines (42 loc) · 3.28 KB

README.md

File metadata and controls

51 lines (42 loc) · 3.28 KB

ScenarioQA: Evaluating Test Scenario Reasoning Capabilities of Large Language Models

Overview

ScenarioQA is an in-depth test and evaluation of the capabilities of different LLMs using high-quality QA pairs under driving scenarios and conditions generated by GPT-4, which we picked due to it having comparable logical reasoning compared to other LLMs. We use the ontology to aid with question generation, along with information about the proper formation of MCQ questions as well as Bloom's taxonomy. Subsequently, we use the generated questions for each reasoning category in the LiteLLM prompt box to compare the results of several LLMs simultaneously.

Python Guidance Library

We used the Python Guidance library due to its ability to call many LLMs including GPT with more efficiency compared to conventional prompting and chaining methods.

All of the scripts for question generation are in the QA Template folder. To run them, follow the steps below:

  1. Make sure you have the proper environmental variables set on your local PC. We are using GPT-4 for the guidance prompts so it was necessary to set the API key in the environmental variables to ensure the prompts run properly.
  2. Install all the dependencies, you need Python Version 3.10 or above to run Guidance. For more information about Guidance check out the Guidance Library
    pip install guidance
    pip install sys
    pip install re
    pip install jason
    

Question Generation

After installing all the dependencies, we can then easily run guidance. All the files are Jypyter notebooks which can be run with Anaconda or VSCode. The prompt to generate questions has the following sections:

  • The Ontology
  • Directions on Proper MCQ Formation
  • Details about Bloom's Taxonomy
  • Directions for Question Generation for GPT
  • A Detailed Driving Scenario (if necessary)
  • Examples of Specific Types of Reasoning Tasks

If you would like to edit the questions outputted or ask for different results, you can modify the [Directions for Question Generation] section and prompt for better questions or more clarity, etc. By changing the directions, the ontology, as well as the scenario, you can create different results or outputs.

Ontology Creation

The ontology formation is a call to GPT multiple times after being given a seed ontology. More information can be found on the OpenXONTOLOGY site. Change the seed and the number of iterations in order to change the results of the ontology.

Scenario Creation

A very simple call to guidance using or modifying this scenario will generate high-quality detailed scenarios that can be used within the scope of this project.

PROMPT: Generate a scenario where a car crash occurred at an intersection between a car and a bike. The car was taking a right
at an intersection when the traffic light was red, however, the car did not check its blindspot and crashed into the 
cyclist. Provide details about the car, the cyclist, the type of road, and the weather.

LiteLLM

Additional directions can be found on the official LiteLLM website.

  1. Install Dependencies:
    pip install Flask
    pip install litellm
    pip install waitress
    
  2. Run main.py
  3. Run app.py