alpaca eval config file #433

Open

zwRuan opened this issue Jan 4, 2025 · 0 comments

zwRuan commented Jan 4, 2025

What is the difference between the following two evaluator configs? The OpenAI-compatible endpoint serving my model does not return logprobs, so I cannot use logprob_parser. Can I use the second config to evaluate my model's outputs on AlpacaEval 2? Also, max_tokens in the second config is set to 50 instead of 1. Does this mean roughly a 50x difference in evaluation cost?

```yaml
gpt4_turbo_logprob:
  prompt_template: "gpt4_turbo_clf/basic_clf_prompt.txt"
  fn_completions: "openai_completions"
  completions_kwargs:
    model_name: "gpt-4-1106-preview"
    max_tokens: 1
    temperature: 1  # temperature should be applied for sampling, so that should make no effect.
    logprobs: true
    top_logprobs: 5
  fn_completion_parser: "logprob_parser"
  completion_parser_kwargs:
    numerator_token: "A"
    denominator_tokens: ["A", "B"]
    is_binarize: false
  completion_key: "completions_all"
  batch_size: 1
```

```yaml
gpt4_turbo:
  prompt_template: "chatgpt/basic_prompt.txt"
  fn_completions: "openai_completions"
  completions_kwargs:
    model_name: "gpt-4-1106-preview"
    max_tokens: 50
    temperature: 0
  completion_parser_kwargs:
    outputs_to_match:
      1: '(?:^|\n) ?Output (a)'
      2: '(?:^|\n) ?Output (b)'
  batch_size: 1
```
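
For reference, here is my rough understanding of what the two parsers consume, as a minimal sketch (not the actual alpaca_eval implementation; function names and the demo inputs are made up for illustration):

```python
import math
import re

def logprob_preference(top_logprobs, numerator_token="A", denominator_tokens=("A", "B")):
    """Preference from the judge's single generated token: P(numerator | any denominator token)."""
    probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items() if tok in denominator_tokens}
    denom = sum(probs.values())
    return probs.get(numerator_token, 0.0) / denom if denom else 0.5

def regex_preference(completion_text, outputs_to_match):
    """Map a free-text judgment to a label (1 or 2) by regex matching, as the second config does."""
    for label, pattern in outputs_to_match.items():
        if re.search(pattern, completion_text):
            return label
    return None  # judgment could not be parsed

# Logprob judge: only needs the top_logprobs of the first generated token, hence max_tokens: 1.
print(logprob_preference({"A": -0.1, "B": -2.4}))  # soft preference for the first output

# Text judge: needs enough generated text for the matched phrase to appear, hence max_tokens: 50.
patterns = {1: r'(?:^|\n) ?Output (a)', 2: r'(?:^|\n) ?Output (b)'}
print(regex_preference("Output a", patterns))  # -> 1 (the exact response format is set by the prompt template)
```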
