alpaca eval config file #433

Open

zwRuan opened this issue Jan 4, 2025 · 0 comments

zwRuan commented Jan 4, 2025

What is the difference between the following two evaluator configs? The OpenAI-compatible endpoint serving my model does not return logprobs, so I cannot use logprob_parser. Can I use the second config to evaluate my model's outputs on AlpacaEval 2? Also, max_tokens in the second config is set to 50 instead of 1. Does this mean roughly a 50x difference in evaluation cost?

```yaml
gpt4_turbo_logprob:
  prompt_template: "gpt4_turbo_clf/basic_clf_prompt.txt"
  fn_completions: "openai_completions"
  completions_kwargs:
    model_name: "gpt-4-1106-preview"
    max_tokens: 1
    temperature: 1  # temperature should be applied for sampling, so that should make no effect.
    logprobs: true
    top_logprobs: 5
  fn_completion_parser: "logprob_parser"
  completion_parser_kwargs:
    numerator_token: "A"
    denominator_tokens: ["A", "B"]
    is_binarize: false
  completion_key: "completions_all"
  batch_size: 1
```

```yaml
gpt4_turbo:
  prompt_template: "chatgpt/basic_prompt.txt"
  fn_completions: "openai_completions"
  completions_kwargs:
    model_name: "gpt-4-1106-preview"
    max_tokens: 50
    temperature: 0
  completion_parser_kwargs:
    outputs_to_match:
      1: '(?:^|\n) ?Output (a)'
      2: '(?:^|\n) ?Output (b)'
  batch_size: 1
```
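
For reference, here is my rough understanding of what the two parsers consume, as a minimal sketch (not the actual alpaca_eval implementation; function names and the demo inputs are made up for illustration):

```python
import math
import re

def logprob_preference(top_logprobs, numerator_token="A", denominator_tokens=("A", "B")):
    """Preference from the judge's single generated token: P(numerator | any denominator token)."""
    probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items() if tok in denominator_tokens}
    denom = sum(probs.values())
    return probs.get(numerator_token, 0.0) / denom if denom else 0.5

def regex_preference(completion_text, outputs_to_match):
    """Map a free-text judgment to a label (1 or 2) by regex matching, as the second config does."""
    for label, pattern in outputs_to_match.items():
        if re.search(pattern, completion_text):
            return label
    return None  # judgment could not be parsed

# Logprob judge: only needs the top_logprobs of the first generated token, hence max_tokens: 1.
print(logprob_preference({"A": -0.1, "B": -2.4}))  # soft preference for the first output

# Text judge: needs enough generated text for the matched phrase to appear, hence max_tokens: 50.
patterns = {1: r'(?:^|\n) ?Output (a)', 2: r'(?:^|\n) ?Output (b)'}
print(regex_preference("Output a", patterns))  # -> 1 (the exact response format is set by the prompt template)
```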
