The details of training LLaMA_GPT4 #38

Open
YuanDaoze opened this issue Aug 23, 2024 · 0 comments

Comments


YuanDaoze commented Aug 23, 2024

I am curious how to reproduce the model's results. I tried to use LLaMA-Factory to train Llama-3.1-8B-Instruct with full SFT. Sadly, after training, the model performed worse than the base model. The training details are below.

method

stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

dataset

dataset: alpaca_gpt4_data
template: llama3
cutoff_len: 2048
max_samples: 20000
overwrite_cache: true
preprocessing_num_workers: 16

output

output_dir: saves/LLaMA3.1-8B/model_id/full/mix/train_08240042
logging_steps: 100
save_steps: 2000
plot_loss: true
overwrite_output_dir: true

train

per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 2.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
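
For completeness, here is the same configuration collected into a single LLaMA-Factory YAML file. This is just a sketch of how the sections above fit together; the `model_name_or_path` entry is my assumption based on the model named above (the post does not include a model section), so adjust it to your setup.

```yaml
### model
# assumed: the post names Llama-3.1-8B-Instruct but omits the model section
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: alpaca_gpt4_data
template: llama3
cutoff_len: 2048
max_samples: 20000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/LLaMA3.1-8B/model_id/full/mix/train_08240042
logging_steps: 100
save_steps: 2000
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 2.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
```

A config like this is typically launched with something like `llamafactory-cli train llama31_full_sft.yaml` (the file name here is arbitrary); for multi-GPU DeepSpeed runs, LLaMA-Factory's examples usually prefix it with `FORCE_TORCHRUN=1`.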
