I am curious how to reproduce the results of this model. I tried to use LLaMA-Factory to run full-parameter SFT on Llama-3.1-8B-Instruct, but after training, the model performed worse than the base model. The training details are below.
### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: alpaca_gpt4_data
template: llama3
cutoff_len: 2048
max_samples: 20000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/LLaMA3.1-8B/model_id/full/mix/train_08240042
logging_steps: 100
save_steps: 2000
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 2.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
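For reference, a config like this is normally launched through the LLaMA-Factory CLI; a minimal sketch, where the YAML filename is just a placeholder for the settings above:

```bash
# Save the settings above as e.g. llama31_full_sft.yaml (filename is arbitrary).
# FORCE_TORCHRUN=1 makes LLaMA-Factory launch via torchrun, which is needed
# for multi-GPU training with DeepSpeed ZeRO-3.
# Note: effective batch size = per_device_train_batch_size (1)
#   * gradient_accumulation_steps (2) * number of GPUs.
FORCE_TORCHRUN=1 llamafactory-cli train llama31_full_sft.yaml
```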