Training succeeded! Sharing some details. #25

Open
LYCnight opened this issue Sep 4, 2024 · 8 comments
Labels: documentation (Improvements or additions to documentation)

@LYCnight

LYCnight commented Sep 4, 2024

Training succeeded. Here is my environment:

  • Environment:
    • python==3.11.9
    • transformers==4.33.0
    • pytorch==2.2.0
    • flash-attn==2.6.3
    • ninja==1.11.1.1
    • deepspeed==0.15.0
    • wandb==0.17.8
    • Both modeling_chatglm.py and tokenization_chatglm.py in the /glm-4-9b directory have been replaced
    • Set "stage3_prefetch_bucket_size": 15099494 in /ds_config/stage3.json
    • Set "seq_length": 131072 in /glm-4-9b/config.json
  • general.json: (for testing only)
{"messages":[{"role": "user", "content": "北京的天气怎么样?"}, {"role": "assistant", "content": "今天北京的天气晴朗,气温在25度左右。"}]}
{"messages":[{"role": "user", "content": "如何在Java中实现单例模式?"}, {"role": "assistant", "content": "你可以使用懒汉式或饿汉式来实现单例模式。以下是饿汉式的示例代码:\n```java\npublic class Singleton {\n    private static final Singleton instance = new Singleton();\n    private Singleton() {}\n    public static Singleton getInstance() {\n        return instance;\n    }\n}\n```"}]}
{"messages":[{"role": "user", "content": "你知道中国的四大发明是什么吗?"}, {"role": "assistant", "content": "中国的四大发明是造纸术、指南针、火药和印刷术。"}]}
{"messages":[{"role": "user", "content": "有什么推荐的中文书籍吗?"}, {"role": "assistant", "content": "如果你喜欢历史类书籍,我推荐《资治通鉴》。如果你对文学感兴趣,可以看看《红楼梦》。"}]}
{"messages":[{"role": "user", "content": "What's the weather like in New York?"}, {"role": "assistant", "content": "The weather in New York is cloudy with a temperature around 18°C."}]}
{"messages":[{"role": "user", "content": "How can I implement a stack in Python?"}, {"role": "assistant", "content": "You can implement a stack using a list in Python. Here’s a simple example:\n```python\nstack = []\nstack.append(1)\nstack.append(2)\nprint(stack.pop())  # Outputs 2\n```"}]}
{"messages":[{"role": "user", "content": "Can you explain the concept of machine learning?"}, {"role": "assistant", "content": "Machine learning is a field of artificial intelligence where algorithms learn from data to make predictions or decisions without being explicitly programmed."}]}
{"messages":[{"role": "user", "content": "What's a good book to learn about AI?"}, {"role": "assistant", "content": "A great book to start with is 'Artificial Intelligence: A Modern Approach' by Stuart Russell and Peter Norvig. It covers a wide range of AI topics comprehensively."}]}
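  • Tips:

    • Both modeling_chatglm.py and tokenization_chatglm.py in the /glm-4-9b directory have been replaced

      If you are not sure whether this step worked, add a few comments of your own to both files. If an error is raised, check whether those comments appear in the source lines shown in the traceback.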
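A quick way to sanity-check a file like the general.json sample above before launching a long run. This is a minimal sketch, assuming one JSON object per line with a `messages` list of `role`/`content` entries, as in the samples. The function name `validate_sft_lines` and the allowed role set are hypothetical and not part of the repository.

```python
import json

# Assumption: the roles seen in the samples, plus "system".
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sft_lines(lines):
    """Return a list of (line_number, problem) tuples; an empty list means all lines passed."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines are ignored
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((lineno, "missing or empty 'messages' list"))
            continue
        for msg in messages:
            if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
                problems.append((lineno, "message with unexpected role"))
                break
            if not isinstance(msg.get("content"), str):
                problems.append((lineno, "message content is not a string"))
                break
    return problems


if __name__ == "__main__":
    with open("general.json", encoding="utf-8") as handle:
        for lineno, problem in validate_sft_lines(handle):
            print(f"line {lineno}: {problem}")
```

Reading the file line by line keeps the check cheap even for large datasets, and the line numbers make it easy to find a malformed sample.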

@bys0318
Member

bys0318 commented Sep 4, 2024

Thanks! Here is the English version: The training was successful! Here are the details of the environment:

  • Environment:

    • python==3.11.9
    • transformers==4.33.0
    • pytorch==2.2.0
    • flash-attn==2.6.3
    • ninja==1.11.1.1
    • deepspeed==0.15.0
    • wandb==0.17.8
    • Both modeling_chatglm.py and tokenization_chatglm.py in the /glm-4-9b directory have been replaced.
    • Set "stage3_prefetch_bucket_size": 15099494 in /ds_config/stage3.json.
    • Set "seq_length": 131072 in /glm-4-9b/config.json.
  • general.json: (for testing purposes only)

{"messages":[{"role": "user", "content": "What's the weather like in Beijing?"}, {"role": "assistant", "content": "Today's weather in Beijing is clear with temperatures around 25°C."}]}
{"messages":[{"role": "user", "content": "How can I implement the Singleton pattern in Java?"}, {"role": "assistant", "content": "You can use the eager or lazy initialization method to implement the Singleton pattern. Here's an example of eager initialization:\n```java\npublic class Singleton {\n    private static final Singleton instance = new Singleton();\n    private Singleton() {}\n    public static Singleton getInstance() {\n        return instance;\n    }\n}\n```"}]}
{"messages":[{"role": "user", "content": "Do you know what the Four Great Inventions of China are?"}, {"role": "assistant", "content": "China's Four Great Inventions are paper making, the compass, gunpowder, and printing."}]}
{"messages":[{"role": "user", "content": "Can you recommend any Chinese books?"}, {"role": "assistant", "content": "If you're interested in historical books, I recommend *Zizhi Tongjian*. If you're into literature, you might enjoy *Dream of the Red Chamber*."}]}
{"messages":[{"role": "user", "content": "What's the weather like in New York?"}, {"role": "assistant", "content": "The weather in New York is cloudy with a temperature around 18°C."}]}
{"messages":[{"role": "user", "content": "How can I implement a stack in Python?"}, {"role": "assistant", "content": "You can implement a stack using a list in Python. Here’s a simple example:\n```python\nstack = []\nstack.append(1)\nstack.append(2)\nprint(stack.pop())  # Outputs 2\n```"}]}
{"messages":[{"role": "user", "content": "Can you explain the concept of machine learning?"}, {"role": "assistant", "content": "Machine learning is a field of artificial intelligence where algorithms learn from data to make predictions or decisions without being explicitly programmed."}]}
{"messages":[{"role": "user", "content": "What's a good book to learn about AI?"}, {"role": "assistant", "content": "A great book to start with is 'Artificial Intelligence: A Modern Approach' by Stuart Russell and Peter Norvig. It covers a wide range of AI topics comprehensively."}]}
  • Tips:
    • Both modeling_chatglm.py and tokenization_chatglm.py in the /glm-4-9b directory have been replaced.

      If you're unsure whether this step succeeded, add some comments of your own to these two files. If an error occurs, check whether the source lines shown in the traceback contain those comments.
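To compare a local environment against the versions listed above, something like the following may help. It is a minimal sketch using only the standard library. The `PINNED` table is copied from this thread, `torch` is assumed to be the PyPI distribution name behind `pytorch==2.2.0`, and the helper names are hypothetical.

```python
from importlib import metadata

# Versions reported in this thread. Assumption: "torch" is the PyPI name for pytorch.
PINNED = {
    "transformers": "4.33.0",
    "torch": "2.2.0",
    "flash-attn": "2.6.3",
    "ninja": "1.11.1.1",
    "deepspeed": "0.15.0",
    "wandb": "0.17.8",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def diff_versions(pinned, lookup=installed_version):
    """Map package -> (wanted, found) for every package that does not match."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        # A local build tag such as "2.2.0+cu121" is treated as matching "2.2.0".
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in diff_versions(PINNED).items():
        print(f"{package}: wanted {wanted}, found {found}")
```

The script prints nothing when every listed package matches. The Python version itself is not checked here.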

bys0318 added the documentation label Sep 4, 2024
@thunder95

@LYCnight How much GPU memory should I expect training to need?

@zhoufn

zhoufn commented Sep 5, 2024

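@LYCnight I followed your environment and training started successfully, but why are the files saved after training so huge? I've run out of storage space...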
-rw-r--r-- 1 root root 4984147224 Sep 5 10:49 model-00001-of-00004.safetensors
-rw-r--r-- 1 root root 4895071360 Sep 5 10:49 model-00002-of-00004.safetensors
-rw-r--r-- 1 root root 4895071384 Sep 5 10:49 model-00003-of-00004.safetensors
-rw-r--r-- 1 root root 4025651256 Sep 5 10:49 model-00004-of-00004.safetensors

@bys0318
Member

bys0318 commented Sep 5, 2024

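@LYCnight How much GPU memory should I expect training to need?

Training GLM-4-9b at 32k requires 8 GPUs with 80G each. If you don't have enough GPU memory, you can try LoRA or QLoRA.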
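A rough back-of-the-envelope estimate shows why full fine-tuning is this demanding. The sketch assumes mixed-precision Adam at about 16 bytes per parameter (2 for 16-bit weights, 2 for gradients, 12 for fp32 master weights and the two optimizer moments, the figure commonly cited for ZeRO) and roughly 9.4e9 parameters. Both numbers are assumptions, not figures from the maintainers, and the function name is hypothetical.

```python
def model_state_gb_per_gpu(num_params, num_gpus, bytes_per_param=16):
    """Model-state memory per GPU in GB, assuming ZeRO stage 3 shards everything evenly."""
    return num_params * bytes_per_param / num_gpus / 1e9


# Assumption: about 9.4e9 parameters spread across 8 GPUs.
estimate = model_state_gb_per_gpu(num_params=9.4e9, num_gpus=8)
print(f"{estimate:.1f} GB of model states per GPU, before activations")
```

Activations are excluded from this estimate. They grow with sequence length, so at 32k they likely account for much of the remaining memory on each 80G card.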

@bys0318
Member

bys0318 commented Sep 5, 2024

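@LYCnight I followed your environment and training started successfully, but why are the files saved after training so huge? I've run out of storage space...
-rw-r--r-- 1 root root 4984147224 Sep 5 10:49 model-00001-of-00004.safetensors
-rw-r--r-- 1 root root 4895071360 Sep 5 10:49 model-00002-of-00004.safetensors
-rw-r--r-- 1 root root 4895071384 Sep 5 10:49 model-00003-of-00004.safetensors
-rw-r--r-- 1 root root 4025651256 Sep 5 10:49 model-00004-of-00004.safetensors

A 9B model is simply this large when saved. You can configure training to save only the model parameters and not the optimizer state, which makes the saved files smaller.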
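The shard sizes in the quoted listing can be checked with simple arithmetic. This is a small sketch that assumes the weights are stored in a 16-bit format at 2 bytes per parameter and ignores the small safetensors headers.

```python
# Byte sizes copied from the `ls -l` listing quoted above.
shard_bytes = [4984147224, 4895071360, 4895071384, 4025651256]

total_bytes = sum(shard_bytes)
BYTES_PER_PARAM = 2  # assumption: weights stored as bf16 or fp16

approx_params = total_bytes / BYTES_PER_PARAM
print(f"total: {total_bytes / 1e9:.2f} GB, about {approx_params / 1e9:.2f}B parameters")
```

Under that assumption the four shards add up to roughly 18.8 GB, or about 9.4 billion parameters, which suggests they hold the model weights only. Any optimizer state would be in separate checkpoint files that are not shown in the listing.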

@LYCnight
Author

LYCnight commented Sep 5, 2024

@LYCnight How much GPU memory should I expect training to need?

I used 8 * 80G, and all of it was fully utilized.

@LYCnight
Author

LYCnight commented Sep 5, 2024

@LYCnight I followed your environment and training started successfully, but why are the files saved after training so huge? I've run out of storage space...
-rw-r--r-- 1 root root 4984147224 Sep 5 10:49 model-00001-of-00004.safetensors
-rw-r--r-- 1 root root 4895071360 Sep 5 10:49 model-00002-of-00004.safetensors
-rw-r--r-- 1 root root 4895071384 Sep 5 10:49 model-00003-of-00004.safetensors
-rw-r--r-- 1 root root 4025651256 Sep 5 10:49 model-00004-of-00004.safetensors

They really are large. Let's wait for official documentation on settings that reduce the saved size.

@zhangnn520

Hello. I set transformers==4.33.0, but I get the error below at runtime and don't yet know how to fix it. Everything else matches the original poster's settings.
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 342, in __init__
    self.create_accelerator_and_postprocess()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3883, in create_accelerator_and_postprocess
    self.accelerator = Accelerator(
TypeError: Accelerator.__init__() got an unexpected keyword argument 'dispatch_batches'
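An error of this shape usually means the installed accelerate no longer accepts the `dispatch_batches` argument that the installed transformers passes to `Accelerator`. The traceback path also shows Python 3.10, while the environment reported at the top of the thread uses 3.11.9, so the two setups may differ in more than one package. The sketch below is a diagnostic only: it prints both installed versions and checks the constructor signature. The helper names `accepts_kwarg` and `report` are hypothetical.

```python
import inspect
from importlib import metadata


def accepts_kwarg(func, name):
    """True if `func` takes a parameter called `name` or a catch-all **kwargs."""
    params = inspect.signature(func).parameters
    return name in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


def report():
    """Print installed versions and whether Accelerator accepts dispatch_batches."""
    for package in ("transformers", "accelerate"):
        try:
            print(package, metadata.version(package))
        except metadata.PackageNotFoundError:
            print(package, "not installed")
    try:
        from accelerate import Accelerator
    except ImportError:
        print("accelerate is not importable")
        return
    print(
        "accepts dispatch_batches:",
        accepts_kwarg(Accelerator.__init__, "dispatch_batches"),
    )


if __name__ == "__main__":
    report()
```

If the last line prints `False`, aligning the accelerate version with the installed transformers is a likely fix, though nobody in this thread has confirmed it.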
