INT8 has poor performance with groupsize > 0 in torchchat, compared with BF16 and INT8 groupsize == 0 #1427

Closed
yanbing-j opened this issue Dec 18, 2024 · 3 comments
Labels: Quantization (Issues related to Quantization or torchao); triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

@yanbing-j
Contributor

yanbing-j commented Dec 18, 2024

🐛 Describe the bug

Hi maintainers,

We find that INT8 with groupsize 0 achieves better performance than BF16, while INT8 with groupsize > 0 performs even worse than BF16.

BF16 results:

Warning: Excluding compile in calculations
Average tokens/sec (total): 6.28
Average tokens/sec (first token): 0.45
Average tokens/sec (next tokens): 7.35

INT8 with groupsize 0:

Warning: Excluding compile in calculations
Average tokens/sec (total): 6.89
Average tokens/sec (first token): 0.12
Average tokens/sec (next tokens): 12.24

INT8 with groupsize 128:

Warning: Excluding compile in calculations
Average tokens/sec (total): 2.54
Average tokens/sec (first token): 0.46
Average tokens/sec (next tokens): 2.64
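
For reference, the relative next-token throughput implied by the three result blocks above can be worked out with a short script. This is only arithmetic on the figures reported in this issue; the dictionary keys and the helper name are illustrative and not part of torchchat.

```python
# Throughput figures (tokens/sec) copied from the results reported above.
results = {
    "bf16": {"total": 6.28, "first": 0.45, "next": 7.35},
    "int8_gs0": {"total": 6.89, "first": 0.12, "next": 12.24},
    "int8_gs128": {"total": 2.54, "first": 0.46, "next": 2.64},
}


def relative_to_bf16(config, metric):
    """Return the ratio of a config's throughput to the BF16 baseline."""
    return results[config][metric] / results["bf16"][metric]


for config in ("int8_gs0", "int8_gs128"):
    ratio = relative_to_bf16(config, "next")
    print(f"{config}: {ratio:.2f}x BF16 next-token throughput")
```

On these numbers, INT8 with groupsize 0 is roughly 1.67x BF16 for next tokens, while INT8 with groupsize 128 is roughly 0.36x.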

I also investigated INT8 with groupsize > 0 in torchao. Only pytorch/ao#1121 has int8 weight-only groupwise support, via int8_weight_only(group_size=group_size) in _int8wo_groupwise_api. Unfortunately, it eventually falls back to F.linear, which is the same as what torchchat does and is a slow path.
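
To make the groupsize terminology concrete, here is a minimal plain-Python sketch of symmetric int8 weight-only quantization with one scale per group, followed by a dequantize-then-multiply path. It is not the torchao or torchchat implementation. All function names are hypothetical, treating groupsize 0 as "one scale per row" (per-channel) is an assumption of this sketch, and so is the idea that the slow path amounts to dequantizing the weights before a regular linear.

```python
def quantize_groupwise(row, group_size):
    """Symmetric int8 quantization of one weight row, one scale per group.

    Assumption: group_size <= 0 means a single scale for the whole row.
    """
    if group_size <= 0:
        group_size = len(row)
    q, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Avoid dividing by zero for an all-zero group.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-128, min(127, round(w / scale))) for w in group)
    return q, scales, group_size


def dequantize(q, scales, group_size):
    """Rebuild float weights by multiplying each value by its group's scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


def linear_via_dequant(x, q, scales, group_size):
    """Dot product that first materializes the float weights (the slow idea)."""
    w = dequantize(q, scales, group_size)
    return sum(a * b for a, b in zip(x, w))


row = [1.0, -1.0, 2.0, 0.0]
q, scales, gs = quantize_groupwise(row, 2)
print(q, len(scales), linear_via_dequant([1.0, 1.0, 1.0, 1.0], q, scales, gs))
```

More groups mean more scales to store and apply, which is the extra work a fused per-channel kernel avoids.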

Is INT8 weight-only quantization (WOQ) with groupsize a key feature in torchchat? Do you have plans to optimize it? Thanks!

Reproducer:
numactl --physcpubind=120-159 --membind=3 python3 torchchat.py generate llama3.1 --prompt 'It is done, and submitted. You can play '\''Survival of the Tastiest'\'' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'\''d like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like '\''Evolution'\'' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is supposed to be like. If you allow the user to pick how to evolve something, it'\''s not evolution anymore - it'\''s the equivalent of intelligent design, the fable invented by creationists to combat the idea of evolution. Being agnostic and a Pastafarian, that'\''s not something that rubbed me the right way. Hence, my biggest dillema when deciding what to create was not with what I wanted to create, but with what I did not. I didn'\''t want to create an '\''intelligent design'\'' simulator and wrongly call it evolution. This is a problem, of course, every other contestant also had to face. And judging by the entries submitted, not many managed to work around it. I'\''d say the only real solution was through the use of artificial selection, somehow. 
So far, I haven'\''t seen any entry using this at its core gameplay. Alas, this is just a fun competition and after a while I decided not to be as strict with the game idea, and allowed myself to pick whatever I thought would work out. My initial idea was to create something where humanity tried to evolve to a next level, but had some kind of foe trying to stop them from doing so. I kind of had this image of human souls flying in space towards a monolith or a space baby (all based in 2001: A Space Odyssey of course) but I couldn'\''t think of compelling (read: serious) mechanics for that. Borgs were my next inspiration, as their whole hypothesis fit pretty well into the evolution theme. But how to make it work? Are you the borg, or fighting the Borg? The third and final idea came to me through my girlfriend, who somehow gave me the idea of making something about the evolution of Pasta. The more I thought about it the more it sounded like it would work, so I decided to go with it. Conversations with my inspiring co-worker Roushey (who also created the '\''Mechanical Underdogs'\'' signature logo for my intros) further matured the concept, as it involved into the idea of having individual pieces of pasta flying around and trying to evolve until they became all-powerful. A secondary idea here was that the game would work to explain how the Flying Spaghetti Monster came to exist - by evolving from a normal dinner table. So the idea evolved more or less into this: you are sitting a table. You have your own plate, with is your '\''base'\''. There are 5 other guests at the table, each with their own plate. Your plate can spawn little pieces of pasta. You do so by '\''ordering'\'' them through a menu. Some pastas are better than others; some are faster, some are stronger. They have varying '\''costs'\'', which are debited from your credits (you start with a number of credits). Once spawned, your pastas start flying around. 
Their instinct is to fly to other plates, in order to conquer them (the objective of the game is having your pasta conquer all the plates on the table). But they are really autonomous, so after being spawned, you have no control over your pasta (think DotA or LoL creeps). Your pasta doesn'\''t like other people'\''s pasta, so if they meet, they shoot sauce at each other until one dies. You get credits for other pastas your own pasta kill. Once a pasta is in the vicinity of a plate, it starts conquering it for its team. It takes around 10 seconds for a plate to be conquered; less if more pasta from the same team are around. If pasta from other team are around, though, they get locked down in their attempt, unable to conquer the plate, until one of them die (think Battlefield'\''s standard '\''Conquest'\'' mode). You get points every second for every plate you own. Over' --quantize '{"linear:int8": {"bitwidth": 8, "groupsize": 128}}' --num-samples 5 --device cpu --max-new-tokens 128
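
The --quantize argument in the reproducer is a JSON string. The sketch below builds that string programmatically so the groupsize can be varied between runs. The helper name is hypothetical, and the groupsize 0 variant is an assumption: the issue reports results for it but does not show the exact config that was used.

```python
import json
import shlex


def quantize_arg(groupsize, bitwidth=8):
    """Build the JSON value passed to --quantize in the reproducer above."""
    config = {"linear:int8": {"bitwidth": bitwidth, "groupsize": groupsize}}
    return json.dumps(config)


# Reproduces the string used in the command above.
print(shlex.quote(quantize_arg(128)))

# Assumed config for the "INT8 with groupsize 0" run (not shown in the issue).
print(shlex.quote(quantize_arg(0)))
```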

Versions

torch-2.6.0.dev20241124+cpu-cp310
torchaudio-2.5.0.dev20241121+cpu-cp310
torchvision-0.20.0.dev20241121+cpu-cp310
torchao==0.8.0+git039cef4a
torchchat 4fdbe10

@Jack-Khuu Jack-Khuu added Quantization Issues related to Quantization or torchao triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Dec 18, 2024
@Jack-Khuu
Contributor

Jack-Khuu commented Dec 18, 2024

Thanks for flagging. @vmpuri and @jerryzh168 are actually working on deprecating the int8 implementations in torchchat in favor of AO.

While we have historically provided support for int8 groupsizes in torchchat, I haven't seen this be a popular use case.
I'm going to defer to the AO folks on whether we want to optimize groupsize for int8, though my hunch is that it isn't in high demand (right, @jerryzh168 @kimishpatel?)

Varun's PR #1328

@yanbing-j
Contributor Author

@Jack-Khuu Thanks for the confirmation! I'll keep an eye on your replies. Thanks so much!

@jerryzh168
Contributor

jerryzh168 commented Dec 19, 2024

Yeah, we typically just use per-channel int8 weight-only quant, but I think we could check the perf for larger group sizes as well. Maybe we can check again after the torchao migration is done.
