HuggingChat lags with models like qwen-coder
Whenever I use the qwen-coder model, obviously for coding-related problems, it maxes out my CPU, starts to lag after a while when generating a large reply, and eventually freezes the whole page. I understand a Pentium G3240 isn't a strong performer, but it's good enough to handle OpenAI's ChatGPT, and most importantly it can handle Llama too, so I think the problem is with the model itself, or it simply needs more processing power. I'd be glad if the devs fixed this excessive resource usage.
Same issue here on an M4 Mac mini - uses 100%+ CPU and freezes the tab/browser when outputting code with qwen-coder.
I think the devs tried to reduce server load by cutting something out, possibly an optimization that would cost them extra server resources, or else the model just isn't optimized properly. If not, then why would Llama's 70B model run better than qwen?