Recommended server endpoint (#49)
by RonanMcGovern
I note that prefix attention is not yet supported by vLLM. Is there a recommended inference library for serving this model?
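
In the meantime, a minimal sketch of the usual fallback, running the model directly with Hugging Face `transformers` instead of a dedicated serving engine. This is an assumption on my part, not an official recommendation, and the `model_id` below is a placeholder for the actual checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; substitute the real checkpoint name.
model_id = "org/model-with-prefix-attention"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # Custom attention variants often ship as remote code on the Hub.
    trust_remote_code=True,
)

# Plain generate() call; slower than vLLM but avoids the unsupported kernel path.
inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```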