On March 21, Tether announced its BitNet LoRA framework, which optimizes 1-bit LLMs for efficient training and inference on consumer-grade hardware.
According to the announcement, it speeds up inference by 2 to 11 times on mobile GPUs while reducing memory consumption by approximately 77.8%.
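The memory savings come from representing weights in roughly one bit each instead of 16-bit floats. As an illustration (not the framework's actual code), here is a minimal sketch of the absmean ternary quantization scheme popularized by BitNet-style 1-bit LLMs, which maps each weight to {-1, 0, +1} plus a single per-tensor scale; the function names are hypothetical:

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray):
    """Quantize a float weight tensor to ternary {-1, 0, +1} values.

    Illustrative BitNet-style scheme: scale by the mean absolute
    weight, then round and clip to the ternary set.
    """
    scale = float(np.mean(np.abs(w))) + 1e-8  # avoid division by zero
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from ternary weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, s = absmean_ternary_quantize(w)

# Every quantized weight is one of -1, 0, +1, so it can be packed
# into ~1.58 bits instead of 16 bits per float16 weight.
assert set(np.unique(q)).issubset({-1, 0, 1})
print(q)
print("scale:", s)
```

Packing ternary weights tightly (about 1.58 bits per weight, versus 16 for fp16) is what makes memory reductions on the order of the quoted ~78% plausible on consumer hardware.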