JetStream: New LLM Inference Engine
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future; PRs welcome). - google/JetStream
Read more here: External Link