
LG AI Research Partners with FuriosaAI to Achieve 2.25x Better LLM Inference Performance vs. GPUs

  • Writer: Menlo Times
  • Jul 22
  • 2 min read

FuriosaAI’s RNGD (“Renegade”) accelerator has passed LG AI Research’s performance tests with its EXAONE LLMs, delivering high throughput, low latency, and improved energy efficiency. Following this success, Furiosa and LG AI Research will offer the RNGD Server to enterprises deploying LLMs, including across LG’s businesses in electronics, finance, telecom, and biotech. RNGD met demanding performance and service requirements, positioning it as a next-generation alternative to traditional GPU-based infrastructure.

LG AI Research has adopted FuriosaAI’s RNGD accelerator after a two-year evaluation, citing its power efficiency, cost-effectiveness, and scalability for large language model (LLM) deployments. RNGD, built on the Tensor Contraction Processor (TCP) architecture, delivers up to 512 TFLOPS of FP8 performance at just 180W TDP. The RNGD Server integrates eight accelerators in a 4U air-cooled chassis, and up to five systems fit in a 15kW rack. Tested on the 7.8B- and 32B-parameter versions of EXAONE 3.5, RNGD met real-world LLM performance benchmarks at both 4K and 32K context windows.
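
Taken together, those figures imply dense rack-level compute. A quick back-of-the-envelope sketch in Python, using only the numbers quoted above (the derived per-rack totals are illustrative, not figures from the article):

```python
# Back-of-the-envelope rack math from the figures quoted above.
# All inputs come from the article; the derived totals are illustrative.

TFLOPS_FP8_PER_CARD = 512   # peak FP8 throughput per RNGD card
TDP_W_PER_CARD = 180        # per-card TDP
CARDS_PER_SERVER = 8        # 4U air-cooled RNGD Server
SERVERS_PER_RACK = 5        # up to five systems per 15 kW rack
RACK_POWER_BUDGET_KW = 15

cards_per_rack = CARDS_PER_SERVER * SERVERS_PER_RACK
peak_pflops_per_rack = cards_per_rack * TFLOPS_FP8_PER_CARD / 1000
accelerator_power_kw = cards_per_rack * TDP_W_PER_CARD / 1000

print(f"Cards per rack:    {cards_per_rack}")                    # 40
print(f"Peak FP8 per rack: {peak_pflops_per_rack} PFLOPS")       # 20.48
print(f"Accelerator power: {accelerator_power_kw} kW "
      f"of {RACK_POWER_BUDGET_KW} kW budget")                    # 7.2
```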


LG AI Research’s real-world tests show RNGD delivers a major breakthrough in AI inference economics, achieving 2.25x better performance per watt than GPUs. With superior compute density, an RNGD-powered rack produces 3.75x more tokens under the same power limits. Running the EXAONE 3.5 32B model on a single server with four RNGD cards, LG AI Research achieved 60 tokens/sec at 4K and 50 tokens/sec at 32K context windows.
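
The 3.75x rack-level figure follows from combining the per-watt advantage with compute density: at a fixed rack power limit, more of the budget can be devoted to accelerators. A hedged sketch of the relationship, noting that the implied density factor is derived from the two quoted ratios rather than stated in the article:

```python
# How a perf-per-watt edge compounds with density under a fixed rack power cap.
# The 2.25x and 3.75x ratios are from the article; the density factor is
# derived from them (3.75 / 2.25), not independently reported.

PERF_PER_WATT_RATIO = 2.25    # RNGD vs. GPU, per the article
RACK_THROUGHPUT_RATIO = 3.75  # tokens/sec per rack at the same power limit

# Implied ratio of accelerator power usable within the same rack envelope:
density_factor = RACK_THROUGHPUT_RATIO / PERF_PER_WATT_RATIO
print(f"Implied density factor: {density_factor:.2f}x")  # ~1.67x

# tokens/sec per rack scales as (perf per watt) x (watts on accelerators)
assert abs(PERF_PER_WATT_RATIO * density_factor - RACK_THROUGHPUT_RATIO) < 1e-9
```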


Following installation of RNGD hardware at the Koreit Tower data center, LG AI Research and FuriosaAI built an enterprise-ready solution by scaling EXAONE 3.0, 3.5, and 4.0 models from single-card to eight-card configurations. Tensor parallelism was applied both across the processing elements within each chip and across RNGD cards, with PCIe paths and communication scheduling optimized to overlap inter-chip DMA with computation. Furiosa’s TCP architecture and compiler enabled maximal SRAM reuse, while the vLLM-compatible Furiosa-LLM serving stack eased deployment with OpenAI API support, Prometheus monitoring, Kubernetes integration, and a public SDK.
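
Because Furiosa-LLM exposes an OpenAI-compatible API, existing client code should work largely unchanged. A minimal sketch using the official openai Python client; the endpoint URL and model identifier below are hypothetical placeholders, not documented values:

```python
# Minimal sketch: querying a Furiosa-LLM server via its OpenAI-compatible API.
# The base_url and model name are hypothetical placeholders; substitute
# whatever your deployment actually serves.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local Furiosa-LLM endpoint
    api_key="not-needed-for-local",       # local servers often ignore the key
)

response = client.chat.completions.create(
    model="EXAONE-3.5-32B-Instruct",      # placeholder model identifier
    messages=[
        {"role": "user", "content": "Summarize this quarterly report."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```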


Furiosa and LG AI Research aim to deliver scalable, sustainable, and cost-efficient infrastructure for deploying advanced models and agentic AI. With EXAONE 4.0 support in place, the ongoing collaboration includes developing new software capabilities, expanding into broader markets, and supporting enterprise clients. LG AI Research also plans to extend access to ChatEXAONE, its EXAONE-powered AI agent for document analysis, deep research, data processing, and RAG, with RNGD powering the rollout.
