The KERE Research Project Optimizes the Memory Requirements of AI Recommendation Systems

Artificial intelligence is transforming industries and business models—and in the process, consuming an ever-increasing amount of energy. AI systems already account for about one percent of global electricity consumption, and that figure is on the rise. This brings a question to the forefront that is becoming increasingly strategic for companies: How can powerful AI applications be operated in a cost-effective and resource-efficient manner?

The KERE Project – What Is It About?

KERE (“Cost- and Energy-Efficient Implementation of Recommendation Systems”) is a joint research project between ITPower Solutions and the University of Rostock, with GK Artificial Intelligence for Retail AG serving as an associated partner. The project is being carried out as part of the “KMU-innovativ” funding program, financed by the Federal Ministry of Research, Technology, and Space (BMFTR). The focus is on AI-based recommendation systems, specifically Deep Learning Recommendation Models (DLRMs). These models determine which content or products are suggested to users next in online shops, streaming platforms, or digital marketplaces.

These systems are indispensable today, but they are also expensive to operate. For a typical recommendation system handling tens of millions of queries per day, cloud resources cost more than 100,000 euros per year. The reason lies in the architecture: to represent user behavior and large product catalogs, the models need a great deal of memory and computing power. This leads to high infrastructure costs and correspondingly high energy consumption.

Three Approaches for Greater Efficiency: Tensor Trains, Riemannian SGD, FPGA Implementation

KERE follows a holistic approach based on three interrelated strategies.

The first approach reduces memory requirements through tensor-train decomposition. The memory-intensive embedding matrices, which hold a large share of the model's parameters and can have several million rows, are replaced by a “chain” of smaller tensors. Building on Facebook’s TT-Rec approach, KERE uses cluster analysis to determine the decomposition parameters automatically, aiming to remove as many parameters as possible while retaining as much information as possible. This lowers memory and computational requirements at the most basic level of the system. In initial tests, the number of parameters was reduced to a fraction of the original value.
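
The idea of replacing one large embedding matrix with a chain of small tensors can be sketched in a few lines of plain Python. This is only an illustration under assumed values: the table size (1,000,000 rows by 64 columns), the factorizations, the hand-picked ranks, and all function names are hypothetical and are not taken from KERE or TT-Rec. In particular, KERE selects the decomposition parameters by cluster analysis, which is not shown here.

```python
import random

def tt_param_count(row_factors, col_factors, ranks):
    """Parameters in TT cores of shape (ranks[k], n_k, d_k, ranks[k+1])."""
    return sum(
        ranks[k] * n * d * ranks[k + 1]
        for k, (n, d) in enumerate(zip(row_factors, col_factors))
    )

def split_index(i, row_factors):
    """Mixed-radix digits of row index i, most significant digit first."""
    digits = []
    for n in reversed(row_factors):
        digits.append(i % n)
        i //= n
    return digits[::-1]

def make_cores(row_factors, col_factors, ranks, seed=0):
    """Random cores, indexed as cores[k][i][a][j][b] with
    i < n_k, a < ranks[k], j < d_k, b < ranks[k+1]."""
    rng = random.Random(seed)
    return [
        [[[[rng.gauss(0.0, 0.1) for _ in range(ranks[k + 1])]
           for _ in range(d)]
          for _ in range(ranks[k])]
         for _ in range(n)]
        for k, (n, d) in enumerate(zip(row_factors, col_factors))
    ]

def tt_row(cores, row_factors, i):
    """Reconstruct embedding row i by chaining one slice from each core."""
    state = [[1.0]]  # shape: (columns built so far, current rank)
    for core, digit in zip(cores, split_index(i, row_factors)):
        piece = core[digit]  # shape: (r_prev, d, r_next)
        d, r_next = len(piece[0]), len(piece[0][0])
        state = [
            [sum(row[a] * piece[a][j][b] for a in range(len(row)))
             for b in range(r_next)]
            for row in state
            for j in range(d)
        ]
    return [row[0] for row in state]  # the final rank is 1

# Assumed example: 1,000,000 x 64 table, factored as (100*100*100) x (4*4*4).
ROWS, DIMS, RANKS = (100, 100, 100), (4, 4, 4), (1, 16, 16, 1)
dense_params = 1_000_000 * 64
tt_params = tt_param_count(ROWS, DIMS, RANKS)
cores = make_cores(ROWS, DIMS, RANKS)
row = tt_row(cores, ROWS, 123_456)
print(dense_params, tt_params, len(row))
```

With these assumed shapes, the three cores hold far fewer values than the dense table, and a single row is still recovered on demand without ever materializing the full matrix. The actual savings depend on the chosen ranks and on how much accuracy the model can afford to lose.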

The second approach focuses on training the AI models. The goal is to make learning more efficient so that fewer computational steps are needed to obtain a high-performing model. Instead of classical Stochastic Gradient Descent (SGD), a specialized training method, Riemannian Stochastic Gradient Descent (Riemannian SGD), is employed. This method exploits the mathematical structure of tensor trains and performs the optimization directly on the curved set they form, known as a manifold, rather than in the full parameter space. This allows the model to be optimized in fewer steps, saving time, energy, and costs.
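
The general pattern of Riemannian SGD (project the gradient onto the tangent space, take a step, then map the result back onto the manifold) can be illustrated on a much simpler manifold than the one formed by tensor trains. The sketch below uses the unit sphere as a stand-in, so it is not the KERE training procedure; the objective, step size, noise level, and function names are all assumptions made for the example.

```python
import math
import random

def normalize(v):
    """Retraction for the unit sphere: rescale a vector to length 1."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def riemannian_sgd_sphere(target, x0, lr=0.05, steps=500, noise=0.1, seed=0):
    """Minimize f(x) = 0.5 * ||x - target||^2 subject to ||x|| = 1."""
    rng = random.Random(seed)
    x = normalize(x0)
    for _ in range(steps):
        # Noisy Euclidean gradient, standing in for a mini-batch gradient.
        g = [xi - ti + rng.gauss(0.0, noise) for xi, ti in zip(x, target)]
        # Project onto the tangent space at x (remove the radial component).
        radial = sum(gi * xi for gi, xi in zip(g, x))
        rg = [gi - radial * xi for gi, xi in zip(g, x)]
        # Step along the tangent direction, then retract onto the sphere.
        x = normalize([xi - lr * ri for xi, ri in zip(x, rg)])
    return x

target = [1.0, 2.0, 2.0]  # assumed example data
x = riemannian_sgd_sphere(target, [1.0, 0.0, 0.0])
print(x, normalize(target))
```

Every iterate stays exactly on the manifold, so no step is spent on parameter combinations that the constraint rules out. For tensor trains the projection and the retraction are more involved, but the three-stage structure of each update is the same.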

The third approach aims to implement a trained model efficiently on dedicated hardware. The University of Rostock will apply the computation-coding method to tensor trains. In this process, matrices are decomposed into component matrices whose entries are powers of two. Because multiplying by a power of two is a bit shift, complex matrix multiplications can be implemented efficiently in hardware using shifts and additions. FPGAs (Field-Programmable Gate Arrays) are particularly well suited for this purpose. These electronic circuits can be flexibly adapted to the model’s requirements and operate significantly more energy-efficiently than traditional cloud infrastructures. The result is lower power consumption during operation and, in the long term, greater independence from expensive cloud resources. The goal is to implement the entire model on an FPGA. Applying computation coding to tensor-train arithmetic is a scientific novelty, and a full FPGA implementation of a recommendation system, as intended here, has likewise not yet been realized anywhere else.
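
Why power-of-two entries matter for hardware can be shown with a small integer example. The sketch below covers only the shift-and-add building block, not the computation-coding decomposition itself; the sparse encoding as (column, sign, exponent) triples, the function name, and the example matrices are assumptions chosen for illustration.

```python
def shift_add_matvec(rows, x):
    """Multiply a matrix by an integer vector using only shifts and adds.

    rows[r] lists the nonzero entries of row r as (col, sign, exp),
    meaning the entry sign * 2**exp at column col, with exp >= 0.
    """
    out = []
    for row in rows:
        acc = 0
        for col, sign, exp in row:
            term = x[col] << exp  # multiplication by 2**exp as a bit shift
            acc += term if sign > 0 else -term
        out.append(acc)
    return out

# F1 = [[4, 0, -2], [1, 8, 0]] and F2 = [[2, 1], [0, -4]], entries +/- 2**k.
F1 = [[(0, +1, 2), (2, -1, 1)], [(0, +1, 0), (1, +1, 3)]]
F2 = [[(0, +1, 1), (1, +1, 0)], [(1, -1, 2)]]

x = [3, 5, 7]
y = shift_add_matvec(F2, shift_add_matvec(F1, x))

# The dense product F2 @ F1 = [[9, 8, -4], [-4, -32, 0]] contains 9,
# which is not a power of two, yet no general multiplier was needed.
dense = [[9, 8, -4], [-4, -32, 0]]
reference = [sum(w * v for w, v in zip(row, x)) for row in dense]
print(y, reference)
```

Applying the two factors one after the other gives the same result as multiplying by their dense product, while each individual operation is a shift or an addition. On an FPGA a fixed shift amounts to wiring, which is one reason such decompositions are attractive for this kind of hardware.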

Figure: Optimizing the performance of a recommendation system through the use of tensor trains and computation coding, and its implementation on an FPGA

The Benefits for Businesses

For businesses that use AI-based recommendation systems, the results are expected to deliver direct economic benefits. Reducing memory requirements, computational load, and energy consumption can significantly lower the operating costs of such systems. It also opens up new possibilities: in the future, AI applications could run to a greater extent on specialized, in-house infrastructure, giving companies more control over costs, data, and performance.

Another important aspect: efficiency gains and cost reductions can make powerful systems accessible even to small and medium-sized enterprises (SMEs). Application scenarios are conceivable for both users and providers of AI-powered recommendation systems in the fields of e-commerce, social media, and streaming. In this way, the project contributes to embedding the use of AI more broadly across the economy.

KERE will run until mid-2027 and is currently in the implementation phase. If you would like to learn more about the project or discuss specific use cases, please feel free to contact us. We are happy to engage in discussions and provide consulting services on the topic of artificial intelligence!

I’d be happy to answer any questions you may have about our research activities, services, and products! Feel free to contact me or simply schedule an appointment for a free consultation.

Dr. Sadegh Sadeghipour
E-Mail: sadegh.sadeghipour@itpower.de
Phone: +49 (0)30 6098501-11