Two ways of adapting LLMs for Recommender Systems
date
Jan 16, 2025
slug
two-way-llmrec
status
Published
tags
RecSys
NLP
summary
type
Post
Introduction
With the success of Large Language Models (LLMs) in recent years, how to leverage LLMs to enhance recommender systems has attracted attention from both academia and industry.
LLMs have rich open-world knowledge and exhibit some reasoning ability, which could be useful for personalized recommendation. However, there is an inherent gap between LLMs and the recommender systems we use today (mainly ID-based, sometimes extended to process content information): LLMs live in text space, while recommender systems live in collaborative (ID) space.
Many researchers have proposed methods to bridge this gap, which could be classified into two main categories:
- Adapting LLMs to Recommender Systems (Sequential Adapt)
- Adapting Recommendation Tasks to LLMs (Textual Adapt)

Two ways to integrate LLMs with recommender systems. Figure is from: “LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application”
Sequential Adapt (LLM-to-Rec) uses LLMs as a feature encoder for users or items, producing transferable representations. A traditional recommender model is then used on top to model the user’s preference over items.
Textual Adapt (Rec-to-LLM) converts the input of traditional recommender systems into the form LLMs consume: textual descriptions. The typical step is to encode the user’s interaction history into a textual prompt before feeding it into the LLM. The LLM is then asked to generate the recommended items, either freely or under some constraints.
The following sections describe the basic framework of these two families of methods, summarizing my understanding of each.
Sequential Adapt (LLM-to-Rec)
The sequential adapt is relatively straightforward. In a traditional sequential recommender, we first gather the embeddings of the user’s interaction sequence, apply some sequence model to convert it into a single vector representing the user’s preference, and finally score that vector against all item embeddings to evaluate user-item compatibility in collaborative space.
Sequential Adapt mainly changes the input layer of the traditional methods: instead of giving each item a dedicated embedding, we build its representation from its content. With the LLM’s world knowledge, we can compress an item’s textual attributes, such as title and description, into a single embedding. This kind of representation is more transferable, since semantically similar items will be close in the embedding space.
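As a minimal sketch of this pipeline, the following code stands in a hash-based pseudo-encoder for a real LLM text encoder and mean pooling for a learned sequence model; both are placeholder assumptions, not the method of any particular paper:

```python
import hashlib
import numpy as np

def embed_item(text: str, dim: int = 16) -> np.ndarray:
    # Stand-in for an LLM text encoder: a deterministic pseudo-embedding
    # derived from a hash of the item text. A real system would encode
    # the title/description with the LLM instead.
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def user_vector(history: list[str]) -> np.ndarray:
    # Mean-pool the content embeddings of the interaction sequence.
    # A real system would use a learned sequence model (e.g. SASRec).
    v = np.stack([embed_item(t) for t in history]).mean(axis=0)
    return v / np.linalg.norm(v)

def rank(history: list[str], candidates: list[str]) -> list[str]:
    # Score user-item compatibility by inner product, as in
    # traditional collaborative-space matching.
    u = user_vector(history)
    return sorted(candidates, key=lambda c: float(embed_item(c) @ u),
                  reverse=True)
```

The key point is the input layer: `embed_item` is built from content rather than looked up from an ID table, so unseen items get usable representations for free.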
LEARN is a representative of sequential adapt. The main focus of these methods is building better item representations with LLMs.
From my perspective, the sequential adapt of LLMs is just a “pro” version of content-based recommendation, where the content encoder leverages its world knowledge to encode items into more transferable representations. This is essentially what previous content-based methods do, only with much smaller models.
Textual Adapt (Rec-to-LLM)
The basic idea of textual adapt is to recast recommendation tasks as language modeling tasks. The typical workflow is to encode the user’s interaction history into a textual prompt, then use the LLM to generate the recommended items.
However, there are several challenges here:
- Item Indexing: how do we represent items?
- Alignment: LLMs are trained on general tasks; how do we enhance their capability on recommendation tasks?
- Item Generation: how do we extract results from the LLM’s output, and how do we handle out-of-corpus generation?
Item Indexing
Item indexing is concerned with assigning a unique identifier to each item. For LLMs, this means assigning each item a sequence of tokens, for example item_i -> <a_i><b_i><c_i><d_i>, where each <{a,b,c,d}_i> is a token in the LLM’s vocabulary. The embeddings of these tokens can be tuned during the alignment stage.
Based on my understanding, a good item indexing strategy should have the following properties:
- Similar items should share parts of their index, which makes training faster and easier.
- Unrelated items should not share parts of their index, to avoid false parameter sharing.
- Each item should have a unique index.
- An item’s index should not be too long in tokens.
Let’s first start with three trivial indexing methods to get a sense:
- Random indexing (RID): assign a random numeric id to each item; the id is treated as a string during tokenization.
- Title indexing (TID): use the title or a short description of an item as its index.
- Independent indexing (IID): assign each item a new unique token that is not in the LLM’s original vocabulary.
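The three trivial schemes can be sketched in a few lines (the id ranges and token format here are arbitrary illustrative choices):

```python
import random

def random_indexing(items: list[str], seed: int = 0) -> dict[str, str]:
    # RID: a random numeric id per item, later tokenized as an ordinary string
    rng = random.Random(seed)
    ids = rng.sample(range(10_000, 100_000), len(items))
    return {item: str(i) for item, i in zip(items, ids)}

def title_indexing(titles: dict[str, str]) -> dict[str, str]:
    # TID: the item's title (or short description) is its index
    return dict(titles)

def independent_indexing(items: list[str]) -> dict[str, str]:
    # IID: one brand-new token per item, appended to the LLM vocabulary
    return {item: f"<item_{k}>" for k, item in enumerate(items)}
```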
The disadvantages of the above trivial approaches are obvious:
- For RID and TID, false sharing can happen: unrelated items with shared digits, or unrelated items with similar titles, end up sharing token embeddings. For TID, descriptions may be more informative than titles, but they can be too long, limiting the history the LLM can see.
- For IID, the indices of related items are independent, which makes training harder, since a huge number of new tokens must be learned from scratch. (Think about the number of items in a recommender system!)
To overcome these limitations, much recent research on LLMs for RecSys has proposed advanced item indexing approaches. Two of them seem most reasonable to me:
- Collaborative indexing (CID): CID represents an item’s index as a sequence of tokens. It is basically a multi-level tree representation: the deeper two items’ lowest common ancestor is, the stronger their relationship. This kind of index captures collaborative information, since frequently co-occurring items are likely to share a longer prefix. By composing multi-level tokens, we can represent a large number of items using only hundreds or thousands of new tokens.
- Semantic indexing (SemID): SemID uses item metadata, such as category, to group items into a tree representation. Some variants instead start from description embeddings and use vector quantization to learn a hidden semantic index automatically.

CID illustration, from paper: “How to Index Item IDs for Recommendation Foundation Models”

SemID illustration, from paper: “How to Index Item IDs for Recommendation Foundation Models”
How to Index Item IDs for Recommendation Foundation Models introduces many kinds of indexing approaches.
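To make the CID idea concrete, here is a toy sketch that recursively partitions items so that frequently co-occurring items stay in the same subtree, with each tree level contributing one token. The greedy bucket assignment is a simplification of my own; actual CID work uses proper clustering (e.g. spectral) on the co-occurrence matrix:

```python
import itertools
from collections import defaultdict

def cooccurrence(sessions: list[list[str]]) -> dict:
    # Count how often each pair of items appears in the same session.
    co = defaultdict(int)
    for s in sessions:
        for a, b in itertools.combinations(set(s), 2):
            co[frozenset((a, b))] += 1
    return co

def collaborative_index(items, sessions, width=2, depth=2):
    # CID sketch: each item's index is a tuple of tokens, one per tree
    # level; items that co-occur often end up sharing a longer prefix.
    co = cooccurrence(sessions)

    def split(group, level, prefix):
        if level == depth or len(group) <= 1:
            # Leaf: disambiguate the remaining items with a final token.
            return {it: prefix + (f"<l{level}_{k}>",)
                    for k, it in enumerate(group)}
        # Greedy partition: seed `width` buckets, then attach each item
        # to the bucket it co-occurs with most.
        buckets = [[g] for g in group[:width]]
        for it in group[width:]:
            best = max(range(len(buckets)), key=lambda b: sum(
                co[frozenset((it, o))] for o in buckets[b]))
            buckets[best].append(it)
        out = {}
        for b, bucket in enumerate(buckets):
            out.update(split(bucket, level + 1, prefix + (f"<l{level}_{b}>",)))
        return out

    return split(list(items), 0, ())
```

With only `width * depth` plus a few leaf tokens per level, the composition of tokens can address a corpus far larger than the number of new vocabulary entries.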
Alignment
As discussed in the item indexing section, advanced indexing approaches typically require expanding the LLM’s vocabulary. Moreover, LLMs are trained to solve general natural language tasks, so to achieve better performance we need to align them with recommendation tasks.
Researchers on LLM4Rec have proposed several kinds of alignment tasks, most of which fall into the following categories:
- Target tasks: for example, if we want to use the LLM as a sequential recommender, we should include the sequential recommendation task in the SFT stage.
- Recommendation-related understanding tasks, such as intention inference, preference summarization, and index-text matching.
The first category targets the desired tasks directly, while the second aims to inject recommendation-related knowledge into the LLM. The two types are commonly used together in LLM4Rec systems.
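A sketch of how the two task categories might be turned into SFT data; the prompt templates below are purely illustrative, not taken from any paper:

```python
def build_sft_examples(history: list[str], next_item: str,
                       index_to_title: dict[str, str]) -> list[dict]:
    # `history` and `next_item` are item indices (e.g. CID token strings);
    # `index_to_title` maps an index to its title for the matching task.
    examples = [{
        # Target task: sequential recommendation as next-token prediction.
        "prompt": f"A user interacted with items {' '.join(history)} "
                  "in order. Which item comes next?",
        "target": next_item,
    }]
    for idx, title in index_to_title.items():
        examples.append({
            # Auxiliary understanding task: index-text matching.
            "prompt": f"Which item has the title '{title}'?",
            "target": idx,
        })
    return examples
```

Mixing both example types in one SFT corpus is what lets the model both learn the recommendation objective and ground the new index tokens in item semantics.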
Item Generation
One obvious approach to item generation is simply to let the LLM generate an open-ended natural language response and then extract the recommended item index from it. However, this approach suffers from out-of-corpus generation. A commonly used alternative is trie-based constrained decoding, originally proposed for entity retrieval in Autoregressive Entity Retrieval. In this setting we discard the language generation ability of the LLM (which, in my experience, is basically lost after SFT on recommendation data anyway).
Generally, if you want performance competitive with traditional methods, you end up using the LLM as a large generative retrieval model rather than a chatbot that can also recommend.
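The trie-based constrained decoding idea can be sketched with greedy decoding and an abstract scoring function standing in for the LLM’s next-token logits (real implementations hook into beam search, e.g. via a prefix-allowed-tokens callback):

```python
def build_trie(sequences: list[tuple[str, ...]]) -> dict:
    # Nested-dict trie over the valid item indices; each index is a
    # tuple of tokens, terminated by a synthetic "<eos>" leaf.
    root: dict = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node["<eos>"] = {}
    return root

def constrained_decode(score_fn, trie: dict) -> tuple[str, ...]:
    # Greedy decoding restricted to the trie: at every step, only tokens
    # that keep the prefix inside the item corpus are allowed, so the
    # output is always a valid (in-corpus) item index.
    out, node = [], trie
    while True:
        allowed = list(node.keys())
        tok = max(allowed, key=lambda t: score_fn(out, t))
        if tok == "<eos>":
            return tuple(out)
        out.append(tok)
        node = node[tok]
```

Even if the model’s raw preference points at a token outside the corpus, the trie mask forces decoding back onto a valid item index, which is exactly why this trick eliminates out-of-corpus generation.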