
Compresr
LLM-native context compression
About
Compresr provides an API that compresses LLM context without losing what matters. It’s a drop-in for agents and RAG that cuts token costs and improves accuracy.
AI Research Report
Problem & Solution
Problem – Modern LLM agents, Retrieval‑Augmented Generation (RAG) systems, and long‑context applications ingest large documents and multi‑turn histories, inflating token counts. High token usage drives up inference costs, slows response times, and can degrade model focus (the so‑called “context rot”). Enterprises deploying production agents at scale face ballooning token bills that limit economic viability.
Solution – Compresr offers a research‑backed context‑compression service (the cmprsr‑v1 model) that reduces input tokens by up to 90% while preserving semantic fidelity. The service is delivered via an API/SDK that can be dropped into existing agent or RAG pipelines. It supports adjustable compression ratios (up to 10×) and domain‑specific variants (finance, legal, healthcare). Benchmarks claim cost reductions of up to 64% with gpt‑5‑mini and modest accuracy gains (+3% at 2× compression).
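As a rough sketch of the drop‑in pattern described above (the function and parameter names below are hypothetical; Compresr's actual SDK surface is not documented in this report), a RAG pipeline could route retrieved context through a compressor before it reaches the main model:

```python
def build_prompt(question: str, context: str, compress, ratio: float = 2.0) -> str:
    """Compress retrieved context before it enters the main prompt.

    `compress` stands in for a call to the compression service; any
    callable (text, ratio) -> str can be plugged in here.
    """
    compressed = compress(context, ratio)
    return f"Context:\n{compressed}\n\nQuestion: {question}"


def naive_compress(text: str, ratio: float) -> str:
    """Placeholder compressor for illustration only: keeps the first
    1/ratio of the words. The real service paraphrases abstractively
    rather than truncating."""
    words = text.split()
    keep = max(1, int(len(words) / ratio))
    return " ".join(words[:keep])


context = "word " * 100  # stand-in for a retrieved document
prompt = build_prompt("What does the doc say?", context, naive_compress, ratio=4.0)
```

Because the compressor sits between retrieval and generation, swapping the placeholder for a real API client changes one callable rather than the pipeline.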
Technical differentiation – Unlike extractive methods (e.g., LLMLingua‑2), Compresr's approach is abstractive: smaller LLMs paraphrase and restructure prompts, fine‑tuned with SFT and GRPO to hit exact compression targets. The accompanying paper reports superior performance on LongBench V2, InfiniteBench, and GSM8K compared with both extractive and vanilla abstractive baselines.
Value proposition – By cutting input tokens, customers achieve immediate ROI through lower API costs and faster inference. The “compress‑once‑reuse‑many” model for static corpora further amortizes savings. Compresr thus acts as an efficiency layer in the LLM production stack, complementing existing APIs without sacrificing downstream task quality.
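The compress‑once‑reuse‑many economics can be illustrated with hypothetical numbers (none of the figures below come from the source): a static corpus compressed one time, then reused across many agent calls, amortizes the one‑time compression cost.

```python
# All numbers are illustrative assumptions, not Compresr pricing.
CORPUS_TOKENS = 1_000_000   # static corpus size in tokens
RATIO = 2.0                 # compression ratio, applied once
COST_PER_MTOK = 1.0         # $ per million input tokens
COMPRESS_COST = 0.50        # one-time $ cost to compress the corpus


def total_cost(n_calls: int, compressed: bool) -> float:
    """Total $ spend for n_calls that each include the full corpus."""
    tokens = CORPUS_TOKENS / (RATIO if compressed else 1.0)
    per_call = tokens / 1e6 * COST_PER_MTOK
    return (COMPRESS_COST if compressed else 0.0) + n_calls * per_call


# Savings grow linearly with reuse once the one-time cost is recovered.
savings_at_100 = total_cost(100, False) - total_cost(100, True)
```

Under these assumptions, compression costs more than it saves at zero calls but pulls ahead quickly as the same compressed corpus is reused.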
All statements are drawn from the company’s website, documentation, and the publicly available research paper.
Market & Competitors
Market context – Enterprise generative‑AI spend reached $37B in 2025, with $12.5B allocated to foundation‑model APIs (Menlo Ventures). Token processing volumes are massive (≈1.5 quadrillion tokens/month, per Fireworks.ai), creating clear demand for tools that lower token costs.
Competitive landscape – The primary open‑source baseline is LLMLingua‑2 (extractive token classification), which offers modest speed‑ups and memory reductions. Commercial entrants include The Token Company (https://thetokencompany.com/docs) and LLUMO's Prompt Compression API (https://docs.llumo.ai/cost-saving/setup/api), both offering API‑based token‑reduction services. Academic work such as the LLMLingua‑2 paper (https://arxiv.org/abs/2403.12968) and related evaluations (https://arxiv.org/html/2403.12968v2) provides benchmark data for extractive methods.
Compresr positioning – Compresr differentiates itself by delivering abstractive, rate‑adherent compression trained specifically for the task, achieving higher compression ratios with maintained or improved downstream accuracy. Its product includes universal and domain‑specific models, a pre‑compressed knowledge base (“Compressed Web”), and a developer‑first API/SDK, targeting production agents and RAG workloads that need reliable, high‑fidelity token savings.
Trends and outlook – As LLM API usage scales and enterprise agents proliferate, token‑efficiency solutions become strategic cost‑saving layers. While extractive tools will remain popular for low‑overhead use cases, abstractive compressors like Compresr that meet enterprise reliability standards can capture premium market share, especially as overall AI inference spend (estimated at $106B in 2025, projected to $255B by 2030) continues to rise.
All market size figures and competitor references are taken from the cited public sources.
Total Addressable Market
Compresr's addressable market is anchored in spend on foundation‑model APIs, which correlates directly with token usage. Menlo Ventures estimates that $12.5B of the $37B total enterprise generative‑AI spend in 2025 is devoted to foundation‑model APIs. Fireworks.ai reports that the market processes roughly 1.5 quadrillion tokens per month (≈50 trillion tokens per day), highlighting the massive scale of input‑token costs.
Assuming 40-60% of the API spend is driven by input/context tokens, the input‑token‑related spend is roughly $5B-$7.5B. If Compresr can achieve an average 50% compression (as claimed), it could capture $2.5B-$3.75B in annual token‑cost savings. Monetizing 10-30% of those savings would yield a $250M-$1.1B TAM in 2025, with growth expected as token volumes and enterprise adoption increase.
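The arithmetic behind this estimate, using the report's own figures and stated assumptions, can be traced step by step:

```python
API_SPEND = 12.5e9  # 2025 foundation-model API spend (Menlo Ventures)

# Assumed share of API spend driven by input/context tokens: 40-60%
input_spend_low = API_SPEND * 0.40   # ~$5B
input_spend_high = API_SPEND * 0.60  # ~$7.5B

# Savings at the claimed average 50% compression
savings_low = input_spend_low * 0.50    # ~$2.5B
savings_high = input_spend_high * 0.50  # ~$3.75B

# Monetizing 10-30% of those savings bounds the 2025 TAM
tam_low = savings_low * 0.10    # ~$250M
tam_high = savings_high * 0.30  # ~$1.1B
```

Each bound multiplies through the corresponding low or high assumption, so the TAM range is sensitive to all three ratios, not just the compression claim.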
Broader AI inference markets (estimated at $106B in 2025, growing to $255B by 2030) further support rising demand for token‑efficiency tools. Gartner projects worldwide IT spending to rise 9.8% in 2026, reinforcing the macroeconomic tailwinds for cost‑saving AI infrastructure.
All figures and assumptions are explicitly sourced from market‑size reports and token‑volume estimates.
Founder Analysis
Compresr was founded by a team of researchers and engineers from EPFL and leading tech companies. The CEO, Ivan Zakazov, completed a PhD at EPFL focused on LLM context compression and previously worked at Microsoft and Philips Research. Berke Argin, the CAIO, holds a Computer Science degree from EPFL and has experience at UBS, bringing enterprise insight to the team. Kamel Charaf, COO/CPO, earned a Data Science master’s at EPFL and worked at Bell Labs, giving him a strong background in applied research and product development. Oussama Gabouj, CTO, conducted research at EPFL’s DLab and AXA on efficient ML systems and prompt compression.
The founders’ combined expertise spans academic research on abstractive prompt compression (e.g., the Cmprsr paper), large‑scale engineering at Microsoft and Philips, and domain‑specific experience in finance (UBS) and communications (Bell Labs). This blend of deep technical knowledge and industry exposure positions them well to build a commercial API that reduces LLM token usage while maintaining accuracy.
All details are drawn from the Y Combinator company profile and public professional pages.