1 The token efficiency problem

Every extra token an LLM processes costs money, adds latency, and risks overrunning context windows. Two approaches have emerged to tackle this challenge, but they differ fundamentally in their methods and limitations:
LLMLingua-2 (Microsoft) applies a binary classifier to decide which tokens to delete, regardless of the content's semantic structure or information density.
Telegraph English (TE) (Telegrapher.ai) transforms text into a compact symbolic dialect that preserves complete semantic units and can persist throughout an entire processing pipeline.
The difference isn't merely technical—it dramatically impacts where and how these systems can be effectively applied in production AI systems.
2 LLMLingua-2's approach and inherent limitations

Core mechanism: GPT-4 labels "must-keep" tokens on the MeetingBank dataset; an XLM-RoBERTa-large classifier learns to replicate this pattern through binary token classification.
User control: Set a fixed compression ratio (rate=0.33) or token budget, applied uniformly regardless of content density or complexity.
Published results: Microsoft reports 2–5× compression with 1.6–2.9× latency gains on benchmarks like MeetingBank and GSM-8K.
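In practice, that fixed-ratio control looks roughly like the following, based on the library's published interface (the model name and defaults may vary by version):

```python
# pip install llmlingua
from llmlingua import PromptCompressor

# Load the LLMLingua-2 classifier (XLM-RoBERTa-large trained on MeetingBank).
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

long_prompt = "..."  # your verbose prompt text

# Keep roughly a third of the tokens, regardless of how
# information-dense the prompt actually is.
result = compressor.compress_prompt(long_prompt, rate=0.33)
print(result["compressed_prompt"])
```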
Fundamental limitations:
- Domain-specific performance: Explicitly trained on MeetingBank meeting transcripts; effectiveness decreases on domains with different token importance distributions
- Fixed-rate compression: Applies the same compression ratio to dense technical content and verbose prose, leading to information loss in complex sections
- Deletion-based approach: Removing connecting tokens forces LLMs to hallucinate relationships between the remaining tokens, potentially introducing errors that defeat the purpose of careful prompt engineering
- Limited scope: Only compresses the initial prompt; generation steps produce uncompressed outputs, limiting benefits in multi-step processes
- Poor human readability: Microsoft themselves acknowledge that the compressed prompts "may be difficult for humans to understand," making audit and verification challenging
LLMLingua-2 effectively serves as a quick fix for single-prompt scenarios where some information loss is acceptable and an exact compression ratio is prioritized over semantic fidelity.
3 Telegraph English: A fundamentally different paradigm

Telegraph English isn't merely another compression method; it's a complete semantic representation system that transforms how information is structured:
EARNINGS/SHARE=USD3.42 Q4 VS CONSENSUS=USD3.25
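For a sense of scale, that single TE line plausibly replaces a sentence like "The company reported fourth-quarter earnings of $3.42 per share, against a consensus estimate of $3.25": roughly 25 tokens collapse to about 10, with every fact intact.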
Key distinctions:
- Semantic restructuring vs. deletion: TE reorganizes information into atomic fact lines rather than arbitrarily removing tokens
- Density-adaptive compression: Automatically compresses verbose content more aggressively than information-dense sections—no manual ratio tuning required
- Complete semantic units: Preserves facts as complete units, eliminating the need for models to hallucinate connections
- Human readable: After learning ~40 symbols, humans can directly read, audit, and edit TE content
- Domain-agnostic: Rule-based transformation works consistently across all content types without domain-specific training biases
- End-to-end pipeline compatibility: Information can remain in TE format throughout multi-step processes, compounding savings across complex workflows
Our internal benchmarks demonstrate 2–3× token reduction while maintaining complete factual fidelity, with performance consistent across diverse content types and domains.
Advanced capabilities that LLMLingua-2 cannot provide:
- Line-level fact pruning: Selectively remove entire factual statements while preserving complete semantic units, enabling auditable information prioritization (see the sketch after this list)
- TE-to-TE summarization: Further compress already-compact TE text while maintaining the structured format
- Hierarchical reasoning framework: TE's heading structure enables explicit reasoning hierarchies—ideal for complex multi-step thinking
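As a minimal sketch of line-level pruning (the TE sample and keyword scoring below are illustrative stand-ins, not part of the TE spec), the key property is that each TE line is a self-contained fact, so dropping a line never strands a fragment:

```python
# Minimal sketch: prune whole TE fact lines by relevance.
te_document = """\
EARNINGS/SHARE=USD3.42 Q4 VS CONSENSUS=USD3.25
REVENUE=USD12.1B Q4 +8PCT YOY
CEO QUOTE=OPTIMISTIC RE 2025 DEMAND
HQ CAFETERIA=RENOVATED"""

keywords = {"EARNINGS", "REVENUE", "CONSENSUS"}

def relevant(line: str) -> bool:
    """Toy relevance test: keep a line if it mentions any keyword."""
    return any(kw in line for kw in keywords)

# Each dropped line removes one complete semantic unit, never part of one,
# so the pruned output stays auditable line by line.
pruned = "\n".join(line for line in te_document.splitlines() if relevant(line))
print(pruned)
```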
4 Architectural comparison: System-level effects

| Aspect | LLMLingua-2 | Telegraph English |
|---|---|---|
| Pipeline position | Input-only preprocessing | Full workflow format from input to output |
| Content adaptivity | Fixed ratio regardless of content | Automatically adapts to information density |
| Semantic integrity | Breaks connections between tokens | Preserves complete semantic units |
| Domain generalization | Limited by training distribution | Works across all content types |
| Multi-agent systems | Requires recompression at each step | Native communications protocol between agents |
| Auditability | Outputs "difficult for humans to understand" (Microsoft's words) | Line-by-line human verification possible |
These architectural differences create vastly different application footprints, with LLMLingua-2 limited to single-step preprocessing while TE enables system-wide optimization.
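To make the multi-agent row concrete, here is a toy sketch of TE as the wire format between agents. The agents themselves are placeholders, but the point holds: TE goes in and TE comes out at every hop, so nothing is re-expanded and recompressed between steps.

```python
# Toy sketch: TE as the inter-agent message format (agents are placeholders).

def research_agent(te_state: str) -> str:
    # A real agent would call a TE-native model here; we just append a fact line.
    return te_state + "\nSOURCE=10-K FILED=2025-02-14"

def summary_agent(te_state: str) -> str:
    # TE-to-TE summarization: keep only the highest-priority fact lines.
    return "\n".join(te_state.splitlines()[:2])

state = "EARNINGS/SHARE=USD3.42 Q4 VS CONSENSUS=USD3.25"
for agent in (research_agent, summary_agent):
    state = agent(state)  # TE in, TE out; no recompression between hops
print(state)
```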
5 Business impact comparison

Prompt-only scenario (LLMLingua-2's complete footprint): A 2,000-token prompt → 700 tokens at a $10/M-token model = $13 saved per 1,000 calls.
Full-pipeline scenario (where TE shines): Same prompt plus five agent steps that each produce 400 tokens.
- Without compression: 2,000 + 5 × 400 = 4,000 tokens processed
- With LLMLingua-2: ~2,700 tokens (700-token compressed input + 5 × 400 uncompressed outputs)
- With TE: Entire pipeline at ~40% original = ~1,600 tokens
- Result: ~$24 saved per 1,000 calls throughout the pipeline
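The arithmetic behind those numbers, spelled out (prices and token counts are the illustrative figures above):

```python
# Worked arithmetic for the pipeline comparison above.
PRICE_PER_M_TOKENS = 10.0  # USD per million tokens (illustrative)
CALLS = 1_000

baseline = 2_000 + 5 * 400   # 4,000 tokens per call, uncompressed
llmlingua = 700 + 5 * 400    # ~2,700: only the input is compressed
te = int(baseline * 0.40)    # ~1,600: whole pipeline at ~40% size

def saved(tokens_per_call: int) -> float:
    """Dollars saved per CALLS calls versus the uncompressed baseline."""
    return (baseline - tokens_per_call) * CALLS / 1_000_000 * PRICE_PER_M_TOKENS

print(f"LLMLingua-2 saves ${saved(llmlingua):.0f} per {CALLS:,} calls")  # $13
print(f"TE saves ${saved(te):.0f} per {CALLS:,} calls")                  # $24
```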
As systems grow more complex with more processing steps, TE's advantages compound dramatically compared to input-only compression. This translates to significant operational efficiency improvements at scale.
6 Implementation comparison

| Aspect | LLMLingua-2 | Telegraph English |
|---|---|---|
| Setup | pip install llmlingua | API endpoints for compression/decompression |
| Integration | Wrapper around input prompts | Integration with input/output pipeline |
| Configurability | Manual ratio tuning required | Zero configuration needed |
| Output quality | Varies by domain similarity to training | Consistent across content types |
| Human oversight | Limited due to readability issues | Full auditability of compressed format |
Telegraph English's API includes pre-trained translator models for conversion between natural language and TE format. For advanced use cases, bilingual TE/English models are available that work with TE natively without translation overhead.
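A rough sketch of what calling such a compression endpoint might look like; the URL, request fields, and response shape below are placeholders, so consult Telegrapher.ai's API documentation for the actual contract:

```python
# Hypothetical TE compression call; endpoint, fields, and auth are placeholders.
import requests

API_URL = "https://api.example.com/v1/compress"  # placeholder, not the real endpoint

response = requests.post(
    API_URL,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"text": "The company reported fourth-quarter earnings of $3.42 per share..."},
    timeout=30,
)
response.raise_for_status()
print(response.json()["te"])  # placeholder response field for the TE output
```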
7 Decision framework

Choose LLMLingua-2 when:
- You need an immediate fix for a single-prompt scenario
- Exact compression ratio control matters more than semantic fidelity
- Your content closely matches LLMLingua-2's training distribution
- Human review of compressed content isn't required
Choose Telegraph English when:
- You're building multi-step or multi-agent systems
- Your content varies widely in information density
- Human auditability of compressed content matters
- You need consistent performance across diverse domains
- Long-term efficiency across the entire pipeline is the goal
8 Forward outlook

While Microsoft continues to iterate on LLMLingua-2, its fundamental limitations as a token classifier remain: domain-specific performance, fixed-rate compression regardless of content, and input-only applicability. These constraints significantly limit its long-term potential.
Telegraph English's roadmap focuses on bilingual models that reason natively in TE format, eliminating translation overhead and enabling true end-to-end semantic efficiency throughout AI systems. This approach fundamentally transforms token efficiency from an input preprocessing step to a system-wide architectural advantage.
Both projects aim to make AI more efficient, but Telegraph English addresses the problem at a deeper architectural level—exactly where sustainable, long-term value is created in complex AI systems.
Ready to see how TE transforms your specific workflow? Contact us to explore a proof-of-concept on your actual workloads and measure the impact for yourself.