Telegraph English: An Open Protocol for Semantic Compression

A collaborative research initiative exploring the limits of lossless linguistic compression. Join us in developing standards for efficient human-AI and AI-AI communication.
OUR MISSION: Democratize access to advanced AI by reducing computational barriers through open, reproducible compression protocols.

We believe efficient communication shouldn't be proprietary. Telegraph English is our contribution to a more accessible AI future.
WHAT – What Is Telegraph English?
Telegraph English (TE) is a structured compression protocol that transforms natural language into a symbol-rich, semantically dense format. Originally inspired by 19th-century telegraphy's economy of expression, TE provides a formal grammar for reducing computational overhead without sacrificing meaning:

- Reduces tokens by ~55% while preserving 98% of fine-grained facts
- Provides formal grammar rules for reproducible compression
- Works with any LLM without modification
- Enables systematic study of information density in language models
Quick Example
Original: 
"""Earnings per share (EPS) were $3.42 for the fourth quarter, 
exceeding analyst expectations of $3.25. The company reported 
a return on equity (ROE) of 21.8% for the fiscal year."""

Telegraph English: 
"""EPS=USD3.42 Q4 VS EXPECTED=USD3.25
ROE=21.8% FISCAL-YEAR"""
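For intuition, the transformation above can be approximated with a handful of rewrite rules. The Python sketch below is a toy illustration only: the rule list and the compress() helper are ours for this example, not the official TE grammar.

import re

# Toy rewrite rules in the spirit of TE (illustrative only -- the real
# grammar is a formal spec, not a regex list). Each rule collapses a
# verbose phrase into a dense KEY=VALUE token.
RULES = [
    (r"[Ee]arnings per share \(EPS\) were\s+", "EPS="),
    (r"a return on equity \(ROE\) of\s+", "ROE="),
    (r"exceeding analyst expectations of\s+", "VS EXPECTED="),
    (r"for the fourth quarter,?\s*", "Q4 "),
    (r"for the fiscal year", "FISCAL-YEAR"),
    (r"[Tt]he company reported\s+", ""),
    (r"\$([0-9.]+)", r"USD\1"),
    (r"\.\s+", "\n"),   # sentence boundary -> line break
]

def compress(text: str) -> str:
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text)
    return text.strip().rstrip(".")

Running compress() on the original paragraph reproduces the TE output shown above.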
WHY – The Token Drain
The Computational Linguistics Challenge:
- Modern LLMs waste 60-70% of compute on redundant tokens
- Traditional compression loses semantic structure and citations
- No standardized protocol for studying information density

Our Research Questions:
→ What is the theoretical limit of lossless semantic compression?
→ How do compression patterns vary across domains and languages?
→ Can structured compression improve LLM reasoning about key tokens?

TE kills the padding and keeps the facts,
compounding savings from ingest to inference.

HOW – Telegraph English Works

Seamless Integration, Powerful Results.
  • Compress - Transforms raw text into compact Telegraph English, reducing payload by 65%.
  • Embed - Delivers ready-made vector embeddings with no additional model hosting required.
  • Store - Provides ready-to-upsert JSONL for any vector database, cutting storage costs by half.
  • Retrieve - Requires only 45% of normal tokens, making LLM responses up to 55% faster.
  • Fidelity on Demand - Each chunk includes ID mapping for instant conversion back to the original text (see the mapping sketch after the Python example below).
  • Live in a Day - Simple batch API and wrappers let most teams implement and test before lunch.
Python Example
We are working on an open-source compression API that will compress, embed, and store in two lines of code.
import terag
from pinecone import Pinecone

# 1. Process document with Telegrapher (in Pinecone-compatible format)
result = terag.process_document("your-doc.pdf", dimension=1536)
pinecone_records = result.to_pinecone()

# 2. Initialize Pinecone client
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("your-index-name")

# 3. Upsert directly to Pinecone
index.upsert(vectors=pinecone_records)
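The sketch below illustrates the "Fidelity on Demand" step from the list above: keeping a chunk-ID map so a retrieved TE chunk can be expanded back to its verbatim source. The record fields and helper names here are assumptions for illustration, not the shipped terag schema.

# Illustrative sketch of "Fidelity on Demand" -- field and helper names
# are assumptions, not the shipped terag schema.
originals: dict[str, str] = {}   # chunk_id -> verbatim source text

def register_chunk(chunk_id: str, source_text: str, te_text: str) -> dict:
    # Keep the original outside the vector DB; only the compressed TE
    # form (and its embedding) is stored and retrieved.
    originals[chunk_id] = source_text
    return {"id": chunk_id, "metadata": {"te": te_text}}

def expand(chunk_id: str) -> str:
    # Instant conversion back to the original text for a retrieved chunk.
    return originals[chunk_id]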
Five Chores vs One Call
See how the future Telegrapher API will replace manual chunking, lossy compression, self-hosted embedding, and custom serializers with a single HTTPS endpoint that streams insert-ready JSONL.
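A call to that endpoint might look like the sketch below. The URL, parameters, and response fields are placeholders, since the API is not yet released; only the shape matters here: one insert-ready JSONL record per streamed line.

import json
import requests

# Hypothetical endpoint and payload -- placeholders until the API ships.
with open("your-doc.pdf", "rb") as f:
    resp = requests.post(
        "https://api.example.com/v1/telegraph",   # placeholder URL
        files={"file": f},
        params={"dimension": 1536},
        stream=True,                              # stream JSONL as it is produced
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            record = json.loads(line)   # one insert-ready vector record per line
            # hand `record` straight to your vector DB's upsert call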
PERFORMANCE SNAPSHOT


  • Token reduction: 55% avg (LongBench + live corpora)
  • GPT latency: -40–55% on long answers
  • Precision: F1 +8% vs full-text baseline
USE CASES


  • RAG platforms – 2-3× more context, fewer hallucinations
  • Vector-DB ops – 65% lower storage & write costs
  • API cost control – slice spend on GPT, Claude, Gemini endpoints
  • Long-context chat – fit richer history without model upgrades
OPEN SCIENCE
📊 Benchmarks & Datasets
- LongBench-TE: Standardized compression evaluation suite
- Domain corpora: Legal, medical, scientific, code
- Multilingual test sets (coming soon)

🔬 Active Research Areas
- Key token preservation theory
- Domain-specific dialect development
- Compression impact on embedding geometry
- TE as pre-processing for smaller models

🤝 How to Contribute
- Implement TE in new languages
- Submit domain-specific test cases
- Propose grammar improvements via RFC
- Share experimental results
PRICING – Open Source
Simply put: it's FREE.
Open Source & Sustainability
✓ Core Specification: CC-BY-SA 4.0
✓ Reference Implementation: MIT License
✓ Datasets: CC0 (Public Domain)
✓ Optional Cloud API: At-cost pricing for compute (future non-profit effort)

Support the Project:
- Academic partnerships & grants
- Infrastructure sponsorships
- Community contributions
- Optional compute credits
Compression is a one-time cost; every retrieval and generation that follows is permanently cheaper.
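As a back-of-the-envelope illustration of that amortization (all numbers below are assumptions, not measurements):

# Illustrative amortization math -- prices and volumes are assumptions.
corpus_tokens = 1_000_000          # tokens before compression
reduction = 0.55                   # headline ~55% token reduction
usd_per_1k_input_tokens = 0.01     # assumed model input price

saved_per_read = corpus_tokens * reduction / 1000 * usd_per_1k_input_tokens

# Compression is paid once; every subsequent read of the corpus pays
# only for the smaller TE form, so savings grow with query volume.
for reads in (1, 10, 100):
    print(f"{reads:>3} full reads -> ${saved_per_read * reads:,.2f} saved")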
Join the Telegraph English Research Community
Join our open-science initiative alongside forward-looking AI teams already boosting their token efficiency.

  • 🔬 For Researchers – access datasets & benchmarks, collaborate on papers, propose extensions
  • 💻 For Developers – integrate TE in your stack, contribute to core tools, build domain adapters
  • 🏢 For Organizations – test TE on your data, sponsor development, shape the roadmap