Disclaimer: This is an experimental case study. AISL is not production-ready for all use cases. Token savings vary significantly depending on data shape, key length, and value types. We're sharing this because we think the direction is right — not because the problem is fully solved.
The Setup
We run a RAG (Retrieval-Augmented Generation) pipeline that answers internal queries against a structured product catalogue — roughly 4,000 SKUs with attributes including category, pricing tiers, availability, specifications, and metadata tags.
Every query involved retrieving a subset of that catalogue and passing it to the language model as context. The context was being serialised as JSON. It worked. It also consumed a significant and growing chunk of our token budget every single call.
When AISL crossed our radar, the claim was simple: the same structured data, fewer tokens, no loss of meaning. We decided to test it properly for three weeks on a sandboxed version of the pipeline.
What We Measured
We ran the same 200 representative queries through both pipelines — JSON and AISL — and measured three things: token consumption per query, answer accuracy against a human-validated ground truth set, and parsing reliability (whether the model misread or hallucinated from the structured input).
We used OpenAI's o200k_base tokenizer to count tokens consistently across both formats.
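A minimal sketch of how a per-record comparison like this can be set up. The function names are illustrative and not taken from our pipeline, and `encode` is assumed to be any callable that turns a string into a sequence of tokens.

```python
from statistics import mean
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical type alias: any function mapping text to a sequence of tokens.
Encoder = Callable[[str], Sequence]


def mean_tokens_per_record(records: Iterable[str], encode: Encoder) -> float:
    """Average token count over records that are already serialised as strings."""
    return mean(len(encode(record)) for record in records)


def compare_formats(
    json_records: Iterable[str],
    aisl_records: Iterable[str],
    encode: Encoder,
) -> Tuple[float, float, float]:
    """Return (json_mean, aisl_mean, fractional_reduction) for the same records."""
    json_mean = mean_tokens_per_record(json_records, encode)
    aisl_mean = mean_tokens_per_record(aisl_records, encode)
    return json_mean, aisl_mean, (json_mean - aisl_mean) / json_mean
```

With the `tiktoken` package installed, `tiktoken.get_encoding("o200k_base").encode` can be passed as `encode`. The important part is that both formats go through the same encoder, so the counts are directly comparable.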
Token Consumption
This was the most straightforward thing to measure and the most dramatic result.
Across the 200 test queries, the average context payload in JSON was 41 tokens per record. The same records in AISL averaged 22 tokens. On a retrieval set of 20 records per query — a typical call in our pipeline — that's a reduction from roughly 820 tokens to 440 tokens of structured context per query.
At scale, with thousands of queries per day, that difference is not marginal. It is the difference between structured context that fits comfortably in the context window and structured context that constantly competes with the rest of the prompt for space.
| Format | Avg tokens per record | Avg tokens per 20-record retrieval |
|---|---|---|
| JSON | 41 | 820 |
| AISL | 22 | 440 |
| Reduction | ~46% | ~46% |
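The arithmetic behind the table, as a short sketch. The per-record averages and the 20-record retrieval size come from the measurements above; the daily query volume of 5,000 is a made-up figure used only to show how the saving scales.

```python
def context_tokens(tokens_per_record: float, records_per_query: int) -> float:
    """Structured-context tokens for one query."""
    return tokens_per_record * records_per_query


def fractional_reduction(before: float, after: float) -> float:
    """Share of tokens saved when going from `before` to `after`."""
    return (before - after) / before


json_ctx = context_tokens(41, 20)  # 820 tokens per query
aisl_ctx = context_tokens(22, 20)  # 440 tokens per query
saved_per_query = json_ctx - aisl_ctx  # 380 tokens per query
reduction = fractional_reduction(json_ctx, aisl_ctx)  # about 0.463


def daily_savings(queries_per_day: int) -> float:
    """Tokens saved per day at a given (hypothetical) query volume."""
    return saved_per_query * queries_per_day


print(f"{reduction:.1%} fewer tokens, {daily_savings(5000):,} saved per day")
```

Because every record shrinks by the same average amount, the percentage reduction is identical per record and per retrieval; only the absolute saving grows with retrieval size and query volume.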
Answer Accuracy
We were more cautious about this one going in. The concern was that a less human-readable format might increase the rate at which the model misread or misattributed values — producing confident-sounding answers from incorrectly parsed context.
The result surprised us. Across our 200-query evaluation set, accuracy against the ground truth was marginally higher with AISL than with JSON: 91.5% vs 89.0%. We want to be careful about how much weight we put on a 2.5 percentage point difference across 200 queries, which amounts to five queries. It is not a statistically conclusive result. But it is directionally consistent with the theoretical argument for AISL: that eliminating structural noise (the braces, quotes, and repeated delimiters that carry no semantic value) leaves the model with a cleaner signal to parse meaning from.
| Format | Accuracy (200-query eval) |
|---|---|
| JSON | 89.0% |
| AISL | 91.5% |
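To put a rough number on how inconclusive the accuracy gap is, here is a sketch of a two-proportion z-test on the figures above (183 vs 178 correct answers out of 200). It treats the two runs as independent samples, which is a simplifying assumption: both pipelines saw the same queries, so a paired test such as McNemar's on per-query results would be the better fit if that data is available.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(correct_a: int, n_a: int, correct_b: int, n_b: int):
    """Two-sided z-test for a difference between two proportions.

    Assumes independent samples and uses the pooled standard error.
    Returns (z, p_value).
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 91.5% of 200 = 183 correct (AISL); 89.0% of 200 = 178 correct (JSON).
z, p_value = two_proportion_z_test(183, 200, 178, 200)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value lands far above the conventional 0.05 threshold, which matches the caution in the text: the gap is suggestive, not established.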
Parsing Reliability
This was the result we found most interesting.
In the JSON pipeline, we observed 7 instances across 200 queries where the model's response indicated it had misread or confused a field — attributing a value to the wrong key, or combining fields from two different records. These are not catastrophic failures. They are the kind of subtle misreads that are easy to miss in production but compound over time.
In the AISL pipeline, we observed 2 such instances. The flattened, linear structure — one record, one line, no nested braces to track — appeared to reduce the conditions under which the model lost track of field boundaries.
Again, 200 queries is not a large enough sample to make strong claims. But the direction was consistent enough that we're planning a larger evaluation.
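As a rough guide to how large "larger" needs to be, the sketch below applies the standard sample-size formula for comparing two proportions to the misread rates observed above (7/200 and 2/200). The 5% significance level and 80% power are conventional defaults chosen for illustration, and the formula assumes two independent samples, so treat the output as an order-of-magnitude estimate.

```python
from math import ceil, sqrt
from statistics import NormalDist


def queries_needed(p_a: float, p_b: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Queries per pipeline needed to detect a difference between rates p_a and p_b.

    Normal-approximation formula for a two-sided test on two
    independent proportions with equal sample sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_a + p_b) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return ceil(numerator / (p_a - p_b) ** 2)


# Observed misread rates: 7 of 200 (JSON) and 2 of 200 (AISL).
n = queries_needed(7 / 200, 2 / 200)
print(n)
```

If the true rates are close to the observed ones, this points to several hundred queries per pipeline instead of 200. If the real gap is smaller than observed, the required number rises quickly.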
What We Didn't Expect
Conversion was trivial. We wrote a JSON-to-AISL converter in an afternoon. There is also a web-based converter available at aisl-web.github.io/AISL for anyone who wants to test without writing code. The AISL spec is simple enough that writing a parser or converter is a realistic one-day task.
Human readability is genuinely reduced. AISL is optimised for AI consumption, not human reading. Our team got used to it quickly, but anyone who needs to read or debug the raw format regularly will find it noticeably less comfortable than JSON. This is a real tradeoff, not just a theoretical one.
Savings are not universal. On the handful of query types in our set that returned short, irregular records, the token savings were small or occasionally negative. AISL performs best on uniform, repeated-key structured data, which is exactly what a product catalogue or database record set looks like. It is not a universal replacement for JSON.
What AISL Is and Isn't
AISL is an open-source, MIT-licensed serialization format built for one specific context: structured data being consumed by a language model. It eliminates the tokens that exist for human readability — quotes, braces, indentation — because a language model doesn't need them. It replaces that overhead with a minimal, deterministic structure that the model can parse reliably.
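To make the general idea concrete, here is a toy sketch of where the overhead goes when keys are stated once and each record becomes a single line. This is explicitly not the AISL syntax; the layout, the `|` separator, and the sample records are all invented for illustration, and the spec should be consulted for the real format.

```python
import json


def to_compact_lines(records, sep="|"):
    """Toy flattening: one header line of keys, then one line of values per record.

    NOT the AISL format. Assumes every record has the same keys in the
    same order and that no value contains the separator.
    """
    keys = list(records[0].keys())
    lines = [sep.join(keys)]
    for record in records:
        lines.append(sep.join(str(record[key]) for key in keys))
    return "\n".join(lines)


# Made-up sample records, not taken from the catalogue in this study.
records = [
    {"sku": "A100", "category": "cable", "price": 4.5, "in_stock": True},
    {"sku": "A101", "category": "cable", "price": 6.0, "in_stock": False},
]

compact = to_compact_lines(records)
as_json = json.dumps(records)
print(compact)
print(len(as_json), "characters as JSON,", len(compact), "characters flattened")
```

The sketch also hints at why savings depend on data shape: the header is a fixed cost, so a single short record or a set of records with differing keys gains little from this kind of layout.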
It is not a replacement for JSON in human-facing APIs. It is not a configuration format. It is not mature tooling with production-grade libraries across every stack. The spec is at version 1.0. The ecosystem is early.
What it is: a format that takes the actual constraints of AI inference seriously and makes a different set of tradeoffs than the formats designed thirty years ago for a different context.
Should You Try It?
If you are running a pipeline that retrieves structured data and passes it to a language model as context (RAG pipelines, function-calling workflows, prompts that embed structured records), AISL is worth an afternoon of evaluation. The conversion tooling is simple, the format is easy to understand in a sitting, and the token savings on the right data shape are real.
If you need production-grade stability, a large ecosystem of tooling, or a format that humans will read and edit directly, JSON or YAML remain the right answers.
We're continuing to evaluate AISL on a larger query set and across different data types. We'll share what we find. For now, the direction is interesting enough that we wanted to put it on record.
The format is experimental. The problem it's solving is not.
Resources
- AISL Spec & Documentation: [aisl-web.github.io/AISL](https://aisl-web.github.io/AISL)
- JSON to AISL Converter: [aisl-web.github.io/AISL](https://aisl-web.github.io/AISL)
- License: MIT
This evaluation was conducted on an internal pipeline with a specific data shape. Results will vary. AISL is experimental software and should be evaluated for your specific use case before any production adoption.
