Data · 8 min read · March 20, 2026

JSON, CSV, XML, YAML: Data Formats Compared

Why can't everything just be a spreadsheet? Because data has shape — nested objects, typed values, optional fields — and a flat grid of cells can't express that. JSON, CSV, XML, and YAML each solve the problem of structuring data, but they make radically different trade-offs. Understanding those trade-offs saves you from forcing a square peg into a round hole.

CSV: the simplest table you can make

Comma-Separated Values is exactly what it sounds like: one record per line, with the values in each record separated by commas (or tabs, or semicolons; that ambiguity is part of the problem). CSV predates the personal computer. Its strength is that virtually every spreadsheet app, database, and programming language can read it.

name,age,active
Alice,30,true
Bob,25,false

The limitations are real, though. CSV has no concept of nesting, no data types (everything is a string), and no standard way to tell a null from an empty string. RFC 4180 describes how to quote values that contain commas, but it was only published in 2005 and not every producer or parser follows it, so a city name like "Lancaster, PA" still breaks naive parsers that simply split on commas. Despite all this, CSV remains the lingua franca for tabular data exchange.
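As a minimal sketch of the comma problem, the snippet below compares a naive split with Python's standard csv module. The sample row and the city column are made up for illustration; the point is only that a quote-aware reader keeps "Lancaster, PA" in one field, and that every value still arrives as a string.

```python
import csv
import io

# Hypothetical sample: the city contains a comma, so the producer quoted it.
raw = 'name,city,age\nAlice,"Lancaster, PA",30\n'

# Naive approach: splitting on commas cuts the quoted city in two.
naive_row = raw.splitlines()[1].split(",")

# csv.reader understands the double-quote convention.
parsed_rows = list(csv.reader(io.StringIO(raw)))
parsed_row = parsed_rows[1]

# Every value comes back as a string, so types must be restored by hand.
age = int(parsed_row[2])

print(naive_row)   # four fragments instead of three fields
print(parsed_row)  # three fields, city intact
```

The conversion to int at the end is the part CSV cannot do for you: the file itself carries no hint that the third column is numeric.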

JSON: the web's data format

Douglas Crockford likes to say he didn't invent JSON; he discovered it. In 2001, he noticed that a subset of JavaScript's object literal syntax was a perfectly good, language-independent data format. He described the grammar on a single page, and within a few years the web had adopted it.

[
  { "name": "Alice", "age": 30, "active": true },
  { "name": "Bob", "age": 25, "active": false }
]

JSON supports strings, numbers, booleans, null, arrays, and nested objects. It's compact enough for API responses, readable enough for config files, and supported by the standard library of nearly every modern language. The main complaint? No comments. Crockford left them out deliberately: he had seen people use comments to carry parsing directives, which would have undermined interoperability.
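To make the typing concrete, here is a small sketch using Python's built-in json module. The nickname field is a hypothetical addition to show null; everything else mirrors the sample above. It also shows that a trailing comment is rejected rather than ignored.

```python
import json

# "nickname" is a made-up field, added only to show how null is handled.
text = '[{"name": "Alice", "age": 30, "active": true, "nickname": null}]'

people = json.loads(text)
first = people[0]

# Types survive the round trip: int, bool, and None come back as themselves.
print(type(first["age"]), type(first["active"]), first["nickname"])

# A comment makes the document invalid JSON, so the parser raises.
try:
    json.loads('{"age": 30} // years')
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False

print(comment_accepted)
```

Compare this with CSV, where the same age would arrive as the string "30" and a missing nickname would be indistinguishable from an empty one.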

Why JSON won the web: Every browser ships a native JSON.parse() function. No library, no dependency, no import — it just works.

XML: verbose but powerful

eXtensible Markup Language was the enterprise standard before JSON took over. Derived from SGML (the parent of HTML), XML uses opening and closing tags to structure data. It's verbose, but that verbosity buys you features JSON lacks: attributes, namespaces, schemas, and mixed content.

<people>
  <person active="true">
    <name>Alice</name>
    <age>30</age>
  </person>
  <person active="false">
    <name>Bob</name>
    <age>25</age>
  </person>
</people>

XML is still everywhere in enterprise systems — SOAP APIs, SVG graphics, RSS feeds, and Android layouts all use it. But for new projects, the tag overhead makes most developers reach for JSON or YAML instead.
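For comparison, the sketch below reads the people sample from this section with Python's standard xml.etree.ElementTree module. It is one possible way to do it, not the only one; note that both the attribute and the element text come back as strings, so the conversions to int and bool are done by hand.

```python
import xml.etree.ElementTree as ET

xml_text = """<people>
  <person active="true">
    <name>Alice</name>
    <age>30</age>
  </person>
  <person active="false">
    <name>Bob</name>
    <age>25</age>
  </person>
</people>"""

root = ET.fromstring(xml_text)

people = []
for person in root.findall("person"):
    people.append({
        "name": person.findtext("name"),
        # Element text is always a string; convert explicitly.
        "age": int(person.findtext("age")),
        # Attributes are strings too, so "true" must be compared by hand.
        "active": person.get("active") == "true",
    })

print(people)
```

Without a schema, nothing in the document says that age is a number or that active is a boolean; that knowledge lives in the reading code.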

YAML: human-readable, machine-parseable

YAML Ain't Markup Language trades curly braces and angle brackets for indentation. The result is the most readable data format on this list — and arguably the most error-prone. A single misplaced space can change the meaning of an entire document.

- name: Alice
  age: 30
  active: true
- name: Bob
  age: 25
  active: false

YAML is the default for Kubernetes configs, GitHub Actions workflows, Docker Compose files, and many CI/CD pipelines. It supports comments (a big win over JSON), anchors for reusing values, and multi-line strings. The trade-off is that its implicit typing can surprise you: under YAML 1.1 rules an unquoted no becomes the boolean false, and an unquoted 3.10 becomes the number 3.1.

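The Norway problem: In YAML 1.1, the unquoted country code NO is parsed as the boolean false. YAML 1.2 dropped this rule, but many widely used parsers still follow 1.1, so this infamous gotcha keeps biting developers who work with country data. Quoting the value ("NO") avoids it.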
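One defensive habit is to quote ambiguous values when you generate YAML. The sketch below is not a YAML parser; it only mimics the YAML 1.1 boolean rule with a regular expression, using the word list from the YAML 1.1 bool type definition as I understand it. The helper name quote_if_ambiguous is hypothetical, and real parsers differ in which of these words they actually treat as booleans.

```python
import re

# Plain scalars that YAML 1.1 resolves to booleans (assumed list, see lead-in).
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

def quote_if_ambiguous(value: str) -> str:
    """Wrap value in double quotes if a YAML 1.1 parser could read it as a boolean."""
    if YAML11_BOOL.fullmatch(value):
        return f'"{value}"'
    return value

country_codes = ["SE", "NO", "DK"]
lines = [f"- {quote_if_ambiguous(code)}" for code in country_codes]

print("\n".join(lines))
```

In practice a YAML library's own dumper handles quoting for you; the sketch is mainly useful for seeing how short the list of trap words is, and why NO is on it.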

Side-by-side comparison

Here's how the four formats compare feature by feature:

Feature	CSV	JSON	XML	YAML
Human-readable	Fair	Good	Fair	Excellent
Supports nesting	No	Yes	Yes	Yes
Typed values	No	Yes	No*	Yes
Comments	No	No	Yes	Yes
Common use	Tabular data	APIs, configs	Enterprise, markup	DevOps, configs
Verbosity	Minimal	Low	High	Low

*XML values are all strings by default; schemas can enforce types but add complexity.

[Chart: Relative file size for 1,000 records (KB)]
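If you want to check relative file sizes on your own data, a rough sketch like the one below works with only the Python standard library. The 1,000 records are synthetic and shaped like the examples in this article, and YAML is left out because Python has no built-in YAML module. Treat the numbers it prints as an illustration for this particular data, not a benchmark.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Synthetic sample data: 1,000 flat records (an assumption for illustration).
records = [
    {"name": f"User{i}", "age": 20 + i % 50, "active": i % 2 == 0}
    for i in range(1000)
]

def csv_size(rows):
    """Size in bytes of the rows written as CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "age", "active"])
    writer.writeheader()
    writer.writerows(rows)
    return len(buffer.getvalue().encode("utf-8"))

def json_size(rows):
    """Size in bytes of the rows serialized as a JSON array."""
    return len(json.dumps(rows).encode("utf-8"))

def xml_size(rows):
    """Size in bytes of the rows as <people><person>...</person></people>."""
    root = ET.Element("people")
    for row in rows:
        person = ET.SubElement(root, "person", active=str(row["active"]).lower())
        ET.SubElement(person, "name").text = row["name"]
        ET.SubElement(person, "age").text = str(row["age"])
    return len(ET.tostring(root, encoding="utf-8"))

sizes = {"CSV": csv_size(records), "JSON": json_size(records), "XML": xml_size(records)}
for name, size in sizes.items():
    print(f"{name}: {size / 1024:.1f} KB")
```

For flat records like these, CSV comes out smallest because field names appear once in the header, while JSON repeats keys and XML repeats both opening and closing tags for every record.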

Which format should you choose?

Choose CSV when...

Your data is flat and tabular (rows and columns)
You need to import/export from spreadsheet software
File size matters and you don't need nesting

Choose JSON when...

You're building or consuming web APIs
Your data has nested structures
You need typed values (numbers, booleans, null)

Choose XML when...

You need schema validation or namespaces
You're working with legacy enterprise systems
Your data mixes content and markup (like HTML)

Choose YAML when...

Humans will read and edit the file frequently
You need comments in your configuration
You're writing DevOps configs (Kubernetes, CI/CD)

No data format is universally superior. The best choice depends on who will read the data, what tools will process it, and whether your data is flat or deeply nested.