How to use this comparison
Format choice is a function of who reads the file next. The axes in the table below capture what actually matters in that handoff: size ceilings, CRS handling, attribute fidelity, multi-layer capability, and whether a consumer reads the format natively.
The table
| Format | Max size | Multi-layer | CRS support | Browser-native | Attribute types | Best for | Avoid when |
|---|---|---|---|---|---|---|---|
| GeoJSON | None (practical ~1 GB) | No | WGS 84 only (RFC 7946) | Yes (every web library) | JSON: string, number, bool, null, array, object | Web maps, APIs, tooling interchange | Multi-layer datasets, projected CRSs |
| Shapefile | 2 GB per .shp / .dbf | No | Per-file via .prj | No | dBase: string, int, float, date — no time, no bool, no null | Legacy GIS exchange, government delivery | New projects, multi-layer, long field names |
| GeoPackage | 140 TB (SQLite cap) | Yes | Per-layer, any EPSG | No | SQLite: full SQL types incl. blobs and nulls | Modern desktop GIS, multi-layer projects | In-browser delivery, REST API responses |
| KML / KMZ | None (practical ~10s MB) | Folders simulate | WGS 84 only | Partial (Google Earth, some libs) | Strings + typed ExtendedData | Google Earth handoff, consumer mapping | Heavy GIS workflows, analytical use |
| GML | None (practical GBs) | Yes (named feature types) | Per-feature, any EPSG | No | XSD-typed: full XML type system | Government / INSPIRE, schema-bound exchange | Anything you want a human to read |
| TopoJSON | None (practical 100s MB) | Multiple objects per file | WGS 84 only | Yes (D3, topojson-client) | JSON-serialisable only | Topology-sharing web maps, choropleths | Datasets without shared boundaries |
| WKT | None | No (one geometry per string) | None (implicit / external) | Yes (with libraries) | None — geometry only | Database queries, geometry-only exchange | Anything with attributes |
| DXF | None (practical 100s MB) | Layers within drawing | None (units only) | No | Block attributes only | CAD interchange, surveyor / architect handoff | Anything requiring CRS or rich attributes |
| CSV | None | No | None (lat/lon column convention) | Yes (every browser) | Strings only (typed downstream) | Tabular sensor data, point-cloud lists | Lines, polygons, multi-geometry data |
| PostGIS | 32 TB per table | Multiple tables / schemas | Per-column via SRID | No | Full Postgres type system | Server-side analytics, transactional GIS | File handoffs (it's a database, not a file) |
Notes on the columns
Max size. "None" means there is no inherent spec limit but practical limits exist (memory at read time, browser fetch ceilings, transport timeouts). Shapefile's 2 GB is a hard 32-bit offset cap. GeoPackage's 140 TB is the SQLite ceiling nobody will ever hit in practice.
Multi-layer. Whether one file can carry several feature collections elegantly. GeoPackage is the only widely-deployed format that does this cleanly. GML can via named feature types. KML simulates with folders. Everything else needs ZIP archives or per-feature conventions.
CRS support. GeoJSON and TopoJSON are pinned to WGS 84 by their RFCs; embedding any other CRS makes the file non-conformant. Shapefile carries CRS in a sibling .prj (easy to lose). GeoPackage and GML carry CRS as first-class metadata — the most robust choice.
Browser-native. Does a typical web map library read this directly? GeoJSON yes everywhere. TopoJSON yes with a small companion library. CSV yes with parsing. Shapefile and GeoPackage no without a WASM shim.
Attribute types. Expressivity of the column type system. Shapefile is the constraint case: 10-char names, no nulls, no booleans, dates without time. GeoPackage (SQLite under the hood) is the most flexible.
The decision in five common scenarios
- Publishing a public dataset: ship GeoPackage as primary plus GeoJSON as a web-friendly secondary. Skip Shapefile unless legally required.
- Building a Leaflet / Mapbox map: GeoJSON if under 5 MB, TopoJSON if topology-shared (adjacent polygons), tiled vector formats for anything larger.
- Government agency requires "GIS deliverables": read the procurement spec verbatim. "Shapefile" means Shapefile. "GIS-readable" means GeoPackage.
- Handing data to a CAD user: DXF. Coordinate units only — agree on CRS in the cover email.
- Storing data for an application: PostGIS. Everything else is a file format, not a data store.
What's not in the table
- Raster formats (GeoTIFF, COG, NetCDF). Different beast — different comparison.
- Point clouds (LAS, LAZ, COPC). Different beast again.
- Tile formats (MBTiles, PMTiles, vector tiles). Distribution formats, not authoring.
- Cloud-native vectors (FlatGeobuf, GeoParquet). Worth a separate post — they're the present and future of cloud-optimised vector data.
For most working GIS at the vector layer, the 10 above cover every conversation.
Closing rule of thumb
Default to GeoPackage for new authoring, GeoJSON for web delivery, Shapefile only when explicitly required, and PostGIS when you need a database. The other formats serve specific niches; reach for them when the niche fits.