CSV
Spreadsheets with latitude and longitude columns — the start of half of all GIS projects.
- Specification: RFC 4180
- Released: 1970s; RFC 4180 in 2005
- When to use: Use CSV when your data comes out of Excel, a database export, or a sensor log and you need to plot point locations on a map. CSV is also the easiest format to hand-edit, batch-generate from scripts, and feed into statistical tools (R, pandas, Stata). For anything more complex than points with attributes (lines, polygons, mixed geometries), use GeoJSON or GeoPackage instead.
What is CSV?
CSV (Comma-Separated Values) is not a spatial format. It is a tabular text format, formalised by RFC 4180 in 2005, in which records are separated by line breaks (CRLF in the RFC) and fields by commas; tabs, semicolons, and pipes are common non-standard variants that depend on local convention. It becomes spatial when at least two columns can be interpreted as coordinates: most commonly longitude/latitude pairs in decimal degrees, occasionally Easting/Northing in a projected grid, and sometimes a single column holding a WKT-encoded geometry for non-point shapes.
GDAL's CSV driver builds point geometries from the columns named in its X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES open options (for example X_POSSIBLE_NAMES=lon,longitude,x,easting and Y_POSSIBLE_NAMES=lat,latitude,y,northing). The file itself carries no CRS, so one has to be assigned explicitly, e.g. with ogr2ogr -a_srs. Without those hints, the file is treated as attribute-only. For non-point geometries, CSV can carry a WKT or GeoJSON string in a column; on read, GDAL parses it when the column is named in the GEOM_POSSIBLE_NAMES open option (GEOMETRY=AS_WKT is the corresponding layer creation option for writing).
Encoding is a perpetual headache. RFC 4180 treats US-ASCII as the common case and leaves other character sets to the charset parameter of the text/csv media type, but real-world CSVs arrive as UTF-8, UTF-16, or Windows-1252, and a single mis-decoded byte can garble text fields or abort the import entirely. Always declare the encoding explicitly.
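To make the name-matching idea behind X_POSSIBLE_NAMES/Y_POSSIBLE_NAMES concrete, here is a minimal plain-Python sketch of how a reader might pick coordinate columns by header name. It is not GDAL's implementation: the helper names (find_column, read_points), the candidate lists, and the case-insensitive matching are all assumptions for illustration.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical candidate lists, modelled on the example name lists in the text.
X_NAMES = ("lon", "longitude", "x", "easting")
Y_NAMES = ("lat", "latitude", "y", "northing")

Point = Tuple[float, float, Dict[str, str]]


def find_column(header: Sequence[str], candidates: Sequence[str]) -> Optional[str]:
    """Return the header name of the first candidate present (case-insensitive)."""
    by_lower = {name.strip().lower(): name for name in header}
    for cand in candidates:
        if cand.lower() in by_lower:
            return by_lower[cand.lower()]
    return None


def read_points(text: str,
                x_names: Sequence[str] = X_NAMES,
                y_names: Sequence[str] = Y_NAMES,
                ) -> Tuple[Optional[str], Optional[str], List[Point]]:
    """Parse CSV text into (x, y, attributes) tuples.

    If either coordinate column is missing, return (None, None, []) to signal
    that the file should be handled as attribute-only.
    """
    reader = csv.DictReader(io.StringIO(text, newline=""))
    header = reader.fieldnames or []
    x_col = find_column(header, x_names)
    y_col = find_column(header, y_names)
    if x_col is None or y_col is None:
        return None, None, []
    points: List[Point] = []
    for row in reader:
        x = float(row.pop(x_col))  # pop() also removes the coordinates from the attributes
        y = float(row.pop(y_col))
        points.append((x, y, row))
    return x_col, y_col, points
```

When reading from disk, open the file with an explicit encoding and newline="", as the Python csv documentation recommends, and pass its text in.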
Supported by
- Excel, LibreOffice Calc, Numbers, Google Sheets
- QGIS (delimited text layer, native import)
- ArcGIS (XY Table To Point tool)
- GDAL/OGR (driver: CSV)
- Python pandas, R, Stata, SPSS
- Most databases (e.g. MySQL LOAD DATA INFILE, PostgreSQL COPY)
- Mapbox Studio, CartoDB, kepler.gl
Strengths
- Universal — every tool reads it, every human can edit it
- Trivially diffable in source control
- Streams well — process row-by-row without loading the whole file
- Easy to generate from any programming language
- Compact for tabular data
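The streaming claim can be illustrated with Python's standard csv module. The function below is a hypothetical helper that assumes columns named lon and lat; it computes a bounding box while holding only the current row and the running extent in memory.

```python
import csv
from typing import Iterable, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def stream_bbox(lines: Iterable[str], x_col: str = "lon",
                y_col: str = "lat") -> Optional[BBox]:
    """Compute the bounding box of a point CSV one row at a time.

    csv.DictReader pulls rows lazily from `lines`, so the whole file is
    never loaded at once. Returns None if there are no data rows.
    """
    bbox: Optional[BBox] = None
    for row in csv.DictReader(lines):
        x, y = float(row[x_col]), float(row[y_col])
        if bbox is None:
            bbox = (x, y, x, y)
        else:
            bbox = (min(bbox[0], x), min(bbox[1], y),
                    max(bbox[2], x), max(bbox[3], y))
    return bbox
```

With a real file, pass the open file object itself, e.g. `with open("points.csv", encoding="utf-8", newline="") as f: stream_bbox(f)`; the file name is a placeholder.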
Weaknesses
- No native geometry — must designate coordinate columns explicitly
- Only points work out of the box; lines and polygons need WKT strings
- No CRS metadata — must be specified externally
- Encoding ambiguity (UTF-8 vs Windows-1252) breaks accented characters
- Quoting and escape rules vary subtly between writers
- Delimiter wars: comma in English locales, semicolon in German/French Excel
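For the delimiter problem, Python's standard library offers csv.Sniffer, which guesses a dialect from a sample of the text. The sketch below uses a hypothetical helper name and restricts the candidates, by assumption, to comma, semicolon, tab, and pipe.

```python
import csv
from typing import List


def read_any_delimiter(text: str) -> List[List[str]]:
    """Parse CSV text whose delimiter is one of , ; tab or |.

    csv.Sniffer is heuristic: it inspects a sample and raises csv.Error
    when it cannot decide on a delimiter.
    """
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    return list(csv.reader(text.splitlines(), dialect))
```

Sniffing only picks the field separator; decimal commas inside fields (as in "6,96") still have to be converted separately.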
Frequently Asked Questions
What column names should I use for latitude and longitude?
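Pick clear names and stick to them. 'longitude' and 'latitude' are the most explicit; 'lon'/'lat' are short and widely recognised. Avoid 'x'/'y' for geographic data: they don't say whether the values are degrees or projected metres, and it is easy to map them to the wrong axis. Keep the column order consistent as well: most GIS tools default to longitude first, but some Excel templates list latitude first. Record the CRS alongside the file (in a README, or in the column names, e.g. lon_wgs84) rather than in a comment row, because RFC 4180 has no comment syntax and most readers will parse such a row as data.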
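Names alone cannot prove the columns hold what they claim. A cheap sanity check, sketched here as a hypothetical check_lon_lat helper, compares the values against the limits of geographic coordinates (longitude within ±180°, latitude within ±90°) and flags columns that only make sense when swapped.

```python
from typing import Sequence


def check_lon_lat(lons: Sequence[float], lats: Sequence[float]) -> str:
    """Classify a pair of coordinate columns as 'ok', 'swapped?' or 'invalid'.

    Heuristic only: data lying entirely within +/-90 degrees on both axes
    (e.g. most of Europe) passes as 'ok' even if the columns are swapped.
    """
    lon_ok = all(-180 <= v <= 180 for v in lons)
    lat_ok = all(-90 <= v <= 90 for v in lats)
    if lon_ok and lat_ok:
        return "ok"
    # If the values are valid with the roles reversed, the columns were probably swapped.
    if all(-180 <= v <= 180 for v in lats) and all(-90 <= v <= 90 for v in lons):
        return "swapped?"
    # Out of range either way: likely projected coordinates or bad data.
    return "invalid"
```

For example, Sydney entered with latitude first (lons=[-33.9], lats=[151.2]) comes back as "swapped?", while projected metres such as 500000/5930000 come back as "invalid".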
How do I encode a polygon in CSV?
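Add a column containing one WKT string per row, e.g. POLYGON((10.0 53.5, 10.1 53.5, 10.1 53.6, 10.0 53.6, 10.0 53.5)). Because WKT separates vertices with commas, the field must be quoted. Then tell your reader where to find it: in ogr2ogr, name the column with the -oo GEOM_POSSIBLE_NAMES=<column> open option (GEOMETRY=AS_WKT and GEOMETRY_NAME are the matching layer creation options when writing a CSV). QGIS's delimited text dialog also has a 'WKT' geometry option.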
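Quoting WKT by hand is easy to get wrong when concatenating strings. A minimal sketch with Python's csv module, where the helper name and the 'WKT' column name are assumptions, lets the writer handle it.

```python
import csv
import io
from typing import Iterable, Tuple


def polygons_to_csv(rows: Iterable[Tuple[str, str]]) -> str:
    """Write (name, wkt) pairs as CSV text with a WKT geometry column.

    csv.writer's default QUOTE_MINIMAL quoting wraps any field containing the
    delimiter in double quotes, so the commas inside WKT stay in one field.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)  # default line terminator is "\r\n", as in RFC 4180
    writer.writerow(["name", "WKT"])
    writer.writerows(rows)
    return buf.getvalue()
```

A file written this way could then be read with ogr2ogr using -oo GEOM_POSSIBLE_NAMES=WKT.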
Why does my CSV import garble accented characters?
Encoding mismatch. The file is probably UTF-8 but your reader assumes Windows-1252 (or vice versa). Set the encoding in your reader where it is supported: QGIS's delimited text dialog has an encoding dropdown, and pandas.read_csv takes an encoding argument. For tools without such a setting, re-encode the file to UTF-8 before importing it (for example with iconv -f WINDOWS-1252 -t UTF-8).
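The mismatch is easy to reproduce, and to check for, in Python. The sketch below uses hypothetical helper names and assumes Windows-1252 (Python's cp1252 codec) as the source encoding to convert from.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode as UTF-8.

    Pure ASCII also passes, so this is a hint about the encoding, not proof.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # UTF-8 bytes misread as Windows-1252: prints "ZÃ¼rich"
    print("Zürich".encode("utf-8").decode("cp1252"))
```

Because Python decodes strictly by default, reading Windows-1252 bytes as UTF-8 usually fails loudly instead of garbling silently; the reverse mistake does not, which is why "Ã¼"-style debris is the more common symptom.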
Comma or semicolon as delimiter?
Comma is the RFC 4180 default and the safest choice for international interoperability. Semicolon is common in European locales because German and French Excel use the comma as the decimal separator and therefore need a different field separator. If you must use semicolons, say so explicitly in the documentation that accompanies the file.
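A semicolon file from European Excel usually also carries decimal commas. The sketch below reads such a file with Python's csv module; the helper name and the lon/lat column names are assumptions, and it presumes coordinates contain no thousands separators.

```python
import csv
from typing import Iterable, List, Tuple


def read_european_points(lines: Iterable[str], lon_col: str = "lon",
                         lat_col: str = "lat") -> List[Tuple[float, float]]:
    """Read a semicolon-delimited CSV whose numbers use a decimal comma."""
    points = []
    for row in csv.DictReader(lines, delimiter=";"):
        # "6,96" -> "6.96"; assumes no thousands separators in the coordinates.
        lon = float(row[lon_col].replace(",", "."))
        lat = float(row[lat_col].replace(",", "."))
        points.append((lon, lat))
    return points
```

In pandas the same file can be read with pandas.read_csv(path, sep=";", decimal=",").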
What CRS does GDAL assume for a CSV with lon/lat columns?
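None by default: GDAL will read the geometry but mark the layer's CRS as undefined. Always pass -a_srs EPSG:4326 (or whichever code is correct) to assign the CRS during conversion. -a_srs labels the output without reprojecting it; if you also need to reproject, declare the source CRS with -s_srs and the target with -t_srs. Without a CRS, downstream tools cannot reproject the data.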
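When conversions are scripted, it can help to build the ogr2ogr command in one place so the CRS is never forgotten. In the sketch below, the flags (-f, -oo X_POSSIBLE_NAMES / Y_POSSIBLE_NAMES, -a_srs) are real ogr2ogr and CSV-driver options, while the helper name, the defaults, and the file names are hypothetical.

```python
import shlex
from typing import List


def ogr2ogr_args(src: str, dst: str, epsg: int = 4326,
                 x_col: str = "lon", y_col: str = "lat") -> List[str]:
    """Build an ogr2ogr command converting a lon/lat CSV to GeoPackage.

    -oo passes open options to the CSV driver so it can build points, and
    -a_srs assigns the CRS without reprojecting the coordinates.
    """
    return [
        "ogr2ogr",
        "-f", "GPKG",
        "-oo", f"X_POSSIBLE_NAMES={x_col}",
        "-oo", f"Y_POSSIBLE_NAMES={y_col}",
        "-a_srs", f"EPSG:{epsg}",
        dst,  # ogr2ogr takes the destination before the source
        src,
    ]


if __name__ == "__main__":
    # Only prints the command; run it with subprocess.run(args, check=True) if GDAL is installed.
    print(shlex.join(ogr2ogr_args("points.csv", "points.gpkg")))
```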