Deep Dive · 9 min read · May 9, 2025

GML Explained: The XML-Based GIS Format Used by Governments

If you've ever downloaded cadastral data from a European government portal and stared at gigabytes of XML, you've met GML. Here's why governments love it and how to extract useful data from it.

Outside specialist circles, GML is the GIS format hardly anyone uses voluntarily. Inside government, it's the format almost everything is published in. This article explains why, and how to handle GML in a workflow built around more web-friendly formats.

What GML is

GML (Geography Markup Language) is an OGC standard, also published as ISO 19136, that defines an XML grammar for expressing geographic features, geometries, coordinate reference systems, observations, and topology. The version you will meet most often is 3.2.1, published in 2007; GML 3.3 (2012) adds extensions on top of it rather than replacing it.

Unlike GeoJSON or Shapefile, GML is not a single file format with a fixed schema. It is a meta-format — a vocabulary for defining schemas. Each application profile (German ALKIS NAS, Dutch BAG, French Cadastre, EU INSPIRE schemas) defines its own XML schema constraining which GML constructs are allowed and which feature types may appear.

A typical raw GML file looks like:

<?xml version="1.0" encoding="UTF-8"?>
<wfs:FeatureCollection 
    xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:au="http://inspire.ec.europa.eu/schemas/au/4.0">
  <wfs:member>
    <au:AdministrativeUnit gml:id="DE_BB_12054">
      <au:geometry>
        <gml:MultiSurface srsName="urn:ogc:def:crs:EPSG::25833">
          <gml:surfaceMember>
            <gml:Polygon>
              <gml:exterior>
                <gml:LinearRing>
                  <gml:posList>...</gml:posList>
                </gml:LinearRing>
              </gml:exterior>
            </gml:Polygon>
          </gml:surfaceMember>
        </gml:MultiSurface>
      </au:geometry>
      <au:nationalCode>12054</au:nationalCode>
      <au:nationalLevel>4thOrder</au:nationalLevel>
    </au:AdministrativeUnit>
  </wfs:member>
</wfs:FeatureCollection>

For a single administrative boundary, and before counting any coordinates, that's several hundred bytes of XML wrapped around roughly 30 bytes of attribute data (an id, a code, and a level). The verbosity is the price of self-description.
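
To make that ratio concrete, here is a minimal Python sketch, using only the standard library, that pulls the useful values out of a feature like the one above. The sample string is a trimmed copy of the listing with invented coordinates standing in for the elided posList, and the function name extract_units is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the listing above.
NS = {
    "wfs": "http://www.opengis.net/wfs/2.0",
    "gml": "http://www.opengis.net/gml/3.2",
    "au": "http://inspire.ec.europa.eu/schemas/au/4.0",
}

# Trimmed sample; the posList values are placeholders, not real data.
SAMPLE = """<wfs:FeatureCollection
    xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:au="http://inspire.ec.europa.eu/schemas/au/4.0">
  <wfs:member>
    <au:AdministrativeUnit gml:id="DE_BB_12054">
      <au:geometry>
        <gml:MultiSurface srsName="urn:ogc:def:crs:EPSG::25833">
          <gml:surfaceMember><gml:Polygon><gml:exterior><gml:LinearRing>
            <gml:posList>0 0 0 1 1 1 0 0</gml:posList>
          </gml:LinearRing></gml:exterior></gml:Polygon></gml:surfaceMember>
        </gml:MultiSurface>
      </au:geometry>
      <au:nationalCode>12054</au:nationalCode>
      <au:nationalLevel>4thOrder</au:nationalLevel>
    </au:AdministrativeUnit>
  </wfs:member>
</wfs:FeatureCollection>"""


def extract_units(xml_text):
    """Return id, code, level and CRS for every AdministrativeUnit."""
    root = ET.fromstring(xml_text)
    units = []
    for unit in root.iterfind(".//au:AdministrativeUnit", NS):
        surface = unit.find("au:geometry/gml:MultiSurface", NS)
        units.append({
            # gml:id is a namespaced attribute, so it needs the {uri}name form.
            "id": unit.get("{%s}id" % NS["gml"]),
            "code": unit.findtext("au:nationalCode", namespaces=NS),
            "level": unit.findtext("au:nationalLevel", namespaces=NS),
            "srs": surface.get("srsName") if surface is not None else None,
        })
    return units


if __name__ == "__main__":
    print(extract_units(SAMPLE))
```

Everything the function returns fits on one short line; the rest of the document is structure and namespaces.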

Why governments use it

Three reasons:

1. Schema validation. Every GML application profile has an XML Schema Definition (.xsd) that strictly types every field. A receiver can validate an incoming file against the schema and reject anything malformed. For exchange between agencies, where mistakes are expensive, this matters more than file size.

2. INSPIRE compliance. The EU INSPIRE directive (2007) mandates that member states publish standardised, machine-readable spatial data. Its data specifications define GML-based XML schemas for every theme (administrative units, addresses, transport networks, hydrography). In practice, compliance has meant publishing in GML.

3. CRS rigour. GML carries the CRS as a per-geometry attribute (srsName="urn:ogc:def:crs:EPSG::25833"), not as a per-file metadata blob. For datasets spanning multiple CRSs (rare but real), it is one of the few exchange formats that records this unambiguously.
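
Because the CRS travels with each geometry, it is worth checking how many distinct CRSs a file actually uses before converting it. The following is a rough sketch, assuming the srsName values use one of the common EPSG spellings (URN, short form, or the opengis.net URL form). The helper names epsg_code and epsg_codes_in and the two-point sample document are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample: two geometries, each declaring its own CRS.
TWO_CRS_DOC = """<root xmlns:gml="http://www.opengis.net/gml/3.2">
  <gml:Point srsName="urn:ogc:def:crs:EPSG::25833">
    <gml:pos>391000 5820000</gml:pos>
  </gml:Point>
  <gml:Point srsName="http://www.opengis.net/def/crs/EPSG/0/4258">
    <gml:pos>52.5 13.4</gml:pos>
  </gml:Point>
</root>"""


def epsg_code(srs_name):
    """Return the numeric EPSG code in an srsName, or None if there is none."""
    if "EPSG" not in srs_name.upper():
        return None
    # The code is the last token in all three spellings:
    # urn:ogc:def:crs:EPSG::25833, EPSG:25833, .../def/crs/EPSG/0/25833
    tail = re.split(r"[:/]", srs_name)[-1]
    return int(tail) if tail.isdigit() else None


def epsg_codes_in(xml_text):
    """Collect the EPSG codes of every element that carries an srsName."""
    root = ET.fromstring(xml_text)
    return {
        epsg_code(el.get("srsName"))
        for el in root.iter()
        if el.get("srsName")
    }


if __name__ == "__main__":
    print(sorted(epsg_codes_in(TWO_CRS_DOC)))
```

If the resulting set has more than one entry, a converter that assumes a single CRS per file will silently misplace some features.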

The big real-world flavours

Most GML you'll encounter is one of:

  • INSPIRE feeds (AU administrative units, AD addresses, HY hydrography, TN transport networks, etc.) — published by every EU member state via national geoportals.
  • German ALKIS NAS — the federal cadastre exchange format. A specialised GML profile so distinctive that GDAL has a dedicated NAS driver for it.
  • Dutch BAG (Basisregistraties Adressen en Gebouwen) — addresses and buildings.
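  • French Cadastre EDIGEO: an older national exchange format that often turns up alongside GML downloads. It is not itself GML-based, and GDAL reads it with a separate EDIGEO driver.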
  • CityGML — 3D city models with explicit Building, TINRelief, WaterBody types and levels of detail (LoD0 through LoD4).
  • AIXM — aeronautical information exchange.
  • S-100: the IHO framework for next-generation hydrographic data; several of its product specifications use a GML encoding.

If you're working with any of these, GML is the format you receive. The first question is always: which application profile? Because that determines which GDAL driver and which XSD schema you need.
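
One quick way to answer the profile question is to look at the namespaces declared on the root element. The sketch below reads only the start of the file, so it works on multi-gigabyte inputs. The URI prefixes in PROFILE_HINTS are assumptions based on commonly seen namespaces, not an authoritative registry, and guess_profiles is a hypothetical helper; check the prefixes against your publisher's documentation.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace prefixes for a few profiles; verify before relying on them.
PROFILE_HINTS = [
    ("http://inspire.ec.europa.eu/schemas/", "INSPIRE"),
    ("http://www.adv-online.de/namespaces/adv/gid/", "ALKIS NAS"),
    ("http://www.opengis.net/citygml/", "CityGML"),
    ("http://www.aixm.aero/schema/", "AIXM"),
]


def guess_profiles(source):
    """Return profile labels suggested by the root element's namespaces.

    `source` is a file name or a binary file object. Namespaces declared
    only on nested elements are not seen, because parsing stops at the root.
    """
    found = []
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            break  # root element reached; its declarations were already seen
        _prefix, uri = payload
        for uri_prefix, label in PROFILE_HINTS:
            if uri.startswith(uri_prefix) and label not in found:
                found.append(label)
    return found


if __name__ == "__main__":
    doc = (
        b'<wfs:FeatureCollection'
        b' xmlns:wfs="http://www.opengis.net/wfs/2.0"'
        b' xmlns:gml="http://www.opengis.net/gml/3.2"'
        b' xmlns:au="http://inspire.ec.europa.eu/schemas/au/4.0"/>'
    )
    print(guess_profiles(io.BytesIO(doc)))
```

An empty result does not prove the file is plain GML; it only means none of the listed prefixes appeared on the root element.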

Converting GML to something useful

For simple GML files, a direct conversion is enough:

ogr2ogr -f GPKG output.gpkg input.gml

or

ogr2ogr -f GeoJSON output.geojson input.gml

This works for plain GML profiles. For complex profiles, you'll need help:

For ALKIS NAS (German cadastre):

ogr2ogr -f GPKG output.gpkg input.xml \
  --config NAS_INDICATOR "NAS-Operationen.xsd" \
  --config GML_SKIP_RESOLVE_ELEMS NONE

The NAS driver requires GDAL built with Xerces-C support. On macOS/Linux, the Homebrew and conda-forge GDAL builds include it.

For INSPIRE feeds: download the .xsd schema alongside the .gml, then run:

ogr2ogr -f GPKG output.gpkg input.gml \
  --config GML_GFS_TEMPLATE schema.gfs

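The .gfs (GML Feature Schema) file records the layers and fields GDAL detected. GDAL writes one next to the input (input.gfs) the first time it has to scan a GML file whose .xsd it cannot find or parse. GML_GFS_TEMPLATE points the driver at a saved or hand-edited copy, here named schema.gfs, so keep that file around: later runs reuse the same layer and field definitions instead of detecting them again.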

For CityGML: GDAL handles the geometry, but you'll lose the level-of-detail semantics. For full CityGML processing, use the 3DCityDB software stack — it's designed specifically for this format.
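
If you convert many files, it helps to assemble these commands in one place. The sketch below only builds the argument list for the three cases shown above; it does not call GDAL itself. The function name build_ogr2ogr, the profile labels, and the gfs_template parameter are illustrative assumptions, and the options are copied from the commands in this section rather than from an exhaustive reading of the driver documentation.

```python
import shlex


def build_ogr2ogr(src, dst, profile="plain", gfs_template="schema.gfs"):
    """Build an ogr2ogr argument list mirroring the commands above.

    profile is one of "plain", "nas" or "inspire" (labels invented here).
    """
    # Pick the output driver from the destination extension.
    is_geojson = dst.lower().endswith((".geojson", ".json"))
    fmt = "GeoJSON" if is_geojson else "GPKG"
    cmd = ["ogr2ogr", "-f", fmt, dst, src]

    if profile == "nas":
        cmd += [
            "--config", "NAS_INDICATOR", "NAS-Operationen.xsd",
            "--config", "GML_SKIP_RESOLVE_ELEMS", "NONE",
        ]
    elif profile == "inspire":
        cmd += ["--config", "GML_GFS_TEMPLATE", gfs_template]
    elif profile != "plain":
        raise ValueError(f"unknown profile: {profile!r}")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing depends on GDAL here.
    print(shlex.join(build_ogr2ogr("input.gml", "output.gpkg", "inspire")))
```

To actually run a conversion, pass the returned list to subprocess.run(cmd, check=True) on a machine where ogr2ogr is on the PATH.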

Our online GML to GeoJSON converter and GML to GeoPackage converter handle simple GML profiles.

Handling the size

Real GML files from government portals are large. A national-scale INSPIRE dataset can run to multiple gigabytes. Three strategies:

1. Stream rather than load. GDAL reads GML in a streaming fashion — features are emitted as they're parsed, not after the whole file is in memory. This means you can convert a 50 GB GML to a 5 GB GeoPackage on a laptop with 8 GB of RAM.

2. Filter early. Use -sql to extract only the features you need:

ogr2ogr -f GPKG output.gpkg input.gml \
  -sql "SELECT * FROM AdministrativeUnit WHERE nationalLevel = '4thOrder'"

3. Split spatially. For huge nationwide datasets, use a bounding box filter:

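ogr2ogr -f GPKG output.gpkg input.gml -spat 13.0 52.0 14.0 53.0 -spat_srs EPSG:4326

This extracts only features intersecting a bounding box around Berlin. The -spat coordinates are interpreted in the source layer's CRS unless -spat_srs says otherwise, so a longitude/latitude box needs -spat_srs EPSG:4326 when the data is stored in a projected CRS such as EPSG:25833.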
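
The same stream-and-filter idea can be sketched outside GDAL with the standard library's incremental parser. This is an illustration of the technique, not a description of how GDAL implements it. It assumes the INSPIRE AdministrativeUnit structure from the earlier listing; stream_codes and the two-feature sample are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

AU = "{http://inspire.ec.europa.eu/schemas/au/4.0}"

# Invented two-feature sample; geometry omitted for brevity.
STREAM_SAMPLE = b"""<wfs:FeatureCollection
    xmlns:wfs="http://www.opengis.net/wfs/2.0"
    xmlns:au="http://inspire.ec.europa.eu/schemas/au/4.0">
  <wfs:member>
    <au:AdministrativeUnit>
      <au:nationalCode>12054</au:nationalCode>
      <au:nationalLevel>4thOrder</au:nationalLevel>
    </au:AdministrativeUnit>
  </wfs:member>
  <wfs:member>
    <au:AdministrativeUnit>
      <au:nationalCode>12</au:nationalCode>
      <au:nationalLevel>2ndOrder</au:nationalLevel>
    </au:AdministrativeUnit>
  </wfs:member>
</wfs:FeatureCollection>"""


def stream_codes(source, level="4thOrder"):
    """Yield nationalCode for each AdministrativeUnit at the given level.

    `source` is a file name or binary file object, read incrementally.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == AU + "AdministrativeUnit":
            if elem.findtext(AU + "nationalLevel") == level:
                yield elem.findtext(AU + "nationalCode")
            # Drop the feature's children and text once it has been handled.
            # The emptied element stays attached to its parent, so memory
            # still grows slowly with the feature count.
            elem.clear()


if __name__ == "__main__":
    print(list(stream_codes(io.BytesIO(STREAM_SAMPLE))))
```

Each feature is inspected as soon as its closing tag arrives and discarded straight afterwards, which is what keeps memory use far below the file size.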

Validating GML

GML's strict schema is a double-edged sword: you can validate properly, but you have to know which schema. The standard approach:

xmllint --schema inspire-au.xsd input.gml --noout

This validates against the schema and reports any violations. The schema URLs are documented in the data publisher's metadata.

For structural validation only (well-formed XML, valid GML namespace, basic geometry structure), use our GML validator. It catches the common issues — missing namespace, malformed XML, invalid geometry encoding — without needing the application's specific XSD.
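
For a sense of what a schema-free structural check involves, here is a minimal standard-library sketch. It is not the implementation of the validator mentioned above, and it is no substitute for XSD validation: it only tests that the XML is well-formed, that at least one element sits in a GML namespace, and that posList values are numeric. The function name structural_problems is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes (in ElementTree's {uri} form) for GML 3.2 and older GML.
GML_PREFIXES = (
    "{http://www.opengis.net/gml/3.2}",
    "{http://www.opengis.net/gml}",
)


def structural_problems(xml_bytes):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Covers malformed XML and undeclared (unbound) namespace prefixes.
        return [f"not well-formed XML: {exc}"]

    problems = []
    gml_elements = [
        el for el in root.iter()
        if isinstance(el.tag, str) and el.tag.startswith(GML_PREFIXES)
    ]
    if not gml_elements:
        problems.append("no element uses a GML namespace")

    for el in gml_elements:
        if not el.tag.endswith("}posList"):
            continue
        tokens = (el.text or "").split()
        if not tokens:
            problems.append("empty posList")
            continue
        for token in tokens:
            try:
                float(token)
            except ValueError:
                problems.append(f"non-numeric posList value: {token!r}")
                break
    return problems


if __name__ == "__main__":
    print(structural_problems(b"<a><b></a>"))
```

A file that passes these checks can still violate its application schema, so run the xmllint step whenever the XSD is available.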

Should you ever write GML?

For a publication channel, only if mandated. The audience for raw GML is small (other GIS systems with strict schema validation), and the file sizes punish casual consumption.

If you must publish in GML — say, to comply with INSPIRE — the workflow is to keep your master data in PostGIS or GeoPackage, then export to GML via ogr2ogr -f GML for each release, validating against the relevant XSD before shipping.

For most workflows, GML is something you receive and convert away from. The faster you transform it into GeoJSON or GeoPackage, the easier the rest of the work becomes.
