How to Build a Transportation (Travel Demand) Model: A Practitioner’s Guide

A transportation (travel demand) model is a quantitative framework that forecasts how people and goods move through a region under different land-use and network scenarios. Done well, a model becomes a decision engine: it helps test road and transit projects, pricing policies, and growth plans before spending real money.

This guide covers when you actually need a model, what data it requires, common software, the four-step modeling process, the GIS pieces, calibration/validation, costs, timelines, and practical tips.

When do you need a travel demand model?

Use a regional TDM when you must answer questions like:

  • Which corridor alternatives best relieve congestion by 2035 or 2045?
  • How do land-use changes (new housing, employment centers) shift travel patterns?
  • What is the network-wide effect of a BRT/metro line, tolls, or parking pricing?
  • How do policy scenarios (fuel cost, fares, telework) affect mode shares?

You do not always need a full TDM:

  • Small towns/corridors: Sketch planning (HCM methods, ITE trip rates, simple spreadsheets) or microsimulation of a corridor may be enough.
  • Project-level design: Use operational tools (HCM, SIDRA, VISSIM/SUMO) once the regional demand is known or assumed.

What a TDM can and can’t do

Can

  • Forecast network-wide demand by O-D, time period, and mode.
  • Compare alternatives consistently using the same assumptions.
  • Produce link volumes, speeds, V/C, transit boardings, and accessibility metrics.

Can’t (without extra work)

  • Predict driver behavior perfectly; models approximate reality.
  • Replace detailed operations analysis at intersections.
  • Work reliably without good data and careful calibration.

Core software stack (pick what fits your scope and budget)

| Purpose | Commercial | Open-Source / Low-Cost | Notes |
| --- | --- | --- | --- |
| Macroscopic 4-step models + transit | PTV VISUM, TransCAD, EMME, CUBE | AequilibraE (QGIS plugin), Python libraries | Strong transit and assignment features in commercial tools; AequilibraE is improving fast. |
| Dynamic traffic assignment (DTA) | Aimsun Next, Dynameq, PTV Vissim + Visum/Vistro | DTALite, NeXTA | DTA captures time-dependent queuing/spillback; heavier data and calibration needs. |
| Microsimulation (ops-level) | PTV Vissim, Aimsun | SUMO | Use after TDM to test signal plans/queues at project level. |
| Activity-based / agent models | (Commercial ABM add-ons) | ActivitySim, MATSim | Higher fidelity (daily activity patterns), higher data and skill requirements. |
| GIS + data engineering | ArcGIS Pro | QGIS, PostgreSQL/PostGIS, Python, R | Essential for TAZs, network building, GTFS, ETL. |
| Visualization | Tableau, Power BI | Kepler.gl, QGIS, Python dashboards | For maps, screenlines, KPI dashboards. |

Data requirements (the part that makes or breaks your model)

| Data Category | Examples | Typical Sources | Notes |
| --- | --- | --- | --- |
| Base network | Road centerlines, lanes, speeds, capacities, restrictions | OpenStreetMap, local road agencies, GIS base maps | Conflate to a routable graph; code facility types and turn restrictions. |
| Transit supply | Routes, stops, headways, fares, access links | GTFS from operators, agency shapefiles | Import GTFS; check stop spacing, transfers, and walk access. |
| Zones & land use | TAZ boundaries; households, population, employment by sector | Census/statistics office, planning depts., parcel data | Zones should align with barriers and major roads; keep intrazonal sizes reasonable. |
| Socio-economics | Income, car ownership, student/worker ratios | Household surveys, census | Critical for mode choice segmentation. |
| Travel surveys | Household travel survey (HTS), intercepts, RP/SP surveys | Commissioned studies, universities, consultants | Gold standard but expensive; sample should represent all market segments. |
| Counts & screenlines | 24-hr/peak link counts, turning counts, transit boardings | Traffic counts, APC/AVL data | Use for calibration/validation (GEH, RMSE, screenline checks). |
| O-D / probe data | Mobile phone (CDR/GPS), app data, floating-car travel times | Data vendors, Google/HERE APIs (travel times) | Useful to seed O-D matrices and validate speeds. |
| Future land use | Growth by TAZ, development plans | MPO/planning agencies | Drives forecast scenarios. |

The GIS pieces you cannot skip

  • Define TAZs that respect physical barriers and land-use homogeneity; avoid very large zones in urban cores.
  • Build/clean the network: conflate multiple sources; snap nodes; code lanes, speeds, capacities, turn bans, tolls, HOVs, centroid connectors.
  • Transit import: ingest GTFS; verify headways, fares, access links, transfers.
  • Skim geography: maintain consistent coordinate systems; precompute walk/bike access distances and impedances.

The classical 4-step modeling procedure (plus data needs)

Trip Generation

  • Goal: Estimate productions/attractions by purpose (HBW, HBS, HBO, NHB, freight).
  • Methods: Cross-classification (category analysis), linear/log-linear regression.
  • Inputs: Households by size/income, employment by sector, auto ownership, school enrollments.
  • Outputs: Trips by TAZ and purpose (P/A).
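Cross-classification amounts to multiplying household counts by category-specific trip rates and summing. A minimal sketch, where both the rates and the example TAZ are illustrative assumptions rather than survey-estimated values:

```python
import numpy as np

# Hypothetical HBW production rates (trips/household/day), cross-classified
# by household size (rows) and auto ownership 0/1/2+ (columns).
# Illustrative values only -- real rates come from a household travel survey.
rates = np.array([
    [0.5, 0.9, 1.1],   # 1-person households
    [0.8, 1.4, 1.8],   # 2-person households
    [1.0, 1.9, 2.6],   # 3+-person households
])

def hbw_productions(households: np.ndarray) -> float:
    """Total HBW productions for one TAZ.

    households: 3x3 counts in the same size-by-auto-ownership cells as rates.
    """
    return float((households * rates).sum())

# Example TAZ with 1,000 households spread across the cells.
taz = np.array([
    [120, 200,  30],
    [ 80, 250,  90],
    [ 40, 120,  70],
])
print(hbw_productions(taz))  # 1299.0 daily HBW productions
```

The same pattern repeats per purpose; attractions are usually regressions on employment by sector.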

Trip Distribution

  • Goal: Connect P/A to O-D flows.
  • Method: Gravity model with impedance (time/cost) and friction factors; optional K-factors for special pairs.
  • Inputs: Skims (time, cost, distance), productions/attractions, seed O-D (if available).
  • Checks: Average trip length (ATL) by purpose; intrazonal share; reasonableness of flows.
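The gravity step can be sketched as a doubly-constrained model whose balancing factors are found by iterative proportional fitting (Furness). The negative-exponential friction function and the beta value below are assumptions for illustration:

```python
import numpy as np

def gravity_doubly_constrained(P, A, cost, beta=0.1, iters=50):
    """Doubly-constrained gravity model with friction F = exp(-beta * cost).

    Balancing factors a, b are found by iterative proportional fitting so
    row sums match productions P and column sums match attractions A.
    """
    F = np.exp(-beta * cost)
    a = np.ones(len(P))
    b = np.ones(len(A))
    for _ in range(iters):
        a = 1.0 / (F @ (b * A))     # enforce production (row) totals
        b = 1.0 / (F.T @ (a * P))   # enforce attraction (column) totals
    return np.outer(a * P, b * A) * F

# Two-zone toy problem; P and A totals must balance (1,500 trips each).
P = np.array([1000.0, 500.0])
A = np.array([800.0, 700.0])
cost = np.array([[5.0, 15.0],
                 [12.0, 4.0]])      # e.g., a travel-time skim in minutes
T = gravity_doubly_constrained(P, A, cost)
print(T.sum(axis=1))  # row sums converge to P
print(T.sum(axis=0))  # column sums converge to A
```

Calibrating friction factors then means tuning beta (or a lookup table) until modeled average trip lengths match observed ones.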

Mode Choice

  • Goal: Split O-D travel among auto, transit (local/rapid), walk/bike, TNC/taxi.
  • Method: Multinomial or nested logit (often by market segments: income, car ownership, trip purpose).
  • Inputs: Skims per mode (in-vehicle time, wait, walk, transfer, fares, parking cost), socio-economics.
  • Estimation: Use survey RP/SP data to estimate coefficients (value of time, transfer penalties).
  • Outputs: Mode shares by O-D and purpose.
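As a sketch, a plain multinomial logit split for one O-D pair looks like the following; the coefficients and alternative-specific constants here are invented for illustration (production models are usually nested and segmented, with estimated coefficients):

```python
import numpy as np

def mnl_shares(V):
    """Multinomial logit: share_k = exp(V_k) / sum_j exp(V_j).

    Subtracting the max utility before exponentiating avoids overflow.
    """
    V = np.asarray(V, dtype=float)
    e = np.exp(V - V.max())
    return e / e.sum()

# One hypothetical O-D pair with three alternatives: auto, transit, walk.
ivt  = np.array([20.0, 35.0, 60.0])   # door-to-door time (min)
cost = np.array([3.0, 1.5, 0.0])      # out-of-pocket cost
asc  = np.array([0.0, -0.5, -1.0])    # alternative-specific constants (assumed)
V = asc - 0.05 * ivt - 0.20 * cost    # assumed time and cost coefficients
shares = mnl_shares(V)
print(shares.round(3))                # shares sum to 1.0
```

The ratio of the time and cost coefficients implies a value of time, which is a standard reasonableness check on estimated models.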

Assignment

  • Highway: Static user equilibrium (UE) with volume-delay functions (e.g., BPR), or DTA for time-dependent effects.
  • Transit: Multi-path assignment with crowding/transfer penalties where supported.
  • Outputs: Link volumes, speeds, V/C, VHT/VKT, queue proxies; station/line loads for transit.
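The BPR volume-delay function named above is compact enough to show directly; alpha = 0.15 and beta = 4 are the classic defaults, which agencies typically recalibrate by facility type:

```python
def bpr_time(t0, volume, capacity, alpha=0.15, beta=4.0):
    """BPR volume-delay: congested time = t0 * (1 + alpha * (v/c)^beta).

    t0 is free-flow travel time; the classic alpha/beta defaults are shown.
    """
    return t0 * (1.0 + alpha * (volume / capacity) ** beta)

# A 10-minute free-flow link loaded exactly to capacity (v/c = 1.0)
# takes 10 * (1 + 0.15) = 11.5 minutes.
print(bpr_time(10.0, 1800, 1800))
```

User-equilibrium assignment iterates between loading flows and re-evaluating these link times until no traveler can improve by switching paths.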

Skims (the glue)

  • Before distribution/mode choice, compute skim matrices (shortest-path impedance by time/cost). Update skims iteratively as networks load.
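Under the hood, a skim is a table of shortest-path impedances between zone centroids. Modeling packages build skims internally, but the core idea can be sketched with Dijkstra's algorithm over a toy network (node names and impedances below are hypothetical):

```python
import heapq

def skim(graph, origin):
    """Shortest-path impedance from origin to all reachable nodes (Dijkstra).

    graph: {node: [(neighbor, impedance), ...]}. The impedance can be a
    generalized cost (time + cost/VOT), not just distance.
    """
    dist = {origin: 0.0}
    pq = [(0.0, origin)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Toy network: centroids A, B, C plus one intermediate node X.
net = {
    "A": [("X", 4.0)],
    "X": [("B", 3.0), ("C", 6.0)],
    "B": [("C", 2.0)],
}
print(skim(net, "A"))  # {'A': 0.0, 'X': 4.0, 'B': 7.0, 'C': 9.0}
```

Running this from every centroid, on the congested network of the previous iteration, is what "updating skims as networks load" means in practice.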

Calibration and validation (how to make the model credible)

Key concepts

  • Calibration: Adjust model parameters so base-year outputs match observed data.
  • Validation: Test on hold-out data (or different time period/locations) to ensure transferability.

Typical calibration sequence

  1. Network & speeds: Ensure free-flow speeds and capacities by facility type are realistic; tune volume-delay (BPR) parameters.
  2. Trip generation: Match observed totals by purpose and area type; adjust rates/auto ownership models.
  3. Trip distribution: Calibrate friction factors to match average trip length and intrazonal shares; use K-factors sparingly for known anomalies.
  4. Mode choice: Fit to observed mode shares by segment and corridor; apply reasonable transfer and parking penalties.
  5. Assignment: Match link counts, turning counts, and screenlines; tune centroid connectors and turn penalties.

Common target metrics

  • Link volume GEH: ≥85% of calibration counts with GEH < 5; most links < 7.
  • RMSE / %RMSE by facility and volume bin within accepted ranges.
  • Screenline totals within ±5–10%.
  • Average trip length within ±5% by purpose.
  • Mode shares within ±2–3 percentage points overall and by segment.
  • Transit boardings/line loads within ±10–15% at key locations.
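The GEH statistic in the link-volume target above is a simple formula; here it is applied to one hypothetical count location:

```python
import math

def geh(model, count):
    """GEH statistic comparing modeled and observed hourly flows.

    GEH < 5 is the usual per-link acceptance threshold; unlike a plain
    percentage error, it stays comparable across low- and high-volume links.
    """
    return math.sqrt(2.0 * (model - count) ** 2 / (model + count))

# Modeled 1,100 veh/h against an observed count of 1,000 veh/h.
print(geh(1100, 1000))  # about 3.09 -- passes the GEH < 5 test
```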

Tools & techniques

  • Matrix estimation (ME/ODME) using counts and priors.
  • Sensitivity tests: increase fuel price, change transit headways, add parking cost; verify directional and approximate elasticities make sense.
  • Split data into calibration and validation sets to avoid overfitting.
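One way to judge a sensitivity test is to convert the before/after results into an arc elasticity and compare it with published ranges (transit fare elasticities near -0.3 are a commonly cited rule of thumb). The inputs below are illustrative:

```python
def arc_elasticity(q0, q1, x0, x1):
    """Arc (midpoint) elasticity of demand q with respect to variable x.

    Used to sanity-check model sensitivity runs against published ranges.
    """
    return ((q1 - q0) / ((q0 + q1) / 2)) / ((x1 - x0) / ((x0 + x1) / 2))

# Hypothetical run: fare rises 1.00 -> 1.20; boardings fall 10,000 -> 9,400.
e = arc_elasticity(10000, 9400, 1.00, 1.20)
print(round(e, 2))  # -0.34, plausibly close to typical fare elasticities
```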

How long does it take?

Indicative timelines (heavily scope-dependent):

| Scope | Typical Duration | Notes |
| --- | --- | --- |
| Small city, classic 4-step, limited new surveys | 3–6 months | Reuse existing data; focus on calibration to counts/screenlines. |
| Mid-size metro, full update with some surveys | 6–12 months | Fresh HTS/OD data, transit network coding, policy scenarios. |
| Large metro, ABM or DTA + new surveys | 9–18+ months | Complex networks, multiple agencies, iterative calibration. |

Team often includes a PM, data/GIS engineer, modeler, survey specialist, and QA/QC.

What does it cost?

Very rough, order-of-magnitude guidance (varies by region, rates, data, licenses):

  • Data: Household/OD surveys are the biggest ticket; probe data and counts also add up.
  • Software: Commercial licenses typically start in the low five figures per seat per year and go up from there; open-source reduces license costs but increases engineering effort.
  • Delivery:
    • Small city, 4-step with limited new data: ~US$50k–150k.
    • Metro with new surveys and commercial stack: ~US$300k–$1M+.
  • Maintenance: Budget for annual updates to networks, land use, and counts.

Is a TDM necessary for every city or town?

Not always. Consider lighter methods when:

  • The question is local/corridor-specific (e.g., one interchange or main street).
  • The budget and timeline are constrained, and decisions don’t hinge on network-wide redistribution.
  • You can defend a decision using HCM, ITE trip rates, targeted counts, and microsimulation.

Escalate to a TDM when cumulative, citywide effects matter or when agencies need a consistent forecasting platform for multiple projects.

Practical workflow you can reuse

  1. Scope: Purposes, time periods, geography, KPIs, scenarios.
  2. Data audit: What exists? What must be collected or purchased?
  3. GIS: TAZs, network, GTFS, centroid connectors.
  4. Base model: Trip gen → distribution → mode choice → assignment. Build skims.
  5. Calibration: Follow the sequence above; document every change.
  6. Validation: Independent counts/screenlines, hold-out geographies.
  7. Scenarios: Future land-use, network projects, policy tests.
  8. Documentation & handover: Model spec, user guide, versioned data, scripts.

Common pitfalls (and how to avoid them)

  • Messy zones: Oversized TAZs hide short trips and distort intrazonals → refine in urban cores.
  • Untuned speeds: Unrealistic free-flows or BPR parameters → garbage skims and misallocated demand.
  • Overusing K-factors: Hide structural issues; fix network or data first.
  • Ignoring transit access: Missing walk links or transfer penalties → inflated transit shares or misassigned paths.
  • No version control: Use Git; tag calibration milestones; keep a change log.
  • Poor documentation: Future users can’t reproduce results → create a data dictionary and runbook.

Deliverables clients and agencies expect

  • Model files + networks + skims for base and forecast years.
  • Data dictionary and metadata for every table and parameter.
  • Calibration report with targets, diagnostics, and achieved metrics.
  • Scenario book: assumptions, results, maps, KPIs.
  • Run scripts (batch) to reproduce results end-to-end.

FAQ

  • Four-step vs. ABM? ABM adds behavioral realism but needs more data and skill; 4-step is faster and adequate for many planning questions.
  • Static vs. DTA? DTA for time-dependent queuing/spillback when peak-period dynamics matter; otherwise static UE suffices.
  • SUMO’s role? Use SUMO/VISSIM after the TDM to test intersection control, lane use, and queuing for specific projects.