GrandLine Architecture Intelligence: Diagram Quality Technical Note
Revision: 2026-Q2 · Audience: engineers and architects who want to understand why our diagrams look the way they do and what trade-offs we made.
Diagrams are the product. An architecture tool that produces ugly diagrams is a tool that does not get used. We treat diagram quality as a first-class correctness property, not an afterthought.
1. The rendering pipeline
Every architecture diagram in GrandLine is produced by the same four-stage pipeline. The pipeline is deterministic: the same inventory produces the same diagram byte-for-byte, which matters for report reproducibility and for version-controlled architecture documentation.
Inventory → Graph model → Layout (ELK) → Style + Render (Cytoscape) → Export
- Inventory is the normalised Resource + Relationship table. Relationship types are attached_to, routes_to, trusts, encrypts_with, invokes, and a handful of provider-specific specialisations.
- Graph model is provider-agnostic Cytoscape JSON, with group nodes for account/VPC/subnet containers and typed edges so the renderer can style them consistently.
- Layout is elkjs 0.9 running the layered algorithm with orthogonal routing.
- Render is Cytoscape.js 3.30 on the client for interactive views; a headless Node process uses the same Cytoscape code plus a canvas polyfill for server-side PNG/SVG export.
- Export produces either PNG (for email-safe reports) or SVG (for zoomable reports and Confluence embedding).
The same pipeline runs in the interactive dashboard, the PDF report, and the demo site. This is deliberate: we never want a customer to see a diagram on screen and then get a different diagram in their exported PDF.
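As an illustration of the inventory-to-graph-model step and of the determinism requirement, here is a minimal sketch. The Resource and Relationship field names, the edge ID scheme, and the toGraphModel function are assumptions made for this example, not GrandLine's actual schema; only the Cytoscape element shape (a group plus a data object with id, parent, source, target) follows Cytoscape's elements JSON.

```typescript
// Hypothetical shapes for the normalised inventory; field names are assumptions.
interface Resource {
  id: string;
  kind: string;
  parentId?: string; // enclosing account / VPC / subnet, if any
}

interface Relationship {
  source: string;
  target: string;
  type: string; // e.g. "routes_to", "trusts", "invokes"
}

// Cytoscape elements JSON: each node or edge carries a `data` object.
interface CyElement {
  group: "nodes" | "edges";
  data: { id: string; type: string; parent?: string; source?: string; target?: string };
}

function compareIds(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sorting by ID makes the output independent of the order in which the
// inventory rows arrive, which is one ingredient of byte-for-byte determinism.
function toGraphModel(resources: Resource[], relationships: Relationship[]): CyElement[] {
  const nodes: CyElement[] = resources
    .slice()
    .sort((a, b) => compareIds(a.id, b.id))
    .map((r) => ({
      group: "nodes" as const,
      data:
        r.parentId === undefined
          ? { id: r.id, type: r.kind }
          : { id: r.id, type: r.kind, parent: r.parentId },
    }));

  const edges: CyElement[] = relationships
    .map((rel) => ({
      group: "edges" as const,
      data: {
        id: `${rel.source}|${rel.type}|${rel.target}`, // assumed ID scheme
        type: rel.type,
        source: rel.source,
        target: rel.target,
      },
    }))
    .sort((a, b) => compareIds(a.data.id, b.data.id));

  return nodes.concat(edges);
}
```

Serialising the result of two calls with the same rows in a different order should give identical strings, which is the property a reproducible report depends on.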
2. Why ELK
We evaluated five layout engines before choosing ELK:
| Engine | Verdict |
|---|---|
| Graphviz (dot) | Beautiful hierarchical layouts; weak at orthogonal routing through container nodes; C dependency awkward in our worker image. |
| D3-force | Organic and pretty; does not produce the "clean bus with orthogonal drops" look that cloud diagrams need. |
| Cytoscape-cose-bilkent | Good for network graphs; poor for layered architecture. |
| Mermaid | Easy; no real control over grouping, crossings, or compound nodes. |
| ELK (Eclipse Layout Kernel) | Chosen. Full support for compound/container nodes (VPCs, subnets), explicit orthogonal routing, rich configuration for crossings, spacing, and rank separation. |
ELK is developed as an Eclipse project and powers the layout in tools such as Sprotty and KIELER. It is the only open-source layout engine we found that handles both layered top-down flow and deeply nested compound nodes (account → VPC → subnet → resource) without collapsing or overlapping containers.
3. Grouping and layering
Cloud diagrams are not flat graphs. A production AWS estate has 4–6 levels of nesting (organisation → account → region → VPC → subnet → resource), and every level matters. We render compound nodes at every meaningful level, with the following defaults:
- Top-level group: one per cloud account / Azure subscription / GCP project. Labelled with the native ID and the customer-provided name.
- Region sub-group: for regional resources. We only draw the region container when the diagram spans multiple regions; otherwise it is implicit from the account label.
- VPC / VNet container: a bordered region with the CIDR printed in the label strip.
- Subnet container: nested inside the VPC. Public subnets get a brand cyan border, private subnets a muted slate border.
- Cluster / service-mesh container: for ECS/EKS/AKS/GKE we draw the cluster as a container and place pods/tasks inside.
Grouping uses ELK's elk.hierarchyHandling: INCLUDE_CHILDREN so edges cross cleanly through container boundaries without routing around containers.
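To make the nesting concrete, the sketch below turns a flat list of containers and resources into the nested children structure that ELK's JSON graph format expects, with the hierarchy option set on the root. The Container shape and the nestForElk name are hypothetical; node sizes, labels, and edges are left out for brevity, and a real ELK input needs widths and heights on leaf nodes.

```typescript
// Hypothetical flat input: every item optionally names its enclosing container.
interface Container {
  id: string;
  parentId?: string;
}

// Subset of the ELK JSON node format used here.
interface ElkNode {
  id: string;
  layoutOptions?: Record<string, string>;
  children: ElkNode[];
}

function nestForElk(items: Container[]): ElkNode {
  const root: ElkNode = {
    id: "root",
    layoutOptions: {
      "elk.algorithm": "layered",
      // Lay out the whole hierarchy in one pass so edges can cross
      // container boundaries.
      "elk.hierarchyHandling": "INCLUDE_CHILDREN",
    },
    children: [],
  };

  const byId = new Map<string, ElkNode>();
  for (const item of items) {
    byId.set(item.id, { id: item.id, children: [] });
  }

  for (const item of items) {
    const node = byId.get(item.id) as ElkNode;
    const parent = item.parentId !== undefined ? byId.get(item.parentId) : undefined;
    // Items with no parent, or with an unknown parent, attach to the root.
    // Cycles in parentId are not detected in this sketch.
    (parent ?? root).children.push(node);
  }
  return root;
}
```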
4. Edges: the actual work
Edge quality is the largest single determinant of whether a diagram feels professional. We spend a disproportionate fraction of our engineering effort here.
4.1 Orthogonal routing
Every edge is right-angled. We use:
elk.edgeRouting: ORTHOGONAL
elk.spacing.nodeNode: 32
elk.layered.spacing.nodeNodeBetweenLayers: 48
elk.layered.spacing.edgeEdgeBetweenLayers: 12
elk.layered.spacing.edgeNodeBetweenLayers: 16
These values came from iterating against a corpus of 40 real customer topologies plus five canned "torture tests" (very dense hub-and-spoke, deeply nested subnets, heavy cross-region routing).
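For readers wiring these values into elkjs, a minimal sketch of how they might be attached to a root graph follows. The option keys and values are the ones listed in this section; the between-layers node spacing is set once, under its elk.layered. key. The ElkGraph interface, the constant name, and withEdgeRoutingOptions are assumptions for illustration. elkjs conventionally takes option values as strings, so the numbers are quoted.

```typescript
// Minimal subset of an ELK JSON graph for this example.
interface ElkGraph {
  id: string;
  layoutOptions?: Record<string, string>;
  children?: ElkGraph[];
}

// Values taken from this section.
const ORTHOGONAL_LAYOUT_OPTIONS: Record<string, string> = {
  "elk.algorithm": "layered",
  "elk.edgeRouting": "ORTHOGONAL",
  "elk.spacing.nodeNode": "32",
  "elk.layered.spacing.nodeNodeBetweenLayers": "48",
  "elk.layered.spacing.edgeEdgeBetweenLayers": "12",
  "elk.layered.spacing.edgeNodeBetweenLayers": "16",
};

// Returns a copy of the graph with the defaults applied; options already
// present on the graph win over the defaults.
function withEdgeRoutingOptions(graph: ElkGraph): ElkGraph {
  return {
    ...graph,
    layoutOptions: { ...ORTHOGONAL_LAYOUT_OPTIONS, ...(graph.layoutOptions ?? {}) },
  };
}
```

Letting per-graph options override the defaults is a design choice of this sketch; it keeps a single tuned baseline while allowing a torture-test topology to adjust one value.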
4.2 Crossing minimization
ELK's elk.layered.crossingMinimization.strategy is set to LAYER_SWEEP with elk.layered.thoroughness: 10 for interactive views and 20 for exported PDFs (we can afford the extra ~100 ms server-side). We also enable elk.layered.crossingMinimization.semiInteractive: true so when a customer manually pins a node, subsequent re-layouts respect the pin but still minimise crossings elsewhere.
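The interactive/export distinction can be expressed as a small function. This is a sketch only: the RenderTarget type and crossingOptions name are hypothetical, while the option keys and the thoroughness values 10 and 20 come from the paragraph above.

```typescript
type RenderTarget = "interactive" | "export";

// Builds the crossing-minimisation options for one layout run.
function crossingOptions(target: RenderTarget): Record<string, string> {
  return {
    "elk.layered.crossingMinimization.strategy": "LAYER_SWEEP",
    // Exports can spend more time on layout than interactive views.
    "elk.layered.thoroughness": target === "export" ? "20" : "10",
    "elk.layered.crossingMinimization.semiInteractive": "true",
  };
}
```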
4.3 Edge typing and style
Edges are typed:
- traffic (#1FA9FF, solid, arrow): network reachability, load balancer to target, ALB to ECS service.
- data (#5EC6FF, solid, arrow, slightly thinner): asynchronous data flow (SNS→SQS, Kafka, Event Grid).
- trust (muted slate, dashed, diamond head): IAM AssumeRole, Azure RBAC assignment, GCP service-account impersonation.
- encrypts_with (muted slate, dotted, short): KMS key relationships.
- attached_to (no line; rendered as containment instead of an edge): the parent/child relationship between e.g. ENI and instance.
Colour is brand-consistent: every diagram in the product uses the same four colours across the whole UI and reports, so a customer can read any GrandLine diagram without a new legend.
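One way to express this edge typing is as a Cytoscape stylesheet keyed on a type data field, sketched below. Several details are assumptions: the hex value used for "muted slate" (the note does not give one), the exact widths, the taxi curve style on the base rule (included only because arrowheads need a non-haystack curve style), and the names EDGE_STYLES and rendersAsEdge. The two blue hex values and the line styles come from the list above.

```typescript
// Placeholder for "muted slate"; the real brand value is not stated in this note.
const MUTED_SLATE = "#64748B";

interface StyleRule {
  selector: string;
  style: Record<string, string | number>;
}

const EDGE_STYLES: StyleRule[] = [
  // Base rule: right-angled edges; arrowheads need a non-haystack curve style.
  { selector: "edge", style: { "curve-style": "taxi" } },
  {
    selector: 'edge[type = "traffic"]',
    style: {
      "line-color": "#1FA9FF",
      "line-style": "solid",
      "target-arrow-shape": "triangle",
      "target-arrow-color": "#1FA9FF",
      width: 2, // assumed width
    },
  },
  {
    selector: 'edge[type = "data"]',
    style: {
      "line-color": "#5EC6FF",
      "line-style": "solid",
      "target-arrow-shape": "triangle",
      "target-arrow-color": "#5EC6FF",
      width: 1.5, // "slightly thinner"; exact value assumed
    },
  },
  {
    selector: 'edge[type = "trust"]',
    style: {
      "line-color": MUTED_SLATE,
      "line-style": "dashed",
      "target-arrow-shape": "diamond",
      "target-arrow-color": MUTED_SLATE,
      width: 1.5,
    },
  },
  {
    selector: 'edge[type = "encrypts_with"]',
    style: {
      "line-color": MUTED_SLATE,
      "line-style": "dotted",
      "target-arrow-shape": "none",
      width: 1,
    },
  },
];

// attached_to is drawn as containment (the child's `parent`), never as a line.
function rendersAsEdge(type: string): boolean {
  return type !== "attached_to";
}
```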
4.4 Auto-split for very large graphs
When a view would contain more than 800 nodes after grouping collapses, we auto-split. The split dimensions are, in order:
- By account / subscription / project.
- By VPC / VNet.
- By resource tag (customer picks the tag key; defaults to team then env).
- By criticality: resources with an open critical finding appear in a "Red cuts" view.
Splits are rendered as sibling diagrams with cross-view edges drawn as labelled stubs (→ prod-web VPC). Users can click a stub to jump to the target view.
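The ordered split can be sketched as a recursive partition. This example assumes one particular reading of "in order": the next dimension is applied only to views that are still over the threshold after the previous one. The ViewNode shape and the autoSplit name are hypothetical, and the criticality view and the cross-view stubs are not modelled here.

```typescript
// Hypothetical node shape after grouping collapses.
interface ViewNode {
  id: string;
  account: string;
  vpc: string;
  tags: Record<string, string>;
}

// A split dimension maps a node to the key of the view it belongs to.
type Dimension = (n: ViewNode) => string;

const MAX_NODES = 800;

// Partitions nodes by the first dimension, then recurses with the remaining
// dimensions into any partition that is still too large.
function autoSplit(
  nodes: ViewNode[],
  dimensions: Dimension[],
  maxNodes: number = MAX_NODES
): ViewNode[][] {
  if (nodes.length <= maxNodes || dimensions.length === 0) {
    return [nodes]; // small enough, or nothing left to split by
  }
  const [first, ...rest] = dimensions;
  const buckets = new Map<string, ViewNode[]>();
  for (const n of nodes) {
    const key = first(n);
    const bucket = buckets.get(key);
    if (bucket) {
      bucket.push(n);
    } else {
      buckets.set(key, [n]);
    }
  }
  const views: ViewNode[][] = [];
  // Sorted keys keep the order of sibling views deterministic.
  for (const key of Array.from(buckets.keys()).sort()) {
    views.push(...autoSplit(buckets.get(key) as ViewNode[], rest, maxNodes));
  }
  return views;
}
```

A call such as autoSplit(nodes, [n => n.account, n => n.vpc, n => n.tags["team"] ?? ""]) would mirror the first three dimensions in the list above. A view can still exceed the threshold if every dimension is exhausted.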
5. Export fidelity
The server-side renderer uses the same Cytoscape code as the client, with a canvas polyfill (@napi-rs/canvas) so that text metrics and rounding match the browser. This means:
- PNG exports are what you see in the dashboard: same fonts, same colours, same line weights.
- SVG exports are vector and scale cleanly; we ship font subsets inline to avoid dependency on a viewer font.
- PDF reports embed SVG when practical, PNG only when the diagram size would inflate the PDF beyond 10 MB.
Diagrams in the shipped PDFs are the same diagrams the user can drill into in the dashboard. We treat any drift between the two as a bug.
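The SVG-versus-PNG rule for PDF embedding can be read as a simple size check. The sketch below is one interpretation, with assumptions called out: it treats 10 MB as 10 × 1024 × 1024 bytes, it compares the projected PDF size (report without the diagram plus the SVG) against that limit, and the function and parameter names are hypothetical.

```typescript
// Assumed interpretation of the 10 MB limit (binary megabytes).
const PDF_SIZE_LIMIT_BYTES = 10 * 1024 * 1024;

// Prefer SVG; fall back to PNG only when embedding the SVG would push the
// PDF past the limit.
function chooseEmbedFormat(pdfBytesWithoutDiagram: number, svgBytes: number): "svg" | "png" {
  return pdfBytesWithoutDiagram + svgBytes > PDF_SIZE_LIMIT_BYTES ? "png" : "svg";
}
```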
6. Sample diagram library
We ship seven sample diagrams at v1 that exercise the engine's full range. More will be added as customer estates surface interesting patterns.
AWS (3):
- Three-tier web app on EKS: single account, single region, single VPC.
- Hub-and-spoke landing zone with AWS Cloud WAN core network, AWS Organizations → 3 accounts.
- Serverless ingest: API Gateway, Lambda, DynamoDB, EventBridge.
Azure (2):
- Single-subscription hub-and-spoke with Azure Virtual WAN: Management Group onboarding, Azure Firewall, Azure Application Gateway + Private Endpoints, Azure SQL Database + Azure Blob Storage via Private Link.
- AKS cluster with Microsoft Entra ID Workload Identity Federation: Azure Front Door → Azure Application Gateway → AKS → Azure Service Bus + Azure SQL Database + Azure Key Vault.
GCP (2):
- Multi-project with a Network Connectivity Center (NCC) hub: Shared-VPC style, three VPC spokes, Cloud Armor, GKE Autopilot + GKE Standard, Cloud SQL via Private Service Connect, BigQuery + Cloud Storage (GCS).
- Data platform: Pub/Sub, Dataflow, Cloud Composer, Dataproc, BigQuery bronze/silver/gold, Cloud Storage (GCS), Cloud KMS, Looker Studio, Vertex AI.
Each sample is backed by a deterministic seed dataset so anyone evaluating GrandLine can see the same output.
7. Drill-down and views
A single tenant has three diagram scopes by default:
- Portfolio view: every account, collapsed to container nodes, showing only cross-account edges (peering, Transit Gateway, Cloud WAN, ExpressRoute, Interconnect). Useful for CIOs and security architects.
- Account / subscription / project view: the full graph for one account. Useful for platform teams.
- Service view: filtered to a single tier of resources (e.g. "all RDS + their consumers"). Useful for specialists.
Every view supports:
- Drill-down: click a container to expand or zoom.
- Filter chips: region, env, owner, criticality.
- Search: native ID, name, tag value.
- Pin / unpin: manual layout overrides that survive across re-layouts (see 4.2).
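The portfolio view's "collapse to accounts, keep only cross-account edges" behaviour can be sketched as an edge projection. The Edge shape, the accountOf lookup, and the portfolioEdges name are assumptions for this example; de-duplicating parallel edges of the same type between two accounts is also an assumption about how the collapsed view should look.

```typescript
interface Edge {
  source: string;
  target: string;
  type: string;
}

// Projects resource-level edges onto account containers. Edges inside one
// account are dropped; parallel edges of the same type between the same two
// accounts are merged into one.
function portfolioEdges(accountOf: Map<string, string>, edges: Edge[]): Edge[] {
  const seen = new Set<string>();
  const result: Edge[] = [];
  for (const e of edges) {
    const from = accountOf.get(e.source);
    const to = accountOf.get(e.target);
    // Skip edges with an unknown endpoint and edges that stay inside an account.
    if (from === undefined || to === undefined || from === to) continue;
    const key = `${from}|${e.type}|${to}`;
    if (seen.has(key)) continue;
    seen.add(key);
    result.push({ source: from, target: to, type: e.type });
  }
  return result;
}
```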
8. Performance characteristics
Measured on a c7g.xlarge:
| Graph size | ELK layout (ms) | Cytoscape render (ms) | Total p95 |
|---|---|---|---|
| 50 nodes | 18 | 20 | 70 ms |
| 200 nodes | 75 | 60 | 220 ms |
| 500 nodes | 210 | 150 | 500 ms |
| 1000 nodes | 520 | 380 | 1.2 s |
| 2000 nodes (auto-split triggers at 800) | see auto-split (4.4) | n/a | n/a |
For graphs of up to roughly 200 nodes, these numbers land us inside the dashboard's "feels responsive" target (< 300 ms for the common case). Above 800 nodes we rely on auto-split (see 4.4).
9. What we deliberately do NOT do
- We do not hand-draw icons. We use the official AWS / Azure / GCP architecture icon sets (licensed for embedded use in third-party tools). This avoids uncanny-valley pastiches and keeps the diagrams recognisable.
- We do not force a single layout algorithm. For certain scopes (cost-by-service treemap, findings-by-severity heatmap) we switch to non-layered renderers. But the architecture views are all layered + orthogonal.
- We do not round corners on container edges except for the subtle 2px visual radius on cards. Architects read diagrams; heavy styling gets in the way.
10. Feedback
If a diagram looks wrong, tell us: [email protected] with a screenshot. Diagrams that look wrong are bugs and we treat them as such.