almost minimal HTML5

Content Elements

Use document semantics to structure your content and help machines find it.

main: Specifies the main content of this document (web page).; The main element should not be hidden and must be directly reachable by assistive technologies.; The content inside the main element should be unique to the document.; There must not be more than one main element per document.; It should not contain any repeated content that appears across documents, such as sidebars, navigation links, copyright or imprint information, site logos, or search forms.; The main element must not be a descendant of an article, aside, footer, header, or nav element.; Use the main tag with attribute id="main-content" for skip-links.
section: A section is a thematic grouping of content, typically with at least one heading.; The meaningful content of a website can usually be divided into sections for introduction, content, conclusion, interpretation, and further topic-specific information.; If you cannot give a section a heading, it probably should not be a section.
article: The article tag specifies independent, self-contained content.; An article should make sense on its own and it should be possible to distribute it independently from the rest of the site.; Typical candidates for the article element are forum posts, blog posts, news stories, press releases, release notes, whitepapers, or additional documentation.; An article can contain its own header, footer, sections, and even nested articles.
aside: The aside tag defines content aside from the main content it is placed in.; The aside content may be removed without making the main content incomprehensible.; The aside content should be indirectly related to the surrounding content.; Aside content is often placed as a sidebar in a document.; Typical use cases include notes, definitions, related references, warnings, or contextual information.
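A minimal sketch of how these content elements can nest, assuming a single blog-style page. The headings, texts, and the skip-link wording are placeholders chosen for illustration only.

```html
<body>
  <!-- Skip link: first focusable element, jumps to the main content -->
  <a href="#main-content">Skip to main content</a>

  <!-- Exactly one main per document, not nested in article, aside, footer, header, or nav -->
  <main id="main-content">
    <h1>Page title</h1>

    <!-- Thematic grouping with its own heading -->
    <section>
      <h2>Introduction</h2>
      <p>Opening text.</p>
    </section>

    <!-- Self-contained content that could be distributed on its own -->
    <article>
      <h2>Release note 1.0</h2>
      <p>What changed in this release.</p>

      <!-- Indirectly related; removable without breaking the article -->
      <aside>
        <p>Note: earlier versions are documented elsewhere.</p>
      </aside>
    </article>
  </main>
</body>
```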

Site & Layout Elements

Use landmark semantics for usability and convenience for your audience.

header: Represents a container for introductory content or a set of navigational links, grouped in nav sections.; The header element typically contains one or more heading elements, a logo, site navigation, icons, a tagline, a language switch, or authorship information.; A header does not have to be at the top of the page; it can introduce its nearest ancestor sectioning content like an article or section.; Do not confuse it with the head element, which is an invisible container for metadata between the opening html tag and the body element.; itemscope: WPHeader supported.
nav: The nav tag is often inside a header or footer section and more than one nav block is allowed.; Defines a major block of navigation links to other pages or parts within the page, e.g., site navigation (aria-label="Primary navigation") or table of contents (aria-label="Table of contents").; Not all links in a document must be inside a nav element.; Screen reader users can skip this element and jump directly to the main section.; itemscope: SiteNavigationElement supported.
footer: A footer contains information about its nearest sectioning ancestor and provides additional metadata or navigational aids.; Common content, embedded in elements such as nav or address, includes authorship information, copyright, contact information, sitemap, back to top links, and related documents.; The footer element itself has no dedicated schema.org type, but commonly contains structured data such as ContactPoint, SiteNavigationElement, or author metadata.
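The landmark elements above could be combined as in the following sketch. The link targets, the site name, and the e-mail address are hypothetical; only the aria-label values are taken from the text.

```html
<header>
  <a href="/">Site name</a>
  <!-- Several nav blocks are allowed; labels tell them apart -->
  <nav aria-label="Primary navigation">
    <ul>
      <li><a href="/articles/">Articles</a></li>
      <li><a href="/about/">About</a></li>
    </ul>
  </nav>
</header>

<main id="main-content">
  <h1>Page title</h1>
  <nav aria-label="Table of contents">
    <ol>
      <li><a href="#intro">Introduction</a></li>
    </ol>
  </nav>
  <section id="intro">
    <h2>Introduction</h2>
    <p>Opening text.</p>
  </section>
</main>

<footer>
  <!-- Contact information for the whole document -->
  <address><a href="mailto:contact@example.com">contact@example.com</a></address>
  <p><small>Copyright notice</small></p>
  <p><a href="#top">Back to top</a></p>
</footer>
```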

Semantic Elements

Agent-ready, embedded accessible structured data management

Overall, JSON-LD is the widely preferred way for embedding structured data in HTML5 documents, particularly by search engines.

Standards and Organizations

Standards

Use UTF-8 encoding, add valid charset and language attributes.
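As a minimal sketch, the charset and language declarations can look like this. The title and the German sample paragraph are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding early in head, before any text content -->
  <meta charset="utf-8">
  <title>almost minimal HTML5</title>
</head>
<body>
  <p>Text in the document language.</p>
  <!-- Mark passages in another language explicitly -->
  <p lang="de">Ein Absatz auf Deutsch.</p>
</body>
</html>
```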

Read and follow the relevant standards: RFCs as well as the W3C and WHATWG specifications, among many others.

Schema.org

Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schema for structured data.

Schema.org is good for SEO purposes; however, search engines still prefer flat structures with few nested relationships, even when deeper nesting is standard-compliant. Moreover, Schema.org seems limited to describing real-world things and focuses on events, articles, products, reviews, recipes (but only as instructions), and a few other things.

Getting started with Schema.org
Google's way of using structured data.

RDF and OWL

RDF is a graph-based data model and can be embedded in HTML5 using RDFa 1.1, but JSON-LD is often preferred due to easier integration and separation from markup.
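For comparison, a minimal sketch of embedded RDFa using only the RDFa Lite attributes vocab, typeof, resource, and property. The person and the example.com identifier are placeholders.

```html
<!-- vocab sets the default vocabulary, typeof the class, resource the identifier -->
<div vocab="https://schema.org/" typeof="Person"
     resource="https://example.com/person/max-mustermann">
  <!-- property attaches the visible text as a value of schema.org/name -->
  <span property="name">Max Mustermann</span>
</div>
```

The annotations live in the visible markup itself, which is exactly the coupling that JSON-LD avoids.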

OWL is a family of knowledge representation languages for authoring ontologies.

SUMO is a free (GPLv3) formal ontology owned by the IEEE.

The DCMI vocabulary, known as Dublin Core, is one of the oldest (1995) general-purpose metadata vocabularies for describing resources of any type.

Wikidata is a free and open knowledge base for structured data that can be read and edited by both humans and machines; it is a project of the Wikimedia Foundation.

Turtle, a subset of Notation 3 whose triple syntax closely resembles that of the query language SPARQL, is a language, MIME type (text/turtle) and file format (.ttl) for storing serialized graphs of RDF data (along with N-Triples, JSON-LD and RDF/XML). It is widely used (e.g. Jena, RDFLib) because it is a user-friendly alternative to RDF/XML.

SHACL is a W3C language for validating RDF graphs against a set of conditions. These conditions serve a variety of purposes besides validation, including user interface building, code generation and data integration.

SKOS is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary.

UMBEL (retired 2019) is a logically organized knowledge graph, written in OWL.

A Rights Expression Language (REL) is a machine-processable language used to express intellectual property rights.

ccREL is a specification describing how license information may be described using RDF and how license information may be attached to works. (easy, common, works with Schema.org)
ODRL is a policy expression language, which became a W3C Recommendation in 2018 (complex, expressive, for business applications, uses RDF/JSON/JSON-LD).
PROV-O is a W3C recommendation and provides a set of classes, properties, and restrictions that can be used to represent and interchange provenance information.
liblicense: Build license-aware applications.
License RDF: Creative Commons provides information on licenses for three audiences: lawyers, humans and machines.
Rights Statements from Dublin Core provide simple and standardized terms; they are not a REL, but sometimes good enough.
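As an illustration of the ccREL approach, license information can be attached to a work with RDFa attributes. This is a sketch under assumptions: the work URL, the author name, and the choice of CC BY 4.0 are placeholders, not statements about any real work.

```html
<!-- about names the licensed work; prefix declares the Creative Commons namespace -->
<p prefix="cc: http://creativecommons.org/ns#"
   about="https://example.com/work/almost-minimal-html5">
  <span property="cc:attributionName">Max Mustermann</span>
  publishes this work under
  <!-- rel="license" links the work to its license -->
  <a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>.
</p>
```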

Other Domain-specific Ontologies:

BIBO: an independent RDF ontology that builds upon Dublin Core terms. It can describe bibliographic things like books or magazines.
DCAT: RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web.
SIOC: aims to enable the integration of online community information (Social Web) in conjunction with the FOAF vocabulary.

ARIA

WAI-ARIA adds semantic meaning with a focus on accessibility. Use it only when native HTML semantics are insufficient.

First rule of ARIA: Do not use ARIA if native HTML can express the same meaning.
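A short sketch of what this rule means in practice. The button label is a placeholder.

```html
<!-- Preferred: the native element brings role, focus, and keyboard handling -->
<button type="button">Save</button>

<!-- Avoid: role and tabindex must be added by hand, and key handling
     still has to be scripted separately -->
<div role="button" tabindex="0">Save</div>

<!-- ARIA is appropriate where HTML offers no equivalent,
     e.g. naming one of several nav landmarks -->
<nav aria-label="Primary navigation">
  <a href="/">Home</a>
</nav>
```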

FOAF

FOAF (Friend of a Friend) is an RDF vocabulary for describing people and their relationships, which may include personally identifiable data.

It builds on a social idea of 'knowing through relationships' similar to the web of trust, but FOAF describes only social closeness, not trustworthiness.

Validation Tools

Use linters and validation tools.

Rethink

Disclose as much as necessary and as little as possible.
Separation of concerns, data minimization, and explicit, conscious semantics are more important than expressive power.

JSON-LD

A W3C Recommendation for serializing linked data as JSON, based on the RDF data model.

Structured Data Formats

JSON-LD is the supported standard way to embed structured data into HTML5 documents.

RDFS is a minimal ontology language and provides a data-modelling vocabulary for RDF data. RDFS terms can also be embedded into HTML5 documents in a standard way (RDFa, JSON-LD).

Unlike Microdata, JSON-LD cannot assign several properties to the same value in a single declaration. Each role, such as author, editor, reviewer, or publisher, must be expressed as its own property, even when they all point to the same person. While this increases verbosity, JSON-LD keeps HTML markup untouched by encapsulating all structured data in a single script element.

The JSON-LD standard is preferred by search engines because it describes structured data without modifying visible HTML, but search engines prefer semantic data to be as flat as a cow pat.

See the application/ld+json example:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@id": "https://example.com/work/almost-minimal-html5",
  "@type": "CreativeWork",
  "name": "almost minimal HTML5",
  "dateCreated": "2026-01",
  "datePublished": "2026-01",

  "author": {
    "@type": "Person",
    "@id": "https://example.com/person/max-mustermann",
    "name": "Max Mustermann"
  },

  "editor": {
    "@type": "Person",
    "@id": "https://example.com/person/max-mustermann",
    "name": "Max Mustermann"
  },

  "reviewer": {
    "@type": "Person",
    "@id": "https://example.com/person/max-mustermann",
    "name": "Max Mustermann"
  },

  "publisher": {
    "@type": "Person",
    "@id": "https://example.com/person/max-mustermann",
    "name": "Max Mustermann"
  }
}
</script>

Microdata

Microdata is part of the HTML specification and uses attributes as data containers.
However, microdata tightly couples semantics to markup, which can make refactoring and reuse harder in larger projects.

itemscope / itemtype

An itemscope defines the boundary of a new semantic object, its scope, e.g. on a div, an article, or any other suitable container.

An itemtype specifies the meaning of that container: it classifies the object by assigning a concrete type, usually from schema.org or another ontology.

Itemscope and itemtype only make sense when used together:
<div itemscope itemtype="https://schema.org/Person">...</div>

Any element with itemscope creates a new scope. Nested microdata itemscopes are allowed and indicate a relation to the nearest ancestor item.

itemprop / itemid

An itemprop attribute only makes sense within an itemscope.

Then every HTML element may have an itemprop to add one or more properties to one or more items:
<div itemprop="author" ...>Author data</div>

If specified, its value must be an unordered set of unique space-separated tokens containing at least one token:
itemprop="author editor reviewer publisher"

The itemid provides a global identifier (URL) for this item. Useful to link the same entity across documents.

<div itemid="https://example.com/person/max" ...>Max data</div>

See the HTML microdata example:

<article
  itemscope itemtype="https://schema.org/CreativeWork"
  itemid="https://example.com/work/almost-minimal-html5">
  <h1 itemprop="name">almost minimal HTML5</h1>
  <p><time itemprop="dateCreated datePublished" datetime="2026-01">Created and published in January 2026</time></p>
  <div
    itemscope itemtype="https://schema.org/Person"
    itemprop="author editor reviewer publisher"
    itemid="https://example.com/person/max-mustermann">
    <span itemprop="name">Max Mustermann</span>
  </div>
</article>

HTML Elements with Semantic Flavour

If an element can convey meaning rather than merely change appearance, use it.

Content Structure

headlines

Of course, headlines! Best practice: only one visible h1 headline per page. Clean structure in main section, starting with h2.

Do not skip heading levels: start with h1 and go no deeper than h6.

Headings are important for text structure and are used to generate automatic tables of contents.

Headings are used by screen readers for navigation and by search engines to assess content importance.
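A sketch of a heading outline without skipped levels. The heading texts are borrowed from this page purely as sample content.

```html
<h1>almost minimal HTML5</h1>

<section>
  <h2>Content Elements</h2>
  <h3>main</h3>
  <h3>section</h3>
</section>

<section>
  <h2>Semantic Elements</h2>
  <!-- An h4 directly here would skip a level; h3 comes first -->
  <h3>JSON-LD</h3>
  <h4>Structured Data Formats</h4>
</section>
```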

p

Every paragraph is considered a self-contained unit. It indicates that this is a separate section of thought.

Remember: Within a paragraph no block-level elements are allowed, such as div, section, article, nav, footer, details, ol, ul, li, table, form, fieldset, and further p elements (this list is not complete). Elements like button, label, and iframe are phrasing content and are therefore allowed.

Only phrasing content elements are allowed within p blocks, such as:

Text and Emphasis: span, em, strong, b, i, mark, small, sub, sup
Links & Metadata: a, abbr, cite, dfn, time, data
Code & technical distinction: code, kbd, samp, var
Structure within a line: br, wbr, bdi, bdo
Inline media: img, svg, canvas, picture
Quotes, measurements, and special cases: q, ruby, rt, rp, meter, progress, output, template
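The following sketch shows what happens when this rule is broken. The list content is a placeholder.

```html
<!-- Valid: only phrasing content inside the paragraph -->
<p>Press <kbd>Ctrl</kbd> + <kbd>C</kbd> to copy the <code>id</code> value.</p>

<!-- Invalid: ul is not phrasing content. The parser closes the p
     before the ul, and the stray closing tag creates an empty p -->
<p>Shopping list:
  <ul><li>Milk</li></ul>
</p>

<!-- Fix: end the paragraph before the list starts -->
<p>Shopping list:</p>
<ul><li>Milk</li></ul>
```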

dl, dt, dd

A description list (dl), with a term/name (dt) and the descriptions (dd).

Inside a dd, paragraphs, line breaks, images, links, lists, etc. are allowed.
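A small description list as a sketch. The terms and descriptions are sample content only.

```html
<dl>
  <dt>JSON-LD</dt>
  <dd>
    <!-- Flow content such as paragraphs is allowed inside dd -->
    <p>Linked data serialized as JSON.</p>
    <p>Embedded in a <code>script</code> element.</p>
  </dd>

  <!-- Several terms may share one description -->
  <dt>Microdata</dt>
  <dt>RDFa</dt>
  <dd>Attribute-based annotation of existing markup.</dd>
</dl>
```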

ol, ul, li

Ordered (ol) and unordered (ul) lists with their list items (li) are very semantic, too.

dfn

The dfn tag stands for the "definition element" and specifies a term that is being defined.

The nearest ancestor of the dfn tag (paragraph, description list group, or section) should contain the definition/explanation. There are various possibilities, e.g.:

<span><dfn>HTML</dfn> HyperText Markup Language</span>
Content of surrounding element: HTML HyperText Markup Language.
<dfn title="HyperText Markup Language">HTML</dfn>
The title attribute works as a tooltip, but often needs a mouse-over: HTML.
<dfn><abbr title="HyperText Markup Language">HTML</abbr></dfn>
Better safe than sorry with an abbr inside dfn: HTML.
<dfn id="def-html">HTML</dfn>
Link with an anchor to the id: HTML.

Special HTML Elements

meta: Metadata within the head element can transport a lot of semantic information.
address: Contact information for the nearest article or the whole document.; Typically used for author or organization contact details, not for general postal addresses.
form: Enables interaction with the audience.; The ARIA landmark role="search" can be applied to a search form.; Better: <nav aria-label="Search"><form></form></nav>; Use label for accessibility.
figure & figcaption: The figure element represents self-contained content, often referenced from the main text. figcaption provides a caption or legend for that content.; Example caption: Sunset in Ρόδος; Typical usage: images, charts, diagrams, code snippets, tables.; Screen readers announce figure and read the figcaption as a description.; itemscope: CreativeWork or subtype like ImageObject.
details & summary: These two elements represent a disclosure widget that works without JavaScript, e.g. the summary "You can't pull the wool" revealing the content "over my eyes."; The summary element provides the heading or label for the details content. It must be the first child of details to work, like <details><summary>wool</summary>eyes</details>.; Typical usage: FAQ items, technical information blocks, notes or hints, optional sections.; Provides accessible toggle behavior automatically; screen readers announce expandable content.; itemscope: FAQPage (multiple FAQs) or CreativeWork (single block).
data: Machine-readable translation of content.; Use data when meaning matters, but presentation should remain silent.; Provides both machine-readable and human-readable values.; The value attribute can contain a hierarchical, stable semantic identifier, e.g., value="content.section.introduction".; If the content is time- or date-related, use the time element instead.
time: The datetime attribute holds the value in a machine-readable format (YYYY-MM-DDThh:mm:ss±hh:mm).; CET example: 3rd January 2026, 20:15 (datetime="2026-01-03T20:15:00+01:00")
Global and Event Attributes: All HTML tags also support Global Attributes and many support Event Attributes.
See also the Attribute Reference for inspiration.
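Several of the special elements above, sketched in markup. The image file name and its alt text are hypothetical; the caption, summary, data value, and datetime are the ones used in the descriptions.

```html
<figure>
  <img src="sunset.jpg" alt="Orange sun setting over the sea">
  <figcaption>Sunset in Ρόδος</figcaption>
</figure>

<!-- Disclosure widget: summary is the first child of details -->
<details>
  <summary>You can't pull the wool</summary>
  <p>over my eyes.</p>
</details>

<!-- data: human-readable text plus a machine-readable value -->
<p>Section: <data value="content.section.introduction">Introduction</data></p>

<!-- time: use it instead of data for dates and times -->
<p>Broadcast: <time datetime="2026-01-03T20:15:00+01:00">3rd January 2026, 20:15</time></p>
```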

Text Emphasis

Block Elements

blockquote: Like q, indicates a section of text quoted from another source.; A blockquote can specify the source via cite="URL".; Use blockquote for longer, multi-line passages.
pre: Defines preformatted text. It is displayed as in a text editor.; HTML tags are still parsed, therefore escape '<' (less than) and '>' (greater than) with the HTML entities &lt; and &gt;, e.g. '&lt;tag&gt;'.
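Both block elements in a short sketch. The cite URL and the quoted text are placeholders.

```html
<!-- cite on blockquote is machine-readable only; browsers do not display it -->
<blockquote cite="https://example.com/source">
  <p>A longer passage quoted from another document,
     running over several lines.</p>
</blockquote>

<!-- Inside pre, markup characters are escaped so they show up as text -->
<pre><code>&lt;main id="main-content"&gt;
  &lt;h1&gt;Title&lt;/h1&gt;
&lt;/main&gt;</code></pre>
```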

Inline Elements

em vs. strong: em: nuance, content emphasis; strong: important. The tags b (bold) and i (italic) convey no importance or emphasis; HTML5 keeps them only for text that is conventionally set apart, such as keywords or foreign terms.
small: Defines smaller text, e.g., copyright, disclaimer, or side-comments.
mark: Highlights content that should stand out, e.g. search result hits: marked text.
cite: Represents the title of a cited creative work (book, poem, song, movie, painting, sculpture, etc.), never a person’s name.; Use cite for references to other works, not for the work being described.; Typically rendered in italics: cite element.
code: Represents computer code or technical tokens, displayed in a monospace font; <code> is allowed inline.; Multi-line code sections are often marked up this way: <pre><code>some multiline source code</code></pre>; because of pre, this combination is not allowed inline.; Screen readers announce it as ‘code’.
q: Like blockquote, indicates text quoted from another source.; Use q for short inline quotations; the browser adds quotation marks automatically.
abbr: Defines an abbreviation or acronym: HTML, CSS.; The attribute title="abbr written out in full" is displayed when hovered.; Often does not work on mobile phones or touch screens, where there is no mouse.
kbd: Keyboard input or shortcuts, e.g., press 'Ctrl + C'.
samp: Program output: 404 Not Found.
var: Variable / placeholder: E = m * c².
ins and del: Defines text inserted or deleted in a document.; Browsers usually underline inserted text and strike through deleted text.
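The inline elements above, combined in a sketch. All sentences, the price figures, and the cited title are made-up sample content.

```html
<p>
  <em>Please</em> save first: <strong>unsaved changes are lost.</strong>
  Press <kbd>Ctrl</kbd> + <kbd>C</kbd> to copy. A missing page answers with
  <samp>404 Not Found</samp>.
</p>

<p>
  <abbr title="HyperText Markup Language">HTML</abbr> is specified in
  <cite>HTML Living Standard</cite>, and as the saying goes,
  <q>the first rule of ARIA is not to use ARIA</q>.
</p>

<p>
  In <var>E</var> = <var>m</var> * <var>c</var><sup>2</sup>, the search hit
  <mark>energy</mark> is highlighted. Price: <del>20</del> <ins>15</ins> euros.
  <small>All figures without guarantee.</small>
</p>
```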