Overview

This is the first in a series of modules that range from introductory guidance to tips for experienced vocabulary practitioners and lessons in using innovative vocabulary tooling. These modules needn't be approached in series order, but note that the step-by-step exercise continues throughout the other modules. In summary, the modules are:

Introduction to Vocabularies (this module)

  • Vocabulary types
  • Core properties
  • Exercise - start using VocEdit

Advanced Vocabulary Editing

  • Mapping between vocabularies
  • Additional properties
  • Exercise (continued from Introduction)

Vocabulary Reuse

  • Reuse patterns
  • Importing concepts from other vocabularies
  • Exercise (continued from Advanced)

Vocabulary querying

  • SPARQL querying
  • How to query
  • Common vocab queries

Vocabulary systems

  • VocEdit + GitHub (with new exercise)
  • VocExcel
  • SHACL Validator
  • RDF Converter

Vocabulary patterns

  • Handling special cases and advanced tips


  • 💡 Identifies troubleshooting tips, common errors and potential issues.
  • 🚧 Exercises
  • 🎬 Videos

Introduction to Vocabularies

As language speakers, we have developed the categorisation of "things" to both understand and communicate our experience of the world. The sheer volume of data that we interact with has necessitated a shared understanding of naming and definition. In large-scale data holdings, standardisation and disambiguation through vocabularies become a necessity.

When sharing information across diverse groups, catalogues or software applications, there is a need for a common understanding of exactly what is being referred to. Controlled vocabularies meet this need by providing agreed terminology that is applied to information, driving a shared understanding of concepts. Vocabularies provide both textual labels and definitions for human understanding, and machine-readable unique identifiers for machine processing.

Whatever size and shape, the vocabularies mentioned in these modules are designed for describing data and content. Vocabularies can be used to describe (or catalogue) content in information systems. Vocabularies can also optimise search engines and provide the basis for navigation in information systems, making it easier for users to find content and data.

Vocabulary types

In this section we will introduce some common vocabulary types. Working from simple to more complex examples, we will highlight some important vocabulary features.

Glossary

... defining terms

Glossaries are a very common form of vocabulary found in many print and web resources. A glossary is a list of concepts, expressed by natural language terms (we will refer to terms and labels interchangeably) with added definitions.

Glossary Picture

Each concept in a Glossary has at least one label and one definition. Some glossaries include see references that direct a user to a preferred term. This equivalence mapping is a common feature in more complex vocabulary types such as a thesaurus (see below). But first we will look at vocabularies that include hierarchy relationships.

Taxonomies

... a very short history

Taxonomies are vocabularies with hierarchical relationships between concepts. Conventionally, we say that concept A is broader than concept B when the all-some rule applies: all Bs are As, and some As are Bs. For example, all apples are fruit, and some fruit are apples. Therefore, fruit is broader than apples.

Modern taxonomies that are used to organise and retrieve data owe their heritage largely to two disciplines: biology, or the taxonomy of living things, featuring the familiar concepts of class, family, genus, species, etc., and financial classifications, where concepts are typically categorised as either function, activity or transaction.

%% Title: **Financial classification**
graph TD
subgraph "Financial classification"
    F[Function]
    T1[Transaction A]
    T2[Transaction B]
    A1[Activity 1]
    A2[Activity 2]
    A3[Activity 3]

    F --> T1
    F --> T2
    T1 --> A1
    T1 --> A2
    T2 --> A3
end
The meaning of a given term is determined, in part, by its relationship to broader and narrower terms. For example, we have a clearer understanding of what crane means if it has a broader relationship with birds (and not construction equipment).


%% Title: Taxonomy of living things
graph TD
subgraph "Taxonomy of living things"
    F[Family]
    G1[Genus A]
    G2[Genus B]
    S1[Species 1]
    S2[Species 2]
    S3[Species 3]

    F --> G1
    F --> G2
    G1 --> S1
    G1 --> S2
    G2 --> S3
end
    classDef green fill:#90ee90,stroke:#333,stroke-width:2px;
class F,G1,G2,S1,S2,S3 green;

We will see below, in Vocabularies in knowledge graphs, how the broader / narrower relationship between concepts can improve search and extraction functions where vocabularies are used to enrich data.


Thesaurus

... a (more) complete picture

The modern retrieval thesaurus combines the structure of a taxonomy with additional non-hierarchical relationships and synonym control. Thesauri establish hierarchy, association and equivalence between terms. Each can be expressed using Simple Knowledge Organization System (SKOS) properties: skos:broader / skos:narrower for hierarchy; skos:related for association; and skos:prefLabel / skos:altLabel for equivalence (W3C, 2009).

graph TD
    A[Concept A] 
    B[Concept B]
    C[Concept C]
    D[Concept D]
    S[Synonym of A]

    %% Broader / Narrower relationships
    A -- "skos:narrower" --> B
    B -- "skos:broader" --> A

    %% Associative (related) relationship
    A <-- "skos:related" --> C

    %% Synonym relationship (using skos:altLabel)
    A -- "skos:altLabel" --> S


    %% Additional hierarchical relationship for illustration
    C -- "skos:narrower" --> D
    D -- "skos:broader" --> C

💡 Tip: the skos:related property is most useful for relating disparate concepts in deep, complex hierarchies. Use skos:related sparingly - don't relate everything to everything!

We will look at SKOS properties in more detail in the Vocabulary properties section.

Vocabularies in knowledge graphs

Thought of as an interconnected system of data classes, a knowledge graph may include vocabularies as additional classes that connect with some or all other classes. In a knowledge graph, a vocabulary concept can be modelled as just another instance in the graph.

One function that vocabularies serve is to supplement and fill semantic gaps in data relations. Data objects that are not directly related may still connect through shared vocabulary concepts.

In the diagram below, two datasets are linked through a concept drawn from a vocabulary. The relationship between a class instance and a concept is often a subject relationship — that is, the data is about the concept.

graph LR;
    classDef concept fill:#f9f1a5,stroke:#b59a00;
    classDef default fill:#dae8fc,stroke:#6c8ebf;

    A["Traffic counts dataset"];
    B["Road closures dataset"];
    C["One-way road"]:::concept;

    A -- "subject" --> C;
    B -- "subject" --> C;

Because both datasets reference the same concept, they become indirectly related through the vocabulary.
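
The same idea can be sketched in a few lines of Python. This is only an illustration, not how any particular knowledge graph stores its data: the `subjects` dictionary and the `datasets_sharing_a_concept` function are hypothetical names, and the dataset and concept names are taken from the diagram above.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tagging data: dataset name -> set of subject concepts.
subjects = {
    "Traffic counts dataset": {"One-way road"},
    "Road closures dataset": {"One-way road"},
}

def datasets_sharing_a_concept(subjects):
    """Return (dataset, dataset, concept) triples for datasets tagged with the same concept."""
    by_concept = defaultdict(list)
    for dataset, concepts in subjects.items():
        for concept in concepts:
            by_concept[concept].append(dataset)
    links = []
    for concept, tagged in by_concept.items():
        # Every pair of datasets under one concept is indirectly related.
        for left, right in combinations(sorted(tagged), 2):
            links.append((left, right, concept))
    return links

links = datasets_sharing_a_concept(subjects)
print(links)
```

Neither dataset refers to the other; the link exists only because both point at the same concept.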

Vocabulary relationships

Vocabularies often include relationships between concepts. These relationships allow data tagged with different concepts to still be connected.

For example, a vocabulary might define the concept One-way road, with a more specific concept Reversible lane road.

graph LR;
    classDef concept fill:#f9f1a5,stroke:#b59a00;

    A["One-way road"]:::concept;
    B["Reversible lane road"]:::concept;

    A -- "skos:narrower" --> B;

Now imagine that one dataset is tagged with the broader concept, while another dataset uses the narrower one.

graph LR;
    classDef concept fill:#f9f1a5,stroke:#b59a00;
    classDef default fill:#dae8fc,stroke:#6c8ebf;

    D["Traffic counts dataset"];
    E["Road closures dataset"];

    C["One-way road"]:::concept;
    F["Reversible lane road"]:::concept;

    D -- "subject" --> C;
    E -- "subject" --> F;

    C -- "skos:narrower" --> F;

Because Reversible lane road is defined as a type of One-way road, the knowledge graph can infer a relationship between the two datasets.

Even though the datasets were originally tagged with different concepts, the vocabulary hierarchy connects them.
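
As a rough sketch of this inference, the Python below walks the skos:narrower links to decide whether two datasets are connected through the hierarchy. The data structures and function names are assumptions made for illustration; a real knowledge graph would typically answer this with a graph query instead.

```python
# Hypothetical data: skos:narrower links and dataset subject tags from the diagram.
narrower = {"One-way road": ["Reversible lane road"]}
subjects = {
    "Traffic counts dataset": "One-way road",
    "Road closures dataset": "Reversible lane road",
}

def is_same_or_narrower(concept, ancestor, narrower):
    """True if `concept` equals `ancestor` or sits anywhere below it in the hierarchy."""
    stack, seen = [ancestor], set()
    while stack:
        current = stack.pop()
        if current == concept:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return False

def related_by_hierarchy(a, b, subjects, narrower):
    """True if the subject of one dataset is the same as, or narrower than, the other's."""
    ca, cb = subjects[a], subjects[b]
    return is_same_or_narrower(ca, cb, narrower) or is_same_or_narrower(cb, ca, narrower)

print(related_by_hierarchy("Traffic counts dataset", "Road closures dataset", subjects, narrower))
```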

Alternative labels

SKOS also allows concepts to carry alternative labels (skos:altLabel). These capture common synonyms or variant terminology.

graph LR;
    classDef concept fill:#f9f1a5,stroke:#b59a00;

    A["One-way road"]:::concept
    B["single-direction road"]

    A -- "skos:altLabel" --> B

This helps bridge datasets that use different terminology for the same concept. A dataset tagged with single-direction road can still be recognised as referring to the concept One-way road.
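
A minimal sketch of this lookup, assuming concepts are held as simple Python dictionaries (the `concepts` list, `build_label_index` and `resolve` are hypothetical names, not part of SKOS or any tool):

```python
# Hypothetical concept record using the labels from the diagram.
concepts = [
    {"prefLabel": "One-way road", "altLabels": ["single-direction road"]},
]

def build_label_index(concepts):
    """Map every label (preferred or alternative, case-insensitive) to its preferred label."""
    index = {}
    for concept in concepts:
        pref = concept["prefLabel"]
        index[pref.casefold()] = pref
        for alt in concept.get("altLabels", []):
            index[alt.casefold()] = pref
    return index

def resolve(term, index):
    """Return the preferred label for a term, or None if the term is unknown."""
    return index.get(term.strip().casefold())

index = build_label_index(concepts)
print(resolve("Single-direction road", index))
```

Matching here is case-insensitive by choice; whether that is appropriate depends on the vocabulary and the system using it.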

Without vocabulary concepts, datasets often remain isolated because they use different terms or different levels of specificity. SKOS vocabularies introduce structured relationships between concepts that allow knowledge graphs to connect these datasets.

Through relationships such as skos:broader, skos:narrower, and skos:altLabel, vocabulary concepts act as semantic bridges, enabling inferences and connections that would not otherwise be possible.

Vocabulary properties

Vocabularies contain, as a minimum, preferred labels, definitions and identifiers. We have already introduced the relations that concepts have with other concepts. In this section we will look at more concept properties, including properties that are required for validation in vocabulary quality standards.

Minimum properties: prefLabel, definition and identifier

To comply with the VocPub profile (AGLDWG, n.d.), each concept must have at least:

  • a skos:prefLabel which is the main way that we say and understand the concept;
  • a skos:definition - a short note that describes the concept;
  • an Identifier - a unique way of distinguishing the concept from other concepts
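
The three requirements above can be checked with a very small Python function. This is only an illustration of the rule, not a substitute for validating a vocabulary against the profile itself; the dictionary keys and the placeholder IRI are assumptions.

```python
REQUIRED = ("iri", "prefLabel", "definition")

def missing_minimum_properties(concept):
    """Return the names of required properties that are absent or empty."""
    missing = []
    for name in REQUIRED:
        value = concept.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing

wind_dispersal = {
    # Placeholder identifier for illustration only.
    "iri": "http://example.com/pestRiskPath/00000000-0000-0000-0000-000000000000",
    "prefLabel": "Wind dispersal",
    "definition": "Dispersal of pests by wind",
}
incomplete = {"prefLabel": "Wind dispersal", "definition": "  "}

print(missing_minimum_properties(wind_dispersal))
print(missing_minimum_properties(incomplete))
```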

🚧 Exercise: Open, edit and save a vocabulary

These modules will include a number of editing exercises that use the VocEdit tool and the Pest Risk Pathway vocabulary (PRP). The PRP is unpublished and hosted by KurrawongAI for training purposes. In this exercise we will add a new concept; a concept preferred label; a concept definition; and a concept identifier.

💡 Chrome browser is needed to use the VocEdit tool.

  1. Go to Download TTL
    (Right-click and choose “Save link as...” to download)
  2. Save the file to your local directory
  3. Open Chrome (if not already)
  4. Go to VocEdit
  5. Select Project > Open > Local file
  6. Select pestRiskPath_training.ttl from your local directory
  7. Select Resource > Create new
  8. Resource type > Concept
  9. In the IRI field, add the stem http://example.com/pestRiskPath/
  10. Open a new tab and go to UUID Generator
  11. Copy the UUID
  12. Paste the UUID in the IRI field, after the stem http://example.com/pestRiskPath/. The full IRI should be: http://example.com/pestRiskPath/[UUID]
  13. Select Create
  14. Edit > prefLabel > "+" > Literal string with language
  15. Add Wind dispersal
  16. In the Lang box, add "en"
  17. definition > Add a literal with language
  18. Add Dispersal of pests by wind
  19. In the Lang box, add "en"
  20. Concept scheme relationships - topConceptOf > Select "+" > IRI
  21. Select a value > select pestRiskPath
  22. Save

The pestRiskPath_training.ttl file will now be updated in your local directory, with the new concept Wind dispersal added.
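
If you prefer to script the identifier step instead of using a web-based UUID generator, the concept IRI from steps 9 to 12 could be produced along these lines. This is a sketch using Python's standard `uuid` module; the function name is hypothetical, and VocEdit itself does not require it.

```python
import uuid

STEM = "http://example.com/pestRiskPath/"  # stem used in the exercise

def mint_concept_iri(stem=STEM):
    """Append a random (version 4) UUID to the stem to form a new concept IRI."""
    return stem + str(uuid.uuid4())

iri = mint_concept_iri()
print(iri)
```

Each call returns a different IRI, so mint the identifier once and keep it for the life of the concept.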

Broader / Narrower

We have already introduced the skos:broader and skos:narrower relationships in the sections on taxonomies and thesaurus vocabularies.

Depending on the type and complexity of a vocabulary, there may be a requirement that all concepts are related to another concept via the skos:broader property. In a taxonomy or thesaurus vocabulary project, a concept that does not have a skos:broader concept may be considered an orphan, unless it is a top concept, indicated with the skos:topConceptOf property. The SKOS standard does not require concepts to be arranged in a hierarchy. Some vocabularies will be mostly flat, with only selected concepts in narrower relationships to broader concepts.

If a skos:Concept does not have a skos:broader property, the VocPub profile requires that it must reference the relevant skos:ConceptScheme IRI with the skos:topConceptOf property.
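
One way to picture this rule is a check that flags any concept with neither a broader concept nor a top-concept declaration. The sketch below uses plain Python dictionaries with made-up `ex:` identifiers; it illustrates the rule only and is not the VocPub validator.

```python
def find_orphans(concepts):
    """Return identifiers of concepts with neither a broader concept nor a topConceptOf link."""
    return [
        c["iri"]
        for c in concepts
        if not c.get("broader") and not c.get("topConceptOf")
    ]

# Hypothetical concepts; the ex: identifiers are placeholders.
concepts = [
    {"iri": "ex:host-plants", "topConceptOf": "ex:pestRiskPath"},
    {"iri": "ex:spore-dispersal", "broader": ["ex:host-plants"]},
    {"iri": "ex:unplaced-concept"},
]

print(find_orphans(concepts))
```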

💡 Tip: Broader and narrower relationships are reciprocal - that is, if A is broader than B, then B is narrower than A. For example:

  • Dynamic land cover skos:broader Land cover and land use
  • Land cover and land use skos:narrower Dynamic land cover
  • Apples skos:broader Pome fruit
  • Pome fruit skos:narrower Apples
  • Hospitals skos:narrower Private hospitals
  • Private hospitals skos:broader Hospitals
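
Because the two directions always come in pairs, one can be derived from the other. A minimal sketch, assuming relationships are recorded as (narrower concept, broader concept) pairs; the function name is hypothetical:

```python
def with_reciprocals(broader_pairs):
    """Given (narrower_concept, broader_concept) pairs, return both directions as triples."""
    triples = set()
    for child, parent in broader_pairs:
        triples.add((child, "skos:broader", parent))
        triples.add((parent, "skos:narrower", child))
    return triples

pairs = [
    ("Dynamic land cover", "Land cover and land use"),
    ("Private hospitals", "Hospitals"),
]

for triple in sorted(with_reciprocals(pairs)):
    print(triple)
```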

Arranging concepts into a hierarchy supports discovery via:

  • Search expansion - a system can expand results by matching any narrower concepts of a search term, e.g. a search for Granitoid returns resources about granitoid OR granite
  • Navigation - top-down navigation or breadcrumb links can be launched in an interface using broader / narrower relationships. For example, clicking on Pome fruit launches a list of links to apples, pears and quinces
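
Search expansion can be sketched as follows. The resource names and the Basalt concept are invented for illustration, and a real retrieval system would usually do this with an index or a SPARQL query; the Python below only shows the logic.

```python
# Hypothetical hierarchy and catalogue: resource name -> subject concept.
narrower = {"Granitoid": ["Granite"]}
resources = {"Report A": "Granitoid", "Report B": "Granite", "Report C": "Basalt"}

def expand(term, narrower):
    """Return the term plus all of its narrower concepts, at any depth."""
    expanded, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in expanded:
            expanded.add(current)
            stack.extend(narrower.get(current, []))
    return expanded

def search(term, resources, narrower):
    """Return resources tagged with the term or any of its narrower concepts."""
    wanted = expand(term, narrower)
    return sorted(name for name, concept in resources.items() if concept in wanted)

print(search("Granitoid", resources, narrower))
print(search("Granite", resources, narrower))
```

Expansion runs downwards only: a search for the narrower term Granite does not pull in everything tagged with Granitoid.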

In a vocabulary, it's possible to keep adding narrower relationships by creating more and more specific concepts. For example, a catalogue that is about horticulture probably needs a vocabulary with more specific (narrower) concepts than just apples (e.g. Kiku Fuji).

💡 Only add narrower concepts that you expect to be used to describe content in a catalogue and to distinguish that content from other content. Don't make a vocabulary hierarchy deep with very specific concepts just because you can!

🚧 Exercise: add broader concept relations

In this exercise we will add a skos:broader relationship between two concepts. Note that once a concept has a broader relationship, it is no longer marked with skos:topConceptOf, and its 'top concept' status is removed.

  1. Go to VocEdit in Chrome
  2. Project > Open pestRiskPath_training.ttl from your local directory
  3. Select Spore dispersal from the left-hand list of concepts
  4. Concept relationships > Broader > Add a new value > IRI
  5. From the Select a value dropdown, search for or select Host plants > select
  6. Save

This change optimises the SKOS model by applying a broader relationship between concepts that are conceptually broader and narrower. In a retrieval system we might expect a query for resources about host plants as pest vectors to return a resource about Spore dispersal. The skos:broader relation supports such an inference.

Alternative labels

Each concept must have at least one Preferred label (skos:prefLabel), based on the word or phrase that best describes the concept. We often use different terms to mean the same thing - the skos:prefLabel should be the term that is used most frequently, or understood and used by most expected users of a system or catalogue.

In addition, each concept may have one or more alternative labels (skos:altLabel). It's a good idea to add one or more altLabel values to a concept so that it can be found in different ways. A concept can have any number of alternative labels, provided they are similar enough to the common understanding of the concept.

💡 Tip: when adding a skos:altLabel, ask this question: If I searched with a preferred label, and found some information matching an alternative label in the text, would I be satisfied by the search result?

Here are some common scenarios where we might need to choose between preferred and alternative labels:

Common vs Scientific terms

Connect scientific or technical names with common names. For example:

  • Red imported fire ant skos:altLabel Solenopsis invicta
  • Boghead Coal skos:altLabel Torbanite
  • Spore dispersal skos:altLabel Sporulation

Superseded terms

Even if a term is no longer used in recent content, users may still search a catalogue using superseded language. Storing superseded terms as alternative labels helps to group content that contains antiquated language with content written in current language. For example:

  • Aeolian Sand skos:altLabel Eskimo Sand
  • Utility hole skos:altLabel Manhole

Acronyms vs phrases

In general, an acronym or initialism should be managed as a skos:altLabel. For example:

  • Greenhouse gases skos:altLabel GHG

An exception is when the acronym is better known or more frequently used. For example:

  • TNT skos:altLabel Trinitrotoluene
  • CSIRO skos:altLabel Commonwealth Scientific and Industrial Research Organisation

Official vs common language

Use an altLabel to connect official or technical language with natural language. For example:

  • Bi-directional skos:altLabel Two way
  • Alcohol-impaired driving skos:altLabel Drink-driving

🚧 Exercise: add alternative labels

In this exercise we will add an alternative label to a concept.

💡 Tip: You will need to first add the skos:altLabel property to VocEdit as it is not required by VocPub.

  1. Go to VocEdit in Chrome
  2. Project > Open pestRiskPath_training.ttl from your local directory
  3. Select Spore dispersal from the left-hand list of concepts
  4. Other Properties > Add property
  5. Add http://www.w3.org/2004/02/skos/core#altLabel > Add property
  6. In the altLabel field you just created > "+" > Add new value > Literal with language
  7. Add Sporulation
  8. Add "en" to lang field
  9. Save

Top Concepts

If a skos:Concept does not have a skos:broader relationship, it is treated as a top concept of its skos:ConceptScheme and must be declared as such with the skos:topConceptOf property.

A concept may be moved out of the

Concept Scheme

A Concept Scheme is some metadata about the vocabulary as a whole - the vocabulary title (skos:prefLabel), a definition (skos:definition), and a unique identifier are minimum requirements. All vocabularies must have a Concept Scheme, and it should include:

  • an Identifier - create an IRI following the same pattern as the IRIs for concepts. For the suffix, instead of a concept ID, add a Concept Scheme ID. This may be the name of the Concept Scheme (the vocabulary), e.g. https://linked.data.gov.au/def/road-types, where Road types is the name of the Concept Scheme
  • a Preferred label - the same property that is used for a Concept. Use a Preferred label for the name or title of the vocabulary (this may also be used for the Concept Scheme ID)
  • a Definition - a definition of the Concept Scheme, using the same property as for Concepts. Use plain text only, but paragraphs may be separated by newlines
  • a Created date - when the Concept Scheme was first created. This might be automatically created by a vocabulary editor
  • a History note - a note on the origin or history of a vocabulary, such as how or from what it was generated
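
As an illustration of the identifier pattern, the sketch below turns a vocabulary title into a Concept Scheme IRI. The lowercase, hyphenated form is an assumption based on the road-types example; your organisation may follow a different convention, and the function name is hypothetical.

```python
import re

def scheme_iri(namespace, title):
    """Build a concept scheme IRI by appending a lowercase, hyphenated form of the title."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return namespace.rstrip("/") + "/" + slug

print(scheme_iri("https://linked.data.gov.au/def/", "Road types"))
```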

🚧 Exercise: edit a concept scheme

We will continue to edit the Pest Risk Pathway vocabulary, but this time we will edit the concept scheme which is the metadata about the vocabulary as a whole.

  1. Go to VocEdit in Chrome
  2. Project > Open pestRiskPath_training.ttl from your local directory
  3. Select Pest Risk Pathway from under Vocabularies in the left-hand panel
  4. Annotations > definition > "+" > Add a new value > literal with language
  5. Add A vocabulary describing various structures, modes and activities that introduce unwanted pests, weeds and diseases.
  6. Add "en" to lang field
  7. Save

Summary

In this module we have introduced vocabularies - different types and how they are useful. We have also used a vocabulary editing tool to create the minimum elements for a concept and a concept scheme.

References and Further Reading