Content from Introduction


Last updated on 2026-04-14

Estimated time: 15 minutes

Overview

Questions

  • Does FAIR data mean open data?
  • What are digital objects and persistent identifiers?
  • What kinds of persistent identifiers are commonly used?

Objectives

  • Understand that FAIR does not simply mean open.
  • Explain the difference between human-readable and machine-friendly digital objects.
  • Recognize DOI as one type of PID used to identify digital objects.

Does FAIR data mean open data?


Callout

FAIR is not identical to open

FAIR means that data and related research objects are designed to be easy for humans and machines to find, understand, access, and reuse. Some FAIR data can also be open, but openness and FAIRness are not the same thing.

Figure: FAIR and Open Science (diagram connecting FAIR and Open Science concepts)

Discussion

Human-readable vs machine-friendly

Human-readable content is easy for a person to inspect directly, but that does not guarantee that software can parse and reuse it automatically. Machine-readable content uses an explicit structure, such as CSV, JSON, or XML, so a computer can interpret relationships and values without guessing.

During this lesson, “machine-friendly” is used broadly to mean material that is machine-readable and also supports further automated action and interoperability.
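
To make the contrast concrete, here is a minimal sketch in Python. The sentence and the two column names (`weight_kg`, `height_cm`) are invented for illustration. The same observation is easy for a person to read in either form, but only the CSV version can be parsed and reused without guessing.

```python
import csv
import io

# The same observation as free text and as structured CSV (illustrative values).
free_text = "The patient weighed about 72 kilos and was 180 cm tall."
csv_text = "weight_kg,height_cm\n72,180\n"

# Structured content is parsed by column name; no language understanding needed.
rows = list(csv.DictReader(io.StringIO(csv_text)))
weight_kg = float(rows[0]["weight_kg"])
height_cm = float(rows[0]["height_cm"])

# A further automated action: derive a new value from the parsed fields.
bmi = weight_kg / (height_cm / 100) ** 2
print(round(bmi, 1))
```

Extracting the same two numbers from `free_text` would require pattern matching and assumptions about units, which is exactly the guessing that machine-friendly formats avoid.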

Figure: Machine-friendly digital objects

What are digital objects and persistent identifiers?


A digital object is a sequence of bits stored in digital memory that has informational value on its own. Examples include:

  • a scientific publication
  • a dataset
  • a rich metadata record
  • a README file describing access and reuse conditions

A persistent identifier, or PID, is a durable reference to a digital or physical resource. PIDs are backed by technical infrastructure and governance arrangements that help them continue resolving even when the resource itself changes location.

Common uses of PIDs include identifying:

  • articles, datasets, and software
  • researchers
  • organizations and funders
  • projects and instruments
  • physical samples and media objects

DOI is one well-known PID type and is commonly used for datasets and publications. ORCID is another example, focused on researcher identity.

For a short explainer, see the FREYA project video on the importance of PIDs.

Challenge

Exercise

Visit the recent machine learning submissions on arXiv.

Pick a paper, open its PDF, and search for http or doi.

What kinds of links do the authors use for data or software references, and why is a DOI usually more robust than a personal website or repository link alone?

Authors often link to GitHub repositories, project pages, or other web pages for software and data. Those locations may move or disappear over time. A DOI is more robust because it resolves through persistent infrastructure and points to current metadata about the object even if the storage location changes.

Discussion

Reflection

How does your discipline usually share data? Is there a data journal, domain repository, or another community-specific mechanism for making research outputs findable and reusable?

Key Points
  • FAIR means making research objects more usable by humans and machines, not automatically making them open.
  • Machine-friendly objects are structured so software can interpret and reuse them reliably.
  • Digital objects include datasets, publications, metadata records, and related documentation.
  • DOI is a common PID used for datasets and publications.

Content from Set up your own terms


Last updated on 2026-04-14

Estimated time: 25 minutes

Overview

Questions

  • What are data terms of use?
  • What should a data terms of use statement contain?
  • What format should terms of use be stored in?
  • What standard licenses are available for data?

Objectives

  • Understand what data terms of use are and why they matter.
  • Identify the minimum components of a basic terms of use statement.
  • Recognize when a standard license is sufficient and when a custom agreement is needed.
Callout

FAIR principles used in data terms of use

Accessible:

Reusable:

What are data terms of use?


Data terms of use are a textual statement that sets out the rules, conditions, licenses, and legal considerations that govern reuse of a data source.

Examples:

Figure: Terms of use example from the World Bank

Figure: Terms of use example from Numbeo

These examples show that terms of use usually describe the resource, the conditions under which it may be reused, and any expectations around attribution or restrictions.

What must a terms of use statement contain?


As a minimum, a data terms of use statement should cover the following elements:

Section | Description | Example
Description | What the statement refers to and which digital objects it covers | “These terms apply to the Happy Dataset.”
License | Under which conditions reuse is allowed | “The Happy Dataset is in the public domain.”
Attribution | How the data should be cited or acknowledged | “Please cite the Happy Dataset.”
Disclaimer | Important limitations or caveats | “The last 100 records may contain selection bias.”

Depending on the context, the statement may need additional clauses for multiple databases, sensitive data, embargoes, or obligations coming from a larger funded project.
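
As a rough illustration, the four minimum sections can be assembled into a plain-text Markdown statement with a few lines of Python. The function name `render_terms` and the statement title are hypothetical, and the section texts are the examples from the table above. This is a sketch of one possible layout, not a required template.

```python
def render_terms(title: str, sections: dict[str, str]) -> str:
    """Render a terms of use statement as Markdown text."""
    lines = [f"# {title}", ""]
    for heading, body in sections.items():
        # One second-level heading per section, followed by its text.
        lines += [f"## {heading}", "", body, ""]
    return "\n".join(lines)


terms = render_terms(
    "Terms of use: Happy Dataset",  # hypothetical title
    {
        "Description": "These terms apply to the Happy Dataset.",
        "License": "The Happy Dataset is in the public domain.",
        "Attribution": "Please cite the Happy Dataset.",
        "Disclaimer": "The last 100 records may contain selection bias.",
    },
)
print(terms)
```

The resulting text could be saved next to the data, for example in a README or LICENSE file.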

An example of a broader policy framework is the FAIRsharing record for the 1958 Birth Cohort policy:

What format should terms of use be stored in?


Terms of use should be stored as plain text in a machine-friendly format such as .txt, .md, or .html. The exact length and level of detail will vary by project, but the statement should be easy for people to read and for systems to preserve.

Callout

Keep the statement in an accessible text format

You can draft terms of use in almost any editor, but the final version should be stored in a format that does not depend on proprietary software to read it.

Many projects place the statement in a README, LICENSE, or similar documentation file.

Challenge

Exercise

Visit the City of Philadelphia terms-of-use file:

https://github.com/CityOfPhiladelphia/terms-of-use/blob/master/LICENSE.md

Answer the following:

  • What kind of resource is it about?
  • In what format is the statement written?
  • On which platform is it published?

Solution:

  • It is a terms-of-use style statement published for city-maintained digital resources.
  • The file format is Markdown (.md).
  • It is published on GitHub.

The statement itself often lives next to the data documentation:

Figure: Folder layout showing terms of use documentation (a README or license file)

Some repositories let you define tailored reuse conditions directly on the platform. Dataverse, for example, defaults to a CC0 waiver but also allows custom terms after dataset creation.

Figure: Terms and license settings in a repository interface

Challenge

Exercise

Is it possible to edit terms of use in DataverseNL?

Check the DataverseNL documentation or FAQ and find the answer.

Yes. After creating a dataset, you can go to the Terms tab, choose not to apply CC0, and provide your own terms and conditions.

Useful reference:

Are there standard licenses we can pick from?


Two commonly used licensing families for data are:

  • Creative Commons (CC)
  • Open Data Commons (ODC)

Creative Commons licenses are easy to understand and widely recognized, even if they were not designed only for data. Open Data Commons licenses are more explicitly data-oriented.

Mark | Meaning
BY | Creator must be credited
SA | Derivatives or redistributions must use the same license
NC | Only non-commercial uses are allowed
ND | No derivatives are allowed

Challenge

Exercise

Choose a Creative Commons license for simulation data under these conditions:

  • others cannot modify the work
  • commercial reuse is allowed

Which license fits?

Attribution-NoDerivatives 4.0 International, or CC BY-ND 4.0.
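
The way the marks combine can be sketched as a small Python function. The function name `cc_license` and its parameters are hypothetical. The sketch only covers the attribution-based 4.0 licenses and ignores CC0 and other waivers.

```python
def cc_license(commercial_ok: bool, derivatives_ok: bool, share_alike: bool = False) -> str:
    """Build a Creative Commons 4.0 license code from reuse conditions."""
    if share_alike and not derivatives_ok:
        # SA constrains derivatives, so it cannot be combined with ND.
        raise ValueError("SA only applies when derivatives are allowed")
    parts = ["BY"]  # every license built here requires attribution
    if not commercial_ok:
        parts.append("NC")
    if not derivatives_ok:
        parts.append("ND")
    elif share_alike:
        parts.append("SA")
    return "CC " + "-".join(parts) + " 4.0"


# The exercise conditions: no modification, commercial reuse allowed.
print(cc_license(commercial_ok=True, derivatives_ok=False))
```

With the exercise conditions this yields CC BY-ND 4.0, matching the answer above.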

Discussion

Scenario

You are collaborating on a study of quality of life in children and plan to collect potentially sensitive information about bullying, social media, and family structure.

What should be considered when drafting terms of use for this study? Should the statement be drafted only by the researchers, or should legal and governance support be involved?

Key Points
  • A data terms of use statement defines the legal and practical basis for reuse.
  • A license is the minimum requirement, but some projects need richer terms or a custom agreement.
  • Store terms of use in an accessible text format such as .md or .txt.
  • If a standard license does not fit the project, a tailored terms-of-use statement or usage agreement may be necessary.

Content from Speak the same language


Last updated on 2026-04-14

Estimated time: 25 minutes

Overview

Questions

  • What are data descriptions?
  • How can data descriptions be reused?
  • Are there standard ways to create them?
  • What is the relationship between data descriptions and linked data?

Objectives

  • Recognize that different communities use different names for data descriptions.
  • Learn how to build machine-friendly data description files.
  • Understand why reusing ontology terms improves interoperability.
Callout

FAIR principles used in data descriptions

Interoperable:

Reusable:

What are data descriptions?


Data descriptions document the meaning of each data attribute or variable in a dataset.

Depending on the discipline, similar documents may be called:

  • codebooks
  • data dictionaries
  • labels or data tags
  • data glossaries

Example:

Variable name | Description | Unit
Weight | Weight of a human | kilograms
Height | Height of a human | centimetres
Age | Age of a human | years
Blood glucose level | Blood glucose level of a human | mg/dL

Callout

Different fields use different names

No matter what the document is called, the goal is the same: describe what the dataset variables mean so others can interpret and reuse the data correctly.

How can data descriptions be reused?


Writing documentation takes time, so when possible you should reuse accepted community definitions instead of inventing a new description every time.

BioPortal is a good example of a registry where community-maintained ontology terms can be searched and reused:

Figure: BioPortal search results interface
Figure: Different ontology matches for the same concept
Figure: Ontology entry with persistent identifier

Callout

Reuse ontology terms when possible

Advantages:

  • you do not need to redefine common concepts repeatedly
  • you gain a persistent identifier for the concept
  • other systems can align variables across datasets more easily

Disadvantages:

  • sometimes no existing ontology fits the exact concept you need
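
The alignment advantage can be illustrated with a minimal Python sketch. The concept URIs under `https://example.org/` and the column names are placeholders invented for this example; real identifiers would come from a registry such as BioPortal.

```python
# Placeholder concept URIs; real ones would come from an ontology registry.
GLUCOSE = "https://example.org/concept/blood-glucose-level"
WEIGHT = "https://example.org/concept/body-weight"
HEIGHT = "https://example.org/concept/body-height"

# Each dataset maps its local column names to shared concept identifiers.
dataset_a = {"glucose_mgdl": GLUCOSE, "wt": WEIGHT}
dataset_b = {"BloodSugar": GLUCOSE, "Height": HEIGHT}


def shared_columns(a: dict, b: dict) -> dict:
    """Pair up columns from two datasets that point to the same concept."""
    by_concept = {uri: column for column, uri in b.items()}
    return {column: by_concept[uri] for column, uri in a.items() if uri in by_concept}


print(shared_columns(dataset_a, dataset_b))
```

Although the two datasets use different column names, the shared identifier shows that `glucose_mgdl` and `BloodSugar` describe the same concept.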

Are there standard ways to create data descriptions?


There is no single universal format, but the minimum useful description usually includes:

  • the variable name
  • a definition
  • ideally, a link to the reused ontology term or community definition
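
A minimal sketch of such a description file, written as CSV with Python's standard library, might look like this. The column names and the `https://example.org/` term URIs are assumptions made for illustration, and one entry is deliberately left without a term link to show a simple completeness check.

```python
import csv
import io

FIELDS = ["variable", "definition", "unit", "term_uri"]

entries = [
    {"variable": "Weight", "definition": "Weight of a human", "unit": "kilograms",
     "term_uri": "https://example.org/term/weight"},  # placeholder URI
    {"variable": "Height", "definition": "Height of a human", "unit": "centimetres",
     "term_uri": "https://example.org/term/height"},  # placeholder URI
    {"variable": "Age", "definition": "Age of a human", "unit": "years",
     "term_uri": ""},  # no ontology term chosen yet
]

# Write the data dictionary as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(entries)
data_dictionary_csv = buffer.getvalue()

# Flag variables that still lack a link to a community definition.
missing_terms = [entry["variable"] for entry in entries if not entry["term_uri"]]
print(data_dictionary_csv)
print("Variables without a term link:", missing_terms)
```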

Some general-purpose vocabularies and metadata schemas include:

Resource | Link | Use
Schema.org | https://schema.org/ | Generic web concepts
DBpedia | https://www.dbpedia.org/resources/lookup/ | Concepts derived from Wikipedia
DCAT | https://www.w3.org/TR/vocab-dcat-2/ | Data catalog concepts
Dublin Core | https://www.dublincore.org/specifications/dublin-core/dcmi-terms/ | General metadata terms

Public registries that help locate vocabularies include:

Challenge

Exercise

Visit BioPortal and search for a term describing blood glucose level.

What ontology term would you reuse, and what persistent identifier does it provide?

There is more than one possible match, but a valid answer is to identify a term such as the clinical measurement concept and record its ontology URI or other persistent identifier provided by the registry.

Data descriptions are often maintained as tables in .csv, .xlsx, or similar formats. For databases, it is also useful to provide both a machine-readable schema and a human-readable diagram.

Figure: Example of a data dictionary table (comparing data and data dictionary concepts)

Other human-friendly approaches include dataset nutrition labels and automated codebook tools, but the FAIR principles push further toward structured, machine-actionable formats.

What is the relationship between data descriptions and linked data?


When data descriptions reuse ontology terms and stable identifiers, datasets can be transformed into linked data formats such as RDF and become easier to combine with other resources.
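
To give a feel for what that transformation produces, the following Python sketch turns one table row into RDF triples in N-Triples syntax using plain string formatting. The subject and predicate URIs under `https://example.org/` are placeholders, and a dedicated tool or RDF library would normally handle escaping and datatypes more carefully than this.

```python
# Placeholder namespaces; a real mapping would reuse published ontology URIs.
SUBJECT_BASE = "https://example.org/person/"
PREDICATES = {
    "Weight": "https://example.org/term/weight",
    "Height": "https://example.org/term/height",
}
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

rows = [{"id": "p1", "Weight": 72, "Height": 180}]


def to_ntriples(table: list[dict]) -> list[str]:
    """Turn table rows into N-Triples lines, one triple per mapped cell."""
    triples = []
    for row in table:
        subject = f"<{SUBJECT_BASE}{row['id']}>"
        for column, predicate in PREDICATES.items():
            literal = f'"{row[column]}"^^<{XSD_DECIMAL}>'
            triples.append(f"{subject} <{predicate}> {literal} .")
    return triples


for line in to_ntriples(rows):
    print(line)
```

Each output line is a subject, predicate, and object. Because the predicates are identifiers rather than local column names, triples from different datasets that use the same terms can be merged directly.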

Tools that can help with this include:

Tool | Source | GUI | Note
OpenRefine | https://openrefine.org/ | Yes | Flexible but can be heavy to install
RMLMapper | https://github.com/RMLio/rmlmapper-java/releases | No | Powerful, but technical
SDM-RDFizer | https://github.com/SDM-TIB/SDM-RDFizer | No | Requires programming familiarity
SPARQL-Generate | https://ci.mines-stetienne.fr/sparql-generate/ | Yes | Good if you want to learn SPARQL
Virtuoso Universal Server | https://virtuoso.openlinksw.com/ | Yes | Commercial licensing may apply
UM LDWizard | https://github.com/MaastrichtU-IDS/ldwizard-humanities | Yes | Quick route to publishable linked data

Challenge

Exercise

Transform a dataset from XLSX to RDF using UM LDWizard:

  1. Download the mock dataset: MOCK DATA
  2. Convert the dataset into RDF
  3. Record which ontology terms you reused for the variables

There is no single correct answer. The important part is to reuse existing ontology terms where possible and document the choices you made.

Discussion

Scenario

Your marine biology group has discovered new organisms, but you cannot find an existing ontology that fits the concepts you need.

What should you do? Should you adapt an existing vocabulary, create local terms, or work with the community toward a new ontology extension?

Key Points
  • Data descriptions may appear under names such as codebook or data dictionary.
  • Reusing ontology terms reduces ambiguity and improves interoperability.
  • A useful description links dataset variables to accepted community concepts.
  • Linked data becomes more feasible when descriptions are structured and identifier-based.

Content from Securely share


Last updated on 2026-04-14

Estimated time: 25 minutes

Overview

Questions

  • What are data access protocols?
  • Is open access a data access protocol?
  • Can data be exposed as a service through FAIR API protocols?

Objectives

  • Learn what data access protocols are.
  • Distinguish between human-facing and machine-facing access instructions.
  • Explore how FAIR API protocols can expose data as a service.
Callout

FAIR principles used in data access protocols

Accessible:

Interoperable:

What are data access protocols?


Data access protocols are the explicit rules and steps that tell humans or machines how to access a data source.

Figure: Electronic access control system with keypad

Just as you need the correct key, code, or procedure to enter a secure room, data access often depends on following the correct process.

Access protocol | Example | Note
Communication between machines | HTTP request or API call | A machine requests information using a standard network protocol
Communication between humans | Data request form or email workflow | A person follows explicit instructions to request access

Figure: Example of a human-oriented access workflow
Figure: Screenshot of a human-facing data request form

Callout

Protocols are rules communicated in a shared language

Humans may use a written access policy or form, while machines use network protocols such as HTTP and the interfaces built on top of them. Common examples include SOAP and REST APIs and SPARQL endpoints.
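
On the machine side, the "shared language" is visible in the request itself. The Python sketch below builds, but does not send, an HTTP request to a hypothetical REST endpoint (`https://example.org/api/datasets/42` is a placeholder, not a real service). The method, the URL, and the `Accept` header together state what is being asked for and in which format.

```python
from urllib.request import Request

# Hypothetical REST endpoint for one dataset record.
API_URL = "https://example.org/api/datasets/42"

# The request states the rules of the exchange: method, target, and wanted format.
request = Request(API_URL, headers={"Accept": "application/json"}, method="GET")

print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would send it over the network; that step is left out here because the endpoint is only illustrative.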

Is open access a data access protocol?


Strictly speaking, open access is a policy framework rather than a network protocol, but in practice it functions as an access model with explicit rules: the data can be retrieved openly without individual authorization.

Depositing data in a public repository often gives the resource an open access route automatically. Some open datasets also offer machine-readable access through APIs or SPARQL endpoints.

Example:

Figure: EU public data SPARQL example (screenshot of a SPARQL endpoint)

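
For readers who have not queried a SPARQL endpoint before, this short Python sketch shows how a query is typically sent over HTTP: the query text is URL-encoded into a `query` parameter of a GET request, as described in the SPARQL 1.1 Protocol. The endpoint URL is a placeholder, and the query simply asks for five arbitrary triples.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the GET URL for a SPARQL query (query text goes in ?query=)."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text unchanged.
decoded_query = parse_qs(urlsplit(url).query)["query"][0]
```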
Challenge

Exercise

Visit the Zenodo COVID-19 community:

https://zenodo.org/communities/covid-19/

What is the default data access protocol visible for the displayed records?

The records are presented as open access resources, which is visible through the repository interface and download behavior.

Callout

Open access can be human-friendly and machine-friendly

Humans see direct download options. Machines see stable web requests and, in some systems, structured service endpoints.

Can data be exposed as a service through FAIR API protocols?


Yes, but it requires technical infrastructure and some familiarity with linked data or knowledge graph tooling.

Useful tools include:

Tool | Source | GUI | Note
TriplyDB | https://triplydb.com/ | Yes | Quick hosted option for exposing data
GraphDB | https://www.ontotext.com/products/graphdb/ | Yes | Requires a server or managed deployment
FAIR Data Point | https://github.com/fair-data/fairdatapoint | No | Powerful but technical
rdflib-endpoint | https://pypi.org/project/rdflib-endpoint/ | No | Fast local route for experimentation

Callout

Register FAIR APIs where others can find them

If you expose a FAIR API or similar service endpoint, register it in an appropriate service registry such as SMART API:

https://www.smart-api.info/

Challenge

Exercise

Expose RDF data through a service endpoint:

  1. Reuse the RDF data generated from the data descriptions episode, or bring another RDF file.
  2. Install rdflib-endpoint.
  3. Run a local service and confirm that the data can be queried.

This is intentionally optional. The goal is to understand what it takes to move from static downloadable data to a service-oriented access model.

Discussion

Scenario

You have sensitive financial information that cannot be openly disclosed, but you still want to enable legitimate research access.

What kind of human and machine access protocols would you design? What should be openly visible, and what should only be available behind review or controlled authorization?

Key Points
  • Data access protocols describe how humans and machines gain access to data.
  • Open access is an access model that can still be expressed through explicit rules and interfaces.
  • Human-facing request forms and machine-facing APIs are both important access patterns in FAIR data practice.
  • FAIR service endpoints are more useful when they are registered and discoverable.

Content from Publish and preserve


Last updated on 2026-04-14

Estimated time: 25 minutes

Overview

Questions

  • What is data archiving?
  • What are data repositories?
  • What is a DOI, and why is it important?

Objectives

  • Understand the role of trusted repositories in long-term preservation.
  • Learn why repositories improve discovery and citation.
  • Understand how DOI supports persistence and citation.
Callout

FAIR principles used in data archiving

Findable:

Accessible:

What is data archiving?


Data archiving is the long-term preservation of research data and related digital objects.

Funders and journals increasingly expect enough of the data and methods to be preserved so that results can be inspected, understood, and, where possible, replicated.

What are data repositories?


Data repositories are storage locations for digital objects such as datasets, code, supplementary files, and metadata.

Repositories make research outputs easier to find and can also provide:

  • preservation
  • backup
  • citation infrastructure
  • versioning
  • controlled access options

Examples:

Repository | About
DataverseNL | Community repository supporting Dutch universities and research centers
4TU | Data repository originally developed by Dutch technical universities
PANGAEA | Domain repository for Earth and environmental sciences
Figshare | General-purpose repository for many digital object types

Callout

General repository recommendations

Aim for a community repository when it exists. If not, a trusted general repository is usually better than leaving data only on a local drive or project website.

Typical strengths of general-purpose repositories include:

  • persistent identifiers
  • rapid publication
  • versioning
  • backup and preservation
  • usage statistics
  • open or restricted access modes

Figure: Repository comparison matrix

Additional registries that help identify suitable repositories:

For context on the original Dataverse model, see Harvard Dataverse:

Challenge

Exercise

Visit https://dataverse.org/ and inspect the current installation map and the DataverseNL portal.

Answer:

  • How many Dataverse installations are listed now?
  • How many are in the Netherlands?
  • Roughly how many datasets are visible in DataverseNL at the time of your visit?

These counts change over time, so record the values you observe at the moment you complete the exercise rather than relying on historical numbers from older lesson versions.

What is a DOI, and why is it important?


DOI is a persistent identifier commonly used for articles, datasets, reports, and other scholarly outputs.

Two major DOI registration services in research are Crossref and DataCite.

A DOI has three conceptual parts:

  • the resolver service
  • the registrant prefix
  • the locally assigned suffix

Figure: Diagram explaining DOI structure

DOIs make resources easier to cite and track, and they continue resolving even when storage details change.
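
The three conceptual parts can be seen by splitting a DOI URL in Python. The function name `split_doi` is hypothetical, and the DOI used is the one from the dataset citation example in this lesson. The sketch assumes the common `https://doi.org/` resolver form and does not attempt full DOI validation.

```python
def split_doi(doi_url: str) -> dict:
    """Split a DOI URL into resolver, registrant prefix, and suffix."""
    resolver = "https://doi.org/"
    if not doi_url.startswith(resolver):
        raise ValueError("expected a URL starting with https://doi.org/")
    # The first slash after the resolver separates prefix from suffix.
    prefix, _, suffix = doi_url[len(resolver):].partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError("not a well-formed DOI")
    return {"resolver": resolver, "prefix": prefix, "suffix": suffix}


parts = split_doi("https://doi.org/10.1371/journal.pone.0127881")
print(parts)
```

Here the prefix `10.1371` identifies the registrant, while the suffix `journal.pone.0127881` is the name the registrant assigned locally.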

Challenge

Exercise

Upload a test dataset to the DataverseNL demo or another training repository:

  1. Download the mock dataset: MOCK DATA
  2. Deposit it in a suitable demo or sandbox repository
  3. Record the DOI or persistent identifier you receive

There is no single correct answer. The point is to experience how a repository assigns persistent identifiers and what metadata are required before publication.

Discussion

Scenario

You conducted personal interviews for an ethnographic migration study. The raw transcripts cannot be openly shared, even under restricted access.

How should archiving be handled in this case? What metadata, derived outputs, or access statements should still be preserved and published?

Key Points
  • Repositories improve preservation, discovery, and citation.
  • Prefer a trusted community repository when one fits your discipline.
  • General repositories such as Zenodo still provide a strong baseline for FAIR publication.
  • DOI is a key persistent identifier for datasets and publications.

Content from Make machines work for you


Last updated on 2026-04-14

Estimated time: 25 minutes

Overview

Questions

  • What is the difference between metadata and rich metadata?
  • How can a rich metadata file be created?
  • Where should rich metadata be stored?

Objectives

  • Understand the difference between plain metadata and rich metadata.
  • Learn where machine-readable metadata comes from and how to generate it.
  • Identify good places to publish rich metadata files.
Callout

FAIR principles used for rich metadata

Findable:

Interoperable:

What is the difference between metadata and rich metadata?


Metadata is data about the data. It describes properties of a digital object, such as title, creator, publisher, size, or identifier.

Figure: Repository metadata example (metadata fields shown in a repository)

Metadata attribute | Example
Descriptive metadata | DOI
Structural metadata | Data size
Administrative metadata | Publisher
Statistical metadata | Number of files

Rich metadata goes further. It is:

  • standardized
  • structured
  • machine-readable
  • based on shared vocabularies
  • suitable for search engines and automated reuse
Callout

Rich metadata is more than plain text

Metadata alone can be descriptive but still hard for machines to interpret. Rich metadata uses a structured format such as JSON-LD and shared schemas such as Schema.org, DataCite, or Dublin Core.
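
The step from plain to rich metadata can be sketched in Python as a mapping from free-form fields onto Schema.org terms, serialized as JSON-LD. The input values and the function name `to_jsonld` are invented for illustration; the property names `name`, `creator`, and `publisher` are Schema.org properties of the `Dataset` type.

```python
import json

# Plain metadata: understandable to a person, but the keys follow no shared schema.
plain = {"title": "Dummy Data", "creator": "A. Researcher", "publisher": "Example Repository"}


def to_jsonld(meta: dict) -> dict:
    """Map plain metadata fields onto Schema.org terms in a JSON-LD record."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": meta["title"],
        "creator": {"@type": "Person", "name": meta["creator"]},
        "publisher": {"@type": "Organization", "name": meta["publisher"]},
    }


rich = to_jsonld(plain)
print(json.dumps(rich, indent=2))
```

The information is the same in both versions. What changes is that every field now refers to a publicly defined term, so software that knows Schema.org can interpret it.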

Further reading:

Additional walkthrough:

How can a rich metadata file be created?


Researchers usually do not need to write rich metadata by hand. In many cases, it can be exported from a repository or generated through a form-based tool.

Platform | Source | Online | Note
Dataverse export button | https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/Q80QUE | Yes | Fastest path for datasets
FAIR Metadata Wizard | https://maastrichtu-ids.github.io/fair-metadata-wizard/ | Yes | Tailored to scientific projects
NSDRA JSON-LD generator | https://nsdra.github.io/nsdra-jsonld-metadata-generator-webapp/# | Yes | Community-specific but adaptable
Steal Our JSON-LD | https://jsonld.com/json-ld-generator/ | Yes | General-purpose
JSON-LD Schema Generator for SEO | https://hallanalysis.com/json-ld-generator/ | Yes | Broad but SEO-oriented

Figure: Exporting JSON-LD metadata from a repository

Where should a rich metadata file be stored?


A simple rule is:

Callout

Publish rich metadata everywhere the data lives

  • in the project root folder
  • in the data repository
  • in the GitHub repository
  • on the project website

The exact schema may differ by context, but common choices include:

Rich metadata also helps connect publications, datasets, and project websites. Without structured metadata, search engines and aggregators may not recognize the resource as a dataset at all.
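
On a project website, JSON-LD is commonly embedded in the page inside a `script` element of type `application/ld+json`. The Python sketch below generates such an element from a record. The helper name `jsonld_script_tag` and the record contents are assumptions for illustration.

```python
import json


def jsonld_script_tag(record: dict) -> str:
    """Wrap a JSON-LD record in a script element for embedding in HTML."""
    # Escape "</" so the JSON text cannot close the script element early.
    payload = json.dumps(record, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Dummy Data",
    "url": "https://example.com",
}
tag = jsonld_script_tag(record)
print(tag)
```

The same record could also be saved as a standalone `.jsonld` or `.json` file in the project root folder, so the copies stay consistent.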

Discussion

Scenario

You are creating a project website for a research consortium.

How should rich metadata relate to that website? Which information belongs in the HTML, which belongs in repository records, and how do those pieces work together to improve discovery?

Key Points
  • Rich metadata combines descriptive metadata with shared vocabularies and a structured machine-readable format.
  • JSON-LD is a common way to publish rich metadata.
  • Repositories can often generate rich metadata automatically.
  • Rich metadata should be published anywhere the digital object is stored or represented.

Content from Responsibly reuse


Last updated on 2026-04-14

Estimated time: 25 minutes

Overview

Questions

  • How should data be cited when reusing a data source?
  • How can you check whether data is likely to be reused and discovered?

Objectives

  • Understand the importance of citing datasets.
  • Learn how discoverability testing supports future data reuse.
  • Connect reuse to rich metadata and structured web data.
Callout

FAIR principles used in data reuse

Findable:

Reusable:

How should data be cited when reusing a data source?


Data reuse means working with an existing research data source rather than generating all material from scratch.

At minimum, a dataset citation should include:

  • creator
  • publication year
  • title
  • resource type or version
  • publisher
  • persistent identifier, usually DOI

Example citation:

Neff, Roni A.; Spiker, Marie L.; Truant, Patricia L. (2016). Wasted Food: U.S. Consumers’ Reported Awareness, Attitudes, and Behaviors. PLOS ONE. Dataset. https://doi.org/10.1371/journal.pone.0127881
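
The citation pattern used above (creators, year, title, publisher, resource type, identifier) can be expressed as a small Python helper. The function name `format_citation` and the example values, including the `XXXXXX` DOI placeholder, are hypothetical. Repositories and citation styles differ in detail, so treat this as one possible layout.

```python
def format_citation(creators, year, title, publisher, resource_type, identifier):
    """Format a dataset citation from its minimum components."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. {resource_type}. {identifier}"


citation = format_citation(
    creators=["Doe, Jane", "Roe, Richard"],  # hypothetical authors
    year=2024,
    title="Happy Dataset",
    publisher="Example Repository",
    resource_type="Dataset",
    identifier="https://doi.org/XXXXXX",  # placeholder, not a real DOI
)
print(citation)
```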

Original dataset example:

Figure: Repository interface with citation export options

Callout

Standard citations improve discoverability

When datasets are cited in recognizable formats, citation indexes and data aggregators can track those references much more effectively.

How can you check whether data is likely to be reused and discovered?


Datasets are digital objects on the web. To be reused, they must first be easy to find. That depends heavily on rich metadata and structured web markup.

Google’s Rich Results Test is one way to inspect whether a dataset page exposes structured information machines can recognize:

Example test:

Figure: Entering a dataset URL in Google Rich Results Test
Figure: Test results after checking a dataset URL

Callout

Without rich metadata, data can be effectively invisible

A dataset may be accessible to humans through a webpage and still remain hard for search engines or data aggregators to interpret.

Challenge

Exercise

Visit the Culture Crime News database:

https://news.culturecrime.org/all

Run Google’s Rich Results Test on that resource.

Did the test detect a structured dataset?

No. This is a useful reminder that a human-friendly page is not automatically a machine-friendly dataset description.
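
A rough local approximation of this kind of check can be written with Python's standard library. The sketch below looks for JSON-LD blocks in an HTML page and reports whether any of them declares the type `Dataset`. It is not how Google's tool works internally, it ignores other markup formats such as Microdata and RDFa, and the two HTML snippets are invented test pages.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def has_dataset_markup(html: str) -> bool:
    """Return True if any JSON-LD block in the page has @type Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            record = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        # Simplification: only a single top-level object with a string @type is checked.
        if isinstance(record, dict) and record.get("@type") == "Dataset":
            return True
    return False


page_with = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org/", "@type": "Dataset", "name": "Dummy Data"}'
    "</script></head><body><h1>Dummy Data</h1></body></html>"
)
page_without = "<html><body><h1>Dummy Data</h1></body></html>"

print(has_dataset_markup(page_with), has_dataset_markup(page_without))
```

Both pages look the same to a human visitor, but only the first one tells a machine that it describes a dataset.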

If you want to go further, review Google’s explanation of the Rich Results Test:

Search engines and data aggregators rely on shared standards such as:

An example of dataset-oriented JSON-LD:

JSON

{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Dummy Data",
  "description": "This is an example for the coursebook",
  "url": "https://example.com",
  "identifier": ["https://doi.org/XXXXXX"],
  "license": "https://creativecommons.org/publicdomain/zero/1.0/",
  "isAccessibleForFree": true
}
Challenge

Exercise

Go back to the dataset you uploaded in the data archiving episode and export its rich metadata as JSON-LD.

What similarities do you notice between that export and the example above?

You should usually see:

  • JSON-LD as the format
  • Schema.org or another shared schema as the vocabulary context
  • Dataset or an equivalent type
  • explicit fields for identifier, title, description, and license
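
The comparison can also be done programmatically. This minimal Python sketch checks a JSON-LD record for the fields listed above, using the example record from this episode as input. The list of required keys is an assumption for the exercise, not an official requirement of any schema; note that Schema.org calls the title field `name`.

```python
import json

# Keys expected in the exercise; an assumption, not an official minimum.
EXPECTED_KEYS = ["@context", "@type", "identifier", "name", "description", "license"]

example = json.loads("""
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Dummy Data",
  "description": "This is an example for the coursebook",
  "url": "https://example.com",
  "identifier": ["https://doi.org/XXXXXX"],
  "license": "https://creativecommons.org/publicdomain/zero/1.0/",
  "isAccessibleForFree": true
}
""")


def missing_keys(record: dict) -> list:
    """List the expected keys that a JSON-LD record does not provide."""
    return [key for key in EXPECTED_KEYS if key not in record]


print(missing_keys(example))
print(missing_keys({"@type": "Dataset", "name": "Untitled"}))
```

Running the same check on your own repository export shows quickly which of these fields it provides.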

Discussion

Pick one or more datasets used in this lesson and test them with a FAIR assessment tool such as FAIR enough:

https://fair-enough.semanticscience.org/

What do the scores suggest about reuse readiness, and which missing metadata or links would most improve them?

Key Points
  • Reuse starts with clear citation and persistent identifiers.
  • Rich metadata is essential for search engines and aggregators to discover datasets.
  • Human-friendly web pages are not enough; machine-readable structure matters.
  • Testing discoverability is a practical part of making data reusable.