
Guidelines for population description

In this guide, you can read more about how to describe a population and justify the need to form a population yourself. You can also learn about the requirements for different types of populations, what to consider in the description, and how we at Research Service evaluate your population description based on the Principle of Data Minimization (according to the Danish Data Protection Law).


Requirements for the population description

Regardless of its size, the population must be described in a way that clearly indicates what it consists of (individuals, companies, addresses, a unique combination of several factors, or other) and which year or period it covers. Additionally, the population must be defined, justified, and delimited according to the purpose. If a full population is desired, the population description must include arguments for this.

A good population description should include one or more of the following elements:

  • Which registers or external/other data will be used in forming the population.
  • Which period/years/quarters/months will be used.
  • Which conditions need to be met, preferably by stating the actual conditions with variable names and the specific values used for delimitation. For example, that the population is delimited to ages 15-76.
  • Who will form the population(s).
  • How registers will be linked if multiple registers are to be used: which variables the linkage is based on and, where relevant, which key register it goes through. (This can instead be specified in an extraction description attached in Denmark's Data Portal.)

An extraction description should be attached to the population description if the population involves particularly complex conditions, linkages, or similar. This applies even if the population is formed externally.
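As a minimal sketch of how such a description translates into an extraction, the following Python snippet delimits a person register by age (15-76) for one reference year and links it to a second register on a shared key. The record layout, the field names (`pnr`, `birth_year`, `municipality`), the reference year, and the sample rows are all assumptions made for illustration; they are not actual register variables or data.

```python
# Hypothetical sketch: delimit a population by age and link two registers.
# All field names and rows are invented for illustration.

REFERENCE_YEAR = 2014
AGE_MIN, AGE_MAX = 15, 76  # inclusive bounds, as in the age example above

# Register A: one row per person (assumed layout).
persons = [
    {"pnr": "A1", "birth_year": 1990},
    {"pnr": "A2", "birth_year": 2005},
    {"pnr": "A3", "birth_year": 1938},
    {"pnr": "A4", "birth_year": 1930},
]

# Register B: address information keyed on the same identifier (assumed layout).
addresses = {
    "A1": {"municipality": 607},
    "A3": {"municipality": 101},
}


def form_population(persons, addresses, year):
    """Return persons aged AGE_MIN..AGE_MAX in `year`, linked to their address row."""
    population = []
    for person in persons:
        # Simplification: age is approximated as reference year minus birth year.
        age = year - person["birth_year"]
        if not AGE_MIN <= age <= AGE_MAX:
            continue  # outside the stated delimitation
        address = addresses.get(person["pnr"])
        if address is None:
            continue  # keep only persons found in both registers
        population.append({**person, "age": age, **address})
    return population


population = form_population(persons, addresses, REFERENCE_YEAR)
```

Stating the conditions this explicitly (register, key variable, bounds, reference year) is what makes the delimitation possible to assess and reproduce.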

Requirements for case-control population description

In addition to the requirements above, which also apply to case-control populations, the following should be specified in an extraction description attached to the population in the project proposal:

Possible controls (gross control):

  • Which pool should controls be drawn from? (e.g., individuals residing in Region Zealand)
  • Which registers should be used to form the pool of possible controls? (e.g., BEFBOP, BEFADR, VNDS, DOD)
  • What inclusion and exclusion criteria (and based on which variables and which time period) need to be met? (e.g., gender = 2 (women), municipality = 607 (Fredericia), residence in DK from 01-01-2011 to 31-12-2014, not deceased, not emigrated)
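The inclusion and exclusion criteria in the example above could be written out roughly as follows. This is only a sketch: the field names, the sample rows, and the reading of "not deceased" and "not emigrated" as "no such event before the end of the period" are assumptions, not definitions taken from the registers mentioned.

```python
from datetime import date

# Period taken from the example criteria above.
PERIOD_START = date(2011, 1, 1)
PERIOD_END = date(2014, 12, 31)

# Hypothetical candidate rows; None means "no end date / no such event".
candidates = [
    {"id": "P1", "gender": 2, "municipality": 607,
     "resident_from": date(2005, 3, 1), "resident_to": None,
     "death_date": None, "emigration_date": None},
    {"id": "P2", "gender": 1, "municipality": 607,
     "resident_from": date(2000, 1, 1), "resident_to": None,
     "death_date": None, "emigration_date": None},
    {"id": "P3", "gender": 2, "municipality": 607,
     "resident_from": date(2000, 1, 1), "resident_to": None,
     "death_date": date(2013, 6, 1), "emigration_date": None},
    {"id": "P4", "gender": 2, "municipality": 607,
     "resident_from": date(2012, 5, 1), "resident_to": None,
     "death_date": None, "emigration_date": None},
]


def is_possible_control(row):
    """Apply the stated inclusion and exclusion criteria to one candidate row."""
    resident_whole_period = (
        row["resident_from"] <= PERIOD_START
        and (row["resident_to"] is None or row["resident_to"] >= PERIOD_END)
    )
    # Assumption: an event after the period does not exclude the person.
    alive = row["death_date"] is None or row["death_date"] > PERIOD_END
    not_emigrated = (
        row["emigration_date"] is None or row["emigration_date"] > PERIOD_END
    )
    return (
        row["gender"] == 2            # women
        and row["municipality"] == 607  # Fredericia
        and resident_whole_period
        and alive
        and not_emigrated
    )


gross_controls = [row["id"] for row in candidates if is_possible_control(row)]
```

In this invented data only P1 meets every criterion: P2 fails on gender, P3 died within the period, and P4 was not resident for the whole period.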

Number of controls and replacement:

  • How many controls should be drawn per case?
  • Can cases be controls for other cases?
  • Can controls change status during the inclusion period?
  • Should controls be drawn with or without replacement between cases, i.e., can a control appear as a control for more than one case?
  • Should controls be drawn with or without replacement within cases, i.e., can a control appear more than once for a given case?

Here, a case-control population means exposed entities versus non-exposed entities.
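To illustrate why the questions about reusing controls matter, the sketch below draws a fixed number of controls per case from a pool, with a switch for whether a control may serve more than one case. The function `draw_controls`, its parameters, and the identifiers are hypothetical and chosen for illustration; the sketch also assumes that a control is never drawn twice for the same case and that a case is never its own control.

```python
import random


def draw_controls(cases, pool, per_case, reuse_between_cases, seed=0):
    """Assign `per_case` controls to each case from `pool`.

    Hypothetical helper for illustration only. Within a single case a
    control is never drawn twice; `reuse_between_cases` decides whether
    a control already assigned to one case stays available for the next.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    available = list(pool)
    assignment = {}
    for case in cases:
        # Assumption: a case is never used as its own control.
        eligible = [c for c in available if c != case]
        if len(eligible) < per_case:
            raise ValueError(f"not enough possible controls left for case {case}")
        chosen = rng.sample(eligible, per_case)  # no repeats within a case
        assignment[case] = chosen
        if not reuse_between_cases:
            # Remove the drawn controls so later cases cannot receive them.
            available = [c for c in available if c not in chosen]
    return assignment


cases = ["C1", "C2"]
pool = ["K1", "K2", "K3", "K4", "K5", "K6"]

without_reuse = draw_controls(cases, pool, per_case=2, reuse_between_cases=False)
with_reuse = draw_controls(cases, pool, per_case=2, reuse_between_cases=True)
```

When controls cannot be reused, the pool must hold at least the number of cases times the number of controls per case, which is one reason the size and definition of the pool should be stated up front.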

If users are to form the population themselves

For projects where users are allowed to form the "final" populations themselves (the populations used for data analysis and for generating the project's results), the project proposal (and the attachment to the population description) must clearly describe:

  • Why it is necessary for the user to form the final population(s) themselves (i.e., why Research Service cannot do this).
  • How the "final" population(s) should be defined and delimited (i.e., specific registers, variables, and possible conditions on variable values - depending on what makes sense).
  • How "gross reference population(s)" should be defined and delimited (i.e., specific registers, variables, and possible conditions on variable values, depending on what makes sense). The point is that data minimization must still be observed. Advanced statistical matching (e.g., "High dimensional propensity score matching") is allowed for forming the final reference population. For this type of matching, full access to available register information is provided, corresponding to the Basic Data registers requested for data analysis of the final populations. Only data regarding the final populations are used and stored (see below).
  • Which data (registers and variables) should only be used for data analysis to generate results for the project's purpose.
  • That data analyses are performed, and analysis results generated and sent, only where they are directly relevant to the project's purpose and based on the final delimited population(s).

Therefore, enter the following text at the bottom of the population section in the project proposal:

"Only data analyses, generated analysis results, and sent analysis results that are directly relevant to the project's purpose and are based on the final delimited population(s) will be performed, generated, and sent. Data that are not part of / based on the final delimited population(s) are solely used to form the "final" population(s) and must not be sent."

Assessment of population description

Research Service (FSE) evaluates the population description against the principle of data minimization, including:

  • Whether the desired population is clearly related to a well-defined purpose.
  • Whether the desired population is delimited according to the purpose (e.g., time period, birth year, education level, gender).
  • Whether the background for delimitation or lack thereof makes sense in relation to the purpose.
  • Whether any justification given for a full population actually demonstrates the need for it. As a starting point, access to a "full" population (i.e., all information in one or more registers) is not granted, but it is possible if the research question requires it and the necessity is justified.

Furthermore, FSE ensures that it is clear:

  • Who will form the population.
  • Which registers, variables, and possible values the population is based on.
  • Which registers, variables, and possible values any delimitations are based on.
  • Which time period the population is to be formed for.