Statistical processing

Contact info

Science, Technology and Culture, Business Statistics.
Christian Törnfelt

cht@dst.dk

Book sales

Data for these statistics are collected through monthly transfers of transaction data on unit sales of books from SAXO.com A/S, Indeks Retail (Bog & Idé), Gucca and the major supermarket chains (COOP, Salling and Dagrofa).

The transaction data is enriched with information from DBK, which manages Bogportalen, and from Publizon, allowing the books to be classified by genre, format/media, publication language, original language and binding type. If this information is missing for certain editions, it is completed manually on the basis of the assumptions described under Data compilation.

Source data

The statistics are based on transaction data provided by SAXO.com A/S, Indeks Retail (Bog & Idé) and Gucca. Additionally, transaction data (barcode data) is supplied by the Prices and Consumption office in the Economic Statistics department at Statistics Denmark. The barcode data covers all unit sales of books in stores under COOP, Salling and Dagrofa.

The transaction data includes, among other things, the ISBN of the book sold, the time of sale and the number of copies sold in the transaction. Depending on the data provider, it also includes a few supplementary details about the product, such as whether it is a physical book, an e-book or an audiobook, or other product information.

The ISBN number is used as a key to enrich the transaction data with metadata from DBK (Bogportalen) and Publizon. The information in the metadata is used to classify the books by genre, format/media, publication language, original language and binding type.
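The ISBN-keyed enrichment described above can be sketched as a simple lookup. This is a minimal illustration only: the function name `enrich`, the field names and the record layout are assumptions made for the example, not the production system's schema.

```python
# Hypothetical field names; the real metadata layout is not documented here.
METADATA_FIELDS = (
    "genre",
    "format",
    "publication_language",
    "original_language",
    "binding",
)


def enrich(transactions, metadata_by_isbn):
    """Attach metadata to each transaction, using the ISBN as the key.

    Transactions without a metadata match keep all metadata fields as None,
    so they can be picked up later for manual completion.
    """
    enriched = []
    for row in transactions:
        meta = metadata_by_isbn.get(row["isbn"], {})
        extra = {field: meta.get(field) for field in METADATA_FIELDS}
        enriched.append({**row, **extra})
    return enriched


transactions = [
    {"isbn": "9788700000001", "units": 2},
    {"isbn": "9788700000002", "units": 1},
]
metadata_by_isbn = {"9788700000001": {"genre": "Crime", "format": "physical"}}
result = enrich(transactions, metadata_by_isbn)
```

Keeping unmatched rows, rather than dropping them, mirrors the text: editions without metadata are retained and completed by other means.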

Frequency of data collection

Monthly data transfer from data providers.

Data collection

System-to-system solution.

Data validation

The establishment of the groupings and classifications based on metadata has been validated by key stakeholders in the book industry and subject matter experts. Additionally, data is compared from quarter to quarter. In the case of significant fluctuations, the data provider or metadata provider is contacted.
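A quarter-to-quarter comparison of this kind could look roughly like the sketch below. The 25 percent threshold, the function name `flag_fluctuations` and the per-genre unit totals are assumptions chosen for illustration; the source does not state what counts as a significant fluctuation.

```python
def flag_fluctuations(previous, current, threshold=0.25):
    """Return the groups whose unit sales changed by more than `threshold`.

    `previous` and `current` map a group name (e.g. a genre) to units sold.
    Groups that appear or disappear entirely are also flagged.
    The threshold is an assumed value, not one taken from the source.
    """
    flagged = []
    for key in sorted(set(previous) | set(current)):
        before = previous.get(key, 0)
        after = current.get(key, 0)
        if before == 0:
            # No baseline: flag only if sales appeared from nothing.
            if after > 0:
                flagged.append(key)
            continue
        if abs(after - before) / before > threshold:
            flagged.append(key)
    return flagged


previous_quarter = {"Crime": 100, "Fantasy": 50}
current_quarter = {"Crime": 110, "Fantasy": 80, "Cookbooks": 5}
to_review = flag_fluctuations(previous_quarter, current_quarter)
```

Flagged groups would then be followed up with the data provider or metadata provider, as described above.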

Data compilation

Transaction data is cleaned by removing products that are not books, such as print paper. Additionally, ISBN numbers that deviate from the 13-digit structure corresponding to books are removed. The transaction data is enriched with metadata from DBK and Publizon. The metadata is then used to categorize the data according to the classifications that form the basis for the tables in StatBank Denmark. After the data is enriched and processed, it is aggregated for the creation of tables in the StatBank.
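As an illustration of the ISBN filter, the sketch below accepts only 13-digit codes with the 978 or 979 prefix and a valid check digit. Verifying the check digit is an assumption added for the example; the source only states that codes deviating from the 13-digit book structure are removed. The function name `is_book_isbn13` is hypothetical.

```python
def is_book_isbn13(code: str) -> bool:
    """Return True if `code` has the structure of an ISBN-13.

    Checks: 13 ASCII digits (hyphens and spaces ignored), prefix 978 or 979,
    and a valid check digit (weights alternate 1 and 3 over the first 12 digits).
    """
    digits = code.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    if not digits.startswith(("978", "979")):
        return False
    total = sum(
        int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12])
    )
    return (10 - total % 10) % 10 == int(digits[12])


codes = ["9780306406157", "5701234567890", "0306406152"]
book_codes = [c for c in codes if is_book_isbn13(c)]
```

The prefix test is what separates books from other 13-digit barcodes, such as those on stationery sold in the same stores.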

The metadata includes information on Thema codes, language of publication, original language, format/media and binding type (digital format for e-books and audiobooks). Thema codes are part of an international classification system in which books are categorized by subject. Since September 2019, classifying books using Thema codes in Bogportalen and Publizon has been mandatory. Not all publications have been assigned Thema codes in Bogportalen, but coverage is increasing as the information on the portal is updated.

Manual adjustments are made as follows:

Genre

If Thema codes are missing in the metadata, store category information is used to manually assign a Thema code. This is done based on the distribution of Thema codes for titles in the same category, if this is relevant (e.g., Crime and Thrillers, Fantasy, Fiction, Economics, Cookbooks, Textbooks for children, etc.). This supplementation of Thema codes based on store category information accounts for approximately 1 pct. of total sales.
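One way to read "based on the distribution of Thema codes for titles in the same category" is to assign the code that occurs most often among titles in that store category. The sketch below implements that reading only; the actual procedure may differ (for example, it may allocate sales proportionally), and the record layout and function names are assumptions.

```python
from collections import Counter, defaultdict


def most_common_thema_by_category(titles):
    """Map each store category to its most frequent known Thema code."""
    counts = defaultdict(Counter)
    for title in titles:
        if title.get("thema") and title.get("category"):
            counts[title["category"]][title["thema"]] += 1
    return {cat: counter.most_common(1)[0][0] for cat, counter in counts.items()}


def fill_missing_thema(titles):
    """Fill missing Thema codes from the category lookup and mark imputed rows."""
    lookup = most_common_thema_by_category(titles)
    filled = []
    for title in titles:
        if title.get("thema"):
            filled.append({**title, "thema_imputed": False})
        else:
            code = lookup.get(title.get("category"))
            filled.append(
                {**title, "thema": code, "thema_imputed": code is not None}
            )
    return filled


# Illustrative records; the codes and categories are example values only.
titles = [
    {"isbn": "a", "category": "Crime and Thrillers", "thema": "FF"},
    {"isbn": "b", "category": "Crime and Thrillers", "thema": "FF"},
    {"isbn": "c", "category": "Crime and Thrillers", "thema": "FH"},
    {"isbn": "d", "category": "Crime and Thrillers", "thema": None},
    {"isbn": "e", "category": "Uncategorised", "thema": None},
]
filled = fill_missing_thema(titles)
```

Marking imputed rows makes it possible to report how much of total sales depends on the supplementation.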

If metadata for a book sold in a supermarket is missing, additional information from supermarket transaction data is used. If their category information, for example, indicates “Crime/Thrillers,” this information is used to place the book in a genre. This genre placement accounts for about 3 pct. of total sales.

Read more about the methodology in the document Genreopdeling ved hjælp af Themakoder (in Danish only).

Published language

If language information is available in metadata from DBK, this is used. Both direct language information and store category information (category text) from DBK data, which may contain information about the book's language, are checked. If language information is not available in DBK or Publizon metadata, the country of publication is inferred from the ISBN number, and it is assumed that, for example, books published in England are in English, books published in France are in French, and so on.
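The fallback order described above can be sketched as follows. The registration-group table is a small illustrative subset of the groups under the 978 prefix, and the category keywords are hypothetical; neither is the mapping used in production.

```python
# Subset of ISBN registration groups under the 978 prefix (illustrative only).
GROUP_LANGUAGE = {
    "0": "English",
    "1": "English",
    "2": "French",
    "3": "German",
    "87": "Danish",
}

# Hypothetical keywords that might occur in a store category text.
CATEGORY_KEYWORDS = {"engelsk": "English", "english": "English"}


def infer_language(isbn, metadata_language=None, category_text=None):
    """Return a publication language, or None if nothing can be inferred.

    Order: direct metadata, then category text, then ISBN registration group.
    """
    if metadata_language:
        return metadata_language
    if category_text:
        lowered = category_text.lower()
        for keyword, language in CATEGORY_KEYWORDS.items():
            if keyword in lowered:
                return language
    digits = isbn.replace("-", "")
    if digits.startswith("978"):
        group_part = digits[3:]
        for prefix, language in GROUP_LANGUAGE.items():
            if group_part.startswith(prefix):
                return language
    return None


examples = {
    "9780306406157": infer_language("9780306406157"),
    "9788702000000": infer_language("9788702000000"),
}
```

The ISBN step is a last resort because a registration group identifies where a book was published, which only approximates its language.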

Format/Media

The data processing assumes that the books sold are primarily physical books. To ensure correct categorization of format, ISBN numbers are validated by comparing them with metadata from DBK and Publizon. If metadata indicates that the book is an e-book or audiobook, the format is corrected accordingly. Publizon data is used as a reliable source for this, as Publizon is Denmark's largest distributor of e-books and audiobooks. Additionally, information from SAXO and Indeks Retail is used for further verification. SAXO's transaction data indicates whether the product is a physical book, e-book, or audiobook using the "TYPE" field, while Indeks Retail divides their data into categories such as "Books" and "Audiobooks."
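The format logic can be sketched as a default with overrides. The precedence shown here (Publizon, then DBK, then the data provider's own field) is an assumption made for the example, as are the normalized labels; the source says only that Publizon is treated as a reliable source and that provider information is used for further verification.

```python
DIGITAL_FORMATS = {"e-book", "audiobook"}


def resolve_format(publizon_format=None, dbk_format=None, provider_format=None):
    """Return the format of a sold title, defaulting to a physical book.

    Each argument is a normalized label ("e-book", "audiobook", "physical")
    or None when that source has no information. Sources are checked in an
    assumed order of reliability; the first digital indication wins.
    """
    for candidate in (publizon_format, dbk_format, provider_format):
        if candidate in DIGITAL_FORMATS:
            return candidate
    return "physical"


default_case = resolve_format()
corrected_case = resolve_format(publizon_format="audiobook", provider_format="physical")
```

In practice the provider values (for example SAXO's "TYPE" field or Indeks Retail's categories) would first need to be mapped to these normalized labels.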

Binding Type

In DBK metadata, direct information about binding type is incomplete, so store category information in the data (category text) is also checked, as it may contain information about the book's binding.

Adjustment

No corrections are made beyond what has already been described in sections 3.4 Data validation and 3.5 Data compilation.