Chapter 2 Data sources
We collected our data from Statista, a German company specializing in market and consumer data. The specific dataset comes from the Digital Advertising section of its Digital Market Outlook.
The link is provided here: link
We selected the United States, the United Kingdom, and China as our observation units and downloaded a separate dataset for each.
The first rows of the datasets are shown below:
## # A tibble: 6 × 16
## Region Market Chart Name Unit Source `2017` `2018` `2019` `2020` `2021`
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 China Banner Adv… Ad S… Total mill… Stati… 15853. 2.19e4 2.92e4 3.47e4 4.07e4
## 2 China Banner Adv… Ad S… Bann… mill… Stati… 6060. 7.10e3 8.07e3 8.67e3 9.24e3
## 3 China Banner Adv… Ad S… Bann… mill… Stati… 9793. 1.48e4 2.11e4 2.61e4 3.15e4
## 4 China Banner Adv… Ad S… Total perc… Stati… NA 3.79e1 3.34e1 1.91e1 1.72e1
## 5 China Banner Adv… Ad S… Bann… perc… Stati… NA 1.72e1 1.36e1 7.43e0 6.58e0
## 6 China Banner Adv… Ad S… Bann… perc… Stati… NA 5.07e1 4.29e1 2.35e1 2.08e1
## # … with 5 more variables: `2022` <dbl>, `2023` <dbl>, `2024` <dbl>,
## # `2025` <dbl>, `2026` <dbl>
## # A tibble: 6 × 16
## Region Market Chart Name Unit Source `2017` `2018` `2019` `2020` `2021`
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United Sta… Banne… Ad S… Total mill… Stati… 36380. 4.45e4 5.28e4 6.25e4 6.88e4
## 2 United Sta… Banne… Ad S… Bann… mill… Stati… 16213. 1.76e4 1.83e4 1.93e4 2.00e4
## 3 United Sta… Banne… Ad S… Bann… mill… Stati… 20167. 2.70e4 3.45e4 4.31e4 4.88e4
## 4 United Sta… Banne… Ad S… Total perc… Stati… NA 2.24e1 1.87e1 1.82e1 1.01e1
## 5 United Sta… Banne… Ad S… Bann… perc… Stati… NA 8.37e0 4.27e0 5.52e0 3.50e0
## 6 United Sta… Banne… Ad S… Bann… perc… Stati… NA 3.37e1 2.81e1 2.49e1 1.31e1
## # … with 5 more variables: `2022` <dbl>, `2023` <dbl>, `2024` <dbl>,
## # `2025` <dbl>, `2026` <dbl>
Each dataset has 16 columns: Region, Market, Chart, Name, Unit, Source, and ten year columns (2017-2026). The first six columns hold character data, and the year columns hold numeric values.
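The column types can also be checked programmatically. The following is a minimal sketch in base R, assuming a small hand-built data frame, `china_sample` (a hypothetical name), that mirrors the printed structure with only two year columns. The numbers are the rounded figures from the printout above; the text fields are illustrative placeholders, because the printout truncates them.

```r
# Hypothetical sample mirroring the printed structure (two year columns only).
# Text fields are placeholders: the printed output truncates the real values.
china_sample <- data.frame(
  Region = "China",
  Market = "Banner Advertising",
  Chart  = "Ad Spending",
  Name   = c("Total", "Total"),
  Unit   = c("unit A", "unit B"),
  Source = "Statista",
  `2017` = c(15853, NA),
  `2018` = c(21900, 37.9),
  check.names = FALSE,        # keep "2017" instead of "X2017"
  stringsAsFactors = FALSE
)

# Class of every column, as a named character vector
col_types <- vapply(china_sample, class, character(1))
print(col_types)

# Split the column names by type
char_cols <- names(col_types)[col_types == "character"]
year_cols <- names(col_types)[col_types == "numeric"]
```

On the full datasets the same check should report six character columns and ten numeric year columns.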
Region
This column holds the region name. Each dataset covers a single region, so the column adds little when we study one country, but it becomes useful once we combine the datasets for comparison.
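To illustrate why Region matters once datasets are combined, here is a minimal sketch in base R. The data frames `china` and `us` are hypothetical one-row extracts holding the rounded 2017 totals from the printouts above; the real datasets have many more rows and columns.

```r
# Hypothetical one-row extracts of two regional datasets
china <- data.frame(Region = "China", `2017` = 15853, check.names = FALSE)
us    <- data.frame(Region = "United States", `2017` = 36380, check.names = FALSE)

# Stack the datasets; Region is now the only way to tell the rows apart
combined <- rbind(china, us)
print(combined)

# Group the 2017 values by region for comparison
by_region <- split(combined$`2017`, combined$Region)
```

Without the Region column, the stacked rows would be indistinguishable, which is why we keep it even though it is constant within each file.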
Market
## [1] "Banner Advertising" "Social Media Advertising"
## [3] "Digital Advertising" "Search Advertising"
## [5] "Classifieds" "Video Advertising"
This column has six distinct values: the overall market, “Digital Advertising”, and the five markets within it: “Banner Advertising”, “Social Media Advertising”, “Search Advertising”, “Classifieds”, and “Video Advertising”.
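The split between the overall market and its sub-markets can be sketched as follows. This is an illustration in base R using only the six values printed above; on a real dataset the vector would typically come from something like `unique(df$Market)`, where `df` is a hypothetical name for the loaded data.

```r
# The six Market values listed in the output above
markets <- c("Banner Advertising", "Social Media Advertising",
             "Digital Advertising", "Search Advertising",
             "Classifieds", "Video Advertising")

# "Digital Advertising" is the overall market; the rest are its sub-markets
overall     <- "Digital Advertising"
sub_markets <- setdiff(markets, overall)
print(sub_markets)
```

Keeping the overall market separate helps avoid double counting when sub-market values are summed.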
Chart
## [1] "Ad Spending"
## [2] "Ad Spending Growth"
## [3] "Average Ad Spending per Internet User"
## [4] "Users by Age"
## [5] "Users by Gender"
## [6] "Users by Income"
## [7] "Usage Shares"
## [8] "Top Company Revenues (Worldwide & Consolidated)"
## [9] "Reach by social network"
## [10] "Ad Spending by Segment"
## [11] "Ad Spending Growth by Segment"
## [12] "Ad Spending Share Desktop & Mobile"
## [13] "Ad Spending Growth Desktop & Mobile"
## [14] "Ad Spending Share by Industry"
## [15] "Ad Spending Share (Non-)Programmatic"
## [16] "Ad Spending Growth (Non-)Programmatic"
## [17] "Average Ad Spending per Internet User by Segment"
## [18] "Ad Spending Social Media"
## [19] "Ad Spending Change"
## [20] "Share from Digital"
This column lists the 20 chart types available in the dataset. For further analysis, we will mainly use the Ad Spending, Users, and Reach by social network charts.
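Selecting the charts of interest could look like the sketch below. It assumes one reading of the sentence above: "Ad Spending" means the chart with exactly that name, and "Users" means the three "Users by ..." charts. The vector `charts` holds only a subset of the 20 values, for brevity.

```r
# A subset of the Chart values printed above
charts <- c("Ad Spending", "Ad Spending Growth", "Users by Age",
            "Users by Gender", "Users by Income", "Usage Shares",
            "Reach by social network")

# Keep the exact "Ad Spending" chart, every "Users by ..." chart,
# and the social network reach chart (an assumed interpretation)
keep <- charts == "Ad Spending" |
  grepl("^Users by ", charts) |
  charts == "Reach by social network"
selected <- charts[keep]
print(selected)
```

If the whole Ad Spending family is wanted instead, the first condition could be replaced with `grepl("^Ad Spending", charts)`.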
Name
This column contains the names of the series shown in each chart. We do not list them here because the dataset has not yet been transformed into a tidy format.
Unit, Source, Year
These columns are self-explanatory, so we do not discuss them in detail here.
In short, the dataset is not in tidy format and contains several NAs; for example, some rows have no 2017 value. We will clean the dataset in the next sections.
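As a preview of what "not tidy" means here, the sketch below counts missing values and reshapes the year columns into one row per observation. It is a base R illustration on a hypothetical two-row data frame, `wide`, with rounded values from the printout and placeholder Unit strings; it is not necessarily the cleaning procedure used in the next sections.

```r
# Hypothetical wide sample: one column per year, as in the raw download
wide <- data.frame(
  Name   = c("Total", "Total"),
  Unit   = c("unit A", "unit B"),   # placeholders, truncated in the printout
  `2017` = c(15853, NA),
  `2018` = c(21900, 37.9),
  check.names = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(wide))
print(na_counts)

# Year columns are the ones whose names are four digits
year_cols <- grep("^[0-9]{4}$", names(wide), value = TRUE)

# Long (tidy) layout: one row per Name, Unit, and Year
long <- do.call(rbind, lapply(year_cols, function(y) {
  data.frame(Name = wide$Name, Unit = wide$Unit,
             Year = as.integer(y), Value = wide[[y]])
}))
print(long)
```

In a tidyverse workflow, `tidyr::pivot_longer()` performs the same reshaping in a single call.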