Package 'SchoolDataIT'

Title: Retrieve, Harmonise and Map Open Data Regarding the Italian School System
Description: Compiles and displays the available data sets regarding the Italian school system, with a focus on the infrastructural aspects. Input datasets are downloaded from the web, with the aim of keeping everything as up to date as possible. The functions are divided into four main modules, namely 'Get', to scrape raw data from the web; 'Util', various utilities needed to process raw data; 'Group', to aggregate data at the municipality or province level; and 'Map', to visualize the output datasets.
Authors: Leonardo Cefalo [aut, cre], Alessio Pollice [ctb, ths], Paolo Maranzano [ctb]
Maintainer: Leonardo Cefalo <[email protected]>
License: GPL (>= 3)
Version: 0.2.4
Built: 2025-02-26 06:17:58 UTC

Help Index

Subset of the administrative codes of municipalities


This table includes the administrative codes of the municipalities from four regions: Molise, Campania, Apulia and Basilicata, as of June 30th 2022; some strings in field Municipality_description including accents have been forced to ASCII. The whole dataset can be retrieved with the command Get_AdmUnNames(Year = 2022, date = "06_30")




## 'example_AdmUnNames20220630' A data frame with 1,074 rows and 5 columns:

  • Province_code Numeric; the NUTS-3 administrative code

  • Province_initials Character; abbreviated NUTS-3 denomination.

  • Municipality_code Character; the ISTAT LAU (municipality) ID.

  • Municipality_description Character; the municipality name.

  • Cadastral_code Character; a LAU - level ID code, different from the official ISTAT municipality code. It is used in the school registry (see example_input_Registry23)
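As a minimal sketch of how the Cadastral_code field can serve as a join key, the following base R snippet attaches municipality codes to a registry-like table. Both tables and every code in them are made-up toy values, not taken from the package data; only the column names follow the field descriptions above.

```r
# Toy lookup table shaped like the administrative codes table (hypothetical values)
adm_names <- data.frame(
  Municipality_code = c("099001", "099002"),
  Cadastral_code = c("X001", "X002"),
  stringsAsFactors = FALSE
)

# Toy registry-like table that only carries the cadastral code (hypothetical values)
registry <- data.frame(
  School_code = c("SCH0001", "SCH0002", "SCH0003"),
  Cadastral_code = c("X002", "X001", "X999"),
  stringsAsFactors = FALSE
)

# Left join on the cadastral code: schools whose code is not in the
# lookup table keep their row and get NA as municipality code
registry_lau <- merge(registry, adm_names, by = "Cadastral_code", all.x = TRUE)
```

Using all.x = TRUE keeps unmatched schools visible, which makes it easier to spot cadastral codes missing from the lookup table.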



See Also


Subset of the inner areas classification of municipalities


This dataframe includes the inner areas classification of municipalities from four regions: Molise, Campania, Apulia and Basilicata. Only the first 10 columns are included; some strings in field Municipality_description including accents have been forced to ASCII. The whole dataset can be retrieved with the command Get_InnerAreas(). For the definition of the ISTAT inner areas classes, see Get_InnerAreas.




## 'example_InnerAreas' A data frame with 1074 rows and 10 columns:

  • Municipality_code Character; the ISTAT LAU (municipality) ID.

  • Municipality_code_numeric Numeric; the ISTAT LAU (municipality) ID in numeric format.

  • Cadastral_code Character; a LAU - level ID code, different from the official ISTAT municipality code.

  • Region_code Numeric; the region (NUTS-2 administrative level) ID

  • Region_description Character; the region (NUTS-2 administrative level) name.

  • Province_code Numeric; the NUTS-3 administrative code.

  • Province_initials Character; abbreviated NUTS-3 denomination.

  • Province_description Character; the province (NUTS-3 administrative level) denomination.

  • Municipality_description Character; the municipality name.

  • Inner_area_code_2014_2020 Character; the ISTAT inner areas classification between 2014 and 2020.

  • Inner_area_description_2014_2020 Character; the description of the classes identified in the previous column

  • Inner_area_code_2021_2027 Character; the ISTAT inner areas classification between 2021 and 2027.

  • Inner_area_description_2021_2027 Character; the description of the classes identified in the previous column

  • Destination_municipality_code Character; For non-central municipalities (classes C, D, E, F), the ID of the closest pole municipality according to the 2021-2027 classification

  • Destination_municipality_description Character; the denomination of the municipalities in the previous column.

  • Destination_pole_code Character; An internal ID convention for the destination poles; it includes a letter (the class of the destination pole, either A or B); a number of two digits (the region code of the destination pole) and the progressive number of poles within a region.



See Also


Subset of the school buildings database in school year 2022/23


This dataframe includes the schools directly identifiable as primary, middle or high school, from four regions: Molise, Campania, Apulia and Basilicata. Only the first 35 columns are included. Some strings including accents in fields Other_disturbances_proximity, Other_specific_criticalities and Other have been forced to ASCII. The whole dataset can be retrieved with the command Get_DB_MIUR(2023)




## 'example_input_DB23_MIUR' A data frame with 7479 rows and 35 columns:

  • Year Numeric; the school year.

  • School_code Character; the school ID.

  • Order Character; the school order, either primary, middle or high school.

  • Reference_institute_code Character; the ID of the reference institute.

  • Building_code Character; the building ID; the first 6 digits usually identify the municipality.

  • Municipality_code Character; the ISTAT LAU (municipality) ID.

  • Municipality_description Character; the municipality name.

  • Province_initials Character; abbreviated NUTS-3 denomination.

  • Postal_code Character; the ZIP code, slightly finer than municipality boundaries for big municipalities.

  • Context_without_disturbances Character; whether the school belongs to an environment devoid of disturbances; otherwise, the types of disturbances are listed in columns 11 - 18.

  • Dumps_proximity Character; whether the school is close to dumps (disturbance element).

  • Pollutant_industries_proximity Character; whether the school is close to pollutant industries (disturbance element).

  • Pollutant_waters_proximity Character; whether the school is close to pollutant or stagnant streams or ponds (disturbance element).

  • Air_pollution_sourcer_proximity Character; whether the school is close to sources of air pollution (disturbance element).

  • Acoustic_pollution_sourcer_proximity Character; whether the school is close to sources of acoustic pollution (disturbance element).

  • Electromagnetic_radiation_sources_proximity Character; whether the school is close to sources of electromagnetic radiation (disturbance element).

  • Graveyards_proximity Character; whether the school is close to a graveyard (disturbance element).

  • Other_disturbances_proximity Character; other disturbance elements to which the school is close, other than those already listed.

  • School_area_specific_criticalities Character; whether any specific criticality element occurs inside the school area; specified in columns 20 - 27.

  • Layby absence Character; whether the access to the area pertaining to the school building lacks a lay-by or pitch (school area criticality element).

  • Unfenced area Character; whether the school building area lacks fences or enclosures (school area criticality element).

  • Large_traffic Character; whether the school area is close to large traffic streams (school area criticality element).

  • Railway_traffic Character; whether the school area is close to railway traffic streams (school area criticality element).

  • Abandoned_industries Character; whether the school area is located on a site previously occupied by now abandoned industries (school area criticality element).

  • Decayed_urban_area Character; whether the school belongs or is close to a decayed area (school area criticality element).

  • Risky_industries_proximity Character; whether the school is close to perilous industrial areas (school area criticality element).

  • Other_specific_criticalities Character; specific criticality elements regarding the school area, other than those already listed.

  • School_bus Character; whether the school is reached by school-bus service.

  • Urban_public_transport Character; whether the school is served by an urban public transport station within 250 meters.

  • Interurban_public_transport Character; whether the school is served by an inter-urban public transport station within 500 meters.

  • Railway_transport Character; whether the school is located 500 meters or less from a train station.

  • Private_transport Character; whether the school can be reached by private transport.

  • Disabled_people_transport Character; whether the school is provided with dedicated transport for people with disabilities.

  • Bicycle_lane Character; whether the building is in proximity of a bicycle/bike lane.

  • Other Character; whether the building can be reached in any other specific way.


Homepage; more in detail, the dataset blocks are downloaded respectively from: cols 10-18; cols 20-27; cols 28-35

See Also


Subset of the students and classes counts in school year 2022/23


This dataframe includes students and classes counts for the schools from four regions: Molise, Campania, Apulia and Basilicata. The whole dataset can be retrieved with the command Get_nstud(2023, filename = "ALUCORSOINDCLASTA")




## 'example_input_nstud23' A data frame with 21208 rows and 7 columns:

  • Year Numeric; the school year.

  • School_code Character; the school ID.

  • Order Character; the school order, either primary, middle or high school.

  • Grade Numeric; the school grade.

  • Classes Numeric; the count of classes of a given grade in each school

  • Male_students Numeric; the count of male students in all classes of a given educational grade in each school

  • Female_students Numeric; the count of female students in all classes of a given educational grade in each school



See Also


Subset of the school registry in school year 2022/23


This dataframe includes the schools directly identifiable as primary, middle or high school, from four regions: Molise, Campania, Apulia and Basilicata. Only the first 10 columns are included. The whole dataset can be retrieved with the command Get_Registry(2023)




## 'example_input_Registry23' A data frame with 5929 rows and 10 columns:

  • Year Numeric; the school year.

  • Area Character; the macro-area of the municipality, i.e. North, Center or South.

  • Region_description Character; the region (NUTS-2 administrative level) name.

  • Province_description Character; the province (NUTS-3 administrative level) name.

  • Reference_institute_code Character; the ID of the reference institute.

  • School_code Character; the school ID.

  • Cadastral_code Character; a LAU - level ID code, different from the official ISTAT municipality code. The Italian Ministry of Education provides this code in place of the LAU code in both the schools registry and the early school buildings DBs.

  • Municipality_description Character; the municipality name.

  • School_address Character; the school physical address.

  • Postal_code Character; the ZIP code, slightly finer than municipality boundaries for big municipalities.



See Also


Subset of the Invalsi scores in school year 2022/23


This dataframe includes the Invalsi scores of the schools from four regions: Molise, Campania, Apulia and Basilicata, for the school year 2022/23. The whole dataset can be retrieved with the command Get_Invalsi_IS(level = "NUTS-3")




## 'example_Invalsi23_prov' A data frame with 240 rows and 11 columns:

  • Year Character; the school year.

  • Grade Numeric; the school grade; only includes the school grades subjected to the Invalsi survey. Either 2, 5, 8, 10 or 13.

  • Subject Character; the school subject in which the test is taken; either Italian, Mathematics, English reading or English listening.

  • Province_code Numeric; the NUTS-3 administrative code.

  • Province_initials Character; abbreviated NUTS-3 denomination.

  • Province_description Character; the province (NUTS-3 administrative level) denomination.

  • Average_percentage_score Numeric; the province-level percentage of sufficient tests, only for primary schools; ranges 0-100.

  • Std_dev_percentage_score Numeric; the standard deviation of the percentage of sufficient tests, only for primary schools.

  • WLE_average_score Numeric; the province-level average WLE (Weighted Likelihood Estimator) score.

  • Std_dev_WLE_score Numeric; the standard deviation of WLE scores.

  • Students_coverage Numeric; the percentage of students for which the Invalsi tests are reported.



See Also


Subset of Italian provinces shapefile


This is the shapefile for the provinces belonging to four regions: Molise, Campania, Apulia and Basilicata, as of January 1st 2022. These are the latest administrative units boundaries relevant at the beginning of the school year 2022/23. The whole shapefile can be retrieved with the command Get_Shapefile(Year = 2022, level = "NUTS-3")




## 'example_Prov22_shp' A Spatial polygon data frame with 13 rows/polygons and 15 columns:

  • COD_RIP Numeric; the code for the macroarea (1 for Northwest, 2 for Northeast, 3 for Center, 4 for South and 5 for Isles)

  • COD_REG Numeric; the region (NUTS-2 administrative level) ID

  • COD_PROV Numeric; the NUTS-3 administrative code

  • COD_CM Numeric; the administrative code for Metropolitan Cities (which are always at the NUTS-3 level), obtained as 200 + NUTS-3 code, if the unit is a Metropolitan city; 0 otherwise.

  • COD_UTS Numeric; the administrative code for Metropolitan cities if the unit is a Metropolitan City; the province code otherwise.

  • DEN_PROV Character; the province (NUTS-3 administrative level) name, if the unit is not a Metropolitan City; blank otherwise.

  • DEN_CM Character; the Metropolitan City (NUTS-3 administrative level) name, if the unit is a Metropolitan City; blank otherwise.

  • DEN_UTS Character; the province or Metropolitan City (NUTS-3 administrative level) name.

  • SIGLA Character; abbreviated NUTS-3 denomination.

  • TIPO_UTS Character; the NUTS-3 type of the unit; either "Provincia" (Province) or "Citta metropolitana" (Metropolitan City)

  • Shape_Leng Numeric; the polygon perimeter.

  • Shape_Area Numeric; the polygon area.

  • geometry the polygon geometry.
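The rule stated above for COD_CM and COD_UTS can be sketched in a few lines of base R. The is_metro flag is a hypothetical helper column (in the shapefile the same information is carried by TIPO_UTS), and the three province codes are illustrative values only.

```r
# Toy province table; is_metro is a hypothetical flag, codes are illustrative
prov <- data.frame(
  COD_PROV = c(63, 72, 76),
  is_metro = c(TRUE, TRUE, FALSE)
)

# Metropolitan City code: 200 + NUTS-3 code for Metropolitan Cities, 0 otherwise
prov$COD_CM <- ifelse(prov$is_metro, 200 + prov$COD_PROV, 0)

# Unit code: the Metropolitan City code if available, the province code otherwise
prov$COD_UTS <- ifelse(prov$is_metro, prov$COD_CM, prov$COD_PROV)
```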



See Also


Association of the municipality code to a subset of public schools 2022/23


This list maps the IDs of the schools from four regions (Molise, Campania, Apulia and Basilicata) to the corresponding LAU codes. The whole dataset can be retrieved with the command Get_School2mun(2023)




## 'example_School2mun23' A list of four elements

  • Registry_from_buildings A data frame of 5527 rows and 5 columns, including the schools listed in the buildings registry.

  • Registry_from_registry A data frame of 5929 rows and 5 columns, including the schools listed in the schools registry.

  • Any A data frame of 5954 rows and 5 columns, including the schools listed in either of the two registries.

  • Both A data frame of 5510 rows and 5 columns, including schools listed in both registries

For each element, rows correspond to school IDs; the columns are:

  • School_code Character; the school ID.

  • Province_code Numeric; the NUTS-3 administrative code.

  • Province_initials Character; abbreviated NUTS-3 denomination.

  • Municipality_code Character; the ISTAT LAU (municipality) ID.

  • Municipality_description Character; the municipality name.


Buildings registry (2021 onwards); Buildings registry (until 2019); Schools registry

See Also


Download the names and codes of Italian LAU and NUTS-3 administrative units


This function downloads a file provided by the Italian National Institute of Statistics (ISTAT), including all the codes of administrative units in Italy. It is currently the easiest way to map cadastral codes directly to municipality codes.


Get_AdmUnNames(Year = 2023, date = "01_01", autoAbort = FALSE)



  • Year Numeric or character value. The last available year is 2024. For coherence with school data, it is also accepted in the formats: 2023, "2022/2023", 202223, 20222023. 2023 by default.

  • date Character. The reference date, in format "mm_dd"; either "01_01", "06_30" or "09_01" (close to the beginning of the school year). "01_01" by default.

  • autoAbort Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


An object of class tbl_df, tbl and data.frame, including: NUTS-3 code, NUTS-3 abbreviation, LAU code, LAU name (description) and cadastral code. All variables are characters except for the NUTS-3 code.




Get_AdmUnNames(2024, autoAbort = TRUE)
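The four accepted year formats all denote the same school year. A minimal sketch of how they could be reduced to a single ending year is given below; normalise_school_year is a hypothetical helper written for illustration and is not claimed to be how the package parses the Year argument.

```r
# Hypothetical helper: reduce 2023, "2022/2023", 202223 or 20222023
# to the ending year of the school year (2023)
normalise_school_year <- function(Year) {
  digits <- gsub("[^0-9]", "", as.character(Year))  # drop separators such as "/"
  n <- nchar(digits)
  if (n == 4) {
    as.integer(digits)                       # 2023
  } else if (n == 6) {
    as.integer(substr(digits, 1, 4)) + 1L    # 202223: starting year + 1
  } else if (n == 8) {
    as.integer(substr(digits, 5, 8))         # 20222023: last four digits
  } else {
    stop("Unrecognised school year format: ", Year)
  }
}

normalise_school_year("2022/2023")
```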

Download the data regarding the broadband connection activation in Italian schools


Retrieves the data regarding the activation date of the broadband connection in schools. It also indicates whether the connection had been activated or not at a given date.


Get_BroadBand(
  Date = Sys.Date(),
  verbose = TRUE,
  show_col_types = FALSE,
  autoAbort = FALSE
)



  • Date Object of class Date. The date at which to determine whether the broadband connection has been activated or not. By default it is the current date.

  • verbose Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.

  • show_col_types Logical. If TRUE and the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.

  • autoAbort Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Ultra-broadband is defined as a permanent internet connection with a maximum speed of 1 gigabit per second and a minimum guaranteed speed of 100 megabits per second in both upload and download, up to the peering point, as declared on the data provider's website. The example shows the broadband availability at the beginning of school year 2022/23 (1st September 2022).


An object of class tbl_df, tbl and data.frame. The variables BB_Activation_date and BB_Activation_staus indicate the activation date and activation status of the broadband connection at the selected date.


Broadband dashboard: <>


Broadband_220901 <- Get_BroadBand(Date = as.Date("2022-09-01"), autoAbort = TRUE)


Broadband_220901[, c(9,6,13,14)]
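The idea of an activation status evaluated at a reference date can be sketched without any download. The snippet below uses a toy table with made-up school codes and dates; the column active_at_ref is a hypothetical name, and the comparison rule (activated on or before the reference date) is an assumption rather than the package's documented logic.

```r
# Toy data: made-up school codes and activation dates (NA = never activated)
toy <- data.frame(
  School_code = c("SCH0001", "SCH0002", "SCH0003"),
  BB_Activation_date = as.Date(c("2021-05-10", "2023-01-15", NA))
)

# Reference date: beginning of school year 2022/23
ref_date <- as.Date("2022-09-01")

# Assumed rule: the connection counts as active if an activation date
# exists and is not later than the reference date
toy$active_at_ref <- !is.na(toy$BB_Activation_date) &
  toy$BB_Activation_date <= ref_date
```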

Download the database of Italian public schools buildings


This function downloads the School Buildings Open Database provided by the Italian Ministry of Education, University and Research.

It is one of the main sources of information regarding the infrastructure system of public schools in Italy. For a given year, all available data are downloaded (except for the structural units section, which has a different level of detail) and gathered into a unique dataframe.


Get_DB_MIUR(
  Year = 2023,
  verbose = TRUE,
  input_Registry = NULL,
  input_AdmUnNames = NULL,
  show_col_types = FALSE,
  certifications = FALSE,
  autoAbort = FALSE
)



  • Year Numeric or character value. Reference school year (last available is 2023). Available in the formats: 2023, "2022/2023", 202223, 20222023. 2023 by default.

  • verbose Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.

  • input_Registry Object of class tbl_df, tbl and data.frame. The school registry corresponding to the year in scope, obtained as output of the function Get_Registry. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.

  • input_AdmUnNames Object of class tbl_df, tbl and data.frame. The ISTAT file including all the codes and all the names of the administrative units for the year in scope, obtained as output of the function Get_AdmUnNames. Only necessary for school years 2015/16, 2017/18 and 2018/19. If NULL and required, it will be downloaded automatically but not saved in the global environment. NULL by default.

  • show_col_types Logical. If TRUE and the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.

  • certifications Logical. From year 2021/22 onwards, whether to include some safety certifications in the database. Given the particular level of definition of this file, it requires extra computational time (in addition to the downloading time). FALSE by default.

  • autoAbort Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


This function downloads the raw data; missing observations are not edited; all variables are characters. Since certifications are defined at the level of structural units of the single buildings, here the fields read as the percentage of structural units in a building having a given certificate. To edit the output of this function and convert the relevant variables to numeric or Boolean, please use Util_DB_MIUR_num. Schools different from primary, middle or high schools are classified as "NR". In the example, the data for school year 2022/23 are retrieved.


An object of class tbl_df, tbl and data.frame.




input_DB23_MIUR <- Get_DB_MIUR(2023, autoAbort = TRUE)
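To illustrate what "percentage of structural units in a building having a given certificate" means, the sketch below aggregates a toy table of structural units to the building level. The table, the Has_certificate flag and the Pct_certified column are hypothetical; the actual raw files and the package's internal aggregation may be organised differently.

```r
# Toy structural units: building B1 has three units, B2 has one (made-up data)
units <- data.frame(
  Building_code = c("B1", "B1", "B1", "B2"),
  Has_certificate = c(TRUE, FALSE, TRUE, TRUE)
)

# Share of certified structural units per building, as a percentage
pct <- aggregate(
  Has_certificate ~ Building_code,
  data = units,
  FUN = function(x) 100 * mean(x)
)
names(pct)[2] <- "Pct_certified"
```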


Download the classification of peripheral municipalities


Retrieves the classification of Italian municipalities into six categories; classes D, E, and F are the so-called internal/inner areas; classes A, B and C are the central areas.


Get_InnerAreas(verbose = TRUE, autoAbort = FALSE)



  • verbose Logical. Whether to keep track of computational time. TRUE by default.

  • autoAbort Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Classes are defined according to these criteria; see the methodological note (in Italian) for more detail:

  • A - Standalone pole municipalities, the highest degree of centrality; they are characterised by a thorough and self-sufficient combined endowment of school, health and transport infrastructure, i.e. there are at least a lyceum and a technical high school; a railway station of medium dimensions and a hospital provided with an emergency ward.

  • B - Intermunicipality poles; the endowment of such infrastructures is complete if a small set of contiguous municipalities is considered

The remaining classes are defined in terms of the national distribution of the road distances from a municipality to the closest pole:

  • C - Belt municipalities, travel time below the median (< 27'42").

  • D - Intermediate municipalities, travel time between the median and the third quartile (27'42" - 40'54").

  • E - Peripheral municipalities, travel time between the third quartile and the 97.5th percentile (40'54" - 1h 6'54").

  • F - Ultra-peripheral municipalities, travel time over the 97.5th percentile (> 1h 6'54").

For more information regarding the dataset, see the ISTAT methodological note (in Italian).
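The travel-time thresholds for classes C to F can be turned into a small classifier. This is only a sketch: classify_travel_time is a hypothetical helper, the thresholds are the ones quoted above converted to minutes, and the treatment of values falling exactly on a threshold (assigned to the upper class here) is an assumption.

```r
# Thresholds in minutes: median, third quartile, 97.5th percentile
thresholds <- c(27 + 42 / 60, 40 + 54 / 60, 66 + 54 / 60)

# Hypothetical helper: assign classes C-F from travel time to the closest pole
classify_travel_time <- function(minutes) {
  as.character(cut(
    minutes,
    breaks = c(0, thresholds, Inf),
    labels = c("C", "D", "E", "F"),
    right = FALSE  # intervals closed on the left: [lower, upper)
  ))
}

classify_travel_time(c(10, 30, 50, 70))
```

Classes A and B are not covered, since they depend on infrastructure endowment rather than on travel time.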


An object of class tbl_df, tbl and data.frame.




InnerAreas <- Get_InnerAreas(autoAbort = TRUE)
InnerAreas[, c(1,9,13)]

Download the Invalsi census survey data


Downloads the full database of the Invalsi scores, detailed either at the municipality or province level.


Get_Invalsi_IS(
  level = "LAU",
  verbose = TRUE,
  show_col_types = FALSE,
  multiple_out = FALSE,
  autoAbort = FALSE
)



  • level Character. The level of aggregation of Invalsi census data. Either "NUTS-3", "Province", "LAU" or "Municipality". "LAU" by default.

  • verbose Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.

  • show_col_types Logical. If TRUE and the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.

  • multiple_out Logical. Whether to keep multiple dataframes as output (thus overriding the level argument) or not. FALSE by default.

  • autoAbort Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Numeric variables provided are:

  • Average_percentage_score Average direct score (percentage of sufficient tests)

  • Std_dev_percentage_score Standard deviation of the direct score

  • WLE_average_score Average WLE score. The WLE score is calculated through the Rasch psychometric model and is suitable for middle and high schools in that it is adjusted for the effect of cheating (which would affect both the average score and the score variability). By construction it has a mean of around 200 points.

  • Std_dev_WLE_score Standard deviation of the WLE score. By construction it ranges around 40 points at the school level.

  • Students_coverage Students coverage percentage

Additional numeric variables, not always available for all observational units, are:

  • Mean and SD of ESCS indicator

  • First-Fifth_Level: Distribution of the proficiency level of students

  • Targets_percentage: Percentage of students reaching targets

Numeric codes 888 and 999 denote not applicable and not available fields respectively.

If multiple_out == TRUE, provides the following datasets:

  • Municipality_data: LAU-level data

  • Province_data: NUTS-3-level data

  • Region_data: NUTS-2-level data

  • LLS_data: data at the level of local labour systems (Sistemi Locali del Lavoro; see ISTAT webpage for details)

  • Inner_Areas_2021_data aggregated data for inner areas according to the 2020 taxonomy

  • Inner_Areas_2014_data aggregated data for inner areas according to the former 2014 taxonomy

  • Macroarea_data data aggregated for North-West, North-East, Center, South and Islands


Unless multiple_out == TRUE, an object of class tbl_df, tbl and data.frame. Otherwise, a list including objects of the aforementioned classes




Get_Invalsi_IS(level = "NUTS-3", autoAbort = TRUE, verbose = FALSE)
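Since the numeric codes 888 and 999 stand for "not applicable" and "not available", they should not enter summary statistics as if they were scores. A minimal sketch of recoding them to NA on a toy table follows; the rows are made-up values, and only the column names come from the variable list above.

```r
# Toy province-level scores with special codes (made-up values)
scores <- data.frame(
  Province_initials = c("BA", "CB", "PZ"),
  Average_percentage_score = c(888, 61.5, 58.0),
  WLE_average_score = c(195.2, 999, 201.7)
)

# Recode 888 (not applicable) and 999 (not available) to NA
num_cols <- c("Average_percentage_score", "WLE_average_score")
scores[num_cols] <- lapply(
  scores[num_cols],
  function(x) replace(x, x %in% c(888, 999), NA)
)

# Summaries now ignore the special codes
mean_wle <- mean(scores$WLE_average_score, na.rm = TRUE)
```

This blanket recoding is safe here only because percentages range between 0 and 100 and WLE scores lie around 200; for other numeric fields it is worth checking that 888 and 999 cannot occur as genuine values.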

Download students' number data


This function downloads the data regarding the number of students from the open website of the Italian Ministry of Education, University and Research.


Get_nstud(
  Year = 2023,
  filename = c("ALUCORSOETASTA", "ALUCORSOINDCLASTA"),
  verbose = TRUE,
  show_col_types = FALSE,
  autoAbort = FALSE
)



  • Year Numeric or character. Reference school year (last available is 2023). Available in the formats: 2022, "2021/2022", 202122, 20212022. 2023 by default.

  • filename Character. A string included in the name of the file to download. By default it is c("ALUCORSOETASTA", "ALUCORSOINDCLASTA"), which are the file names used so far for the number of students by age and the number of students in public schools by age and class.

Other file names are listed below; their output is not currently supported by the other functions involving the number of students.

"ALUITASTRACITSTA" for the number of Italian and foreign students in public schools

"ALUSECGRADOINDSTA" for the number of students of public schools by high school track

"ALUTEMPOSCUOLASTA" for the number of students of public schools by school time schedule

"ALUCORSOETAPAR", "ALUCORSOINDCLAPAR", "ALUITASTRACITPAR", "ALUSECGRADOINDPAR", "ALUTEMPOSCUOLAPAR" for the same data as the files above, but referring to private schools.

  • verbose Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.

  • show_col_types Logical. If TRUE and the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.

  • autoAbort Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


By default, a list of two tbl_df, tbl and data.frame objects:

  • $ALUCORSOETASTA: The number of students by school, school grade and age. It covers a higher number of schools than the other element.

  • $ALUCORSOINDCLASTA: The number of students and classes by school and school grade. This is a long-format dataframe.




Get_nstud(2023, filename = "ALUCORSOINDCLASTA", autoAbort = TRUE)
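Because the students and classes table is in long format (one row per school and grade), school-level totals require an aggregation step. The sketch below shows one way to do it in base R on a toy table; the rows are made-up values, the column names follow the description of example_input_nstud23, and Students and Students_per_class are hypothetical derived columns.

```r
# Toy long-format counts: one row per school and grade (made-up values)
nstud <- data.frame(
  School_code = c("S1", "S1", "S2"),
  Grade = c(1, 2, 1),
  Classes = c(2, 1, 3),
  Male_students = c(20, 11, 30),
  Female_students = c(22, 9, 28)
)

# Total students per row, then sums by school
nstud$Students <- nstud$Male_students + nstud$Female_students
by_school <- aggregate(
  cbind(Students, Classes) ~ School_code,
  data = nstud,
  FUN = sum
)

# Average class size per school
by_school$Students_per_class <- by_school$Students / by_school$Classes
```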

Download the number of teachers in Italian schools by province


This function downloads the number of teachers by province from the open website of the Italian Ministry of Education, University and Research.


Get_nteachers_prov(
  Year = 2023,
  verbose = TRUE,
  show_col_types = FALSE,
  filename = c("DOCTIT", "DOCSUP"),
  autoAbort = FALSE
)



  • Year Numeric or character value. Reference school year for the school registry data (last available is 2023). Available in the formats: 2022, "2021/2022", 202122, 20212022. 2023 by default.

  • verbose Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.

  • show_col_types Logical. If TRUE and the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.

  • filename Character. Which data to retrieve among the province counts of teachers/school personnel. By default it is c("DOCTIT", "DOCSUP"), which are the file names used so far for the number of tenured and temporary teachers respectively. Other file names are the following:

"ATATIT" for the number of tenured non-teaching personnel

"ATASUP" for the number of temporary non-teaching personnel

  • autoAbort Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Please notice that by default, the function returns the count of the number of tenured and temporary teachers. If either the count of non-teaching personnel or the count of a single category of teaching personnel is needed, please adapt the filename argument accordingly.


An object of class tbl_df, tbl and data.frame.




nteachers23 <- Get_nteachers_prov(2023, filename = "DOCTIT", autoAbort = TRUE)
nteachers23[, c(3,4,5)]

Download the registry of Italian public schools from the school registry section


This function returns two main pieces of information regarding Italian schools, namely:

  • The denomination of the region, province and municipality to which the school belongs.

  • The mechanographical code of the reference institute of each school.

It is possible to access schools in all the national territory, including the autonomous provinces of Aosta, Trento and Bozen.


Get_Registry(
  Year = 2023,
  filename = c("SCUANAGRAFESTAT", "SCUANAAUTSTAT"),
  show_col_types = FALSE,
  autoAbort = FALSE
)



  • Year Numeric or character. Reference school year (last available is 2024). Available in the formats: 2023, "2022/2023", 202223, 20222023. 2023 by default.

  • filename Character. A string included in the name of the file to download, identifying the schools included. By default it is c("SCUANAGRAFESTAT", "SCUANAAUTSTAT"), i.e. the file names used for public school registries, respectively across all the national territory except for the autonomous provinces of Aosta, Trento and Bozen, and only in these three provinces.

For the registry of private schools, use "SCUANAGRAFEPAR" (all the national territory except for the aforementioned provinces) and/or "SCUANAAUTPAR" (the three autonomous provinces only). Please note that data regarding private schools are not available for most functions in this package.

  • show_col_types Logical. If TRUE, the columns of the raw dataset are shown during the download. FALSE by default.

  • autoAbort Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Schools different from primary, middle or high schools are classified as "NR".


An object of class tbl_df, tbl and data.frame.




Get_Registry(2024, filename = "SCUANAGRAFESTAT", autoAbort = TRUE)

Associate a Municipality (LAU) code to each school


This function associates the relevant municipality codes to all the schools listed in the two main registries provided by the Italian Ministry of Education, University and Research, namely:

  • The registry of school buildings, here referred to as Registry_from_buildings (see Get_DB_MIUR)

  • The official schools registry, here referred to as Registry_from_registry (see Get_Registry)


Get_School2mun(
  Year = 2023,
  show_col_types = FALSE,
  verbose = TRUE,
  input_AdmUnNames = NULL,
  input_Registry = NULL,
  autoAbort = FALSE
)



Numeric or character value (last available is 2023). Available in the formats: 2023, "2022/2023", 202223, 20222023. 2023 by default.


Logical. If TRUE, if the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_AdmUnNames The ISTAT file including all the administrative units codes for the year in scope. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_Registry. The school registry corresponding to the year in scope. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


An object of class list, including 4 elements:

  • $Registry_from_buildings: Object of class tbl_df, tbl and data.frame: the schools listed in the buildings registry

  • $Registry_from_registry: Object of class tbl_df, tbl and data.frame: the schools listed in the schools registry

  • $Any: Object of class tbl_df, tbl and data.frame: schools listed anywhere

  • $Both: Object of class tbl_df, tbl and data.frame: schools listed in both registries
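The relationship between the four elements can be pictured as a union and an intersection of the two registries. The base R sketch below uses two toy tables; the column names School_code and Municipality_code and all the values are assumptions made for illustration, not the actual structure returned by Get_School2mun.

```r
# Toy registries (hypothetical columns and values)
from_buildings <- data.frame(
  School_code = c("A1", "B2", "C3"),
  Municipality_code = c("001", "002", "003")
)
from_registry <- data.frame(
  School_code = c("B2", "C3", "D4"),
  Municipality_code = c("002", "003", "004")
)

# Schools listed in both registries (intersection)
both <- merge(from_buildings, from_registry,
              by = c("School_code", "Municipality_code"))

# Schools listed in at least one registry (union without duplicates)
any_registry <- unique(rbind(from_buildings, from_registry))

School2mun_sketch <- list(
  Registry_from_buildings = from_buildings,
  Registry_from_registry = from_registry,
  Any = any_registry,
  Both = both
)
sapply(School2mun_sketch, nrow)
```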


Buildings registry (2021 onwards); Buildings registry (until 2019); Schools registry


Get_School2mun(Year = 2023, autoAbort = TRUE)

Download the shapefiles of Italian NUTS-3 and LAU administrative units


Downloads either the boundaries or the centroids of the relevant administrative units, either provinces or municipalities, from the ISTAT website. Geometries are expressed in meters.


  level = "LAU",
  lightShp = TRUE,
  autoAbort = FALSE,
  centroids = FALSE



Numeric. Reference year for the administrative units.


Character. Either "LAU"/"Municipality", "NUTS-3"/"Province" or "NUTS-2"/"Region". "LAU" by default.


Logical. If TRUE, the function downloads a generalised, i.e. less detailed, and lighter version of the shapefiles. TRUE by default.


Logical. Whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Logical. Whether to switch from polygon geometry to point geometry. In the latter case, the point is located at the centroid of the relevant area. FALSE by default.


A spatial data frame of class data.frame and sf.





  Prov23_shp <- Get_Shapefile(2023, lightShp = TRUE, level = "NUTS-3", autoAbort = TRUE)
  ggplot2::ggplot() + ggplot2::geom_sf(data = Prov23_shp) +
    ggplot2::ggtitle("Italian provinces in 2023/01/01")

Aggregate the database of Italian public schools buildings at the municipality and province level


This function aggregates the output of the Util_DB_MIUR_num function (which is detailed at the level of single school buildings) at the municipality/LAU and province/NUTS-3 level. It also allows the user to classify the degree of centrality of municipalities through the variable Inner_area.


  data = NULL,
  Year = 2023,
  count_units = TRUE,
  countname = "nbuildings",
  count_missing = TRUE,
  verbose = TRUE,
  track_deleted = TRUE,
  InnerAreas = TRUE,
  ord_InnerAreas = FALSE,
  input_InnerAreas = NULL,
  autoAbort = FALSE,



Object of class tbl_df, tbl and data.frame. The database of school buildings, preferably already converted to numeric, obtained via Util_DB_MIUR_num


Numeric or Character. The reference school year, if either data or input_InnerAreas must be retrieved. Available in the formats: 2023, "2022/2023", 202223, 20222023. Important: use the same Year argument used to retrieve the input school buildings data if they are provided as input. 2023 by default


Logical. Whether the rows to aggregate at each level must be counted or not. TRUE by default.


Character. The name of the variable indicating the number of schools included in each municipality or province, if the argument count_units is TRUE. "nbuildings" by default.


Logical. Whether the function should return two dataframes including the percentage of NAs in the data object at the territorial level. TRUE by default
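As a rough picture of what a table of missing-value percentages looks like, the base R sketch below computes the share of NAs per variable within each province on toy data. The column names, the values and the helper pct_na are hypothetical; this is not the package's internal code.

```r
# Toy building-level data (hypothetical columns and values)
buildings <- data.frame(
  Province_code = c(70, 70, 70, 94, 94),
  Surface = c(100, NA, 300, NA, NA),
  School_bus = c(TRUE, FALSE, NA, TRUE, TRUE)
)

# Percentage of missing values in a vector
pct_na <- function(x) 100 * mean(is.na(x))

# One row per province, one column per variable
Province_missing <- aggregate(
  buildings[, c("Surface", "School_bus")],
  by = list(Province_code = buildings$Province_code),
  FUN = pct_na
)
Province_missing
```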


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Logical. If TRUE, the function returns the IDs of schools not included. TRUE by default.


Logical. Whether an indicator of the percentage of schools belonging to peripheral (Inner) areas must be included or not.


Logical. Whether the Inner areas classification should be treated as an ordinal variable rather than as a binary one (see Get_InnerAreas for the classification). Please notice that the function creates a column for each class, and if this database must be used in a statistical model, one of the 6 resulting columns must be dropped. FALSE by default.
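The reason for dropping one column is that the six class shares sum to one in every territorial unit, so together they are perfectly collinear with a model intercept. The sketch below shows this on toy data; the class labels "A" to "F" and the column names are assumptions made for illustration.

```r
# Assumed labels for the six Inner areas classes
classes <- LETTERS[1:6]

# Toy school-level data: province and class of the school's municipality
schools <- data.frame(
  Province_code = c(70, 70, 70, 70, 94, 94, 94),
  Inner_area = factor(c("A", "C", "D", "D", "E", "F", "F"), levels = classes)
)

# Share of schools in each class, one row per province
shares <- prop.table(table(schools$Province_code, schools$Inner_area), margin = 1)
shares <- as.data.frame.matrix(shares)

# Each row sums to 1: the six columns are redundant alongside an intercept
rowSums(shares)

# Drop one class (here the first) before using the shares as regressors
shares_for_model <- shares[, -1, drop = FALSE]
```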


Object of class tbl_df, tbl and data.frame. The classification of peripheral municipalities, needed only if InnerAreas == TRUE, obtained as output of the Get_InnerAreas function. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Additional arguments to the function Util_DB_MIUR_num, used in case no data are provided.


Numerical variables are summarised by the mean; Boolean variables are summarised by the mean as well, thus they become frequency indicators. Qualitative values, if included, are summarised by the mode. Summary measures do not include NAs. The output dataframes are also detailed at the school order level (i.e. Primary, Middle, High school, or different orders). This means that rows are unique combinations of territorial units and school order.
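The summary rules above can be sketched in base R as follows. The data, the column names and the helper stat_mode are hypothetical and only illustrate the rules (mean for numeric and Boolean variables, mode for qualitative ones, NAs excluded); they do not reproduce the package's implementation.

```r
# Toy building-level data (hypothetical columns and values)
buildings <- data.frame(
  Municipality_code = c("001", "001", "001", "002", "002"),
  Order = c("Primary", "Primary", "Primary", "Primary", "Primary"),
  Surface = c(100, 300, NA, 250, 150),
  School_bus = c(TRUE, FALSE, TRUE, NA, TRUE),
  Heating = c("Gas", "Gas", "Oil", NA, "Oil")
)

# Numeric and Boolean variables: mean without NAs; the mean of a Boolean
# variable is the share of TRUE values, i.e. a frequency indicator
by_mun <- aggregate(
  buildings[, c("Surface", "School_bus")],
  by = buildings[, c("Municipality_code", "Order")],
  FUN = mean, na.rm = TRUE
)

# Qualitative variables: most frequent non-missing value
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  names(which.max(table(x)))
}
heating_mode <- tapply(buildings$Heating, buildings$Municipality_code, stat_mode)

by_mun
heating_mode
```

Rows of by_mun are unique combinations of municipality and school order, mirroring the layout described above.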


An object of class list including:

  • $Municipality_data: object of class tbl_df, tbl and data.frame, the output dataframe detailed at the municipality level; all variables besides the first 5 (which identify the record) are numeric

  • $Province_data: object of class 'tbl_df', 'tbl' and 'data.frame', the output dataframe detailed at the province level; all variables besides the first 3 (which identify the record) are numeric

  • $Municipality_missing: (Only if count_missing == TRUE); object of class tbl_df, tbl and data.frame, the percentage of NAs in each variable at the municipality level.

  • $Province_missing: (Only if count_missing == TRUE); object of class 'tbl_df', 'tbl' and 'data.frame', the percentage of NAs in each variable at the province level.

  • $deleted: character vector. The schools removed from the original dataframe for data quality reasons. This object is returned only if track_deleted == TRUE


DB23_MIUR <- example_input_DB23_MIUR %>% Util_DB_MIUR_num(verbose = FALSE) %>%
    Group_DB_MIUR(InnerAreas = FALSE)

DB23_MIUR$Municipality_data[, -c(1,2,4)]

DB23_MIUR$Province_data[, -c(1,3)]

Aggregate the students number data by class at the municipality and province level


This function creates two dataframes with the number of students, classes and students by class, aggregated at the province and municipality level


  data = NULL,
  Year = 2023,
  check = TRUE,
  verbose = TRUE,
  check_registry = "Any",
  InnerAreas = TRUE,
  ord_InnerAreas = FALSE,
  check_ggplot = FALSE,
  missing_to_1 = FALSE,
  input_Registry = NULL,
  input_InnerAreas = NULL,
  input_Prov_shp = NULL,
  input_School2mun = NULL,
  input_AdmUnNames = NULL,
  autoAbort = FALSE,



Either an object of class list, obtained as output of the Get_nstud function, or an object of class tbl_df, tbl and data.frame, obtained as output of the Util_nstud_wide function. If NULL, the function will download it automatically but it will not be saved in the global environment. NULL by default.


Numeric or character value. The reference school year, if either of the input_ arguments must be retrieved. Available in the formats: 2023, "2022/2023", 202223, 20222023. 2023 by default.


Logical. If TRUE, the function runs the test of the students number availability across all schools included in the school registries (see Util_Check_nstud_availability). TRUE by default.


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Character. If check == TRUE, the school registries included in the input_School2mun object (see Get_School2mun) whose availability has to be checked. Either "Registry_from_buildings" (buildings section), "Registry_from_registry" (registry section), "Any" or "Both". "Any" by default.


Logical. If check == TRUE, whether it must be checked if municipalities belong to Inner areas or not. TRUE by default.


Logical. If check == TRUE and InnerAreas == TRUE, whether the Inner areas classification should be treated as an ordinal variable rather than as a categorical one (see Get_InnerAreas for the classification). FALSE by default.


Logical. If check == TRUE, whether to display or not a static map of the availability of the students number by province; see also Util_Check_nstud_availability. FALSE by default.


Logical. Only needed if data is not provided in wide format. Whether the number of classes should be imputed to 1 when it is missing; see Util_nstud_wide. FALSE by default.


Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_Registry. If check == TRUE, the school registry (the registry proper, from the registry section). If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame. The classification of peripheral municipalities, obtained as output of the Get_InnerAreas function. Needed only if check == TRUE and InnerAreas == TRUE. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default


Object of class sf, tbl_df, tbl, data.frame. The relevant shapefile of Italian municipalities, if both the check and check_ggplot options are chosen. If NULL it is downloaded automatically but not saved in the global environment. NULL by default.


Object of class list with elements of class tbl_df, tbl and data.frame, obtained as output of the function Get_School2mun. The mapping from school codes to municipality (and province) codes. Needed only if 'check == TRUE'. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_AdmUnNames. The ISTAT file including all the codes and the names of the administrative units for the year in scope. Only needed if check == TRUE and the argument input_School2mun is NULL. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Additional arguments to the function Util_nstud_wide if data is not provided.


Numerical variables are summarised by the mean; Boolean variables are summarised by the mean as well, thus they become frequency indicators. Qualitative values, if included, are summarised by the mode. Summary measures do not include NAs.


An object of class list including:

  • $Municipality_data: object of class tbl_df, tbl and data.frame, the output dataframe detailed at the municipality level

  • $Province_data: object of class 'tbl_df', 'tbl' and 'data.frame', the output dataframe detailed at the province level


Year <- 2023

nstud23_aggr <- Group_nstud(data = example_input_nstud23, Year = Year,
                           input_Registry = example_input_Registry23,
                           InnerAreas = FALSE,
                           input_School2mun = example_School2mun23)



Arrange the number of teachers per students in public Italian schools at the province level


This function provides the average number of teachers per students in Italian public schools at the province level.
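Conceptually, the indicator is a ratio between two province-level totals. A minimal base R sketch on toy data follows; the column names and figures are invented for illustration and do not correspond to the actual output of Get_nteachers_prov or Group_nstud.

```r
# Toy province-level inputs (hypothetical columns and values)
teachers <- data.frame(
  Province_code = c(70, 94),
  Tot_teachers = c(500, 1200)
)
students <- data.frame(
  Province_code = c(70, 94),
  Tot_students = c(5000, 15000)
)

# Join the two tables by province and compute the ratio
teachers4stud_sketch <- merge(teachers, students, by = "Province_code")
teachers4stud_sketch$Teachers_per_student <-
  teachers4stud_sketch$Tot_teachers / teachers4stud_sketch$Tot_students

teachers4stud_sketch
```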


  Year = 2023,
  input_nteachers = NULL,
  nteachers_filename = c("DOCTIT", "DOCSUP"),
  verbose = TRUE,
  input_nstud_raw = NULL,
  input_nstud_aggr = NULL,
  autoAbort = FALSE,



Numeric or character value. Reference school year for the school registry data (last available is 2022). Available in the formats: 2022, "2021/2022", 202122, 20212022. 2023 by default


Object of class tbl_df, tbl and data.frame. The number of teachers by province, obtained as output of the function Get_nteachers_prov. If NULL, the function will download it automatically but it will not be saved in the global environment. NULL by default.


Character. If input_nteachers is not provided, which data to retrieve regarding the number of teachers/personnel; see Get_nteachers_prov. c("DOCTIT", "DOCSUP") by default, i.e. tenured teachers and temporary teachers.


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Object of class list, including two objects of class tbl_df, tbl and data.frame, obtained as output of the Get_nstud function with the default filename parameter. Not necessary if the argument input_nstud_aggr is provided. If NULL, the function will download it automatically but it will not be saved in the global environment. NULL by default.


Object of class list, including two objects of class tbl_df, tbl and data.frame, obtained as output of the function Group_nstud. If NULL, the function will compute it automatically but it will not be saved in the global environment. NULL by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Arguments to Group_nstud if argument input_nstud_aggr is not provided


An object of class tbl_df, tbl and data.frame


input_nstud23 <- Get_nstud(2023, filename ="ALUCORSOINDCLASTA", autoAbort = TRUE)
  Registry23 <- Get_Registry(2023, autoAbort = TRUE)
  School2mun23 <- Get_School2mun(2023, input_Registry = Registry23, autoAbort = TRUE)

  nstud23.aggr <- Group_nstud(Year = 2023, data = input_nstud23,
    input_Registry = Registry23, input_School2mun = School2mun23,
    autoAbort = TRUE)

  input_nteachers23 <- Get_nteachers_prov(2023, autoAbort = TRUE)

  teachers4stud <- Group_teachers4stud(Year = 2023,
                   input_nteachers = input_nteachers23,
                   input_nstud_aggr = nstud23.aggr, autoAbort = TRUE)

  teachers4stud[, -c(1, 2, 10, 11)]


Map school data


This function displays a map of the data arranged through the function Set_DB. It supports two kinds of map:

  • Interactive map (default option), which allows the user to visualize all the data in scope through the interactive popup, and

  • Static map (ggplot), which can be easily exported in .pdf objects.

The user must select a variable to display. It is possible to insert either a ready-made database obtained through the function Set_DB or the basic inputs to plug in that function, as well as an input shapefile. Relevant arguments not provided by the user will be downloaded automatically, but not saved into the global environment. However, we suggest plugging in at least some inputs, as otherwise the running time may be long. This function generalises the functionalities of the more data-specific functions Map_School_Buildings and Map_Invalsi.


  data = NULL,
  Year = 2023,
  level = "LAU",
  plot = "mapview",
  popup_height = 200,
  col_rev = FALSE,
  pal = "viridis",
  input_shp = NULL,
  region_code = c(1:20),
  main_pos = "top",
  main = "",
  order = NULL,
  autoAbort = FALSE,
  only_observed = FALSE,



Object of class tbl_df, tbl and data.frame, obtained as output of the Set_DB function. If NULL, it will be arranged automatically but not saved into the global environment. NULL by default.


Numeric or Character. The reference school year, needed if either data or input_shp are not provided. Available in the formats: 2023, "2022/2023", 202223, 20222023. 2023 by default.


Character. The variable to display in the map.


Character. The administrative level of detail at which the target variable must be displayed. Either "LAU"/"Municipality" or "NUTS-3"/"Province". If the "data" argument is plugged in, please select the same level. "LAU" by default.


Character. The type of map to display; either "mapview" for interactive maps, or "ggplot" for static maps. "mapview" by default.


Numeric. The height of the popup table in terms of pixels if the "mapview" mode is chosen. 200 by default.


Logical. Whether the scale of the colour palette should be reverted or not. FALSE by default.


Character. The palette to use if the "mapview" mode is chosen. "viridis" by default.


Object of class sf, tbl_df, tbl and data.frame. The relevant shapefiles of Italian administrative boundaries, at the selected level of detail (LAU or NUTS-3). If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.


Numeric. The NUTS-2 codes of the units that must be displayed. If the level is set to "LAU", choosing a limited number of regions is recommended. By default, c(1:20), i.e. all Italian regions. Please notice that the provinces of Aosta, Trento and Bozen have data availability issues; they can be excluded with c(1,3,5:20).


Character. Where the header should be placed if the ggplot mode is chosen. The header is located on the top if "top" is given as input, and above the legend scale otherwise. "top" by default.


Character. The title to display in the "ggplot" rendering options.


Character. The educational level. Either "Primary" (primary school), "Middle" (middle school), or "High" (high school). If the data include the Invalsi census survey, please select a level consistent with the chosen educational grade. NULL by default.
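To keep the educational level consistent with an Invalsi grade, a small lookup can help. The sketch below follows the grade descriptions given for the Invalsi functions (2 and 5 for primary school, 8 for middle school, 10 and 13 for high school); grade_to_order is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: map Invalsi grades to the matching school order
grade_to_order <- function(grade) {
  lookup <- c("2" = "Primary", "5" = "Primary",
              "8" = "Middle",
              "10" = "High", "13" = "High")
  out <- unname(lookup[as.character(grade)])
  if (anyNA(out)) {
    stop("Grades must be among 2, 5, 8, 10 and 13")
  }
  out
}

grade_to_order(c(10, 13))
```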


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Logical. Whether to remove unobserved areas from the plot. FALSE by default.


Additional arguments for the input database, if not provided; see Set_DB


If plot == "mapview", an object of class mapview. Otherwise, if plot == "ggplot", an object of class gg and ggplot.


DB23 <- Set_DB(Year = 2023, level = "NUTS-3",
       Invalsi_grade = c(10,13), NA_autoRM = TRUE,
       input_Invalsi_IS = example_Invalsi23_prov, input_nstud = example_input_nstud23,
       input_InnerAreas = example_InnerAreas,
       input_School2mun = example_School2mun23,
       input_AdmUnNames = example_AdmUnNames20220630,
       nteachers = FALSE, BroadBand = FALSE, SchoolBuildings = FALSE)

Map_DB(DB23, field = "Students_per_class_13", input_shp = example_Prov22_shp, level = "NUTS-3",
 col_rev = TRUE, plot = "ggplot")

Map_DB(DB23, field = "Inner_area", input_shp = example_Prov22_shp, order = "High",
 level = "NUTS-3",col_rev = TRUE, plot = "ggplot")

Map_DB(DB23, field = "M_Mathematics_10", input_shp = example_Prov22_shp, level = "NUTS-3",
 plot = "ggplot")

Display a map of Invalsi scores


This function displays either a static or interactive map of the Invalsi scores, either at the municipality or province level. It supports two kinds of map:

  • Interactive map (default option), which allows the user to visualize all the data in scope through the interactive popup, and

  • Static map (ggplot), which can be easily exported in .pdf objects.


  data = NULL,
  Year = 2023,
  subj_toplot = "ITA",
  grade = 8,
  level = "LAU",
  main = "",
  main_pos = "top",
  region_code = c(1:20),
  plot = "mapview",
  pal = "viridis",
  col_rev = FALSE,
  popup_height = 200,
  only_observed = FALSE,
  verbose = TRUE,
  input_shp = NULL,
  autoAbort = FALSE



Object of class tbl_df, tbl and data.frame. The raw Invalsi survey data that has to be filtered, obtained as output of the Get_Invalsi_IS function. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default


Numeric or character value. Reference school year for the data (last available is 2022/23). Available in the formats: 2023, "2022/2023", 202223, 20222023. 2023 by default.


Character. The school subject to display in the map, one among: "Englis_listening"/"ELI", "English_reading"/"ERE", "Italian"/"ITA" and "Mathematics"/"MAT". "ITA" (Italian) by default.


Numeric. The school grade to choose. Either 2 (2nd year of primary school), 5 (last year of primary school), 8 (last year of middle school), 10 (2nd year of high school) or 13 (last year of school). 8 by default.


Character. The level of aggregation of Invalsi census data. Either "NUTS-3"/"Province" or "LAU"/"Municipality". If an input dataframe is provided, please select the same level of aggregation. "LAU" by default.


Character. A custom title for the map. If NULL, the title will mention: subject, year and school grade. Empty by default.


Character. Where the header should be placed if the ggplot mode is chosen. The header is located on the top if "top" is given as input, and above the legend scale otherwise. "top" by default.


Numeric. The NUTS-2 codes of the units that must be displayed. If the level is set to "LAU", choosing a limited number of regions is recommended. By default, c(1:20), i.e. all Italian regions. Please notice that the provinces of Aosta, Trento and Bozen have data availability issues; they can be excluded with c(1,3,5:20).


Character. The type of map to display; either "mapview" for interactive maps, or "ggplot" for static maps. "mapview" by default.


Character. The palette to use if the "mapview" mode is chosen. "viridis" by default.


Logical. Whether the variable to choose should be the average WLE score rather than the percentage of sufficient tests, if both are available. FALSE by default.


Logical. Whether the scale of the colour palette should be reverted or not, if the mapview mode is chosen. FALSE by default.


Numeric. The height of the popup table in terms of pixels if the "mapview" mode is chosen. 200 by default.


Logical. Whether to remove unobserved areas from the plot. FALSE by default.


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Object of class sf, tbl_df, tbl, data.frame. The relevant shapefiles of Italian administrative boundaries, at the selected level of detail (LAU or NUTS-3). If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


If plot == "mapview", an object of class mapview. Otherwise, if plot == "ggplot", an object of class gg and ggplot.


Map_Invalsi(subj = "Italian", grade = 13, level = "NUTS-3", Year = 2023, WLE = FALSE,
  data = example_Invalsi23_prov, input_shp = example_Prov22_shp, plot = "ggplot")

 Map_Invalsi(subj = "Italian", grade = 5, level = "NUTS-3", Year = 2023, WLE = TRUE,
  data = example_Invalsi23_prov, input_shp = example_Prov22_shp, plot = "ggplot")

Display data from the school buildings database


This function displays a map of the data downloaded through the Get_DB_MIUR function. It supports two kinds of map:

  • Interactive map (default option), which allows the user to visualize all the data in scope through the interactive popup, and

  • Static map (ggplot), which can be easily exported in .pdf objects.


  data = NULL,
  order = NULL,
  level = "LAU",
  region_code = c(1:20),
  plot = "mapview",
  pal = "viridis",
  col_rev = FALSE,
  popup_height = 200,
  main_pos = "top",
  main = "",
  only_observed = FALSE,
  verbose = TRUE,
  input_shp = NULL,
  autoAbort = FALSE,



Object of class list or tbl_df, tbl and data.frame. Input data obtained as output of the function Group_DB_MIUR. If NULL, it will be downloaded automatically but not saved in the global environment. NULL by default.


Character. The variable to display in the map.


Character. The school order. Either "Primary", "Middle", or "High" (high school). If NULL, an average of the three school orders will be displayed for the target variable. NULL by default.


Character. The administrative level of detail at which the target variable must be displayed. Either "LAU"/"Municipality" or "NUTS-3"/"Province". "LAU" by default.


Numeric. The NUTS-2 codes of the units that must be displayed. If the level is set to "LAU", choosing a limited number of regions is recommended. By default, c(1:20), i.e. all Italian regions.


Character. The type of map to display; either "mapview" for interactive maps, or "ggplot" for static maps. "mapview" by default.


Character. The palette to use if the "mapview" mode is chosen. "viridis" by default.


Logical. Whether the scale of the colour palette should be reverted or not, if the "mapview" mode is chosen. FALSE by default


Numeric. The height of the popup table in terms of pixels if the "mapview" mode is chosen. 200 by default.


Character. Where the header should be placed if the ggplot mode is chosen. The header is located on the top if "top" is given as input, and above the legend scale otherwise. "top" by default.


Character. The custom title to display in the "ggplot" rendering options.


Logical. Whether to remove unobserved areas from the plot. FALSE by default.


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Object of class sf, tbl_df, tbl, data.frame. The relevant shapefiles of Italian administrative boundaries, at the selected level of detail (LAU or NUTS-3). If NULL, it is downloaded automatically but not saved in the global environment. NULL by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


If data is not provided, the arguments to Group_DB_MIUR.


If plot == "mapview", an object of class mapview. Otherwise, if plot == "ggplot", an object of class gg and ggplot.



  DB23_MIUR <- example_input_DB23_MIUR %>%
    Util_DB_MIUR_num(track.deleted = FALSE) %>%
    Group_DB_MIUR(InnerAreas = FALSE, count_missing = FALSE)

  DB23_MIUR %>% Map_School_Buildings(field = "School_bus",
     order = "Primary",level = "NUTS-3",  plot = "ggplot",
     input_shp = example_Prov22_shp)

  DB23_MIUR %>% Map_School_Buildings(field = "Railway_transport",
     order = "High",level = "NUTS-3", plot = "ggplot",
     input_shp = example_Prov22_shp)

  DB23_MIUR %>% Map_School_Buildings(field = "Context_without_disturbances",
     order = "Middle",level = "NUTS-3", plot = "ggplot",
     input_shp = example_Prov22_shp, col_rev = TRUE)

Build up a comprehensive database regarding the school system


This function generates a single dataframe of the school system data including a custom selection of the available datasets. It allows the user to aggregate the desired datasets, when available, among these:

  • Invalsi census survey

  • School buildings

  • Number of students and school classes

  • Number of teachers

  • Broadband connection availability

To save as much time as possible, ready-made input data can be plugged in; otherwise they will be downloaded automatically but not saved in the global environment. When a new dataset is joined to the existing ones, some observations in this dataset may be missing. In this case, by default, the choice between keeping as many observational units as possible and removing the units with missing variables is left to the user.


  Year = 2023,
  level = "LAU",
  conservative = TRUE,
  Invalsi = TRUE,
  SchoolBuildings = TRUE,
  nstud = TRUE,
  nteachers = TRUE,
  BroadBand = TRUE,
  verbose = TRUE,
  show_col_types = FALSE,
  Invalsi_subj = c("ELI", "ERE", "ITA", "MAT"),
  Invalsi_grade = c(2, 5, 8, 10, 13),
  Invalsi_WLE = FALSE,
  SchoolBuildings_certifications = FALSE,
  SchoolBuildings_include_numerics = TRUE,
  SchoolBuildings_include_qualitatives = FALSE,
  SchoolBuildings_row_cutout = FALSE,
  SchoolBuildings_col_cut_thresh = 20000,
  SchoolBuildings_flag_outliers = TRUE,
  SchoolBuildings_count_missing = FALSE,
  nstud_imputation_thresh = 19,
  nstud_missing_to_1 = FALSE,
  UB_nstud_byclass = 99,
  LB_nstud_byclass = 1,
  InnerAreas = TRUE,
  ord_InnerAreas = FALSE,
  nstud_check = TRUE,
  nstud_check_registry = "Any",
  BroadBand_impute_missing = TRUE,
  Date = as.Date(paste0(substr(year.patternA(Year), 1, 4), "-09-01")),
  NA_autoRM = NULL,
  input_Invalsi_IS = NULL,
  input_Registry = NULL,
  input_SchoolBuildings = NULL,
  input_nstud = NULL,
  input_School2mun = NULL,
  input_AdmUnNames = NULL,
  input_InnerAreas = NULL,
  input_teachers4student = NULL,
  input_nteachers = NULL,
  input_BroadBand = NULL,
  autoAbort = FALSE



Numeric or Character. The relevant school year. Available in the formats: 2023, "2022/2023", 202223, 20222023. Important: if input datasets are plugged in, please select the same Year argument used to download the input data. 2023 by default.


Character. The administrative level of detail at which data must be aggregated. Either "LAU"/"Municipality" or "NUTS-3"/"Province". "LAU" by default.


Logical. If FALSE, only the schools included in all the datasets are taken as input. TRUE by default.


Logical. Whether the Invalsi census data must be included (see Get_Invalsi_IS). TRUE by default.


Logical. Whether the school buildings dataset must be included (see Get_DB_MIUR, Util_DB_MIUR_num). TRUE by default.


Logical. Whether the students number per class must be included (see Get_nstud). TRUE by default.


Logical. Whether the number of teachers by province must be included (see Get_nteachers_prov). TRUE by default.


Logical. Whether the broadband availability in schools must be included (see Get_BroadBand). TRUE by default


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Logical. If TRUE and the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.


Character. If Invalsi == TRUE, the school subject(s) to include, among "Englis_listening"/"ELI", "English_reading"/"ERE", "Italian"/"ITA" and "Mathematics"/"MAT". All four by default.


Numeric. If Invalsi == TRUE, the educational grade to choose. Either 2 (2nd year of primary school), 5 (last year of primary school), 8 (last year of middle school), 10 (2nd year of high school) or 13 (last year of school). All by default.


Logical. Whether to express Invalsi scores as the average WLE score rather than the percentage of sufficient tests, if both are available, i.e. if Invalsi_grade is either 2 or 5. FALSE by default.


Logical. If the school buildings database has to be downloaded, whether to include safety certifications. Only relevant from school year 2020/21 onwards (see Get_DB_MIUR). FALSE by default.


Logical. Whether to include strictly numeric variables alongside with Boolean ones in the school buildings database (see Util_DB_MIUR_num). TRUE by default.


Logical. Whether to include qualitative variables alongside with Boolean ones in the school buildings database (see Util_DB_MIUR_num). FALSE by default.


Logical. Whether to filter out rows including missing fields in the school buildings database (see Util_DB_MIUR_num). FALSE by default.


Numeric. The threshold of missing values allowed for each variable in the school buildings database (see Util_DB_MIUR_num). If a variable has a higher number of missing observations, then it is cut out. 20000 by default. Warning: if the option SchoolBuildings_row_cutout is active, please select a lower threshold (e.g. 1000).


Logical. Whether to assign NA to outliers in numeric variables; see Util_DB_MIUR_num for more details. TRUE by default.


Logical. Whether the function should return the percentage of NAs in the input school buildings database (see also Group_DB_MIUR). FALSE by default.


Numeric. If nstud_missing_to_1 == TRUE, the minimum threshold below which the number of classes is imputed to 1 if missing; see also Util_nstud_wide. 19 by default.


Logical. If nstud == TRUE, whether the number of classes should be imputed to 1 when it is missing and the number of students is below a threshold (argument nstud_imputation_thresh, see Util_nstud_wide). FALSE by default.


Numeric. The upper limit of the acceptable school-level average of the number of students by class if nstud == TRUE; see also Util_nstud_wide. 99 by default, i.e. no restriction is made. Please notice that boundaries are included in the acceptance interval.


Numeric. The lower limit of the acceptable school-level average of the number of students by class if nstud == TRUE; see also Util_nstud_wide. 1 by default. Please notice that boundaries are included in the acceptance interval.


Logical. Whether the percentage of schools belonging to inner/internal areas must be included (see Get_InnerAreas). TRUE by default.


Logical. If check == TRUE and InnerAreas == TRUE, whether the Inner areas classification should be treated as an ordinal variable rather than as a categorical one (see Get_InnerAreas for the classification). FALSE by default.


Logical. If nstud == TRUE, whether to check the students number availability across all schools included in the school registries (see Util_Check_nstud_availability). TRUE by default.


Character. If nstud == TRUE and nstud_check == TRUE, the school registries whose availability has to be checked. Either "Registry_from_buildings" (buildings registry), "Registry_from_registry" (proper registry), "Any" or "Both". "Any" by default.


Logical. Whether the schools not included in the Broadband dataset must be counted in the total of schools (i.e. the denominator of the Broadband availability indicator). TRUE by default.


Character or Date. The threshold date for broadband activation, i.e. the date by which the broadband activation works must be finished for a school to be considered as provided with broadband. By default, September 1st at the beginning of the school year.


Logical. Either TRUE, FALSE or NULL. If TRUE, the values missing in a single dataset are automatically deleted from the final DB. If FALSE, the missing observations are kept automatically. If NULL, the choice is left to the user by an interactive menu. NULL by default.


Object of class tbl_df, tbl and data.frame. If Invalsi == TRUE, the raw Invalsi survey data, obtained as output of the Get_Invalsi_IS function. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame. The school registry corresponding to the year in scope, obtained as output of the function Get_Registry. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default


Object of class tbl_df, tbl and data.frame. If SchoolBuildings == TRUE, the raw school buildings dataset obtained as output of the function Get_DB_MIUR. If NULL, it will be downloaded automatically but not saved in the global environment. NULL by default.


Object of class list, including two objects of class tbl_df, tbl and data.frame. If nstud == TRUE, the students and classes counts, obtained as output of the function Get_nstud with default filename parameter. If NULL, the function will download it automatically but it will not be saved in the global environment. NULL by default.


Object of class list with elements of class tbl_df, tbl and data.frame. If nstud == TRUE, the mapping from school codes to municipality (and province) codes. Needed only if check == TRUE, obtained as output of the function Get_School2mun. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_AdmUnNames. If necessary, the ISTAT file including all the codes and the names of the administrative units for the year in scope. Required either if nstud == TRUE & nstud_check == TRUE, or if SchoolBuildings == TRUE, input_DB_MIUR is not provided, and the school year is one of 2015/16, 2017/18 or 2018/19. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame. If InnerAreas == TRUE, the classification of peripheral municipalities, obtained as output of the function Get_InnerAreas. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame. If nteachers == TRUE and nstud == TRUE, the number of teachers for students by province. Please note that this object cannot be considered a substitute for the number of students by class, since it provides no information on the number of schools in single educational grades but only at the school order level. Obtained as output of the function Group_teachers4stud. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame. If nteachers == TRUE, the number of teachers by province, obtained as output of the function Get_nteachers_prov. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default


Object of class tbl_df, tbl and data.frame. If BroadBand == TRUE, the raw Broadband connection dataset obtained as output of the function Get_BroadBand. If NULL, it will be downloaded automatically but not saved in the global environment. NULL by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


An object of class tbl_df, tbl and data.frame

See Also

Util_DB_MIUR_num, Group_DB_MIUR, Group_nstud, Util_Check_nstud_availability, Get_School2mun for similar arguments.


DB23_prov <- Set_DB(Year = 2023, level = "NUTS-3", Invalsi_grade = c(5, 8, 13),
      Invalsi_subj = "Italian", nteachers = FALSE, BroadBand = FALSE,
      SchoolBuildings_count_missing = FALSE, NA_autoRM = TRUE,
      input_SchoolBuildings = example_input_DB23_MIUR[, -c(11:18, 10:27)],
      input_Invalsi_IS = example_Invalsi23_prov,
      input_nstud = example_input_nstud23,
      input_InnerAreas = example_InnerAreas,
      input_School2mun = example_School2mun23,
      input_AdmUnNames = example_AdmUnNames20220630)


summary(DB23_prov[, -c(22:62)])

Check how many schools in the school registries are included in the students count dataframe


This function checks for which schools listed in the two registries (the buildings registry and the schools registry proper) the count of students is available. The first registry is referred to as Registry_from_buildings and the second one as Registry_from_registry.


  cutout = c("IC", "IS", "NR"),
  verbose = TRUE,
  ggplot = TRUE,
  toplot_registry = "Any",
  InnerAreas = TRUE,
  ord_InnerAreas = FALSE,
  input_Registry = NULL,
  input_InnerAreas = NULL,
  input_Prov_shp = NULL,
  input_AdmUnNames = NULL,
  input_School2mun = NULL,
  autoAbort = FALSE



Object of class tbl_df, tbl and data.frame, obtained as output of the Util_nstud_wide function


Numeric or character value. Reference school year. Available in the formats: 2023, "2022/2023", 202223, 20222023.


Character. The types of schools not to be taken into account (because not relevant or because they are out of scope in the students number section). By default c("IC", "IS", "NR") , i.e. the check does not regard comprehensive institutes, superior institutes, and all the schools that cannot be classified either as primary, middle or high schools.


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Logical. If TRUE, the function displays a static map of the availability of the students number by province (but it does not save the ggplot object into the global environment). TRUE by default.


Character. If the ggplot option is chosen, the registry whose students number availability must be plotted; either "Registry_from_buildings", "Registry_from_registry", "Any" or "Both". "Any" by default.


Logical. Whether it must be checked if municipalities belong to inner areas or not. TRUE by default.


Logical. Whether the inner areas classification should be treated as an ordinal variable rather than as a categorical one (see Get_InnerAreas for the classification). FALSE by default.


Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_Registry. The school registry from the registry section. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame. The classification of peripheral municipalities, obtained as output of the Get_InnerAreas function. Needed only if the InnerAreas option is chosen. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default


Object of class sf, tbl_df, tbl, data.frame. The relevant shapefile of Italian municipalities, if the ggplot option is chosen. If NULL it is downloaded automatically but not saved in the global environment. NULL by default.


Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_AdmUnNames. The ISTAT file including all the codes and the names of the administrative units for the year in scope. Only needed if the argument input_School2mun is NULL and has to be computed. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Object of class list with elements of class tbl_df, tbl and data.frame, obtained as output of the function Get_School2mun. The mapping from school codes to municipality (and province) codes. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


An object of class list including two elements:

  • $Municipality_data

  • $Province_data

Both the elements are objects of class list including four elements:

  • $Registry_from_buildings: object of class tbl_df, tbl and data.frame: the availability of the number of students in the schools listed in the buildings section.

  • $Registry_from_registry: object of class tbl_df, tbl and data.frame: the availability of the number of students in the schools listed in the registry section.

  • $Any: object of class tbl_df, tbl and data.frame: the availability of the number of students in the schools listed anywhere.

  • $Both: object of class tbl_df, tbl and data.frame: the availability of the number of students in the schools listed in both sections.
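The relation between the four elements can be illustrated with a minimal base R sketch. It is not the package's implementation: the school codes and the object names (codes_buildings, codes_registry, codes_nstud, availability) are hypothetical, and the sketch assumes that "Any" refers to the schools listed in at least one registry and "Both" to the schools listed in both.

```r
# Hypothetical school codes listed in each registry (illustrative only)
codes_buildings <- c("A001", "A002", "A003")
codes_registry  <- c("A002", "A003", "A004")

# Hypothetical school codes for which the students count is available
codes_nstud <- c("A001", "A003")

# Assumption: "Any" = schools in at least one registry,
# "Both" = schools in both registries
schools <- list(
  Registry_from_buildings = codes_buildings,
  Registry_from_registry  = codes_registry,
  Any  = union(codes_buildings, codes_registry),
  Both = intersect(codes_buildings, codes_registry)
)

# Share of schools in each set whose students count is available
availability <- sapply(schools, function(codes) mean(codes %in% codes_nstud))
print(round(availability, 2))
```

In the function output the same four names are found inside both $Municipality_data and $Province_data.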


Buildings Registry; Schools Registry


nstud23 <- Util_nstud_wide(example_input_nstud23, verbose = FALSE)

Util_Check_nstud_availability(nstud23, Year = 2023,
  input_Registry = example_input_Registry23, InnerAreas = FALSE,
  input_School2mun = example_School2mun23, input_Prov_shp = example_Prov22_shp)

Convert the raw school buildings data to numeric or Boolean variables


This function transforms the output variables of the Get_DB_MIUR function into Boolean or numeric. Additionally, it removes the columns with an excessive number of missing observations (more than 20000 by default), and if required it may also delete the rows including missing fields. In this case, it is possible to keep track of the deleted rows.


  data = NULL,
  include_numerics = TRUE,
  include_qualitatives = FALSE,
  row_cutout = FALSE,
  track_deleted = TRUE,
  verbose = TRUE,
  col_cut_thresh = 20000,
  flag_outliers = TRUE,
  autoAbort = FALSE,



Object of class tbl_df, tbl and data.frame. Input data obtained through the function Get_DB_MIUR. If NULL it will be downloaded automatically with the appropriate arguments, but not saved in the global environment. NULL by default.


Logical. Whether to include strictly numeric variables alongside Boolean ones. TRUE by default.


Logical. Whether to include qualitative variables alongside Boolean ones. FALSE by default.


Logical. Whether to filter out rows including missing fields. FALSE by default.


Logical. If TRUE, the function returns the names of the schools not included in the output dataframe. TRUE by default.


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Numeric. The threshold of missing values allowed for each variable. If a variable has a higher number of missing observations, it is cut out. 20000 by default. Warning: if the option row_cutout is active, please select a lower threshold (e.g. 1000).


Logical. Whether to assign NA to outliers in numeric variables. TRUE by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Additional arguments to the function Get_DB_MIUR if data is not provided.


The outliers to be set to NA if flag_outliers is active are defined as follows: school area or free area surface of less than 50 square meters, building volume of less than 150 cubic meters, 0 floors in the building.
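The outlier rule and the column cutout can be illustrated with a minimal base R sketch on toy data. It is not the package's code: the column names, the helper flag_below and the toy threshold are hypothetical and only serve to show the logic described above.

```r
# Toy data; column names are hypothetical, not the package's variable names
buildings <- data.frame(
  School_area     = c(1200, 30, 800, NA),
  Free_area       = c(400, 60, 10, 250),
  Building_volume = c(5000, 100, 3000, 4500),
  Floors          = c(2, 0, 3, 1)
)

# Hypothetical helper: set to NA the values below a lower bound
flag_below <- function(x, lower) replace(x, !is.na(x) & x < lower, NA)

flagged <- buildings
flagged$School_area     <- flag_below(buildings$School_area, 50)      # < 50 square meters
flagged$Free_area       <- flag_below(buildings$Free_area, 50)        # < 50 square meters
flagged$Building_volume <- flag_below(buildings$Building_volume, 150) # < 150 cubic meters
# A building with 0 floors is treated as an outlier
flagged$Floors <- replace(buildings$Floors,
                          !is.na(buildings$Floors) & buildings$Floors == 0, NA)

# Column cutout: drop the variables with more missing values than the threshold
col_cut_thresh <- 1  # toy value; the documented default is 20000
n_missing <- colSums(is.na(flagged))
kept <- flagged[, n_missing <= col_cut_thresh, drop = FALSE]
print(names(kept))
```

Since flagged outliers become NA, they add to the count of missing values used for the column cutout in this sketch; whether the package applies the two steps in this order is an assumption.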


If track_deleted == TRUE, an object of class list including two objects:

  • $data: object of class tbl_df, tbl and data.frame, the output dataframe.

  • $deleted: object of class tbl_df, tbl and data.frame. The school IDs of the deleted units.

If track_deleted == FALSE, the output is only the first element of the list.



DB23_MIUR_num <- Util_DB_MIUR_num(example_input_DB23_MIUR, track_deleted = FALSE)

DB23_MIUR_num[, -c(1,4,6,8,9,10)]

Filter the Invalsi data by subject, school grade and year


This function filters the database of Invalsi scores (see Get_Invalsi_IS) by school year, education grade and subject and returns a dataframe in wide format. Each row corresponds to one territorial unit (either municipality or province); the numerical variables are three (the mean score, the score's standard deviation and the students coverage percentage) for each selected subject.


  data = NULL,
  subj = c("ELI", "ERE", "ITA", "MAT"),
  grade = 8,
  level = "LAU",
  Year = 2023,
  verbose = TRUE,
  autoAbort = FALSE



Object of class tbl_df, tbl and data.frame. The raw Invalsi survey data that has to be filtered, obtained as output of the Get_Invalsi_IS function. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default


Character. The school subject(s) to include, among "English_listening"/"ELI", "English_reading"/"ERE", "Italian"/"ITA" and "Mathematics"/"MAT". All four by default.


Numeric. The school grade to choose. Either 2 (2nd year of primary school), 5 (last year of primary school), 8 (last year of middle school), 10 (2nd year of high school) or 13 (last year of school). 8 by default.


Character. The level of aggregation of Invalsi census data. Either "NUTS-3", "Province", "LAU", "Municipality". If an input dataframe is provided, please select the same level of aggregation. "LAU" by default


Logical. Whether the variable to choose should be the average WLE score rather than the percentage of sufficient tests, if both are available. FALSE by default.


Numeric or character value. Reference school year for the data (last available is 2022/23). Available in the formats: 2022, "2021/2022", 202122, 20212022. 2023 by default


Logical. If TRUE, the function informs about the time needed. TRUE by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


An object of class tbl_df, tbl and data.frame. For all subjects and school grades, the variables indicate:

  • M The mean score, either WLE or percentage of sufficient tests

  • S The standard deviation of the score

  • C The students coverage percentage (expressed in the scale 1 - 100)
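The wide layout, with one row per territorial unit and the three variables repeated for each subject, can be illustrated with a minimal base R sketch. The toy values and the column names (Province_code, Subject, and the resulting M_ITA, S_ITA, ...) are hypothetical and need not match the names returned by the function.

```r
# Toy long-format data: one row per province and subject (values are invented)
long <- data.frame(
  Province_code = c(70, 70, 94, 94),
  Subject = c("ITA", "MAT", "ITA", "MAT"),
  M = c(60, 55, 62, 58),  # mean score
  S = c(10, 12, 9, 11),   # standard deviation of the score
  C = c(95, 94, 97, 96)   # students coverage percentage
)

# One row per province; M, S and C are repeated for each subject
wide <- reshape(long, idvar = "Province_code", timevar = "Subject",
                direction = "wide", sep = "_")
print(wide)
```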


Util_Invalsi_filter(subj = c("Italian", "Mathematics"), grade = 5, level = "NUTS-3", Year = 2023,
                   WLE = FALSE, data = example_Invalsi23_prov)

Util_Invalsi_filter(subj = c("Italian", "Mathematics"), grade = 5, level = "NUTS-3", Year = 2023,
                    WLE = TRUE, data = example_Invalsi23_prov)

Invalsi23_high <- Util_Invalsi_filter(subj = "Italian", grade = c(10,13), level = "NUTS-3",
                                      Year = 2023, data = example_Invalsi23_prov)


Clean the raw dataframe of the number of students and arrange it in a wide format


This function rearranges the output of the Get_nstud function so as to represent the counts of students and, if required, either the number of students by class and number of classes, or the counts of students per school timetable (running time) in a unique observation per school. If the focus is on class size, this function first cleans the data from the outliers in terms of average number of students by class at the school level and imputes the number of classes to 1 when missing.


  data = NULL,
  missing_to_1 = FALSE,
  nstud_imputation_thresh = 19,
  UB_nstud_byclass = 99,
  LB_nstud_byclass = 1,
  verbose = TRUE,
  autoAbort = FALSE,



Object of class list, including two objects of class tbl_df, tbl and data.frame, obtained as output of the Get_nstud function with the default filename parameter. If NULL, the function will download it automatically but it will not be saved in the global environment. NULL by default.


Logical. If focus is on class size, whether the number of classes should be imputed to 1 when it is missing and the number of students is below a threshold (argument nstud_imputation_thresh). FALSE by default.


Numeric. If focus is on class size and missing_to_1 == TRUE, the threshold on the number of students at or below which the number of classes is imputed to 1 if missing. E.g. if the threshold is 19, for all the schools in which there are 19 or fewer students in a given grade but the number of classes for that grade is missing, the number of classes is imputed to 1. 19 by default.


Numeric. If focus is on class size, the upper limit of the acceptable school-level average of the number of students by class. If a school has, on average, a higher number of students by class, the record is considered an outlier and filtered out. 99 by default, i.e. no restriction is made. Please notice that boundaries are included in the acceptance interval.


Numeric. If focus is on class size, the lower limit of the acceptable school-level average of the number of students by class. If a school has, on average, a smaller number of students by class, the record is considered an outlier and filtered out. 1 by default. Please notice that boundaries are included in the acceptance interval.


Logical. If TRUE, the user keeps track of the main underlying operations. TRUE by default.


Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.


Arguments to Get_nstud, needed if data is not provided.


In the example, we compare the dataframe obtained with the default settings and the one obtained by setting narrower inclusion criteria.
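The imputation rule and the inclusive acceptance interval described in the arguments can be illustrated with a minimal base R sketch on toy data. It is not the package's implementation: the column names are hypothetical, each toy school has a single record, and dropping the records whose average cannot be computed is an assumption made here for simplicity.

```r
# Toy records; column names are hypothetical
nstud <- data.frame(
  School_code = c("A001", "A002", "A003", "A004"),
  Students = c(15, 40, 120, 3),
  Classes = c(NA, NA, 5, 1)
)

nstud_imputation_thresh <- 19
UB_nstud_byclass <- 35
LB_nstud_byclass <- 5

# Impute the number of classes to 1 when it is missing and
# the number of students does not exceed the threshold
to_impute <- is.na(nstud$Classes) & nstud$Students <= nstud_imputation_thresh
nstud$Classes[to_impute] <- 1

# Average number of students by class
nstud$Students_by_class <- nstud$Students / nstud$Classes

# Keep the records within the acceptance interval, boundaries included.
# Assumption: records with a missing average are dropped in this sketch
in_range <- !is.na(nstud$Students_by_class) &
  nstud$Students_by_class >= LB_nstud_byclass &
  nstud$Students_by_class <= UB_nstud_byclass
nstud_clean <- nstud[in_range, ]
print(nstud_clean)
```

With these toy values, school A001 is kept thanks to the imputation, while A004 is filtered out because its average of 3 students by class falls below the lower bound.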


An object of class tbl_df, tbl and data.frame


nstud.default <- Util_nstud_wide(example_input_nstud23)

nstud.narrow <- Util_nstud_wide(example_input_nstud23,
  UB_nstud_byclass = 35, LB_nstud_byclass = 5 )


