Smoke Exposure (PM2.5) Monthly

Citation: Naman Paul, Jiayun Yao, Kathleen E. McLean, David M. Stieb, Sarah B. Henderson. The Canadian Optimized Statistical Smoke Exposure Model (CanOSSEM): A machine learning approach to estimate national daily fine particulate matter (PM2.5) exposure. Science of The Total Environment, 2022.

The Canadian Optimized Statistical Smoke Exposure Model (CanOSSEM) was developed by Environmental Health Services at the BC Centre for Disease Control and used to produce estimated concentrations of fine particulate matter (PM2.5) across all populated regions of Canada. The estimates are optimized for wildfire smoke through the use of multiple variables that are specific to this source.

NOTE: Daily data indexed to postal codes will be available shortly. Un-indexed grid files are available on request to naman.paul@bccdc.ca.

CanOSSEM is a random forest machine learning model that uses potential predictor variables integrated from multiple data sources to estimate daily mean (24-hour) PM2.5 concentrations at a 5 km × 5 km spatial resolution. The training and prediction datasets were generated using observations from the National Air Pollution Surveillance (NAPS) network. The root mean squared error (RMSE) between predicted and observed PM2.5 concentrations was 2.85 µg/m³ for the entire prediction set, with over 95% of the predictions lying within an absolute difference of 5 µg/m³ of the NAPS PM2.5 measurements. The model was evaluated using 10-fold, leave-one-region-out, and leave-one-year-out cross-validation.
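The two accuracy figures quoted in the description (RMSE, and the share of predictions within 5 µg/m³ of the NAPS measurements) can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not CanOSSEM output. The sketch also assumes that "within 5 µg/m³" includes a difference of exactly 5.

```python
import math

def rmse(predicted, observed):
    """Root mean squared error between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def share_within(predicted, observed, tolerance=5.0):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(predicted)

# Hypothetical daily PM2.5 values in µg/m³ (illustrative only).
observed = [4.0, 6.0, 10.0, 30.0]
predicted = [5.0, 6.0, 8.0, 24.0]

print(rmse(predicted, observed))          # square root of the mean squared error
print(share_within(predicted, observed))  # share of errors no larger than 5 µg/m³
```

In this toy example, one large miss on a high-concentration day dominates the RMSE, which is why the description reports both statistics.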

Datasets available for download

Additional Info

Field Value
Last Updated April 18, 2024, 17:10 (UTC)
Created September 18, 2023, 23:37 (UTC)
Domain / Topic
Domain or topic of the dataset being cataloged.
Environment
Format (CSV, XLS, TXT, PDF, etc)
File format of the dataset.
.pdf - application/pdf
Dataset Size
Dataset size in megabytes.
Metadata Identifier
Metadata identifier; can be used as the unique identifier for the catalogue entry.
Published Date
Published date of the dataset.
2021-03-21
Time Period Data Span (start date)
Start date of the data in the dataset.
2000-01-01
Time Period Data Span (end date)
End date of the data in the dataset.
2019-12-31
GeoSpatial Area Data Span
A spatial region or named place the dataset covers.
Canada
fair_rda_a1_02d Yes
fair_rda_a1_03d No
fair_rda_a1_04d No
fair_rda_a1_05d No
fair_rda_a1_1_01d Yes
fair_rda_a1_2_01d No
fair_rda_i1_01d No
fair_rda_i1_02d No
fair_rda_i2_01d No
fair_rda_i3_01d Yes
fair_rda_r1_3_01d No
Field Value
Access category
Type of access granted for the dataset (open, closed, service, etc).
visible
Limits on use
Limits on use of data.
These data files are provided solely for the purposes stated in the CANUE Data Sharing and Use Agreement and should not be re-distributed for any reason. These data also contain proprietary postal code data and may only be used for the project named in the CANUE Data Sharing and Use Agreement. Data can be shared only within a project team for the exclusive purposes of teaching, academic research and publishing, and/or planning of educational services, in accordance with the DMTI End User Agreement associated with the Spatial Mapping Academic Research Tools (SMART) Program.
Location
Location of the dataset.
https://www.canue.ca
Data Service
Data service for accessing a dataset.
Owner
Owner of the dataset.
CANUE (Canadian Urban Environmental Health Research Consortium)|Dalla Lana School of Public Health, University of Toronto
Contact Point
Who to contact regarding access?
Publisher
Publisher of the dataset.
Publisher Email
Email of the publisher.
Author
Author of the dataset.
CANUE (Canadian Urban Environmental Health Research Consortium)|Dalla Lana School of Public Health, University of Toronto
Author Email
Email of the author.
info@canue.ca
Accessed At
Date the data and metadata were accessed.
2023-09-18
Field Value
Identifier
Unique identifier for the dataset.
https://doi.org/10.1016/j.scitotenv.2022.157956
Language
Language(s) of the dataset
English
Link to dataset description
A URL to an external document describing the dataset.
Persistent Identifier
Data is identified by a persistent identifier.
No
Globally Unique Identifier
Data is identified by a persistent and globally unique identifier.
No
Contains data about individuals
Does the data hold data about individuals?
N/A
Contains data about identifiable individuals
Does the data hold identifiable data about individuals?
N/A
Contains Indigenous Data
Does the data hold data about Indigenous communities?
N/A
Field Value
Source
Source of the dataset.
https://doi.org/10.1016/j.scitotenv.2022.157956
Version notes
Version notes about the dataset.
Is version of another dataset
Link to dataset that it is a version of.
Other versions
Link to datasets that are versions of it.
Provenance Text
Provenance Text of the data.
Provenance URL
Provenance URL of the data.
Temporal resolution
Describes how granular the date/time data in the dataset is.
Monthly
GeoSpatial resolution in meters
Describes how granular (in meters) geospatial data is in the dataset.
GeoSpatial resolution (in regions)
Describes how granular (in regions) geospatial data is in the dataset.
Field Value
Indigenous Community Permission
Who holds the Indigenous Community Permission. Who to contact regarding access to a dataset that has data about Indigenous communities.
Community Permission
Community permission (who gave permission).
The Indigenous communities the dataset is about
Indigenous communities from which data is derived.
Field Value
Number of data rows
If tabular dataset, total number of rows.
Number of data columns
If tabular dataset, total number of unique columns.
Number of data cells
If tabular dataset, total number of cells with data.
Number of data relations
If RDF dataset, total number of triples.
Number of entities
If RDF dataset, total number of entities.
Number of data properties
If RDF dataset, total number of unique properties used by the triples.
Data quality
Describes the quality of the data in the dataset.
NoData = -9999 (numeric fields); NoData = null (category fields); data insufficient to calculate value = -1111
Metric for data quality
A metric used to measure the quality of the data, such as missing values or invalid formats.
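Because the data quality entry reserves -9999 and -1111 as codes, not concentrations, these values need to be masked before any averaging. A minimal sketch of one way to do that is shown below. The column names (`location_id`, `pm25_mean`), the helper `clean_value`, and the sample rows are hypothetical and do not reflect the actual file schema.

```python
import csv
import io

NO_DATA = -9999.0        # numeric NoData code from the catalogue entry
INSUFFICIENT = -1111.0   # "data insufficient to calculate value" code

def clean_value(raw):
    """Return a float, or None when the cell is empty or holds a sentinel code."""
    if raw is None or raw.strip() == "":
        return None
    value = float(raw)
    if value in (NO_DATA, INSUFFICIENT):
        return None
    return value

# Hypothetical extract; column names are assumptions, not the real schema.
sample = io.StringIO(
    "location_id,pm25_mean\n"
    "loc1,6.2\n"
    "loc2,-9999\n"
    "loc3,-1111\n"
    "loc4,8.8\n"
)
values = [clean_value(row["pm25_mean"]) for row in csv.DictReader(sample)]
valid = [v for v in values if v is not None]
mean_pm25 = sum(valid) / len(valid)  # average over real measurements only
print(values, mean_pm25)
```

If the sentinel rows were left in, the mean in this example would be a large negative number, which is physically meaningless for a concentration.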
