Smoke Exposure (PM2.5) Monthly (Naman Paul, Jiayun Yao, Kathleen E. McLean, David M. Stieb, Sarah B. Henderson, The Canadian Optimized Statistical Smoke Exposure Model (CanOSSEM): A machine learning approach to estimate national daily fine particulate matter (PM2.5) exposure, Science of The Total Environment, 2022.)

The Canadian Optimized Statistical Smoke Exposure Model (CanOSSEM) was developed by the Environmental Health Services of the BC Centre for Disease Control and is used to produce estimated concentrations of fine particulate matter (PM2.5) across all populated regions of Canada. The estimates are optimized for wildfire smoke through the use of multiple variables that are specific to this source. NOTE: Daily data indexed to postal codes will be available shortly. Un-indexed grid files are available on request to naman.paul@bccdc.ca.

CanOSSEM is a random forest machine learning model that uses potential predictor variables integrated from multiple data sources to estimate daily mean (24-hour) PM2.5 concentrations at a 5 km × 5 km spatial resolution. The training and prediction datasets were generated using observations from the National Air Pollution Surveillance (NAPS) network. The root mean squared error (RMSE) between predicted and observed PM2.5 concentrations was 2.85 µg/m³ for the entire prediction set, with over 95% of the predictions lying within an absolute difference of 5 µg/m³ from the NAPS PM2.5 measurements. The model was evaluated using 10-fold, leave-one-region-out, and leave-one-year-out cross-validation.
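The accuracy measures quoted in the description (RMSE, the share of predictions within 5 µg/m³ of the NAPS measurement, and leave-one-year-out evaluation) can be illustrated with a minimal Python sketch. This is not the CanOSSEM code: the function names, the inclusive 5 µg/m³ threshold, and the sample values are assumptions made for illustration only.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(predicted, observed, tolerance=5.0):
    """Share of predictions whose absolute error is at most `tolerance`.

    Treating the boundary as inclusive is an assumption of this sketch.
    """
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(predicted)


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group label.

    Passing years as labels gives leave-one-year-out splits; passing
    region labels gives leave-one-region-out splits.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Made-up values (µg/m³) and years, purely illustrative.
observed = [4.0, 7.5, 12.0, 30.0]
predicted = [5.0, 6.5, 15.0, 22.0]
years = [2017, 2017, 2018, 2018]

print(rmse(predicted, observed))
print(fraction_within(predicted, observed))
for year, train_idx, test_idx in leave_one_group_out(years):
    print(year, train_idx, test_idx)
```

In a real evaluation the model would be refitted on each training split before scoring the held-out year or region; the sketch only shows how the splits and the two summary statistics are formed.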

Datasets available for download

Smoke Exposure (PM2.5) Monthly (Naman Paul,...PDF

Additional Info

Field	Value
Last Updated	April 18, 2024, 17:10 (UTC)
Created	September 18, 2023, 23:37 (UTC)
Domain / Topic Domain or topic of the dataset being cataloged.	Environment
Description A description of the dataset.	The Canadian Optimized Statistical Smoke Exposure Model (CanOSSEM) was developed by the Environmental Health Services of the BC Centre for Disease Control and is used to produce estimated concentrations of fine particulate matter (PM2.5) across all populated regions of Canada. The estimates are optimized for wildfire smoke through the use of multiple variables that are specific to this source. NOTE: Daily data indexed to postal codes will be available shortly. Un-indexed grid files are available on request to naman.paul@bccdc.ca. CanOSSEM is a random forest machine learning model that uses potential predictor variables integrated from multiple data sources to estimate daily mean (24-hour) PM2.5 concentrations at a 5 km × 5 km spatial resolution. The training and prediction datasets were generated using observations from the National Air Pollution Surveillance (NAPS) network. The root mean squared error (RMSE) between predicted and observed PM2.5 concentrations was 2.85 µg/m³ for the entire prediction set, with over 95% of the predictions lying within an absolute difference of 5 µg/m³ from the NAPS PM2.5 measurements. The model was evaluated using 10-fold, leave-one-region-out, and leave-one-year-out cross-validation.
Tags Keywords/tags categorizing the dataset.	Air, Air Quality, Exposure, Fine, Learning, Machine, Matter, Monitoring, Particulate, PM2.5, Quality, Satellite, Smoke
Format (CSV, XLS, TXT, PDF, etc) File format of the dataset.	.pdf - application/pdf
Dataset Size Dataset size in megabytes.
Metadata Identifier Metadata identifier – can be used as the unique identifier for catalogue entry
Published Date Published date of the dataset.	2021-03-21
Time Period Data Span (start date) Start date of the data in the dataset.	2000-01-01
Time Period Data Span (end date) End date of the data in the dataset.	2019-12-31
GeoSpatial Area Data Span A spatial region or named place the dataset covers.	Canada
fair_rda_a1_02d	Yes
fair_rda_a1_03d	No
fair_rda_a1_04d	No
fair_rda_a1_05d	No
fair_rda_a1_1_01d	Yes
fair_rda_a1_2_01d	No
fair_rda_i1_01d	No
fair_rda_i1_02d	No
fair_rda_i2_01d	No
fair_rda_i3_01d	Yes
fair_rda_r1_3_01d	No

Field	Value
Access category Type of access granted for the dataset (open, closed, service, etc).	visible
License License used to access the dataset.	License not specified
Limits on use Limits on use of data.	These data files are provided solely for the purposes stated in the CANUE Data Sharing and Use Agreement and should not be redistributed for any reason. These data also contain proprietary postal code data and may only be used for the project named in the CANUE Data Sharing and Use Agreement. Data can be shared only within a project team for the exclusive purposes of teaching, academic research and publishing, and/or planning of educational services, in accordance with the DMTI End User Agreement associated with the Spatial Mapping Academic Research Tools (SMART) Program.
Location Location of the dataset.	https://www.canue.ca
Data Service Data service for accessing a dataset.
Owner Owner of the dataset.	CANUE (Canadian Urban Environmental Health Research Consortium) | Dalla Lana School of Public Health, University of Toronto
Contact Point Who to contact regarding access?
Contact Point Email The email to contact regarding access?
Publisher Publisher of the dataset.
Publisher Email Email of the publisher.
Author Author of the dataset.	CANUE (Canadian Urban Environmental Health Research Consortium) | Dalla Lana School of Public Health, University of Toronto
Author Email Email of the author.	info@canue.ca
Accessed At Date the data and metadata was accessed.	2023-09-18

Field	Value
Identifier Unique identifier for the dataset.	https://doi.org/10.1016/j.scitotenv.2022.157956
Language Language(s) of the dataset	English
Link to dataset description A URL to an external document describing the dataset.
Persistent Identifier Data is identified by a persistent identifier.	No
Globally Unique Identifier Data is identified by a persistent and globally unique identifier.	No
Contains data about individuals Does the data hold data about individuals?	N/A
Contains data about identifiable individuals Does the data hold identifiable data about individuals?	N/A
Contains Indigenous Data Does the data hold data about Indigenous communities?	N/A

Field	Value
Source Source of the dataset.	https://doi.org/10.1016/j.scitotenv.2022.157956
Version notes Version notes about the dataset.
Is version of another dataset Link to dataset that it is a version of.
Other versions Link to datasets that are versions of it.
Provenance Text Provenance Text of the data.
Provenance URL Provenance URL of the data.
Temporal resolution Describes how granular the date/time data in the dataset is.	Monthly
GeoSpatial resolution in meters Describes how granular (in meters) geospatial data is in the dataset.
GeoSpatial resolution (in regions) Describes how granular (in regions) geospatial data is in the dataset.

Field	Value
Indigenous Community Permission Who holds the Indigenous Community Permission. Who to contact regarding access to a dataset that has data about Indigenous communities.
Community Permission Community permission (who gave permission).
The Indigenous communities the dataset is about Indigenous communities from which data is derived.

Field	Value
Number of data rows If tabular dataset, total number of rows.
Number of data columns If tabular dataset, total number of unique columns.
Number of data cells If tabular dataset, total number of cells with data.
Number of data relations If RDF dataset, total number of triples.
Number of entities If RDF dataset, total number of entities.
Number of data properties If RDF dataset, total number of unique properties used by the triples.
Data quality Describes the quality of the data in the dataset.	NoData = -9999 (numeric fields); NoData = null (category fields); data insufficient to calculate value = -1111
Metric for data quality A metric used to measure the quality of the data, such as missing values or invalid formats.
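The Data quality field above lists sentinel codes that should be masked before any statistics are computed; otherwise -9999 and -1111 would be averaged in as if they were concentrations. The following Python sketch shows one way to do this. The function names are hypothetical, and treating the literal string "null" (or an empty cell) as missing in category fields is an assumption about how the files encode it.

```python
NO_DATA = -9999.0       # NoData code for numeric fields
INSUFFICIENT = -1111.0  # data insufficient to calculate a value


def clean_numeric(raw):
    """Convert a raw numeric cell to float, or None if it holds a sentinel code."""
    value = float(raw)
    if value in (NO_DATA, INSUFFICIENT):
        return None
    return value


def clean_category(raw):
    """Return a category label, or None for an empty or 'null' cell (assumed encoding)."""
    if raw is None:
        return None
    text = raw.strip()
    if text == "" or text.lower() == "null":
        return None
    return text


def mean_ignoring_missing(values):
    """Average the non-missing values; return None if nothing is left."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Illustrative cells only, not taken from the dataset.
raw_cells = ["4.0", "-9999", "6.0", "-1111"]
cleaned = [clean_numeric(cell) for cell in raw_cells]
print(cleaned)
print(mean_ignoring_missing(cleaned))
```

Keeping the two codes distinct (for example, returning different markers instead of None for both) may be preferable when the reason a value is missing matters for the analysis.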
