Overview

Dataset statistics

Number of variables: 5
Number of observations: 153
Missing cells: 44
Missing cells (%): 5.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.1 KiB
Average record size in memory: 40.8 B
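
The two derived figures above follow from the raw counts. The following is a minimal sketch of that arithmetic, not the report generator's own code; it starts from the rounded 6.1 KiB shown in the report, so the record size is only approximate.

```python
# Values copied from the dataset statistics in this report.
n_variables = 5
n_observations = 153
missing_cells = 44
total_kib = 6.1  # rounded figure; the exact byte count is not given

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average memory per observation, converting KiB to bytes.
avg_record_bytes = total_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```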

Variable types

DateTime: 1
Numeric: 4

Alerts

Ozone is highly correlated with Wind and 1 other field (High correlation)
Wind is highly correlated with Ozone (High correlation)
Temp is highly correlated with Ozone (High correlation)
Ozone is highly correlated with Temp (High correlation)
Solar is highly correlated with Temp (High correlation)
Wind is highly correlated with Ozone and 1 other field (High correlation)
Temp is highly correlated with Ozone and 2 other fields (High correlation)
Ozone has 37 (24.2%) missing values (Missing)
Solar has 7 (4.6%) missing values (Missing)
Date has unique values (Unique)

Reproduction

Analysis started: 2022-07-08 18:02:06.452324
Analysis finished: 2022-07-08 18:02:08.801309
Duration: 2.35 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json
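
A report of this kind can be regenerated along the following lines. This is a hedged sketch, not the command that produced this report: the CSV path, output file name, and report title are hypothetical, and it assumes the input has a `Date` column, as the variables listed here do.

```python
def build_report(csv_path, html_path="airquality_report.html"):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so that defining this function does not require
    # pandas or pandas-profiling to be installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # Parse the Date column so it is profiled as DateTime, not as text.
    df = pd.read_csv(csv_path, parse_dates=["Date"])

    # Build the profile with default settings and save it as HTML.
    profile = ProfileReport(df, title="Air quality profile")
    profile.to_file(html_path)
    return html_path
```

Calling `build_report("airquality.csv")` would write the HTML file to the working directory, assuming such a CSV exists.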

Variables

Date
Date

UNIQUE

Distinct: 153
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 KiB
Minimum: 1976-05-01 00:00:00
Maximum: 1976-09-30 00:00:00
Histogram with fixed size bins (bins=50)

Ozone
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 67
Distinct (%): 57.8%
Missing: 37
Missing (%): 24.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.12931034
Minimum: 1
Maximum: 168
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 7.75
Q1: 18
Median: 31.5
Q3: 63.25
95-th percentile: 108.5
Maximum: 168
Range: 167
Interquartile range (IQR): 45.25

Descriptive statistics

Standard deviation: 32.98788451
Coefficient of variation (CV): 0.7830150611
Kurtosis: 1.290302678
Mean: 42.12931034
Median Absolute Deviation (MAD): 17.5
Skewness: 1.241796404
Sum: 4887
Variance: 1088.200525
Monotonicity: Not monotonic
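
Several of the Ozone statistics above are linked by simple identities. The sketch below checks them using only the figures printed in this report; it assumes the mean is taken over the non-missing observations, which the numbers bear out.

```python
import math

# Figures copied from the Ozone section of this report.
n_observations = 153
n_missing = 37
total = 4887                  # Sum
std = 32.98788451             # Standard deviation
reported_mean = 42.12931034
reported_variance = 1088.200525
reported_cv = 0.7830150611

# Only non-missing observations enter the statistics.
n_valid = n_observations - n_missing

mean = total / n_valid        # mean = sum / count
variance = std ** 2           # variance = standard deviation squared
cv = std / mean               # coefficient of variation = std / mean

for name, computed, reported in [
    ("mean", mean, reported_mean),
    ("variance", variance, reported_variance),
    ("cv", cv, reported_cv),
]:
    agrees = math.isclose(computed, reported, rel_tol=1e-7)
    print(f"{name}: computed={computed:.10g} reported={reported} agrees={agrees}")
```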
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
23                  6       3.9%
21                  4       2.6%
16                  4       2.6%
20                  4       2.6%
18                  4       2.6%
13                  4       2.6%
14                  4       2.6%
32                  3       2.0%
44                  3       2.0%
11                  3       2.0%
Other values (57)   77      50.3%
(Missing)           37      24.2%

Minimum 10 values
Value   Count   Frequency (%)
1       1       0.7%
4       1       0.7%
6       1       0.7%
7       3       2.0%
8       1       0.7%
9       3       2.0%
10      1       0.7%
11      3       2.0%
12      2       1.3%
13      4       2.6%

Maximum 10 values
Value   Count   Frequency (%)
168     1       0.7%
135     1       0.7%
122     1       0.7%
118     1       0.7%
115     1       0.7%
110     1       0.7%
108     1       0.7%
97      2       1.3%
96      1       0.7%
91      1       0.7%

Solar
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 117
Distinct (%): 80.1%
Missing: 7
Missing (%): 4.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 185.9315068
Minimum: 7
Maximum: 334
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB

Quantile statistics

Minimum: 7
5-th percentile: 24.25
Q1: 115.75
Median: 205
Q3: 258.75
95-th percentile: 311.5
Maximum: 334
Range: 327
Interquartile range (IQR): 143

Descriptive statistics

Standard deviation: 90.05842223
Coefficient of variation (CV): 0.4843634291
Kurtosis: -0.9684667515
Mean: 185.9315068
Median Absolute Deviation (MAD): 66.5
Skewness: -0.4280445256
Sum: 27146
Variance: 8110.519414
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count   Frequency (%)
238                  4       2.6%
259                  4       2.6%
175                  3       2.0%
223                  3       2.0%
220                  3       2.0%
250                  2       1.3%
264                  2       1.3%
127                  2       1.3%
273                  2       1.3%
291                  2       1.3%
Other values (107)   119     77.8%
(Missing)            7       4.6%

Minimum 10 values
Value   Count   Frequency (%)
7       1       0.7%
8       1       0.7%
13      1       0.7%
14      1       0.7%
19      1       0.7%
20      1       0.7%
24      2       1.3%
25      1       0.7%
27      1       0.7%
31      1       0.7%

Maximum 10 values
Value   Count   Frequency (%)
334     1       0.7%
332     1       0.7%
323     1       0.7%
322     2       1.3%
320     1       0.7%
314     1       0.7%
313     1       0.7%
307     1       0.7%
299     1       0.7%
295     1       0.7%

Wind
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 31
Distinct (%): 20.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.95751634
Minimum: 1.7
Maximum: 20.7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB

Quantile statistics

Minimum: 1.7
5-th percentile: 4.6
Q1: 7.4
Median: 9.7
Q3: 11.5
95-th percentile: 15.5
Maximum: 20.7
Range: 19
Interquartile range (IQR): 4.1

Descriptive statistics

Standard deviation: 3.523001352
Coefficient of variation (CV): 0.3538032208
Kurtosis: 0.111418252
Mean: 9.95751634
Median Absolute Deviation (MAD): 2.3
Skewness: 0.3478177747
Sum: 1523.5
Variance: 12.41153853
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Common values
Value               Count   Frequency (%)
11.5                15      9.8%
10.3                11      7.2%
8                   11      7.2%
9.7                 11      7.2%
7.4                 10      6.5%
6.9                 9       5.9%
6.3                 8       5.2%
9.2                 8       5.2%
10.9                8       5.2%
8.6                 8       5.2%
Other values (21)   54      35.3%

Minimum 10 values
Value   Count   Frequency (%)
1.7     1       0.7%
2.3     1       0.7%
2.8     1       0.7%
3.4     1       0.7%
4       1       0.7%
4.1     1       0.7%
4.6     4       2.6%
5.1     3       2.0%
5.7     3       2.0%
6.3     8       5.2%

Maximum 10 values
Value   Count   Frequency (%)
20.7    1       0.7%
20.1    1       0.7%
18.4    1       0.7%
16.6    3       2.0%
16.1    1       0.7%
15.5    3       2.0%
14.9    8       5.2%
14.3    6       3.9%
13.8    5       3.3%
13.2    2       1.3%

Temp
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 40
Distinct (%): 26.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 77.88235294
Minimum: 56
Maximum: 97
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB

Quantile statistics

Minimum: 56
5-th percentile: 60.2
Q1: 72
Median: 79
Q3: 85
95-th percentile: 92
Maximum: 97
Range: 41
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.465269741
Coefficient of variation (CV): 0.1215329196
Kurtosis: -0.4035053804
Mean: 77.88235294
Median Absolute Deviation (MAD): 6
Skewness: -0.3778844643
Sum: 11916
Variance: 89.59133127
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)
Common values
Value               Count   Frequency (%)
81                  11      7.2%
82                  9       5.9%
76                  9       5.9%
86                  7       4.6%
77                  7       4.6%
78                  6       3.9%
79                  6       3.9%
85                  5       3.3%
80                  5       3.3%
92                  5       3.3%
Other values (30)   83      54.2%

Minimum 10 values
Value   Count   Frequency (%)
56      1       0.7%
57      3       2.0%
58      2       1.3%
59      2       1.3%
61      3       2.0%
62      2       1.3%
63      1       0.7%
64      2       1.3%
65      2       1.3%
66      3       2.0%

Maximum 10 values
Value   Count   Frequency (%)
97      1       0.7%
96      1       0.7%
94      2       1.3%
93      3       2.0%
92      5       3.3%
91      2       1.3%
90      3       2.0%
89      2       1.3%
88      3       2.0%
87      5       3.3%

Interactions

[Figures: 16 interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
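
The calculation just described can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation behind this report; the helper names and the toy data are assumptions, and tied values receive their average rank.

```python
import math

def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Toy data (not from the report): y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print("Spearman:", spearman(x, y))
print("Pearson: ", pearson(x, y))
```

On this toy input the rank-based coefficient reaches its maximum while Pearson's r stays below it, which is the behaviour the paragraph above describes.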

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
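
As a minimal sketch of this formula and of the invariance property, the snippet below computes r for the Wind and Temp values of the ten "First rows" in this report's sample, then again after shifting and rescaling Temp. The function name and the particular shift and scale are arbitrary choices for illustration, and ten rows do not reproduce the full-dataset correlation.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / (sd_x * sd_y)

# Wind and Temp from the first ten sample rows of this report.
wind = [7.4, 8.0, 12.6, 11.5, 14.3, 14.9, 8.6, 13.8, 20.1, 8.6]
temp = [67, 72, 74, 62, 56, 66, 65, 59, 61, 69]

r = pearson_r(wind, temp)

# Change location and (positive) scale of one variable; r should not move.
temp_rescaled = [2.5 * t - 10 for t in temp]
r_rescaled = pearson_r(wind, temp_rescaled)

print("r:", r)
print("r after rescaling Temp:", r_rescaled)
```

A negative scale factor would flip the sign of r, which is why the invariance holds only for positive rescaling.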

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
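
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition on toy data; library implementations commonly use a tie-corrected variant (tau-b) instead, so values may differ when the data contain ties. The function name is an illustrative choice.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, as defined above."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # direction == 0 means a tie; it counts toward neither group
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data (not from the report): one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print("tau:", tau)
```

With four observations there are six pairs; five are concordant and one is discordant, giving (5 - 1) / 6.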

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
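
One way to make nullity correlation concrete is to correlate the missing-value indicators of two columns. The sketch below does this for Ozone and Solar using only the ten "First rows" of this report's sample. It assumes that nullity correlation is Pearson's r between 0/1 missingness indicators; that is an assumption about the heatmap, not something the report states, and ten rows will not reproduce the value shown for the full dataset.

```python
import math

# (Ozone, Solar) from the first ten sample rows; None stands for NaN.
first_rows = [
    (41.0, 190.0), (36.0, 118.0), (12.0, 149.0), (18.0, 313.0),
    (None, None), (28.0, None), (23.0, 299.0), (19.0, 99.0),
    (8.0, 19.0), (None, 194.0),
]

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# 1.0 where the value is missing, 0.0 where it is present.
ozone_missing = [1.0 if ozone is None else 0.0 for ozone, _ in first_rows]
solar_missing = [1.0 if solar is None else 0.0 for _, solar in first_rows]

nullity_corr = pearson_r(ozone_missing, solar_missing)
print("Nullity correlation (first ten rows only):", round(nullity_corr, 3))
```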

Sample

First rows

     Date         Ozone   Solar   Wind   Temp
0    1976-05-01   41.0    190.0   7.4    67
1    1976-05-02   36.0    118.0   8.0    72
2    1976-05-03   12.0    149.0   12.6   74
3    1976-05-04   18.0    313.0   11.5   62
4    1976-05-05   NaN     NaN     14.3   56
5    1976-05-06   28.0    NaN     14.9   66
6    1976-05-07   23.0    299.0   8.6    65
7    1976-05-08   19.0    99.0    13.8   59
8    1976-05-09   8.0     19.0    20.1   61
9    1976-05-10   NaN     194.0   8.6    69

Last rows

      Date         Ozone   Solar   Wind   Temp
143   1976-09-21   13.0    238.0   12.6   64
144   1976-09-22   23.0    14.0    9.2    71
145   1976-09-23   36.0    139.0   10.3   81
146   1976-09-24   7.0     49.0    10.3   69
147   1976-09-25   14.0    20.0    16.6   63
148   1976-09-26   30.0    193.0   6.9    70
149   1976-09-27   NaN     145.0   13.2   77
150   1976-09-28   14.0    191.0   14.3   75
151   1976-09-29   18.0    131.0   8.0    76
152   1976-09-30   20.0    223.0   11.5   68