This example analysis will use the abr1 data
set from the metaboData package. It
is nominal mass flow-injection mass spectrometry (FI-MS) fingerprinting
data from a plant-pathogen infection time course experiment. The
analysis also uses the pipe operator (%>%) from the magrittr package.
First, load the necessary packages.
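A minimal sketch of the loading step. It assumes that the analysis functions used throughout this example (such as clsAvailable() and occupancyMaximum()) come from the metabolyseR package, and that all three packages are already installed:

```r
# Analysis functions (assumed here to be provided by metabolyseR)
library(metabolyseR)
# Provides the abr1 example data set
library(metaboData)
# Provides the %>% pipe operator
library(magrittr)
```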
For this example we will use only the negative acquisition mode data
(abr1$neg) and sample meta-information
(abr1$fact). Create an AnalysisData class
object using the following:
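A sketch of the object creation, assuming the analysisData() constructor from metabolyseR, which is understood to take the feature intensity table first and the sample meta-information second:

```r
# Combine the negative mode intensities with the sample information
# into a single AnalysisData object
d <- analysisData(abr1$neg, abr1$fact)
```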
The data includes 120 samples and 2000 mass spectral features as shown below.
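The dimensions can be checked by printing the object. The nSamples() and nFeatures() accessors used here are assumed to be available from metabolyseR:

```r
# Printing the object gives a short summary of its contents
print(d)

# Sample and feature counts can also be queried directly
nSamples(d)
nFeatures(d)
```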
The clsAvailable() function can be used to identify the
columns available in our meta-information table.
clsAvailable(d)
#> [1] "injorder" "pathcdf"  "filecdf"  "name.org" "remark"   "name"     "rep"
#> [8] "day"      "class"

For this analysis, we will be using the infection time course class
information contained in the day column. This can be
extracted and the class frequencies tabulated using the following:
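One way this might look, assuming the clsExtract() function from metabolyseR returns the named column of the sample information as a vector:

```r
d %>%
  clsExtract(cls = 'day') %>%  # pull the 'day' column from the sample information
  table()                      # count the samples in each class
```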
The experiment is made up of six infection time point classes: a
healthy control class (H) and five infection time points
(days 1-5), each with 20 replicates.
For data pre-treatment prior to statistical analysis, a two-thirds maximum class occupancy filter can be applied. Features where the maximum proportion of non-missing data per class is above two-thirds are retained. A total ion count (TIC) normalisation and a log10 transformation will also be applied.
d <- d %>%
  occupancyMaximum(cls = 'day', occupancy = 2/3) %>%
  transformTICnorm() %>%
  transformLog10()

This has reduced the data set to 1760 relevant features.
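To make the pre-treatment steps concrete, the following base R sketch applies the same ideas to a small made-up matrix. It is an illustration only, not the package's implementation. It assumes that a zero intensity counts as missing, that a feature exactly at the threshold would be retained, and that 1 is added before taking logs. The names x, cls, occupancy, x_filtered, x_norm and x_log are hypothetical.

```r
# Toy intensity matrix: 6 samples (rows) x 3 features (columns).
# Assumption: a zero intensity is treated as a missing value.
x <- matrix(
  c(5, 0, 5,
    4, 0, 2,
    6, 0, 3,
    0, 3, 2,
    0, 0, 1,
    1, 0, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("f1", "f2", "f3"))
)
cls <- c("A", "A", "A", "B", "B", "B")

# Maximum class occupancy: for each feature, the largest per-class
# proportion of non-missing values
occupancy <- apply(x > 0, 2, function(present) max(tapply(present, cls, mean)))

# Retain features whose maximum class occupancy reaches two-thirds
x_filtered <- x[, occupancy >= 2/3, drop = FALSE]

# TIC normalisation: divide each sample by its total ion count (row sum)
x_norm <- x_filtered / rowSums(x_filtered)

# Log10 transformation; adding 1 first (an assumption) keeps zeros finite
x_log <- log10(x_norm + 1)
```

Here f2 is detected in only one of three samples in its best class, so it is removed, while f1 and f3 are fully occupied in class A and are kept.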
The structure of the data can be visualised using both unsupervised and supervised methods. For instance, the first two principal components from a principal component analysis (PCA) of the data, with the sample points coloured by infection class, can be plotted using:
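A sketch of such a call, assuming the plotPCA() function from metabolyseR and its xAxis and yAxis arguments for choosing the components to display:

```r
# Scores plot of the first two principal components,
# with points coloured by the 'day' class
plotPCA(d, cls = 'day', xAxis = 'PC1', yAxis = 'PC2')
```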
Similarly, multidimensional scaling (MDS) of sample proximity values from a supervised random forest classification model can be plotted, along with receiver operating characteristic (ROC) curves.
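A possible call, assuming the plotSupervisedRF() function from metabolyseR, which is understood to include the ROC curves alongside the MDS plot by default:

```r
# MDS of random forest sample proximities for the 'day' classes
plotSupervisedRF(d, cls = 'day')
```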
A progression can clearly be seen from the earliest to the latest infection time points.
For feature selection, one-way analysis of variance (ANOVA) can be performed for each feature to identify features significantly explanatory for the infection time point.
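A sketch of this step, assuming the anova() method that metabolyseR provides for AnalysisData objects. The name anova_results is chosen here for illustration:

```r
# Fit a one-way ANOVA for every feature, using 'day' as the response
anova_results <- d %>%
  anova(cls = 'day')
```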
A table of the significantly explanatory features, with a Bonferroni-adjusted p-value < 0.05, can be extracted using:
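A sketch of the extraction, assuming the explanatoryFeatures() function from metabolyseR and an anova_results object holding the ANOVA results from the previous step (that object name is an assumption):

```r
# Keep only features whose adjusted p-value falls below the threshold
explan_feat <- explanatoryFeatures(anova_results, threshold = 0.05)
```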
explan_feat
#> # A tibble: 397 × 10
#>    response comparison  feature term        df sumsq meansq statistic  p.value
#>    <chr>    <chr>       <chr>   <chr>    <dbl> <dbl>  <dbl>     <dbl>    <dbl>
#>  1 day      1~2~3~4~5~H N341    response     5 38.4   7.67       443. 6.21e-73
#>  2 day      1~2~3~4~5~H N342    response     5 52.2  10.4        256. 3.01e-60
#>  3 day      1~2~3~4~5~H N1083   response     5  7.07  1.41       171. 2.68e-51
#>  4 day      1~2~3~4~5~H N513    response     5 43.3   8.66       162. 3.91e-50
#>  5 day      1~2~3~4~5~H N133    response     5 78.9  15.8        159. 1.17e-49
#>  6 day      1~2~3~4~5~H N683    response     5 18.6   3.72       158. 1.33e-49
#>  7 day      1~2~3~4~5~H N1084   response     5  4.90  0.979      129. 2.64e-45
#>  8 day      1~2~3~4~5~H N1085   response     5  4.69  0.938      129. 3.08e-45
#>  9 day      1~2~3~4~5~H N171    response     5 31.6   6.32       121. 5.70e-44
#> 10 day      1~2~3~4~5~H N163    response     5 92.4  18.5        119. 1.64e-43
#> # ℹ 387 more rows
#> # ℹ 1 more variable: adjusted.p.value <dbl>

The ANOVA has identified 397 features significantly explanatory over the infection time course. A heat map of the mean relative intensity for each class of these explanatory features can be plotted to visualise their trends between the infection time point classes.
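A possible call, assuming the plotExplanatoryHeatmap() function from metabolyseR and an anova_results object holding the ANOVA results (that object name is an assumption). Feature labels are switched off here because several hundred of them would not be legible:

```r
# Heat map of class mean relative intensities for the explanatory features
plotExplanatoryHeatmap(anova_results,
                       threshold = 0.05,
                       featureNames = FALSE)
```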
Many of the explanatory features can be seen to be most
abundant at the final infection time point (day 5).
Finally, box plots of the trends of individual features can be
plotted, such as the N341 feature below.
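A sketch of such a plot, assuming the plotFeature() function from metabolyseR and an anova_results object holding the ANOVA results (that object name is an assumption):

```r
# Box plot of the N341 intensities across the 'day' classes
plotFeature(anova_results, feature = 'N341', cls = 'day')
```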