Title: | Approximate False Positive Rate Control in Selection Frequency for Random Forest |
---|---|
Description: | Approximate false positive rate control in selection frequency for random forest using the methods described by Ender Konukoglu and Melanie Ganz (2014) <arXiv:1410.2838>. Methods for calculating the selection frequency threshold at false positive rates and selection frequency false positive rate feature selection. |
Authors: | Tom Wilson [aut, cre] , Jasen Finch [aut] |
Maintainer: | Tom Wilson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.2 |
Built: | 2024-11-12 04:26:08 UTC |
Source: | https://github.com/aberhrml/forestcontrol |
This package implements the methods described in Konukoglu, E. and Ganz, M. (2014). Approximate false positive rate control in selection frequency for random forest. arXiv preprint arXiv:1410.2838. https://arxiv.org/abs/1410.2838
For a randomForest or ranger classification object, extract the parameters needed to calculate an approximate selection frequency threshold.
extract_params(x)
x |
a |
a list of four elements
Fn The number of features considered at each internal node (mtry)
Ft The total number of features in the data set
K The average number of binary tests/internal nodes across the entire forest
Tr The total number of trees in the forest
Tom Wilson [email protected]
library(randomForest)
data(iris)
iris.rf <- randomForest(iris[,-5], iris[,5], forest = TRUE)
iris.params <- extract_params(iris.rf)
print(iris.params)
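The four parameters correspond to quantities that can be read from a fitted randomForest object. The sketch below recomputes them by hand as an illustration. It assumes that K is the mean number of splitting nodes per tree, which matches the description above but may not be how extract_params() derives it internally. The name manual_params is hypothetical and not part of the package.

```r
library(randomForest)

data(iris)
set.seed(42)
iris.rf <- randomForest(iris[, -5], iris[, 5])

# Count the internal (splitting) nodes of each tree.
# getTree() reports "split var" as 0 for terminal nodes.
internal_nodes <- vapply(
  seq_len(iris.rf$ntree),
  function(k) sum(getTree(iris.rf, k)[, "split var"] > 0),
  numeric(1)
)

# Hypothetical by-hand reconstruction; not the package's own code.
manual_params <- list(
  Fn = iris.rf$mtry,          # features considered at each node
  Ft = ncol(iris[, -5]),      # total number of features
  K  = mean(internal_nodes),  # average internal nodes per tree
  Tr = iris.rf$ntree          # number of trees in the forest
)
print(manual_params)
```

Comparing manual_params with the output of extract_params(iris.rf) is a quick way to check whether the assumption about K holds for a given model.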
Calculate the False Positive Rate (FPR) for each feature using its selection frequency
fpr_fs(x)
x |
a |
a tibble of selection frequencies and their false positive rates
Jasen Finch [email protected]
library(randomForest)
data(iris)
iris.rf <- randomForest(iris[,-5], iris[,5], forest = TRUE)
iris.features <- fpr_fs(iris.rf)
print(iris.features)
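To give a feel for how a selection frequency can be turned into a false positive rate, the sketch below uses a deliberately simplified null model: each split in the forest is assumed to pick one of the Ft features uniformly at random, so the selection count of an uninformative feature is binomial. This is an illustrative assumption only and may differ from the approximation used in the paper and in fpr_fs(). All parameter values and counts are made up.

```r
# Hypothetical forest parameters and selection counts (illustrative only).
Ft <- 4     # total number of features
K  <- 8     # average internal nodes per tree
Tr <- 500   # number of trees
freqs <- c(a = 400, b = 1000, c = 1050, d = 1550)

# Simplifying assumption: under the null, each of the Tr * K splits
# selects one of the Ft features uniformly at random, so a feature's
# selection count follows Binomial(Tr * K, 1 / Ft).
n_splits <- Tr * K

# P(X >= freq): probability that an uninformative feature is selected
# at least this often under the assumed null model.
null_fpr <- pbinom(freqs - 1, size = n_splits, prob = 1 / Ft,
                   lower.tail = FALSE)

print(data.frame(feature = names(freqs), freq = freqs, fpr = null_fpr))
```

Features selected far more often than the null expectation (here Tr * K / Ft = 1000) receive a small false positive rate, while features at or below it do not.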
Extract variable selection frequencies from randomForest and ranger model objects
selection_freqs(x)
x |
a |
a tibble of variable selection frequencies
library(randomForest)
data(iris)
iris.rf <- randomForest(iris[,-5], iris[,5], forest = TRUE)
iris.freqs <- selection_freqs(iris.rf)
print(iris.freqs)
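For randomForest models, split counts per variable can also be obtained with varUsed() from the randomForest package. The sketch below builds a comparable table by hand. It assumes that a selection frequency is the number of splits in the forest that use a variable; whether selection_freqs() returns the same counts, or which column names it uses, is not confirmed here. The name manual_freqs is hypothetical.

```r
library(randomForest)

data(iris)
set.seed(42)
iris.rf <- randomForest(iris[, -5], iris[, 5])

# varUsed() counts how many splits across the forest use each variable,
# in the column order of the training data.
manual_freqs <- data.frame(
  variable = names(iris)[-5],
  freq     = varUsed(iris.rf, count = TRUE)
)
print(manual_freqs)
```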
Determine the selection frequency threshold of a model at a specified false positive rate
sft(x, alpha)
x |
a |
alpha |
a false positive rate (e.g., 0.01) |
a list of two elements
sft The selection frequency threshold
probs_atsft The estimated false positive rate
Tom Wilson [email protected]
library(randomForest)
data(iris)
iris.rf <- randomForest(iris[,-5], iris[,5], forest = TRUE)

# For a false positive rate of 1%
iris.sft <- sft(iris.rf, 0.01)
print(iris.sft)

# To iterate through a range of alpha values
alpha <- c(0.01, 0.05, 0.1, 0.15, 0.2, 0.25)
threshold <- NULL
for(i in seq_along(alpha)){
  threshold[i] <- sft(iris.rf, alpha[i])$sft
}
plot(alpha, threshold, type = 'b')
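The idea of a selection frequency threshold can be illustrated without a fitted model. The sketch below assumes a simplified null in which each of the Tr * K splits selects one of the Ft features uniformly at random, and returns the smallest selection count whose upper tail probability does not exceed alpha. This is an assumption made for illustration and is not necessarily the approximation that sft() implements. The function null_threshold and the parameter values are hypothetical.

```r
# Hypothetical helper under a simplified binomial null model;
# not the package's sft() implementation.
null_threshold <- function(alpha, Ft, K, Tr) {
  n_splits <- round(Tr * K)
  counts <- 0:n_splits

  # P(X >= count) for X ~ Binomial(n_splits, 1 / Ft)
  tail_prob <- pbinom(counts - 1, size = n_splits, prob = 1 / Ft,
                      lower.tail = FALSE)

  # Smallest count whose tail probability is at or below alpha
  idx <- which(tail_prob <= alpha)[1]
  list(threshold = counts[idx], fpr = tail_prob[idx])
}

res <- null_threshold(alpha = 0.01, Ft = 4, K = 8, Tr = 500)
print(res)

# A looser false positive rate gives a lower (or equal) threshold.
res_loose <- null_threshold(alpha = 0.05, Ft = 4, K = 8, Tr = 500)
print(res_loose)
```

The returned fpr is generally slightly below alpha because selection counts are discrete, which mirrors why sft() reports an estimated false positive rate alongside the threshold.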