# Further Classification Models
## Multilabel classification
In multilabel classification, an instance can carry several labels at the same time. For example, user reviews in mobile application repositories such as the App Store or Google Play can simultaneously report a bug and request a feature.
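A common way to represent such data is a binary indicator matrix with one column per label. The following is a minimal sketch in base R; the reviews, the label names and the objects `reviews`, `labels` and `Y` are hypothetical and serve only as illustration.

```r
# Hypothetical toy reviews, each tagged with zero or more labels
reviews <- data.frame(
  text = c("App crashes on login",
           "Please add a dark mode",
           "Crashes when exporting; an undo button would help",
           "Great app"),
  stringsAsFactors = FALSE
)
labels <- list("bug",
               "feature_request",
               c("bug", "feature_request"),
               character(0))

# Build the indicator matrix: one row per review, one column per label
all_labels <- sort(unique(unlist(labels)))
Y <- do.call(rbind, lapply(labels, function(l) as.integer(all_labels %in% l)))
colnames(Y) <- all_labels

cbind(reviews, Y)
```

With this representation, one simple strategy (binary relevance) is to train a separate binary classifier for each column of `Y`.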
## Semi-supervised Learning
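In semi-supervised learning, only part of the training cases are labelled. Self-training handles this by learning a model from the labelled cases, using it to classify the unlabelled ones, adding the most confident predictions to the training set, and repeating the process. The example below uses the `SelfTrain` function; the documentation of its original DMwR version is at
[http://www.inside-r.org/packages/cran/dmwr/docs/SelfTrain](http://www.inside-r.org/packages/cran/dmwr/docs/SelfTrain)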
```{r ssl, message=FALSE, warning=FALSE, eval = FALSE}
library(DMwR2)
## Small example with the Iris classification data set
data(iris)
## Dividing the data set into train and test sets
idx <- sample(150,100)
tr <- iris[idx,]
ts <- iris[-idx,]
## Learn a tree with the full train set and test it
stdTree <- rpartXse(Species~ .,tr,se=0.5)
table(predict(stdTree,ts,type='class'),ts$Species)
## Now let us create another training set with most of the target
## variable values unknown
trSelfT <- tr
nas <- sample(100,70)
trSelfT[nas,'Species'] <- NA
## Learn a tree using only the labelled cases and test it
baseTree <- rpartXse(Species~ .,trSelfT[-nas,],se=0.5)
table(predict(baseTree,ts,type='class'),ts$Species)
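## The user-defined function that will be used in the self-training process.
## Given a model m and data d, it returns the predicted class (cl) and the
## probability assigned to that class (p) for each case
f <- function(m,d) {
  l <- predict(m,d,type='class')
  p <- apply(predict(m,d),1,max)
  data.frame(cl=l,p=p)
}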
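## Self train the same model using the semi-supervised data and test the
## resulting model. DMwR2 takes the learner name and its parameters as
## separate arguments (the older DMwR wrapped them in learner())
treeSelfT <- SelfTrain(Species~ .,trSelfT,'rpartXse',list(se=0.5),'f')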
table(predict(treeSelfT,ts,type='class'),ts$Species)
```
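To make the self-training idea more explicit, the loop can be sketched by hand with plain `rpart`. This is only an illustration under stated assumptions, not the implementation used by `SelfTrain`: the function `self_train_sketch` and its parameters `thr_conf` and `max_its` are hypothetical names introduced here.

```r
library(rpart)

# Hypothetical hand-written self-training loop (not the DMwR2 implementation)
self_train_sketch <- function(form, data, target, thr_conf = 0.9, max_its = 10) {
  labelled <- !is.na(data[[target]])
  for (it in seq_len(max_its)) {
    unl <- which(!labelled)
    if (length(unl) == 0) break

    # Learn from the currently labelled cases only
    model <- rpart(form, data[labelled, ])

    # Classify the unlabelled cases and measure the confidence of each prediction
    probs <- predict(model, data[unl, ], type = "prob")
    conf  <- apply(probs, 1, max)
    pred  <- colnames(probs)[max.col(probs, ties.method = "first")]

    # Accept only confident predictions as new labels
    accept <- conf >= thr_conf
    if (!any(accept)) break
    data[[target]][unl[accept]] <- pred[accept]
    labelled[unl[accept]] <- TRUE
  }
  # Final model on all cases that ended up with a label
  rpart(form, data[labelled, ])
}

# Usage on iris with most labels hidden
set.seed(1)
d <- iris
d$Species[sample(150, 100)] <- NA
m <- self_train_sketch(Species ~ ., d, "Species")
table(predict(m, iris, type = "class"), iris$Species)
```

The loop stops when no unlabelled cases remain, when no prediction reaches the confidence threshold, or after `max_its` iterations.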