Statistics on proFold

Comparison of results

To evaluate the performance of proFold, we first selected the widely used DD-dataset. On this dataset, proFold achieves an overall accuracy of 76.2%. Table 1 compares proFold with existing ensemble learning methods on the DD-dataset.

Table 1. Comparison with existing ensemble learning methods on the DD-dataset

Methods Overall Accuracy (%)
PFP-Pred (2006) 62.1
GAOEC (2008) 64.7
PFP-FunDSeqE (2009) 70.5
Dehzangi et al. (2010a) 62.7
Dehzangi et al. (2010b) 62.4
MarFold (2011) 71.7
PFP-RFSM (2013) 73.7
Feng et al. (2014) 70.2
Feng et al. (2015) 70.8
PFPA (2015) 73.6
proFold (the proposed method) 76.2

To further evaluate the performance of proFold, we also selected two additional large-scale datasets: the EDD-dataset and the TG-dataset. Because neither dataset provides a predefined training/testing split, we applied k-fold cross-validation to both.
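The repeated k-fold protocol described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`k_fold_indices`, `repeated_cv_accuracy`) and the plug-in `train_fn`/`predict_fn` interface are hypothetical, and the actual proFold classifier is not shown here.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_accuracy(X, y, train_fn, predict_fn, k=10, repeats=10, seed=0):
    """Mean accuracy over `repeats` independent runs of k-fold cross-validation.

    Each fold is held out once per run; the model is trained on the
    remaining samples and scored on the held-out fold.
    """
    rng = random.Random(seed)
    accs = []
    for _ in range(repeats):
        folds = k_fold_indices(len(X), k, rng)
        for test_idx in folds:
            test_set = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in test_set]
            model = train_fn([X[i] for i in train_idx],
                             [y[i] for i in train_idx])
            preds = [predict_fn(model, X[i]) for i in test_idx]
            correct = sum(p == y[i] for p, i in zip(preds, test_idx))
            accs.append(correct / len(test_idx))
    return sum(accs) / len(accs)
```

With k=10 and repeats=10 this averages accuracy over 100 held-out folds, which reduces the variance introduced by any single random partition of the data.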

We computed the classification accuracy on the EDD-dataset over ten runs of 10-fold cross-validation and compared the result with other methods. The results are shown in Table 2.

Table 2. Comparison with different methods on the EDD-dataset by 10-fold cross-validation

Methods Overall Accuracy (%)
Paliwal et al. (2014a) 90.6
Paliwal et al. (2014b) 86.2
Dehzangi et al. (2014) 88.2
HMMFold (2015) 86.0
Saini et al. (2015) 89.9
Lyons et al. (2016) 92.9
proFold (the proposed method) 93.2

On the TG-dataset, we likewise performed ten runs of 10-fold cross-validation and compared the results with other methods. The comparison is shown in Table 3.

Table 3. Comparison with different methods on the TG-dataset by 10-fold cross-validation

Methods Overall Accuracy (%)
Paliwal et al. (2014a) 77.0
Paliwal et al. (2014b) 73.3
Dehzangi et al. (2014) 73.8
HMMFold (2015) 93.8
Saini et al. (2015) 74.5
NiRecor (2016) 84.6
Lyons et al. (2016) 85.6
proFold (the proposed method) 94.3