To test this version of HELIQUEST, we ran it on proteins whose 3D structures are known and on well-annotated datasets. More specifically, we asked two questions. Is the screening procedure able to extract transmembrane segments or amphipathic helices from proteins whose structure is known or that can bind to lipid membrane surfaces? Does the decision tree make it possible to classify the screening results?
1 - Test on PDB datasets
We ran HELIQUEST on sequences from two subsets of the non-redundant PDB (less than 30 % sequence identity) culled with PISCES (Wang G, Dunbrack RL. PISCES: a protein sequence culling server. Bioinformatics 19:1589-1591 (2003)): one set of 7762 X-ray and NMR structures with a resolution < 3 Å, and a second set of 1858 X-ray structures with a resolution < 1.6 Å.
We examined whether increasing µH in the absence of other constraints (see Table 1) better identifies the helical amphipathic segments present in the PDB. To this end, the segments retrieved by the screening were classified by PSIPRED as helical or non-helical (helix/random coil or high propensity for β-sheet) and compared with the secondary structure assigned to the PDB sequences by P-SEA (Labesse G, Colloc'h N, Pothier J, Mornon JP. P-SEA: a new efficient assignment of secondary structure from C alpha trace of proteins. Comput Appl Biosci 13:291-295 (1997)). For each dataset, we found that at higher µH the screening had very good sensitivity but poor specificity (Table 1A: sensitivity 88.9-99 % and specificity 30.4-42.5 % for µH ≥ 0.5). Increasing µH also raised the positive predictive value (PPV), up to 85.6 % for µH ≥ 0.6.
Conversely, at lower µH, PSIPRED helps HELIQUEST distinguish true helices from false ones with higher specificity (71.5 % for 0.3 ≤ µH ≤ 0.4). We also observed that, even without any PSIPRED structural prediction, increasing µH improves the detection of true amphipathic helices in structures: the ratio of true amphipathic helices to all retrieved segments rises from 0.44 for 0.3 ≤ µH ≤ 0.4 to 0.81 for µH ≥ 0.6. This indicates that the amphipathic moment µH is a good predictor of the presence of amphipathic helices in proteins.
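The comparison above reduces each screened segment to two labels, a PSIPRED call and a P-SEA assignment, which are then tallied into true/false positives and negatives. The minimal Python sketch below shows that bookkeeping under stated assumptions: the function names and toy data are hypothetical, and the rule for collapsing per-residue P-SEA assignments into a single helical/non-helical label per segment is not specified here, so it is left to the caller.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_segments(segments: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count TP, FP, FN and TN over screened segments.

    Each item is (predicted_helical, observed_helical):
    - predicted_helical: PSIPRED call for the segment (helical vs non-helical);
    - observed_helical: whether P-SEA assigns the segment as helical in the PDB
      (how per-residue assignments become one boolean is the caller's choice).
    """
    counts = Counter(TP=0, FP=0, FN=0, TN=0)
    for predicted, observed in segments:
        if predicted:
            counts["TP" if observed else "FP"] += 1
        else:
            counts["FN" if observed else "TN"] += 1
    return counts


def true_helix_ratio(observed: Iterable[bool]) -> float:
    """Without PSIPRED every retrieved segment is taken as helical, so the score
    is simply the fraction of retrieved segments that are helical in the PDB."""
    flags = list(observed)
    return sum(flags) / len(flags)


# Hypothetical toy data, not taken from the PDB screen.
toy = [(True, True), (True, False), (False, True), (False, False), (True, True)]
print(dict(tally_segments(toy)))                 # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
print(true_helix_ratio(obs for _, obs in toy))   # 0.6
```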
Table 1A. Analysis of screening(a) carried out on a small, non-redundant PDB set (1858 X-ray structures, resolution < 1.6 Å)
Columns Sensitivity to Acc: classification with PSIPRED (helical/non-helical). Columns Helical in PDB to Ntotal: without PSIPRED(c).

| µH | Sensitivity (%) | Specificity (%) | TP(b) | FP | FN | TN | PPV (%) | Acc (%) | Helical in PDB | Non-helical in PDB | Ntrue helices / Ntotal | Ntotal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6 ≤ µH ≤ 1.0 | 99 | 30.4 | 95 | 16 | 1 | 7 | 85.6 | 85.7 | 96 | 23 | 0.81 | 119 |
| 0.5 ≤ µH ≤ 0.6 | 88.9 | 42.5 | 464 | 184 | 58 | 136 | 71.6 | 71.2 | 522 | 320 | 0.61 | 842 |
| 0.4 ≤ µH ≤ 0.5 | 79.3 | 56.8 | 1145 | 539 | 299 | 710 | 67.9 | 68.9 | 1444 | 1249 | 0.54 | 2693 |
| 0.3 ≤ µH ≤ 0.4 | 68.0 | 71.5 | 1525 | 823 | 719 | 2064 | 64.9 | 69.9 | 2244 | 2887 | 0.44 | 5131 |
| 0.5 ≤ µH ≤ 1.0 | 89.2 | 41.8 | 472 | 185 | 57 | 133 | 71.8 | 71.4 | 529 | 318 | 0.63 | 847 |
(a) Parameters: 0.7 ≤ H ≤ 1.5; -8 ≤ z ≤ +8; Npolar ≥ 0; Ncharged residues ≤ 10; NGly ≥ 0; Cys accepted; Pro not accepted. The algorithm refining the identification of well-defined amphipathic helices was deactivated.
(b) TP: true positive; FP: false positive; FN: false negative; TN: true negative; Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP); PPV (positive predictive value) = TP/(TP+FP); Acc (accuracy) = (TP+TN)/(TP+FN+FP+TN); Ntotal: total number of segments identified by the screening.
(c) In this case, without PSIPRED prediction, all Ntotal segments are considered helical; we simply examined whether these sequences are helical in the PDB.
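The formulas in footnote b can be checked against the table directly. The sketch below (the class name `ScreenCounts` is hypothetical) recomputes the 0.6 ≤ µH ≤ 1.0 row of Table 1A from its TP, FP, FN and TN counts, including the "without PSIPRED" ratio, where the number of segments helical in the PDB is TP + FN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenCounts:
    """Confusion counts for one µH bin, using the footnote-b definitions."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:               # Ntotal: all segments retrieved
        return self.tp + self.fp + self.fn + self.tn

    @property
    def sensitivity(self) -> float:       # TP / (TP + FN), in %
        return 100 * self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:       # TN / (TN + FP), in %
        return 100 * self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:               # TP / (TP + FP), in %
        return 100 * self.tp / (self.tp + self.fp)

    @property
    def accuracy(self) -> float:          # (TP + TN) / Ntotal, in %
        return 100 * (self.tp + self.tn) / self.total

    @property
    def true_helix_ratio(self) -> float:  # without PSIPRED: helical in PDB / Ntotal
        return (self.tp + self.fn) / self.total


row = ScreenCounts(tp=95, fp=16, fn=1, tn=7)  # Table 1A, 0.6 ≤ µH ≤ 1.0
print(round(row.sensitivity, 1), round(row.specificity, 1),
      round(row.ppv, 1), round(row.accuracy, 1),
      round(row.true_helix_ratio, 2), row.total)
# 99.0 30.4 85.6 85.7 0.81 119
```

Recomputing other rows, or Table 2, may give values that differ from the printed ones by one unit in the last digit, because rounding in the tables is not fully consistent.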
2 - Test on a TM-containing protein dataset
We ran HELIQUEST on a dataset from the MPtopo database (Jayasinghe S, Hristova K, White SH. MPtopo: a database of membrane protein topology. Protein Sci 10:455-458 (2001)) containing 131 sequences in which the positions of the TM segments have been determined structurally. The dataset was screened with µH ≤ 0.5 and 0.7 ≤ H ≤ 1.5 (Table 2). Among the retrieved segments, TMHMM recognized transmembrane segments with a positive predictive value of about 95 % (sensitivity = 95.5 %; specificity = 53.5 %).
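The TM screen keeps segments that are hydrophobic (0.7 ≤ H ≤ 1.5) but only weakly amphipathic (µH ≤ 0.5). The sketch below illustrates such a filter under explicit assumptions: an Eisenberg-style definition (H and µH averaged over the segment, 100° rotation per residue) and Fauchère-Pliska hydrophobicity values as commonly tabulated. The 18-residue window, the function names and the toy sequence are hypothetical choices, not HELIQUEST's documented implementation.

```python
import math
from typing import Dict, List, Tuple

# Assumption: Fauchère-Pliska side-chain hydrophobicities as commonly tabulated;
# check them against the HELIQUEST documentation before relying on results.
FAUCHERE_PLISKA: Dict[str, float] = {
    "A": 0.31, "R": -1.01, "N": -0.60, "D": -0.77, "C": 1.54,
    "Q": -0.22, "E": -0.64, "G": 0.00, "H": 0.13, "I": 1.80,
    "L": 1.70, "K": -0.99, "M": 1.23, "F": 1.79, "P": 0.72,
    "S": -0.04, "T": 0.26, "W": 2.25, "Y": 0.96, "V": 1.22,
}


def h_and_muh(segment: str, scale: Dict[str, float] = FAUCHERE_PLISKA,
              angle_deg: float = 100.0) -> Tuple[float, float]:
    """Mean hydrophobicity H and Eisenberg-style hydrophobic moment µH,
    both averaged over the segment length (100° per residue for an α-helix)."""
    values = [scale[aa] for aa in segment.upper()]  # KeyError on non-standard residues
    delta = math.radians(angle_deg)
    x = sum(h * math.cos(i * delta) for i, h in enumerate(values))
    y = sum(h * math.sin(i * delta) for i, h in enumerate(values))
    n = len(values)
    return sum(values) / n, math.hypot(x, y) / n


def screen_tm_like(sequence: str, window: int = 18,
                   h_min: float = 0.7, h_max: float = 1.5,
                   muh_max: float = 0.5) -> List[Tuple[int, float, float]]:
    """Slide a fixed window (hypothetical length 18) and keep the windows with
    h_min <= H <= h_max and µH <= muh_max. Returns (1-based start, H, µH)."""
    hits = []
    for start in range(len(sequence) - window + 1):
        h, muh = h_and_muh(sequence[start:start + window])
        if h_min <= h <= h_max and muh <= muh_max:
            hits.append((start + 1, h, muh))
    return hits


# Toy sequence (not from MPtopo): alternating Leu/Ala has H close to 1.0 and
# µH close to 0 over any 18-residue window, so every window passes.
for start, h, muh in screen_tm_like("LA" * 10):
    print(start, round(h, 3), round(muh, 3))
```

A pure poly-Leu segment, by contrast, is rejected because its H (1.70 on this scale) exceeds the 1.5 upper bound.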
Table 2. Analysis of screening(a) carried out on a dataset of TM-containing proteins
| Sensitivity (%) | Specificity (%) | TP(b) | FP | FN | TN | PPV (%) | Acc (%) |
|---|---|---|---|---|---|---|---|
| 95.5 | 53.5 | 382 | 20 | 18 | 23 | 95.1 | 91.4 |
(a) Parameters: 0.7 ≤ H ≤ 1.5; µH ≤ 0.5; -8 ≤ z ≤ +8; Npolar ≥ 0; Ncharged residues ≤ 10; NGly ≥ 0; Cys accepted; Pro not accepted. The algorithm refining the identification of well-defined amphipathic helices was deactivated.
(b) TP: true positive; FP: false positive; FN: false negative; TN: true negative; Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP); PPV (positive predictive value) = TP/(TP+FP); Acc (accuracy) = (TP+TN)/(TP+FN+FP+TN).
3 - Test on a dataset of lipid-binding protein helices
We screened a small database of 18 proteins that bind to lipid membrane surfaces and that had not been used for our discriminant analysis. The screening parameters (see Table 3) extracted from this database 10 sequences (out of 38 retrieved segments) that correspond to or overlap known lipid-binding segments. The decision tree classified 2 of these segments as Lipid-Binding Helix, 6 as Possible Lipid-Binding Helix, 1 as simply helical and 1 as a TM segment. These results suggest that the decision tree can help users identify the most interesting hits.
Table 3. Analysis of screening(a) carried out on a dataset of lipid-binding protein segments
| Lipid-binding protein helix (sequence positions in brackets) | UniProt | Lipid-binding helix | Possible lipid-binding helix | TM | Helix | Helix/coil | High β-sheet propensity |
|---|---|---|---|---|---|---|---|
| Aerobic glycerol-3-phosphate dehydrogenase [355-370] | P13035 | | X | | | | |
| FHV coat protein [364-385] | P12870 | | X | | | | |
| Hepatitis C core protein [117-134] | P27957 | | | X | | | |
| Dense granule protein 2 [69-87] | P13404 | | X | | | | |
| G protein-coupled receptor kinase 5 [546-565] | P34947 | | | | | | |
| Glucose-specific IIa component [1-18] | Q8XBL1 | | | | | | |
| Lactophorin [116-153] | P80195 | X | | | | | |
| Myelin basic protein [81-97] | P02687 | | | | | | |
| Sterol carrier protein 2 [1-32] | P22307 | | | | | | |
| cAMP-specific phosphodiesterase 4A [1-25] | Q684M5 | | | | | | |
| α-Synuclein [3-37] and [45-92] | P37840 | | X | | | | |
| GMAP-210 [1829-1843] | Q15643 | | X | | | | |
| Cholinephosphate cytidylyltransferase [240-295] | P19836 | | | | | | |
| RGS4 [1-33] | O08899 | | X | | | | |
| DnaA [357-374] | P03004 | | | | X | | |
| Spo20p [62-79] | Q04359 | | | | | | |
| BVDV NS5A [1-28] | P19711 | X | | | | | |
| Measles virus F1 protein [197-225] | Q9YJ94 | | | | | | |
(a) Parameters: 0 ≤ H ≤ 2.25; -2 ≤ z ≤ +5; Npolar ≥ 5; Ncharged residues ≤ 6; NGly ≥ 0; Cys and Pro accepted. The algorithm refining the identification of well-defined amphipathic helices was activated. Rows without any mark correspond to proteins whose lipid-binding segment was not retrieved by the screening.
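The footnote parameters of Tables 1 and 3 amount to simple compositional filters applied to each segment. The sketch below encodes both parameter sets and checks a segment against them. The residue classes (which residues count as polar or charged, His treated as neutral), the class and function names, and the toy segment are all assumptions; H is passed in precomputed, and the amphipathic-helix refinement algorithm is not modeled.

```python
from dataclasses import dataclass

# Hypothetical residue classes; HELIQUEST's own definitions (e.g. of "polar",
# or whether His counts as charged) may differ.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")
CHARGED = POSITIVE | NEGATIVE
POLAR = frozenset("STNQHDEKR")


@dataclass(frozen=True)
class ScreenParams:
    h_min: float
    h_max: float
    z_min: int
    z_max: int
    min_polar: int
    max_charged: int
    min_gly: int
    allow_cys: bool
    allow_pro: bool


# Parameter sets transcribed from footnote a of Tables 1 and 3.
TABLE1_PARAMS = ScreenParams(0.7, 1.5, -8, 8, 0, 10, 0, True, False)
TABLE3_PARAMS = ScreenParams(0.0, 2.25, -2, 5, 5, 6, 0, True, True)


def passes(segment: str, mean_h: float, p: ScreenParams) -> bool:
    """Check one segment against the compositional filters; mean_h (H) is
    assumed to have been computed beforehand on the chosen hydrophobicity scale."""
    s = segment.upper()
    z = sum(aa in POSITIVE for aa in s) - sum(aa in NEGATIVE for aa in s)  # net charge
    n_charged = sum(aa in CHARGED for aa in s)
    n_polar = sum(aa in POLAR for aa in s)
    n_gly = s.count("G")
    return (p.h_min <= mean_h <= p.h_max
            and p.z_min <= z <= p.z_max
            and n_polar >= p.min_polar
            and n_charged <= p.max_charged
            and n_gly >= p.min_gly
            and (p.allow_cys or "C" not in s)
            and (p.allow_pro or "P" not in s))


seg = "SSSSSLLLLLKKAAAAAA"                      # toy segment, not from Table 3
print(passes(seg, 0.9, TABLE3_PARAMS))          # True
print(passes(seg + "P", 0.9, TABLE1_PARAMS))    # False: Pro not accepted
```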