Stepwise Discriminant Analysis
Table 1. List of segments used for the SDA.
Name | Sequence | Lipid binding (a) | L | <H> | <µH> | z | Reference
pAntp | RQIKIWFQNRRMKWKK | YES | 16 | 0.193 | 0.327 | 7 | Biochemistry 40, 1824-1834 (2001)
AP2AL | RQIKIWFQAARMLWKK | YES | 16 | 0.501 | 0.51 | 5 | Biochemistry 40, 1824-1834 (2001)
18A | DWLKAFYDKVAEKLKEAF | YES | 18 | 0.309 | 0.565 | 0 | J Biol Chem 269, 23904-23910 (1994)
18L | GIKKFLGSIWKFIKAFVG | YES | 18 | 0.681 | 0.698 | 4 | Biochim Biophys Acta 1368, 343-354 (1998)
Mastoparan X | INWKGIAAMAKKLL | YES | 14 | 0.56 | 0.419 | 3 | Biochemistry 35, 8450-8456 (1996)
Melittin | GIGAVLKVLTTGLPALISWIKRKRQQ | YES | 26 | 0.511 | 0.394 | 5 | Biochim Biophys Acta 1667, 26-37 (2004)
M2a | GIGKFLHSAKKFGKAFVGEIMNS | YES | 23 | 0.373 | 0.475 | 3 | Biochemistry 35, 10844-10853 (1996)
Cecropin P1 | SWLSKTAKKLENSAKKRISEGIAIAIQGGPR | YES | 31 | 0.188 | 0.214 | 5 | Biochemistry 34, 11479-11488 (1995)
p25 CoxIV | MLSLRQSIRFFKPATRTLCSSRYLL | YES | 25 | 0.55 | 0.222 | 5 | Biochemistry 35, 3141-3146 (1996)
PhoE signal peptide | MKKSTLALVVMGIVASASVQA | YES | 21 | 0.558 | 0.045 | 2 | Biochemistry 31, 1672-1677 (1992)
TMX1 | WNALAAVAAALAAVAAALAAVAASKSKSKSK | YES | 31 | 0.353 | 0.215 | 4 | Biochemistry 43, 5782-5791 (2004)
Helix 1 | NPAEAAARAEACRRAWAAAAARARA | YES | 25 | 0.077 | 0.18 | 3 | Protein Eng 11, 539-547 (1998)
Helix 2 | NENAKKAAACKKKWEELKKKLAALK | YES | 25 | -0.051 | 0.35 | 6 | Biochemistry 36, 12869-12880 (1997)
Helix 4 | NELNAKLAACKKKWEELKKKLAALK | YES | 25 | 0.112 | 0.443 | 5 | Biochemistry 36, 12869-12880 (1997)
Helix 5 | NELKKKLELCKAKWLEAKKKLEALK | YES | 25 | 0.114 | 0.361 | 5 | Biochemistry 36, 12869-12880 (1997)
100-M2a | GIAKFGKAAAHFGKKWVGELMNS | YES | 23 | 0.344 | 0.461 | 3 | Biochemistry 36, 12869-12880 (1997)
120-M2a | GIGKFLHSAKKFGKAWVGEIMNS | YES | 23 | 0.393 | 0.492 | 3 | Biochemistry 36, 12869-12880 (1997)
140-M2a | GIGKFLHTLKTFGKKWVGEIMNS | YES | 23 | 0.465 | 0.53 | 3 | Biochemistry 36, 12869-12880 (1997)
160-M2a | GIGKFLHKVKSFGKSWIGEIMNS | YES | 23 | 0.443 | 0.525 | 3 | Biochemistry 36, 12869-12880 (1997)
180-M2a | GIGKFLHKVGSFIKSWKGEIMNS | YES | 23 | 0.443 | 0.549 | 3 | Biochemistry 36, 12869-12880 (1997)
Arf1 [1-17] | MGNIFANLFKGLFGKKE | YES | 17 | 0.429 | 0.462 | 2 | Biochemistry 36, 12869-12880 (1997)
Sar1p [1-23] | MAGWDIFGWFRDVLASLGLWNKH | YES | 23 | 0.707 | 0.27 | 0 | Nat Cell Biol 3, 531-537 (2001)
Sar1p [1-23] (W4A) | MAGADIFGWFRDVLASLGLWNKH | NO | 23 | 0.622 | 0.351 | 0 | Cell 122, 605-617 (2005)
Sar1p [1-23] (IF-6/7-AA) | MAGWDAAGWFRDVLASLGLWNKH | NO | 23 | 0.577 | 0.227 | 0 | Cell 122, 605-617 (2005)
dAmph 1 [1-33] | MTENKGIMLAKSVQKHAGRAKEKILQNLGKVDR | YES | 33 | 0.098 | 0.276 | 5 | Science 303, 495-499 (2004)
Endophilin A1 [7-35] | KKQFHKATQKVSEKVGGAEGTKLDDDFKE | YES | 29 | -0.091 | 0.2 | 1 | J Cell Biol 155, 193-200 (2001)
Endophilin A1 [7-35] (F10E) | KKQEHKATQKVSEKVGGAEGTKLDDDFKE | NO | 29 | -0.175 | 0.158 | 0 | J Cell Biol 155, 193-200 (2001)
Kes1p [7-29] | SSSWTSFLKSIASFNGDLSSLSA | NO | 23 | 0.473 | 0.423 | 0 | Nat Struct Mol Biol 14, 138-146 (2007)
Nup 133 [245-267] | LPQGQGMLSGIGRKVSSLFGILS | NO | 23 | 0.555 | 0.417 | 2 | Nat Struct Mol Biol 14, 138-146 (2007)
GMAP-210 [1-38] | MSSWLGGLGSGLGQSLGQVGGSLASLTGQISNFTKDML | NO | 38 | 0.499 | 0.475 | 0 | Nat Struct Mol Biol 14, 138-146 (2007)
ArfGAP1 [199-234] | FLNSAMSSLYSGWSSFTTGASKFASAAKEGATKFGS | NO | 36 | 0.363 | 0.386 | 2 | EMBO J 24, 2244-2253 (2005)
ArfGAP1 [199-234] (L207A-W211A-F214A) | FLNSAMSSAYSGASSATTGASKFASAAKEGATKFGS | NO | 36 | 0.229 | 0.267 | 2 | EMBO J 24, 2244-2253 (2005)
ArfGAP1 [199-234] (L207D) | FLNSAMSSDYSGWSSFTTGASKFASAAKEGATKFGS | NO | 36 | 0.294 | 0.317 | 1 | Biochemistry 46, 1779-1790 (2007)
ArfGAP1 [264-295] | IFDDVSSGVSQLASKVQGVGSKGWRDVTTFFS | NO | 32 | 0.377 | 0.456 | 0 | Biochemistry 46, 1779-1790 (2007)
ArfGAP1 [264-295] (V279D) | IFDDVSSGVSQLASKDQGVGSKGWRDVTTFFS | NO | 32 | 0.315 | 0.396 | -1 | Nat Struct Mol Biol 14, 138-146 (2007)
ArfGAP1 [199-234] 2Ki | FLNSAMSKLYSGWSSFKTGASKFASAAKEGATKFGS | YES | 36 | 0.302 | 0.384 | 4 | Nat Struct Mol Biol 14, 138-146 (2007)
ALPS1 [199-234] 4Ki | FLNSAMSKLKSGWSKFKTGASKFASAAKEGATKFGS | YES | 36 | 0.221 | 0.407 | 6 | Nat Struct Mol Biol 14, 138-146 (2007)
 | FLNSAMSSLYKGWSSFTKGASKFASAAKEGATKFGS | YES | 36 | 0.302 | 0.446 | 4 | Nat Struct Mol Biol 14, 138-146 (2007)
 | FLNSAMEKLKEGWEKFKEGASKFASAAKEGATKFGS | YES | 36 | 0.146 | 0.476 | 2 | Nat Struct Mol Biol 14, 138-146 (2007)
MinD [248-266] | VLEEQNKGMMAKIKSFFGVRS | YES | 21 | 0.292 | 0.322 | 2 | J Biol Chem 25, 22193-22198 (2003)
MinD [248-266] mutant 1 | IKVSVFNESRAEFGLKQGMKM | NO | 21 | 0.292 | 0.068 | 2 | J Biol Chem 278, 40050-40056 (2003)
MinD [248-266] mutant 2 | IEEEKKGFLKREFGG | NO | 15 | 0.036 | 0.277 | 0 | J Biol Chem 278, 40050-40056 (2003)
Epsin [1-18] | MSTSSLRRQMKNIVHNYS | YES | 18 | 0.219 | 0.367 | 3 | Nature 419, 361-366 (2002)
Epsin [1-18] (L6E) | MSTSSERRQMKNIVHNYS | NO | 18 | 0.089 | 0.241 | 2 | Nature 419, 361-366 (2002)
FtsA [406-420] | GSWIKRLNSWLRKEF | YES | 15 | 0.411 | 0.609 | 3 | Science 320, 792-794 (2008)
ColicinE1 [381-405] | KKIGNVNEALAAFEKYKDVLNKKFS | YES | 25 | 0.131 | 0.369 | 3 | Biochemistry 46, 6074-6085 (2007)
FtsY NG+1 | MFARLKRSLLKTKENLG | YES | 17 | 0.242 | 0.498 | 4 | J Biol Chem 282, 32176-32184 (2007)
FtsY NG | MARLKRSLLKTKENLG | NO | 16 | 0.146 | 0.477 | 4 | J Biol Chem 282, 32176-32184 (2007)
(a) Ability of a segment to bind in vitro to large liposomes containing phosphatidylcholine and negatively charged lipids (phosphatidylserine, phosphatidylinositol or phosphatidylglycerol).
Boolean data (“Yes” for a lipid-binding segment, “No” for a non-lipid-binding segment) were subjected to the Stepwise Discriminant Analysis (SDA) module implemented in TSAR 3.3 (Oxford Molecular), with segment length (L), <H>, <µH> and z as explanatory variables. A stepwise procedure was used to select a subset of the explanatory variables and to optimize the classification rule.
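The explanatory variables can be recomputed from a sequence along the following lines. This is a minimal sketch, not HELIQUEST's code: it assumes the Fauchère-Pliska hydrophobicity scale, a net charge that counts K and R as +1 and D and E as -1 (His neutral), and Eisenberg's hydrophobic moment with 100° per residue over the full segment. The function names are hypothetical. Under these assumptions the sketch reproduces the tabulated <H> and z for pAntp; the <µH> values have not been checked against the table.

```python
import math

# Fauchere-Pliska hydrophobicity scale (assumed to be the scale behind <H>).
FAUCHERE_PLISKA = {
    "A": 0.31, "R": -1.01, "N": -0.60, "D": -0.77, "C": 1.54,
    "Q": -0.22, "E": -0.64, "G": 0.00, "H": 0.13, "I": 1.80,
    "L": 1.70, "K": -0.99, "M": 1.23, "F": 1.79, "P": 0.72,
    "S": -0.04, "T": 0.26, "W": 2.25, "Y": 0.96, "V": 1.22,
}


def mean_hydrophobicity(seq):
    """<H>: average hydrophobicity over all residues of the segment."""
    return sum(FAUCHERE_PLISKA[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq, delta_deg=100.0):
    """<µH>: length of the vector sum of residue hydrophobicities,
    each rotated by delta_deg per residue (100 degrees for an alpha-helix),
    divided by the segment length."""
    delta = math.radians(delta_deg)
    sin_sum = sum(FAUCHERE_PLISKA[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(FAUCHERE_PLISKA[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


def net_charge(seq):
    """z: +1 per Lys/Arg, -1 per Asp/Glu (assumption: His counted as neutral)."""
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")


pantp = "RQIKIWFQNRRMKWKK"
print(len(pantp), round(mean_hydrophobicity(pantp), 3), net_charge(pantp))  # 16 0.193 7
print(round(hydrophobic_moment(pantp), 3))
```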
Table 2. Statistics of the stepwise discriminant analysis using parameters calculated by HELIQUEST.
“Yes” predicted “Yes” | “Yes” predicted “Yes” (CV) (a) | “No” predicted “No” | “No” predicted “No” (CV) | Confidence for “Yes” | Confidence for “No” | Overall confidence
28 | 27 | 14 | 12 | 0.8182 | 0.8000 | 0.8125
(a) Results of the cross-validation run.
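The three confidence values in Table 2 can be reproduced from the cross-validated counts and the class sizes (33 lipid-binding and 15 non-lipid-binding segments). The short check below assumes that "confidence" means the fraction of segments of a class that is predicted correctly in the cross-validation run; the source does not define it, and this reading is inferred from the numbers.

```python
def confidence(correct, total):
    """Fraction of correctly predicted segments (assumed meaning of 'confidence')."""
    return correct / total


n_yes, n_no = 33, 15                    # class sizes in Table 1
cv_yes_correct, cv_no_correct = 27, 12  # cross-validated counts from Table 2

conf_yes = confidence(cv_yes_correct, n_yes)
conf_no = confidence(cv_no_correct, n_no)
conf_overall = confidence(cv_yes_correct + cv_no_correct, n_yes + n_no)

print(f"{conf_yes:.4f} {conf_no:.4f} {conf_overall:.4f}")  # 0.8182 0.8000 0.8125
```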
The SDA shows that the variables <µH> and z enter the model as explanatory variables in the first two steps and provide a significant classification of the data, whereas the other variables (<H> and segment length) never enter the model. Of the 15 non-lipid-binding segments, 14 were predicted correctly, and of the 33 lipid-binding segments, 28 were predicted correctly. The confidence for correctly predicting the ‘binding’ segments is 0.8182, that for the ‘non-binding’ segments is 0.8000, and the overall confidence is 0.8125 (Table 2); these values correspond to the cross-validated counts. Thus, the <µH> and z values calculated by HELIQUEST were able to classify the segments into “lipid-binding” and “non-lipid-binding” classes with an overall confidence of about 81%.
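To illustrate what a classification rule built on <µH> and z looks like, the sketch below fits an ordinary two-class Fisher linear discriminant with equal priors to eight (<µH>, z) pairs taken from Table 1. It is an assumption-laden stand-in: the TSAR 3.3 algorithm, its coefficients and its stepwise selection are not described in the source, and the function names `fit_lda` and `predict` are hypothetical.

```python
def fit_lda(yes, no):
    """Fit a two-variable Fisher linear discriminant (pooled covariance, equal priors).
    Returns the weight vector and the decision threshold."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    m_yes, m_no = mean(yes), mean(no)
    dof = len(yes) + len(no) - 2
    # Pooled within-class covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = ((a + b) / dof for a, b in zip(scatter(yes, m_yes), scatter(no, m_no)))
    det = sxx * syy - sxy ** 2
    dx, dy = m_yes[0] - m_no[0], m_yes[1] - m_no[1]
    # w = S^-1 (mean_yes - mean_no), using the closed-form 2x2 inverse.
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    # Threshold: discriminant score at the midpoint between the class means.
    mid = ((m_yes[0] + m_no[0]) / 2, (m_yes[1] + m_no[1]) / 2)
    return w, w[0] * mid[0] + w[1] * mid[1]


def predict(model, point):
    """True if the (<µH>, z) point falls on the lipid-binding side of the boundary."""
    w, threshold = model
    return w[0] * point[0] + w[1] * point[1] > threshold


# Illustrative subset of Table 1 as (<µH>, z); not the full 48-segment data set.
binders = [(0.327, 7), (0.698, 4), (0.609, 3), (0.394, 5)]      # pAntp, 18L, FtsA, Melittin
non_binders = [(0.423, 0), (0.241, 2), (0.068, 2), (0.351, 0)]  # Kes1p, Epsin L6E, MinD mutant 1, Sar1p W4A

model = fit_lda(binders, non_binders)
print([predict(model, p) for p in binders + non_binders])
```

With only eight training points the fitted boundary will differ from the one TSAR derived on the full data set, so the sketch shows the form of the rule rather than its actual coefficients.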