Help & Methods

Stepwise Discriminant Analysis

Table 1. List of segments used for the SDA.

Name | Sequence | Lipid binding (a) | L | <H> | <µH> | z | Reference
pAntp | RQIKIWFQNRRMKWKK | YES | 16 | 0.193 | 0.327 | 7 | Biochemistry 40, 1824-1834 (2001)
AP2AL | RQIKIWFQAARMLWKK | YES | 16 | 0.501 | 0.51 | 5 | Biochemistry 40, 1824-1834 (2001)
18A | DWLKAFYDKVAEKLKEAF | YES | 18 | 0.309 | 0.565 | 0 | J Biol Chem 269, 23904-23910 (1994)
18L | GIKKFLGSIWKFIKAFVG | YES | 18 | 0.681 | 0.698 | 4 | Biochim Biophys Acta 1368, 343-354 (1998)
Mastoparan X | INWKGIAAMAKKLL | YES | 14 | 0.56 | 0.419 | 3 | Biochemistry 35, 8450-8456 (1996)
Melittin | GIGAVLKVLTTGLPALISWIKRKRQQ | YES | 26 | 0.511 | 0.394 | 5 | Biochim Biophys Acta 1667, 26-37 (2004)
M2a | GIGKFLHSAKKFGKAFVGEIMNS | YES | 23 | 0.373 | 0.475 | 3 | Biochemistry 35, 10844-10853 (1996)
Cecropin P1 | SWLSKTAKKLENSAKKRISEGIAIAIQGGPR | YES | 31 | 0.188 | 0.214 | 5 | Biochemistry 34, 11479-11488 (1995)
p25 CoxIV | MLSLRQSIRFFKPATRTLCSSRYLL | YES | 25 | 0.55 | 0.222 | 5 | Biochemistry 35, 3141-3146 (1996)
PhoE signal peptide | MKKSTLALVVMGIVASASVQA | YES | 21 | 0.558 | 0.045 | 2 | Biochemistry 31, 1672-1677 (1992)
TMX1 | WNALAAVAAALAAVAAALAAVAASKSKSKSK | YES | 31 | 0.353 | 0.215 | 4 | Biochemistry 43, 5782-5791 (2004)
Helix 1 | NPAEAAARAEACRRAWAAAAARARA | YES | 25 | 0.077 | 0.18 | 3 | Protein Eng 11, 539-547 (1998)
Helix 2 | NENAKKAAACKKKWEELKKKLAALK | YES | 25 | -0.051 | 0.35 | 6 | Biochemistry 36, 12869-12880 (1997)
Helix 4 | NELNAKLAACKKKWEELKKKLAALK | YES | 25 | 0.112 | 0.443 | 5 | Biochemistry 36, 12869-12880 (1997)
Helix 5 | NELKKKLELCKAKWLEAKKKLEALK | YES | 25 | 0.114 | 0.361 | 5 | Biochemistry 36, 12869-12880 (1997)
100-M2a | GIAKFGKAAAHFGKKWVGELMNS | YES | 23 | 0.344 | 0.461 | 3 | Biochemistry 36, 12869-12880 (1997)
120-M2a | GIGKFLHSAKKFGKAWVGEIMNS | YES | 23 | 0.393 | 0.492 | 3 | Biochemistry 36, 12869-12880 (1997)
140-M2a | GIGKFLHTLKTFGKKWVGEIMNS | YES | 23 | 0.465 | 0.53 | 3 | Biochemistry 36, 12869-12880 (1997)
160-M2a | GIGKFLHKVKSFGKSWIGEIMNS | YES | 23 | 0.443 | 0.525 | 3 | Biochemistry 36, 12869-12880 (1997)
180-M2a | GIGKFLHKVGSFIKSWKGEIMNS | YES | 23 | 0.443 | 0.549 | 3 | Biochemistry 36, 12869-12880 (1997)
Arf1 [1-17] | MGNIFANLFKGLFGKKE | YES | 17 | 0.429 | 0.462 | 2 | Biochemistry 36, 12869-12880 (1997)
Sar1p [1-23] | MAGWDIFGWFRDVLASLGLWNKH | YES | 23 | 0.707 | 0.27 | 0 | Nat Cell Biol 3, 531-537 (2001)
Sar1p [1-23] (W4A) | MAGADIFGWFRDVLASLGLWNKH | NO | 23 | 0.622 | 0.351 | 0 | Cell 122, 605-617 (2005)
Sar1p [1-23] (IF-6/7-AA) | MAGWDAAGWFRDVLASLGLWNKH | NO | 23 | 0.577 | 0.227 | 0 | Cell 122, 605-617 (2005)
dAmph 1[1-33] | MTENKGIMLAKSVQKHAGRAKEKILQNLGKVDR | YES | 33 | 0.098 | 0.276 | 5 | Science 303, 495-499 (2004)
Endophilin A1 [7-35] | KKQFHKATQKVSEKVGGAEGTKLDDDFKE | YES | 29 | -0.091 | 0.2 | 1 | J Cell Biol 155, 193-200 (2001)
Endophilin A1 [7-35] (F10E) | KKQEHKATQKVSEKVGGAEGTKLDDDFKE | NO | 29 | -0.175 | 0.158 | 0 | J Cell Biol 155, 193-200 (2001)
Kes1p [7-29] | SSSWTSFLKSIASFNGDLSSLSA | NO | 23 | 0.473 | 0.423 | 0 | Nat Struct Mol Biol 14, 138-146 (2007)
Nup 133 [245-267] | LPQGQGMLSGIGRKVSSLFGILS | NO | 23 | 0.555 | 0.417 | 2 | Nat Struct Mol Biol 14, 138-146 (2007)
GMAP-210 [1-38] | MSSWLGGLGSGLGQSLGQVGGSLASLTGQISNFTKDML | NO | 38 | 0.499 | 0.475 | 0 | Nat Struct Mol Biol 14, 138-146 (2007)
ArfGAP1 [199-234] | FLNSAMSSLYSGWSSFTTGASKFASAAKEGATKFGS | NO | 36 | 0.363 | 0.386 | 2 | EMBO J 24, 2244-2253 (2005)
ArfGAP1 [199-234] (L207A-W211A-F214A) | FLNSAMSSAYSGASSATTGASKFASAAKEGATKFGS | NO | 36 | 0.229 | 0.267 | 2 | EMBO J 24, 2244-2253 (2005)
ArfGAP1 [199-234] (L207D) | FLNSAMSSDYSGWSSFTTGASKFASAAKEGATKFGS | NO | 36 | 0.294 | 0.317 | 1 | Biochemistry 46, 1779-1790 (2007)
ArfGAP1 [264-295] | IFDDVSSGVSQLASKVQGVGSKGWRDVTTFFS | NO | 32 | 0.377 | 0.456 | 0 | Biochemistry 46, 1779-1790 (2007)
ArfGAP1 [264-295] (V279D) | IFDDVSSGVSQLASKDQGVGSKGWRDVTTFFS | NO | 32 | 0.315 | 0.396 | -1 | Nat Struct Mol Biol 14, 138-146 (2007)
ArfGAP1 [199-234] 2Ki | FLNSAMSKLYSGWSSFKTGASKFASAAKEGATKFGS | YES | 36 | 0.302 | 0.384 | 4 | Nat Struct Mol Biol 14, 138-146 (2007)
ALPS1 [199-234] 4Ki | FLNSAMSKLKSGWSKFKTGASKFASAAKEGATKFGS | YES | 36 | 0.221 | 0.407 | 6 | Nat Struct Mol Biol 14, 138-146 (2007)
ALPS [199-234] 2Kt | FLNSAMSSLYKGWSSFTKGASKFASAAKEGATKFGS | YES | 36 | 0.302 | 0.446 | 4 | Nat Struct Mol Biol 14, 138-146 (2007)
ALPS [199-234] 4Ki/4Et | FLNSAMEKLKEGWEKFKEGASKFASAAKEGATKFGS | YES | 36 | 0.146 | 0.476 | 2 | Nat Struct Mol Biol 14, 138-146 (2007)
MinD [248-266] | VLEEQNKGMMAKIKSFFGVRS | YES | 21 | 0.292 | 0.322 | 2 | J Biol Chem 25, 22193-22198 (2003)
MinD [248-266] mutant 1 | IKVSVFNESRAEFGLKQGMKM | NO | 21 | 0.292 | 0.068 | 2 | J Biol Chem 278, 40050-40056 (2003)
MinD [248-266] mutant 2 | IEEEKKGFLKREFGG | NO | 15 | 0.036 | 0.277 | 0 | J Biol Chem 278, 40050-40056 (2003)
Epsin [1-18] | MSTSSLRRQMKNIVHNYS | YES | 18 | 0.219 | 0.367 | 3 | Nature 419, 361-366 (2002)
Epsin [1-18] (L6E) | MSTSSERRQMKNIVHNYS | NO | 18 | 0.089 | 0.241 | 2 | Nature 419, 361-366 (2002)
FtsA [406-420] | GSWIKRLNSWLRKEF | YES | 15 | 0.411 | 0.609 | 3 | Science 320, 792-794 (2008)
ColicinE1 [381-405] | KKIGNVNEALAAFEKYKDVLNKKFS | YES | 25 | 0.131 | 0.369 | 3 | Biochemistry 46, 6074-6085 (2007)
FtsY NG+1 | MFARLKRSLLKTKENLG | YES | 17 | 0.242 | 0.498 | 4 | J Biol Chem 282, 32176-32184 (2007)
FtsY NG | MARLKRSLLKTKENLG | NO | 16 | 0.146 | 0.477 | 4 | J Biol Chem 282, 32176-32184 (2007)

a Ability of a segment to bind in vitro to large liposomes containing phosphatidylcholine and negatively charged lipids (phosphatidylserine, phosphatidylinositol or phosphatidylglycerol).
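The L, <H>, <µH> and z columns of Table 1 can be recomputed from the sequences. The sketch below is one way to do this, under assumptions that this page does not state: the Fauchère-Pliska hydrophobicity scale, a helical periodicity of 100° per residue, a net charge that counts K and R as +1 and D and E as -1, and a calculation over the whole segment. The function name `descriptors` and the dictionary keys are made up for this example.

```python
import math

# Fauchère-Pliska hydrophobicity scale (assumed; the page does not name the scale).
HYDROPHOBICITY = {
    "A": 0.31, "R": -1.01, "N": -0.60, "D": -0.77, "C": 1.54,
    "Q": -0.22, "E": -0.64, "G": 0.00, "H": 0.13, "I": 1.80,
    "L": 1.70, "K": -0.99, "M": 1.23, "F": 1.79, "P": 0.72,
    "S": -0.04, "T": 0.26, "W": 2.25, "Y": 0.96, "V": 1.22,
}
# Assumed charge convention: K/R = +1, D/E = -1, all other residues neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def descriptors(seq, turn_deg=100.0):
    """Return length, mean hydrophobicity, hydrophobic moment and net charge."""
    n = len(seq)
    h = [HYDROPHOBICITY[aa] for aa in seq]
    # Each residue is rotated by turn_deg around the helix axis.
    angles = [math.radians(i * turn_deg) for i in range(n)]
    sum_cos = sum(hi * math.cos(a) for hi, a in zip(h, angles))
    sum_sin = sum(hi * math.sin(a) for hi, a in zip(h, angles))
    return {
        "L": n,
        "H": sum(h) / n,                         # <H>
        "muH": math.hypot(sum_cos, sum_sin) / n,  # <µH>
        "z": sum(CHARGE.get(aa, 0) for aa in seq),
    }


print(descriptors("RQIKIWFQNRRMKWKK"))  # pAntp
```

For pAntp this gives L = 16, <H> = 0.193 and z = 7, with <µH> close to 0.327, in line with the first row of Table 1. Whether the longer segments were evaluated over their full length or over a shorter window is not stated here, so agreement for every row should not be assumed.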

Boolean data (“Yes” for a lipid-binding segment, “No” for a non lipid-binding segment) were submitted to the Stepwise Discriminant Analysis (SDA) module implemented in TSAR 3.3 (Oxford Molecular), with segment length (L), <H>, <µH> and z as explanatory variables. A stepwise procedure was used to select a subset of the explanatory variables and to optimize the classification rule.
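As a rough illustration of what a stepwise discriminant procedure does, the sketch below fits Fisher's two-class linear discriminant and adds variables one at a time for as long as leave-one-out accuracy improves. It is not TSAR's implementation: the selection criterion used by TSAR is not described here, so leave-one-out accuracy is a stand-in, and all function names are invented for this example. The data are a 16-row subset of Table 1, chosen only to keep the listing short, so the output is not expected to reproduce Table 2.

```python
# Rows taken from Table 1 as (L, <H>, <µH>, z, binds); a subset for brevity.
NAMES = ["L", "<H>", "<µH>", "z"]
ROWS = [
    (16, 0.193, 0.327, 7, True), (18, 0.309, 0.565, 0, True),
    (18, 0.681, 0.698, 4, True), (14, 0.560, 0.419, 3, True),
    (26, 0.511, 0.394, 5, True), (23, 0.373, 0.475, 3, True),
    (23, 0.707, 0.270, 0, True), (15, 0.411, 0.609, 3, True),
    (23, 0.622, 0.351, 0, False), (29, -0.175, 0.158, 0, False),
    (23, 0.473, 0.423, 0, False), (23, 0.555, 0.417, 2, False),
    (38, 0.499, 0.475, 0, False), (36, 0.363, 0.386, 2, False),
    (18, 0.089, 0.241, 2, False), (16, 0.146, 0.477, 4, False),
]
X = [list(r[:4]) for r in ROWS]
y = [r[4] for r in ROWS]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lda(X, y):
    """Fisher's linear discriminant for two classes: (weights, threshold)."""
    d = len(X[0])
    groups = {lab: [x for x, l in zip(X, y) if l == lab] for lab in (True, False)}
    means = {lab: [sum(col) / len(rows) for col in zip(*rows)]
             for lab, rows in groups.items()}
    pooled = [[0.0] * d for _ in range(d)]  # pooled within-class covariance
    for lab, rows in groups.items():
        for x in rows:
            dev = [xi - mi for xi, mi in zip(x, means[lab])]
            for i in range(d):
                for j in range(d):
                    pooled[i][j] += dev[i] * dev[j] / (len(X) - 2)
    diff = [a - b for a, b in zip(means[True], means[False])]
    w = solve(pooled, diff)
    midpoint = [(a + b) / 2 for a, b in zip(means[True], means[False])]
    return w, sum(wi * mi for wi, mi in zip(w, midpoint))


def predict(model, x):
    """True ("Yes") when the discriminant score exceeds the midpoint threshold."""
    w, threshold = model
    return sum(wi * xi for wi, xi in zip(w, x)) > threshold


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy using only the feature columns in cols."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[c] for c in cols] for k, row in enumerate(X) if k != i]
        train_y = [lab for k, lab in enumerate(y) if k != i]
        hits += predict(fit_lda(train_X, train_y), [X[i][c] for c in cols]) == y[i]
    return hits / len(X)


def forward_select(X, y, names):
    """Greedy forward selection: add a variable only if accuracy improves."""
    chosen, best = [], 0.0
    while len(chosen) < len(names):
        scores = {c: loo_accuracy(X, y, chosen + [c])
                  for c in range(len(names)) if c not in chosen}
        c, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:
            break
        chosen.append(c)
        best = score
    return [names[c] for c in chosen], best


print(forward_select(X, y, NAMES))
```

The design choice worth noting is that selection stops as soon as no remaining variable improves the cross-validated score, which is one simple way for variables such as <H> or L to be left out of a model.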

 

Table 2. Statistics of the stepwise discriminant analysis using parameters calculated by HELIQUEST.

“Yes” predicted “Yes” | “Yes” predicted “Yes” (CV) (a) | “No” predicted “No” | “No” predicted “No” (CV) | Confidence for “Yes” | Confidence for “No” | Overall confidence
28 | 27 | 14 | 12 | 0.8182 | 0.8000 | 0.8125

a Results of the cross-validation run.

 

The SDA shows that <µH> and z enter the model as explanatory variables in the first two steps of the stepwise procedure and give a significant classification of the data, whereas the other variables (<H> and segment length) never enter the model. Of the 15 non lipid-binding segments, 14 were predicted correctly, and of the 33 lipid-binding segments, 28 were predicted correctly; in the cross-validation run these counts were 12 and 27, respectively. The confidence for correctly predicting lipid-binding segments is 0.8182, that for non lipid-binding segments is 0.8000, and the overall confidence is 0.8125 (Table 2). Thus, the <µH> and z values calculated by HELIQUEST classify the segments into “lipid-binding” and “non lipid-binding” classes with an overall confidence of about 81%.
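The confidence values in Table 2 can be checked against the counts in the same table. The short calculation below suggests that the reported confidences correspond to the cross-validated counts (27 of 33 and 12 of 15) rather than to the fitted counts (28 and 14); this reading is inferred from the numbers and is not stated explicitly in the text. The helper name `rates` is made up for this example.

```python
N_YES, N_NO = 33, 15  # lipid-binding / non lipid-binding segments in Table 1


def rates(correct_yes, correct_no):
    """Fraction of correctly classified segments per class and overall."""
    return {
        "yes": round(correct_yes / N_YES, 4),
        "no": round(correct_no / N_NO, 4),
        "overall": round((correct_yes + correct_no) / (N_YES + N_NO), 4),
    }


fitted = rates(28, 14)           # counts without cross-validation (Table 2)
cross_validated = rates(27, 12)  # counts from the cross-validation run (Table 2)

print(fitted)           # {'yes': 0.8485, 'no': 0.9333, 'overall': 0.875}
print(cross_validated)  # {'yes': 0.8182, 'no': 0.8, 'overall': 0.8125}
```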