reference predict_h16490 (Jun 20, 2000 00:24:39)
reference pred_h16490 (Jun 20, 2000 00:25:05)
PPhdr from: kapilm@cs.brandeis.edu
PPhdr resp: MAIL
PPhdr orig: HTML
PPhdr want: HTML
PPhdr password(###)
prediction of: - default
prediction of: - PHDsec PHDacc PHDhtm ProSite SEG ProDom
return msf format
ret html
# default: single protein sequence description=L-lysine 2,3-aminomutase
MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAITPYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEGVITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE
-------------------------------------------------------------
Pattern-ID: ASN_GLYCOSYLATION PS00001 PDOC00001
Pattern-DE: N-glycosylation site
Pattern: N[^P][ST][^P]
257 NQSV
Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern: [ST].[RK]
54 SLR
110 THR
133 TRR
241 STR
381 TGK
Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern: [ST].{2}[DE]
13 SDAD
30 TVEE
41 TKEE
65 SLID
83 TALE
169 SGGD
364 TYSE
Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern: G[^EDRKHPFYW].{2}[STAGCN][^P]
47 GVAQCV
252 GVPLGN
264 GVNDCV
360 GVITTY
393 GLLNGE
406 GLERNK
Pattern-ID: AMIDATION PS00009 PDOC00009
Pattern-DE: Amidation site
Pattern: .G[RK][RK]
381 TGKK
Pattern-ID: ATP_GTP_A PS00017 PDOC00017
Pattern-DE: ATP/GTP-binding site motif A (P-loop)
Pattern: [AG].{4}GK[ST]
331 APGGGGKT
Pattern-ID: LEUCINE_ZIPPER PS00029 PDOC00029
Pattern-DE: Leucine zipper pattern
Pattern: L.{6}L.{6}L.{6}L
167 LLSGGDALLVSDETLEYIIAKL
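The patterns listed above are already written in regular-expression form, so the reported hits can be reproduced with a short script. The following is a minimal sketch only, not the code used by the server. The function name `scan_motifs` and the use of a single 60-residue fragment (residues 241 to 300 of the query) are assumptions made for this example. A lookahead is used so that overlapping hits are also reported.

```python
import re

# Three of the ProSite patterns quoted above, copied verbatim.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}

# Residues 241-300 of the query sequence (one 60-residue line of the sequence block).
FRAGMENT = "STRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEH"
FRAGMENT_START = 241  # 1-based position of the first residue in FRAGMENT


def scan_motifs(sequence, start=1):
    """Return {pattern_id: [(position, matched_text), ...]} with 1-based positions."""
    hits = {}
    for name, pattern in PATTERNS.items():
        # Wrapping the pattern in a lookahead lets overlapping matches be found.
        regex = re.compile("(?=(" + pattern + "))")
        hits[name] = [(m.start() + start, m.group(1)) for m in regex.finditer(sequence)]
    return hits


if __name__ == "__main__":
    for name, found in scan_motifs(FRAGMENT, FRAGMENT_START).items():
        for position, text in found:
            print(name, position, text)
```

For this fragment the sketch reports the same positions as the listing above (257 NQSV, 241 STR, 252 GVPLGN, 264 GVNDCV). Such short patterns match frequently by chance, so a hit is only a hint, not evidence of a functional site.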
>prot (#) ppOld, default: single protein sequence description=l-lysine 2,3-aminomutase /home/phd/server/work/predict_h16490
MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAIT
PYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLL
ITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDET
LEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEE
STRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEH
FRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEG
VITTYSEPINYTPGCNCDVCTxxxxxxxxxxxx
LLNGEGMALEPVGLERNKRHVQE
Identities computed with respect to: (query) prot Colored by: consensus/70% and property
HSP processing: ranked
17 [ . . . : . . . . 1 . . . . : . . . . 2 . . . . : . . . . 3 . . . . : . . ] 373
prot (#) ppOld, default: single ... score P(N) N 100.0% WNDWRWQVRNRIETVEELKKYIPLTKEEEEXXXXXXXXXXXXXXXXXXXXXDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLXXYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTXVMPNYVISQSHDKVILRNFEGVITTYSEPINYTP
1 PD008727 p99.2 (8) YJEK(2) // PROTE... 351 1.0e-121 5 53.2% WFKWLWQLTNGVKTLKELRKVLNLKVEDED---------------------NPYVEBDPIRRQVIPTEWEIEKZVWHKEDFMGEDEYSPVPGLTHRYPDRVLLLVTDSCAVYCRYCFRRWFIQQENQGVPKEEVEKALDYIREHPEINEVLISGGDPLTLSDHKLEKLLKRLREIPHVKIIRIGTRLPVVAPQRITDDLLELL--YKPIWIMTHINHPYEITEEAREAVEKLRKTGIPIYNQSVLLRGVNDDFETLATLFHALTKIGVKPYYLFQCDPTPGTGHFRVPIEETLEIMRTLRGRISGYAIPTLAVDLPGGGGKT-----------------------------------
2 PD041312 p99.2 (1) YODO_BACSU // HYP... 137 2.4e-11 1 76.5% -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------LQPNYVLSQSPDKVILRNFEGVITSYPEPENYIP
3 PD091955 p99.2 (1) O34400_BACSU // Y... 58 0.083 2 23.5% ---------------------------------------------------------------------------------------------------------TTLCNMRCEHC-----------------IDLLLKRLEEIPRLRSISITGGEPMLSLKSVKEYVVPLLK----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
consensus/100% .............................. ........................................................................................................................................................ ..................................................................................................................... ..................................
consensus/90% .............................. ........................................................................................................................................................ ..................................................................................................................... ..................................
consensus/80% .............................. ........................................................................................................................................................ ..................................................................................................................... ..................................
consensus/70% .............................. ......................................................Ts.CshhCcaC.................l-hhlchlcphPplppl.loGG-shh..cphhEhll.hL+.............................. ..................................................................................................................... ..................................
--- ------------------------------------------------------------
---
--- Again: these results were obtained based on the domain database
--- collected by Daniel Kahn and his coworkers in Toulouse.
---
--- PLEASE quote:
---       F Corpet, J Gouzy, D Kahn (1998). The ProDom database
---       of protein domain families. Nucleic Ac Res 26:323-326.
---
--- The general WWW page is on:
---- ---------------------------------------
---  http://www.toulouse.inra.fr/prodom.html
---- ---------------------------------------
---
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, follow the following links (each line is ONE
--- single link for your protein!!):
---
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD008727 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD008727
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD008727 ==> graphical output of all proteins having domain PD008727
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD041312 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD041312
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD041312 ==> graphical output of all proteins having domain PD041312
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD091955 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD091955
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD091955 ==> graphical output of all proteins having domain PD091955
---
--- NOTE: if you want to use the link, make sure the entire line
---       is pasted as URL into your browser!
---
--- END of PRODOM
--- ------------------------------------------------------------
--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
---
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID     : identifier of aligned (homologous) protein
--- STRID  : PDB identifier (only for known structures)
--- PIDE   : percentage of pairwise sequence identity
--- WSIM   : percentage of weighted similarity
--- LALI   : number of residues aligned
--- NGAP   : number of insertions and deletions (indels)
--- LGAP   : number of residues in all indels
--- LSEQ2  : length of aligned sequence
--- ACCNUM : SwissProt accession number
--- NAME   : one-line description of aligned protein
---
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID          STRID  IDE  WSIM  LALI  NGAP  LGAP  LEN2  ACCNUM  NAME
yodo_bacsu          60    73   410     1     5   471  O34676  HYPOTHETICAL 54.1 KD PROT
y454_aquae          48    62   366     1     1   370  O66761  HYPOTHETICAL PROTEIN AQ_4
y121_trepa          35    44   336     3    11   355  O83158  HYPOTHETICAL PROTEIN TP01
yjek_haein          34    48   320     2     2   338  P44641  HYPOTHETICAL PROTEIN HI03
yg32_aquae          34    43   350     6    43   374  O67554  HYPOTHETICAL PROTEIN AQ_1
yjek_ecoli          33    46   327     4     4   342  P39280  HYPOTHETICAL 38.7 KD PROT
yjek_bucap          34    46   143     1     1   144  Q44634  HYPOTHETICAL PROTEIN IN G
---
--- MAXHOM ALIGNMENT: IN MSF FORMAT
---
--- Version of database searched for alignment:
--- SWISS-PROT release 38.0 (7/99) with 80000 proteins
---
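The summary columns PIDE and LALI can be illustrated on two rows of an alignment. The sketch below is an assumption about how such numbers may be counted (positions with a gap in either row are skipped, identity is taken relative to the aligned positions); MAXHOM's exact definition is not given in this output. The function name `pairwise_identity` is hypothetical.

```python
def pairwise_identity(query_row, other_row, gap="-"):
    """Return (pide, lali) for two equally long rows of an alignment.

    pide: percentage of identical residues among the aligned positions
    lali: number of positions where both rows carry a residue
    """
    if len(query_row) != len(other_row):
        raise ValueError("alignment rows must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(query_row, other_row):
        if a == gap or b == gap:
            continue  # a gap in either row: position is not counted as aligned
        aligned += 1
        # Compare case-insensitively; lower case marks insertion points in the rows.
        if a.upper() == b.upper():
            identical += 1
    pide = 100.0 * identical / aligned if aligned else 0.0
    return pide, aligned


if __name__ == "__main__":
    # First 12 alignment columns of the query and of yodo_bacsu.
    print(pairwise_identity("MINRRYELFKDV", "---KEIELWKDV"))
```

Over the full rows the server reports 60% identity for yodo_bacsu; the short excerpt used here only demonstrates the counting.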
Identities computed with respect to: (1) predict_h1640 Colored by: consensus/70% and property
1 [ . . . . : . . . . 1 . . . . : . . . . 2 . . . . : . . . . 3 . . . . : . . . . 4 . ] 416
1 predict_h1640 100.0% MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAITPYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEGVITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE
2 yodo_bacsu 60.2% ---KEIELWKDVPEEKWNDWLWQLTHTVRTLDDLKKVINLTEDEEEGVRISTKTIPLNITPYYASLMDPDNPRCPVRMQSVPLSEEMHKTKYDLEDPLHEDEDSPVPGLTHRYPDRVLFLVTNQCSMYCRYCTRRRFSGQIGMGVPKKQLDAAIAYIRETPEIRDCLISGGDGLLINDQILEYILKELRSIPHLEVIRIGTRAPVVFPQRITDHLCEILKKYHPVWLNTHFNTSIEMTEESVEACEKLVNAGVPVGNQAVVLAGINDSVPIMKKLMHDLVKIRVRPYYIYQCDLSEGIGHFRAPVSKGLEIIEGLRGHTSGYAVPTFVVDAPGGGGKIALQPNYVLSQSPDKVILRNFEGVITSYPEPENYIPNQADAYfeTADKKEPIGLSAIFADKEVSFTPENVDRIKRR---
3 y454_aquae 47.8% -------FFENVPENLWRSYEWQIQNRIKTLKEIKKYLKLLPEEEEGIKRTQGLYPFAITPYYLSLINPEDPKDPIRLQAIPRVVEVDEKVQSAGEPDALKEEGDIPGLTHRYPDRVLLNVTTFCAVYCRHCMRKRIFSQGERARTKEEIDTMIDYIKRHEEIRDVLISGGEPLSLSLEKLEYLLSRLREIKHVEIIRFGTRLPVLAPQRFFnkLLDILEKYSPIWINTHFNHPNEITEYAEEAVDRLLRRGIPVNNQTVLLKGVNDDPEVMLKLFRKLLRIKVKPQYLFHCDPIKGAVHFRTTIDKGLEIMRYLRGRLSGFGIPTYAVDLPGGKGKVPLLPNYVKKRKGNKFWFESFTGEVVEYEVTEVWEP-------------------------------------------
4 y121_trepa 34.8% -----------------------------TREQRKRRGAGRADEHWRTLsaADALTEHISPAYAHLIAqgADAQALKRQVCFAPQERVVHACECADPLGEDRYCVTPFLVHQYANRVLMLATGRCFSHCRYCFRRGFIAQRAGWIPNEEREKIITYLRATPSVKEILVSGGDPLTGSFAQVTSLFRALRSVAPDLIIRLCTRAVTFAPQAFTPELIAFLQEMKPVWIIPHINHPAELGSTQRAVLEACVGAGLPVQSQSVLLRGVNDSVETLCTLFHALTCLGVKPGYLFQLDLAPGTGDFRVPLSDTLALWRTLKERLSGLSLPTLAVDLPGGGGKFPLvqDVTWHQEREAFSARGIDGAWYTY---------------------------------------------------
5 yjek_haein 33.6% -----------------QNWLTILKNAISDPKLLLKALNLPEDDFEQSIAARKLFSLRVPQPFIDKIEKGNPQDPLFLQVMCSDLEFVQAEGFSTDPLEEKNANAVPNILHKYRNRLLFMAKGGCAVNCRYCFRRHFPYDENPGNKKS-WQLALDYIAAHSEIEEVIFSGGDPLMAKDHELAWLIKHLENIPHLQRLRIHTRLPVVIPQRITDEFCTLLAETrqTVMVTHINHPNEIDQIFAHAMQKLNAVNVTLLNQSVLLKGVNDDAQILKILSDKLFQTGILPYYLHLLDKVQGASHFLISDIEAMQIYKTLQSLTSGYLVPKLAREIAGEPNKT------------------------------------------------------------------------------
6 yg32_aquae 31.7% -----------------------MGKKLKYIIDLKFIEEIPEEERRELEKVTEKFAFRTNTYYNSLINWDNPNDPIRRIVIPTTEELEVWGK--LDASNESKYMKVHGLEHKYPDTALLLVTDVCGIYCRFCFRKRLFMNDNDEVARD-VSEGLEYIRNHPEINNVLLTGGDPLILATFKLEKILKALAEIPHVRIVRIGSKMLAVNPFRVlpKLLELFEWfkKLYLMNHFNHPRELTKEARKAVELVQKTGTTLTNQTPILKGINDDFETLKTLLEELSFIGVPPYYVFQCRPTAGNKAYSTPIEETIDLVEAVRAEVSGL----------------AARVRYVMSHETGKIEILGKTDEHIFFRYHRAADPENRGKFmvAEYKSSLSGVS------------------------
7 yjek_ecoli 32.5% -----------LNTPSREDWLTQLADVVTDPDELLRLLNIDAEEKLLAGRSAKKLflRVPRSFIDRMEKGNPDDPLLRQVLTSQDEFVIAPGFSTDPLEEQ-HSVVPGLLHKYHNRALLLVKGGCAVNCRYCFRRHFPYAENQGNKRN-WQTALEYVAAHPELDEMIFSGGDPLMAKDHELDWLLTQLEAIPHIKRLRIHSRLPIVIPARITEALVECFARStqILLVNHINHANEVDETFRQAMAKLRRVGVTLLNQSVLLRDVNDNAQTLANLSNALFDAGVMPYYLHVLDKVQGAAHFMVSDDEARQIMRELLTLVSGYLVPKLAREIGGEPSKTPL----------------------------------------------------------------------------
8 yjek_bucap 34.3% ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------IHTRLPIVIPNRITSDLCQIFSNslKIIIVTHINHPQEINEQLSDSLLKLKKSNVILLNQSVLLKNINDNAIILAELSSRLCENNIIPYYLHILDKVKGTSHFLVSNKKAKSIISDLMKMISGFLVPRLVFDNGSKDNKLIII---------------------------------------------------------------------------
consensus/100% ......................................................................................................................................................................................................hto+h.hh.P.thh.thhthht....hhh.sHhNps.Ehsp...tsh.hh.t.sh.l.sQsslLtslNDs..hhh.L.ptL...tl.P.Yla.hc...G..ta.hs..cshtlhp.l.t.hSGh..............................................................................................
consensus/90% ......................................................................................................................................................................................................hto+h.hh.P.thh.thhthht....hhh.sHhNps.Ehsp...tsh.hh.t.sh.l.sQsslLtslNDs..hhh.L.ptL...tl.P.Yla.hc...G..ta.hs..cshtlhp.l.t.hSGh..............................................................................................
consensus/80% ..............................hhhh..h..--......s.t.h..ths..a.phht.tsstpslhh.sh....Eh..h.....-s.t.p....h..l.HpY.sphLh.hps.C...CRaChR+th..t.t.....p.hp.hltYlttp.plpphlhoGG-sL.ht...lt.lhttLttl....hlRltoRhshlhPtRhhscLhphhtp.p.lhl.sHhNHstElsp..tpuhthl.tsslsl.NQoVlL+slNDss.hhtpL.ptLhphtlhPhYla.hD.s.GhtcFhss.pcshplhctLhthhSGh.lPphsh-.sut.sKh.h.........................................................................
consensus/70% ................htphlps.cplh+hhtl.t-Ec.thhtstchh.htls..ahshhp.ssPpsPlhhQshsts.Eh..t.t..tDP.tEpp.sslPsLhH+Y.sRsLhhspshCuh.CRaChR++h.hpts.t..pp.hpthlpYlttpsplc-hlhSGGDsLhhp.tpLphllptLcpIsHlphlRItoRhPlVhPpRlTscLhphhpchp.lhlssHhNHPpElscp.pcAhptLhpsGlslhNQoVLL+GlNDssphhtpL.pcLhphtVhPYYla.hDhstGssHFhss.pcuhpIhcsLpshhSGahlPphsh-hsGtssKhsl....................................................
****************************************************************************
* *
* Prediction of: *
* - secondary structure, by PHDsec *
* - solvent accessibility, by PHDacc *
* *
* PHD: Profile fed neural network systems from HeiDelberg *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* Author: Burkhard Rost *
* EMBL, Heidelberg, FRG *
* Meyerhofstrasse 1, 69 117 Heidelberg *
* Internet: Predict-Help@EMBL-Heidelberg.DE *
* *
* All rights reserved. *
* *
****************************************************************************
* *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* Secondary structure prediction by PHDsec: *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* Author: Burkhard Rost *
* EMBL, Heidelberg, FRG *
* Meyerhofstrasse 1, 69 117 Heidelberg *
* Internet: Rost@EMBL-Heidelberg.DE *
* *
* All rights reserved. *
* *
* *
****************************************************************************
* *
* About the network method *
* ~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* The network procedure is described in detail in: *
* 1) Rost, Burkhard; Sander, Chris: *
* Prediction of protein structure at better than 70% accuracy. *
* J. Mol. Biol., 1993, 232, 584-599. *
* *
* A brief description is given in: *
* Rost, Burkhard; Sander, Chris: *
* Improved prediction of protein secondary structure by use of se- *
* quence profiles and neural networks. *
* Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562. *
* *
* The PHD mail server is described in: *
* 2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard: *
* PHD - an automatic mail server for protein secondary structure *
* prediction. *
* CABIOS, 1994, 10, 53-60. *
* *
* The latest improvement steps (up to 72%) are explained in: *
* 3) Rost, Burkhard; Sander, Chris: *
* Combining evolutionary information and neural networks to predict *
* protein secondary structure. *
* Proteins, 1994, 19, 55-72. *
* *
* To be quoted for publications of PHD output: *
* Papers 1-3 for the prediction of secondary structure and the pre- *
* diction server. *
* *
****************************************************************************
* *
* About the input to the network *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* The prediction is performed by a system of neural networks. *
* The input is a multiple sequence alignment. It is taken from an HSSP *
* file (produced by the program MaxHom: *
* Sander, Chris & Schneider, Reinhard: Database of Homology-Derived *
* Structures and the Structural Meaning of Sequence Alignment. *
* Proteins, 1991, 9, 56-68). *
* *
* For optimal results the alignment should contain sequences with varying *
* degrees of sequence similarity relative to the input protein. *
* The following is an ideal situation: *
* *
* +-----------------+----------------------+ *
* | sequence: | sequence identity | *
* +-----------------+----------------------+ *
* | target sequence | 100 % | *
* | aligned seq. 1 | 90 % | *
* | aligned seq. 2 | 80 % | *
* | ... | ... | *
* | aligned seq. 7 | 30 % | *
* +-----------------+----------------------+ *
* *
****************************************************************************
* *
* Estimated Accuracy of Prediction *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* A careful cross validation test on some 250 protein chains (in total *
* about 55,000 residues) with less than 25% pairwise sequence identity *
* gave the following results: *
* *
* ++================++-----------------------------------------+ *
* || Qtotal = 72.1% || ("overall three state accuracy") | *
* ++================++-----------------------------------------+ *
* *
* +----------------------------+-----------------------------+ *
* | Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% | *
* | Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% | *
* | Qloop (% of observed)=79% | Qloop (% of predicted)=72% | *
* +----------------------------+-----------------------------+ *
*..........................................................................*
* *
* These percentages are defined by:                                        *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                        *
*                                                                          *
* |                    number of correctly predicted residues              *
* |Qtotal            = --------------------------------------- (*100)      *
* |                            number of all residues                      *
* |                                                                        *
* |                    no of res correctly predicted to be in helix        *
* |Qhelix (% of obs) = -------------------------------------------- (*100) *
* |                       no of all res observed to be in helix            *
* |                                                                        *
* |                                                                        *
* |                    no of res correctly predicted to be in helix        *
* |Qhelix (% of pred)= -------------------------------------------- (*100) *
* |                    no of all residues predicted to be in helix         *
* *
*..........................................................................*
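The three definitions above can be applied directly to a pair of per-residue strings (observed and predicted states). The following is a minimal sketch; the function name `q_measures` and the ten-residue toy strings are made up for illustration and are not part of the PHD output.

```python
def q_measures(observed, predicted, state="H"):
    """Return (Qtotal, Q<state> % of observed, Q<state> % of predicted)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    pairs = list(zip(observed, predicted))
    # Qtotal: correctly predicted residues over all residues.
    q_total = 100.0 * sum(o == p for o, p in pairs) / len(pairs)
    # Residues that are both observed and predicted in the given state.
    correct_state = sum(o == p == state for o, p in pairs)
    n_observed = observed.count(state)
    n_predicted = predicted.count(state)
    q_obs = 100.0 * correct_state / n_observed if n_observed else float("nan")
    q_pred = 100.0 * correct_state / n_predicted if n_predicted else float("nan")
    return q_total, q_obs, q_pred


if __name__ == "__main__":
    # Toy example: H = helix, E = strand, L = loop.
    print(q_measures("HHHHEEELLL", "HHHLEEELLH"))
```

In the toy example 8 of 10 residues are correct (Qtotal = 80%), and 3 of the 4 observed and 3 of the 4 predicted helix residues agree (75% each).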
* *
* Averaging over single chains *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* The most reasonable way to compute the overall accuracies is the above *
* quoted percentage of correctly predicted residues. However, since the *
* user is mainly interested in the expected performance of the prediction *
* for a particular protein, the mean value when averaging over protein *
* chains might be of help as well. Computing first the three state *
* accuracy for each protein chain, and then averaging over 250 chains *
* yields the following average: *
* *
* +-------------------------------====--+ *
* | Qtotal/averaged over chains = 72.2% | *
* +-------------------------------====--+ *
* | standard deviation = 9.3% | *
* +-------------------------------------+ *
* *
*..........................................................................*
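The difference between the two ways of averaging can be shown with a small sketch. The chain counts below are invented for illustration, the function names are hypothetical, and the use of the population standard deviation is an assumption (the text above does not say which estimator was used).

```python
from statistics import mean, pstdev


def per_residue_accuracy(chains):
    """chains: list of (n_correct, n_residues). Pools all residues first."""
    total_correct = sum(correct for correct, _ in chains)
    total_residues = sum(length for _, length in chains)
    return 100.0 * total_correct / total_residues


def per_chain_accuracy(chains):
    """Computes the accuracy of each chain, then averages over chains."""
    per_chain = [100.0 * correct / length for correct, length in chains]
    # Assumption: population standard deviation; the source does not specify.
    return mean(per_chain), pstdev(per_chain)


if __name__ == "__main__":
    toy_chains = [(90, 100), (30, 50)]  # invented numbers
    print(per_residue_accuracy(toy_chains))  # long chains weigh more
    print(per_chain_accuracy(toy_chains))    # every chain weighs the same
```

Pooling residues lets long chains dominate, whereas averaging over chains gives each protein equal weight; this is why the two values quoted above (72.1% and 72.2%) need not coincide.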
* *
* Further measures of performance *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* Matthews correlation coefficient: *
* *
* +---------------------------------------------+ *
* | Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 | *
* +---------------------------------------------+ *
*..........................................................................*
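The correlation coefficients can be recomputed from the accuracy matrix printed further down in this output by treating each state as a one-against-rest problem. The sketch below uses the standard Matthews formula; the names `MATRIX` and `matthews` are chosen for this example only.

```python
import math

# Accuracy matrix from this output. Rows: observed state; columns: predicted state.
MATRIX = {
    "H": {"H": 12447, "E": 1255, "L": 3990},
    "E": {"H": 949, "E": 7493, "L": 3750},
    "L": {"H": 2604, "E": 2875, "L": 19962},
}


def matthews(matrix, state):
    """Matthews correlation coefficient for one state against the other two."""
    total = sum(sum(row.values()) for row in matrix.values())
    tp = matrix[state][state]                             # observed and predicted
    fn = sum(matrix[state].values()) - tp                 # observed, not predicted
    fp = sum(matrix[obs][state] for obs in matrix) - tp   # predicted, not observed
    tn = total - tp - fn - fp
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


if __name__ == "__main__":
    for state in ("H", "E", "L"):
        print(state, round(matthews(MATRIX, state), 2))
```

Rounded to two decimals, the sketch gives the values stated in the box above (0.63, 0.53, 0.52).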
* *
* Average length of predicted secondary structure segments: *
* *
*             +------------+----------+                                    *
*             | predicted  | observed |                                    *
* +-----------+------------+----------+                                    *
* | Lhelix  = |    10.3    |   9.3    |                                    *
* | Lstrand = |     5.0    |   5.3    |                                    *
* | Lloop   = |     7.2    |   5.9    |                                    *
* +-----------+------------+----------+                                    *
*..........................................................................*
* *
* The accuracy matrix in detail: *
* *
* +---------------------------------------+ *
* | number of residues with H, E, L | *
* +---------+------+------+------+--------+ *
* | |net H |net E |net L |sum obs | *
* +---------+------+------+------+--------+ *
* | obs H |12447 | 1255 | 3990 | 17692 | *
* | obs E | 949 | 7493 | 3750 | 12192 | *
* | obs L | 2604 | 2875 |19962 | 25441 | *
* +---------+------+------+------+--------+ *
* | sum Net |16000 |11623 |27702 | 55325 | *
* +---------+------+------+------+--------+ *
* *
* Note: This table is to be read in the following manner: *
* Of the 16000 residues predicted to be in helix, 12447 were observed *
* to be in helix; 949, however, belong to observed strands and 2604 *
* to observed loop regions. The term "observed" refers to the DSSP *
* assignment of secondary structure calculated from 3D coordinates *
* of experimentally determined structures (Dictionary of Secondary *
* Structure of Proteins: Kabsch & Sander (1983) Biopolymers, 22, *
* 2577-2637). *
* *
****************************************************************************
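The overall and per-state accuracies follow from the accuracy matrix above: the diagonal holds the correctly predicted residues, row sums give the observed counts and column sums the predicted counts. A minimal sketch, with the hypothetical names `MATRIX` and `q_from_matrix`:

```python
# Accuracy matrix from above. Rows: observed state; columns: predicted ("net") state.
MATRIX = {
    "H": {"H": 12447, "E": 1255, "L": 3990},
    "E": {"H": 949, "E": 7493, "L": 3750},
    "L": {"H": 2604, "E": 2875, "L": 19962},
}


def q_from_matrix(matrix):
    """Return Qtotal and, per state, (% of observed, % of predicted)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[state][state] for state in matrix)
    result = {"Qtotal": 100.0 * correct / total}
    for state in matrix:
        n_observed = sum(matrix[state].values())              # row sum
        n_predicted = sum(matrix[obs][state] for obs in matrix)  # column sum
        hit = matrix[state][state]
        result[state] = (100.0 * hit / n_observed, 100.0 * hit / n_predicted)
    return result


if __name__ == "__main__":
    for key, value in q_from_matrix(MATRIX).items():
        print(key, value)
```

The sketch gives Qtotal = 72.1%, and for helix 70.4% of observed and 77.8% of predicted, which agrees with the index-0 column of the reliability table in this output.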
* *
* Position-specific reliability index *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* The network predicts the three secondary structure types using real *
* numbers from the output units. The prediction is assigned by choosing *
* the maximal unit ("winner takes all"). However, the real numbers *
* contain additional information. *
* E.g. the difference between the maximal and the second largest output *
* unit can be used to derive a "reliability index". This index is given *
* for each residue along with the prediction. The index is scaled to *
* have values between 0 (lowest reliability), and 9 (highest). *
* The accuracies (Qtot) to be expected for residues with values above a *
* particular value of the index are given below as well as the fraction *
* of such residues (%res): *
* *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | index| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | *
* | %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1| *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | | | | | | | | | | | | *
* | Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2| *
* | | | | | | | | | | | | *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4| *
* | E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1| *
* | | | | | | | | | | | | *
* | H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4| *
* | E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5| *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* *
* The above table gives the cumulative results, e.g. 62.5% of all *
* residues have a reliability of at least 5. The overall three-state *
* accuracy for this subset of almost two thirds of all residues is 82.9%. *
* For this subset, e.g., 83.1% of the observed helices are correctly *
* predicted, and 86.9% of all residues predicted to be in helix are *
* correct. *
* *
*..........................................................................*
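The "winner takes all" assignment and the reliability index described above can be sketched as follows. The exact scaling used by PHD is not stated in this output; the sketch ASSUMES output values between 0 and 1 and takes the integer part of ten times the difference between the two largest outputs, capped at 9. The function name is hypothetical.

```python
def predict_with_reliability(outputs):
    """outputs: dict mapping state ('H', 'E', 'L') to a network output in [0, 1].

    Returns (predicted_state, reliability_index).
    """
    ranked = sorted(outputs.items(), key=lambda item: item[1], reverse=True)
    (best_state, best_value), (_, second_value) = ranked[0], ranked[1]
    # Assumed scaling: 10 * (largest - second largest), truncated, at most 9.
    reliability = min(9, int(10 * (best_value - second_value)))
    return best_state, reliability


if __name__ == "__main__":
    print(predict_with_reliability({"H": 0.75, "E": 0.125, "L": 0.125}))
    print(predict_with_reliability({"H": 0.5, "E": 0.5, "L": 0.0}))
```

A clear winner yields a high index, while two nearly equal outputs yield an index close to 0, which is where the tables above show the lowest accuracy.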
* *
* The following table gives the non-cumulative quantities, i.e. the *
* values per reliability index range. These numbers answer the question: *
* how reliable is the prediction for all residues labeled with the *
* particular index i. *
* *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | index| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | *
* | %res | 8.8| 9.5| 9.3| 9.1| 9.7| 10.5| 12.5| 15.7| 14.1| *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | | | | | | | | | | | *
* | Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2| *
* | | | | | | | | | | | *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* | H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4| *
* | E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1| *
* | | | | | | | | | | | *
* | H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4| *
* | E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5| *
* +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+ *
* *
* For example, for residues with Relindex = 5, 64% of all predicted *
* beta-strand residues are correctly identified. *
* *
* *
****************************************************************************
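The cumulative table can be recovered from the non-cumulative one: the fraction of residues at or above an index is the sum of the per-index fractions, and the cumulative Qtot is the average of the per-index Qtot values weighted by those fractions. A minimal sketch, with the hypothetical names `PER_INDEX` and `cumulative`:

```python
# Non-cumulative table from above: index -> (%res, Qtot).
PER_INDEX = {
    1: (8.8, 46.6),
    2: (9.5, 50.6),
    3: (9.3, 57.7),
    4: (9.1, 62.6),
    5: (9.7, 67.9),
    6: (10.5, 74.2),
    7: (12.5, 82.2),
    8: (15.7, 88.3),
    9: (14.1, 94.2),
}


def cumulative(threshold):
    """Return (%res, Qtot) for all residues with reliability index >= threshold."""
    selected = [values for index, values in PER_INDEX.items() if index >= threshold]
    share = sum(res for res, _ in selected)
    # Weight each per-index accuracy by the fraction of residues it covers.
    q_tot = sum(res * q for res, q in selected) / share
    return share, q_tot


if __name__ == "__main__":
    for threshold in range(1, 10):
        share, q_tot = cumulative(threshold)
        print(threshold, round(share, 1), round(q_tot, 1))
```

For a threshold of 5 this gives 62.5% of the residues at about 82.9% accuracy, matching the cumulative table. Index 0 is absent from the non-cumulative table, so the sketch starts at index 1.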
* *
* *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* Solvent accessibility prediction by PHDacc: *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* Author: Burkhard Rost *
* EMBL, Heidelberg, FRG *
* Meyerhofstrasse 1, 69 117 Heidelberg *
* Internet: Rost@EMBL-Heidelberg.DE *
* *
* All rights reserved. *
* *
* *
****************************************************************************
* *
* About the network method *
* ~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* The network for prediction of secondary structure is described in *
* detail in: *
* Rost, Burkhard; Sander, Chris: *
* Prediction of protein structure at better than 70% accuracy. *
* J. Mol. Biol., 1993, 232, 584-599. *
* *
* The analysis of the prediction of solvent exposure is given in: *
* Rost, Burkhard; Sander, Chris: *
* Conservation and prediction of solvent accessibility in protein *
* families. Proteins, 1994, 20, 216-226. *
* *
* To be quoted for publications of PHD exposure prediction: *
* Both papers quoted above. *
* *
****************************************************************************
* *
* Definition of accessibility *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~ *
* *
* For training the residue solvent accessibility the DSSP (Dictionary of *
* Secondary Structure of Proteins; Kabsch & Sander (1983) Biopolymers, 22,*
* 2577-2637) values of accessible surface area have been used. The *
* prediction provides values for the relative solvent accessibility. The *
* normalisation is the following: *
* *
* |                         ACCESSIBILITY (from DSSP in Angstrom^2)        *
* |RELATIVE_ACCESSIBILITY = --------------------------------------- * 100  *
* |                             MAXIMAL_ACC (amino acid type i)            *
* *
* where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.*
* The maximal values are: *
* *
* +----+----+----+----+----+----+----+----+----+----+----+----+ *
* | A | B | C | D | E | F | G | H | I | K | L | M | *
* | 106| 160| 135| 163| 194| 197| 84| 184| 169| 205| 164| 188| *
* +----+----+----+----+----+----+----+----+----+----+----+----+ *
* | N | P | Q | R | S | T | V | W | X | Y | Z | *
* | 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196| *
* +----+----+----+----+----+----+----+----+----+----+----+ *
* *
* Notation: one letter code for amino acid, B stands for D or N; Z stands *
* for E or Q; and X stands for undetermined. *
* *
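The normalisation above amounts to a table lookup followed by a division. The sketch below copies the maximal values from the table; the names `MAXIMAL_ACC` and `relative_accessibility` are chosen for this example and are not part of PHDacc.

```python
# Maximal accessibility per one-letter amino acid code, copied from the table above.
MAXIMAL_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(residue, accessibility):
    """Relative solvent accessibility in percent.

    residue: one-letter amino acid code
    accessibility: DSSP accessible surface area of that residue
    """
    maximal = MAXIMAL_ACC[residue.upper()]
    return 100.0 * accessibility / maximal


if __name__ == "__main__":
    # A glycine with a DSSP accessibility of 42 is half exposed.
    print(relative_accessibility("G", 42.0))
```

Because the maxima differ by a factor of about three between glycine and arginine, the same absolute accessibility translates into very different relative values.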
* The relative solvent accessibility can be used to estimate the number *
* of water molecules (W) in contact with the residue: *