



Results from PredictProtein for predict_h14877

TOC for file /home/phd/server/work/predict_h14877

  1. The following information has been received by the server
  2. PROSITE motif search (A Bairoch; P Bucher and K Hofmann)
  3. SEG low-complexity regions (J C Wootton & S Federhen)
  4. ProDom domain search (E Sonnhammer; Corpet, Gouzy, D Kahn)
  5. MAXHOM alignment header
  6. MAXHOM alignment
  7. PHD information about accuracy
  8. PHD predictions
  9. GLOBE prediction of globularity
  10. TOPITS (threading) header
  11. TOPITS (threading) results in MSF format
  12. TOPITS (threading) results in HSSP format
  13. TOPITS (threading) results in STRIP format
  14. Threading results in TOPITS format

END of TOC




BEG of results for file /home/phd/server/work/predict_h14877


The following information has been received by the server


reference predict_h14877 (Jun 26, 2000 00:05:31)
reference pred_h14877 (Jun 26, 2000 00:06:21)
PPhdr from: kapilm@cs.brandeis.edu
PPhdr resp: MAIL
PPhdr orig: HTML
PPhdr want: HTML
PPhdr password(###)
prediction of: - threading             (TOPITS)-
return msf format
ret topits hssp
ret topits strip
ret topits own
ret html
# default: single protein sequence description=Lysine Aminomutase
MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAITPYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEGVITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE


PROSITE motif search (A Bairoch; P Bucher and K Hofmann)


-------------------------------------------------------------
Pattern-ID: ASN_GLYCOSYLATION PS00001 PDOC00001
Pattern-DE: N-glycosylation site
Pattern:    N[^P][ST][^P]
   257      NQSV

Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern:    [ST].[RK]
   54       SLR
   110      THR
   133      TRR
   241      STR
   381      TGK

Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern:    [ST].{2}[DE]
   13       SDAD
   30       TVEE
   41       TKEE
   65       SLID
   83       TALE
   169      SGGD
   364      TYSE

Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern:    G[^EDRKHPFYW].{2}[STAGCN][^P]
   47       GVAQCV
   252      GVPLGN
   264      GVNDCV
   360      GVITTY
   393      GLLNGE
   406      GLERNK

Pattern-ID: AMIDATION PS00009 PDOC00009
Pattern-DE: Amidation site
Pattern:    .G[RK][RK]
   381      TGKK

Pattern-ID: ATP_GTP_A PS00017 PDOC00017
Pattern-DE: ATP/GTP-binding site motif A (P-loop)
Pattern:    [AG].{4}GK[ST]
   331      APGGGGKT

Pattern-ID: LEUCINE_ZIPPER PS00029 PDOC00029
Pattern-DE: Leucine zipper pattern
Pattern:    L.{6}L.{6}L.{6}L
   167      LLSGGDALLVSDETLEYIIAKL
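
The PROSITE patterns listed above are written in regular-expression form, so the hits can be re-derived with a short script. The following is a minimal sketch, not the server's own code: the names `SEQ`, `PATTERNS` and `scan` are hypothetical, positions are taken to be 1-based, and overlapping matches are assumed to be reported (hence the lookahead).

```python
import re

# Query sequence as echoed in the submission section of this report.
SEQ = (
    "MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAIT"
    "PYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLL"
    "ITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDET"
    "LEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEE"
    "STRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEH"
    "FRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEG"
    "VITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE"
)

# Pattern strings copied from the "Pattern:" lines above.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
    "AMIDATION": r".G[RK][RK]",
    "ATP_GTP_A": r"[AG].{4}GK[ST]",
    "LEUCINE_ZIPPER": r"L.{6}L.{6}L.{6}L",
}


def scan(pattern, seq):
    """Return (1-based position, matched residues) for every hit.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are found as well.
    """
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        for pos, hit in scan(pattern, SEQ):
            print(f"{name:20s} {pos:5d}  {hit}")
```

Short patterns such as these match frequently by chance, so a hit only marks a candidate site and is not evidence of function on its own.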



SEG low-complexity regions (J C Wootton & S Federhen)



>prot (#) ppOld, default: single protein sequence description=lysine aminomutase /home/phd/server/work/predict_h14877
MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAIT
PYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLL
ITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDET
LEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEE
STRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEH
FRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEG
VITTYSEPINYTPGCNCDVCTxxxxxxxxxxxxLLNGEGMALEPVGLERNKRHVQE


ProDom domain search (E Sonnhammer; Corpet, Gouzy, D Kahn)


Identities computed with respect to: (query) prot
Colored by: consensus/70% and property
HSP processing: ranked
                                                                           17 [  .         .         .         :         .         .         .         .         1         .         .         .         .         :         .         .         .         .         2         .         .         .         .         :         .         .         .         .         3         .         .         .         .         :         .         .  ] 373
  prot           (#) ppOld, default: single ... score      P(N)  N 100.0%     WNDWRWQVRNRIETVEELKKYIPLTKEEEEXXXXXXXXXXXXXXXXXXXXXDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLXXYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTXVMPNYVISQSHDKVILRNFEGVITTYSEPINYTP    
1 PD008727       p2000.1 (8) YJEK(2)  // PRO...   351  1.1e-121  5  53.2%     WFKWLWQLTNGVKTLKELRKVLNLKVEDED---------------------NPYVEBDPIRRQVIPTEWEIEKZVWHKEDFMGEDEYSPVPGLTHRYPDRVLLLVTDSCAVYCRYCFRRWFIQQENQGVPKEEVEKALDYIREHPEINEVLISGGDPLTLSDHKLEKLLKRLREIPHVKIIRIGTRLPVVAPQRITDDLLELL--YKPIWIMTHINHPYEITEEAREAVEKLRKTGIPIYNQSVLLRGVNDDFETLATLFHALTKIGVKPYYLFQCDPTPGTGHFRVPIEETLEIMRTLRGRISGYAIPTLAVDLPGGGGKT-----------------------------------    
2 PD041312       p2000.1 (1) YODO_BACSU // H...   137   2.6e-11  1  76.5%     -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------LQPNYVLSQSPDKVILRNFEGVITSYPEPENYIP    
  consensus/100%                                                              ..............................                     ........................................................................................................................................................  ..................................................................................................................... ..................................    
  consensus/90%                                                               ..............................                     ........................................................................................................................................................  ..................................................................................................................... ..................................    
  consensus/80%                                                               ..............................                     ........................................................................................................................................................  ..................................................................................................................... ..................................    
  consensus/70%                                                               ..............................                     ........................................................................................................................................................  ..................................................................................................................... ..................................    
--- ------------------------------------------------------------
--- 
--- Again: these results were obtained based on the domain data-
--- base collected by Daniel Kahn and his coworkers in Toulouse.
--- 
--- PLEASE quote: 
---       F Corpet, J Gouzy, D Kahn (1998).  The ProDom database
---       of protein domain families. Nucleic Ac Res 26:323-326.
--- 
--- The general WWW page is on:
----      ---------------------------------------
---       http://www.toulouse.inra.fr/prodom.html
----      ---------------------------------------
--- 
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, use the links below (each line is ONE single
--- link for your protein):
--- 
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD008727 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD008727
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD008727 ==> graphical output of all proteins having domain PD008727
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD041312 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD041312
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD041312 ==> graphical output of all proteins having domain PD041312
--- 
--- NOTE: if you want to use the link, make sure the entire line
---       is pasted as URL into your browser!
--- 
--- END of PRODOM
--- ------------------------------------------------------------


MAXHOM alignment header


--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
--- 
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID           : identifier of aligned (homologous) protein
--- STRID        : PDB identifier (only for known structures)
--- IDE          : percentage of pairwise sequence identity
--- WSIM         : percentage of weighted similarity
--- LALI         : number of residues aligned
--- NGAP         : number of insertions and deletions (indels)
--- LGAP         : number of residues in all indels
--- LEN2 (LSEQ2) : length of aligned sequence
--- ACCNUM       : SwissProt accession number
--- OMIM         : OMIM(Online Mendelian Inheritance in Man) ID
--- NAME         : one-line description of aligned protein
--- 
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID  IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME                     
yodo_bacsu         60   73  410    1    5  471 O34676 HYPOTHETICAL 54.1 KD PROT
y454_aquae         48   62  366    1    1  370 O66761 HYPOTHETICAL PROTEIN AQ_4
y121_trepa         35   44  336    3   11  355 O83158 HYPOTHETICAL PROTEIN TP01
yjek_haein         34   48  320    2    2  338 P44641 HYPOTHETICAL PROTEIN HI03
yg32_aquae         34   43  350    6   43  374 O67554 HYPOTHETICAL PROTEIN AQ_1
yjek_ecoli         33   46  327    4    4  342 P39280 HYPOTHETICAL 38.7 KD PROT
yjek_bucap         34   46  143    1    1  144 Q44634 HYPOTHETICAL PROTEIN IN G
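
The summary rows above can be read into records for filtering, for example to keep only homologues above a given identity. This is a minimal sketch under stated assumptions: `Hit` and `parse_row` are hypothetical names, columns are taken to be whitespace-separated in the order of the header line, and the STRID column is assumed to be either empty (as in every row here) or a non-numeric token.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    ident: str            # ID
    strid: Optional[str]  # STRID (None when the column is empty)
    ide: int              # IDE, % pairwise sequence identity
    wsim: int             # WSIM, % weighted similarity
    lali: int             # LALI, residues aligned
    ngap: int             # NGAP, number of indels
    lgap: int             # LGAP, residues in all indels
    len2: int             # LEN2, length of aligned sequence
    accnum: str           # ACCNUM
    name: str             # NAME (truncated in the report)


def parse_row(line: str) -> Hit:
    tokens = line.split()
    ident, rest = tokens[0], tokens[1:]
    strid = None
    # Assumption: a STRID, if present, is not purely numeric.
    if not rest[0].isdigit():
        strid, rest = rest[0], rest[1:]
    ide, wsim, lali, ngap, lgap, len2 = (int(t) for t in rest[:6])
    return Hit(ident, strid, ide, wsim, lali, ngap, lgap, len2,
               rest[6], " ".join(rest[7:]))


if __name__ == "__main__":
    rows = [
        "yodo_bacsu         60   73  410    1    5  471 O34676 HYPOTHETICAL 54.1 KD PROT",
        "yjek_bucap         34   46  143    1    1  144 Q44634 HYPOTHETICAL PROTEIN IN G",
    ]
    for hit in map(parse_row, rows):
        print(hit.ident, hit.ide, hit.lali, hit.accnum)
```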
--- 
--- MAXHOM ALIGNMENT: IN MSF FORMAT


--- 
--- Version of database searched for alignment:
--- SWISS-PROT release 38.0 (7/99) with 80000 proteins
--- 

MAXHOM alignment


Identities computed with respect to: (1) predict_h1480
Colored by: consensus/70% and property
                          1 [        .         .         .         .         :         .         .         .         .         1         .         .         .         .         :         .         .         .         .         2         .         .         .         .         :         .         .         .         .         3         .         .         .         .         :         .         .         .         .         4         .     ] 416
1 predict_h1480  100.0%     MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAITPYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEGVITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE    
2 yodo_bacsu      60.2%     ---KEIELWKDVPEEKWNDWLWQLTHTVRTLDDLKKVINLTEDEEEGVRISTKTIPLNITPYYASLMDPDNPRCPVRMQSVPLSEEMHKTKYDLEDPLHEDEDSPVPGLTHRYPDRVLFLVTNQCSMYCRYCTRRRFSGQIGMGVPKKQLDAAIAYIRETPEIRDCLISGGDGLLINDQILEYILKELRSIPHLEVIRIGTRAPVVFPQRITDHLCEILKKYHPVWLNTHFNTSIEMTEESVEACEKLVNAGVPVGNQAVVLAGINDSVPIMKKLMHDLVKIRVRPYYIYQCDLSEGIGHFRAPVSKGLEIIEGLRGHTSGYAVPTFVVDAPGGGGKIALQPNYVLSQSPDKVILRNFEGVITSYPEPENYIPNQADAYfeTADKKEPIGLSAIFADKEVSFTPENVDRIKRR---    
3 y454_aquae      47.8%     -------FFENVPENLWRSYEWQIQNRIKTLKEIKKYLKLLPEEEEGIKRTQGLYPFAITPYYLSLINPEDPKDPIRLQAIPRVVEVDEKVQSAGEPDALKEEGDIPGLTHRYPDRVLLNVTTFCAVYCRHCMRKRIFSQGERARTKEEIDTMIDYIKRHEEIRDVLISGGEPLSLSLEKLEYLLSRLREIKHVEIIRFGTRLPVLAPQRFFnkLLDILEKYSPIWINTHFNHPNEITEYAEEAVDRLLRRGIPVNNQTVLLKGVNDDPEVMLKLFRKLLRIKVKPQYLFHCDPIKGAVHFRTTIDKGLEIMRYLRGRLSGFGIPTYAVDLPGGKGKVPLLPNYVKKRKGNKFWFESFTGEVVEYEVTEVWEP-------------------------------------------    
4 y121_trepa      34.8%     -----------------------------TREQRKRRGAGRADEHWRTLsaADALTEHISPAYAHLIAqgADAQALKRQVCFAPQERVVHACECADPLGEDRYCVTPFLVHQYANRVLMLATGRCFSHCRYCFRRGFIAQRAGWIPNEEREKIITYLRATPSVKEILVSGGDPLTGSFAQVTSLFRALRSVAPDLIIRLCTRAVTFAPQAFTPELIAFLQEMKPVWIIPHINHPAELGSTQRAVLEACVGAGLPVQSQSVLLRGVNDSVETLCTLFHALTCLGVKPGYLFQLDLAPGTGDFRVPLSDTLALWRTLKERLSGLSLPTLAVDLPGGGGKFPLvqDVTWHQEREAFSARGIDGAWYTY---------------------------------------------------    
5 yjek_haein      33.6%     -----------------QNWLTILKNAISDPKLLLKALNLPEDDFEQSIAARKLFSLRVPQPFIDKIEKGNPQDPLFLQVMCSDLEFVQAEGFSTDPLEEKNANAVPNILHKYRNRLLFMAKGGCAVNCRYCFRRHFPYDENPGNKKS-WQLALDYIAAHSEIEEVIFSGGDPLMAKDHELAWLIKHLENIPHLQRLRIHTRLPVVIPQRITDEFCTLLAETrqTVMVTHINHPNEIDQIFAHAMQKLNAVNVTLLNQSVLLKGVNDDAQILKILSDKLFQTGILPYYLHLLDKVQGASHFLISDIEAMQIYKTLQSLTSGYLVPKLAREIAGEPNKT------------------------------------------------------------------------------    
6 yg32_aquae      31.7%     -----------------------MGKKLKYIIDLKFIEEIPEEERRELEKVTEKFAFRTNTYYNSLINWDNPNDPIRRIVIPTTEELEVWGK--LDASNESKYMKVHGLEHKYPDTALLLVTDVCGIYCRFCFRKRLFMNDNDEVARD-VSEGLEYIRNHPEINNVLLTGGDPLILATFKLEKILKALAEIPHVRIVRIGSKMLAVNPFRVlpKLLELFEWfkKLYLMNHFNHPRELTKEARKAVELVQKTGTTLTNQTPILKGINDDFETLKTLLEELSFIGVPPYYVFQCRPTAGNKAYSTPIEETIDLVEAVRAEVSGL----------------AARVRYVMSHETGKIEILGKTDEHIFFRYHRAADPENRGKFmvAEYKSSLSGVS------------------------    
7 yjek_ecoli      32.5%     -----------LNTPSREDWLTQLADVVTDPDELLRLLNIDAEEKLLAGRSAKKLflRVPRSFIDRMEKGNPDDPLLRQVLTSQDEFVIAPGFSTDPLEEQ-HSVVPGLLHKYHNRALLLVKGGCAVNCRYCFRRHFPYAENQGNKRN-WQTALEYVAAHPELDEMIFSGGDPLMAKDHELDWLLTQLEAIPHIKRLRIHSRLPIVIPARITEALVECFARStqILLVNHINHANEVDETFRQAMAKLRRVGVTLLNQSVLLRDVNDNAQTLANLSNALFDAGVMPYYLHVLDKVQGAAHFMVSDDEARQIMRELLTLVSGYLVPKLAREIGGEPSKTPL----------------------------------------------------------------------------    
8 yjek_bucap      34.3%     ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------IHTRLPIVIPNRITSDLCQIFSNslKIIIVTHINHPQEINEQLSDSLLKLKKSNVILLNQSVLLKNINDNAIILAELSSRLCENNIIPYYLHILDKVKGTSHFLVSNKKAKSIISDLMKMISGFLVPRLVFDNGSKDNKLIII---------------------------------------------------------------------------    
  consensus/100%            ......................................................................................................................................................................................................hto+h.hh.P.thh.thhthht....hhh.sHhNps.Ehsp...tsh.hh.t.sh.l.sQsslLtslNDs..hhh.L.ptL...tl.P.Yla.hc...G..ta.hs..cshtlhp.l.t.hSGh..............................................................................................    
  consensus/90%             ......................................................................................................................................................................................................hto+h.hh.P.thh.thhthht....hhh.sHhNps.Ehsp...tsh.hh.t.sh.l.sQsslLtslNDs..hhh.L.ptL...tl.P.Yla.hc...G..ta.hs..cshtlhp.l.t.hSGh..............................................................................................    
  consensus/80%                ..............................hhhh..h..--......s.t.h..ths..a.phht.tsstpslhh.sh....Eh..h.....-s.t.p....h..l.HpY.sphLh.hps.C...CRaChR+th..t.t.....p.hp.hltYlttp.plpphlhoGG-sL.ht...lt.lhttLttl....hlRltoRhshlhPtRhhscLhphhtp.p.lhl.sHhNHstElsp..tpuhthl.tsslsl.NQoVlL+slNDss.hhtpL.ptLhphtlhPhYla.hD.s.GhtcFhss.pcshplhctLhthhSGh.lPphsh-.sut.sKh.h.........................................................................       
  consensus/70%                    ................htphlps.cplh+hhtl.t-Ec.thhtstchh.htls..ahshhp.ssPpsPlhhQshsts.Eh..t.t..tDP.tEpp.sslPsLhH+Y.sRsLhhspshCuh.CRaChR++h.hpts.t..pp.hpthlpYlttpsplc-hlhSGGDsLhhp.tpLphllptLcpIsHlphlRItoRhPlVhPpRlTscLhphhpchp.lhlssHhNHPpElscp.pcAhptLhpsGlslhNQoVLL+GlNDssphhtpL.pcLhphtVhPYYla.hDhstGssHFhss.pcuhpIhcsLpshhSGahlPphsh-hsGtssKhsl....................................................                            


PHD information about accuracy


****************************************************************************
*                                                                          *
*      Prediction of:			                                   *
*	- secondary structure,   		by PHDsec		   *
*	- solvent accessibility, 		by PHDacc		   *
*                                                                          *
*      PHD: Profile fed neural network systems from HeiDelberg             *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*                                                                          *
*      Author:             Burkhard Rost		                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Predict-Help@EMBL-Heidelberg.DE       *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
****************************************************************************
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	                   *
*      Secondary structure prediction by PHDsec:                           *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	                   *
*                                                                          *
*      Author:             Burkhard Rost		                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE 		   *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network procedure is described in detail in:                        *
*  1) Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.        	                   *
*                                                                          *
*  A brief description is given in:                                        *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Improved prediction of protein secondary structure by use of se-     *
*     quence profiles and neural networks.                                 *
*     Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.   		   *
*                                                                          *
*  The PHD mail server is described in:                                    *
*  2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard:                  *
*     PHD - an automatic mail server for protein secondary structure       *
*     prediction.                                                          *
*     CABIOS, 1994, 10, 53-60.                                             *
*                                                                          *
*  The latest improvement steps (up to 72%) are explained in:              *
*  3) Rost, Burkhard; Sander, Chris:                                       *
*     Combining evolutionary information and neural networks to predict    *
*     protein secondary structure.                                         *
*     Proteins, 1994,  19, 55-72.                                          *
*                                                                          *
*  To be quoted for publications of PHD output:                            *
*     Papers 1-3 for the prediction of secondary structure and the pre-    *
*     diction server.                                                      *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the input to the network                                          *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  The prediction is performed by a system of neural networks.             *
*  The input is a multiple sequence alignment. It is taken from an HSSP    *
*  file (produced by the program MaxHom:                                   *
*     Sander, Chris & Schneider, Reinhard: Database of Homology-Derived    *
*     Structures and the Structural Meaning of Sequence Alignment.         *
*     Proteins, 1991, 9, 56-68).                                           *
*                                                                          *
*  For optimal results the alignment should contain sequences with varying *
*  degrees of sequence similarity relative to the input protein.           *
*  The following is an ideal situation:                                    *
*                                                                          *
*  +-----------------+----------------------+                              *
*  |   sequence:     |  sequence identity   |                              *
*  +-----------------+----------------------+                              *
*  | target sequence |  100 %               |                              *
*  | aligned seq. 1  |   90 %               |                              *
*  | aligned seq. 2  |   80 %               |                              *
*  |      ...        |   ...                |                              *
*  | aligned seq. 7  |   30 %               |                              *
*  +-----------------+----------------------+                              *
*                                                                          *
****************************************************************************
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 250 protein chains (in total    *
*  about 55,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*  ++================++-----------------------------------------+          *
*  || Qtotal = 72.1% ||      ("overall three state accuracy")   |          *
*  ++================++-----------------------------------------+          *
*                                                                          *
*  +----------------------------+-----------------------------+            *
*  | Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |            *
*  | Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |            *
*  | Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |            *
*  +----------------------------+-----------------------------+            *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                    number of correctly predicted residues             *
*  |Qtotal =            ---------------------------------------      (*100)*
*  |                          number of all residues                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of obs) = -------------------------------------------- (*100)*
*  |                    no of all res observed to be in helix              *
*  |                                                                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of pred)= -------------------------------------------- (*100)*
*  |                    no of all residues predicted to be in helix        *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the three state       *
*  accuracy for each protein chain, and then averaging over 250 chains     *
*  yields the following average:                                           *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | Qtotal/averaged over chains = 72.2% |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          =  9.3% |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Further measures of performance                                         *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  Matthews correlation coefficient:                                       *
*                                                                          *
*  +---------------------------------------------+                         *
*  | Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |                         *
*  +---------------------------------------------+                         *
*..........................................................................*
*                                                                          *
*  Average length of predicted secondary structure segments:               *
*                                                                          *
*  .           +------------+----------+                                   *
*  .           |  predicted | observed |                                   *
*  +-----------+------------+----------+                                   *
*  | Lhelix  = |    10.3    |    9.3   |                                   *
*  | Lstrand = |     5.0    |    5.3   |                                   *
*  | Lloop   = |     7.2    |    5.9   |                                   *
*  +-----------+------------+----------+                                   *
*..........................................................................*
*                                                                          *
*  The accuracy matrix in detail:                                          *
*                                                                          *
*  +---------------------------------------+                               *
*  |    number of residues with H, E, L    |                               *
*  +---------+------+------+------+--------+                               *
*  |         |net H |net E |net L |sum obs |                               *
*  +---------+------+------+------+--------+                               *
*  | obs H   |12447 | 1255 | 3990 |  17692 |                               *
*  | obs E   |  949 | 7493 | 3750 |  12192 |                               *
*  | obs L   | 2604 | 2875 |19962 |  25441 |                               *
*  +---------+------+------+------+--------+                               *
*  | sum Net |16000 |11623 |27702 |  55325 |                               *
*  +---------+------+------+------+--------+                               *
*                                                                          *
*  Note: This table is to be read column by column.  Of the 16000          *
*        residues predicted to be in helix, 12447 were observed in helix,  *
*        949 in strands, and 2604 in loop regions.  The term "observed"    *
*        refers to the DSSP assignment of secondary structure calculated   *
*        from 3D coordinates of experimentally determined structures       *
*        (Dictionary of Secondary Structure of Proteins: Kabsch & Sander   *
*        (1983) Biopolymers, 22, 2577-2637).                               *
*                                                                          *
****************************************************************************
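
The per-residue measures defined above can be recomputed from the accuracy matrix. The sketch below is an illustration, not the PHD code: the names `MATRIX`, `q_total`, `q_obs`, `q_pred` and `matthews` are hypothetical, and the Matthews coefficient is taken in its standard one-state-versus-rest form, C = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which the report itself does not spell out.

```python
from math import sqrt

STATES = ("H", "E", "L")

# Rows: observed state (DSSP); columns: state predicted by the network.
MATRIX = {
    "H": {"H": 12447, "E": 1255, "L": 3990},
    "E": {"H": 949, "E": 7493, "L": 3750},
    "L": {"H": 2604, "E": 2875, "L": 19962},
}


def total(m):
    return sum(m[o][p] for o in STATES for p in STATES)


def q_total(m):
    """Overall three-state accuracy in percent."""
    return 100.0 * sum(m[s][s] for s in STATES) / total(m)


def q_obs(m, s):
    """Correct predictions as % of residues observed in state s."""
    return 100.0 * m[s][s] / sum(m[s].values())


def q_pred(m, s):
    """Correct predictions as % of residues predicted in state s."""
    return 100.0 * m[s][s] / sum(m[o][s] for o in STATES)


def matthews(m, s):
    """Matthews correlation for state s against the other two states."""
    tp = m[s][s]
    fn = sum(m[s].values()) - tp
    fp = sum(m[o][s] for o in STATES) - tp
    tn = total(m) - tp - fn - fp
    return (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))


if __name__ == "__main__":
    print(f"Qtotal = {q_total(MATRIX):.1f}%")
    for s in STATES:
        print(f"{s}: %obs = {q_obs(MATRIX, s):.1f}  "
              f"%pred = {q_pred(MATRIX, s):.1f}  "
              f"C = {matthews(MATRIX, s):.2f}")
```

Under these assumptions the matrix gives Qtotal = 72.1% and Matthews coefficients that round to the 0.63, 0.53 and 0.52 quoted above.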
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the three secondary structure types using real     *
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit can be used to derive a "reliability index".  This index is given  *
*  for each residue along with the prediction.  The index is scaled to     *
*  have values between 0 (lowest reliability), and 9 (highest).            *
*  The accuracies (Qtot) to be expected for residues with an index at or   *
*  above a given value are listed below, together with the fraction of     *
*  such residues (%res):                                                   *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |    *
*  | %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|    *
*  | E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|    *
*  | E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*                                                                          *
*  The above table gives the cumulative results, e.g. 62.5% of all         *
*  residues have a reliability of at least 5.  The overall three-state     *
*  accuracy for this subset of almost two thirds of all residues is 82.9%. *
*  For this subset, e.g., 83.1% of the observed helices are correctly      *
*  predicted, and 86.9% of all residues predicted to be in helix are       *
*  correct.                                                                *
*                                                                          *
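A minimal sketch of the "winner takes all" assignment and of a reliability index derived from the two largest output units. The exact scaling used by PHD is not stated in this report; the sketch assumes output values in [0, 1] and an index of floor(10 * difference), capped at 9. The function name is hypothetical.

```python
def predict_with_reliability(outputs):
    """Return (predicted state, reliability index 0..9).

    `outputs` maps each state ('H', 'E', 'L') to the value of its output unit.
    Assumption (not taken from the report): values lie in [0, 1] and the
    index is floor(10 * (largest - second largest)), capped at 9.
    """
    ranked = sorted(outputs.items(), key=lambda item: item[1], reverse=True)
    (best_state, best_value), (_, second_value) = ranked[0], ranked[1]
    index = min(9, int(10 * (best_value - second_value)))
    return best_state, index


print(predict_with_reliability({"H": 0.82, "E": 0.10, "L": 0.08}))  # ('H', 7)
print(predict_with_reliability({"H": 0.40, "E": 0.35, "L": 0.25}))  # ('H', 0)
```

Both examples are assigned to helix, but only the first one would carry a high reliability index under this assumed scaling.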
*..........................................................................*
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |          *
*  | %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|          *
*  | E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|          *
*  | E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*                                                                          *
*  For example, for residues with Relindex = 5, 64% of all residues        *
*  predicted to be in beta-strand are correct.                             *
*                                                                          *
*                                                                          *
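The cumulative and the per-index tables are related by simple sums. The sketch below recomputes cumulative %res and Qtot from the per-index rows (index 1 to 9), assuming that the cumulative Qtot is the %res-weighted mean of the per-index Qtot values. Because the published numbers are rounded to one decimal, agreement for every threshold is not guaranteed. The names are hypothetical.

```python
# Per-index (non-cumulative) rows copied from the table, for index 1..9.
INDEX = [1, 2, 3, 4, 5, 6, 7, 8, 9]
PCT_RES = [8.8, 9.5, 9.3, 9.1, 9.7, 10.5, 12.5, 15.7, 14.1]
QTOT = [46.6, 50.6, 57.7, 62.6, 67.9, 74.2, 82.2, 88.3, 94.2]


def cumulative(threshold):
    """Return (%res, Qtot) for residues with reliability index >= threshold (1..9)."""
    rows = [(p, q) for i, p, q in zip(INDEX, PCT_RES, QTOT) if i >= threshold]
    pct = sum(p for p, _ in rows)
    # Assumption: cumulative accuracy is the %res-weighted mean of per-index accuracies.
    qtot = sum(p * q for p, q in rows) / pct
    return round(pct, 1), round(qtot, 1)


print(cumulative(5))  # (62.5, 82.9)
print(cumulative(8))  # (29.8, 91.1)
```

Both results match the corresponding columns of the cumulative table.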
****************************************************************************
*                                                                          *
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*      Solvent accessibility prediction by PHDacc:                         *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network for prediction of secondary structure is described in       *
*  detail in:                                                              *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.                                   *
*                                                                          *
*  The analysis of the prediction of solvent exposure is given in:         *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Conservation and prediction of solvent accessibility in protein      *
*     families.  Proteins, 1994, 20, 216-226.                              *
*                                                                          *
*  To be quoted for publications of PHD exposure prediction:               *
*     Both papers quoted above.                                            *
*                                                                          *
****************************************************************************
*                                                                          *
*  Definition of accessibility                                             *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~                                             *
*                                                                          *
*  The network was trained on the accessible surface area values from DSSP *
*  (Dictionary of Secondary Structure of Proteins; Kabsch & Sander (1983)  *
*  Biopolymers, 22, 2577-2637).  The prediction provides values for the    *
*  relative solvent accessibility.  The normalisation is the following:    *
*                                                                          *
*  |                           ACCESSIBILITY (from DSSP, Angstrom^2)       *
*  |RELATIVE_ACCESSIBILITY =   ------------------------------------- * 100 *
*  |                               MAXIMAL_ACC (amino acid type i)         *
*                                                                          *
*  where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.*
*  The maximal values are:                                                 *
*                                                                          *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |           *
*  | 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|           *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |                *
*  | 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|                *
*  +----+----+----+----+----+----+----+----+----+----+----+                *
*                                                                          *
*  Notation: one letter code for amino acid, B stands for D or N; Z stands *
*     for E or Q; and X stands for undetermined.                           *
*                                                                          *
*  The absolute solvent accessibility (in Angstrom^2) can be used to       *
*  estimate the number of water molecules (W) in contact with the residue: *
*                                                                          *
*  W = ACCESSIBILITY /10                                                   *
*                                                                          *
*  The prediction is given in 10 states for relative accessibility, with   *
*                                                                          *
*  RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)                *
*                                                                          *
*  where PREDICTED_ACC = 0 - 9.                                            *
*                                                                          *
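The definitions in this section can be sketched in Python as follows. The maximal values are copied from the table above; the function names are hypothetical. The water estimate applies W = ACCESSIBILITY / 10 to the unnormalised DSSP value, as the formula is written.

```python
# Maximal accessibility per amino acid type (one-letter code), from the table above.
MAX_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(residue, acc):
    """Relative accessibility in percent from the DSSP accessibility `acc`."""
    return 100.0 * acc / MAX_ACC[residue]


def relative_from_state(predicted_acc):
    """Relative accessibility in percent encoded by a predicted state 0..9."""
    if not 0 <= predicted_acc <= 9:
        raise ValueError("PREDICTED_ACC must be between 0 and 9")
    return predicted_acc * predicted_acc


def water_contacts(acc):
    """Estimated number of water molecules in contact with the residue."""
    return acc / 10.0


print(relative_accessibility("G", 42))  # 50.0
print(relative_from_state(6))           # 36
print(water_contacts(42))               # 4.2
```

A glycine with a DSSP accessibility of 42 is thus 50% exposed relative to its maximum of 84, and the ten predicted states cover relative accessibilities from 0 to 81.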
****************************************************************************
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 238 protein chains (in total    *
*  about 62,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*                                                                          *
*  Correlation                                                             *
*  ...........                                                             *
*                                                                          *
*  The correlation between observed and predicted solvent accessibility    *
*  is:                                                                     *
*                                                                          *
*  -----------                                                             *
*  corr = 0.53                                                             *
*  -----------                                                             *
*                                                                          *
*  This value ought to be compared to the worst and best case prediction   *
*  scenarios: random prediction (corr = 0.0) and homology modelling        *
*  (corr = 0.66).  (Note: homology modelling yields a relatively accurate  *
*  prediction in 3D if, and only if, a significantly similar sequence      *
*  has a known 3D structure.)                                              *
*                                                                          *
*                                                                          *
*  3-state accuracy                                                        *
*  ................                                                        *
*                                                                          *
*  Often the relative accessibility is projected onto, e.g., 3 states:     *
*     b  = buried       (here defined as < 9% relative accessibility),     *
*     i  = intermediate ( 9% <= rel. acc. < 36% ),                         *
*     e  = exposed      ( rel. acc. >= 36% ).                              *
*                                                                          *
*  A projection onto 3 states or 2 states (buried/exposed) enables the     *
*  compilation of a 3- and 2-state prediction accuracy.  PHD reaches an    *
*  overall 3-state accuracy of:                                            *
*     Q3 = 57.5%                                                           *
*  (compared to 35% for random prediction and 70% for homology modelling). *
*                                                                          *
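The 3-state projection defined above can be written as a small Python function. The thresholds (9% and 36%) are the ones given in the text; the function name is hypothetical. Combined with RELATIVE_ACCESSIBILITY = PREDICTED_ACC * PREDICTED_ACC, it also shows how the ten predicted states fall into the three classes.

```python
def three_state(rel_acc):
    """Project a relative accessibility (in percent) onto b / i / e."""
    if rel_acc < 9:
        return "b"  # buried
    if rel_acc < 36:
        return "i"  # intermediate
    return "e"      # exposed


# Predicted states 0..9 encode relative accessibilities 0, 1, 4, ..., 81.
print("".join(three_state(state * state) for state in range(10)))  # bbbiiieeee
```

Under these thresholds, states 0 to 2 are buried, 3 to 5 intermediate, and 6 to 9 exposed.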
*  In detail:                                                              *
*                                                                          *
*  +-----------------------------------+-------------------------+         *