Results from PredictProtein for predict_h16490

TOC for file /home/phd/server/work/predict_h16490

The following information has been received by the server
PROSITE motif search (A Bairoch; P Bucher and K Hofmann)
SEG low-complexity regions (J C Wootton & S Federhen)
ProDom domain search (E Sonnhammer; Corpet, Gouzy, D Kahn)
MAXHOM alignment header
MAXHOM alignment
PHD information about accuracy
PHD predictions
GLOBE prediction of globularity

END of TOC

BEG of results for file /home/phd/server/work/predict_h16490

The following information has been received by the server

reference predict_h16490 (Jun 20, 2000 00:24:39)
reference pred_h16490 (Jun 20, 2000 00:25:05)
PPhdr from: kapilm@cs.brandeis.edu
PPhdr resp: MAIL
PPhdr orig: HTML
PPhdr want: HTML
PPhdr password(###)
prediction of: - default prediction of: - PHDsec PHDacc PHDhtm ProSite SEG ProDom
return msf format
ret html
# default: single protein sequence description=L-lysine 2,3-aminomutase
MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAITPYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEGVITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE

PROSITE motif search (A Bairoch; P Bucher and K Hofmann)

-------------------------------------------------------------
Pattern-ID: ASN_GLYCOSYLATION PS00001 PDOC00001
Pattern-DE: N-glycosylation site
Pattern:    N[^P][ST][^P]
   257      NQSV

Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern:    [ST].[RK]
   54       SLR
   110      THR
   133      TRR
   241      STR
   381      TGK

Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern:    [ST].{2}[DE]
   13       SDAD
   30       TVEE
   41       TKEE
   65       SLID
   83       TALE
   169      SGGD
   364      TYSE

Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern:    G[^EDRKHPFYW].{2}[STAGCN][^P]
   47       GVAQCV
   252      GVPLGN
   264      GVNDCV
   360      GVITTY
   393      GLLNGE
   406      GLERNK

Pattern-ID: AMIDATION PS00009 PDOC00009
Pattern-DE: Amidation site
Pattern:    .G[RK][RK]
   381      TGKK

Pattern-ID: ATP_GTP_A PS00017 PDOC00017
Pattern-DE: ATP/GTP-binding site motif A (P-loop)
Pattern:    [AG].{4}GK[ST]
   331      APGGGGKT

Pattern-ID: LEUCINE_ZIPPER PS00029 PDOC00029
Pattern-DE: Leucine zipper pattern
Pattern:    L.{6}L.{6}L.{6}L
   167      LLSGGDALLVSDETLEYIIAKL

SEG low-complexity regions (J C Wootton & S Federhen)

>prot (#) ppOld, default: single protein sequence description=l-lysine 2,3-aminomutase /home/phd/server/work/predict_h16490
MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAIT
PYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLL
ITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDET
LEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEE
STRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEH
FRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEG
VITTYSEPINYTPGCNCDVCTxxxxxxxxxxxxLLNGEGMALEPVGLERNKRHVQE
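
SEG marks low-complexity residues with 'x'. As a minimal sketch (the helper name `masked_regions` and its `offset` parameter are assumptions, not part of SEG), the masked stretches can be turned into residue ranges like this, using the last 56 residues of the output above, which start at position 361:

```python
import re

def masked_regions(masked_seq, offset=1):
    """Return (start, end) ranges, 1-based and inclusive, of runs of 'x'.

    Whitespace used only for layout is removed before counting.
    `offset` is the sequence position of the first residue given.
    """
    seq = "".join(masked_seq.split())
    return [(m.start() + offset, m.end() + offset - 1)
            for m in re.finditer(r"x+", seq)]

# Residues 361-416 as printed by SEG.
tail = "VITTYSEPINYTPGCNCDVCTxxxxxxxxxxxxLLNGEGMALEPVGLERNKRHVQE"

print(masked_regions(tail, offset=361))
# [(382, 393)]
```

The single masked stretch covers residues 382-393, which is GKKKVHKVGVAG in the unmasked query.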

ProDom domain search (E Sonnhammer; Corpet, Gouzy, D Kahn)

Identities computed with respect to: (query) prot
Colored by: consensus/70% and property

HSP processing: ranked

                                                                           17 [  .         .         .         :         .         .         .         .         1         .         .         .         .         :         .         .         .         .         2         .         .         .         .         :         .         .         .         .         3         .         .         .         .         :         .         .  ] 373
  prot           (#) ppOld, default: single ... score      P(N)  N 100.0%     WNDWRWQVRNRIETVEELKKYIPLTKEEEEXXXXXXXXXXXXXXXXXXXXXDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLXXYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTXVMPNYVISQSHDKVILRNFEGVITTYSEPINYTP    
1 PD008727       p99.2 (8) YJEK(2)  // PROTE...   351  1.0e-121  5  53.2%     WFKWLWQLTNGVKTLKELRKVLNLKVEDED---------------------NPYVEBDPIRRQVIPTEWEIEKZVWHKEDFMGEDEYSPVPGLTHRYPDRVLLLVTDSCAVYCRYCFRRWFIQQENQGVPKEEVEKALDYIREHPEINEVLISGGDPLTLSDHKLEKLLKRLREIPHVKIIRIGTRLPVVAPQRITDDLLELL--YKPIWIMTHINHPYEITEEAREAVEKLRKTGIPIYNQSVLLRGVNDDFETLATLFHALTKIGVKPYYLFQCDPTPGTGHFRVPIEETLEIMRTLRGRISGYAIPTLAVDLPGGGGKT-----------------------------------    
2 PD041312       p99.2 (1) YODO_BACSU // HYP...   137   2.4e-11  1  76.5%     -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------LQPNYVLSQSPDKVILRNFEGVITSYPEPENYIP    
3 PD091955       p99.2 (1) O34400_BACSU // Y...    58     0.083  2  23.5%     ---------------------------------------------------------------------------------------------------------TTLCNMRCEHC-----------------IDLLLKRLEEIPRLRSISITGGEPMLSLKSVKEYVVPLLK----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
  consensus/100%                                                              ..............................                     ........................................................................................................................................................  ..................................................................................................................... ..................................    
  consensus/90%                                                               ..............................                     ........................................................................................................................................................  ..................................................................................................................... ..................................    
  consensus/80%                                                               ..............................                     ........................................................................................................................................................  ..................................................................................................................... ..................................    
  consensus/70%                                                               ..............................                     ......................................................Ts.CshhCcaC.................l-hhlchlcphPplppl.loGG-shh..cphhEhll.hL+..............................  ..................................................................................................................... ..................................

--- ------------------------------------------------------------
--- 
--- Again: these results were obtained based on the domain data-
--- base collected by Daniel Kahn and his coworkers in Toulouse.
--- 
--- PLEASE quote: 
---       F Corpet, J Gouzy, D Kahn (1998).  The ProDom database
---       of protein domain families. Nucleic Ac Res 26:323-326.
--- 
--- The general WWW page is on:
----      ---------------------------------------
---       http://www.toulouse.inra.fr/prodom.html
----      ---------------------------------------
--- 
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, use the links below (each line is ONE single
--- link for your protein):
--- 
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD008727 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD008727
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD008727 ==> graphical output of all proteins having domain PD008727
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD041312 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD041312
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD041312 ==> graphical output of all proteins having domain PD041312
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD091955 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD091955
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD091955 ==> graphical output of all proteins having domain PD091955
--- 
--- NOTE: if you want to use the link, make sure the entire line
---       is pasted as URL into your browser!
--- 
--- END of PRODOM
--- ------------------------------------------------------------

MAXHOM alignment header

--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
--- 
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID           : identifier of aligned (homologous) protein
--- STRID        : PDB identifier (only for known structures)
--- PIDE         : percentage of pairwise sequence identity
--- WSIM         : percentage of weighted similarity
--- LALI         : number of residues aligned
--- NGAP         : number of insertions and deletions (indels)
--- LGAP         : number of residues in all indels
--- LSEQ2        : length of aligned sequence
--- ACCNUM       : SwissProt accession number
--- NAME         : one-line description of aligned protein
--- 
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID PIDE WSIM LALI NGAP LGAP LSEQ2 ACCNUM NAME
yodo_bacsu         60   73  410    1    5   471 O34676 HYPOTHETICAL 54.1 KD PROT
y454_aquae         48   62  366    1    1   370 O66761 HYPOTHETICAL PROTEIN AQ_4
y121_trepa         35   44  336    3   11   355 O83158 HYPOTHETICAL PROTEIN TP01
yjek_haein         34   48  320    2    2   338 P44641 HYPOTHETICAL PROTEIN HI03
yg32_aquae         34   43  350    6   43   374 O67554 HYPOTHETICAL PROTEIN AQ_1
yjek_ecoli         33   46  327    4    4   342 P39280 HYPOTHETICAL 38.7 KD PROT
yjek_bucap         34   46  143    1    1   144 Q44634 HYPOTHETICAL PROTEIN IN G
--- 
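
The summary rows above can be read into records for sorting or filtering. This is a minimal sketch under two assumptions: the STRID column is empty, as it is for every hit in this report, and fields are separated by whitespace. The names `Hit` and `parse_hit` are illustrative only; the field names follow the abbreviation list above.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    id: str       # identifier of the aligned protein
    pide: int     # percentage of pairwise sequence identity
    wsim: int     # percentage of weighted similarity
    lali: int     # number of residues aligned
    ngap: int     # number of indels
    lgap: int     # number of residues in all indels
    lseq2: int    # length of the aligned sequence
    accnum: str   # SwissProt accession number
    name: str     # one-line description (truncated in the report)

def parse_hit(line):
    """Parse one summary row; assumes the STRID column is blank."""
    f = line.split(None, 8)  # 9 fields, the last one keeps its spaces
    return Hit(f[0], *map(int, f[1:7]), f[7], f[8].strip())

ROWS = [
    "yodo_bacsu         60   73  410    1    5  471 O34676 HYPOTHETICAL 54.1 KD PROT",
    "yjek_ecoli         33   46  327    4    4  342 P39280 HYPOTHETICAL 38.7 KD PROT",
]

hits = [parse_hit(row) for row in ROWS]
for hit in hits:
    print(hit.id, hit.pide, hit.lali, hit.lseq2, hit.accnum)
```

A row with a PDB identifier in the STRID column would have one extra field and would need a different split; that case does not occur in this report.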
--- MAXHOM ALIGNMENT: IN MSF FORMAT

--- 
--- Version of database searched for alignment:
--- SWISS-PROT release 38.0 (7/99) with 80000 proteins
---

MAXHOM alignment

Identities computed with respect to: (1) predict_h1640
Colored by: consensus/70% and property


                          1 [        .         .         .         .         :         .         .         .         .         1         .         .         .         .         :         .         .         .         .         2         .         .         .         .         :         .         .         .         .         3         .         .         .         .         :         .         .         .         .         4         .     ] 416
1 predict_h1640  100.0%     MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAITPYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEGVITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE    
2 yodo_bacsu      60.2%     ---KEIELWKDVPEEKWNDWLWQLTHTVRTLDDLKKVINLTEDEEEGVRISTKTIPLNITPYYASLMDPDNPRCPVRMQSVPLSEEMHKTKYDLEDPLHEDEDSPVPGLTHRYPDRVLFLVTNQCSMYCRYCTRRRFSGQIGMGVPKKQLDAAIAYIRETPEIRDCLISGGDGLLINDQILEYILKELRSIPHLEVIRIGTRAPVVFPQRITDHLCEILKKYHPVWLNTHFNTSIEMTEESVEACEKLVNAGVPVGNQAVVLAGINDSVPIMKKLMHDLVKIRVRPYYIYQCDLSEGIGHFRAPVSKGLEIIEGLRGHTSGYAVPTFVVDAPGGGGKIALQPNYVLSQSPDKVILRNFEGVITSYPEPENYIPNQADAYfeTADKKEPIGLSAIFADKEVSFTPENVDRIKRR---    
3 y454_aquae      47.8%     -------FFENVPENLWRSYEWQIQNRIKTLKEIKKYLKLLPEEEEGIKRTQGLYPFAITPYYLSLINPEDPKDPIRLQAIPRVVEVDEKVQSAGEPDALKEEGDIPGLTHRYPDRVLLNVTTFCAVYCRHCMRKRIFSQGERARTKEEIDTMIDYIKRHEEIRDVLISGGEPLSLSLEKLEYLLSRLREIKHVEIIRFGTRLPVLAPQRFFnkLLDILEKYSPIWINTHFNHPNEITEYAEEAVDRLLRRGIPVNNQTVLLKGVNDDPEVMLKLFRKLLRIKVKPQYLFHCDPIKGAVHFRTTIDKGLEIMRYLRGRLSGFGIPTYAVDLPGGKGKVPLLPNYVKKRKGNKFWFESFTGEVVEYEVTEVWEP-------------------------------------------    
4 y121_trepa      34.8%     -----------------------------TREQRKRRGAGRADEHWRTLsaADALTEHISPAYAHLIAqgADAQALKRQVCFAPQERVVHACECADPLGEDRYCVTPFLVHQYANRVLMLATGRCFSHCRYCFRRGFIAQRAGWIPNEEREKIITYLRATPSVKEILVSGGDPLTGSFAQVTSLFRALRSVAPDLIIRLCTRAVTFAPQAFTPELIAFLQEMKPVWIIPHINHPAELGSTQRAVLEACVGAGLPVQSQSVLLRGVNDSVETLCTLFHALTCLGVKPGYLFQLDLAPGTGDFRVPLSDTLALWRTLKERLSGLSLPTLAVDLPGGGGKFPLvqDVTWHQEREAFSARGIDGAWYTY---------------------------------------------------    
5 yjek_haein      33.6%     -----------------QNWLTILKNAISDPKLLLKALNLPEDDFEQSIAARKLFSLRVPQPFIDKIEKGNPQDPLFLQVMCSDLEFVQAEGFSTDPLEEKNANAVPNILHKYRNRLLFMAKGGCAVNCRYCFRRHFPYDENPGNKKS-WQLALDYIAAHSEIEEVIFSGGDPLMAKDHELAWLIKHLENIPHLQRLRIHTRLPVVIPQRITDEFCTLLAETrqTVMVTHINHPNEIDQIFAHAMQKLNAVNVTLLNQSVLLKGVNDDAQILKILSDKLFQTGILPYYLHLLDKVQGASHFLISDIEAMQIYKTLQSLTSGYLVPKLAREIAGEPNKT------------------------------------------------------------------------------    
6 yg32_aquae      31.7%     -----------------------MGKKLKYIIDLKFIEEIPEEERRELEKVTEKFAFRTNTYYNSLINWDNPNDPIRRIVIPTTEELEVWGK--LDASNESKYMKVHGLEHKYPDTALLLVTDVCGIYCRFCFRKRLFMNDNDEVARD-VSEGLEYIRNHPEINNVLLTGGDPLILATFKLEKILKALAEIPHVRIVRIGSKMLAVNPFRVlpKLLELFEWfkKLYLMNHFNHPRELTKEARKAVELVQKTGTTLTNQTPILKGINDDFETLKTLLEELSFIGVPPYYVFQCRPTAGNKAYSTPIEETIDLVEAVRAEVSGL----------------AARVRYVMSHETGKIEILGKTDEHIFFRYHRAADPENRGKFmvAEYKSSLSGVS------------------------    
7 yjek_ecoli      32.5%     -----------LNTPSREDWLTQLADVVTDPDELLRLLNIDAEEKLLAGRSAKKLflRVPRSFIDRMEKGNPDDPLLRQVLTSQDEFVIAPGFSTDPLEEQ-HSVVPGLLHKYHNRALLLVKGGCAVNCRYCFRRHFPYAENQGNKRN-WQTALEYVAAHPELDEMIFSGGDPLMAKDHELDWLLTQLEAIPHIKRLRIHSRLPIVIPARITEALVECFARStqILLVNHINHANEVDETFRQAMAKLRRVGVTLLNQSVLLRDVNDNAQTLANLSNALFDAGVMPYYLHVLDKVQGAAHFMVSDDEARQIMRELLTLVSGYLVPKLAREIGGEPSKTPL----------------------------------------------------------------------------    
8 yjek_bucap      34.3%     ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------IHTRLPIVIPNRITSDLCQIFSNslKIIIVTHINHPQEINEQLSDSLLKLKKSNVILLNQSVLLKNINDNAIILAELSSRLCENNIIPYYLHILDKVKGTSHFLVSNKKAKSIISDLMKMISGFLVPRLVFDNGSKDNKLIII---------------------------------------------------------------------------    
  consensus/100%            ......................................................................................................................................................................................................hto+h.hh.P.thh.thhthht....hhh.sHhNps.Ehsp...tsh.hh.t.sh.l.sQsslLtslNDs..hhh.L.ptL...tl.P.Yla.hc...G..ta.hs..cshtlhp.l.t.hSGh..............................................................................................    
  consensus/90%             ......................................................................................................................................................................................................hto+h.hh.P.thh.thhthht....hhh.sHhNps.Ehsp...tsh.hh.t.sh.l.sQsslLtslNDs..hhh.L.ptL...tl.P.Yla.hc...G..ta.hs..cshtlhp.l.t.hSGh..............................................................................................    
  consensus/80%                ..............................hhhh..h..--......s.t.h..ths..a.phht.tsstpslhh.sh....Eh..h.....-s.t.p....h..l.HpY.sphLh.hps.C...CRaChR+th..t.t.....p.hp.hltYlttp.plpphlhoGG-sL.ht...lt.lhttLttl....hlRltoRhshlhPtRhhscLhphhtp.p.lhl.sHhNHstElsp..tpuhthl.tsslsl.NQoVlL+slNDss.hhtpL.ptLhphtlhPhYla.hD.s.GhtcFhss.pcshplhctLhthhSGh.lPphsh-.sut.sKh.h.........................................................................       
  consensus/70%                    ................htphlps.cplh+hhtl.t-Ec.thhtstchh.htls..ahshhp.ssPpsPlhhQshsts.Eh..t.t..tDP.tEpp.sslPsLhH+Y.sRsLhhspshCuh.CRaChR++h.hpts.t..pp.hpthlpYlttpsplc-hlhSGGDsLhhp.tpLphllptLcpIsHlphlRItoRhPlVhPpRlTscLhphhpchp.lhlssHhNHPpElscp.pcAhptLhpsGlslhNQoVLL+GlNDssphhtpL.pcLhphtVhPYYla.hDhstGssHFhss.pcuhpIhcsLpshhSGahlPphsh-hsGtssKhsl....................................................

PHD information about accuracy

****************************************************************************
*                                                                          *
*      Prediction of:			                                   *
*	- secondary structure,   		by PHDsec		   *
*	- solvent accessibility, 		by PHDacc		   *
*                                                                          *
*      PHD: Profile fed neural network systems from HeiDelberg             *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*                                                                          *
*      Author:             Burkhard Rost		                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Predict-Help@EMBL-Heidelberg.DE       *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
****************************************************************************
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	                   *
*      Secondary structure prediction by PHDsec:                           *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	                   *
*                                                                          *
*      Author:             Burkhard Rost		                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE 		   *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network procedure is described in detail in:                        *
*  1) Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.        	                   *
*                                                                          *
*  A brief description is given in:                                        *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Improved prediction of protein secondary structure by use of se-     *
*     quence profiles and neural networks.                                 *
*     Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.   		   *
*                                                                          *
*  The PHD mail server is described in:                                    *
*  2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard:                  *
*     PHD - an automatic mail server for protein secondary structure       *
*     prediction.                                                          *
*     CABIOS, 1994, 10, 53-60.                                             *
*                                                                          *
*  The latest improvement steps (up to 72%) are explained in:              *
*  3) Rost, Burkhard; Sander, Chris:                                       *
*     Combining evolutionary information and neural networks to predict    *
*     protein secondary structure.                                         *
*     Proteins, 1994,  19, 55-72.                                          *
*                                                                          *
*  To be quoted for publications of PHD output:                            *
*     Papers 1-3 for the prediction of secondary structure and the pre-    *
*     diction server.                                                      *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the input to the network                                          *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  The prediction is performed by a system of neural networks.             *
*  The input is a multiple sequence alignment. It is taken from an HSSP    *
*  file (produced by the program MaxHom:                                   *
*     Sander, Chris & Schneider, Reinhard: Database of Homology-Derived    *
*     Structures and the Structural Meaning of Sequence Alignment.         *
*     Proteins, 1991, 9, 56-68).                                           *
*                                                                          *
*  For optimal results the alignment should contain sequences with varying *
*  degrees of sequence similarity relative to the input protein.           *
*  The following is an ideal situation:                                    *
*                                                                          *
*  +-----------------+----------------------+                              *
*  |   sequence:     |  sequence identity   |                              *
*  +-----------------+----------------------+                              *
*  | target sequence |  100 %               |                              *
*  | aligned seq. 1  |   90 %               |                              *
*  | aligned seq. 2  |   80 %               |                              *
*  |      ...        |   ...                |                              *
*  | aligned seq. 7  |   30 %               |                              *
*  +-----------------+----------------------+                              *
*                                                                          *
****************************************************************************
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 250 protein chains (in total    *
*  about 55,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*  ++================++-----------------------------------------+          *
*  || Qtotal = 72.1% ||      ("overall three state accuracy")   |          *
*  ++================++-----------------------------------------+          *
*                                                                          *
*  +----------------------------+-----------------------------+            *
*  | Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |            *
*  | Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |            *
*  | Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |            *
*  +----------------------------+-----------------------------+            *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                    number of correctly predicted residues             *
*  |Qtotal =            ---------------------------------------      (*100)*
*  |                          number of all residues                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of obs) = -------------------------------------------- (*100)*
*  |                    no of all res observed to be in helix              *
*  |                                                                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of pred)= -------------------------------------------- (*100)*
*  |                    no of all residues predicted to be in helix        *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the three state       *
*  accuracy for each protein chain, and then averaging over 250 chains     *
*  yields the following average:                                           *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | Qtotal/averaged over chains = 72.2% |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          =  9.3% |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Further measures of performance                                         *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  Matthews correlation coefficient:                                       *
*                                                                          *
*  +---------------------------------------------+                         *
*  | Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |                         *
*  +---------------------------------------------+                         *
*..........................................................................*
*                                                                          *
*  Average length of predicted secondary structure segments:               *
*                                                                          *
*  .           +------------+----------+                                   *
*  .           |  predicted | observed |                                   *
*  +-----------+------------+----------+                                   *
*  | Lhelix  = |    10.3    |    9.3   |                                   *
*  | Lstrand = |     5.0    |    5.3   |                                   *
*  | Lloop   = |     7.2    |    5.9   |                                   *
*  +-----------+------------+----------+                                   *
*..........................................................................*
*                                                                          *
*  The accuracy matrix in detail:                                          *
*                                                                          *
*  +---------------------------------------+                               *
*  |    number of residues with H, E, L    |                               *
*  +---------+------+------+------+--------+                               *
*  |         |net H |net E |net L |sum obs |                               *
*  +---------+------+------+------+--------+                               *
*  | obs H   |12447 | 1255 | 3990 |  17692 |                               *
*  | obs E   |  949 | 7493 | 3750 |  12192 |                               *
*  | obs L   | 2604 | 2875 |19962 |  25441 |                               *
*  +---------+------+------+------+--------+                               *
*  | sum Net |16000 |11623 |27702 |  55325 |                               *
*  +---------+------+------+------+--------+                               *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        Of all residues predicted to be in helix (net H), 12447 were      *
*        observed to be in helix, 949 belong to observed strands, and      *
*        2604 to observed loop regions.  The term "observed" refers to     *
*        the DSSP assignment of secondary structure calculated from 3D     *
*        coordinates of experimentally determined structures (Dictionary   *
*        of Secondary Structure of Proteins: Kabsch & Sander (1983)        *
*        Biopolymers, 22, 2577-2637).                                      *
*                                                                          *
****************************************************************************
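
The percentages defined above can be recomputed from the accuracy matrix. The sketch below is only a cross-check under stated assumptions: the function names `q_obs`, `q_pred` and `mcc` are illustrative, and the Matthews coefficient uses the standard one-state-against-the-rest formula, which the report names but does not spell out.

```python
from math import sqrt

# Accuracy matrix from the report.
# Rows: observed H, E, L.  Columns: predicted (net) H, E, L.
M = [
    [12447, 1255, 3990],
    [949, 7493, 3750],
    [2604, 2875, 19962],
]
STATES = ["helix", "strand", "loop"]

total = sum(map(sum, M))
q_total = 100 * sum(M[i][i] for i in range(3)) / total

def q_obs(i):
    """Correctly predicted residues as % of residues observed in state i."""
    return 100 * M[i][i] / sum(M[i])

def q_pred(i):
    """Correctly predicted residues as % of residues predicted in state i."""
    return 100 * M[i][i] / sum(row[i] for row in M)

def mcc(i):
    """Matthews correlation coefficient for state i against the other two."""
    tp = M[i][i]
    fn = sum(M[i]) - tp
    fp = sum(row[i] for row in M) - tp
    tn = total - tp - fn - fp
    return (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"Qtotal = {q_total:.1f}%")
for i, state in enumerate(STATES):
    print(f"{state}: %obs = {q_obs(i):.1f}  %pred = {q_pred(i):.1f}  C = {mcc(i):.2f}")
```

With these definitions the matrix gives Qtotal = 72.1% and correlation coefficients of 0.63, 0.53 and 0.52 for helix, strand and loop, matching the values quoted above.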
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the three secondary structure types using real     *
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit can be used to derive a "reliability index".  This index is given  *
*  for each residue along with the prediction.  The index is scaled to     *
*  have values between 0 (lowest reliability), and 9 (highest).            *
*  The accuracies (Qtot) to be expected for residues with values above a   *
*  particular value of the index are given below as well as the fraction   *
*  of such residues (%res):                                                *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |    *
*  | %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|    *
*  | E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|    *
*  | E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*                                                                          *
*  The above table gives the cumulative results, e.g. 62.5% of all         *
*  residues have a reliability of at least 5.  The overall three-state     *
*  accuracy for this subset of almost two thirds of all residues is 82.9%. *
*  For this subset, e.g., 83.1% of the observed helices are correctly      *
*  predicted, and 86.9% of all residues predicted to be in helix are       *
*  correct.                                                                *
*                                                                          *
*..........................................................................*
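The "winner takes all" assignment and the reliability index described above can be sketched as follows. The text only says that the index is derived from the difference between the two largest output units and scaled to 0..9; the exact scaling used here (ten times the difference, truncated) is an assumption, as are the function name and the example output values.

```python
def predict_with_reliability(outputs):
    """outputs: dict mapping state ('H', 'E', 'L') to a network output in [0, 1]."""
    ranked = sorted(outputs.items(), key=lambda kv: kv[1], reverse=True)
    (winner, best), (_, second) = ranked[0], ranked[1]
    # Assumed scaling (not given in the text): ten times the gap between the
    # two largest outputs, truncated and clipped to the range 0..9.
    index = max(0, min(9, int(10 * (best - second))))
    return winner, index


# Invented output values, for illustration only.
confident = predict_with_reliability({"H": 0.82, "E": 0.05, "L": 0.13})
ambiguous = predict_with_reliability({"H": 0.40, "E": 0.35, "L": 0.25})
print(confident, ambiguous)
```

Both examples are assigned to helix, but only the first one gets a high index, which is the extra information the real-valued outputs carry.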
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |          *
*  | %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|          *
*  | E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|          *
*  | E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*                                                                          *
*  For example, for residues with Relindex = 5, 64% of all predicted       *
*  beta-strand residues are correctly identified.                          *
*                                                                          *
*                                                                          *
****************************************************************************
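The cumulative and the non-cumulative reliability tables above are two views of the same data. As a rough check, the sketch below pools the per-index rows (index 1 to 9) into "index at least t" values, weighting each accuracy by its share of residues. The names are illustrative; the numbers are copied from the non-cumulative table.

```python
# Non-cumulative rows copied from the table above (index 1..9).
INDEX = list(range(1, 10))
PCT_RES = [8.8, 9.5, 9.3, 9.1, 9.7, 10.5, 12.5, 15.7, 14.1]
QTOT = [46.6, 50.6, 57.7, 62.6, 67.9, 74.2, 82.2, 88.3, 94.2]


def cumulative(threshold):
    """Share of residues with index >= threshold and their pooled accuracy."""
    rows = [(p, q) for i, p, q in zip(INDEX, PCT_RES, QTOT) if i >= threshold]
    share = sum(p for p, _ in rows)
    # Pooled accuracy is the residue-weighted mean of the per-index accuracies.
    accuracy = sum(p * q for p, q in rows) / share
    return share, accuracy


for t in (5, 8):
    share, accuracy = cumulative(t)
    print(f"index >= {t}: %res = {share:.1f}, Qtot = {accuracy:.1f}")
```

For thresholds 5 and 8 this gives 62.5% / 82.9% and 29.8% / 91.1%, which matches the cumulative table to within rounding.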
*                                                                          *
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*      Solvent accessibility prediction by PHDacc:                         *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network for prediction of secondary structure is described in       *
*  detail in:                                                              *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.                                   *
*                                                                          *
*  The analysis of the prediction of solvent exposure is given in:         *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Conservation and prediction of solvent accessibility in protein      *
*     families.  Proteins, 1994, 20, 216-226.                              *
*                                                                          *
*  To be quoted for publications of PHD exposure prediction:               *
*     Both papers quoted above.                                            *
*                                                                          *
****************************************************************************
*                                                                          *
*  Definition of accessibility                                             *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~                                             *
*                                                                          *
*  For training the residue solvent accessibility the DSSP (Dictionary of  *
*  Secondary Structure of Proteins; Kabsch & Sander (1983) Biopolymers, 22,*
*  2577-2637) values of accessible surface area have been used.  The       *
*  prediction provides values for the relative solvent accessibility.  The *
*  normalisation is the following:                                         *
*                                                                          *
*  |                           ACCESSIBILITY (from DSSP in Angstrom^2)     *
*  |RELATIVE_ACCESSIBILITY =   ------------------------------------- * 100 *
*  |                               MAXIMAL_ACC (amino acid type i)         *
*                                                                          *
*  where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.*
*  The maximal values are:                                                 *
*                                                                          *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |           *
*  | 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|           *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |                *
*  | 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|                *
*  +----+----+----+----+----+----+----+----+----+----+----+                *
*                                                                          *
*  Notation: one letter code for amino acid, B stands for D or N; Z stands *
*     for E or Q; and X stands for undetermined.                           *
*                                                                          *
*  The relative solvent accessibility can be used to estimate the number   *
*  of water molecules (W) in contact with the residue:                     *
*                                                                          *
*  W = ACCESSIBILITY /10                                                   *
*                                                                          *
*  The prediction is given in 10 states for relative accessibility, with   *
*                                                                          *
*  RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)                *
*                                                                          *
*  where PREDICTED_ACC = 0 - 9.                                            *
*                                                                          *
****************************************************************************
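The normalisation above can be written out in a few lines. This is a sketch only: the maximal values are copied from the table, while the function names and the rule for mapping a relative accessibility back onto one of the 10 states (largest n with n*n not exceeding the value, capped at 9) are assumptions made for illustration.

```python
import math

# Maximal accessibility per amino acid type, copied from the table above.
MAX_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(residue, accessibility):
    """Relative accessibility in percent for a DSSP accessibility value."""
    return 100.0 * accessibility / MAX_ACC[residue]


def estimated_waters(accessibility):
    """W = ACCESSIBILITY / 10, as given in the text."""
    return accessibility / 10


def ten_state(relative):
    # Assumed inverse of RELATIVE = n * n: the largest n in 0..9 whose
    # square does not exceed the relative accessibility.
    return min(9, math.isqrt(int(relative)))


rel = relative_accessibility("A", 53)  # an alanine with 53 units of area
print(rel, ten_state(rel), estimated_waters(53))
```

An alanine with an accessibility of 53 is half exposed (50%), which falls into state 7 because 49 <= 50 < 64.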
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 238 protein chains (in total    *
*  about 62,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*                                                                          *
*  Correlation                                                             *
*  ...........                                                             *
*                                                                          *
*  The correlation between observed and predicted solvent accessibility    *
*  is:                                                                     *
*                                                                          *
*  -----------                                                             *
*  corr = 0.53                                                             *
*  -----------                                                             *
*                                                                          *
*  This value ought to be compared to the worst and best case prediction   *
*  scenario: random prediction (corr = 0.0) and homology modelling         *
*  (corr = 0.66).  (Note: homology modelling yields a relatively accurate  *
*  prediction in 3D if, and only if, a significantly identical sequence    *
*  has a known 3D structure.)                                              *
*                                                                          *
*                                                                          *
*  3-state accuracy                                                        *
*  ................                                                        *
*                                                                          *
*  Often the relative accessibility is projected onto, e.g., 3 states:     *
*     b  = buried       (here defined as < 9% relative accessibility),     *
*     i  = intermediate ( 9% <= rel. acc. < 36% ),                         *
*     e  = exposed      ( rel. acc. >= 36% ).                              *
*                                                                          *
*  A projection onto 3 states or 2 states (buried/exposed) enables the     *
*  compilation of a 3- and 2-state prediction accuracy.  PHD reaches an    *
*  overall 3-state accuracy of:                                            *
*     Q3 = 57.5%                                                           *
*  (compared to 35% for random prediction and 70% for homology modelling). *
*                                                                          *
*  In detail:                                                              *
*                                                                          *
*  +-----------------------------------+-------------------------+         *
*  | Qburied       (% of observed)=77% | Qb (% of predicted)=60% |         *
*  | Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |         *
*  | Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |         *
*  +-----------------------------------+-------------------------+         *
*                                                                          *
*                                                                          *
*  10-state accuracy                                                       *
*  .................                                                       *
*                                                                          *
*  The network predicts relative solvent accessibility in 10 states, with  *
*  state i (i = 0-9) corresponding to a relative solvent accessibility of  *
*  i*i %.  The 10-state accuracy of the network is:                        *
*                                                                          *
*     Q10 = 24.5%                                                          *
*                                                                          *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                     number of correctly predicted residues            *
*  |Q3                = ---------------------------------------     (*100)*
*  |                           number of all residues                      *
*  |                                                                       *
*  |                     no of res. correctly predicted to be buried       *
*  |Qburied (% of obs) = ------------------------------------------- (*100)*
*  |                     no of all res. observed to be buried              *
*  |                                                                       *
*  |                                                                       *
*  |                     no of res. correctly predicted to be buried       *
*  |Qburied (% of pred)= ------------------------------------------- (*100)*
*  |                     no of all residues predicted to be buried         *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the correlation       *
*  between observed and predicted accessibility for each protein chain, and*
*  then averaging over all 238 chains yields the following average:        *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | corr/averaged over chains   = 0.53  |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          = 0.11  |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
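The per-chain averaging described above differs from pooling all residues: the correlation is computed separately for each chain, and only then averaged. A minimal sketch with invented toy data follows. The helper names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def per_chain_summary(chains):
    """chains: list of (observed, predicted) accessibility lists, one per chain."""
    values = [pearson(obs, pred) for obs, pred in chains]
    # Mean and (assumed) population standard deviation over chains.
    return statistics.fmean(values), statistics.pstdev(values)


# Invented toy data, for illustration only.
toy_chains = [
    ([0, 10, 40, 80], [0, 20, 30, 90]),
    ([5, 50, 25, 70], [10, 40, 40, 60]),
]
mean_corr, spread = per_chain_summary(toy_chains)
print(f"corr averaged over chains = {mean_corr:.2f} +/- {spread:.2f}")
```

Averaging this way gives every chain the same weight regardless of its length, which is why it describes the expected performance for a single protein.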
*                                                                          *
*  Further details of performance accuracy                                 *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                 *
*                                                                          *
*  The accuracy matrix in detail:                                          *
*  ..............................                                          *
*                                                                          *
* -------+----------------------------------------------------+----------- *
*  \ PHD |    0    1   2   3    4    5     6     7    8    9  |  SUM  %obs *
* -------+----------------------------------------------------+----------- *
* OBS  0 | 8611  140   8  44   82  169   772   334   27    0  | 10187 16.6 *
* OBS  1 | 4367  164   0  50  106  231   738   346   44    3  |  6049  9.8 *
* OBS  2 | 3194  168   1  68  125  303   951   513   42    7  |  5372  8.7 *
* OBS  3 | 2760  159   8  80  136  327  1246   746   58   19  |  5539  9.0 *
* OBS  4 | 2312  144   2  72  166  396  1615  1245  124   19  |  6095  9.9 *
* OBS  5 | 1873   96   3  84  138  425  1979  1834  187   27  |  6646 10.8 *
* OBS  6 | 1387   67   1  60   80  278  2237  2627  231   51  |  7019 11.4 *
* OBS  7 | 1082   35   0  32   56  225  1871  3107  302   60  |  6770 11.0 *
* OBS  8 |  660   25   0  27   43  136  1206  2374  325   87  |  4883  7.9 *
* OBS  9 |  325   20   2  27   29   74   648  1159  366  214  |  2864  4.7 *
* -------+----------------------------------------------------+----------- *
* SUM    |26571 1018  25 544  961 2564 13263 14285 1706  487  |            *
* %pred  | 43.3  1.7 0.0 0.9  1.6  4.2  21.6  23.3  2.8  0.8  |            *
* -------+----------------------------------------------------+----------- *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        8611 of all residues predicted to have 0% accessibility were      *
*        observed with 0% relative accessibility.  However, 325 of all     *
*        residues predicted to have 0% are observed as completely exposed  *
*        (obs = 9 -> rel. acc. >= 81%).  The term "observed" refers to the *
*        DSSP compilation of area of solvent accessibility calculated from *
*        3D coordinates of experimentally determined structures (Diction-  *
*        ary of Secondary Structure  of Proteins: Kabsch & Sander (1983)   *
*        Biopolymers, 22, 2577-2637).                                      *
*                                                                          *
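The 3-state figures quoted earlier can be recovered from the 10-state matrix above by projecting each state n onto buried, intermediate or exposed. The sketch below assumes that state n stands for n*n % relative accessibility and applies the 9% and 36% thresholds given in the 3-state definition; function names are illustrative.

```python
# Accuracy matrix copied from the table above: rows = observed state 0..9,
# columns = predicted state 0..9.
MATRIX = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]


def project(state):
    """Map a 10-state value onto b / i / e (assumes state n means n*n %)."""
    relative = state * state
    if relative < 9:
        return "b"
    if relative < 36:
        return "i"
    return "e"


def three_state_accuracy(matrix):
    labels = "bie"
    counts = {o: {p: 0 for p in labels} for o in labels}
    for obs, row in enumerate(matrix):
        for pred, n in enumerate(row):
            counts[project(obs)][project(pred)] += n
    total = sum(sum(r.values()) for r in counts.values())
    result = {"Q3": 100.0 * sum(counts[s][s] for s in labels) / total}
    for s in labels:
        observed = sum(counts[s].values())
        predicted = sum(counts[o][s] for o in labels)
        result[s + "%o"] = 100.0 * counts[s][s] / observed
        result[s + "%p"] = 100.0 * counts[s][s] / predicted
    return result


result = three_state_accuracy(MATRIX)
print({k: round(v, 1) for k, v in result.items()})
```

Under these assumptions the sketch gives Q3 of about 57.5% and, truncated to whole percent, 77/60, 9/44 and 78/56 (% of observed / % of predicted) for buried, intermediate and exposed, as in the 3-state summary.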
*                                                                          *
*  Accuracy for each amino acid:                                           *
*  .............................                                           *
*                                                                          *
*  +---+------------------------------+-----+-------+------+               *
*  |AA |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |    N |               *
*  +---+------------------------------+-----+-------+------+               *
*  | A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |               *
*  | C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |               *
*  | D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |               *
*  | E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |               *
*  | F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |               *
*  | G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |               *
*  | H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |               *
*  | I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |               *
*  | K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |               *
*  | L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |               *
*  | M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |               *
*  | N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |               *
*  | P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |               *
*  | Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |               *
*  | R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |               *
*  | S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |               *
*  | T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |               *
*  | V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |               *
*  | W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |               *
*  | Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |               *
*  +---+------------------------------+-----+-------+------+               *
*                                                                          *
*  Abbreviations:                                                          *
*                                                                          *
*  AA:   amino acid in one-letter code                                     *
*  b%o, i%o, e%o:   = Qburied, Qintermediate, Qexposed (% of observed),    *
*        i.e. percentage of correct prediction in each state, see above    *
*  b%p, i%p, e%p:   = Qburied, Qintermediate, Qexposed (% of predicted),   *
*        i.e. probability of correct prediction in each state, see above   *
*  Q10:  percentage of correctly predicted residues in each of the 10      *
*        states of predicted relative accessibility.                       *
*  corr: correlation between predicted and observed rel. acc.              *
*  N:    number of residues in data set                                    *
*                                                                          *
*                                                                          *
*  Accuracy for different secondary structure:                             *
*  ...........................................                             *
*                                                                          *
*  +--------+------------------------------+----+-------+-------+          *
*  | type   |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |     N |          *
*  +--------+------------------------------+----+-------+-------+          *
*  | helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |          *
*  | strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |          *
*  | loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |          *
*  +--------+------------------------------+----+-------+-------+          *
*                                                                          *
*  Abbreviations as before.                                                *
*                                                                          *
****************************************************************************
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the 10 states for relative accessibility using real*
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit (with the constraint that the second largest output is compiled    *
*  among all units at least 2 positions off the maximal unit) can be used  *
*  to derive a "reliability index".  This index is given for each residue  *
*  along with the prediction.  The index is scaled to have values between  *
*  0 (lowest reliability), and 9 (highest).                                *
*  The accuracies (Q3, corr, etc.) to be expected for residues with values *
*  above a particular value of the index are given below as well as the    *
*  fraction of such residues (%res):                                       *
*                                                                          *
*  +---+------------------------------+----+-------+-------+               *
*  |RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |               *
*  +---+------------------------------+----+-------+-------+               *
*  | 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |               *
*  | 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |               *
*  | 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |               *
*  | 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |               *
*  | 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |               *
*  | 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |               *
*  | 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |               *
*  | 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |               *
*  | 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |               *
*  | 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |               *
*  +---+------------------------------+----+-------+-------+               *
*                                                                          *
*  Abbreviations as before.                                                *
*                                                                          *
*  The above table gives the cumulative results, e.g. 45.8% of all         *
*  residues have a reliability of at least 4.  The correlation for this    *
*  most reliably predicted half of the residues is 0.686, i.e. a value     *
*  comparable to what could be expected if homology modelling were         *
*  possible.  For this subset of 45.8% of all residues, 89% of the buried  *
*  residues are correctly predicted, and 72% of all residues predicted to  *
*  be buried are correct.                                                  *
*                                                                          *
*..........................................................................*
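For accessibility, the reliability index differs from the secondary structure case in one detail: the runner-up is only searched among units at least 2 positions away from the winning unit. A minimal sketch of that rule follows. The scaling to 0..9 (ten times the difference, truncated) is an assumption, and the ten output values are invented for illustration.

```python
def accessibility_reliability(outputs):
    """outputs: list of ten network outputs in [0, 1], one per state 0..9."""
    best_state = max(range(len(outputs)), key=outputs.__getitem__)
    # The runner-up is searched only among units at least 2 positions away
    # from the winner, so a strong direct neighbour does not lower the index.
    rivals = [v for i, v in enumerate(outputs) if abs(i - best_state) >= 2]
    gap = outputs[best_state] - max(rivals)
    # Assumed scaling (not stated in the text): ten times the gap, clipped to 0..9.
    return best_state, max(0, min(9, int(10 * gap)))


# Invented outputs: states 6 and 7 are both strongly activated.
example = [0.05, 0.10, 0.05, 0.05, 0.05, 0.30, 0.75, 0.70, 0.20, 0.05]
print(accessibility_reliability(example))
```

Here state 6 wins, its neighbours 5 and 7 are ignored, and the gap to the next candidate (state 8) yields an index of 5. Without the constraint the close call between states 6 and 7 would have produced an index of 0, although both states describe nearly the same exposure.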
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +---+------------------------------+----+-------+-------+               *
*  |RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |               *
*  +---+------------------------------+----+-------+-------+               *
*  | 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |               *
*  | 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |               *
*  | 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |               *
*  | 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |               *
*  | 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |               *
*  | 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |               *
*  | 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |               *
*  | 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |               *
*  | 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |               *
*  | 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |               *
*  +---+------------------------------+----+-------+-------+               *
*                                                                          *
*  For example, for residues with RI = 4, 83% of all predicted intermediate*
*  residues are correctly predicted as such.                               *
*                                                                          *
*                                                                          *
****************************************************************************

PHD predictions

PHD predictions for predict_h16490

Different levels of data:

PHD brief
PHD normal

PHDsec summary: overall, your protein can be classified as:
alpha-beta, given the following classes:
- 'all-alpha': %H > 45% AND %E < 5%
- 'all-beta': %H < 5% AND %E > 45%
- 'alpha-beta': %H > 30% AND %E > 20%
- 'mixed': all others
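
The class rules listed above translate directly into code. A short Python sketch (the function name is illustrative), applied to the composition reported for this protein (%H 31.2, %E 24.3):

```python
def structural_class(pct_h, pct_e):
    """Apply the four PHDsec class rules to helix and strand percentages."""
    if pct_h > 45 and pct_e < 5:
        return "all-alpha"
    if pct_h < 5 and pct_e > 45:
        return "all-beta"
    if pct_h > 30 and pct_e > 20:
        return "alpha-beta"
    return "mixed"


print(structural_class(31.2, 24.3))  # alpha-beta
```

Both values lie only slightly above the 30% and 20% thresholds, so a small change in the predicted composition would move this protein into the 'mixed' class.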

Predicted secondary structure composition for your protein:

%H: 31.2 %E: 24.3 %L: 44.5

Residue composition for your protein:

%A: 4.3 %C: 2.6 %D: 6.2 %E: 7.2 %F: 1.4

%G: 6.2 %H: 3.1 %I: 6.2 %K: 5.3 %L: 9.6

%M: 2.4 %N: 4.8 %P: 6.7 %Q: 2.6 %R: 7.0

%S: 4.3 %T: 5.5 %V: 9.4 %W: 1.0 %Y: 3.9

AA : amino acid sequence

PHD_sec: PHD predicted secondary structure: H=helix, E=extended (sheet), blank=other (loop)
PHD = PHD: Profile network prediction HeiDelberg

Rel_sec: reliability index for PHDsec prediction (0=low to 9=high)
Note: for the brief presentation strong predictions marked by '*'

SUB_sec: subset of the PHDsec prediction, for all residues with an expected average accuracy > 82% (tables in header)
NOTE: for this subset the following symbols are used:
L: is loop (for which above ' ' is used)
.: means that no prediction is made for this residue, as the reliability is: Rel < 5
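
The SUB_sec row follows from PHD_sec and Rel_sec by the two rules above. A minimal sketch, using a hypothetical function name and a short toy input chosen for illustration:

```python
def subset_prediction(phd_sec, rel_sec, min_rel=5):
    """Derive the SUB_sec string from PHD_sec and Rel_sec (same length)."""
    out = []
    for state, rel in zip(phd_sec, rel_sec):
        if int(rel) < min_rel:
            out.append(".")      # too unreliable: no prediction is made
        elif state == " ":
            out.append("L")      # loop is written as a blank in PHD_sec
        else:
            out.append(state)    # H or E is kept as is
    return "".join(out)


# Toy ten-residue example; the spacing of the PHD_sec string is assumed.
print(subset_prediction("    HHHHH ", "9884367651"))  # LLL..HHHH.
```

The same function with min_rel=4 and 'I' in place of 'L' would give the corresponding rule for SUB_acc.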

pH_sec: 'probability' for assigning helix (1=high, 0=low)

pE_sec: 'probability' for assigning strand (1=high, 0=low)

pL_sec: 'probability' for assigning neither helix, nor strand (1=high, 0=low)

P_3_acc: PHD predicted relative solvent accessibility (acc) in 3 states: b = 0-9%, i = 9-36%, e = 36-100%.

Rel_acc: reliability index for PHDacc prediction (0=low to 9=high)
Note: for the brief presentation strong predictions marked by '*'

SUB_acc: subset of the PHDacc prediction, for all residues with an expected average correlation > 0.69 (tables in header)
NOTE: for this subset the following symbols are used:
I: is intermediate (for which above ' ' is used)
.: means that no prediction is made for this residue, as the reliability is: Rel < 4

PHD_acc: PHD predicted relative solvent accessibility (acc) in 10 states: a value of n (=0-9) corresponds to a relative acc. of between n*n % and (n+1)*(n+1) % (e.g. for n=5: 25-36%).

PHD results (brief)

PHD results (normal)

        ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9....,....10...,....11...,....12...,....13...,....14...,....15...,....16...,....17...,....18...,....19...,....20...,....21...,....22...,....23...,....24...,....25...,....26...,....27...,....28...,....29...,....30...,....31...,....32...,....33...,....34...,....35...,....36...,....37...,....38...,....39...,....40...,....41...,....42
AA      MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAITPYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPVPGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPELVNMLKKYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTSGYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEGVITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE
PHD_sec HHHHH HHHHHHHHHHHHHH HHHHHHHH HHHHHHHHHHH EE HHHHH EEEEE HHH E EEEEEEE EEEEEEEEEEEE HHHHHHHHHHHH EEEEE HHHH HHHHHHHHHHHHH EEEEEE EE HHHHHHHHHHHH EEEEE HHHHHHHHHHHHHHHH EE EEEEEEE HHHHHHHHHHHHHH EEEEEEEEE EEEE HHHHHHHHHHHHH EEE EEEEEEEEEE EEEEEE EEEEE EEEE EEEEE E
Rel_sec 98843676516881219999999999734489999881788399999996412213222427874137899996323465267133011237679965466799999974001634999996763454312332102457889999436899999997442221567619992431189999999999608995999961566211441357999999997298499982899759999999999999956981414038897438919999999999943476458999964326752553252158999999967415656622224289999983553435775368769988276489997267669989976536752466467716774687842146886532224469
SUB_sec LLL..HHHH.LLL...HHHHHHHHHHH...HHHHHHH.LLL.HHHHHHHH...........HHH...LLLLLLL....EE.LL........LLLLLLL.LLLLLLLLLL....L..EEEEEELL..E...........LLLLLLLL..HHHHHHHHHH......EEEE.LLL.....HHHHHHHHHHHH.LLLEEEEEE.LLL.......HHHHHHHHHHH.LL.EEEE.LLLLHHHHHHHHHHHHHHHHLLL......EEEE..LL.HHHHHHHHHHH...LL.EEEEEEE...LLL.EE..L..HHHHHHHHHHH..LLLLL......LLLLLLL.EE...EEEE.LLLEEEEE.LL.EEEEE.LLLLLLLLLLLL.EEE..LL.LLL.EEE.LLLL....LLLLL......LL
P_3_acc ebeeebebbeebeeeebeebbebbeeebee eebee bebeeeeeeebeebbeebebeb e bbebbeee eeebbebbbbeeeeebeeeeeeeee eeeeeeeebbbbbbeb b bbbbbeb bbbbbbbbbbe bbeeeeeeeeeeebeebbbbbee eebeebbbbbb bbbbeeeebebbbeebeebbbbbbbbbbb bbbbbbe bbeebbebbee ebbbbbbbbbbeebeeebeebbeebeebbbebbbbbbbbebbbebbebbeebbeebbebeb bbbbbbbeeeebbe b bbeeebbebbeebeeebbbbbbbebbbebeeee ebbbbeebbbeeeeeeb beb ebbbbbbeeeeeeebeeeeebbeeeeeeeebbbbbbbeeeebbbebeebe beeebee
Rel_acc 00121007431213210203000612132003231200212220332022203140116010530140320011031005200001201111101000222320031161120010878570009162922750104010013211231121760381101261088423202512111181076216114004059051303235340151219525624011449523400311513232354125030241402236791240002168127222722217051672120102401070100234018521712035302741572110033010141012501022113051002102101110110102010311103313120565733013220311101102111226
SUB_acc .......bb..............b..............................b...b...b...b............b............................b.......bbbbb...b.b.b..bb...b...............bb..b.....b..bbb.....b......b..bb..b..b..b.bb.b......b.b..b...bb.bb.e...bbbb..b.....b......bb..b....b.b....bbb..b.....bb..b...b....b.b.bb.......b...b......b..bb..b....b...bb.bb...........b....b.........b..................................bbbb......................e

GLOBE prediction of globularity

--- 
--- GLOBE: prediction of protein globularity
--- 
--- nexp =   213    (number of predicted exposed residues)
--- nfit =   165    (number of expected exposed residues)
--- diff =    48.00 (difference nexp-nfit)
--- =====> your protein appears as compact, as a globular domain
--- 
--- 
--- GLOBE: further explanations preliminarily in:
---        http://www.columbia.edu/~rost/Papers/98globe.html
--- 
--- END of GLOBE

END of results for file predict_h16490

Quotes for methods

PredictProtein: B Rost (1996) Methods in Enzymology, 266:525-539
- Url: http://dodo.cpmc.columbia.edu
- Version: 1.99.08
- Description: PredictProtein is the collective name for all prediction programs run.
PROSITE: A Bairoch, P Bucher & K Hofmann (1997) Nucleic Acids Research, 25:217-221
- Author: A Bairoch, P Bucher & K Hofmann, bairoch@cmu.unige.ch
- Contact: bairoch@cmu.unige.ch
- Url: http://www.expasy.ch/prosite/
- Version: 99.07
- Description: PROSITE is a database of functional motifs. ScanProsite finds all functional motifs in your sequence that are annotated in the PROSITE database.
SEG: J C Wootton & S Federhen (1996) Methods in Enzymology, 266:554-571
- Author: J C Wootton & S Federhen, wootton@ncbi.nlm.nih.gov
- Contact: wootton@ncbi.nlm.nih.gov
- Version: 1994
- Description: SEG divides sequences into regions of low and high complexity. Low-complexity regions typically correspond to 'simple sequences' or 'compositionally-biased' regions.
ProDom: ELL Sonnhammer & D Kahn (1994) Protein Science, 3:482-492
- Author: ELL Sonnhammer; J Gouzy, F Corpet, F Servant, D Kahn, dkahn@zyx.toulouse.inra.fr
- Contact: dkahn@zyx.toulouse.inra.fr
- Url: http://protein.toulouse.inra.fr/prodom.html
- Version: 99_2
- Description: ProDom is a database of putative protein domains. The database is searched with BLAST for domains corresponding to your protein.
MaxHom: C Sander & R Schneider (1991) Proteins, 9:56-68
- Author: C Sander & R Schneider, schneider@lion-ag.de
- Contact: schneider@lion-ag.de
- Version: 1.99.04
- Description: MaxHom is a dynamic multiple sequence alignment program which finds similar sequences in a database.
MView: N P Brown, C Leroy & C Sander (1998) Bioinformatics, 14:380-381
- Author: N Brown, nbrown@nimr.mrc.ac.uk
- Contact: nbrown@nimr.mrc.ac.uk
- Url: http://mathbio.nimr.mrc.ac.uk/~nbrown/mview/
- Copyright: Copyright (C) Nigel P. Brown, 1997-1998. All rights reserved.
- Version: 1.40.2
- Description: MView is a program converting multiple sequence alignments into fancy HTML formatted output.
PHD: B Rost (1996) Methods in Enzymology, 266:525-539
- Author: B Rost
- Version: 1.96
- Description: PHD is a suite of programs predicting 1D structure (secondary structure, solvent accessibility) from multiple sequence alignments.
PHDsec: B Rost & C Sander (1993) J. of Molecular Biology, 232:584-599
- Author: B Rost
- Version: 1.96
- Description: PHD predicts secondary structure from multiple sequence alignments.
PHDacc: B Rost & C Sander (1994) Proteins, 20:216-226
- Author: B Rost
- Version: 1.96
- Description: PHD predicts per residue solvent accessibility from multiple sequence alignments.
GLOBE: B Rost (1998) unpublished
- Author: B Rost
- Version: 1.98.05
- Description: GLOBE predicts the globularity of a protein.
