



Results from PredictProtein for predict_h8378

TOC for file /home/phd/server/work/predict_h8378

  1. The following information has been received by the server
  2. PROSITE motif search (A Bairoch; P Bucher and K Hofmann)
  3. SEG low-complexity regions (J C Wootton & S Federhen)
  4. ProDom domain search (E Sonnhammer; Corpet, Gouzy, D Kahn)
  5. MAXHOM alignment header
  6. MAXHOM alignment
  7. PHD information about accuracy
  8. PHD predictions
  9. GLOBE prediction of globularity

END of TOC




BEG of results for file /home/phd/server/work/predict_h8378


The following information has been received by the server


reference predict_h8378 (Jun 19, 2000 20:55:42)
reference pred_h8378 (Jun 19, 2000 20:56:08)
PPhdr from: kapilm@cs.brandeis.edu
PPhdr resp: MAIL
PPhdr orig: HTML
PPhdr want: HTML
PPhdr password(###)
prediction of: - default prediction of: - PHDsec PHDacc PHDhtm ProSite SEG ProDom
return msf format
ret html
# default: single protein sequence description=Serine Racemase
MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV


PROSITE motif search (A Bairoch; P Bucher and K Hofmann)


-------------------------------------------------------------
Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern:    [ST].[RK]
   54       SFK
   139      TQR
   196      TIK
   203      SVK

Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern:    [ST].{2}[DE]
   8        SFAD
   71       TPEE
   212      SNAD
   235      TIAD
   261      TVTE

Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern:    G[^EDRKHPFYW].{2}[STAGCN][^P]
   59       GALNAI
   88       GQALTY
   187      GGMVAG
   239      GVKSSI
   287      GVALAA

Pattern-ID: DEHYDRATASE_SER_THR PS00165 PDOC00149
Pattern-DE: Serine/threonine dehydratases pyridoxal-phosphate attachment site
Pattern:    [DESH].{4,5}[STVG].[AS][FYI]K[DLIFSA][RVMF][GA][LIVMGA]
   47       ELFQKTGSFKIRGA
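
The patterns listed above are written in regular-expression form, so the hits can be re-checked with a few lines of Python. The sketch below is not the PROSITE scanner itself; it assumes that positions are 1-based and that hits are reported left to right without overlaps, which is how `re.finditer` behaves. Under that assumption it gives the same positions as the listing. An overlapping scan would report additional hits, for example TEDE at 263 for the casein kinase II pattern. The names `SEQ`, `PATTERNS` and `scan` are illustrative only.

```python
import re

# Query sequence as received by the server (339 residues).
SEQ = (
    "MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGA"
    "LNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQA"
    "YGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDAL"
    "VVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGV"
    "KSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQT"
    "VSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV"
)

# Patterns copied from the "Pattern:" lines of the report.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
    "DEHYDRATASE_SER_THR":
        r"[DESH].{4,5}[STVG].[AS][FYI]K[DLIFSA][RVMF][GA][LIVMGA]",
}


def scan(seq, pattern):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name)
        for start, text in scan(SEQ, pattern):
            print(f"   {start:<8} {text}")
```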



SEG low-complexity regions (J C Wootton & S Federhen)



>prot (#) ppOld, default: single protein sequence description=serine racemase /home/phd/server/work/predict_h8378
MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGA
LNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQA
YGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDAL
xxxxxxxxxxxxIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGV
KSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQT
VSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV


ProDom domain search (E Sonnhammer; Corpet, Gouzy, D Kahn)


Identities computed with respect to: (query) prot
Colored by: consensus/70% and property
HSP processing: ranked
                                                                           52 [       .         .         .         .         1         .         .         .         .         :         .         .         .         .         2         .         .         .         .         :         .         .         .         . ] 292
  prot           (#) ppOld, default: single ... score      P(N)  N 100.0%     TGSFKIRGALNAIXXXXXXXPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXVDALVVPVGGGGMVAGIAITIKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAA    
1 PD000323       p99.2 (175) TRPB(29) CYSK(1...    56     0.019  4  35.2%     SGSYKDRGAYSMI-------PGKKSVIVESTSGNTGAVALAMVAARLGLKCVIVMPES-------------------------------------------------------------------VDVIVASVGTGGTIAGVARYLK-----------------------------------------------------------EAVSVSDEEALEAGLLLGESEGIVPEPASAAAIAA    
  consensus/100%                                                              oGSaK.RGAhshI       PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo                                                                   VDslVssVGsGGhlAGlAhhlK                                                           -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA    
  consensus/90%                                                               oGSaK.RGAhshI       PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo                                                                   VDslVssVGsGGhlAGlAhhlK                                                           -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA    
  consensus/80%                                                               oGSaK.RGAhshI       PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo                                                                   VDslVssVGsGGhlAGlAhhlK                                                           -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA    
  consensus/70%                                                               oGSaK.RGAhshI       PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo                                                                   VDslVssVGsGGhlAGlAhhlK                                                           -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA    
--- ------------------------------------------------------------
--- 
--- Again: these results were obtained based on the domain data-
--- base collected by Daniel Kahn and his coworkers in Toulouse.
--- 
--- PLEASE quote: 
---       F Corpet, J Gouzy, D Kahn (1998).  The ProDom database
---       of protein domain families. Nucleic Ac Res 26:323-326.
--- 
--- The general WWW page is on:
----      ---------------------------------------
---       http://www.toulouse.inra.fr/prodom.html
----      ---------------------------------------
--- 
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, use the following links (each line is ONE
--- single link for your protein!):
--- 
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD000323 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD000323
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD000323 ==> graphical output of all proteins having domain PD000323
--- 
--- NOTE: if you want to use the link, make sure the entire line
---       is pasted as URL into your browser!
--- 
--- END of PRODOM
--- ------------------------------------------------------------


MAXHOM alignment header


--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
--- 
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID           : identifier of aligned (homologous) protein
--- STRID        : PDB identifier (only for known structures)
--- PIDE         : percentage of pairwise sequence identity (column IDE)
--- WSIM         : percentage of weighted similarity
--- LALI         : number of residues aligned
--- NGAP         : number of insertions and deletions (indels)
--- LGAP         : number of residues in all indels
--- LSEQ2        : length of aligned sequence (column LEN2)
--- ACCNUM       : SwissProt accession number
--- NAME         : one-line description of aligned protein
--- 
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID  IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME                     
ykv8_yeast         40   52  319    4    8  326 P36007 HYPOTHETICAL 34.9 KD PROT
thd2_ecoli         35   47  320    4    6  329 P05792 THREONINE DEHYDRATASE CAT
y4tj_rhisn         32   44  328    4   10  332 P55664 PUTATIVE THREONINE DEHYDR
thdh_yeast         31   42  319    6   13  576 P00927 THREONINE DEHYDRATASE PRE
thdh_arxad         31   43  320    5    9  550 O42615 THREONINE DEHYDRATASE PRE
thd1_haein         31   44  327    5   10  513 P46493 DEAMINASE).              
thd1_salty         31   43  334    6   16  514 P20506 DEAMINASE).              
thd1_lyces         30   41  323    5   11  595 P25306 DEAMINASE).              
thd1_ecoli 1TDJ    30   42  334    6   16  514 P04968 DEAMINASE).              
thd1_burce         30   41  316    6   10  507 P53607 DEAMINASE).              
thd1_myctu         30   37  318    5   13  429 Q10766 DEAMINASE).              
thd1_bacsu         28   39  320    6   12  422 P37946 DEAMINASE).              
thd1_lacla         28   39  312    6   15  441 Q02145 DEAMINASE).              
thd1_soltu         28   41  185    3    3  359 P31212 (FRAGMENT).              
sdhl_rat           28   28  298    9   59  362 P09367 DEHYDRATASE (EC 4.2.1.16)
sdhl_human         28   30  297    7   16  328 P20132 L-SERINE DEHYDRATASE (EC 
thd1_corgl         27   33  313    7   18  436 Q04513 DEAMINASE).              
--- 
--- MAXHOM ALIGNMENT: IN MSF FORMAT


--- ------------------------------------------------------------
--- 3D homologue: the known structure that appeared to have sig-
--- 3D homologue: nificant sequence identity to your protein is:
--- 3D homologue: 1TDJ
--- 3D homologue: Note: we do  NOT  check whether the similarity
--- 3D homologue:       is in the region for which structure has
--- 3D homologue:       been determined.  Thus, please verify!  
--- ------------------------------------------------------------

--- 
--- Version of database searched for alignment:
--- SWISS-PROT release 38.0 (7/99) with 80000 proteins
--- 
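
The summary table in the alignment header can be read back into records for filtering or sorting. The following is a minimal sketch, not part of MAXHOM: it assumes whitespace-separated columns in the order given by the abbreviation list, and it treats the second token as a STRID only when it is not a plain integer (an assumption that holds for 1TDJ but would fail for an all-digit PDB identifier). `Hit` and `parse_summary_row` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    id: str
    strid: Optional[str]  # PDB identifier, only present for known structures
    pide: int             # percentage of pairwise sequence identity
    wsim: int             # percentage of weighted similarity
    lali: int             # number of residues aligned
    ngap: int             # number of indels
    lgap: int             # number of residues in all indels
    lseq2: int            # length of aligned sequence
    accnum: str           # SwissProt accession number
    name: str             # one-line description (may be truncated)


def parse_summary_row(line: str) -> Hit:
    tokens = line.split()
    ident = tokens[0]
    # Assumption: a STRID contains at least one letter, so a purely
    # numeric second token is already the PIDE column.
    if tokens[1].isdigit():
        strid, rest = None, tokens[1:]
    else:
        strid, rest = tokens[1], tokens[2:]
    pide, wsim, lali, ngap, lgap, lseq2 = (int(t) for t in rest[:6])
    return Hit(ident, strid, pide, wsim, lali, ngap, lgap, lseq2,
               rest[6], " ".join(rest[7:]))


if __name__ == "__main__":
    row = "thd1_ecoli 1TDJ    30   42  334    6   16  514 P04968 DEAMINASE)."
    hit = parse_summary_row(row)
    # Fraction of the aligned protein covered by the alignment.
    print(hit.id, hit.strid, round(hit.lali / hit.lseq2, 2))
```

Dividing LALI by the aligned protein's length gives a quick view of how much of each homologue the alignment covers, which helps when judging the 3D homologue warning above.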

MAXHOM alignment


Identities computed with respect to: (1) predict_h8370
Colored by: consensus/70% and property
                           1 [        .         .         .         .         :         .         .         .         .         1         .         .         .         .         :         .         .         .         .         2         .         .         .         .         :         .         .         .         .         3         .         .         .        ] 339
 1 predict_h8370  100.0%     MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV    
 2 ykv8_yeast      39.6%     -------TYGDVLDASNRIKEYVNKTPVLTSRMLNDRLGAQIYFKGENFQRVGAFKFRGAMNAVSKL---SDEKRSKGVIAFSSGNHAQAIALSAKLLNVPATIVMPEDAPALKVAATAGYGAHIIRYNRYTEDREQIGRQLAAEHGFALIPPYDHPDVIAGQGTSAKELLEEVGQLDALFVPLGGGGLLSGSALAARSLSPGCKIFGVEPEAGNDGQQSFRSGSIV-HINTPKTIADGAQthLGEYTFAIIRENVDDILTVSDQELVKCMHFLAERMKVVVEPTACLGFAGALLKKEELVG---KKVGIILSGGNVDMKRYATLISGKEDGP------    
 3 thd2_ecoli      33.8%     ITYDLPVAIDDIIEAKQRLAGRIYKTGMPRSNYFSERCKGEIFLKFENMQRTGSFKIRGAFNKLSSL---TDAEKRKGVVACSAGNHAQGVSLSCAMLGIDGKVVMPKGAPKSKVAATCDYSAEVVLHGDNFNDTIAKVSEIVEMEGRIFIPPYDDPKVIAGQGTIGLEIMEDLYDVDNVIVPIGGGGLIAGIAVAIKSINPTIRVIGVQSENVHGMAASFHSGEITTHRT-TGTLADGCdsRPGNLTYEIVRELVDDIVLVSEDEIRNSMIALIQRNKVVTEGAGALACAALLSGKLDQYIQNRKTV-SIISGGNIDLSRVSQI--------------    
 4 y4tj_rhisn      31.0%     MNELSNLSLESIERARERIEEHVFRTPLTTSRSLTELTGTQVSLKLEHYQRTGSFKLRGATNAILQL---SPSDRARGVIAASTGNHGRALSYAAKAVGSRATICMSDLVPENKVSEIRKLGATVRIVGSSQDDAQVEVERLVAEEGLSMIPPFDHPHIIAGQRTVGLEIVEAMPDVAMVLLPLSGGGLAAGVAAAVKALRPHARIIGVTMDRGAAMKASIEAGHPV-QVKEYRSLADSLGGGIGmwTFQMCRALLDDVVLVNEGEIAAGIRHAYEHERQILEGAGAVGIAALLSG---KVAARGGSVGVVLSGQNIDMGLHREVINGVVRATEE----    
 5 thdh_yeast      30.6%     ---------------RSSVYDVINESPISQGVGLSSRLNTNVILKREDLLPVFSFKLRGAYNMIAKL---DDSQRNQGVIACSAGNHAQGVAFAAKHLKIPATIVMPVCTPSIKYQNVSRLGSQVVLYGNDFDEAKAECAKLAEERGLTNIPPFDHPYVIAGQGTVAMEILRQVrkIGAVFVPVGGGGLIAGIGAYLKRVAPHIKIIGVETYDAATLHNSLQRNQRTP-LPVVGTFADGTSvmIGEETFRVAQQVVDEVVLVNTDEICAAVKDIFEDTRSIVEPSGALSVAGMKK-YISTVHPEinTYVPILSGANMNFDRLRFVSERAVLGEGKEVFM    
 6 thdh_arxad      30.6%     ---------------TSKVYDVCNETPVTPAVNLSSKLGANIFLKREDLQPVFSFKLRGAYNMMAHLP---QETRWKGVIACSAGNHAQGVAYSAKHLNIPATIVMPVVTPAIKYKNVDRLGAKVVLHGNDFDAAKAECNRLSEKHGLTNIPLFDNPYVIAGQGTIGVELLRQIdsLKAIFVCIGGGGLIAGVGAYIKRIAPQVKIIGVETYDANAMRQSLQKGERI-TLSEVGLFADGAAviLGEETFRLCQQVVDEIVLVSTDEICAAIKDVFTETRSIVEPAGALSVAGLVkeSHPEIDHSASGYTAILSGANMDFDRLRFVSERAKLGEGSEVFI    
 7 thd1_haein      30.1%     -------SQSDYINAIVKLGSRVyvTPLQKMGKLSERLHNNIWIKREDRQPVNSFKLRGAYAMISSL---SAEQKAAGVIAASAGNHAQGVALSAKQLGLKALIVMPQNTPSIKVDAVRGFGGEVLLHGANFDEAKAKAIELSKEKNMTFIPPFDHPLVIAGQGTLAMEMLQQVADLDYVFVQVGGGGLAAGVAILLKQFMPEIKIIGVESKDSACLKAALDKGEPT-DLTHIGLFADGVAvrIGDETFRLCQQYLDDMVLVDSDEVCAAMKDLFENVRAVAEPSGALGLAGLKKYVKQNHI-EGKNMAAILSGANLNFHTLRYVSERCEIGENREALL    
 8 thd1_salty      29.8%     MAESQPLSVAPEGAEYlpVYEAAQVTPLQKMEKLSSRLDNVILVKREDRQPVHSFKLRGAYAMMTGL---TEEQKAHGVITASAGNHAQGVAFSSARLGVKSLIVMPKATADIKVDAVRGLGGEVLLHGANFDEAKAKAIELAQQQGFTWVPPFDHPMVIAGQGTLALELLQQDSHLDRVFVPVGGGGLAAGVAVLIKQLMPQIKVIAVEAEDSA-CLKAALEAGHPVDLPRVGLFAEGVAvrIGDETFRLCQEYLDDIITVDSDAICAAMKDLFEDVRAVAEPSGALALAGMKKYIAQH-NIRGERLAHVLSGANVNFHGLRYVSEreQREGLLTVTI    
 9 thd1_lyces      29.6%     -----------VDILASPVYDVAIESPLELAEKLSDRLGVNFYIKREDKQRVFSFKLRGAYNMMSNL---SREELDKGVITASAGNHAQGVALAGQRLNCVAKIVMPTTTPQIKIDAVRALGGDVVLYGKTFDEAQTHALELSEKDGLKYIPPFDDPGVIKGQGTIGTEINRQLKDIHAVFIPVGGGGLIAGVATFFKQIAPNTKIIGVEPYGAASMTLSLHEGHRV-KLSNVDTFADGVAvlVGEYTFAKCQELIDGMVLVANDGISAAIKDVYDEGRNILETSGAVAIAGAAA-YCEFYKIKNENIVAIASGANMDFSKLHKVTELakEALLATFMV    
10 thd1_ecoli      29.2%     MADSQPLSGAPEGAEYlpVYEAAQVTPLQKMEKLSSRLDNVILVKREDRQPVHSFKLRGAYAMMAGL---TEEQKAHGVITASAGNHAQGVAFSSARLGVKALIVMPTATADIKVDAVRGFGGEVLLHGANFDEAKAKAIELSQQQGFTWVPPFDHPMVIAGQGTLALELLQQDAHLDRVFVPVGGGGLAAGVAVLIKQLMPQIKVIAVEAEDSA-CLKAALDAGHPVDLPRVGLFAEGVAvrIGDETFRLCQEYLDDIITVDSDAICAAMKDLFEDVRAVAEPSGALALAGMKK-YIALHNIRGERLAHILSGANVNFHGLRYVSEreQREALLAVTI    
11 thd1_burce      29.3%     ---------------TARVYDVAFETELEPARNLSARLRNPVYLKREDNQPVFSFKLRGAYNKMAHIP---ADALARGVITASAGNHAQGVAFSAARMGVKAVIVVPVTTPQVKVDAVRAHGGPGVEVIQAGESYSDaaLKVQEERGLTFVHPFDDPYVIAGQGTIAMEILRQHqpIHAIFVPIGGGGLAAGVAAYVKAVRPEIKVIGVQAEDSCAMAQSLQAGKRV-ELAEVGLFADGTAvlVGEETFRLCKEYLDGVVTVDTDALCAAIKDVFQDTRSVLEPSGALAVAGAKL-YAEREGIENQTLVAVTSGANMNFDRMRFVAERAEVGEARE---    
12 thd1_myctu      28.6%     --PLFSLSGADIDRAAKRIAPVVTPTPLQPSDRLSAITGATVYLKREDLQTVRSYKLRGAYNLLVQL---SDEELAAGVVCSSAGNHAQGFAYACRCLGVHGRVYVPAKTPKQKRDRIRYHGGEFIDLIVGGSTYDLAAAAALEDVErtLVPPFDDLRTIAGQGTIAVEVLGQLeePDLVVVPVGGGGCIAGITTYLAERTTNTAVLGVEPAGAAAMMAALAAGEPVTLDHVDQFVDGAAVNRAGTLTYAALAAAGDMVstVDEGAVCTAMLDLYQNEGIIAEPAGALSVAGLLEADIEPGST----VVCLISGGNNDVSRYGEVLE------------    
13 thd1_bacsu      28.4%     LKENSLIQVKHILKAHQNVKDVVIHTPLQRNDRLSERYECNIYLKREDLQVVRSFKLRGAYHKMKQL---SSEQTENGVVCASAGNHAQGVAFSCKHLGIHGKIFMPSTTPRQKVSQVELFGKgiILTGDTFDDVYKSAAECCEAESRTFIHPFDDPDVMAGQGTLAVEILNDIdePHFLFASVGGGGLLSGVGTYLKNVSPDTKVIAVEPAGAASYFESNKAGHVV-TLDKIDKFVDGAAvkIGEETFRTLETVVDDILLVPEGKVCTSILELYNECAVVAEPAGALSVAALDLYKDQIKG---KNVVCVVSGGNNDIGRMQEMKE------------    
14 thd1_lacla      28.2%     --------LSNKYQANIYLKEVVTKTPLQLDPYLSNKYQANIYLKEENLQKVRSFKLRGAYYSISKL---SDEQRSKGVVCASAGNHAQGVAFAANQLNISATIFMPVTTPNQKISQVKFFGESHVtiGDTFDESARAAKAFSQDNDKPFIDPFDDENVIAGQGTVALEIFAQAksLDKIFVQIGGGGLIAGITAYSKERYPQTEIIGVEAKGATSMKAAYSAGQPV-TLEHIDKFADGIAvtVGQKTYQLINDKVKQLLAVDEGLISQTILELYSKLGIVAEPAGATSVAALELIKDEIKG---KNIVCIISGGNNDISRMQEIEE------------    
15 thd1_soltu      28.9%     --------------------------------------------------------------------------------------------------------------------------------------------------------PFDAPGVIKGQGTIGTEINRQLKDIHAVFVPVGGGGLISGVAAYFTQVAPHTKIIGVEPYGAASMTLSLYEGHRV-KLENVDTFADGVAvlVGEYTFAKCQELIDGMVLVRNDGISAAIKDVYDEGRNILETSGAVAIAGAAA-YCEFYNIKNENIVAIASGANMDFSKLHKVTELAELGSDNEALL    
16 sdhl_rat        27.1%     -------------------QESLhkTPLRDSMALSKVAGTSVFLKMDSSQPSGSFKIRGIGHLCkaLLPDTPSPL-------TAGNAGMATAYAARRLGLPATIVVPSTTPALTIERLKNEGATVEVVGEMLDEAIQLAKALEKNNPgvYISPFDDPLIWEGHTSLVKELKETLskPGAIVLSVGGGGLLCGVVQGLREvwEDVPIIAMETFGAHS-FHAAVKEGKLVTLPKITSVAKALgnTVGAQTLKLFYEHPIFSEVISDQEAVTAIEKFVDDEKILVEPACGAALAAVYSGvgRLQTPLASLVVIVCGGSNISLAQLQAL--------------    
17 sdhl_human      27.2%     -------------------------TPIRDSMALSKMAGTSVYLKMDSAQPSGSFKIRGIGHFCKRWA----KQGCAHFVCSSAGNAGMAAAYAARQLGVPATIVVPGTTPALTIERLKNEGAtkVVGELLDEAFELAKALAKNNPGWVYIPPFDDPLIWEGHASIVKELKETLwkPGAIALSVGGGGLLCGVVQGLQegWGDVPVIAMETFGAHSFHAATTAGKLV-SLPKITSVAKALGvtVGSQALKLFQEHPIFSEVISDQEAVAAIEKFVDDEKILVEPAWGAALAAVYSHVIQKLQLepSLVVIVCGGSNISLAQLRALKE------------    
18 thd1_corgl      26.5%     ------IRAADIQTAQARISSVIAPTPLQYCPRLSEETGAEIYLKREDLQDVRSYKIRGALNSG---AQSPQEQRDAGIVAASAGNHAQGVAYVCKSLGVQGRIYVPVQTPKQKRDRIMVHGGEFVSLVVTGNNFDEASAAAHEDAErtLIEPFDARNTVIGQGTVAAEILSQLtsADHVMVPVGGGGLLAGVVSYMADMAPRTAIVGIEPAGAAS-MQAALHNGGPITLETVDPFVDGAEvrVGDLNYTIVEKNQGRVHMMSATEGAVCTEmlYQNEGIIAEPAGALSIAGLKEMSFAPGSV----VVCIISGGNNDVLRYAEIAE------------    
   consensus/100%            ..........................................................................................................................................................pt..hh.Gpto.shEh.tt....t.lhh.luGGGhhsG.s.h.tth..th.lhuhp..tst....u..ttt........t.hstuh.s..G..sh.hh.t......hh.tt.h..sh..hhtp.t.lhEsshshuhAuh..............h..lhuGtN.sh..ht.h..............    
   consensus/90%             .........................o.h.....hst.ht..h.hK.-....s.uaKhRGhh..h........t.h.......osGNtu.uhshsst..t..uhlhhs..sst.ph.th...ut.h..h....pt.......h.t...h..l.PaDt..hhtGpsolshEl.tp....thlhl.lGGGGhhsGls.hhtth.sph.lhuhps.sut..h.uh.tst....h...t.hscuhtshhG..shthhtt......hlsptth..sh..hhpp.t.lhEsssuhuhAuh..............hs.lhuGuN.shttht.l..............    
   consensus/80%             ..................l.t.h..TPl.....Lophhts.lhlKhEshQ.shSFKlRGAhthh.tl.  ..pph.tullstSuGNHupuhuhust.lsl.uhIhhP.tsPt.Khttlp.hGuphl.h....pph...s.th.pptthhhl.PFDcP.lltGQGTluhElhpph.p.thlhlslGGGGLhuGlshhhpphhPphtllulEs.susshhtuh.tut.s.pl..ht.hA-uhsshlGp.TathhpphhcthhhVspstlssuh.tlhpc.t.lhEsuuululAuhh....t.ht...p.lshlhSGuN.shtphp.l.p............    
   consensus/70%             ..................l.phh..TPlp.s.tLSphhtsslalKtEshQ.stSFKlRGAhshhttL.  stcptstGVlssSAGNHAQulAauuppLsl.uhIshP.sTPp.KhptlpthGuphlhhstsh-phpttshthtpppshshlsPFDcPhVIAGQGTluhElhppltplctlhVslGGGGLlAGlushl+plhPph+lIuVEs.susshhtuh.tGphs.pLtplshhADGsustlGp.TaplhpphlDtllhVspstlssuhcclapc.+.lsEPuGAlulAuhht.hhphhs...pplshllSGuNhshsphp.ltE............    


PHD information about accuracy


****************************************************************************
*                                                                          *
*      Prediction of:                                                      *
*      - secondary structure,   by PHDsec                                  *
*      - solvent accessibility, by PHDacc                                  *
*                                                                          *
*      PHD: Profile fed neural network systems from HeiDelberg             *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Predict-Help@EMBL-Heidelberg.DE       *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
****************************************************************************
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           *
*      Secondary structure prediction by PHDsec:                           *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network procedure is described in detail in:                        *
*  1) Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.                                   *
*                                                                          *
*  A brief description is given in:                                        *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Improved prediction of protein secondary structure by use of se-     *
*     quence profiles and neural networks.                                 *
*     Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.                  *
*                                                                          *
*  The PHD mail server is described in:                                    *
*  2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard:                  *
*     PHD - an automatic mail server for protein secondary structure       *
*     prediction.                                                          *
*     CABIOS, 1994, 10, 53-60.                                             *
*                                                                          *
*  The latest improvement steps (up to 72%) are explained in:              *
*  3) Rost, Burkhard; Sander, Chris:                                       *
*     Combining evolutionary information and neural networks to predict    *
*     protein secondary structure.                                         *
*     Proteins, 1994,  19, 55-72.                                          *
*                                                                          *
*  To be quoted for publications of PHD output:                            *
*     Papers 1-3 for the prediction of secondary structure and the pre-    *
*     diction server.                                                      *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the input to the network                                          *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  The prediction is performed by a system of neural networks.             *
*  The input is a multiple sequence alignment. It is taken from an HSSP    *
*  file (produced by the program MaxHom:                                   *
*     Sander, Chris & Schneider, Reinhard: Database of Homology-Derived    *
*     Structures and the Structural Meaning of Sequence Alignment.         *
*     Proteins, 1991, 9, 56-68).                                           *
*                                                                          *
*  For optimal results the alignment should contain sequences with varying *
*  degrees of sequence similarity relative to the input protein.           *
*  The following is an ideal situation:                                    *
*                                                                          *
*  +-----------------+----------------------+                              *
*  |   sequence:     |  sequence identity   |                              *
*  +-----------------+----------------------+                              *
*  | target sequence |  100 %               |                              *
*  | aligned seq. 1  |   90 %               |                              *
*  | aligned seq. 2  |   80 %               |                              *
*  |      ...        |   ...                |                              *
*  | aligned seq. 7  |   30 %               |                              *
*  +-----------------+----------------------+                              *
*                                                                          *
****************************************************************************
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 250 protein chains (in total    *
*  about 55,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*  ++================++-----------------------------------------+          *
*  || Qtotal = 72.1% ||      ("overall three state accuracy")   |          *
*  ++================++-----------------------------------------+          *
*                                                                          *
*  +----------------------------+-----------------------------+            *
*  | Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |            *
*  | Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |            *
*  | Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |            *
*  +----------------------------+-----------------------------+            *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                    number of correctly predicted residues             *
*  |Qtotal =            ---------------------------------------      (*100)*
*  |                          number of all residues                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of obs) = -------------------------------------------- (*100)*
*  |                    no of all res observed to be in helix              *
*  |                                                                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of pred)= -------------------------------------------- (*100)*
*  |                    no of all residues predicted to be in helix        *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the three state       *
*  accuracy for each protein chain, and then averaging over 250 chains     *
*  yields the following average:                                           *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | Qtotal/averaged over chains = 72.2% |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          =  9.3% |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Further measures of performance                                         *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  Matthews correlation coefficient:                                       *
*                                                                          *
*  +---------------------------------------------+                         *
*  | Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |                         *
*  +---------------------------------------------+                         *
*..........................................................................*
*                                                                          *
*  Average length of predicted secondary structure segments:               *
*                                                                          *
*  .           +------------+----------+                                   *
*  .           |  predicted | observed |                                   *
*  +-----------+------------+----------+                                   *
*  | Lhelix  = |    10.3    |    9.3   |                                   *
*  | Lstrand = |     5.0    |    5.3   |                                   *
*  | Lloop   = |     7.2    |    5.9   |                                   *
*  +-----------+------------+----------+                                   *
*..........................................................................*
*                                                                          *
*  The accuracy matrix in detail:                                          *
*                                                                          *
*  +---------------------------------------+                               *
*  |    number of residues with H, E, L    |                               *
*  +---------+------+------+------+--------+                               *
*  |         |net H |net E |net L |sum obs |                               *
*  +---------+------+------+------+--------+                               *
*  | obs H   |12447 | 1255 | 3990 |  17692 |                               *
*  | obs E   |  949 | 7493 | 3750 |  12192 |                               *
*  | obs L   | 2604 | 2875 |19962 |  25441 |                               *
*  +---------+------+------+------+--------+                               *
*  | sum Net |16000 |11623 |27702 |  55325 |                               *
*  +---------+------+------+------+--------+                               *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        12447 of all residues predicted to be in helix, were observed to  *
*        be in helix, 949 however belong to observed strands, 2604 to      *
*        observed loop regions.  The term "observed" refers to the DSSP    *
*        assignment of secondary structure calculated from 3D coordinates  *
*        of experimentally determined structures (Dictionary of Secondary  *
*        Structure  of Proteins: Kabsch & Sander (1983) Biopolymers, 22,   *
*        2577-2637).                                                       *
*                                                                          *
****************************************************************************
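
As a cross-check of the accuracy matrix above, the following minimal Python sketch recomputes the overall three-state accuracy and the per-state percentages from the printed counts. The names STATES, MATRIX and three_state_summary are illustrative only and are not part of PHD. Rounded to one decimal, the results should agree with the index-0 column of the cumulative reliability table further down (e.g. Qtot = 72.1).

```python
# Counts copied from the accuracy matrix: rows are observed (DSSP) states,
# columns are states predicted by the network, both in the order H, E, L.
STATES = ("H", "E", "L")
MATRIX = [
    [12447, 1255, 3990],   # obs H
    [949, 7493, 3750],     # obs E
    [2604, 2875, 19962],   # obs L
]


def three_state_summary(matrix):
    """Return Qtot plus %obs and %prd for every state, all in percent."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    summary = {"Qtot": 100.0 * correct / total}
    for i, state in enumerate(STATES):
        observed = sum(matrix[i])                  # row sum ("sum obs")
        predicted = sum(row[i] for row in matrix)  # column sum ("sum Net")
        summary[state + "%obs"] = 100.0 * matrix[i][i] / observed
        summary[state + "%prd"] = 100.0 * matrix[i][i] / predicted
    return summary


summary = three_state_summary(MATRIX)
for key, value in summary.items():
    print(f"{key:6s} {value:5.1f}")
```

The %obs values divide the diagonal by the row sum, the %prd values divide it by the column sum; this is the same distinction as "% of observed" versus "% of predicted" used throughout the header.

****************************************************************************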
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the three secondary structure types using real     *
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit can be used to derive a "reliability index".  This index is given  *
*  for each residue along with the prediction.  The index is scaled to     *
*  have values between 0 (lowest reliability), and 9 (highest).            *
*  The accuracies (Qtot) to be expected for residues with an index at or   *
*  above a particular value are given below, together with the fraction    *
*  of such residues (%res):                                                *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |    *
*  | %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|    *
*  | E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|    *
*  | E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*                                                                          *
*  The above table gives the cumulative results, e.g. 62.5% of all         *
*  residues have a reliability of at least 5.  The overall three-state     *
*  accuracy for this subset of almost two thirds of all residues is 82.9%. *
*  For this subset, e.g., 83.1% of the observed helices are correctly      *
*  predicted, and 86.9% of all residues predicted to be in helix are       *
*  correct.                                                                *
*                                                                          *
*..........................................................................*
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |          *
*  | %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|          *
*  | E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|          *
*  | E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*                                                                          *
*  For example, for residues with Relindex = 5, 64% of all predicted       *
*  beta-strand residues are correctly identified.                          *
*                                                                          *
*                                                                          *
****************************************************************************
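
The reliability index and the relation between the cumulative and the non-cumulative table can be sketched as follows. The scaling int(10 * gap) is an assumption made for illustration: the text above only states that the index is derived from the difference between the two largest output units and runs from 0 to 9. The function names are hypothetical as well.

```python
def reliability_index(outputs):
    """Scale the gap between the two largest output units to 0..9.

    Assumption: the gap is multiplied by 10 and truncated; the exact
    scaling used by PHD is not given in this header.
    """
    ordered = sorted(outputs, reverse=True)
    gap = ordered[0] - ordered[1]
    return max(0, min(9, int(10 * gap)))


def per_index_fraction(cumulative):
    """Turn 'index >= i' percentages into 'index == i' percentages."""
    shifted = cumulative[1:] + [0.0]
    return [round(c - n, 1) for c, n in zip(cumulative, shifted)]


# %res row of the cumulative table (index 0 to 9).
CUMULATIVE_RES = [100.0, 99.2, 90.4, 80.9, 71.6, 62.5, 52.8, 42.3, 29.8, 14.1]
per_index = per_index_fraction(CUMULATIVE_RES)

print(reliability_index([0.80, 0.15, 0.05]))  # helix clearly ahead
print(per_index[1:])                          # %res for index 1 to 9
```

Differencing neighbouring entries of the cumulative %res row gives the %res row of the non-cumulative table for index 1 to 9; the remaining 0.8% of the residues carry index 0, which the second table does not list.

****************************************************************************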
*                                                                          *
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*      Solvent accessibility prediction by PHDacc:                         *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network for prediction of secondary structure is described in       *
*  detail in:                                                              *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.                                   *
*                                                                          *
*  The analysis of the prediction of solvent exposure is given in:         *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Conservation and prediction of solvent accessibility in protein      *
*     families.  Proteins, 1994, 20, 216-226.                              *
*                                                                          *
*  To be quoted for publications of PHD exposure prediction:               *
*     Both papers quoted above.                                            *
*                                                                          *
****************************************************************************
*                                                                          *
*  Definition of accessibility                                             *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~                                             *
*                                                                          *
*  For training the residue solvent accessibility the DSSP (Dictionary of  *
*  Secondary Structure of Proteins; Kabsch & Sander (1983) Biopolymers, 22,*
*  2577-2637) values of accessible surface area have been used.  The       *
*  prediction provides values for the relative solvent accessibility.  The *
*  normalisation is the following:                                         *
*                                                                          *
*  |                           ACCESSIBILITY (from DSSP, Angstrom^2)       *
*  |RELATIVE_ACCESSIBILITY =   ------------------------------------- * 100 *
*  |                               MAXIMAL_ACC (amino acid type i)         *
*                                                                          *
*  where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.*
*  The maximal values are:                                                 *
*                                                                          *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |           *
*  | 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|           *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |                *
*  | 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|                *
*  +----+----+----+----+----+----+----+----+----+----+----+                *
*                                                                          *
*  Notation: one letter code for amino acid, B stands for D or N; Z stands *
*     for E or Q; and X stands for undetermined.                           *
*                                                                          *
*  The relative solvent accessibility can be used to estimate the number   *
*  of water molecules (W) in contact with the residue:                     *
*                                                                          *
*  W = ACCESSIBILITY / 10   (ACCESSIBILITY: absolute value in Angstrom^2)  *
*                                                                          *
*  The prediction is given in 10 states for relative accessibility, with   *
*                                                                          *
*  RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)                *
*                                                                          *
*  where PREDICTED_ACC = 0 - 9.                                            *
*                                                                          *
****************************************************************************
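
The normalisation and the 10-state encoding defined above can be written out as a short Python sketch. MAXIMAL_ACC is copied from the table of maximal values; the function names are hypothetical. The inverse mapping relative_to_state assumes that state n covers the range from n*n % up to, but not including, (n+1)*(n+1) %, which is how the legend of the prediction output describes the states.

```python
import math

# Maximal accessibility per amino acid type, copied from the table above.
MAXIMAL_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(residue, accessibility):
    """Relative accessibility in percent from a DSSP accessibility value."""
    return 100.0 * accessibility / MAXIMAL_ACC[residue]


def state_to_relative(predicted_acc):
    """Relative accessibility in percent for a predicted state 0..9."""
    if not 0 <= predicted_acc <= 9:
        raise ValueError("PREDICTED_ACC must be between 0 and 9")
    return predicted_acc * predicted_acc


def relative_to_state(relative):
    """Largest state n with n*n <= relative, capped at 9 (assumed binning)."""
    return min(9, math.isqrt(int(relative)))


def water_contacts(accessibility):
    """Estimated number of water molecules, W = ACCESSIBILITY / 10."""
    return accessibility / 10.0


print(relative_accessibility("G", 42))   # glycine, half of its maximum
print(state_to_relative(6), relative_to_state(35.9))
```

Note that water_contacts expects the absolute accessibility; a relative value has to be multiplied by MAXIMAL_ACC / 100 first.

****************************************************************************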
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 238 protein chains (in total    *
*  about 62,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*                                                                          *
*  Correlation                                                             *
*  ...........                                                             *
*                                                                          *
*  The correlation between observed and predicted solvent accessibility    *
*  is:                                                                     *
*                                                                          *
*  -----------                                                             *
*  corr = 0.53                                                             *
*  -----------                                                             *
*                                                                          *
*  This value ought to be compared to the worst and best case prediction   *
*  scenarios: random prediction (corr = 0.0) and homology modelling        *
*  (corr = 0.66).  (Note: homology modelling yields a relatively accurate  *
*  prediction in 3D if, and only if, a sufficiently similar sequence       *
*  has a known 3D structure.)                                              *
*                                                                          *
*                                                                          *
*  3-state accuracy                                                        *
*  ................                                                        *
*                                                                          *
*  Often the relative accessibility is projected onto, e.g., 3 states:     *
*     b  = buried       (here defined as < 9% relative accessibility),     *
*     i  = intermediate ( 9% <= rel. acc. < 36% ),                         *
*     e  = exposed      ( rel. acc. >= 36% ).                              *
*                                                                          *
*  A projection onto 3 states or 2 states (buried/exposed) enables the     *
*  compilation of a 3- and 2-state prediction accuracy.  PHD reaches an    *
*  overall 3-state accuracy of:                                            *
*     Q3 = 57.5%                                                           *
*  (compared to 35% for random prediction and 70% for homology modelling). *
*                                                                          *
*  In detail:                                                              *
*                                                                          *
*  +-----------------------------------+-------------------------+         *
*  | Qburied       (% of observed)=77% | Qb (% of predicted)=60% |         *
*  | Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |         *
*  | Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |         *
*  +-----------------------------------+-------------------------+         *
*                                                                          *
*                                                                          *
*  10-state accuracy                                                       *
*  .................                                                       *
*                                                                          *
*  The network predicts relative solvent accessibility in 10 states, with  *
*  state i (i = 0-9) corresponding to a relative solvent accessibility of  *
*  i*i %.  The 10-state accuracy of the network is:                        *
*                                                                          *
*     Q10 = 24.5%                                                          *
*                                                                          *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                     number of correctly predicted residues            *
*  |Q3 		      = ---------------------------------------      (*100)*
*  |                           number of all residues                      *
*  |                                                                       *
*  |                     no of res. correctly predicted to be buried       *
*  |Qburied (% of obs) = ------------------------------------------- (*100)*
*  |                     no of all res. observed to be buried              *
*  |                                                                       *
*  |                                                                       *
*  |                     no of res. correctly predicted to be buried       *
*  |Qburied (% of pred)= ------------------------------------------- (*100)*
*  |                     no of all residues predicted to be buried         *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the correlation       *
*  between observed and predicted accessibility for each protein chain, and*
*  then averaging over all 238 chains yields the following average:        *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | corr/averaged over chains   = 0.53  |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          = 0.11  |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Further details of performance accuracy                                 *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                 *
*                                                                          *
*  The accuracy matrix in detail:                                          *
*  ..............................                                          *
*                                                                          *
* -------+----------------------------------------------------+----------- *
*  \ PHD |    0    1   2   3    4    5     6     7    8    9  |  SUM  %obs *
* -------+----------------------------------------------------+----------- *
* OBS  0 | 8611  140   8  44   82  169   772   334   27    0  | 10187 16.6 *
* OBS  1 | 4367  164   0  50  106  231   738   346   44    3  |  6049  9.8 *
* OBS  2 | 3194  168   1  68  125  303   951   513   42    7  |  5372  8.7 *
* OBS  3 | 2760  159   8  80  136  327  1246   746   58   19  |  5539  9.0 *
* OBS  4 | 2312  144   2  72  166  396  1615  1245  124   19  |  6095  9.9 *
* OBS  5 | 1873   96   3  84  138  425  1979  1834  187   27  |  6646 10.8 *
* OBS  6 | 1387   67   1  60   80  278  2237  2627  231   51  |  7019 11.4 *
* OBS  7 | 1082   35   0  32   56  225  1871  3107  302   60  |  6770 11.0 *
* OBS  8 |  660   25   0  27   43  136  1206  2374  325   87  |  4883  7.9 *
* OBS  9 |  325   20   2  27   29   74   648  1159  366  214  |  2864  4.7 *
* -------+----------------------------------------------------+----------- *
* SUM    |26571 1018  25 544  961 2564 13263 14285 1706  487  |            *
* %pred  | 43.3  1.7 0.0 0.9  1.6  4.2  21.6  23.3  2.8  0.8  |            *
* -------+----------------------------------------------------+----------- *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        8611 of all residues predicted to be 0% exposed were              *
*        observed with 0% relative accessibility.  However, 325 of all     *
*        residues predicted to have 0% are observed as completely exposed  *
*        (obs = 9 -> rel. acc. >= 81%).  The term "observed" refers to the *
*        DSSP compilation of area of solvent accessibility calculated from *
*        3D coordinates of experimentally determined structures (Diction-  *
*        ary of Secondary Structure  of Proteins: Kabsch & Sander (1983)   *
*        Biopolymers, 22, 2577-2637).                                      *
*                                                                          *
*                                                                          *
*  Accuracy for each amino acid:                                           *
*  .............................                                           *
*                                                                          *
*  +---+------------------------------+-----+-------+------+               *
*  |AA |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |    N |               *
*  +---+------------------------------+-----+-------+------+               *
*  | A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |               *
*  | C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |               *
*  | D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |               *
*  | E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |               *
*  | F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |               *
*  | G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |               *
*  | H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |               *
*  | I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |               *
*  | K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |               *
*  | L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |               *
*  | M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |               *
*  | N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |               *
*  | P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |               *
*  | Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |               *
*  | R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |               *
*  | S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |               *
*  | T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |               *
*  | V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |               *
*  | W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |               *
*  | Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |               *
*  +---+------------------------------+-----+-------+------+               *
*                                                                          *
*  Abbreviations:                                                          *
*                                                                          *
*  AA:   amino acid in one-letter code                                     *
*  b%o, i%o, e%o:   = Qburied, Qintermediate, Qexposed (% of observed),    *
*        i.e. percentage of correct prediction in each state, see above    *
*  b%p, i%p, e%p:   = Qburied, Qintermediate, Qexposed (% of predicted),   *
*        i.e. probability of correct prediction in each state, see above   *
*  b%o:  = Qburied (% of observed), see above                              *
*  Q10:  percentage of correctly predicted residues in each of the 10      *
*        states of predicted relative accessibility.                       *
*  corr: correlation between predicted and observed rel. acc.              *
*  N:    number of residues in data set                                    *
*                                                                          *
*                                                                          *
*  Accuracy for different secondary structure:                             *
*  ...........................................                             *
*                                                                          *
*  +--------+------------------------------+----+-------+-------+          *
*  | type   |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |     N |          *
*  +--------+------------------------------+----+-------+-------+          *
*  | helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |          *
*  | strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |          *
*  | loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |          *
*  +--------+------------------------------+----+-------+-------+          *
*                                                                          *
*  Abbreviations as before.                                                *
*                                                                          *
****************************************************************************
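
The 3-state figures quoted earlier can be related to the 10-state accuracy matrix by a projection. The sketch below assumes that state n stands for a relative accessibility of n*n % and applies the thresholds of the 3-state definition (buried < 9%, intermediate 9% to < 36%, exposed >= 36%). All names are illustrative. With the printed counts this yields Q3 = 57.5 and, after truncation to integers, the per-state percentages of the "In detail" table.

```python
# 10-state accuracy matrix: rows are observed states 0..9, columns are
# predicted states 0..9 (counts copied from the table above).
MATRIX10 = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]


def three_state(n):
    """Map a 10-state value n (relative accessibility n*n %) to b, i or e."""
    relative = n * n
    if relative < 9:
        return "b"
    if relative < 36:
        return "i"
    return "e"


def project_to_three_states(matrix10):
    """Sum the 10x10 counts into a 3x3 matrix indexed [observed][predicted]."""
    labels = ("b", "i", "e")
    matrix3 = {obs: {prd: 0 for prd in labels} for obs in labels}
    for obs, row in enumerate(matrix10):
        for prd, count in enumerate(row):
            matrix3[three_state(obs)][three_state(prd)] += count
    return matrix3


def accuracies(matrix3):
    """Q3 plus % of observed (%o) and % of predicted (%p) per state."""
    labels = list(matrix3)
    total = sum(sum(row.values()) for row in matrix3.values())
    result = {"Q3": 100.0 * sum(matrix3[s][s] for s in labels) / total}
    for s in labels:
        observed = sum(matrix3[s].values())
        predicted = sum(matrix3[o][s] for o in labels)
        result[s + "%o"] = 100.0 * matrix3[s][s] / observed
        result[s + "%p"] = 100.0 * matrix3[s][s] / predicted
    return result


acc = accuracies(project_to_three_states(MATRIX10))
print(round(acc["Q3"], 1))
```

****************************************************************************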
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the 10 states for relative accessibility using real*
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit (with the constraint that the second largest output is compiled    *
*  among all units at least 2 positions off the maximal unit) can be used  *
*  to derive a "reliability index".  This index is given for each residue  *
*  along with the prediction.  The index is scaled to have values between  *
*  0 (lowest reliability), and 9 (highest).                                *
*  The accuracies (Q3, corr, etc.) to be expected for residues with an     *
*  index at or above a particular value are given below, together with the *
*  fraction of such residues (%res):                                       *
*                                                                          *
*  +---+------------------------------+----+-------+-------+               *
*  |RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |               *
*  +---+------------------------------+----+-------+-------+               *
*  | 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |               *
*  | 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |               *
*  | 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |               *
*  | 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |               *
*  | 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |               *
*  | 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |               *
*  | 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |               *
*  | 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |               *
*  | 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |               *
*  | 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |               *
*  +---+------------------------------+----+-------+-------+               *
*                                                                          *
*  Abbreviations as before.                                                *
*                                                                          *
*  The above table gives the cumulative results, e.g. 45.8% of all         *
*  residues have a reliability of at least 4.  The correlation for this    *
*  most reliably predicted half of the residues is 0.686, i.e. a value     *
*  comparable to what could be expected if homology modelling were         *
*  possible.  For this subset of 45.8% of all residues, 89% of the buried  *
*  residues are correctly predicted, and 72% of all residues predicted to  *
*  be buried are correct.                                                  *
*                                                                          *
*..........................................................................*
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +---+------------------------------+----+-------+-------+               *
*  |RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |               *
*  +---+------------------------------+----+-------+-------+               *
*  | 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |               *
*  | 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |               *
*  | 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |               *
*  | 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |               *
*  | 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |               *
*  | 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |               *
*  | 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |               *
*  | 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |               *
*  | 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |               *
*  | 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |               *
*  +---+------------------------------+----+-------+-------+               *
*                                                                          *
*  For example, for residues with RI = 4, 83% of all predicted intermediate*
*  residues are correctly predicted as such.                               *
*                                                                          *
*                                                                          *
****************************************************************************
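
The constraint in the accessibility reliability index (the runner-up must lie at least 2 positions away from the winning unit) can be illustrated with a small Python sketch. As before, the scaling int(10 * gap) and the function name are assumptions for illustration; the header only describes the principle.

```python
def accessibility_reliability(outputs):
    """Reliability index 0..9 for a 10-state accessibility prediction.

    The runner-up is searched only among units at least 2 positions off
    the maximal unit, so that a neighbouring state with a similar output
    does not lower the index.
    """
    best = max(range(len(outputs)), key=outputs.__getitem__)
    rivals = [v for i, v in enumerate(outputs) if abs(i - best) >= 2]
    gap = outputs[best] - max(rivals)
    return max(0, min(9, int(10 * gap)))


# Hypothetical output units for states 0..9: state 5 wins, its direct
# neighbours 4 and 6 are close behind, everything else is far off.
outputs = [0.05, 0.10, 0.10, 0.20, 0.70, 0.95, 0.80, 0.30, 0.10, 0.05]
print(accessibility_reliability(outputs))
```

Without the constraint the runner-up would be state 6 (0.80) and the index would drop to 1, although both candidates describe nearly the same accessibility.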


PHD predictions



PHD predictions for predict_h8378

Different levels of data:
  1. PHD brief
  2. PHD normal

AA : amino acid sequence
PHD_sec: PHD predicted secondary structure: H=helix, E=extended (sheet), blank=other (loop)
PHD = PHD: Profile network prediction HeiDelberg
Rel_sec: reliability index for PHDsec prediction (0=low to 9=high)
Note: for the brief presentation strong predictions marked by '*'
SUB_sec: subset of the PHDsec prediction, for all residues with an expected average accuracy > 82% (tables in header)
NOTE: for this subset the following symbols are used:
L: is loop (for which above ' ' is used)
.: means that no prediction is made for this residue, as the reliability is: Rel < 5
pH_sec: 'probability' for assigning helix (1=high, 0=low)
pE_sec: 'probability' for assigning strand (1=high, 0=low)
pL_sec: 'probability' for assigning neither helix, nor strand (1=high, 0=low)
P_3_acc: PHD predicted relative solvent accessibility (acc) in 3 states: b = 0-9%, i = 9-36%, e = 36-100%.
Rel_acc: reliability index for PHDacc prediction (0=low to 9=high)
Note: in the brief presentation, strong predictions are marked by '*'
SUB_acc: subset of the PHDacc prediction, for all residues with an expected average correlation > 0.69 (tables in header)
NOTE: for this subset the following symbols are used:
I: intermediate (shown as ' ' in P_3_acc above)
.: no prediction is made for this residue, because its reliability is too low (Rel < 4)
PHD_acc: PHD predicted relative solvent accessibility (acc) in 10 states: a value of n (=0-9) corresponds to a relative acc. of between n*n % and (n+1)*(n+1) % (e.g. for n=4: 16-25%).
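
The two accessibility encodings in the legend can be sketched in code. The Python below is a minimal illustration and not part of PHD: the function names acc_range and three_state are hypothetical, and the handling of the shared boundaries (exactly 9% counted as 'i', exactly 36% counted as 'e') is an assumption, because the legend's ranges overlap at those values.

```python
def acc_range(n):
    """Relative accessibility range in percent for a 10-state PHD_acc value n."""
    if not 0 <= n <= 9:
        raise ValueError("PHD_acc state must be between 0 and 9")
    return n * n, (n + 1) * (n + 1)


def three_state(rel_acc_percent):
    """Map a relative accessibility in percent to the 3-state P_3_acc symbol.

    Assumption: boundary values go to the more exposed state.
    """
    if rel_acc_percent < 9:
        return "b"   # 0-9%
    if rel_acc_percent < 36:
        return "i"   # 9-36%
    return "e"       # 36-100%


low, high = acc_range(4)
print(low, high)

# 3-state symbol for the lower bound of each of the ten states
symbols = "".join(three_state(acc_range(n)[0]) for n in range(10))
print(symbols)
```

Under this boundary assumption, states 0-2 fall into b, states 3-5 into i, and states 6-9 into e.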




PHD results (brief)

        ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9....,....10...,....11...,....12...,....13...,....14...,....15...,....16...,....17...,....18...,....19...,....20...,....21...,....22...,....23...,....24...,....25...,....26...,....27...,....28...,....29...,....30...,....31...,....32...,....33...,....34
AA      MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV
PHD_sec HHHHHHHHHHHHHH HHHHH EEEEEE EE HHHHHHHH HHH EEEEE HHHHHHHHHHHH EEEEE HHHHHHHHHHH EEEEE HHHHHHHHHHHHH EEEE EE HHHHHHHHHH EEEEE HHHHHHHHHHHHHH EEEEE HHHHHHHHH E EE E HHHHHHHHHH EEEE HHHHHHHHHHHHH EEEE HHHHHHHHHHHH EEEEEE HHHHHHHHHHHHH E
Rel_sec ***** * *********** **** **** *** **** ****** ************ ******* * *********** ** **** *** ********* ** **** *** *********** ****** ***** ******** ******** * ************* *** **** ************ *** ** * ********* ***** * ********** * ** * ********** * ***** ***** *** ************ *** *
P_3_acc eeeebebbbeebeebeeebeebbeebbbeebeebbeebebebbbeeeeeeeb bbebbbbbbbbeeb eeeeeeeeebbbbbbbbbbbbbbbbbbeebbbebbbbbbee bebebebbee bbebbbbbee eebeeebeebeeeee ebbbbbe bbbbbbbbbbbbebbeebeebebbbbbbbbbbbbbbbbbbbeeb eebebbbb bebbbbbebbbeeeeebbebeebebbbebbbbebbeebbebbeebbeebbbbbeeebbbbbeebbeebebbbebbbbbbbbbbeeeeeeeeeeeeeebbbbbbbbbbebbebeebeeebeeeeeeebbb
Rel_acc * * * * *** ** * ** * ******* * ****** * * *** * ** * ** * ** * ***** ******** * * **** * * ** ** * * ** * ** * ** ******* ****** * * *


PHD results (normal)

        ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9....,....10...,....11...,....12...,....13...,....14...,....15...,....16...,....17...,....18...,....19...,....20...,....21...,....22...,....23...,....24...,....25...,....26...,....27...,....28...,....29...,....30...,....31...,....32...,....33...,....34
AA      MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV
PHD_sec HHHHHHHHHHHHHH HHHHH EEEEEE EE HHHHHHHH HHH EEEEE HHHHHHHHHHHH EEEEE HHHHHHHHHHH EEEEE HHHHHHHHHHHHH EEEE EE HHHHHHHHHH EEEEE HHHHHHHHHHHHHH EEEEE HHHHHHHHH E EE E HHHHHHHHHH EEEE HHHHHHHHHHHHH EEEE HHHHHHHHHHHH EEEEEE HHHHHHHHHHHHH E
Rel_sec 999854453169999987876212488764168763169549985139886563131315999985599973422686599725327899999999718813999538993787999998199189982595499999999996489655517999741103334599999983378859996484599999999999938984599832577589999998179922266421111453111325899999993495598235379999999994247375163167999999963253466788499998189834999999999985299942139
SUB_sec LLLLL..L..HHHHHHHHHHH....LLLL..HHHH..LLL.EEEE..LLLLLL......HHHHHHHLLLLL....LLLEEEE.L..HHHHHHHHHHH.LL..EEEE.LLL.HHHHHHHHH.LL.EEEE.LLL.HHHHHHHHHHH.LLLEEE.LLLLL........HHHHHHHH..LLLEEEEE.L.HHHHHHHHHHHHH.LLL.EEEE..LLLHHHHHHHHH.LLL...LL.......L......HHHHHHHHH..LLEEE..L.HHHHHHHHHH...L.EE.L..HHHHHHHHHH..L..LLLLL.EEEEE.LLL..HHHHHHHHHHHH.LLL....L
P_3_acc eeeebebbbeebeebeeebeebbeebbbeebeebbeebebebbbeeeeeeeb bbebbbbbbbbeeb eeeeeeeeebbbbbbbbbbbbbbbbbbeebbbebbbbbbee bebebebbee bbebbbbbee eebeeebeebeeeee ebbbbbe bbbbbbbbbbbbebbeebeebebbbbbbbbbbbbbbbbbbbeeb eebebbbb bebbbbbebbbeeeeebbebeebebbbebbbbebbeebbebbeebbeebbbbbeeebbbbbeebbeebebbbebbbbbbbbbbeeeeeeeeeeeeeebbbbbbbbbbebbebeebeeebeeeeeeebbb
Rel_acc 122101202001125112711150211021012533100117460001011117616057002502302202321034689757101608665771040318194721000301011712031155212110210211312203310002601001106625030582082212221028894633245687844361211115177770210201310502211111121120035123221551155226120110534601106126711512101064123146879651111011010302069896432020401402512003103211201
SUB_acc ..............b...b...b..........b.......bbb.........bb.b.bb...b.............bbbbbbb...b.bbbbbb..b...b.bbb...........b......bb........................b.......bb.b...bb..b.........bbbbb...bbbbbbbb.b......b.bbbb..........b................b......bb..bb..b......b.bb....b..bb..b......bb....bbbbbbb..............bbbbbb.....b..b..b..............
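
The SUB_sec and SUB_acc rows follow from the prediction and reliability rows by the masking rule stated in the legend: residues below the reliability cutoff become '.', and the blank state is written as a letter. A minimal Python sketch of that rule, assuming column-aligned input strings of equal length; the function name subset_row and the short fragments are hypothetical and are not taken from this report.

```python
def subset_row(pred, rel, cutoff, blank_symbol):
    """Build a SUB_* row from a prediction row and a reliability row.

    pred         -- prediction string, one character per residue (' ' allowed)
    rel          -- reliability digits 0-9, one per residue
    cutoff       -- residues with reliability below this value are masked
    blank_symbol -- letter written for the state shown as ' ' in pred
    """
    if len(pred) != len(rel):
        raise ValueError("prediction and reliability rows differ in length")
    out = []
    for state, digit in zip(pred, rel):
        if int(digit) < cutoff:
            out.append(".")            # reliability too low: no prediction
        elif state == " ":
            out.append(blank_symbol)   # blank state spelled out as a letter
        else:
            out.append(state)
    return "".join(out)


# Hypothetical fragments, using the cutoffs from the legend
sub_sec = subset_row("  HHHH EE ", "9358729461", cutoff=5, blank_symbol="L")
sub_acc = subset_row("eb  be", "714390", cutoff=4, blank_symbol="I")
print(sub_sec)
print(sub_acc)
```

The same function serves both rows; only the cutoff (5 for secondary structure, 4 for accessibility) and the letter for the blank state (L or I) differ.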



GLOBE prediction of globularity


--- 
--- GLOBE: prediction of protein globularity
--- 
--- nexp =   147    (number of predicted exposed residues)
--- nfit =   139    (number of expected exposed residues)
--- diff =     8.00 (difference nexp-nfit)
--- =====> your protein appears as compact, as a globular domain
--- 
--- 
--- GLOBE: further explanations preliminarily in:
---        http://www.columbia.edu/~rost/Papers/98globe.html
--- 
--- END of GLOBE
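
The GLOBE block reports three numbers that can be read back and cross-checked, since diff is defined as nexp - nfit. A minimal Python sketch, assuming the '--- name = value' line layout shown above; the function name parse_globe is hypothetical, and the sketch does not reproduce how GLOBE computes nfit or decides on globularity.

```python
import re

# Matches lines such as "--- nexp =   147    (number of ...)"
GLOBE_FIELD = re.compile(r"^---\s*(nexp|nfit|diff)\s*=\s*([-+]?\d+(?:\.\d+)?)")


def parse_globe(text):
    """Return the nexp, nfit and diff values found in a GLOBE block."""
    values = {}
    for line in text.splitlines():
        match = GLOBE_FIELD.match(line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


report = """\
--- nexp =   147    (number of predicted exposed residues)
--- nfit =   139    (number of expected exposed residues)
--- diff =     8.00 (difference nexp-nfit)
"""

globe = parse_globe(report)
# Consistency check: the reported difference should equal nexp - nfit
consistent = abs((globe["nexp"] - globe["nfit"]) - globe["diff"]) < 1e-6
print(globe, consistent)
```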


END of results for file predict_h8378





Quotes for methods

  1. PredictProtein: B Rost (1996) Methods in Enzymology, 266:525-539
  2. PROSITE: A Bairoch, P Bucher & K Hofmann (1997) Nucleic Acids Research, 25:217-221
  3. SEG: J C Wootton & S Federhen (1996) Methods in Enzymology, 266:554-571
  4. ProDom: ELL Sonnhammer & D Kahn (1994) Protein Science, 3:482-492
  5. MaxHom: C Sander & R Schneider (1991) Proteins, 9:56-68
  6. MView: N P Brown, C Leroy & C Sander (1998) Bioinformatics, 14:380-381
  7. PHD: B Rost (1996) Methods in Enzymology, 266:525-539
  8. PHDsec: B Rost & C Sander (1993) J. of Molecular Biology, 232:584-599
  9. PHDacc: B Rost & C Sander (1994) Proteins, 20:216-226
  10. GLOBE: B Rost (1998) unpublished



