Results from PredictProtein for predict_h8378

TOC for file /home/phd/server/work/predict_h8378

The following information has been received by the server
PROSITE motif search (A Bairoch; P Bucher and K Hofmann)
SEG low-complexity regions (J C Wootton & S Federhen)
ProDom domain search (E Sonnhammer; Corpet, Gouzy, D Kahn)
MAXHOM alignment header
MAXHOM alignment
PHD information about accuracy
PHD predictions
GLOBE prediction of globularity

END of TOC

BEG of results for file /home/phd/server/work/predict_h8378

The following information has been received by the server

reference predict_h8378 (Jun 19, 2000 20:55:42)
reference pred_h8378 (Jun 19, 2000 20:56:08)
PPhdr from: kapilm@cs.brandeis.edu
PPhdr resp: MAIL
PPhdr orig: HTML
PPhdr want: HTML
PPhdr password(###)
prediction of: - default
prediction of: - PHDsec PHDacc PHDhtm ProSite SEG ProDom
return msf format
ret html
# default: single protein sequence description=Serine Racemase
MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV

PROSITE motif search (A Bairoch; P Bucher and K Hofmann)

-------------------------------------------------------------
Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern:    [ST].[RK]
   54       SFK
   139      TQR
   196      TIK
   203      SVK

Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern:    [ST].{2}[DE]
   8        SFAD
   71       TPEE
   212      SNAD
   235      TIAD
   261      TVTE

Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern:    G[^EDRKHPFYW].{2}[STAGCN][^P]
   59       GALNAI
   88       GQALTY
   187      GGMVAG
   239      GVKSSI
   287      GVALAA

Pattern-ID: DEHYDRATASE_SER_THR PS00165 PDOC00149
Pattern-DE: Serine/threonine dehydratases pyridoxal-phosphate attachment site
Pattern:    [DESH].{4,5}[STVG].[AS][FYI]K[DLIFSA][RVMF][GA][LIVMGA]
   47       ELFQKTGSFKIRGA
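
The patterns in this section are written in a regular-expression style, so a hit list like the one above can be spot-checked with a few lines of code. The following is a minimal sketch, not the PROSITE scanner itself. It assumes that Python's standard re module accepts the patterns unchanged. The names FRAGMENT and scan are hypothetical, and FRAGMENT holds only the first 60 residues of the query as printed in this report.

```python
import re

# First 60 residues of the query sequence, copied from this report.
FRAGMENT = "MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGA"


def scan(pattern, sequence):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    # Wrapping the pattern in a lookahead lets successive hits overlap.
    finder = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in finder.finditer(sequence)]


# Patterns exactly as printed in the PROSITE section of the report.
pkc_hits = scan(r"[ST].[RK]", FRAGMENT)
ck2_hits = scan(r"[ST].{2}[DE]", FRAGMENT)
dehydratase_hits = scan(
    r"[DESH].{4,5}[STVG].[AS][FYI]K[DLIFSA][RVMF][GA][LIVMGA]", FRAGMENT
)

for name, hits in (
    ("PKC_PHOSPHO_SITE", pkc_hits),
    ("CK2_PHOSPHO_SITE", ck2_hits),
    ("DEHYDRATASE_SER_THR", dehydratase_hits),
):
    for start, text in hits:
        print(name, start, text)
```

On this fragment the scan reports SFK at position 54, SFAD at position 8 and ELFQKTGSFKIRGA at position 47, which agrees with the corresponding entries listed above.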

SEG low-complexity regions (J C Wootton & S Federhen)

>prot (#) ppOld, default: single protein sequence description=serine racemase /home/phd/server/work/predict_h8378
MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGA
LNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQA
YGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDAL
xxxxxxxxxxxxIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGV
KSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQT
VSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV

ProDom domain search (E Sonnhammer; Corpet, Gouzy, D Kahn)

Identities computed with respect to: (query) prot
Colored by: consensus/70% and property

HSP processing: ranked

                                                                           52 [       .         .         .         .         1         .         .         .         .         :         .         .         .         .         2         .         .         .         .         :         .         .         .         . ] 292
  prot           (#) ppOld, default: single ... score      P(N)  N 100.0%     TGSFKIRGALNAIXXXXXXXPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXVDALVVPVGGGGMVAGIAITIKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAA    
1 PD000323       p99.2 (175) TRPB(29) CYSK(1...    56     0.019  4  35.2%     SGSYKDRGAYSMI-------PGKKSVIVESTSGNTGAVALAMVAARLGLKCVIVMPES-------------------------------------------------------------------VDVIVASVGTGGTIAGVARYLK-----------------------------------------------------------EAVSVSDEEALEAGLLLGESEGIVPEPASAAAIAA    
  consensus/100%                                                              oGSaK.RGAhshI       PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo                                                                   VDslVssVGsGGhlAGlAhhlK                                                           -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA    
  consensus/90%                                                               oGSaK.RGAhshI       PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo                                                                   VDslVssVGsGGhlAGlAhhlK                                                           -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA    
  consensus/80%                                                               oGSaK.RGAhshI       PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo                                                                   VDslVssVGsGGhlAGlAhhlK                                                           -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA    
  consensus/70%                                                               oGSaK.RGAhshI       PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo                                                                   VDslVssVGsGGhlAGlAhhlK                                                           -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA

--- ------------------------------------------------------------
--- 
--- Again: these results were obtained based on the domain data-
--- base collected by Daniel Kahn and his coworkers in Toulouse.
--- 
--- PLEASE quote: 
---       F Corpet, J Gouzy, D Kahn (1998).  The ProDom database
---       of protein domain families. Nucleic Ac Res 26:323-326.
--- 
--- The general WWW page is on:
----      ---------------------------------------
---       http://www.toulouse.inra.fr/prodom.html
----      ---------------------------------------
--- 
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, use the links below (each line is ONE single
--- link for your protein):
--- 
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD000323 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD000323
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD000323 ==> graphical output of all proteins having domain PD000323
--- 
--- NOTE: if you want to use the link, make sure the entire line
---       is pasted as URL into your browser!
--- 
--- END of PRODOM
--- ------------------------------------------------------------

MAXHOM alignment header

--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
--- 
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID           : identifier of aligned (homologous) protein
--- STRID        : PDB identifier (only for known structures)
--- IDE          : percentage of pairwise sequence identity
--- WSIM         : percentage of weighted similarity
--- LALI         : number of residues aligned
--- NGAP         : number of insertions and deletions (indels)
--- LGAP         : number of residues in all indels
--- LEN2         : length of aligned sequence
--- ACCNUM       : SwissProt accession number
--- NAME         : one-line description of aligned protein
--- 
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID  IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME                     
ykv8_yeast         40   52  319    4    8  326 P36007 HYPOTHETICAL 34.9 KD PROT
thd2_ecoli         35   47  320    4    6  329 P05792 THREONINE DEHYDRATASE CAT
y4tj_rhisn         32   44  328    4   10  332 P55664 PUTATIVE THREONINE DEHYDR
thdh_yeast         31   42  319    6   13  576 P00927 THREONINE DEHYDRATASE PRE
thdh_arxad         31   43  320    5    9  550 O42615 THREONINE DEHYDRATASE PRE
thd1_haein         31   44  327    5   10  513 P46493 DEAMINASE).              
thd1_salty         31   43  334    6   16  514 P20506 DEAMINASE).              
thd1_lyces         30   41  323    5   11  595 P25306 DEAMINASE).              
thd1_ecoli 1TDJ    30   42  334    6   16  514 P04968 DEAMINASE).              
thd1_burce         30   41  316    6   10  507 P53607 DEAMINASE).              
thd1_myctu         30   37  318    5   13  429 Q10766 DEAMINASE).              
thd1_bacsu         28   39  320    6   12  422 P37946 DEAMINASE).              
thd1_lacla         28   39  312    6   15  441 Q02145 DEAMINASE).              
thd1_soltu         28   41  185    3    3  359 P31212 (FRAGMENT).              
sdhl_rat           28   28  298    9   59  362 P09367 DEHYDRATASE (EC 4.2.1.16)
sdhl_human         28   30  297    7   16  328 P20132 L-SERINE DEHYDRATASE (EC 
thd1_corgl         27   33  313    7   18  436 Q04513 DEAMINASE).              
--- 
--- MAXHOM ALIGNMENT: IN MSF FORMAT

--- ------------------------------------------------------------
--- 3D homologue: the known structure that appeared to have sig-
--- 3D homologue: nificant sequence identity to your protein is:
--- 3D homologue: 1TDJ
--- 3D homologue: Note: we do  NOT  check whether the similarity
--- 3D homologue:       is in the region for which structure has
--- 3D homologue:       been determined.  Thus, please verify!  
--- ------------------------------------------------------------

--- 
--- Version of database searched for alignment:
--- SWISS-PROT release 38.0 (7/99) with 80000 proteins
---

MAXHOM alignment

Identities computed with respect to: (1) predict_h8370
Colored by: consensus/70% and property


                           1 [        .         .         .         .         :         .         .         .         .         1         .         .         .         .         :         .         .         .         .         2         .         .         .         .         :         .         .         .         .         3         .         .         .        ] 339
 1 predict_h8370  100.0%     MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV    
 2 ykv8_yeast      39.6%     -------TYGDVLDASNRIKEYVNKTPVLTSRMLNDRLGAQIYFKGENFQRVGAFKFRGAMNAVSKL---SDEKRSKGVIAFSSGNHAQAIALSAKLLNVPATIVMPEDAPALKVAATAGYGAHIIRYNRYTEDREQIGRQLAAEHGFALIPPYDHPDVIAGQGTSAKELLEEVGQLDALFVPLGGGGLLSGSALAARSLSPGCKIFGVEPEAGNDGQQSFRSGSIV-HINTPKTIADGAQthLGEYTFAIIRENVDDILTVSDQELVKCMHFLAERMKVVVEPTACLGFAGALLKKEELVG---KKVGIILSGGNVDMKRYATLISGKEDGP------    
 3 thd2_ecoli      33.8%     ITYDLPVAIDDIIEAKQRLAGRIYKTGMPRSNYFSERCKGEIFLKFENMQRTGSFKIRGAFNKLSSL---TDAEKRKGVVACSAGNHAQGVSLSCAMLGIDGKVVMPKGAPKSKVAATCDYSAEVVLHGDNFNDTIAKVSEIVEMEGRIFIPPYDDPKVIAGQGTIGLEIMEDLYDVDNVIVPIGGGGLIAGIAVAIKSINPTIRVIGVQSENVHGMAASFHSGEITTHRT-TGTLADGCdsRPGNLTYEIVRELVDDIVLVSEDEIRNSMIALIQRNKVVTEGAGALACAALLSGKLDQYIQNRKTV-SIISGGNIDLSRVSQI--------------    
 4 y4tj_rhisn      31.0%     MNELSNLSLESIERARERIEEHVFRTPLTTSRSLTELTGTQVSLKLEHYQRTGSFKLRGATNAILQL---SPSDRARGVIAASTGNHGRALSYAAKAVGSRATICMSDLVPENKVSEIRKLGATVRIVGSSQDDAQVEVERLVAEEGLSMIPPFDHPHIIAGQRTVGLEIVEAMPDVAMVLLPLSGGGLAAGVAAAVKALRPHARIIGVTMDRGAAMKASIEAGHPV-QVKEYRSLADSLGGGIGmwTFQMCRALLDDVVLVNEGEIAAGIRHAYEHERQILEGAGAVGIAALLSG---KVAARGGSVGVVLSGQNIDMGLHREVINGVVRATEE----    
 5 thdh_yeast      30.6%     ---------------RSSVYDVINESPISQGVGLSSRLNTNVILKREDLLPVFSFKLRGAYNMIAKL---DDSQRNQGVIACSAGNHAQGVAFAAKHLKIPATIVMPVCTPSIKYQNVSRLGSQVVLYGNDFDEAKAECAKLAEERGLTNIPPFDHPYVIAGQGTVAMEILRQVrkIGAVFVPVGGGGLIAGIGAYLKRVAPHIKIIGVETYDAATLHNSLQRNQRTP-LPVVGTFADGTSvmIGEETFRVAQQVVDEVVLVNTDEICAAVKDIFEDTRSIVEPSGALSVAGMKK-YISTVHPEinTYVPILSGANMNFDRLRFVSERAVLGEGKEVFM    
 6 thdh_arxad      30.6%     ---------------TSKVYDVCNETPVTPAVNLSSKLGANIFLKREDLQPVFSFKLRGAYNMMAHLP---QETRWKGVIACSAGNHAQGVAYSAKHLNIPATIVMPVVTPAIKYKNVDRLGAKVVLHGNDFDAAKAECNRLSEKHGLTNIPLFDNPYVIAGQGTIGVELLRQIdsLKAIFVCIGGGGLIAGVGAYIKRIAPQVKIIGVETYDANAMRQSLQKGERI-TLSEVGLFADGAAviLGEETFRLCQQVVDEIVLVSTDEICAAIKDVFTETRSIVEPAGALSVAGLVkeSHPEIDHSASGYTAILSGANMDFDRLRFVSERAKLGEGSEVFI    
 7 thd1_haein      30.1%     -------SQSDYINAIVKLGSRVyvTPLQKMGKLSERLHNNIWIKREDRQPVNSFKLRGAYAMISSL---SAEQKAAGVIAASAGNHAQGVALSAKQLGLKALIVMPQNTPSIKVDAVRGFGGEVLLHGANFDEAKAKAIELSKEKNMTFIPPFDHPLVIAGQGTLAMEMLQQVADLDYVFVQVGGGGLAAGVAILLKQFMPEIKIIGVESKDSACLKAALDKGEPT-DLTHIGLFADGVAvrIGDETFRLCQQYLDDMVLVDSDEVCAAMKDLFENVRAVAEPSGALGLAGLKKYVKQNHI-EGKNMAAILSGANLNFHTLRYVSERCEIGENREALL    
 8 thd1_salty      29.8%     MAESQPLSVAPEGAEYlpVYEAAQVTPLQKMEKLSSRLDNVILVKREDRQPVHSFKLRGAYAMMTGL---TEEQKAHGVITASAGNHAQGVAFSSARLGVKSLIVMPKATADIKVDAVRGLGGEVLLHGANFDEAKAKAIELAQQQGFTWVPPFDHPMVIAGQGTLALELLQQDSHLDRVFVPVGGGGLAAGVAVLIKQLMPQIKVIAVEAEDSA-CLKAALEAGHPVDLPRVGLFAEGVAvrIGDETFRLCQEYLDDIITVDSDAICAAMKDLFEDVRAVAEPSGALALAGMKKYIAQH-NIRGERLAHVLSGANVNFHGLRYVSEreQREGLLTVTI    
 9 thd1_lyces      29.6%     -----------VDILASPVYDVAIESPLELAEKLSDRLGVNFYIKREDKQRVFSFKLRGAYNMMSNL---SREELDKGVITASAGNHAQGVALAGQRLNCVAKIVMPTTTPQIKIDAVRALGGDVVLYGKTFDEAQTHALELSEKDGLKYIPPFDDPGVIKGQGTIGTEINRQLKDIHAVFIPVGGGGLIAGVATFFKQIAPNTKIIGVEPYGAASMTLSLHEGHRV-KLSNVDTFADGVAvlVGEYTFAKCQELIDGMVLVANDGISAAIKDVYDEGRNILETSGAVAIAGAAA-YCEFYKIKNENIVAIASGANMDFSKLHKVTELakEALLATFMV    
10 thd1_ecoli      29.2%     MADSQPLSGAPEGAEYlpVYEAAQVTPLQKMEKLSSRLDNVILVKREDRQPVHSFKLRGAYAMMAGL---TEEQKAHGVITASAGNHAQGVAFSSARLGVKALIVMPTATADIKVDAVRGFGGEVLLHGANFDEAKAKAIELSQQQGFTWVPPFDHPMVIAGQGTLALELLQQDAHLDRVFVPVGGGGLAAGVAVLIKQLMPQIKVIAVEAEDSA-CLKAALDAGHPVDLPRVGLFAEGVAvrIGDETFRLCQEYLDDIITVDSDAICAAMKDLFEDVRAVAEPSGALALAGMKK-YIALHNIRGERLAHILSGANVNFHGLRYVSEreQREALLAVTI    
11 thd1_burce      29.3%     ---------------TARVYDVAFETELEPARNLSARLRNPVYLKREDNQPVFSFKLRGAYNKMAHIP---ADALARGVITASAGNHAQGVAFSAARMGVKAVIVVPVTTPQVKVDAVRAHGGPGVEVIQAGESYSDaaLKVQEERGLTFVHPFDDPYVIAGQGTIAMEILRQHqpIHAIFVPIGGGGLAAGVAAYVKAVRPEIKVIGVQAEDSCAMAQSLQAGKRV-ELAEVGLFADGTAvlVGEETFRLCKEYLDGVVTVDTDALCAAIKDVFQDTRSVLEPSGALAVAGAKL-YAEREGIENQTLVAVTSGANMNFDRMRFVAERAEVGEARE---    
12 thd1_myctu      28.6%     --PLFSLSGADIDRAAKRIAPVVTPTPLQPSDRLSAITGATVYLKREDLQTVRSYKLRGAYNLLVQL---SDEELAAGVVCSSAGNHAQGFAYACRCLGVHGRVYVPAKTPKQKRDRIRYHGGEFIDLIVGGSTYDLAAAAALEDVErtLVPPFDDLRTIAGQGTIAVEVLGQLeePDLVVVPVGGGGCIAGITTYLAERTTNTAVLGVEPAGAAAMMAALAAGEPVTLDHVDQFVDGAAVNRAGTLTYAALAAAGDMVstVDEGAVCTAMLDLYQNEGIIAEPAGALSVAGLLEADIEPGST----VVCLISGGNNDVSRYGEVLE------------    
13 thd1_bacsu      28.4%     LKENSLIQVKHILKAHQNVKDVVIHTPLQRNDRLSERYECNIYLKREDLQVVRSFKLRGAYHKMKQL---SSEQTENGVVCASAGNHAQGVAFSCKHLGIHGKIFMPSTTPRQKVSQVELFGKgiILTGDTFDDVYKSAAECCEAESRTFIHPFDDPDVMAGQGTLAVEILNDIdePHFLFASVGGGGLLSGVGTYLKNVSPDTKVIAVEPAGAASYFESNKAGHVV-TLDKIDKFVDGAAvkIGEETFRTLETVVDDILLVPEGKVCTSILELYNECAVVAEPAGALSVAALDLYKDQIKG---KNVVCVVSGGNNDIGRMQEMKE------------    
14 thd1_lacla      28.2%     --------LSNKYQANIYLKEVVTKTPLQLDPYLSNKYQANIYLKEENLQKVRSFKLRGAYYSISKL---SDEQRSKGVVCASAGNHAQGVAFAANQLNISATIFMPVTTPNQKISQVKFFGESHVtiGDTFDESARAAKAFSQDNDKPFIDPFDDENVIAGQGTVALEIFAQAksLDKIFVQIGGGGLIAGITAYSKERYPQTEIIGVEAKGATSMKAAYSAGQPV-TLEHIDKFADGIAvtVGQKTYQLINDKVKQLLAVDEGLISQTILELYSKLGIVAEPAGATSVAALELIKDEIKG---KNIVCIISGGNNDISRMQEIEE------------    
15 thd1_soltu      28.9%     --------------------------------------------------------------------------------------------------------------------------------------------------------PFDAPGVIKGQGTIGTEINRQLKDIHAVFVPVGGGGLISGVAAYFTQVAPHTKIIGVEPYGAASMTLSLYEGHRV-KLENVDTFADGVAvlVGEYTFAKCQELIDGMVLVRNDGISAAIKDVYDEGRNILETSGAVAIAGAAA-YCEFYNIKNENIVAIASGANMDFSKLHKVTELAELGSDNEALL    
16 sdhl_rat        27.1%     -------------------QESLhkTPLRDSMALSKVAGTSVFLKMDSSQPSGSFKIRGIGHLCkaLLPDTPSPL-------TAGNAGMATAYAARRLGLPATIVVPSTTPALTIERLKNEGATVEVVGEMLDEAIQLAKALEKNNPgvYISPFDDPLIWEGHTSLVKELKETLskPGAIVLSVGGGGLLCGVVQGLREvwEDVPIIAMETFGAHS-FHAAVKEGKLVTLPKITSVAKALgnTVGAQTLKLFYEHPIFSEVISDQEAVTAIEKFVDDEKILVEPACGAALAAVYSGvgRLQTPLASLVVIVCGGSNISLAQLQAL--------------    
17 sdhl_human      27.2%     -------------------------TPIRDSMALSKMAGTSVYLKMDSAQPSGSFKIRGIGHFCKRWA----KQGCAHFVCSSAGNAGMAAAYAARQLGVPATIVVPGTTPALTIERLKNEGAtkVVGELLDEAFELAKALAKNNPGWVYIPPFDDPLIWEGHASIVKELKETLwkPGAIALSVGGGGLLCGVVQGLQegWGDVPVIAMETFGAHSFHAATTAGKLV-SLPKITSVAKALGvtVGSQALKLFQEHPIFSEVISDQEAVAAIEKFVDDEKILVEPAWGAALAAVYSHVIQKLQLepSLVVIVCGGSNISLAQLRALKE------------    
18 thd1_corgl      26.5%     ------IRAADIQTAQARISSVIAPTPLQYCPRLSEETGAEIYLKREDLQDVRSYKIRGALNSG---AQSPQEQRDAGIVAASAGNHAQGVAYVCKSLGVQGRIYVPVQTPKQKRDRIMVHGGEFVSLVVTGNNFDEASAAAHEDAErtLIEPFDARNTVIGQGTVAAEILSQLtsADHVMVPVGGGGLLAGVVSYMADMAPRTAIVGIEPAGAAS-MQAALHNGGPITLETVDPFVDGAEvrVGDLNYTIVEKNQGRVHMMSATEGAVCTEmlYQNEGIIAEPAGALSIAGLKEMSFAPGSV----VVCIISGGNNDVLRYAEIAE------------    
   consensus/100%            ..........................................................................................................................................................pt..hh.Gpto.shEh.tt....t.lhh.luGGGhhsG.s.h.tth..th.lhuhp..tst....u..ttt........t.hstuh.s..G..sh.hh.t......hh.tt.h..sh..hhtp.t.lhEsshshuhAuh..............h..lhuGtN.sh..ht.h..............    
   consensus/90%             .........................o.h.....hst.ht..h.hK.-....s.uaKhRGhh..h........t.h.......osGNtu.uhshsst..t..uhlhhs..sst.ph.th...ut.h..h....pt.......h.t...h..l.PaDt..hhtGpsolshEl.tp....thlhl.lGGGGhhsGls.hhtth.sph.lhuhps.sut..h.uh.tst....h...t.hscuhtshhG..shthhtt......hlsptth..sh..hhpp.t.lhEsssuhuhAuh..............hs.lhuGuN.shttht.l..............    
   consensus/80%             ..................l.t.h..TPl.....Lophhts.lhlKhEshQ.shSFKlRGAhthh.tl.  ..pph.tullstSuGNHupuhuhust.lsl.uhIhhP.tsPt.Khttlp.hGuphl.h....pph...s.th.pptthhhl.PFDcP.lltGQGTluhElhpph.p.thlhlslGGGGLhuGlshhhpphhPphtllulEs.susshhtuh.tut.s.pl..ht.hA-uhsshlGp.TathhpphhcthhhVspstlssuh.tlhpc.t.lhEsuuululAuhh....t.ht...p.lshlhSGuN.shtphp.l.p............    
   consensus/70%             ..................l.phh..TPlp.s.tLSphhtsslalKtEshQ.stSFKlRGAhshhttL.  stcptstGVlssSAGNHAQulAauuppLsl.uhIshP.sTPp.KhptlpthGuphlhhstsh-phpttshthtpppshshlsPFDcPhVIAGQGTluhElhppltplctlhVslGGGGLlAGlushl+plhPph+lIuVEs.susshhtuh.tGphs.pLtplshhADGsustlGp.TaplhpphlDtllhVspstlssuhcclapc.+.lsEPuGAlulAuhht.hhphhs...pplshllSGuNhshsphp.ltE............

PHD information about accuracy

****************************************************************************
*                                                                          *
*      Prediction of:                                                      *
*       - secondary structure,          by PHDsec                          *
*       - solvent accessibility,        by PHDacc                          *
*                                                                          *
*      PHD: Profile fed neural network systems from HeiDelberg             *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Predict-Help@EMBL-Heidelberg.DE       *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
****************************************************************************
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           *
*      Secondary structure prediction by PHDsec:                           *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network procedure is described in detail in:                        *
*  1) Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.                                   *
*                                                                          *
*  A brief description is given in:                                        *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Improved prediction of protein secondary structure by use of se-     *
*     quence profiles and neural networks.                                 *
*     Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.                  *
*                                                                          *
*  The PHD mail server is described in:                                    *
*  2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard:                  *
*     PHD - an automatic mail server for protein secondary structure       *
*     prediction.                                                          *
*     CABIOS, 1994, 10, 53-60.                                             *
*                                                                          *
*  The latest improvement steps (up to 72%) are explained in:              *
*  3) Rost, Burkhard; Sander, Chris:                                       *
*     Combining evolutionary information and neural networks to predict    *
*     protein secondary structure.                                         *
*     Proteins, 1994,  19, 55-72.                                          *
*                                                                          *
*  To be quoted for publications of PHD output:                            *
*     Papers 1-3 for the prediction of secondary structure and the pre-    *
*     diction server.                                                      *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the input to the network                                          *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  The prediction is performed by a system of neural networks.             *
*  The input is a multiple sequence alignment. It is taken from an HSSP    *
*  file (produced by the program MaxHom:                                   *
*     Sander, Chris & Schneider, Reinhard: Database of Homology-Derived    *
*     Structures and the Structural Meaning of Sequence Alignment.         *
*     Proteins, 1991, 9, 56-68).                                           *
*                                                                          *
*  For optimal results the alignment should contain sequences with varying *
*  degrees of sequence similarity relative to the input protein.           *
*  The following is an ideal situation:                                    *
*                                                                          *
*  +-----------------+----------------------+                              *
*  |   sequence:     |  sequence identity   |                              *
*  +-----------------+----------------------+                              *
*  | target sequence |  100 %               |                              *
*  | aligned seq. 1  |   90 %               |                              *
*  | aligned seq. 2  |   80 %               |                              *
*  |      ...        |   ...                |                              *
*  | aligned seq. 7  |   30 %               |                              *
*  +-----------------+----------------------+                              *
*                                                                          *
****************************************************************************
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 250 protein chains (in total    *
*  about 55,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*  ++================++-----------------------------------------+          *
*  || Qtotal = 72.1% ||      ("overall three state accuracy")   |          *
*  ++================++-----------------------------------------+          *
*                                                                          *
*  +----------------------------+-----------------------------+            *
*  | Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |            *
*  | Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |            *
*  | Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |            *
*  +----------------------------+-----------------------------+            *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                    number of correctly predicted residues             *
*  |Qtotal =            ---------------------------------------      (*100)*
*  |                          number of all residues                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of obs) = -------------------------------------------- (*100)*
*  |                    no of all res observed to be in helix              *
*  |                                                                       *
*  |                                                                       *
*  |                    no of res correctly predicted to be in helix       *
*  |Qhelix (% of pred)= -------------------------------------------- (*100)*
*  |                    no of all residues predicted to be in helix        *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the three state       *
*  accuracy for each protein chain, and then averaging over 250 chains     *
*  yields the following average:                                           *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | Qtotal/averaged over chains = 72.2% |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          =  9.3% |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Further measures of performance                                         *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         *
*                                                                          *
*  Matthews correlation coefficient:                                       *
*                                                                          *
*  +---------------------------------------------+                         *
*  | Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |                         *
*  +---------------------------------------------+                         *
*..........................................................................*
*                                                                          *
*  Average length of predicted secondary structure segments:               *
*                                                                          *
*  .           +------------+----------+                                   *
*  .           |  predicted | observed |                                   *
*  +-----------+------------+----------+                                   *
*  | Lhelix  = |    10.3    |    9.3   |                                   *
*  | Lstrand = |     5.0    |    5.3   |                                   *
*  | Lloop   = |     7.2    |    5.9   |                                   *
*  +-----------+------------+----------+                                   *
*..........................................................................*
*                                                                          *
*  The accuracy matrix in detail:                                          *
*                                                                          *
*  +---------------------------------------+                               *
*  |    number of residues with H, E, L    |                               *
*  +---------+------+------+------+--------+                               *
*  |         |net H |net E |net L |sum obs |                               *
*  +---------+------+------+------+--------+                               *
*  | obs H   |12447 | 1255 | 3990 |  17692 |                               *
*  | obs E   |  949 | 7493 | 3750 |  12192 |                               *
*  | obs L   | 2604 | 2875 |19962 |  25441 |                               *
*  +---------+------+------+------+--------+                               *
*  | sum Net |16000 |11623 |27702 |  55325 |                               *
*  +---------+------+------+------+--------+                               *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        Of all residues predicted to be in helix, 12447 were observed to  *
*        be in helix, whereas 949 belong to observed strands and 2604 to   *
*        observed loop regions.  The term "observed" refers to the DSSP    *
*        assignment of secondary structure calculated from 3D coordinates  *
*        of experimentally determined structures (Dictionary of Secondary  *
*        Structure  of Proteins: Kabsch & Sander (1983) Biopolymers, 22,   *
*        2577-2637).                                                       *
*                                                                          *
****************************************************************************
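
The overall accuracy and the per-class percentages follow directly from
the accuracy matrix above.  The Python sketch below is an illustration
only and is not part of PHD; the names MATRIX, q_total, pct_of_observed
and pct_of_predicted are assumptions made for this example.  Rows are the
observed (DSSP) states, columns are the network predictions.

```python
# Accuracy matrix copied from the table above.
# Rows: observed H, E, L; columns: predicted (net) H, E, L.
STATES = ("H", "E", "L")
MATRIX = [
    [12447, 1255, 3990],    # obs H
    [949, 7493, 3750],      # obs E
    [2604, 2875, 19962],    # obs L
]


def q_total(matrix):
    """Percentage of all residues that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def pct_of_observed(matrix, i):
    """Correct predictions for state i as a percentage of row i (observed)."""
    return 100.0 * matrix[i][i] / sum(matrix[i])


def pct_of_predicted(matrix, i):
    """Correct predictions for state i as a percentage of column i (predicted)."""
    return 100.0 * matrix[i][i] / sum(row[i] for row in matrix)


if __name__ == "__main__":
    print("Qtot = %.1f" % q_total(MATRIX))
    for i, state in enumerate(STATES):
        print(state, "%.1f" % pct_of_observed(MATRIX, i),
              "%.1f" % pct_of_predicted(MATRIX, i))
```

For helix, for example, 12447 / 17692 is the percentage of observed
helix residues predicted correctly, and 12447 / 16000 is the percentage
of helix predictions that are correct.

****************************************************************************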
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the three secondary structure types using real     *
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit can be used to derive a "reliability index".  This index is given  *
*  for each residue along with the prediction.  The index is scaled to     *
*  have values between 0 (lowest reliability), and 9 (highest).            *
*  The accuracies (Qtot) to be expected for residues with values above a   *
*  particular value of the index are given below as well as the fraction   *
*  of such residues (%res):                                                *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |    *
*  | %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*  | H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|    *
*  | E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|    *
*  |      |     |     |     |     |     |     |     |     |     |     |    *
*  | H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|    *
*  | E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|    *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+    *
*                                                                          *
*  The above table gives the cumulative results, e.g. 62.5% of all         *
*  residues have a reliability of at least 5.  The overall three-state     *
*  accuracy for this subset of almost two thirds of all residues is 82.9%. *
*  For this subset, e.g., 83.1% of the observed helices are correctly      *
*  predicted, and 86.9% of all residues predicted to be in helix are       *
*  correct.                                                                *
*                                                                          *
*..........................................................................*
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |          *
*  | %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*  | H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|          *
*  | E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|          *
*  |      |     |     |     |     |     |     |     |     |     |          *
*  | H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|          *
*  | E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|          *
*  +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+          *
*                                                                          *
*  For example, for residues with Relindex = 5, 64% of all predicted       *
*  beta-strand residues are correctly identified.                          *
*                                                                          *
*                                                                          *
****************************************************************************
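
The cumulative table can be derived from the per-index table above: the
%res values add up, and Qtot for "index >= t" is the residue-weighted
mean of the per-index Qtot values.  The Python sketch below assumes that
reading; the names are hypothetical.  Index 0 is left out because the
per-index table starts at 1.  Since the inputs are rounded to one
decimal, the results agree with the cumulative table only to within
about 0.1.

```python
# Values copied from the non-cumulative table (reliability index 1..9).
PCT_RES = {1: 8.8, 2: 9.5, 3: 9.3, 4: 9.1, 5: 9.7,
           6: 10.5, 7: 12.5, 8: 15.7, 9: 14.1}
QTOT = {1: 46.6, 2: 50.6, 3: 57.7, 4: 62.6, 5: 67.9,
        6: 74.2, 7: 82.2, 8: 88.3, 9: 94.2}


def cumulative(threshold):
    """Return (%res, Qtot) for all residues with index >= threshold."""
    indices = [i for i in PCT_RES if i >= threshold]
    res = sum(PCT_RES[i] for i in indices)
    # Weight each per-index accuracy by the fraction of residues it covers.
    qtot = sum(PCT_RES[i] * QTOT[i] for i in indices) / res
    return res, qtot


if __name__ == "__main__":
    for t in range(1, 10):
        res, qtot = cumulative(t)
        print(t, "%.1f" % res, "%.1f" % qtot)
```

****************************************************************************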
*                                                                          *
*                                                                          *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*      Solvent accessibility prediction by PHDacc:                         *
*      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                         *
*                                                                          *
*      Author:             Burkhard Rost                                   *
*                          EMBL, Heidelberg, FRG                           *
*                          Meyerhofstrasse 1, 69 117 Heidelberg            *
*                          Internet: Rost@EMBL-Heidelberg.DE               *
*                                                                          *
*      All rights reserved.                                                *
*                                                                          *
*                                                                          *
****************************************************************************
*                                                                          *
*  About the network method                                                *
*  ~~~~~~~~~~~~~~~~~~~~~~~~                                                *
*                                                                          *
*  The network for prediction of secondary structure is described in       *
*  detail in:                                                              *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Prediction of protein structure at better than 70% accuracy.         *
*     J. Mol. Biol., 1993, 232, 584-599.                                   *
*                                                                          *
*  The analysis of the prediction of solvent exposure is given in:         *
*     Rost, Burkhard; Sander, Chris:                                       *
*     Conservation and prediction of solvent accessibility in protein      *
*     families.  Proteins, 1994, 20, 216-226.                              *
*                                                                          *
*  To be quoted for publications of PHD exposure prediction:               *
*     Both papers quoted above.                                            *
*                                                                          *
****************************************************************************
*                                                                          *
*  Definition of accessibility                                             *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~                                             *
*                                                                          *
*  For training the residue solvent accessibility the DSSP (Dictionary of  *
*  Secondary Structure of Proteins; Kabsch & Sander (1983) Biopolymers, 22,*
*  2577-2637) values of accessible surface area have been used.  The       *
*  prediction provides values for the relative solvent accessibility.  The *
*  normalisation is the following:                                         *
*                                                                          *
*  |                           ACCESSIBILITY (from DSSP in Angstrom^2)     *
*  |RELATIVE_ACCESSIBILITY =   ------------------------------------- * 100 *
*  |                               MAXIMAL_ACC (amino acid type i)         *
*                                                                          *
*  where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.*
*  The maximal values are:                                                 *
*                                                                          *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |           *
*  | 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|           *
*  +----+----+----+----+----+----+----+----+----+----+----+----+           *
*  |  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |                *
*  | 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|                *
*  +----+----+----+----+----+----+----+----+----+----+----+                *
*                                                                          *
*  Notation: one letter code for amino acid, B stands for D or N; Z stands *
*     for E or Q; and X stands for undetermined.                           *
*                                                                          *
*  The relative solvent accessibility can be used to estimate the number   *
*  of water molecules (W) in contact with the residue:                     *
*                                                                          *
*  W = ACCESSIBILITY /10                                                   *
*                                                                          *
*  The prediction is given in 10 states for relative accessibility, with   *
*                                                                          *
*  RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)                *
*                                                                          *
*  where PREDICTED_ACC = 0 - 9.                                            *
*                                                                          *
****************************************************************************
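
The normalisation and the 10-state encoding defined above can be written
out as follows.  This is a minimal Python sketch, not the PHD code: the
function names are hypothetical, and relative_to_state is an assumed
inverse of the n*n rule (largest n whose square does not exceed the
value), which the text above does not spell out.

```python
import math

# Maximal accessibility per amino acid type, copied from the table above.
MAX_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(residue, acc):
    """Relative accessibility in percent from a DSSP accessibility value."""
    return 100.0 * acc / MAX_ACC[residue]


def state_to_relative(n):
    """Relative accessibility (percent) encoded by predicted state n."""
    if not 0 <= n <= 9:
        raise ValueError("state must be in 0..9")
    return n * n


def relative_to_state(rel):
    """Assumed inverse: largest n with n*n <= rel, capped at 9."""
    return min(9, math.isqrt(int(rel)))


def water_contacts(acc):
    """Estimated number of water molecules, W = ACCESSIBILITY / 10."""
    return acc / 10.0
```

For example, a glycine with a DSSP accessibility of 42 has a relative
accessibility of 100 * 42 / 84 = 50%, which falls into state 7 under the
assumed inverse (7*7 = 49).

****************************************************************************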
*                                                                          *
*  Estimated Accuracy of Prediction                                        *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  A careful cross validation test on some 238 protein chains (in total    *
*  about 62,000 residues) with less than 25% pairwise sequence identity    *
*  gave the following results:                                             *
*                                                                          *
*                                                                          *
*  Correlation                                                             *
*  ...........                                                             *
*                                                                          *
*  The correlation between observed and predicted solvent accessibility    *
*  is:                                                                     *
*                                                                          *
*  -----------                                                             *
*  corr = 0.53                                                             *
*  -----------                                                             *
*                                                                          *
*  This value ought to be compared to the worst and best case prediction   *
*  scenario: random prediction (corr = 0.0) and homology modelling         *
*  (corr = 0.66).  (Note: homology modelling yields a relatively accurate  *
*  prediction in 3D if, and only if, a significantly identical sequence    *
*  has a known 3D structure.)                                              *
*                                                                          *
*                                                                          *
*  3-state accuracy                                                        *
*  ................                                                        *
*                                                                          *
*  Often the relative accessibility is projected onto, e.g., 3 states:     *
*     b  = buried       (here defined as < 9% relative accessibility),     *
*     i  = intermediate ( 9% <= rel. acc. < 36% ),                         *
*     e  = exposed      ( rel. acc. >= 36% ).                              *
*                                                                          *
*  A projection onto 3 states or 2 states (buried/exposed) enables the     *
*  compilation of a 3- and 2-state prediction accuracy.  PHD reaches an    *
*  overall 3-state accuracy of:                                            *
*     Q3 = 57.5%                                                           *
*  (compared to 35% for random prediction and 70% for homology modelling). *
*                                                                          *
*  In detail:                                                              *
*                                                                          *
*  +-----------------------------------+-------------------------+         *
*  | Qburied       (% of observed)=77% | Qb (% of predicted)=60% |         *
*  | Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |         *
*  | Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |         *
*  +-----------------------------------+-------------------------+         *
*                                                                          *
*                                                                          *
*  10-state accuracy                                                       *
*  .................                                                       *
*                                                                          *
*  The network predicts relative solvent accessibility in 10 states, with  *
*  state i (i = 0-9) corresponding to a relative solvent accessibility of  *
*  i*i %.  The 10-state accuracy of the network is:                        *
*                                                                          *
*     Q10 = 24.5%                                                          *
*                                                                          *
*..........................................................................*
*                                                                          *
*  These percentages are defined by:                                       *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                       *
*                                                                          *
*  |                     number of correctly predicted residues            *
*  |Q3                 = ---------------------------------------     (*100)*
*  |                           number of all residues                      *
*  |                                                                       *
*  |                     no of res. correctly predicted to be buried       *
*  |Qburied (% of obs) = ------------------------------------------- (*100)*
*  |                     no of all res. observed to be buried              *
*  |                                                                       *
*  |                                                                       *
*  |                     no of res. correctly predicted to be buried       *
*  |Qburied (% of pred)= ------------------------------------------- (*100)*
*  |                     no of all residues predicted to be buried         *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Averaging over single chains                                            *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                            *
*                                                                          *
*  The most reasonable way to compute the overall accuracies is the above  *
*  quoted percentage of correctly predicted residues.  However, since the  *
*  user is mainly interested in the expected performance of the prediction *
*  for a particular protein, the mean value when averaging over protein    *
*  chains might be of help as well.  Computing first the correlation       *
*  between observed and predicted accessibility for each protein chain,    *
*  and then averaging over all 238 chains yields the following average:    *
*                                                                          *
*  +-------------------------------====--+                                 *
*  | corr/averaged over chains   = 0.53  |                                 *
*  +-------------------------------====--+                                 *
*  | standard deviation          = 0.11  |                                 *
*  +-------------------------------------+                                 *
*                                                                          *
*..........................................................................*
*                                                                          *
*  Further details of performance accuracy                                 *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                 *
*                                                                          *
*  The accuracy matrix in detail:                                          *
*  ..............................                                          *
*                                                                          *
* -------+----------------------------------------------------+----------- *
*  \ PHD |    0    1   2   3    4    5     6     7    8    9  |  SUM  %obs *
* -------+----------------------------------------------------+----------- *
* OBS  0 | 8611  140   8  44   82  169   772   334   27    0  | 10187 16.6 *
* OBS  1 | 4367  164   0  50  106  231   738   346   44    3  |  6049  9.8 *
* OBS  2 | 3194  168   1  68  125  303   951   513   42    7  |  5372  8.7 *
* OBS  3 | 2760  159   8  80  136  327  1246   746   58   19  |  5539  9.0 *
* OBS  4 | 2312  144   2  72  166  396  1615  1245  124   19  |  6095  9.9 *
* OBS  5 | 1873   96   3  84  138  425  1979  1834  187   27  |  6646 10.8 *
* OBS  6 | 1387   67   1  60   80  278  2237  2627  231   51  |  7019 11.4 *
* OBS  7 | 1082   35   0  32   56  225  1871  3107  302   60  |  6770 11.0 *
* OBS  8 |  660   25   0  27   43  136  1206  2374  325   87  |  4883  7.9 *
* OBS  9 |  325   20   2  27   29   74   648  1159  366  214  |  2864  4.7 *
* -------+----------------------------------------------------+----------- *
* SUM    |26571 1018  25 544  961 2564 13263 14285 1706  487  |            *
* %pred  | 43.3  1.7 0.0 0.9  1.6  4.2  21.6  23.3  2.8  0.8  |            *
* -------+----------------------------------------------------+----------- *
*                                                                          *
*  Note: This table is to be read in the following manner:                 *
*        8611 of all residues predicted to be exposed by 0% were           *
*        observed with 0% relative accessibility.  However, 325 of all     *
*        residues predicted to have 0% are observed as completely exposed  *
*        (obs = 9 -> rel. acc. >= 81%).  The term "observed" refers to the *
*        DSSP compilation of area of solvent accessibility calculated from *
*        3D coordinates of experimentally determined structures (Diction-  *
*        ary of Secondary Structure  of Proteins: Kabsch & Sander (1983)   *
*        Biopolymers, 22, 2577-2637).                                      *
*                                                                          *
*                                                                          *
*  Accuracy for each amino acid:                                           *
*  .............................                                           *
*                                                                          *
*  +---+------------------------------+-----+-------+------+               *
*  |AA |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |    N |               *
*  +---+------------------------------+-----+-------+------+               *
*  | A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |               *
*  | C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |               *
*  | D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |               *
*  | E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |               *
*  | F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |               *
*  | G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |               *
*  | H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |               *
*  | I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |               *
*  | K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |               *
*  | L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |               *
*  | M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |               *
*  | N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |               *
*  | P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |               *
*  | Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |               *
*  | R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |               *
*  | S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |               *
*  | T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |               *
*  | V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |               *
*  | W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |               *
*  | Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |               *
*  +---+------------------------------+-----+-------+------+               *
*                                                                          *
*  Abbreviations:                                                          *
*                                                                          *
*  AA:   amino acid in one-letter code                                     *
*  b%o, i%o, e%o:   = Qburied, Qintermediate, Qexposed (% of observed),    *
*        i.e. percentage of correct prediction in each state, see above    *
*  b%p, i%p, e%p:   = Qburied, Qintermediate, Qexposed (% of predicted),   *
*        i.e. probability of correct prediction in each state, see above   *
*  Q10:  percentage of correctly predicted residues in each of the 10      *
*        states of predicted relative accessibility.                       *
*  corr: correlation between predicted and observed rel. acc.              *
*  N:    number of residues in data set                                    *
*                                                                          *
*                                                                          *
*  Accuracy for different secondary structure:                             *
*  ...........................................                             *
*                                                                          *
*  +--------+------------------------------+----+-------+-------+          *
*  | type   |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |     N |          *
*  +--------+------------------------------+----+-------+-------+          *
*  | helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |          *
*  | strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |          *
*  | loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |          *
*  +--------+------------------------------+----+-------+-------+          *
*                                                                          *
*  Abbreviations as before.                                                *
*                                                                          *
****************************************************************************
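
The 3-state figures quoted earlier (Q3 and the buried, intermediate and
exposed percentages) can be reproduced from the 10-state accuracy matrix
above by merging states according to the b/i/e thresholds.  The Python
sketch below is an illustration under the assumption that state n stands
for a relative accessibility of n*n % on both the observed and the
predicted axis; all names are hypothetical.

```python
# 10-state accuracy matrix copied from the table above.
# Rows: observed state 0..9; columns: PHD-predicted state 0..9.
MATRIX10 = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]


def state3(n):
    """Map a 10-state value to b (< 9%), i (9% to < 36%) or e (>= 36%)."""
    rel = n * n
    if rel < 9:
        return "b"
    if rel < 36:
        return "i"
    return "e"


def project(matrix):
    """Collapse the 10x10 counts into a dict keyed by (observed, predicted)."""
    counts = {(o, p): 0 for o in "bie" for p in "bie"}
    for obs, row in enumerate(matrix):
        for pred, count in enumerate(row):
            counts[(state3(obs), state3(pred))] += count
    return counts


def q3(counts):
    correct = sum(counts[(s, s)] for s in "bie")
    return 100.0 * correct / sum(counts.values())


def pct_observed(counts, s):
    return 100.0 * counts[(s, s)] / sum(counts[(s, p)] for p in "bie")


def pct_predicted(counts, s):
    return 100.0 * counts[(s, s)] / sum(counts[(o, s)] for o in "bie")
```

With these counts Q3 comes out at 57.5%, and the per-state percentages
match the quoted values when truncated to whole percent.

****************************************************************************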
*                                                                          *
*  Position-specific reliability index                                     *
*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                     *
*                                                                          *
*  The network predicts the 10 states for relative accessibility using real*
*  numbers from the output units. The prediction is assigned by choosing   *
*  the maximal unit ("winner takes all").  However, the real numbers       *
*  contain additional information.                                         *
*  E.g. the difference between the maximal and the second largest output   *
*  unit (with the constraint that the second largest output is compiled    *
*  among all units at least 2 positions off the maximal unit) can be used  *
*  to derive a "reliability index".  This index is given for each residue  *
*  along with the prediction.  The index is scaled to have values between  *
*  0 (lowest reliability), and 9 (highest).                                *
*  The accuracies (Q3, corr, etc.) to be expected for residues with values *
*  above a particular value of the index are given below as well as the    *
*  fraction of such residues (%res):                                       *
*                                                                          *
*  +---+------------------------------+----+-------+-------+               *
*  |RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |               *
*  +---+------------------------------+----+-------+-------+               *
*  | 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |               *
*  | 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |               *
*  | 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |               *
*  | 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |               *
*  | 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |               *
*  | 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |               *
*  | 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |               *
*  | 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |               *
*  | 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |               *
*  | 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |               *
*  +---+------------------------------+----+-------+-------+               *
*                                                                          *
*  Abbreviations as before.                                                *
*                                                                          *
*  The above table gives the cumulative results, e.g. 45.8% of all         *
*  residues have a reliability of at least 4.  The correlation for this    *
*  most reliably predicted half of the residues is 0.686, i.e. a value     *
*  comparable to what could be expected if homology modelling were         *
*  possible.  For this subset of 45.8% of all residues, 89% of the buried  *
*  residues are correctly predicted, and 72% of all residues predicted to  *
*  be buried are correct.                                                  *
*                                                                          *
*..........................................................................*
*                                                                          *
*  The following table gives the non-cumulative quantities, i.e. the       *
*  values per reliability index range.  These numbers answer the question: *
*  how reliable is the prediction for all residues labeled with the        *
*  particular index i.                                                     *
*                                                                          *
*  +---+------------------------------+----+-------+-------+               *
*  |RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |               *
*  +---+------------------------------+----+-------+-------+               *
*  | 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |               *
*  | 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |               *
*  | 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |               *
*  | 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |               *
*  | 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |               *
*  | 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |               *
*  | 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |               *
*  | 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |               *
*  | 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |               *
*  | 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |               *
*  +---+------------------------------+----+-------+-------+               *
*                                                                          *
*  For example, for residues with RI = 4, 83% of all predicted intermediate*
*  residues are correctly predicted as such.                               *
*                                                                          *
*                                                                          *
****************************************************************************
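
The reliability index described above can be sketched as follows.  This
is a hypothetical Python illustration, not the PHD implementation: the
header states only that the index is scaled to 0-9, so the factor of 10
and the truncation used here are assumptions, as are the function and
parameter names.  The separation of 2 reflects the stated constraint that
the runner-up must be at least 2 positions away from the maximal unit.

```python
def predicted_state(outputs):
    """Winner takes all: index of the largest output unit."""
    return max(range(len(outputs)), key=outputs.__getitem__)


def reliability_index(outputs, min_separation=2, scale=10):
    """Reliability index (0..9) from the gap between winner and runner-up.

    Only units at least `min_separation` positions away from the winner
    are considered as runner-up.  `scale` is an assumed scaling factor.
    """
    best = predicted_state(outputs)
    rivals = [value for i, value in enumerate(outputs)
              if abs(i - best) >= min_separation]
    if not rivals:
        raise ValueError("no output unit far enough from the winner")
    gap = outputs[best] - max(rivals)
    return max(0, min(9, int(scale * gap)))


if __name__ == "__main__":
    # Hypothetical output values for the 10 accessibility units.
    outputs = [0.0, 0.0, 0.125, 0.75, 0.875, 0.5, 0.25, 0.0, 0.0, 0.0]
    print(predicted_state(outputs), reliability_index(outputs))
```

Setting min_separation=1 gives the plain "maximal minus second largest"
difference described for the three secondary structure output units.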

PHD predictions

PHD predictions for predict_h8378

PHDsec summary: overall, your protein can be classified as 'mixed',
given the following classes:
- 'all-alpha': %H > 45% AND %E < 5%
- 'all-beta': %H < 5% AND %E > 45%
- 'alpha-beta': %H > 30% AND %E > 20%
- 'mixed': all others
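
The class rules above can be applied mechanically to the predicted composition. A minimal Python sketch; the function name structural_class is an assumption made for this example, and the rules are checked in the order in which they are listed.

```python
def structural_class(pct_h, pct_e):
    """Classify a protein from its predicted helix and strand percentages."""
    if pct_h > 45 and pct_e < 5:
        return "all-alpha"
    if pct_h < 5 and pct_e > 45:
        return "all-beta"
    if pct_h > 30 and pct_e > 20:
        return "alpha-beta"
    return "mixed"


if __name__ == "__main__":
    # Composition reported for this protein: %H = 43.4, %E = 17.1.
    print(structural_class(43.4, 17.1))
```

With %H = 43.4 and %E = 17.1, helix content is just below the 'all-alpha' threshold and strand content is below the 'alpha-beta' threshold, so the protein falls into 'mixed'.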

Predicted secondary structure composition for your protein:

%H: 43.4 %E: 17.1 %L: 39.5

Residue composition for your protein:

%A: 9.7 %C: 2.1 %D: 4.1 %E: 5.6 %F: 2.1

%G: 7.1 %H: 2.1 %I: 8.0 %K: 5.6 %L: 8.6

%M: 1.2 %N: 4.1 %P: 6.5 %Q: 5.3 %R: 2.4

%S: 5.9 %T: 6.2 %V: 10.0 %W: 0.9 %Y: 2.6

AA : amino acid sequence

PHD_sec: PHD predicted secondary structure: H=helix, E=extended (sheet), blank=other (loop)
PHD = PHD: Profile network prediction HeiDelberg

Rel_sec: reliability index for PHDsec prediction (0=low to 9=high)
Note: for the brief presentation, strong predictions are marked by '*'

SUB_sec: subset of the PHDsec prediction, for all residues with an expected average accuracy > 82% (tables in header)
NOTE: for this subset the following symbols are used:
L: is loop (for which above ' ' is used)
.: means that no prediction is made for this residue, as the reliability is: Rel < 5
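
The SUB_sec rule described above can be sketched as follows: residues with Rel < 5 become '.', and the blank used for loop in PHD_sec becomes 'L'. The function name `subset_prediction` and the toy input strings are hypothetical; the sketch assumes the prediction and reliability rows have the same length, with one character per residue.

```python
def subset_prediction(pred: str, rel: str, min_rel: int = 5) -> str:
    """Mask low-reliability residues and spell out loop as 'L'.

    pred: per-residue states ('H', 'E', or ' ' for loop)
    rel:  per-residue reliability digits ('0'..'9')
    """
    if len(pred) != len(rel):
        raise ValueError("prediction and reliability rows differ in length")
    out = []
    for state, digit in zip(pred, rel):
        if int(digit) < min_rel:
            out.append(".")  # no prediction reported for this residue
        else:
            out.append("L" if state == " " else state)
    return "".join(out)


# Toy ten-residue example (hypothetical, not taken from this report)
print(subset_prediction("  HHHH EE ", "9349962871"))
```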

pH_sec: 'probability' for assigning helix (1=high, 0=low)

pE_sec: 'probability' for assigning strand (1=high, 0=low)

pL_sec: 'probability' for assigning neither helix, nor strand (1=high, 0=low)

P_3_acc: PHD predicted relative solvent accessibility (acc) in 3 states: b = 0-9%, i = 9-36%, e = 36-100%.

Rel_acc: reliability index for PHDacc prediction (0=low to 9=high)
Note: for the brief presentation, strong predictions are marked by '*'

SUB_acc: subset of the PHDacc prediction, for all residues with an expected average correlation > 0.69 (tables in header)
NOTE: for this subset the following symbols are used:
I: is intermediate (for which above ' ' is used)
.: means that no prediction is made for this residue, as the reliability is: Rel < 4

PHD_acc: PHD predicted relative solvent accessibility (acc) in 10 states: a value of n (=0-9) corresponds to a relative acc. of between n*n % and (n+1)*(n+1) % (e.g. for n=5: 25-36%).
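
The mapping from the 10-state value n to a relative accessibility range, and from there to the 3-state symbols, can be sketched as below. Both function names are hypothetical. The legend does not say to which state the shared boundary values (9% and 36%) belong; the sketch assumes each 10-state bin is classified by its lower bound.

```python
def acc_range_10_state(n: int) -> tuple[int, int]:
    """Relative accessibility range (in %) for a 10-state value n (0-9)."""
    if not 0 <= n <= 9:
        raise ValueError("n must be between 0 and 9")
    return n * n, (n + 1) * (n + 1)


def three_state_from_ten(n: int) -> str:
    """Map a 10-state value to b/i/e using b = 0-9%, i = 9-36%, e = 36-100%.

    Assumption: a bin is assigned by its lower bound, so n=3 (9-16%) is 'i'
    and n=6 (36-49%) is 'e'.
    """
    lower, _ = acc_range_10_state(n)
    if lower < 9:
        return "b"
    if lower < 36:
        return "i"
    return "e"


print(acc_range_10_state(4))    # (16, 25)
print(three_state_from_ten(4))  # i
```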

PHD results (brief)

PHD results (normal)

         ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9....,....10...,....11...,....12...,....13...,....14...,....15...,....16...,....17...,....18...,....19...,....20...,....21...,....22...,....23...,....24...,....25...,....26...,....27...,....28...,....29...,....30...,....31...,....32...,....33...,....34
AA       MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV
PHD_sec  HHHHHHHHHHHHHH HHHHH EEEEEE EE HHHHHHHH HHH EEEEE HHHHHHHHHHHH EEEEE HHHHHHHHHHH EEEEE HHHHHHHHHHHHH EEEE EE HHHHHHHHHH EEEEE HHHHHHHHHHHHHH EEEEE HHHHHHHHH E EE E HHHHHHHHHH EEEE HHHHHHHHHHHHH EEEE HHHHHHHHHHHH EEEEEE HHHHHHHHHHHHH E
Rel_sec  999854453169999987876212488764168763169549985139886563131315999985599973422686599725327899999999718813999538993787999998199189982595499999999996489655517999741103334599999983378859996484599999999999938984599832577589999998179922266421111453111325899999993495598235379999999994247375163167999999963253466788499998189834999999999985299942139
SUB_sec  LLLLL..L..HHHHHHHHHHH....LLLL..HHHH..LLL.EEEE..LLLLLL......HHHHHHHLLLLL....LLLEEEE.L..HHHHHHHHHHH.LL..EEEE.LLL.HHHHHHHHH.LL.EEEE.LLL.HHHHHHHHHHH.LLLEEE.LLLLL........HHHHHHHH..LLLEEEEE.L.HHHHHHHHHHHHH.LLL.EEEE..LLLHHHHHHHHH.LLL...LL.......L......HHHHHHHHH..LLEEE..L.HHHHHHHHHH...L.EE.L..HHHHHHHHHH..L..LLLLL.EEEEE.LLL..HHHHHHHHHHHH.LLL....L
P_3_acc  eeeebebbbeebeebeeebeebbeebbbeebeebbeebebebbbeeeeeeeb bbebbbbbbbbeeb eeeeeeeeebbbbbbbbbbbbbbbbbbeebbbebbbbbbee bebebebbee bbebbbbbee eebeeebeebeeeee ebbbbbe bbbbbbbbbbbbebbeebeebebbbbbbbbbbbbbbbbbbbeeb eebebbbb bebbbbbebbbeeeeebbebeebebbbebbbbebbeebbebbeebbeebbbbbeeebbbbbeebbeebebbbebbbbbbbbbbeeeeeeeeeeeeeebbbbbbbbbbebbebeebeeebeeeeeeebbb
Rel_acc  122101202001125112711150211021012533100117460001011117616057002502302202321034689757101608665771040318194721000301011712031155212110210211312203310002601001106625030582082212221028894633245687844361211115177770210201310502211111121120035123221551155226120110534601106126711512101064123146879651111011010302069896432020401402512003103211201
SUB_acc  ..............b...b...b..........b.......bbb.........bb.b.bb...b.............bbbbbbb...b.bbbbbb..b...b.bbb...........b......bb........................b.......bb.b...bb..b.........bbbbb...bbbbbbbb.b......b.bbbb..........b................b......bb..bb..b......b.bb....b..bb..b......bb....bbbbbbb..............bbbbbb.....b..b..b..............

GLOBE prediction of globularity

--- 
--- GLOBE: prediction of protein globularity
--- 
--- nexp =   147    (number of predicted exposed residues)
--- nfit =   139    (number of expected exposed residues)
--- diff =     8.00 (difference nexp-nfit)
--- =====> your protein appears as compact, as a globular domain
--- 
--- 
--- GLOBE: further explanations preliminarily in:
---        http://www.columbia.edu/~rost/Papers/98globe.html
--- 
--- END of GLOBE
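
The GLOBE block above reports three numbers, with diff defined as nexp - nfit. The following minimal sketch reads those values from the '---' lines and checks that relation. The function name `parse_globe` and the regular expression are hypothetical and only assume the line layout shown in this report; the rule GLOBE uses to decide that a protein is compact is not stated here and is not modelled.

```python
import re

# Matches lines such as "--- nexp =   147    (number of ...)"
GLOBE_LINE = re.compile(r"^---\s+(nexp|nfit|diff)\s*=\s*([-+]?\d+(?:\.\d+)?)")


def parse_globe(text: str) -> dict[str, float]:
    """Collect the nexp, nfit and diff values from a GLOBE text block."""
    values = {}
    for line in text.splitlines():
        match = GLOBE_LINE.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


report = """\
--- nexp =   147    (number of predicted exposed residues)
--- nfit =   139    (number of expected exposed residues)
--- diff =     8.00 (difference nexp-nfit)
"""
values = parse_globe(report)
# Consistency check: the reported difference equals nexp - nfit
print(values["diff"] == values["nexp"] - values["nfit"])
```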

END of results for file predict_h8378

Quotes for methods

PredictProtein: B Rost (1996) Methods in Enzymology, 266:525-539
- Url: http://dodo.cpmc.columbia.edu
- Version: 1.99.08
- Description: PredictProtein is the umbrella name for all prediction programs run.
PROSITE: A Bairoch, P Bucher & K Hofmann (1997) Nucleic Acids Research, 25:217-221
- Author: A Bairoch, bairoch@cmu.unige.ch P Bucher & K Hofmann
- Contact: bairoch@cmu.unige.ch
- Url: http://www.expasy.ch/prosite/
- Version: 99.07
- Description: PROSITE is a database of functional motifs. ScanProsite finds all functional motifs in your sequence that are annotated in the PROSITE database.
SEG: J C Wootton & S Federhen (1996) Methods in Enzymology, 266:554-571
- Author: J C Wootton & S Federhen, wootton@ncbi.nlm.nih.gov
- Contact: wootton@ncbi.nlm.nih.gov
- Version: 1994
- Description: SEG divides sequences into regions of low and high complexity. Low-complexity regions typically correspond to 'simple sequences' or 'compositionally-biased' regions.
ProDom: ELL Sonnhammer & D Kahn (1994) Protein Science, 3:482-492
- Author: ELL Sonnhammer; J Gouzy, F Corpet, F Servant, D Kahn, dkahn@zyx.toulouse.inra.fr
- Contact: dkahn@zyx.toulouse.inra.fr
- Url: http://protein.toulouse.inra.fr/prodom.html
- Version: 99_2
- Description: ProDom is a database of putative protein domains. The database is searched with BLAST for domains corresponding to your protein.
MaxHom: C Sander & R Schneider (1991) Proteins, 9:56-68
- Author: C Sander & R Schneider, schneider@lion-ag.de
- Contact: schneider@lion-ag.de
- Version: 1.99.04
- Description: MaxHom is a dynamic multiple sequence alignment program which finds similar sequences in a database.
MView: N P Brown, C Leroy & C Sander (1998) Bioinformatics, 14:380-381
- Author: N Brown, nbrown@nimr.mrc.ac.uk
- Contact: nbrown@nimr.mrc.ac.uk
- Url: http://mathbio.nimr.mrc.ac.uk/~nbrown/mview/
- Copyright: Copyright (C) Nigel P. Brown, 1997-1998. All rights reserved.
- Version: 1.40.2
- Description: MView is a program converting multiple sequence alignments into fancy HTML formatted output.
PHD: B Rost (1996) Methods in Enzymology, 266:525-539
- Author: B Rost
- Version: 1.96
- Description: PHD is a suite of programs predicting 1D structure (secondary structure, solvent accessibility) from multiple sequence alignments.
PHDsec: B Rost & C Sander (1993) J. of Molecular Biology, 232:584-599
- Author: B Rost
- Version: 1.96
- Description: PHDsec predicts secondary structure from multiple sequence alignments.
PHDacc: B Rost & C Sander (1994) Proteins, 20:216-226
- Author: B Rost
- Version: 1.96
- Description: PHDacc predicts per-residue solvent accessibility from multiple sequence alignments.
GLOBE: B Rost (1998) unpublished
- Author: B Rost
- Version: 1.98.05
- Description: GLOBE predicts the globularity of a protein.
