reference predict_h8378 (Jun 19, 2000 20:55:42) reference pred_h8378 (Jun 19, 2000 20:56:08) PPhdr from: kapilm@cs.brandeis.edu PPhdr resp: MAIL PPhdr orig: HTML PPhdr want: HTML PPhdr password(###) prediction of: - default prediction of: - PHDsec PHDacc PHDhtm ProSite SEG ProDom return msf format ret html # default: single protein sequence description=Serine Racemase MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV
------------------------------------------------------------- Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005 Pattern-DE: Protein kinase C phosphorylation site Pattern: [ST].[RK] 54 SFK 139 TQR 196 TIK 203 SVK Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006 Pattern-DE: Casein kinase II phosphorylation site Pattern: [ST].{2}[DE] 8 SFAD 71 TPEE 212 SNAD 235 TIAD 261 TVTE Pattern-ID: MYRISTYL PS00008 PDOC00008 Pattern-DE: N-myristoylation site Pattern: G[^EDRKHPFYW].{2}[STAGCN][^P] 59 GALNAI 88 GQALTY 187 GGMVAG 239 GVKSSI 287 GVALAA Pattern-ID: DEHYDRATASE_SER_THR PS00165 PDOC00149 Pattern-DE: Serine/threonine dehydratases pyridoxal-phosphate attachment site Pattern: [DESH].{4,5}[STVG].[AS][FYI]K[DLIFSA][RVMF][GA][LIVMGA] 47 ELFQKTGSFKIRGA
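The ProSite hits listed above are plain pattern matches, so they can be re-derived with ordinary regular expressions. The sketch below is only an illustration, not the ProSite scanner itself: it assumes Python's standard `re` module, uses a hypothetical helper named `scan`, and looks only at the first 60 residues of the query. The lookahead is there so that overlapping hits are not skipped.

```python
import re

# First 60 residues of the query sequence, copied from this report.
SEQ = "MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGA"

# ProSite patterns as printed above; they are already valid regex syntax.
PATTERNS = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",
    "DEHYDRATASE_SER_THR": r"[DESH].{4,5}[STVG].[AS][FYI]K[DLIFSA][RVMF][GA][LIVMGA]",
}


def scan(pattern, seq):
    """Hypothetical helper: list (1-based start, matched text) for every hit."""
    # Wrapping the pattern in a lookahead makes the engine try every start
    # position, so overlapping matches are reported as well.
    finder = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in finder.finditer(seq)]


for name, pattern in PATTERNS.items():
    print(name, scan(pattern, SEQ))
```

Within this 60-residue window the positions agree with the report: SFK at 54, SFAD at 8, and the pyridoxal-phosphate attachment site starting at residue 47.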
>prot (#) ppOld, default: single protein sequence description=serine racemase /home/phd/server/work/predict_h8378
MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGA
LNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQA
YGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDAL
xxxxxxxxxxxx
IAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGV
KSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQT
VSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV
Identities computed with respect to: (query) prot Colored by: consensus/70% and property
HSP processing: ranked
52 [ . . . . 1 . . . . : . . . . 2 . . . . : . . . . ] 292 prot (#) ppOld, default: single ... score P(N) N 100.0% TGSFKIRGALNAIXXXXXXXPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXVDALVVPVGGGGMVAGIAITIKXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAA 1 PD000323 p99.2 (175) TRPB(29) CYSK(1... 56 0.019 4 35.2% SGSYKDRGAYSMI-------PGKKSVIVESTSGNTGAVALAMVAARLGLKCVIVMPES-------------------------------------------------------------------VDVIVASVGTGGTIAGVARYLK-----------------------------------------------------------EAVSVSDEEALEAGLLLGESEGIVPEPASAAAIAA consensus/100% oGSaK.RGAhshI PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo VDslVssVGsGGhlAGlAhhlK -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA consensus/90% oGSaK.RGAhshI PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo VDslVssVGsGGhlAGlAhhlK -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA consensus/80% oGSaK.RGAhshI PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo VDslVssVGsGGhlAGlAhhlK -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA consensus/70% oGSaK.RGAhshI PtcKshhV.opSusstu.ALshsAth.Gl.shIVhPpo VDslVssVGsGGhlAGlAhhlK -shoVo--Ehh.As.Llhtp.tll.EPsuusAlAA |
--- ------------------------------------------------------------ --- --- Again: these results were obtained based on the domain data- --- base collected by Daniel Kahn and his coworkers in Toulouse. --- --- PLEASE quote: --- F Corpet, J Gouzy, D Kahn (1998). The ProDom database --- of protein domain families. Nucleic Ac Res 26:323-326. --- --- The general WWW page is on: ---- --------------------------------------- --- http://www.toulouse.inra.fr/prodom.html ---- --------------------------------------- --- --- For WWW graphic interfaces to PRODOM, in particular for your --- protein family, follow the following links (each line is ONE --- single link for your protein!!): --- http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=PD000323 ==> multiple alignment, consensus, PDB and PROSITE links of domain PD000323 http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=PD000323 ==> graphical output of all proteins having domain PD000323 --- --- NOTE: if you want to use the link, make sure the entire line --- is pasted as URL into your browser! --- --- END of PRODOM --- ------------------------------------------------------------
--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
---
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID     : identifier of aligned (homologous) protein
--- STRID  : PDB identifier (only for known structures)
--- PIDE   : percentage of pairwise sequence identity
--- WSIM   : percentage of weighted similarity
--- LALI   : number of residues aligned
--- NGAP   : number of insertions and deletions (indels)
--- LGAP   : number of residues in all indels
--- LSEQ2  : length of aligned sequence
--- ACCNUM : SwissProt accession number
--- NAME   : one-line description of aligned protein
---
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID  IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME
ykv8_yeast         40   52  319    4    8  326 P36007 HYPOTHETICAL 34.9 KD PROT
thd2_ecoli         35   47  320    4    6  329 P05792 THREONINE DEHYDRATASE CAT
y4tj_rhisn         32   44  328    4   10  332 P55664 PUTATIVE THREONINE DEHYDR
thdh_yeast         31   42  319    6   13  576 P00927 THREONINE DEHYDRATASE PRE
thdh_arxad         31   43  320    5    9  550 O42615 THREONINE DEHYDRATASE PRE
thd1_haein         31   44  327    5   10  513 P46493 DEAMINASE).
thd1_salty         31   43  334    6   16  514 P20506 DEAMINASE).
thd1_lyces         30   41  323    5   11  595 P25306 DEAMINASE).
thd1_ecoli 1TDJ    30   42  334    6   16  514 P04968 DEAMINASE).
thd1_burce         30   41  316    6   10  507 P53607 DEAMINASE).
thd1_myctu         30   37  318    5   13  429 Q10766 DEAMINASE).
thd1_bacsu         28   39  320    6   12  422 P37946 DEAMINASE).
thd1_lacla         28   39  312    6   15  441 Q02145 DEAMINASE).
thd1_soltu         28   41  185    3    3  359 P31212 (FRAGMENT).
sdhl_rat           28   28  298    9   59  362 P09367 DEHYDRATASE (EC 4.2.1.16)
sdhl_human         28   30  297    7   16  328 P20132 L-SERINE DEHYDRATASE (EC
thd1_corgl         27   33  313    7   18  436 Q04513 DEAMINASE).
---
--- MAXHOM ALIGNMENT: IN MSF FORMAT
--- ------------------------------------------------------------ --- 3D homologue: the known structure that appeared to have sig- --- 3D homologue: nificant sequence identity to your protein is: --- 3D homologue: 1TDJ, . --- 3D homologue: Note: we do NOT check whether the similarity --- 3D homologue: is in the region for which structure has --- 3D homologue: been determined. Thus, please verify! --- ------------------------------------------------------------
--- --- Version of database searched for alignment: --- SWISS-PROT release 38.0 (7/99) with 80000 proteins ---
Identities computed with respect to: (1) predict_h8370 Colored by: consensus/70% and property
1 [ . . . . : . . . . 1 . . . . : . . . . 2 . . . . : . . . . 3 . . . ] 339 1 predict_h8370 100.0% MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV 2 ykv8_yeast 39.6% -------TYGDVLDASNRIKEYVNKTPVLTSRMLNDRLGAQIYFKGENFQRVGAFKFRGAMNAVSKL---SDEKRSKGVIAFSSGNHAQAIALSAKLLNVPATIVMPEDAPALKVAATAGYGAHIIRYNRYTEDREQIGRQLAAEHGFALIPPYDHPDVIAGQGTSAKELLEEVGQLDALFVPLGGGGLLSGSALAARSLSPGCKIFGVEPEAGNDGQQSFRSGSIV-HINTPKTIADGAQthLGEYTFAIIRENVDDILTVSDQELVKCMHFLAERMKVVVEPTACLGFAGALLKKEELVG---KKVGIILSGGNVDMKRYATLISGKEDGP------ 3 thd2_ecoli 33.8% ITYDLPVAIDDIIEAKQRLAGRIYKTGMPRSNYFSERCKGEIFLKFENMQRTGSFKIRGAFNKLSSL---TDAEKRKGVVACSAGNHAQGVSLSCAMLGIDGKVVMPKGAPKSKVAATCDYSAEVVLHGDNFNDTIAKVSEIVEMEGRIFIPPYDDPKVIAGQGTIGLEIMEDLYDVDNVIVPIGGGGLIAGIAVAIKSINPTIRVIGVQSENVHGMAASFHSGEITTHRT-TGTLADGCdsRPGNLTYEIVRELVDDIVLVSEDEIRNSMIALIQRNKVVTEGAGALACAALLSGKLDQYIQNRKTV-SIISGGNIDLSRVSQI-------------- 4 y4tj_rhisn 31.0% MNELSNLSLESIERARERIEEHVFRTPLTTSRSLTELTGTQVSLKLEHYQRTGSFKLRGATNAILQL---SPSDRARGVIAASTGNHGRALSYAAKAVGSRATICMSDLVPENKVSEIRKLGATVRIVGSSQDDAQVEVERLVAEEGLSMIPPFDHPHIIAGQRTVGLEIVEAMPDVAMVLLPLSGGGLAAGVAAAVKALRPHARIIGVTMDRGAAMKASIEAGHPV-QVKEYRSLADSLGGGIGmwTFQMCRALLDDVVLVNEGEIAAGIRHAYEHERQILEGAGAVGIAALLSG---KVAARGGSVGVVLSGQNIDMGLHREVINGVVRATEE---- 5 thdh_yeast 30.6% ---------------RSSVYDVINESPISQGVGLSSRLNTNVILKREDLLPVFSFKLRGAYNMIAKL---DDSQRNQGVIACSAGNHAQGVAFAAKHLKIPATIVMPVCTPSIKYQNVSRLGSQVVLYGNDFDEAKAECAKLAEERGLTNIPPFDHPYVIAGQGTVAMEILRQVrkIGAVFVPVGGGGLIAGIGAYLKRVAPHIKIIGVETYDAATLHNSLQRNQRTP-LPVVGTFADGTSvmIGEETFRVAQQVVDEVVLVNTDEICAAVKDIFEDTRSIVEPSGALSVAGMKK-YISTVHPEinTYVPILSGANMNFDRLRFVSERAVLGEGKEVFM 6 thdh_arxad 30.6% 
---------------TSKVYDVCNETPVTPAVNLSSKLGANIFLKREDLQPVFSFKLRGAYNMMAHLP---QETRWKGVIACSAGNHAQGVAYSAKHLNIPATIVMPVVTPAIKYKNVDRLGAKVVLHGNDFDAAKAECNRLSEKHGLTNIPLFDNPYVIAGQGTIGVELLRQIdsLKAIFVCIGGGGLIAGVGAYIKRIAPQVKIIGVETYDANAMRQSLQKGERI-TLSEVGLFADGAAviLGEETFRLCQQVVDEIVLVSTDEICAAIKDVFTETRSIVEPAGALSVAGLVkeSHPEIDHSASGYTAILSGANMDFDRLRFVSERAKLGEGSEVFI 7 thd1_haein 30.1% -------SQSDYINAIVKLGSRVyvTPLQKMGKLSERLHNNIWIKREDRQPVNSFKLRGAYAMISSL---SAEQKAAGVIAASAGNHAQGVALSAKQLGLKALIVMPQNTPSIKVDAVRGFGGEVLLHGANFDEAKAKAIELSKEKNMTFIPPFDHPLVIAGQGTLAMEMLQQVADLDYVFVQVGGGGLAAGVAILLKQFMPEIKIIGVESKDSACLKAALDKGEPT-DLTHIGLFADGVAvrIGDETFRLCQQYLDDMVLVDSDEVCAAMKDLFENVRAVAEPSGALGLAGLKKYVKQNHI-EGKNMAAILSGANLNFHTLRYVSERCEIGENREALL 8 thd1_salty 29.8% MAESQPLSVAPEGAEYlpVYEAAQVTPLQKMEKLSSRLDNVILVKREDRQPVHSFKLRGAYAMMTGL---TEEQKAHGVITASAGNHAQGVAFSSARLGVKSLIVMPKATADIKVDAVRGLGGEVLLHGANFDEAKAKAIELAQQQGFTWVPPFDHPMVIAGQGTLALELLQQDSHLDRVFVPVGGGGLAAGVAVLIKQLMPQIKVIAVEAEDSA-CLKAALEAGHPVDLPRVGLFAEGVAvrIGDETFRLCQEYLDDIITVDSDAICAAMKDLFEDVRAVAEPSGALALAGMKKYIAQH-NIRGERLAHVLSGANVNFHGLRYVSEreQREGLLTVTI 9 thd1_lyces 29.6% -----------VDILASPVYDVAIESPLELAEKLSDRLGVNFYIKREDKQRVFSFKLRGAYNMMSNL---SREELDKGVITASAGNHAQGVALAGQRLNCVAKIVMPTTTPQIKIDAVRALGGDVVLYGKTFDEAQTHALELSEKDGLKYIPPFDDPGVIKGQGTIGTEINRQLKDIHAVFIPVGGGGLIAGVATFFKQIAPNTKIIGVEPYGAASMTLSLHEGHRV-KLSNVDTFADGVAvlVGEYTFAKCQELIDGMVLVANDGISAAIKDVYDEGRNILETSGAVAIAGAAA-YCEFYKIKNENIVAIASGANMDFSKLHKVTELakEALLATFMV 10 thd1_ecoli 29.2% MADSQPLSGAPEGAEYlpVYEAAQVTPLQKMEKLSSRLDNVILVKREDRQPVHSFKLRGAYAMMAGL---TEEQKAHGVITASAGNHAQGVAFSSARLGVKALIVMPTATADIKVDAVRGFGGEVLLHGANFDEAKAKAIELSQQQGFTWVPPFDHPMVIAGQGTLALELLQQDAHLDRVFVPVGGGGLAAGVAVLIKQLMPQIKVIAVEAEDSA-CLKAALDAGHPVDLPRVGLFAEGVAvrIGDETFRLCQEYLDDIITVDSDAICAAMKDLFEDVRAVAEPSGALALAGMKK-YIALHNIRGERLAHILSGANVNFHGLRYVSEreQREALLAVTI 11 thd1_burce 29.3% 
---------------TARVYDVAFETELEPARNLSARLRNPVYLKREDNQPVFSFKLRGAYNKMAHIP---ADALARGVITASAGNHAQGVAFSAARMGVKAVIVVPVTTPQVKVDAVRAHGGPGVEVIQAGESYSDaaLKVQEERGLTFVHPFDDPYVIAGQGTIAMEILRQHqpIHAIFVPIGGGGLAAGVAAYVKAVRPEIKVIGVQAEDSCAMAQSLQAGKRV-ELAEVGLFADGTAvlVGEETFRLCKEYLDGVVTVDTDALCAAIKDVFQDTRSVLEPSGALAVAGAKL-YAEREGIENQTLVAVTSGANMNFDRMRFVAERAEVGEARE--- 12 thd1_myctu 28.6% --PLFSLSGADIDRAAKRIAPVVTPTPLQPSDRLSAITGATVYLKREDLQTVRSYKLRGAYNLLVQL---SDEELAAGVVCSSAGNHAQGFAYACRCLGVHGRVYVPAKTPKQKRDRIRYHGGEFIDLIVGGSTYDLAAAAALEDVErtLVPPFDDLRTIAGQGTIAVEVLGQLeePDLVVVPVGGGGCIAGITTYLAERTTNTAVLGVEPAGAAAMMAALAAGEPVTLDHVDQFVDGAAVNRAGTLTYAALAAAGDMVstVDEGAVCTAMLDLYQNEGIIAEPAGALSVAGLLEADIEPGST----VVCLISGGNNDVSRYGEVLE------------ 13 thd1_bacsu 28.4% LKENSLIQVKHILKAHQNVKDVVIHTPLQRNDRLSERYECNIYLKREDLQVVRSFKLRGAYHKMKQL---SSEQTENGVVCASAGNHAQGVAFSCKHLGIHGKIFMPSTTPRQKVSQVELFGKgiILTGDTFDDVYKSAAECCEAESRTFIHPFDDPDVMAGQGTLAVEILNDIdePHFLFASVGGGGLLSGVGTYLKNVSPDTKVIAVEPAGAASYFESNKAGHVV-TLDKIDKFVDGAAvkIGEETFRTLETVVDDILLVPEGKVCTSILELYNECAVVAEPAGALSVAALDLYKDQIKG---KNVVCVVSGGNNDIGRMQEMKE------------ 14 thd1_lacla 28.2% --------LSNKYQANIYLKEVVTKTPLQLDPYLSNKYQANIYLKEENLQKVRSFKLRGAYYSISKL---SDEQRSKGVVCASAGNHAQGVAFAANQLNISATIFMPVTTPNQKISQVKFFGESHVtiGDTFDESARAAKAFSQDNDKPFIDPFDDENVIAGQGTVALEIFAQAksLDKIFVQIGGGGLIAGITAYSKERYPQTEIIGVEAKGATSMKAAYSAGQPV-TLEHIDKFADGIAvtVGQKTYQLINDKVKQLLAVDEGLISQTILELYSKLGIVAEPAGATSVAALELIKDEIKG---KNIVCIISGGNNDISRMQEIEE------------ 15 thd1_soltu 28.9% --------------------------------------------------------------------------------------------------------------------------------------------------------PFDAPGVIKGQGTIGTEINRQLKDIHAVFVPVGGGGLISGVAAYFTQVAPHTKIIGVEPYGAASMTLSLYEGHRV-KLENVDTFADGVAvlVGEYTFAKCQELIDGMVLVRNDGISAAIKDVYDEGRNILETSGAVAIAGAAA-YCEFYNIKNENIVAIASGANMDFSKLHKVTELAELGSDNEALL 16 sdhl_rat 27.1% 
-------------------QESLhkTPLRDSMALSKVAGTSVFLKMDSSQPSGSFKIRGIGHLCkaLLPDTPSPL-------TAGNAGMATAYAARRLGLPATIVVPSTTPALTIERLKNEGATVEVVGEMLDEAIQLAKALEKNNPgvYISPFDDPLIWEGHTSLVKELKETLskPGAIVLSVGGGGLLCGVVQGLREvwEDVPIIAMETFGAHS-FHAAVKEGKLVTLPKITSVAKALgnTVGAQTLKLFYEHPIFSEVISDQEAVTAIEKFVDDEKILVEPACGAALAAVYSGvgRLQTPLASLVVIVCGGSNISLAQLQAL-------------- 17 sdhl_human 27.2% -------------------------TPIRDSMALSKMAGTSVYLKMDSAQPSGSFKIRGIGHFCKRWA----KQGCAHFVCSSAGNAGMAAAYAARQLGVPATIVVPGTTPALTIERLKNEGAtkVVGELLDEAFELAKALAKNNPGWVYIPPFDDPLIWEGHASIVKELKETLwkPGAIALSVGGGGLLCGVVQGLQegWGDVPVIAMETFGAHSFHAATTAGKLV-SLPKITSVAKALGvtVGSQALKLFQEHPIFSEVISDQEAVAAIEKFVDDEKILVEPAWGAALAAVYSHVIQKLQLepSLVVIVCGGSNISLAQLRALKE------------ 18 thd1_corgl 26.5% ------IRAADIQTAQARISSVIAPTPLQYCPRLSEETGAEIYLKREDLQDVRSYKIRGALNSG---AQSPQEQRDAGIVAASAGNHAQGVAYVCKSLGVQGRIYVPVQTPKQKRDRIMVHGGEFVSLVVTGNNFDEASAAAHEDAErtLIEPFDARNTVIGQGTVAAEILSQLtsADHVMVPVGGGGLLAGVVSYMADMAPRTAIVGIEPAGAAS-MQAALHNGGPITLETVDPFVDGAEvrVGDLNYTIVEKNQGRVHMMSATEGAVCTEmlYQNEGIIAEPAGALSIAGLKEMSFAPGSV----VVCIISGGNNDVLRYAEIAE------------ consensus/100% ..........................................................................................................................................................pt..hh.Gpto.shEh.tt....t.lhh.luGGGhhsG.s.h.tth..th.lhuhp..tst....u..ttt........t.hstuh.s..G..sh.hh.t......hh.tt.h..sh..hhtp.t.lhEsshshuhAuh..............h..lhuGtN.sh..ht.h.............. consensus/90% .........................o.h.....hst.ht..h.hK.-....s.uaKhRGhh..h........t.h.......osGNtu.uhshsst..t..uhlhhs..sst.ph.th...ut.h..h....pt.......h.t...h..l.PaDt..hhtGpsolshEl.tp....thlhl.lGGGGhhsGls.hhtth.sph.lhuhps.sut..h.uh.tst....h...t.hscuhtshhG..shthhtt......hlsptth..sh..hhpp.t.lhEsssuhuhAuh..............hs.lhuGuN.shttht.l.............. consensus/80% ..................l.t.h..TPl.....Lophhts.lhlKhEshQ.shSFKlRGAhthh.tl. 
..pph.tullstSuGNHupuhuhust.lsl.uhIhhP.tsPt.Khttlp.hGuphl.h....pph...s.th.pptthhhl.PFDcP.lltGQGTluhElhpph.p.thlhlslGGGGLhuGlshhhpphhPphtllulEs.susshhtuh.tut.s.pl..ht.hA-uhsshlGp.TathhpphhcthhhVspstlssuh.tlhpc.t.lhEsuuululAuhh....t.ht...p.lshlhSGuN.shtphp.l.p............ consensus/70% ..................l.phh..TPlp.s.tLSphhtsslalKtEshQ.stSFKlRGAhshhttL. stcptstGVlssSAGNHAQulAauuppLsl.uhIshP.sTPp.KhptlpthGuphlhhstsh-phpttshthtpppshshlsPFDcPhVIAGQGTluhElhppltplctlhVslGGGGLlAGlushl+plhPph+lIuVEs.susshhtuh.tGphs.pLtplshhADGsustlGp.TaplhpphlDtllhVspstlssuhcclapc.+.lsEPuGAlulAuhht.hhphhs...pplshllSGuNhshsphp.ltE............ |
**************************************************************************** * * * Prediction of: * * - secondary structure, by PHDsec * * - solvent accessibility, by PHDacc * * * * PHD: Profile fed neural network systems from HeiDelberg * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * Author: Burkhard Rost * * EMBL, Heidelberg, FRG * * Meyerhofstrasse 1, 69 117 Heidelberg * * Internet: Predict-Help@EMBL-Heidelberg.DE * * * * All rights reserved. * * * **************************************************************************** * * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * Secondary structure prediction by PHDsec: * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * Author: Burkhard Rost * * EMBL, Heidelberg, FRG * * Meyerhofstrasse 1, 69 117 Heidelberg * * Internet: Rost@EMBL-Heidelberg.DE * * * * All rights reserved. * * * * * **************************************************************************** * * * About the network method * * ~~~~~~~~~~~~~~~~~~~~~~~~ * * * * The network procedure is described in detail in: * * 1) Rost, Burkhard; Sander, Chris: * * Prediction of protein structure at better than 70% accuracy. * * J. Mol. Biol., 1993, 232, 584-599. * * * * A brief description is given in: * * Rost, Burkhard; Sander, Chris: * * Improved prediction of protein secondary structure by use of se- * * quence profiles and neural networks. * * Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562. * * * * The PHD mail server is described in: * * 2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard: * * PHD - an automatic mail server for protein secondary structure * * prediction. * * CABIOS, 1994, 10, 53-60. * * * * The latest improvement steps (up to 72%) are explained in: * * 3) Rost, Burkhard; Sander, Chris: * * Combining evolutionary information and neural networks to predict * * protein secondary structure. * * Proteins, 1994, 19, 55-72. 
* * * * To be quoted for publications of PHD output: * * Papers 1-3 for the prediction of secondary structure and the pre- * * diction server. * * * **************************************************************************** * * * About the input to the network * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * The prediction is performed by a system of neural networks. * * The input is a multiple sequence alignment. It is taken from an HSSP * * file (produced by the program MaxHom: * * Sander, Chris & Schneider, Reinhard: Database of Homology-Derived * * Structures and the Structural Meaning of Sequence Alignment. * * Proteins, 1991, 9, 56-68. * * * * For optimal results the alignment should contain sequences with varying * * degrees of sequence similarity relative to the input protein. * * The following is an ideal situation: * * * * +-----------------+----------------------+ * * | sequence: | sequence identity | * * +-----------------+----------------------+ * * | target sequence | 100 % | * * | aligned seq. 1 | 90 % | * * | aligned seq. 2 | 80 % | * * | ... | ... | * * | aligned seq. 
7 | 30 % | * * +-----------------+----------------------+ * * * **************************************************************************** * * * Estimated Accuracy of Prediction * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * A careful cross validation test on some 250 protein chains (in total * * about 55,000 residues) with less than 25% pairwise sequence identity * * gave the following results: * * * * ++================++-----------------------------------------+ * * || Qtotal = 72.1% || ("overall three state accuracy") | * * ++================++-----------------------------------------+ * * * * +----------------------------+-----------------------------+ * * | Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% | * * | Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% | * * | Qloop (% of observed)=79% | Qloop (% of predicted)=72% | * * +----------------------------+-----------------------------+ * *..........................................................................* * * * These percentages are defined by: * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * | number of correctly predicted residues * * |Qtotal = --------------------------------------- (*100)* * | number of all residues * * | * * | no of res correctly predicted to be in helix * * |Qhelix (% of obs) = -------------------------------------------- (*100)* * | no of all res observed to be in helix * * | * * | * * | no of res correctly predicted to be in helix * * |Qhelix (% of pred)= -------------------------------------------- (*100)* * | no of all residues predicted to be in helix * * * *..........................................................................* * * * Averaging over single chains * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * The most reasonable way to compute the overall accuracies is the above * * quoted percentage of correctly predicted residues. 
However, since the * * user is mainly interested in the expected performance of the prediction * * for a particular protein, the mean value when averaging over protein * * chains might be of help as well. Computing first the three state * * accuracy for each protein chain, and then averaging over 250 chains * * yields the following average: * * * * +-------------------------------====--+ * * | Qtotal/averaged over chains = 72.2% | * * +-------------------------------====--+ * * | standard deviation = 9.3% | * * +-------------------------------------+ * * * *..........................................................................* * * * Further measures of performance * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * Matthews correlation coefficient: * * * * +---------------------------------------------+ * * | Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 | * * +---------------------------------------------+ * *..........................................................................* * * * Average length of predicted secondary structure segments: * * * * . +------------+----------+ * * . 
| predicted | observed | * * +-----------+------------+----------+ * * | Lhelix = | 10.3 | 9.3 | * * | Lstrand = | 5.0 | 5.3 | * * | Lloop = | 7.2 | 5.9 | * * +-----------+------------+----------+ * *..........................................................................* * * * The accuracy matrix in detail: * * * * +---------------------------------------+ * * | number of residues with H, E, L | * * +---------+------+------+------+--------+ * * | |net H |net E |net L |sum obs | * * +---------+------+------+------+--------+ * * | obs H |12447 | 1255 | 3990 | 17692 | * * | obs E | 949 | 7493 | 3750 | 12192 | * * | obs L | 2604 | 2875 |19962 | 25441 | * * +---------+------+------+------+--------+ * * | sum Net |16000 |11623 |27702 | 55325 | * * +---------+------+------+------+--------+ * * * * Note: This table is to be read in the following manner: * * 12447 of all residues predicted to be in helix, were observed to * * be in helix, 949 however belong to observed strands, 2604 to * * observed loop regions. The term "observed" refers to the DSSP * * assignment of secondary structure calculated from 3D coordinates * * of experimentally determined structures (Dictionary of Secondary * * Structure of Proteins: Kabsch & Sander (1983) Biopolymers, 22, * * 2577-2637). * * * **************************************************************************** * * * Position-specific reliability index * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * The network predicts the three secondary structure types using real * * numbers from the output units. The prediction is assigned by choosing * * the maximal unit ("winner takes all"). However, the real numbers * * contain additional information. * * E.g. the difference between the maximal and the second largest output * * unit can be used to derive a "reliability index". This index is given * * for each residue along with the prediction. The index is scaled to * * have values between 0 (lowest reliability), and 9 (highest). 
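The accuracy matrix above is enough to re-derive the quoted Q values, and the reliability index just described can be illustrated alongside it. The sketch below is a minimal check, not PHD code. All function names are hypothetical, and the scaling `int(10 * (max - second))` is an assumption about how the output difference is mapped onto 0-9; the text only states that the index is scaled to that range.

```python
# Accuracy matrix from the table above:
# outer key = observed state, inner key = state predicted by the network.
MATRIX = {
    "H": {"H": 12447, "E": 1255, "L": 3990},
    "E": {"H": 949, "E": 7493, "L": 3750},
    "L": {"H": 2604, "E": 2875, "L": 19962},
}
STATES = "HEL"


def q_total(m):
    """Percentage of all residues predicted correctly (three-state accuracy)."""
    correct = sum(m[s][s] for s in STATES)
    total = sum(m[o][p] for o in STATES for p in STATES)
    return 100.0 * correct / total


def q_observed(m, state):
    """Correct predictions as a percentage of residues observed in `state`."""
    return 100.0 * m[state][state] / sum(m[state][p] for p in STATES)


def q_predicted(m, state):
    """Correct predictions as a percentage of residues predicted in `state`."""
    return 100.0 * m[state][state] / sum(m[o][state] for o in STATES)


def reliability_index(outputs):
    """Assumed scaling: 10 x (largest - second largest), truncated, capped at 9."""
    top, second = sorted(outputs, reverse=True)[:2]
    return min(9, int(10 * (top - second)))


print(f"Qtotal = {q_total(MATRIX):.1f}%")
for s in STATES:
    print(s, f"%obs = {q_observed(MATRIX, s):.1f}", f"%pred = {q_predicted(MATRIX, s):.1f}")
print("RI for outputs (0.80, 0.15, 0.05):", reliability_index([0.80, 0.15, 0.05]))
```

With the matrix as quoted, Qtotal comes out at 72.1%, matching the overall three-state accuracy stated earlier.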
* * The accuracies (Qtot) to be expected for residues with values above a * * particular value of the index are given below as well as the fraction * * of such residues (%res).: * * * * +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ * * | index| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | * * | %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1| * * +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ * * | | | | | | | | | | | | * * | Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2| * * | | | | | | | | | | | | * * +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ * * | H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4| * * | E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1| * * | | | | | | | | | | | | * * | H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4| * * | E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5| * * +------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+ * * * * The above table gives the cumulative results, e.g. 62.5% of all * * residues have a reliability of at least 5. The overall three-state * * accuracy for this subset of almost two thirds of all residues is 82.9%. * * For this subset, e.g., 83.1% of the observed helices are correctly * * predicted, and 86.9% of all residues predicted to be in helix are * * correct. * * * *..........................................................................* * * * The following table gives the non-cumulative quantities, i.e. the * * values per reliability index range. These numbers answer the question: * * how reliable is the prediction for all residues labeled with the * * particular index i. 
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |
| Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|
|      |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|
| E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|
|      |     |     |     |     |     |     |     |     |     |
| H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|
| E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

For example, for residues with Relindex = 5, 64% of all predicted
beta-strand residues are correctly identified.

****************************************************************************

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Solvent accessibility prediction by PHDacc:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Author: Burkhard Rost
        EMBL, Heidelberg, FRG
        Meyerhofstrasse 1, 69 117 Heidelberg
        Internet: Rost@EMBL-Heidelberg.DE

All rights reserved.

****************************************************************************

About the network method
~~~~~~~~~~~~~~~~~~~~~~~~

The network for prediction of secondary structure is described in
detail in:
  Rost, Burkhard; Sander, Chris:
  Prediction of protein structure at better than 70% accuracy.
  J. Mol. Biol., 1993, 232, 584-599.

The analysis of the prediction of solvent exposure is given in:
  Rost, Burkhard; Sander, Chris:
  Conservation and prediction of solvent accessibility in protein
  families. Proteins, 1994, 20, 216-226.
* * * * To be quoted for publications of PHD exposure prediction: * * Both papers quoted above. * * * **************************************************************************** * * * Definition of accessibility * * ~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * * * For training the residue solvent accessibility the DSSP (Dictionary of * * Secondary Structure of Proteins; Kabsch & Sander (1983) Biopolymers, 22,* * 2577-2637) values of accessible surface area have been used. The * * prediction provides values for the relative solvent accessibility. The * * normalisation is the following: * * * * | ACCESSIBILITY (from DSSP in Angstrom) * * |RELATIVE_ACCESSIBILITY = ------------------------------------- * 100 * * | MAXIMAL_ACC (amino acid type i) * * * * where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.* * The maximal values are: * * * * +----+----+----+----+----+----+----+----+----+----+----+----+ * * | A | B | C | D | E | F | G | H | I | K | L | M | * * | 106| 160| 135| 163| 194| 197| 84| 184| 169| 205| 164| 188| * * +----+----+----+----+----+----+----+----+----+----+----+----+ * * | N | P | Q | R | S | T | V | W | X | Y | Z | * * | 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196| * * +----+----+----+----+----+----+----+----+----+----+----+ * * * * Notation: one letter code for amino acid, B stands for D or N; Z stands * * for E or Q; and X stands for undetermined. * * * * The relative solvent accessibility can be used to estimate the number * * of water molecules (W) in contact with the residue: * * * * W = ACCESSIBILITY /10 * * * * The prediction is given in 10 states for relative accessibility, with * * * * RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC) * * * * where PREDICTED_ACC = 0 - 9. 
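The conversions above chain together: a predicted state (0-9) is squared to give a relative accessibility in percent, which can be turned back into an approximate accessible area through the per-residue maxima, and from there into a rough water count. The sketch below is a minimal illustration under those stated formulas. The function names are hypothetical, and the buried/intermediate/exposed cutoffs (below 9%, 9% to below 36%, 36% and above) are the ones this report uses for its 3-state accuracy.

```python
# Maximal accessibility per amino acid type, copied from the table above.
MAXIMAL_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}


def relative_accessibility(predicted_acc):
    """Relative accessibility in percent for a predicted state 0-9 (state squared)."""
    if not 0 <= predicted_acc <= 9:
        raise ValueError("PREDICTED_ACC must be between 0 and 9")
    return predicted_acc * predicted_acc


def absolute_accessibility(predicted_acc, residue):
    """Invert the normalisation: relative % times the maximum for this residue type."""
    return relative_accessibility(predicted_acc) / 100.0 * MAXIMAL_ACC[residue]


def water_contacts(predicted_acc, residue):
    """Estimated number of water molecules, W = ACCESSIBILITY / 10."""
    return absolute_accessibility(predicted_acc, residue) / 10.0


def three_state(predicted_acc):
    """Project a 10-state prediction onto b (buried), i (intermediate), e (exposed)."""
    rel = relative_accessibility(predicted_acc)
    if rel < 9:
        return "b"
    return "i" if rel < 36 else "e"


for state in (0, 3, 6, 9):
    print(state, relative_accessibility(state), three_state(state),
          round(water_contacts(state, "K"), 2))
```

Because the states map to squares, the three classes fall out as states 0-2 (buried), 3-5 (intermediate) and 6-9 (exposed).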
****************************************************************************

Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 238 protein chains (in total
about 62,000 residues) with less than 25% pairwise sequence identity
gave the following results:


Correlation
...........

The correlation between observed and predicted solvent accessibility
is:

  -----------
  corr = 0.53
  -----------

This value ought to be compared to the worst and best case prediction
scenario: random prediction (corr = 0.0) and homology modelling
(corr = 0.66). (Note: homology modelling yields a relatively accurate
prediction in 3D if, and only if, a significantly identical sequence
has a known 3D structure.)


3-state accuracy
................

Often the relative accessibility is projected onto, e.g., 3 states:
  b = buried       (here defined as < 9% relative accessibility),
  i = intermediate ( 9% <= rel. acc. < 36% ),
  e = exposed      ( rel. acc. >= 36% ).

A projection onto 3 states or 2 states (buried/exposed) enables the
compilation of a 3- and 2-state prediction accuracy. PHD reaches an
overall 3-state accuracy of:
  Q3 = 57.5%
(compared to 35% for random prediction and 70% for homology modelling).

In detail:

+-----------------------------------+-------------------------+
| Qburied       (% of observed)=77% | Qb (% of predicted)=60% |
| Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |
| Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |
+-----------------------------------+-------------------------+


10-state accuracy
.................

The network predicts relative solvent accessibility in 10 states, with
state i (i = 0-9) corresponding to a relative solvent accessibility of
i*i %. The 10-state accuracy of the network is:

  Q10 = 24.5%

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                      number of correctly predicted residues
Q3                  = --------------------------------------       (*100)
                              number of all residues

                      no of res. correctly predicted to be buried
Qburied (% of obs)  = -------------------------------------------  (*100)
                         no of all res. observed to be buried

                      no of res. correctly predicted to be buried
Qburied (% of pred) = -------------------------------------------  (*100)
                       no of all residues predicted to be buried

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues. However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well. Computing first the correlation
between observed and predicted accessibility for each protein chain, and
then averaging over all 238 chains yields the following average:

+-------------------------------====--+
| corr/averaged over chains   = 0.53  |
+-------------------------------====--+
| standard deviation          = 0.11  |
+-------------------------------------+

..........................................................................

Further details of performance accuracy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The accuracy matrix in detail:
..............................

-------+-------------------------------------------------------------+------------
 \ PHD |     0     1     2     3     4     5     6     7     8     9 |   SUM  %obs
-------+-------------------------------------------------------------+------------
OBS 0  |  8611   140     8    44    82   169   772   334    27     0 | 10187  16.6
OBS 1  |  4367   164     0    50   106   231   738   346    44     3 |  6049   9.8
OBS 2  |  3194   168     1    68   125   303   951   513    42     7 |  5372   8.7
OBS 3  |  2760   159     8    80   136   327  1246   746    58    19 |  5539   9.0
OBS 4  |  2312   144     2    72   166   396  1615  1245   124    19 |  6095   9.9
OBS 5  |  1873    96     3    84   138   425  1979  1834   187    27 |  6646  10.8
OBS 6  |  1387    67     1    60    80   278  2237  2627   231    51 |  7019  11.4
OBS 7  |  1082    35     0    32    56   225  1871  3107   302    60 |  6770  11.0
OBS 8  |   660    25     0    27    43   136  1206  2374   325    87 |  4883   7.9
OBS 9  |   325    20     2    27    29    74   648  1159   366   214 |  2864   4.7
-------+-------------------------------------------------------------+------------
SUM    | 26571  1018    25   544   961  2564 13263 14285  1706   487 |
%pred  |  43.3   1.7   0.0   0.9   1.6   4.2  21.6  23.3   2.8   0.8 |
-------+-------------------------------------------------------------+------------

Note: This table is to be read in the following manner:
      8611 of all residues predicted to be exposed by 0% were
      observed with 0% relative accessibility. However, 325 of all
      residues predicted to have 0% are observed as completely exposed
      (obs = 9 -> rel. acc. >= 81%). The term "observed" refers to the
      DSSP compilation of area of solvent accessibility calculated from
      3D coordinates of experimentally determined structures (Dictionary
      of Secondary Structure of Proteins: Kabsch & Sander (1983)
      Biopolymers, 22, 2577-2637).


Accuracy for each amino acid:
.............................

+----+------------------------------+-----+-------+------+
| AA |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |    N |
+----+------------------------------+-----+-------+------+
| A  | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |
| C  | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |
| D  | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |
| E  | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |
| F  | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |
| G  | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |
| H  | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |
| I  | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |
| K  | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |
| L  | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |
| M  | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |
| N  | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |
| P  | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |
| Q  | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |
| R  | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |
| S  | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |
| T  | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |
| V  | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |
| W  | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |
| Y  | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |
+----+------------------------------+-----+-------+------+

Abbreviations:

AA:            amino acid in one-letter code
b%o, i%o, e%o: = Qburied, Qintermediate, Qexposed (% of observed),
               i.e. percentage of correct prediction in each state, see above
b%p, i%p, e%p: = Qburied, Qintermediate, Qexposed (% of predicted),
               i.e. probability of correct prediction in each state, see above
b%o:           = Qburied (% of observed), see above
Q10:           percentage of correctly predicted residues in each of the 10
               states of predicted relative accessibility.
corr:          correlation between predicted and observed rel. acc.
N:             number of residues in data set


Accuracy for different secondary structure:
...........................................

+--------+------------------------------+-----+-------+-------+
| type   |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |     N |
+--------+------------------------------+-----+-------+-------+
| helix  | 59.5  79  64   8  44  80  56 |  27 | 0.574 | 20100 |
| strand | 61.3  84  73   9  46  69  37 |  35 | 0.524 | 13356 |
| loop   | 54.4  64  43  11  44  78  61 |  18 | 0.442 | 27968 |
+--------+------------------------------+-----+-------+-------+

Abbreviations as before.

****************************************************************************

Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the 10 states for relative accessibility using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all"). However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit (with the constraint that the second largest output is chosen
among all units at least 2 positions off the maximal unit) can be used
to derive a "reliability index". This index is given for each residue
along with the prediction. The index is scaled to have values between
0 (lowest reliability), and 9 (highest).
The accuracies (Q3, corr, etc.)
to be expected for residues with values
above a particular value of the index are given below as well as the
fraction of such residues (%res):

+----+------------------------------+-----+-------+-------+
| RI |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |  %res |
+----+------------------------------+-----+-------+-------+
| 0  | 57.5  77  60   9  44  78  56 |  24 | 0.535 | 100.0 |
| 1  | 59.1  76  63   9  45  82  57 |  25 | 0.560 |  91.2 |
| 2  | 61.7  79  66   4  47  87  58 |  27 | 0.594 |  77.1 |
| 3  | 66.6  87  70   1  51  89  63 |  30 | 0.650 |  57.1 |
| 4  | 70.0  89  72   0  83  91  67 |  32 | 0.686 |  45.8 |
| 5  | 72.9  92  75   0   0  93  70 |  34 | 0.722 |  35.6 |
| 6  | 76.3  95  77   0   0  93  75 |  36 | 0.769 |  24.7 |
| 7  | 79.0  97  79   0   0  93  78 |  39 | 0.803 |  16.0 |
| 8  | 80.9  98  80   0   0  91  81 |  43 | 0.824 |   9.6 |
| 9  | 81.2  99  80   0   0  88  83 |  45 | 0.828 |   5.9 |
+----+------------------------------+-----+-------+-------+

Abbreviations as before.

The above table gives the cumulative results, e.g. 45.8% of all
residues have a reliability of at least 4. The correlation for this
most reliably predicted half of the residues is 0.686, i.e. a value
comparable to what could be expected if homology modelling were
possible. For this subset of 45.8% of all residues, 89% of the buried
residues are correctly predicted, and 72% of all residues predicted to
be buried are correct.

..........................................................................

The following table gives the non-cumulative quantities, i.e. the
values per reliability index range. These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i.

+----+------------------------------+-----+-------+-------+
| RI |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |  %res |
+----+------------------------------+-----+-------+-------+
| 0  | 40.9  79  40  16  41  21  40 |  14 | 0.175 |   8.8 |
| 1  | 45.4  61  46  28  44  48  44 |  17 | 0.278 |  14.1 |
| 2  | 47.4  53  52  10  46  80  44 |  19 | 0.343 |  19.9 |
| 3  | 52.9  75  59   4  50  77  47 |  23 | 0.439 |  11.4 |
| 4  | 60.0  81  63   0  83  84  56 |  25 | 0.547 |  10.1 |
| 5  | 65.2  82  70   0   0  93  62 |  28 | 0.607 |  10.9 |
| 6  | 71.3  90  72   0   0  94  70 |  31 | 0.692 |   8.8 |
| 7  | 76.0  94  76   0   0  95  75 |  34 | 0.762 |   6.3 |
| 8  | 80.5  97  81   0   0  94  79 |  39 | 0.808 |   3.8 |
| 9  | 81.2  99  80   0   0  88  83 |  45 | 0.828 |   5.9 |
+----+------------------------------+-----+-------+-------+

For example, for residues with RI = 4, 83% of all predicted intermediate
residues are correctly predicted as such.

****************************************************************************
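
The marginals of the accuracy matrix in the header can be re-derived from its cells. The sketch below is not part of PHD; the function names are illustrative assumptions. It copies the 10x10 counts (rows = observed state, columns = predicted state) and recomputes the SUM row, the SUM column, and the fraction of residues on the diagonal. Computed this way the diagonal fraction comes to about 25.0%, close to but not identical to the quoted Q10 of 24.5%; the header does not say how the quoted figure was averaged.

```python
# Counts copied from the accuracy matrix in the header.
# Rows: observed state 0..9; columns: PHD-predicted state 0..9.
MATRIX = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]


def row_sums(matrix):
    """Number of residues observed in each state (the SUM column)."""
    return [sum(row) for row in matrix]


def col_sums(matrix):
    """Number of residues predicted in each state (the SUM row)."""
    return [sum(col) for col in zip(*matrix)]


def diagonal_percent(matrix):
    """Percentage of residues whose predicted state equals the observed state."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * diagonal / total


if __name__ == "__main__":
    total = sum(row_sums(MATRIX))
    print("observed per state :", row_sums(MATRIX))
    print("predicted per state:", col_sums(MATRIX))
    print("%obs :", [round(100.0 * n / total, 1) for n in row_sums(MATRIX)])
    print("%pred:", [round(100.0 * n / total, 1) for n in col_sums(MATRIX)])
    print("diagonal fraction: %.1f%%" % diagonal_percent(MATRIX))
```

The same two helpers give the "% of observed" and "% of predicted" quantities for a single state: divide a diagonal cell by its row sum or by its column sum, respectively.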
%H: 43.4 | %E: 17.1 | %L: 39.5 |
%A: 9.7 | %C: 2.1 | %D: 4.1 | %E: 5.6 | %F: 2.1 |
%G: 7.1 | %H: 2.1 | %I: 8.0 | %K: 5.6 | %L: 8.6 |
%M: 1.2 | %N: 4.1 | %P: 6.5 | %Q: 5.3 | %R: 2.4 |
%S: 5.9 | %T: 6.2 | %V: 10.0 | %W: 0.9 | %Y: 2.6 |
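
Composition percentages like those above can be recomputed from a sequence or from a predicted secondary-structure string. This is a minimal sketch, not the server's own code; `composition` is a hypothetical helper name, and it assumes each character of the input is one residue (for a secondary-structure string, blanks would first have to be replaced by `L`).

```python
from collections import Counter


def composition(symbols):
    """Return {symbol: percentage} rounded to one decimal, sorted by symbol."""
    if not symbols:
        raise ValueError("empty input")
    counts = Counter(symbols)
    n = len(symbols)
    return {s: round(100.0 * c / n, 1) for s, c in sorted(counts.items())}


if __name__ == "__main__":
    # Toy input, not the query protein.
    print(composition("HHEL"))
```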
AA : | amino acid sequence | |
PHD_sec: | PHD predicted secondary structure: H=helix, E=extended (sheet), blank=other (loop) PHD = PHD: Profile network prediction HeiDelberg | |
Rel_sec: | reliability index for PHDsec prediction (0=low to 9=high) Note: for the brief presentation strong predictions marked by '*' | |
SUB_sec: | subset of the PHDsec prediction, for all residues with an expected average accuracy > 82% (tables in header) NOTE: for this subset the following symbols are used: L: is loop (for which above ' ' is used) .: means that no prediction is made for this residue, as the reliability is: Rel < 5 | |
pH_sec: | 'probability' for assigning helix (1=high, 0=low) | |
pE_sec: | 'probability' for assigning strand (1=high, 0=low) | |
pL_sec: | 'probability' for assigning neither helix, nor strand (1=high, 0=low) | |
P_3_acc: | PHD predicted relative solvent accessibility (acc) in 3 states: b = 0-9%, i = 9-36%, e = 36-100%. | |
Rel_acc: | reliability index for PHDacc prediction (0=low to 9=high) Note: for the brief presentation strong predictions marked by '*' | |
SUB_acc: | subset of the PHDacc prediction, for all residues with an expected average correlation > 0.69 (tables in header) NOTE: for this subset the following symbols are used: I: is intermediate (for which above ' ' is used) .: means that no prediction is made for this residue, as the reliability is: Rel < 4 | |
PHD_acc: | PHD predicted relative solvent accessibility (acc) in 10 states: a value of n (=0-9) corresponds to a relative acc. of between n*n % and (n+1)*(n+1) % (e.g. for n=4: 16-25%). |
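
The relation between the 10-state code and the 3-state code in the legend can be written out directly. The sketch below is illustrative only: `acc_range` and `three_state` are hypothetical names, and deriving the 3-state symbol from the lower bound of the 10-state bin is an assumption that happens to agree with the thresholds quoted for P_3_acc (b = 0-9%, i = 9-36%, e = 36-100%).

```python
def acc_range(n):
    """Relative accessibility interval (in %) for a 10-state value n = 0..9."""
    if not 0 <= n <= 9:
        raise ValueError("state must be between 0 and 9")
    return n * n, (n + 1) * (n + 1)


def three_state(n):
    """Map a 10-state value to b / i / e using the legend's 9% and 36% cut-offs."""
    lower, _ = acc_range(n)
    if lower < 9:
        return "b"  # buried: states 0, 1, 2
    if lower < 36:
        return "i"  # intermediate: states 3, 4, 5
    return "e"      # exposed: states 6, 7, 8, 9


if __name__ == "__main__":
    for state in range(10):
        low, high = acc_range(state)
        print(state, "%d-%d%%" % (low, high), three_state(state))
```

State 9 maps to 81-100%, matching the "obs = 9 -> rel. acc. >= 81%" remark in the header.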
PHD results (brief)....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9....,....10...,....11...,....12...,....13...,....14...,....15...,....16...,....17...,....18...,....19...,....20...,....21...,....22...,....23...,....24...,....25...,....26...,....27...,....28...,....29...,....30...,....31...,....32...,....33...,....34 AA MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV PHD_sec HHHHHHHHHHHHHH HHHHH EEEEEE EE HHHHHHHH HHH EEEEE HHHHHHHHHHHH EEEEE HHHHHHHHHHH EEEEE HHHHHHHHHHHHH EEEE EE HHHHHHHHHH EEEEE HHHHHHHHHHHHHH EEEEE HHHHHHHHH E EE E HHHHHHHHHH EEEE HHHHHHHHHHHHH EEEE HHHHHHHHHHHH EEEEEE HHHHHHHHHHHHH E Rel_sec ***** * *********** **** **** *** **** ****** ************ ******* * *********** ** **** *** ********* ** **** *** *********** ****** ***** ******** ******** * ************* *** **** ************ *** ** * ********* ***** * ********** * ** * ********** * ***** ***** *** ************ *** * P_3_acc eeeebebbbeebeebeeebeebbeebbbeebeebbeebebebbbeeeeeeeb bbebbbbbbbbeeb eeeeeeeeebbbbbbbbbbbbbbbbbbeebbbebbbbbbee bebebebbee bbebbbbbee eebeeebeebeeeee ebbbbbe bbbbbbbbbbbbebbeebeebebbbbbbbbbbbbbbbbbbbeeb eebebbbb bebbbbbebbbeeeeebbebeebebbbebbbbebbeebbebbeebbeebbbbbeeebbbbbeebbeebebbbebbbbbbbbbbeeeeeeeeeeeeeebbbbbbbbbbebbebeebeeebeeeeeeebbb Rel_acc * * * * *** ** * ** * ******* * ****** * * *** * ** * ** * ** * ***** ******** * * **** * * ** ** * * ** * ** * ** ******* ****** * * *
PHD results (normal) ....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8....,....9....,....10...,....11...,....12...,....13...,....14...,....15...,....16...,....17...,....18...,....19...,....20...,....21...,....22...,....23...,....24...,....25...,....26...,....27...,....28...,....29...,....30...,....31...,....32...,....33...,....34 AA MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQTAPNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADDCYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLNWVGQAERPAPYQTVSV PHD_sec HHHHHHHHHHHHHH HHHHH EEEEEE EE HHHHHHHH HHH EEEEE HHHHHHHHHHHH EEEEE HHHHHHHHHHH EEEEE HHHHHHHHHHHHH EEEE EE HHHHHHHHHH EEEEE HHHHHHHHHHHHHH EEEEE HHHHHHHHH E EE E HHHHHHHHHH EEEE HHHHHHHHHHHHH EEEE HHHHHHHHHHHH EEEEEE HHHHHHHHHHHHH E Rel_sec 999854453169999987876212488764168763169549985139886563131315999985599973422686599725327899999999718813999538993787999998199189982595499999999996489655517999741103334599999983378859996484599999999999938984599832577589999998179922266421111453111325899999993495598235379999999994247375163167999999963253466788499998189834999999999985299942139 SUB_sec LLLLL..L..HHHHHHHHHHH....LLLL..HHHH..LLL.EEEE..LLLLLL......HHHHHHHLLLLL....LLLEEEE.L..HHHHHHHHHHH.LL..EEEE.LLL.HHHHHHHHH.LL.EEEE.LLL.HHHHHHHHHHH.LLLEEE.LLLLL........HHHHHHHH..LLLEEEEE.L.HHHHHHHHHHHHH.LLL.EEEE..LLLHHHHHHHHH.LLL...LL.......L......HHHHHHHHH..LLEEE..L.HHHHHHHHHH...L.EE.L..HHHHHHHHHH..L..LLLLL.EEEEE.LLL..HHHHHHHHHHHH.LLL....L P_3_acc eeeebebbbeebeebeeebeebbeebbbeebeebbeebebebbbeeeeeeeb bbebbbbbbbbeeb eeeeeeeeebbbbbbbbbbbbbbbbbbeebbbebbbbbbee bebebebbee bbebbbbbee eebeeebeebeeeee ebbbbbe bbbbbbbbbbbbebbeebeebebbbbbbbbbbbbbbbbbbbeeb eebebbbb bebbbbbebbbeeeeebbebeebebbbebbbbebbeebbebbeebbeebbbbbeeebbbbbeebbeebebbbebbbbbbbbbbeeeeeeeeeeeeeebbbbbbbbbbebbebeebeeebeeeeeeebbb 
Rel_acc 122101202001125112711150211021012533100117460001011117616057002502302202321034689757101608665771040318194721000301011712031155212110210211312203310002601001106625030582082212221028894633245687844361211115177770210201310502211111121120035123221551155226120110534601106126711512101064123146879651111011010302069896432020401402512003103211201 SUB_acc ..............b...b...b..........b.......bbb.........bb.b.bb...b.............bbbbbbb...b.bbbbbb..b...b.bbb...........b......bb........................b.......bb.b...bb..b.........bbbbb...bbbbbbbb.b......b.bbbb..........b................b......bb..bb..b......b.bb....b..bb..b......bb....bbbbbbb..............bbbbbb.....b..b..b..............
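
The SUB_sec and SUB_acc rows follow from the prediction and reliability rows by the rule given in the legend: residues below the reliability cut-off become `.`, and the blank state is written as a letter. A minimal sketch of that rule, assuming equal-length input strings with one character per residue; `subset` is a hypothetical name and the toy strings are not taken from the prediction above.

```python
def subset(pred, rel, threshold, blank_symbol):
    """Mask residues with reliability < threshold and spell out the blank state.

    pred         -- prediction string (' ' marks loop or intermediate)
    rel          -- reliability digits 0-9, same length as pred
    threshold    -- 5 for SUB_sec, 4 for SUB_acc (per the legend)
    blank_symbol -- 'L' for SUB_sec, 'I' for SUB_acc
    """
    if len(pred) != len(rel):
        raise ValueError("prediction and reliability rows differ in length")
    out = []
    for state, digit in zip(pred, rel):
        if int(digit) < threshold:
            out.append(".")           # too unreliable: no prediction
        elif state == " ":
            out.append(blank_symbol)  # reliable blank state gets a letter
        else:
            out.append(state)
    return "".join(out)


if __name__ == "__main__":
    print(subset("HH  EE", "943752", 5, "L"))  # secondary structure rule
    print(subset("b e", "416", 4, "I"))        # accessibility rule
```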
---
--- GLOBE: prediction of protein globularity
---
--- nexp = 147 (number of predicted exposed residues)
--- nfit = 139 (number of expected exposed residues)
--- diff = 8.00 (difference nexp-nfit)
--- =====> your protein appears as compact, as a globular domain
---
---
--- GLOBE: further explanations preliminarily in:
---        http://www.columbia.edu/~rost/Papers/98globe.html
---
--- END of GLOBE
END of results for file predict_h8378