Research on Proteins with Potentially Novel Tertiary Structures

Last Updated: 7/16/00


The following two proteins with potentially novel structures are explored on this page:

Serine Racemase

Lysine Aminomutase

The amino acid sequences of the two proteins can be found through the National Center for Biotechnology Information’s GenBank query page. GenBank is a valuable tool for finding the primary sequences of proteins.

The search for serine racemase yielded the following two different sequences:

/product="serine racemase" (SOURCE: House mouse)

(NOTE: Referred to as Mammalian Serine Racemase from here on)

/translation=

"MCAQYCISFADVEKAHINIQDSIHLTPVLTSSILNQIAGRNLFFKCELFQKTGSFKIRGALNAIRGLIPDTPEEKPKAVVTHSSGNHGQALTYAAKLEGIPAYIVVPQT

APNCKKLAIQAYGASIVYCDPSDESREKVTQRIMQETEGILVHPNQEPAVIAGQGTIALEVLNQVPLVDALVVPVGGGGMVAGIAITIKALKPSVKVYAAEPSNADD

CYQSKLKGELTPNLHPPETIADGVKSSIGLNTWPIIRDLVDDVFTVTEDEIKYATQLVWGRMKLLIEPTAGVALAAVLSQHFQTVSPEVKNVCIVLSGGNVDLTSLN

WVGQAERPAPYQTVSV"

(339 characters)

/product="serine racemase" (SOURCE: Enterococcus faecalis)

(NOTE: Referred to as Bacterial Serine Racemase from here on)

/translation=

"MTKNESYSGIDYFRFIAALLIVAIHTSPLFSFSETGNFIFTRIVAPVAVPFFFMTSGFFLISRYTCNAEKLGAFIKKTTLIYGVAILLYIPINVYNGYFKMDNLLPNIIKDIV

FDGTLYHLWYLPASIIGAAIAWYLVKKVHYRKAFLIASILYIIGLFGDSYYGIVKSVSCLNVFYNLIFQLTDYTRNGIFFAPIFFVLGGYISDSPNRYRKKNYIRIYSLFCL

MFGKTLTLQHFDIQKHDSMYVLLLPSVWCLFNLLLHFRGKRRTGLRTISLDQLYHSSVYDCCNTIVCAELLHLQSLLVENSLVHYIAVCFASVVLAVVITALLSSLKP

KKAKHTADTDRAYLEINLNNLEHNVNTLQKAMSPKCELMAVVKAEAYGHGMYEVTTYFEPIGVFYLAVATIDEGIRLRKYGIFSEILILGYTSPSRAKELCKFELTQT

LIDYRYLLLLNKQGYDIKAHIKIDTGMHRLGFSTEDKDKILAAFFLKHIKVAGIFTHLCAADSLEEKEVAFTNKPIGSFYKVLDWPKSSGLNIPKVNIQTSYGLWNIQS

WNVIYQSGVALYGVLRSTNDKTKLETDLRACSFLKAKVVLIRKIKQGGSVGYSRAFTATRDSLIAILPIGYADGFPRNLSCGNSYVLIGGRQAPIVGKICMDQLAVD

VTDIPNVKTGSIATLIGKDGKEEITAPMVAESAESITNELLSRMEHRLNIIRRA"

(711 characters)

The search for lysine aminomutase yielded the following three sequences:

/product="L-lysine 2,3-aminomutase"

/translation=

"MINRRYELFKDVSDADWNDWRWQVRNRIETVEELKKYIPLTKEEEEGVAQCVKSLRMAITPYYLSLIDPNDPNDPVRKQAIPTALELNKAAADLEDPLHEDTDSPV

PGLTHRYPDRVLLLITDMCSMYCRHCTRRRFAGQSDDSMPMERIDKAIDYIRNTPQVRDVLLSGGDALLVSDETLEYIIAKLREIPHVEIVRIGSRTPVVLPQRITPE

LVNMLKKYHPVWLNTHFNHPNEITEESTRACQLLADAGVPLGNQSVLLRGVNDCVHVMKELVNKLVKIRVRPYYIYQCDLSLGLEHFRTPVSKGIEIIEGLRGHTS

GYCVPTFVVDAPGGGGKTPVMPNYVISQSHDKVILRNFEGVITTYSEPINYTPGCNCDVCTGKKKVHKVGVAGLLNGEGMALEPVGLERNKRHVQE"

(416 characters)

/product="D-lysine 5,6-aminomutase alpha subunit"

/translation=

"MESKLNLDFNLVEKARAKAKAIAIDTQEFIEKHTTVTVERAVCRLLGIDGVDTDEVPLPNIVVDHIKENNGLNLGAAMYIANAVLNTGKTPQEIAQAISAGELDLTKL

PMKDLFEVKTKALSMAKETVEKIKNNRSIRESRFEEYGDKSGPLLYVIVATGNIYEDITQAVAAAKQGADVIAVIRTTGQSLLDYVPYGATTEGFGGTYATQENFRL

MREALDKVGAEVGKYIRLCNYCSGLCMPEIAAMGAIERLDVMLNDALYGILFRDINMQRTMIDQNFSRIINGFAGVIINTGEDNYLTTADAFEEAHTVLASQFINEQ

FALLAGLPEEQMGLGHAFEMDPELKNGFLYELSQAQMAREIFPKAPLKYMPPTKFMTGNIFKGHIQDALFNMVTIMTNQRIHLLGMLTEALHTPFMSDRALSIEN

AQYIFNNMESISEEIQFKEDGLIQKRAGFVLEKANELLEEIEQLGLFDTLEKGIFGGVKRPKDGGKGLNGVVSKDENYYNPFVELMLNK"

(516 characters)

/product="D-lysine 5,6-aminomutase beta subunit"

/translation=

"MSSGLYSMEKKEFDKVLDLERVKPYGDTMNDGKVQLSFTLPLKNNERSAEAAKQIALKMGLEEPSVVMQQSLDEEFTFFVVYGNFVQSVNYNEIHVEAVNSEIL

SMEETDEYIKENIGRKIVVVGASTGTDAHTVGIDAIMNMKGYAGHYGLERYEMIDAYNLGSQVANEDFIKKAVELEADVLLVSQTVTQKNVHIQNMTHLIELLEAEG

LRDRFVLLCGGPRINNEIAKELGYDAGFGPGRFADDVATFAVKTLNDRMNS"

(262 characters)
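
The character counts and amino acid composition of sequences like those above can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original project; the function names `clean_sequence` and `composition` are made up for illustration, and only the first 20 residues of the Mammalian Serine Racemase sequence are used as input.

```python
from collections import Counter

def clean_sequence(raw: str) -> str:
    """Drop quotes, whitespace and line breaks from a pasted /translation field."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()

def composition(seq: str) -> dict[str, float]:
    """Return the fraction of the sequence made up by each residue."""
    total = len(seq)
    if total == 0:
        return {}
    counts = Counter(seq)
    return {aa: n / total for aa, n in sorted(counts.items())}

# First 20 residues of the Mammalian Serine Racemase sequence, pasted with
# the quotes and line breaks that surround it on this page.
fragment = '"MCAQYCISFA\n\nDVEKAHINIQ"'
seq = clean_sequence(fragment)
print(len(seq))          # number of residues in the cleaned fragment
print(composition(seq))  # fraction of each amino acid
```

Pasting a full sequence in place of `fragment` gives its length, which can be compared with the character count listed under each entry.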

Further information on protein structure can be found at the Network Protein Sequence Analysis (NPSA) website.

The NPSA is an interactive Web server dedicated to protein sequence analysis.

The following types of analysis can be carried out at the NPSA site:

Primary structure analysis
Secondary structure prediction
Sequence similarity search
Sites and signatures detection
Multiple alignment

To use the NPSA site, one must have either a complete protein sequence (such as those listed above) or a pattern. Sequences can be entered by copying and pasting them into the site's forms, or submitted programmatically with a scripting language such as Perl.
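
As a rough illustration of scripted input, the Python sketch below builds a form submission carrying one sequence. The URL and the field names `sequence` and `method` are assumptions chosen for the example; they are not the real NPSA form parameters, which would have to be read from the form's HTML. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real NPSA form address will differ.
FORM_URL = "https://example.org/npsa/secpred"

def build_submission(sequence: str, method: str = "SOPMA") -> Request:
    """Build (but do not send) a POST request carrying one protein sequence."""
    # Field names are placeholders for whatever the real form expects.
    fields = {"sequence": sequence, "method": method}
    body = urlencode(fields).encode("ascii")
    return Request(FORM_URL, data=body, method="POST")

req = build_submission("MCAQYCISFADVEKAHINIQ")
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
# Sending would be: urllib.request.urlopen(req).read()
```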

The following methods are available for secondary structure prediction at the NPSA site:

SOPM (Geourjon and Deléage, 1994)
SOPMA (Geourjon and Deléage, 1995)
HNN (Guermeur, 1997)
MLRC (Guermeur et al., 1999)
DPM (Deléage and Roux, 1987)
DSC (King and Sternberg, 1996)
GOR I (Garnier et al., 1978)
GOR III (Gibrat et al., 1987)
GOR IV (Garnier et al., 1996)
PHD (Rost and Sander, 1993)
PREDATOR (Frishman and Argos, 1996)
SIMPA96 (Levin, 1997)

A consensus prediction for each of the proteins discussed above is available via the links below. The consensus prediction pages provide an easy way to compare the results of the various prediction methods. Although no two methods produced exactly the same secondary structure prediction, the results are broadly similar.

Secondary Structure Consensus Predictions for:

· Mammalian Serine Racemase

· Bacterial Serine Racemase

· L-lysine 2,3-aminomutase

· D-lysine 5,6-aminomutase alpha subunit

· D-lysine 5,6-aminomutase beta subunit
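
One simple way to think about a consensus prediction is a per-residue majority vote across methods. The Python sketch below shows that idea only; it is an assumption for illustration and not necessarily the rule the NPSA server applies. The three prediction strings are made up (h = helix, e = strand, c = coil).

```python
from collections import Counter

def consensus(predictions: list[str]) -> str:
    """Per-residue majority vote; '?' marks positions with no single winner."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        result.append("?" if tie else ranked[0][0])
    return "".join(result)

# Invented outputs of three methods for the same 10-residue stretch.
methods = {
    "method_1": "hhhhccceee",
    "method_2": "hhhcccceee",
    "method_3": "hhhhhcceec",
}
print(consensus(list(methods.values())))
```

Positions where the methods disagree without a clear majority are flagged rather than guessed, which makes the regions of uncertainty easy to spot.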

A sequence homology search for the Mammalian Serine Racemase sequence was conducted using BLAST (available at the NPSA site). Click on the links below to view the results of the search against each individual database:

The results of the same search for Bacterial Serine Racemase are presented below:

Below are the search results for L-lysine 2,3-aminomutase:

Further analysis, including threading, of Mammalian Serine Racemase and L-lysine 2,3-aminomutase was carried out using the PredictProtein Server and the 123D+ protein threading program. These servers require only a protein sequence as input and email the results back, often within a few minutes and at most within a day. To get a brief listing of what the PredictProtein Server does, click here. Another site offering 123D+ search can be accessed by clicking here. To view the list of structures 123D+ uses when performing threading, click here. The results for the above two proteins are accessible via the links below:

NOTE: While the first two PredictProtein results deal mainly with secondary structure information, the TOPITS and 123D+ searches return actual PDB files of similar molecules. These PDB files may be downloaded and viewed with RasMol or the Chime plug-in.

The Molecular-Graphics Viewers and Sample Images page provides information on PDB files, RasMol and Chime as well as some samples of the kinds of images that can be created using them.

GenTHREADER is another protein threading program available for use over the Internet. To view the GenTHREADER results for the two proteins in question, use the links below:

Click here to access a number of different structure prediction programs by Zimmer and Lengauer. This page requires users to log in, but anonymous logins are accepted, i.e. you may choose any ID and password to enter the program area.

Information on ROSETTA and the use of ab initio techniques to study protein folding can be found at the Baker Laboratory Homepage.

Further information on protein structure is available on websites maintained by Dr. Christine A. Orengo and Dr. Janet M. Thornton. Links to some of these sites are provided below:

· The Biomolecular Structure and Modelling group homepage

· Structure and Modelling Group home page

· CATH protein classification home page

· Orengo, Jones & Thornton Classification of Protein Families and Domain Superfolds.

The results posted above for the proteins in question are not regularly updated. You may, however, use a collection of scripts that automatically access the above sites and retrieve prediction results. Use the Perl Scripts Page to download and run the scripts.

Click here to view a design prototype for a program that automatically generates scripts similar to those from the Perl Scripts Page given any web-based query form URL as input.
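
The first step of such a generator would be to read a query form and work out which fields a script must fill in. The Python sketch below illustrates that step only, under stated assumptions: it is not the design prototype itself, the class name `FormFieldCollector` is invented, and the form markup is a made-up stand-in for a downloaded query page.

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Collect the action URL and the named input fields of a form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag in ("input", "textarea", "select") and attrs.get("name"):
            # Record the field name with its default value (empty if none).
            self.fields[attrs["name"]] = attrs.get("value") or ""

# Hypothetical form markup standing in for a real query page.
page = """
<form action="/cgi-bin/predict" method="post">
  <textarea name="sequence"></textarea>
  <input type="text" name="email" value="">
  <input type="submit" value="Submit">
</form>
"""

collector = FormFieldCollector()
collector.feed(page)
print(collector.action)
print(collector.fields)
```

With the action URL and field names in hand, a generated script only needs to fill in the values and post them back to the server.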

Proteins.mdb is a Microsoft Access 2000 file containing a design prototype for a database that may eventually be used to store the results obtained from the Perl scripts. It currently contains data only for Mammalian Serine Racemase, Bacterial Serine Racemase and L-lysine 2,3-aminomutase. The contents of each database field are described in the Table design section within the file itself. “Annotation A” and “Annotation B” are two fields containing numerical values that will be filled in manually based on certain tests or filters (more details will be posted at a later time). The file also contains three sample SQL queries to give a rough idea of how information will be mined from the database. Below is a description of what each query does:

· Query 1: Returns the Name, Blast search Results, Secondary Structure Results, 123D+ Results and GenTHREADER Results for all proteins whose Annotation A field contains a value greater than 5.

· Query 2: Returns the Name and Protein String for all proteins whose Annotation B field is less than or equal to 10 and whose Annotation A field is greater than or equal to 1.

· Query 3: Returns the Blast search Results and Secondary Structure Results for all proteins whose Annotation A field is equal to their Annotation B field.
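
The three queries can be approximated outside Access as well. The sketch below uses Python's built-in sqlite3 module with an in-memory table. The table name, the column names and every row are assumptions invented for the demonstration; they are not the actual schema or contents of Proteins.mdb.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE proteins (
        name TEXT, protein_string TEXT, blast_results TEXT,
        secondary_structure_results TEXT, threading_123d_results TEXT,
        genthreader_results TEXT, annotation_a REAL, annotation_b REAL
    )
""")
# Placeholder rows; names, result strings and annotation values are invented.
rows = [
    ("Protein X", "MCAQ", "blast-x", "ss-x", "123d-x", "gen-x", 7, 7),
    ("Protein Y", "MTKN", "blast-y", "ss-y", "123d-y", "gen-y", 2, 9),
    ("Protein Z", "MINR", "blast-z", "ss-z", "123d-z", "gen-z", 0, 12),
]
conn.executemany("INSERT INTO proteins VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Query 1: Annotation A greater than 5.
query_1 = """SELECT name, blast_results, secondary_structure_results,
                    threading_123d_results, genthreader_results
             FROM proteins WHERE annotation_a > 5"""
# Query 2: Annotation B <= 10 and Annotation A >= 1.
query_2 = """SELECT name, protein_string FROM proteins
             WHERE annotation_b <= 10 AND annotation_a >= 1"""
# Query 3: Annotation A equal to Annotation B.
query_3 = """SELECT blast_results, secondary_structure_results
             FROM proteins WHERE annotation_a = annotation_b"""

queries = (("Query 1", query_1), ("Query 2", query_2), ("Query 3", query_3))
results = {label: conn.execute(sql).fetchall() for label, sql in queries}
for label, found in results.items():
    print(label, found)
```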

For any questions or comments, feel free to email me:

Kapil Mehra

Brandeis University