Bioinformatics & Computational Biology

Iowa State University

Abstracts of Bioinformatics and Computational Biology Research at Iowa State University

Recent Student and Postdoctoral Fellow Abstracts:

Name | Laboratory | Abstract Title
Ahmed Awad | Postdoctoral Staff in Nilsen-Hamilton Lab | Identification of an efficient approach to identifying effective antisense nucleic acids to target an mRNA molecule
Haitao Cheng | Dr. Robert Jernigan | Prediction of protein secondary structure by mining structural fragment database
Pan Du | Dr. Julie Dickerson and Dr. Eve Wurtele | Genetic Network Inference Based on Time Series Gene Expression Profiles
Jo Etzel | Dr. Julie Dickerson and Dr. Adolphs | A Program to Accurately Identify Peaks in Respiration and EKG Signals for use in Psychophysiological Research
Yaping Feng | Dr. Robert Jernigan | Four-Body Contact Potentials derived from Two Protein Databases to Discriminate Native structures from Decoys
Ajith Gunaratne | Math Department | 3D Molecular Dynamics Simulation with Distance-Constrained Penalty Terms
Brent Kronmiller | Dr. Roger Wise and Dr. Xun Gu | Sequencing a 1.3 Mb contig spanning the rf1 fertility restorer locus as a prototype to assess complex-genome coverage strategies
Michael Lawrence | Dr. Di Cook | GeneGobi: Software for the exploratory analysis of biochemical systems
Myron Peto | Dr. Robert Jernigan | The application of the transfer matrix method to compact lattice conformations - cyclic conformations of larger sizes, non-cyclic conformations, and irregular conformations
Raul Piaggio-Talice | Dr. Oliver Eulenstein and Dr. Drena Dobbs | Evolutionary History Model Selection via Improved Phylogenetic Compression
Jeff Sander | Dr. Drena Dobbs | Designing C2H2 Zinc Finger Proteins to Target Specific DNA Sequences
Michael Terribilini | Dr. Drena Dobbs and Dr. Vasant Honavar | Computational Prediction of RNA-Binding Sites in Proteins
Peter Vedell | Dr. Zhijun Wu and Dr. Robert Jernigan | Multiple-shooting methods for boundary-value approaches to molecular dynamics simulation
Yan-fang Wang | Postdoctoral Staff, Animal Science Dept. | Differential transcript response to infection to host-specific and host-generalist Salmonella enterica serotypes in pigs
Zhijun Wu | Faculty Member, Mathematics | Multiple-shooting methods for boundary-value approaches to molecular dynamics simulation
Lei Yang | Dr. Robert Jernigan and Dr. Zhijun Wu | Motion Analysis of HIV-1 Protease by Anisotropic Network Model and Principal Component Analysis
Peter Zaback | Dr. Vasant Honavar and Dr. Drena Dobbs | Improved support vector machine prediction of protein structural features with a substitution matrix based kernel

Abstracts of Research by BCB faculty members:

Name | Department | Abstract Title
Dean Adams | Ecology, Evolution and Organismal Biology | Ecology, Evolution, and the Nature of the Phenotype
Xun Gu | Genetics, Development and Cell Biology | Expression Divergence after Gene Duplication or Speciation
Roger Wise | Plant Pathology Department | Gene-specific regulation of innate immunity to plant disease
 


Abstract Descriptions


Dean Adams
BCB Faculty member
Department of Ecology, Evolution and Organismal Biology

Title: Ecology, Evolution, and the Nature of the Phenotype

Abstract: Dr. Adams' research is motivated by two long-standing questions concerning how ecological and evolutionary forces generate and maintain species diversity and phenotypic diversity. His work is both empirical and theoretical, integrating computational, mathematical, statistical and quantitative morphological methods to examine ecological hypotheses from an evolutionary perspective. His empirical research examines species interactions, community organization, and the evolution of phenotypic diversity, largely in Plethodon salamanders. These studies examine patterns of phenotypic variation within and among populations, to understand the ecological and evolutionary processes responsible for generating phenotypic diversity and regulating community structure. In his theoretical research, Dr. Adams develops new analytical tools for examining patterns of phenotypic variation, with current emphasis on a general analytical framework for assessing multivariate patterns of phenotypic change. This approach provides a unifying perspective from which researchers studying phenotypic plasticity, biomechanics, ontogeny, and quantitative genetics can explore the mechanistic processes underlying patterns of phenotypic change.

Ahmed Awad
Postdoctoral Staff
Iowa State University
Identification of an efficient approach to identifying effective antisense nucleic acids to target an mRNA molecule

ABSTRACT

Ahmed M. Awad, Torry Cong, Xiaoling Song, Long Qu, and Marit Nilsen-Hamilton Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011

The efficient identification of sequences in mRNAs that are effective targets for antisense oligodeoxynucleotides (ODNs) in vivo is an important goal. In vivo tests of antisense ODNs are time-consuming and expensive. Therefore we investigated computational and in vitro methods to identify antisense sequences that are effective in vivo. The SIP24/lcn2 mRNA sequence was used for this analysis. Several computational programs were tested and found to predict a range of potential antisense sequences. As an in vitro test of antisense activity, we determined the ability of the SIP24/lcn2 mRNA to hybridize with a microarray of 20mers complementary to sequential sequences in the mRNA. Hybridization was performed under nondenaturing conditions and at salt concentrations that resemble those in the cell. From these preliminary computational and in vitro studies, seven ODNs were chosen to be tested in vivo as antisense ODNs to target the SIP24/lcn2 mRNA of the mouse HC11 cell line. Each of the three microarray-chosen ODNs decreased the SIP24 protein level as demonstrated by Western blot analysis. They also caused a reduction in the SIP24/lcn2 mRNA level as determined by real-time RT-PCR. These results show that the in vitro microarray analysis accurately predicted appropriate sequences for effective antisense ODNs in vivo, whereas the predictions from computational analyses were more varied.


Haitao Cheng
Major Professor: Dr. Jernigan
Iowa State University
Prediction of protein secondary structure by mining structural fragment database

ABSTRACT

A new method for predicting protein secondary structure from amino acid sequence has been developed. The method is based on multiple sequence alignment of the query sequence with all other sequences with known structure from the Protein Data Bank (PDB) by using BLAST. The fragments of the alignments belonging to proteins from the PDB are then used for further analysis. We have studied various schemes of assigning weights for matching segments and calculated normalized scores to predict one of the three secondary structures: α-helix, β-sheet, or coil. We applied several artificial intelligence techniques, namely decision trees (DT), neural networks (NN) and support vector machines (SVM), to improve the accuracy of predictions and found that SVM gave the best performance. Preliminary data show that combining the fragment mining approach with GOR V (Kloczkowski et al, Proteins 49 (2002) 154–166) for regions of low sequence similarity improves the prediction accuracy.


Pan Du
Major Professors: Dr. Julie Dickerson and Dr. Eve Wurtele
Iowa State University
Multi-scale Genetic Network Inference based on Time Series Gene Expression Profiles

ABSTRACT

Gene expression data are noisy and large scale. Clustering is widely used to group genes with similar patterns. The cluster centers can be used to infer the genetic networks among these clusters. This work introduces the Multi-scale Fuzzy K-means clustering algorithm to uncover groups of coregulated genes and capture the networks at different levels of detail.

Time series expression profiles provide dynamic information for inferring gene regulatory relationships. Large scale network inference, identifying the transient interactions and feedback loops as well as differentiating direct and indirect interactions are among the major challenges of genetic network inference. Pairwise time correlation can detect linear interactions between genes. Estimates of the time delay and direction of causality in the inferred network can also be made. Partial correlation and d-separation theory are combined to differentiate the direct and indirect interactions and identify feedback loops. Gene expression regulation can happen in specific time periods and conditions instead of across the whole expression profile. Short-time correlation can capture transient interactions.
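The pairwise time correlation described above can be illustrated with a short sketch. It is not the authors' algorithm: the function names, the synthetic sine-wave profiles and the lag window are all assumptions made for the example. It scans a small range of time shifts and reports the shift at which two profiles are most strongly correlated, which gives a rough estimate of delay and direction.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_lag(x, y, max_lag):
    """Return (lag, r) with the largest |r|.

    A positive lag means y follows x by that many time points.
    """
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        r = pearson(a, b)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Synthetic profiles (assumed data): the target repeats the regulator 2 steps later.
regulator = [math.sin(2 * math.pi * t / 12) for t in range(24)]
target = [math.sin(2 * math.pi * (t - 2) / 12) for t in range(24)]

lag, r = best_lag(regulator, target, max_lag=3)
print(lag, round(r, 3))
```

The lag window is kept shorter than half the period of the synthetic signal because, for periodic profiles, a shift of half a period gives an equally strong negative correlation and the delay estimate becomes ambiguous.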

The network discovery algorithm was validated using yeast cell cycle data. The algorithm successfully identified the yeast cell cycle development stages, cell cycle and negative feedback loops, and indicated how the networks change dynamically over time. The inferred network reflects most interactions previously identified by genome-wide location analysis and matches extant literature results. The inferred network provides more detailed information about genes (or clusters) and the interactions among them. Interesting genes, clusters and interactions were identified, which match the literature and the gene ontology information and provide hypotheses for further studies.


Jo Etzel
Major Professors: Dr. Dickerson and Dr. Adolphs
Iowa State University
A Program to Accurately Identify Peaks in Respiration and EKG Signals for use in Psychophysiological Research

ABSTRACT

Statistical techniques to describe physiological responses in an experiment rely upon correct identification of each breath and heart beat. Accurate measurements are especially important when frequency analyses are performed or short recordings are used. A new program, called "puka", accurately identifies normal beats in EKG signals and the phases of each breath in recordings produced using single strain gauge chest belts. Portions of the well-validated WFDB (WaveForm DataBase) Software Package (www.physionet.org) are used in puka to accurately obtain the time of each normal R wave. A new method of identifying the breaths and pauses in strain gauge belt recordings was developed. This technique locates the points of maximum inspiration and expiration for each breath as well as post-inspiratory and post-expiratory pauses. Analyses to validate the measurements produced by puka indicate that the program correctly locates normal R waves in EKG signals and breaths in strain gauge belt recordings. The program was tested using artificial EKG data, paced respiration recordings from healthy young subjects, and recordings from neurological patients. Puka is flexible and easy to use yet produces accurate timing measurements of breath components and heart beats, which allow more complex and complete statistical analyses. The source code and documentation are freely available on the PhysioNet archive.


Yaping Feng
Major Professor: Dr. Jernigan
Iowa State University

Four-Body Contact Potentials derived from Two Protein Databases to Discriminate Native structures from Decoys

ABSTRACT

We have developed a new scheme to derive four-body contact potentials, primarily with the intention of devising a way to consider protein interactions as being more cooperative, but also with the intention of distinguishing between surface and buried residues. The four-body contact potentials can discriminate native structures from partially unfolded or deliberately misfolded structures. We used two protein datasets, one with resolution ≤1.5 Å and the other with resolution ≤2.5 Å, to separately derive two sets of four-body contact potentials. Surprisingly, the latter shows a better Z-score in fold recognition.
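The abstract does not define its Z-score. A common convention in decoy discrimination, assumed here, is the number of standard deviations by which the native structure's energy lies below the mean energy of the decoys. The energies in this sketch are invented for illustration and the function name is hypothetical.

```python
import statistics

def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energy distribution.

    Assumed convention: population standard deviation of the decoys;
    more negative means better separation of native from decoys.
    """
    mean = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    return (native_energy - mean) / spread

# Invented energies (arbitrary units), not taken from the study.
decoys = [-4.0, -6.0, -5.0, -3.0, -7.0]
z = native_z_score(-10.0, decoys)
print(round(z, 3))
```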


Xun Gu, Associate Professor
Department of Genetics, Development & Cell Biology
Iowa State University

Title: Expression Divergence after Gene Duplication or Speciation

Abstract: Regulatory divergence between duplicated genes in the same species, or between orthologous tissues in different species, is one of the fundamental issues in comparative genomics. In this talk, I will first report our study testing the expression pattern after gene duplication. Using yeast multi-microarrays, we showed a 10-fold increase in the initial rate of expression evolution after gene duplications. Relative rate tests suggest that the expression of duplicate genes tends to evolve asymmetrically, that is, the expression of one copy evolves rapidly, whereas the other one largely maintains the ancestral expression profile. Secondly, I will show that developmental constraints of tissues may be the primary factor determining the extent of expression divergence between human and mouse orthologous tissues. The same tissue-specific factor may also affect the evolutionary rate of expressed proteins, and the capability of duplicated genes to become tissue-specific. Finally, I will briefly discuss the case of human-chimpanzee brain expression differences to show the role of natural selection in a small number of genes that may have the potential to change the regulatory network dramatically.


Ajith Gunaratne
Math Department
Iowa State University

3D Molecular Dynamics Simulation with Distance-Constrained Penalty Terms

ABSTRACT

Molecular dynamics simulations are important tools for understanding the physical basis of the structure and function of biological macromolecules. The molecular dynamics approach has been very successful in revealing structural and dynamical characteristics of proteins. In particular, the motions of intermolecular bond vibrations are typically the highest frequencies in the proteins. The fastest components of the potential energy field impose severe restrictions on stability, which can limit the speed of the computational method. One possibility for treating this problem is to replace the fastest components with algebraic constraints when they are not that important. The penalty function method is a widely known way of transforming a nonlinear constrained optimization problem into a sequence of unconstrained optimization problems by adding a penalty function to the original objective function. The minima in the design parameter space depend on the value of the scalar penalty parameter. In the penalty method, the parameter value is gradually changed until the penalty function value approaches infinity when the constraints are violated and zero otherwise. The final local minima are then the minima for the original constrained problem. The penalty function method for integrating the Cartesian equations of motion of a protein with bond-length constraints has been tested and analyzed with bovine pancreatic trypsin inhibitor (BPTI). BPTI is used because of its small size (58 amino acid residues), high stability and accurately determined X-ray structure. It consists of 454 atoms including four strongly bound water molecules. We find that averages and fluctuations of many properties are not significantly modified by the constraint. The Shake and penalty function methods have strongly positively correlated system characteristics. This makes it possible to obtain a threefold increase in the computational efficiency of macromolecular simulations by the application of bond-length constraints.
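The general idea of the penalty function method can be sketched on a toy problem. This is not the molecular dynamics scheme of the abstract: the objective (x - 1)^2 + (y - 2)^2, the linear constraint x + y = 1, the quadratic penalty and the schedule of penalty parameters are all assumptions chosen so that the behaviour is easy to check by hand. Each stage minimises the penalised function by gradient descent, starting from the previous stage's result.

```python
def penalized_minimum(mu, start, steps=200):
    """Minimise (x-1)^2 + (y-2)^2 + mu*(x+y-1)^2 by gradient descent.

    The step size 1/(2 + 4*mu) is the inverse of the largest Hessian
    eigenvalue of this quadratic, which keeps the iteration stable.
    """
    x, y = start
    h = 1.0 / (2.0 + 4.0 * mu)
    for _ in range(steps):
        c = x + y - 1.0                      # constraint violation
        gx = 2.0 * (x - 1.0) + 2.0 * mu * c  # d/dx of penalised objective
        gy = 2.0 * (y - 2.0) + 2.0 * mu * c  # d/dy of penalised objective
        x -= h * gx
        y -= h * gy
    return x, y

point = (0.0, 0.0)
violations = []
for mu in (1.0, 10.0, 100.0, 1000.0, 10000.0):
    point = penalized_minimum(mu, point)     # warm start from previous stage
    violations.append(abs(point[0] + point[1] - 1.0))

print(point, violations)
```

As the penalty parameter grows, the constraint violation shrinks and the unconstrained minimiser approaches (0, 1), the solution of the constrained toy problem.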


Brent Kronmiller
Major Professors: Dr. Wise and Dr. Gu
Iowa State University

Sequencing a 1.3 Mb contig spanning the rf1 fertility restorer locus as a prototype to assess complex-genome coverage strategies

ABSTRACT

Brent Kronmiller 1,3, Karin Werner 2,3 and Roger Wise 1,2,3
1) Bioinformatics and Computational Biology; 2) USDA-ARS Corn Insects and Crop Genetics Research; 3) Plant Pathology, Iowa State University, Ames, Iowa, USA, 50011

In T-cytoplasm maize, cytoplasmic male sterility (CMS) is attributed to the presence of the unique mitochondrial gene, T-urf13. Full suppression of T-urf13-mediated CMS is directed by the combined action of dominant alleles of the nuclear (fertility restorer) genes, rf1 and rf2a. To facilitate a candidate approach towards identification of the rf1 gene, three B73 BAC libraries were used to create a physical map of 794 clones from the centromeric region of chromosome 3 anchored to the rf1 locus. A minimum-tiling path of 14 contiguous BACs covering 1.3 megabases was shotgun sequenced, assembled and finished to completion for annotation and display in the GBrowse viewer. Eighty-seven percent has been identified as repetitive sequences, with most transposable elements found in large nested clusters spanning up to 300 kb with insertion chronologies of –0.19 to –8.50 million years. GeneSeqer, Fgenesh, and GeneMark.hmm were used to predict consensus locations and structures for 53 genes. Thirty-seven of these are positioned in gene clusters with as many as 8 members. Two hundred fifty-four GSS assemblies (including MAGIs, TIGR’s AZM and PlantGDB’s GSS) aligned to the 1.3 Mb contig, 36 of which aligned to predicted genes. Two hundred eighteen GSS assemblies aligned to regions not predicted as genes, revealing that only 15% of GSS contigs align to genes in this region. Seventeen predicted genes did not correspond to any GSS assembly, indicating that, at least in the centromeric region of chromosome 3, finished sequence can provide a significant number of previously undescribed gene predictions.

Research funded by USDA-NRI 2002-35301-12064.


Michael Lawrence
Major Professor: Dr. Cook
Iowa State University

GeneGobi: Software for the exploratory analysis of biochemical systems

ABSTRACT

GeneGobi is a software tool for analyzing multivariate data related to biochemical systems. It provides a biologist-friendly interface to GGobi, a mature tool for exploratory multivariate visualization and analysis. It allows plotting and touring of data in high dimensions. GeneGobi facilitates the biologist's task of managing data describing the transcriptome, proteome, and metabolome, and supports integration of such data through biochemical networks. It also features pluggable analysis routines, such as hierarchical clustering and pattern finding, written in the statistical language R, upon which GeneGobi is based. The user may focus the analysis on particular experiments and stored lists of interesting transcripts, proteins, and/or metabolites.


Myron Peto
Major Professor: Dr. Jernigan
Iowa State University

The application of the transfer matrix method to compact lattice conformations - cyclic conformations of larger sizes, non-cyclic conformations, and irregular conformations

ABSTRACT

Enumerating all protein conformations in a compact lattice is a biologically important yet computationally challenging process. The challenge arises when using traditional methods (especially with larger conformations) because of attrition: many conformations end up terminating at dead ends. We have developed a superior method, the transfer matrix method, which is far less computationally expensive. Our method overcomes the problem of attrition by enumerating all possible conformations in a row-by-row process. In previous work we applied the transfer matrix method to 2-D square and 3-D rectangular lattices and analyzed the effect of HP model potentials on the average statistical properties based on all cyclic conformations on the 4 x 10 square lattice. Here, we apply our method to all cyclic conformations on the 5 x 10 square lattice, show ways of including non-cyclic conformations, and expand our method to more irregular conformations that contain sequential configurations of various m x n square and m x n x l cubic lattices. All of the above directions would be steps towards more closely modeling real proteins in the hopes of further understanding the energetics of the protein folding problem.
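The attrition problem mentioned above can be seen in a brute-force baseline. The sketch below is not the transfer matrix method; it is the kind of naive chain-growth enumeration that the method is meant to replace, with a hypothetical function name. It grows self-avoiding walks site by site on a small square lattice, counts the walks that fill every site (compact conformations), and counts how many partial walks get stuck before the lattice is full.

```python
def enumerate_compact_walks(rows, cols):
    """Count compact self-avoiding walks on a rows x cols square lattice.

    Returns (complete, dead_ends). Walks are directed, so each chain
    conformation is counted once from each of its two ends.
    """
    total_sites = rows * cols
    visited = [[False] * cols for _ in range(rows)]
    counts = {"complete": 0, "dead_ends": 0}

    def extend(r, c, length):
        if length == total_sites:
            counts["complete"] += 1
            return
        moved = False
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                moved = True
                visited[nr][nc] = True
                extend(nr, nc, length + 1)
                visited[nr][nc] = False
        if not moved:
            # Partial walk with no free neighbour: lost to attrition.
            counts["dead_ends"] += 1

    for r in range(rows):
        for c in range(cols):
            visited[r][c] = True
            extend(r, c, 1)
            visited[r][c] = False
    return counts["complete"], counts["dead_ends"]

complete_3x3, dead_ends_3x3 = enumerate_compact_walks(3, 3)
print(complete_3x3, dead_ends_3x3)
```

Even on a 3 x 3 lattice some partial walks are abandoned, and the wasted work grows quickly with lattice size, which is the motivation for a row-by-row approach.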


Raul Piaggio-Talice
Major Professors: Dr. Eulenstein and Dr. Dobbs
Iowa State University

Evolutionary History Model Selection via Improved Phylogenetic Compression

ABSTRACT

A central problem in phylogenetics is to select a hypothesis that best describes the evolutionary history leading to the sequences in a given alignment. Such a hypothesis can consist of a single tree for the whole alignment or multiple trees for different sections of it (as in the case of horizontal transfer or gene duplication). Ane and Sanderson recently proposed a method of approaching this model selection problem by using the minimum description length principle from algorithmic information theory. In this approach, the alignment is described by a two-part encoding composed of a code for a candidate hypothesis plus a code to recover the alignment given that hypothesis. The hypothesis assumed to be correct will be the one that minimizes the length of such an encoding. Note that a minimum length encoding is also the best possible compression of the sequence alignment. We present a modification to the Ane and Sanderson method that results in provably shorter codes, closer to the minimum description length. The improvement is achieved by using ranking (and unranking, if the code is to be uncompressed) techniques. This is applied to code the hypothesis (tree or trees) as well as to code the sequence alignment given the hypothesis. The shorter code provides a better compression mechanism and is expected to sharpen the hypothesis decision criterion. The new method still produces (efficiently) computable codes despite the fact that finding the hypothesis that minimizes a two-part code is akin to computing the Kolmogorov complexity of the alignment, known to be uncomputable [LV93]. When tested on real-world datasets from [SDEL03], the new method shows hypothesis distinction capabilities similar to those of the original version, while the compression improved on average by 26.18%, with gains that range from 8.79% to 49.02%.


Jeff Sander
Major Professor: Dr. Dobbs
Iowa State University

Designing C2H2 Zinc Finger Proteins to Target Specific DNA Sequences

ABSTRACT

Zinc fingers, the most abundant DNA binding motifs in eukaryotes, provide one of the simplest and best understood protein-DNA binding mechanisms. They promise to become valuable tools for genome modification and clinical intervention in disease because they can be used to target proteins, including nucleases and transcription factors, to virtually any desired location in any genome. Consisting of multiple modular and interchangeable nucleic acid binding domains, C2H2 zinc finger proteins provide an excellent framework for engineering new sequence-specific DNA binding proteins. Using known C2H2 zinc finger binding specificities, we have developed a program to locate candidate sequences for zinc finger binding within a given DNA sequence. In ongoing work, we are designing and experimentally testing additional zinc finger binding modules by exploiting knowledge-based approaches that incorporate information such as binding affinities, module position dependence, and DNA sequence/structural characteristics. Our short-term goal is to develop tools that can be used both to identify optimal DNA sites for targeting any desired genomic region and to predict optimal zinc finger protein sequences that recognize these sites with high affinity and specificity. Insight gained from these studies should be valuable in deciphering the "protein-DNA recognition code" that mediates gene regulation in cells.
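The site-location step described above might look like the following sketch. It is not the authors' program. It assumes the usual simplification that each finger module recognises one 3-bp subsite, so a three-finger protein needs a 9-bp window made of three recognised triplets. The triplet set and the function name are hypothetical placeholders, and only the given strand is scanned.

```python
# Hypothetical placeholder: triplets for which a binding module is assumed to exist.
AVAILABLE_TRIPLETS = {"GAA", "GCG", "GGG", "GTG"}

def find_candidate_sites(dna, triplets, fingers=3):
    """Return (start, window) for every window whose triplets are all available.

    The window is 3 * fingers bases long and is split into consecutive,
    non-overlapping triplets starting at the window's first base.
    """
    dna = dna.upper()
    width = 3 * fingers
    sites = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        parts = [window[i:i + 3] for i in range(0, width, 3)]
        if all(part in triplets for part in parts):
            sites.append((start, window))
    return sites

sites = find_candidate_sites("TTGCGGGGGAACC", AVAILABLE_TRIPLETS)
print(sites)
```

A fuller tool would also scan the reverse complement and rank sites using the affinity and position-dependence information mentioned in the abstract; those steps are omitted here.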


Michael Terribilini
Major Professors: Dr. Dobbs and Dr. Honavar
Iowa State University

Computational Prediction of RNA-Binding Sites in Proteins

ABSTRACT

Protein-RNA interactions are vitally important to a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. The ability to reliably predict which residues of a protein directly contribute to RNA binding – without the requirement for structural information regarding either the protein or the protein-RNA complex – would significantly enhance our understanding of how proteins recognize RNA and potentially generate new strategies for clinical intervention in both genetic and infectious diseases. We have developed a machine learning approach for predicting which amino acids of an RNA-binding protein are likely to be involved in protein-RNA interactions, using only the protein sequence as input. Interfaces from known protein-RNA complexes in the PDB were extracted to generate a non-redundant set of 109 protein chains from which a total of 3581 “interface” and 21,537 “non-interface” residues were obtained. Using this dataset, a Naïve Bayes classifier was trained to predict which residues in a given RNA-binding protein are located in the protein-RNA interface. The classifier identifies interface residues with 85% overall accuracy, correlation coefficient of 0.35, specificity for interface residues of 49%, and sensitivity for interface residues of 40%. Classifiers can be calibrated to increase the specificity of interface residue prediction for specific functional classes of RNA-binding proteins. To our knowledge, this simple approach provides the best available sequence-based prediction of protein-RNA interaction sites. It should be valuable both for hypothesis-driven investigations of specific RNA-binding proteins and for discovery-based large scale functional genomics efforts.
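The performance figures quoted above all derive from a confusion matrix of predicted versus actual interface residues. The abstract does not state its exact formulas, so this sketch rests on assumptions: the correlation coefficient is taken to be the Matthews correlation coefficient, and precision and recall are reported separately because "specificity" is used with more than one meaning in this literature. The counts are invented and do not come from the study.

```python
import math

def interface_metrics(tp, fp, tn, fn):
    """Accuracy, recall (sensitivity), precision and Matthews correlation."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # fraction of true interface residues found
    precision = tp / (tp + fp)     # fraction of predicted interface residues that are real
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "mcc": mcc}

# Invented counts for illustration only.
metrics = interface_metrics(tp=40, fp=40, tn=860, fn=60)
print(metrics)
```

With strongly imbalanced classes, as here where non-interface residues dominate, overall accuracy can be high even when recall is modest, which is why the correlation coefficient is reported alongside it.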


Peter Vedell
Major Professors: Dr. Wu and Dr. Jernigan
Iowa State University

Multiple-shooting methods for boundary-value approaches to molecular dynamics simulation

ABSTRACT

Biomolecules can transition from one conformation to another. Molecular transitioning between different conformations is an important component of many normal biological processes. It can also have a detrimental effect in some cases. All-atom molecular dynamics simulation (AAMDS) using an empirical force field can be an informative way to study molecular transitioning. This type of simulation can be defined mathematically as an initial value problem (IVP) for ordinary differential equations. IVP-based AAMDS can be applied to study local motions near a stable conformation of a molecule, but it can also be used as a tool for the study of transitioning between different conformations ([Kim2003], [Zag2001]). A boundary value problem (BVP) can be defined for molecular transitioning when beginning and ending structures are known. A multiple-shooting approach for BVP-based AAMDS is introduced and applied to the study of protein folding. A force field has been developed in MATLAB using an empirical energy function based on MOIL9/AMBER99 form and parameterization. Algorithms for generating initial guesses based on elastic network interpolation ([Kim2002]) are outlined. Results from simulation of conformational transitions of alanine dipeptide are presented. The convergence behavior of the algorithm is demonstrated. The feasibility of the approach for larger systems is investigated and approaches that may address some of the computational challenges are considered.
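The difference between the initial value and boundary value formulations can be illustrated with single shooting, the simpler relative of the multiple-shooting approach described above. Everything in this sketch is an assumption made for illustration: the system is a one-dimensional harmonic oscillator x'' = -x rather than a molecule, the integrator is velocity Verlet, and the unknown initial velocity is found with secant iterations so that the trajectory starting at x(0) = 0 ends at a prescribed x(T).

```python
import math

def integrate(v0, t_end=math.pi / 2, steps=1000):
    """Velocity Verlet for x'' = -x with x(0) = 0, x'(0) = v0; returns x(t_end)."""
    dt = t_end / steps
    x, v = 0.0, v0
    a = -x
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -x
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x

def shoot(target, v_a=0.0, v_b=2.0, tol=1e-10, max_iter=20):
    """Secant search for the initial velocity whose trajectory ends at target."""
    r_a = integrate(v_a) - target
    r_b = integrate(v_b) - target
    for _ in range(max_iter):
        if abs(r_b) < tol or r_b == r_a:
            break
        # Secant update; the right-hand side uses the old values throughout.
        v_a, v_b, r_a = v_b, v_b - r_b * (v_b - v_a) / (r_b - r_a), r_b
        r_b = integrate(v_b) - target
    return v_b

v0 = shoot(target=1.0)
print(v0)
```

For this toy problem the exact answer is an initial velocity of 1, since x(t) = sin(t) reaches 1 at t = pi/2. Multiple shooting splits the time interval into segments and solves for matching conditions at the joints, which is generally better conditioned for long or unstable trajectories.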


Yan-fang Wang
Postdoctoral Staff, Animal Science Dept.
Iowa State University

Differential transcript response to infection to host-specific and host-generalist Salmonella enterica serotypes in pigs

ABSTRACT

Wang, Y.F.#, J. Uthe##, SMB Bearson##, L. Qu###, D. Nettleton###, C.K. Tuggle#
#Department of Animal Science, ISU, ##National Animal Disease Center, USDA-ARS, Ames, IA, ###Department of Statistics, ISU

Salmonellosis is prevalent worldwide and most Salmonella serotypes have a broad host range. The classic Salmonella serotypes of pigs are S. enterica serotype Choleraesuis (SC) and S. enterica serotype Typhimurium (ST), the latter of which can also infect humans. The former can cause septicemia, enterocolitis, pneumonia and/or hepatitis whereas the latter only results in mild enterocolitis. Understanding the porcine systemic transcriptional response to SC and ST has significance for both animal disease resistance and human food safety. Mesenteric lymph node tissue (MLN; n=3) was collected from uninfected controls as well as from pigs infected for 48 hours or for 21 days with each S. enterica serotype. The porcine Affymetrix microarray chip, which contains 23,937 probe sets, was used to hybridize with MLN RNA. At the acute phase, 48 h post-infection with SC, 1014 genes showed differential expression (p<0.01; false discovery rate (FDR) = 5%; fold change > 2), while at the chronic phase (21 d), 163 genes were differentially expressed (p<0.01; FDR = 13%; fold change > 2). At 48 h post-infection with ST, 126 genes showed differential expression (p<0.01; FDR = 21.9%; fold change > 2); at 21 d post-infection, 133 genes were differentially expressed (p<0.01; FDR = 16%). The host responses to the two Salmonella infections at the same time points were compared, and results showed that at 48 h post-infection, 33 genes were differentially expressed in both the SC and ST studies, while only 6 genes showed differential expression at 21 d post-infection (both p<0.01). Functional classification revealed that these differentially expressed genes include DNA, RNA and protein binding genes, immune response genes, ubiquitin-proteasome pathway genes, apoptosis genes and some hypothetical protein genes. Some important genes will be selected to confirm their expression profiles by Q-PCR. These data are the first step in helping us understand the mechanism of the Salmonella-host specific response in pigs.


Zhijun Wu
Faculty Member, Mathematics
Iowa State University

Multiple-shooting methods for boundary-value approaches to molecular dynamics simulation

ABSTRACT

Biomolecules can transition from one conformation to another. Molecular transitioning between different conformations is an important component of many normal biological processes. It can also have a detrimental effect in some cases. All-atom molecular dynamics simulation (AAMDS) using an empirical force field can be an informative way to study molecular transitioning. This type of simulation can be defined mathematically as an initial value problem (IVP) for ordinary differential equations. IVP-based AAMDS can be applied to study local motions near a stable conformation of a molecule, but it can also be used as a tool for the study of transitioning between different conformations ([Kim2003], [Zag2001]). A boundary value problem (BVP) can be defined for molecular transitioning when beginning and ending structures are known. A multiple-shooting approach for BVP-based AAMDS is introduced and applied to the study of protein folding. A force field has been developed in MATLAB using an empirical energy function based on MOIL9/AMBER99 form and parameterization. Algorithms for generating initial guesses based on elastic network interpolation ([Kim2002]) are outlined. Results from simulation of conformational transitions of alanine dipeptide are presented. The convergence behavior of the algorithm is demonstrated. The feasibility of the approach for larger systems is investigated and approaches that may address some of the computational challenges are considered.


Lei Yang
Major Professors: Dr. Jernigan and Dr. Wu
Iowa State University

Motion Analysis of HIV-1 Protease by Anisotropic Network Model and Principal Component Analysis

ABSTRACT

Protein functions are to a great extent related to the conformational changes of protein structures. The analysis and characterization of the collective motions of proteins are very important for predicting and understanding protein functions. The HIV-1 protease is an ideal test system to study the relation between protein functions and structures due to the abundance of available crystallographic structures and its relatively small size. In this study, the normal modes of HIV-1 protease are obtained by the Anisotropic Network Model (ANM). Separately, the principal deformation modes are extracted by Principal Component Analysis (PCA). The two motion spaces are then compared by computing the overlap between each ANM mode and each principal component. A significantly high overlap between the third slowest mode and the first principal component is found, indicating that most functions of this protein are realized by a specific mode. Further study of the principal mode may help us understand how this protein functions in detail.
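The abstract does not give its overlap formula. A widely used definition, assumed here, is the absolute value of the cosine between two displacement vectors, so that 1 means the two directions of motion coincide and 0 means they are orthogonal. The vectors below are tiny invented examples, not ANM modes or principal components of HIV-1 protease, and the function names are hypothetical.

```python
import math

def overlap(u, v):
    """Absolute cosine of the angle between two displacement vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (norm_u * norm_v)

def overlap_matrix(modes, components):
    """Row i, column j holds the overlap of mode i with component j."""
    return [[overlap(m, c) for c in components] for m in modes]

# Invented 3-dimensional vectors standing in for 3N-dimensional modes.
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
components = [[0.0, 2.0, 0.0], [1.0, 1.0, 0.0]]

table = overlap_matrix(modes, components)
for row in table:
    print([round(value, 3) for value in row])
```

Taking the absolute value reflects the fact that the sign of a normal mode or principal component is arbitrary.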


Peter Zaback
Major Professors: Dr. Honavar and Dr. Dobbs
Iowa State University

Improved support vector machine prediction of protein structural features with a substitution matrix based kernel

ABSTRACT

Peter Zaback, Josh Williams, Feihong Wu, Vasant Honavar, and Drena Dobbs
Bioinformatics and Computational Biology Graduate Program, Laurence H Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA, 50011

Technological advances continue to widen the chasm between available protein sequence data and associated structural information. Because tertiary structure ultimately determines a protein’s function, it is important to develop computational methods capable of accurately predicting structural features from sequence. Recently, support vector machines (SVMs) have been applied to a variety of such classification tasks with success comparable to other state-of-the-art methods. At the heart of the SVM training method is the kernel function, which calculates a 'similarity score' between two instances. SVM training is most successful when the kernel returns a high score for a pair of instances that are members of the same class, and a low score when they are not. Past approaches have frequently used what we call a sequence identity kernel (SIK), which scores a pair of instances based only on the number of positions at which they share the same residue. This scoring method therefore ignores the varying degrees of physicochemical similarity between amino acids, information that is captured in a wide variety of substitution matrices. We demonstrate that use of a substitution matrix based kernel (SMK) significantly improves accuracy and correlation coefficient in the prediction of residue solvent accessibility when compared with the SIK. Current work is directed at testing whether this approach will similarly improve predictions for other protein structural or functional features (e.g. secondary structural elements, catalytic sites) through the use of appropriate substitution matrices for specific tasks.
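The difference between the two scoring schemes can be sketched on short sequence windows. This is an illustration under stated assumptions, not the authors' implementation: the score table below holds toy values over a four-letter alphabet rather than a published substitution matrix, the SMK is taken to be a plain position-wise sum of substitution scores, and the function names are hypothetical. A raw sum of this kind is not guaranteed to be a valid (positive semidefinite) kernel, and the abstract does not say how that is handled.

```python
# Toy symmetric substitution scores (illustrative values only, not a published matrix).
TOY_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6, ("E", "E"): 5,
    ("L", "I"): 2, ("D", "E"): 2,
    ("L", "D"): -4, ("L", "E"): -3, ("I", "D"): -3, ("I", "E"): -3,
}

def pair_score(x, y):
    # Look up a residue pair in either order; raises KeyError for unknown residues.
    if (x, y) in TOY_SCORES:
        return TOY_SCORES[(x, y)]
    return TOY_SCORES[(y, x)]

def check_windows(a, b):
    if len(a) != len(b):
        raise ValueError("windows must have the same length")

def sequence_identity_kernel(a, b):
    # SIK: number of positions at which the two windows share the same residue.
    check_windows(a, b)
    return sum(1 for x, y in zip(a, b) if x == y)

def substitution_matrix_kernel(a, b):
    # SMK (assumed form): sum of substitution scores over aligned positions.
    check_windows(a, b)
    return sum(pair_score(x, y) for x, y in zip(a, b))

if __name__ == "__main__":
    for other in ("LDL", "IEI", "DLD"):
        print(other,
              sequence_identity_kernel("LDL", other),
              substitution_matrix_kernel("LDL", other))
```

In this toy setup the SIK gives the same score of zero to a window of similar residues ("IEI") and a window of dissimilar residues ("DLD"), while the SMK separates them.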


Dr. Roger Wise
Plant Pathology Department
Title: "Gene-specific regulation of innate immunity to plant disease"

Abstract: Gene-specific regulation of innate immunity to barley powdery mildew

Rico A. Caldo (1), Dan Nettleton (2), and Roger P. Wise (1,3)
(1) Department of Plant Pathology and Center for Plant Responses to Environmental Stresses, (2) Department of Statistics, (3) Corn Insects and Crop Genetics Research, USDA-ARS, Iowa State University, Ames, IA 50011-1020
rpwise@iastate.edu

Active plant defense to microbial attack is highly dependent upon recognition events involving associated gene products in the host and pathogen. To ascertain the global framework of host gene expression during biotrophic pathogen invasion, we utilized the Barley1 GeneChip to analyze the transcriptional regulation of 22,792 host genes throughout various time-course interactions between barley and the powdery mildew fungus, Blumeria graminis f. sp. hordei (Bgh). A total of 432 Barley1 GeneChips, representing 144 replicated barley-powdery mildew interactions, were used to interrogate plants containing allelic variants and mutants of Mla, Rar1, and rom1, a restorer of Mla-specified resistance. Using linear mixed model analyses, three basic patterns of expression were revealed. In the comparison of Mla1, Mla6, and Mla13 genotypes, over 50 genes exhibited highly significant patterns of up-regulation among all incompatible and compatible interactions up to 16 hours after inoculation (hai), coinciding with germination of Bgh conidiospores and formation of appressoria. By contrast, significantly divergent expression was observed from 16 to 32 hai, during membrane-to-membrane contact between fungal haustoria and host epidermal cells, with notable suppression of most transcripts identified as differentially expressed in compatible interactions. In rar1-2 (susceptible) vs. rom1 (resistant) comparisons, two additional patterns of expression were observed in inoculated plants, whereas expression was unaffected in non-inoculated plants. In the first pattern, a set of genes is steadily down-regulated until 16 hai and then diverges significantly. The second pattern is just the opposite: defense-related genes are constitutively up-regulated in rom1 plants. By contrast, transcript levels in rar1-2 plants were initially low but steadily increased from 8 to 24 hai until they reached the same level as in rom1 at the later stages of infection. This suggests at least two contrasting effects for rom1. The first may prevent the down-regulation (or increased turnover) of one set of transcripts; the second may promote constitutive overexpression (or prevent turnover) of a second set of defense-related transcripts.

Copyright© 2005, Iowa State University, all rights reserved.
Please direct corrections, suggestions, and comments to bcb@iastate.edu.