Prasith Baccam
Home Departments: Math and Immunobiology
Major Professor: Dr. Cornette
Title: Genetic Variation and evolution of equine infectious anemia virus rev quasispecies during long term persistent infection
Abstract: Genetic variation has been observed in many viruses. Viruses that carry their genetic information in the form of RNA exhibit high mutation rates because the viral polymerase lacks the proofreading mechanisms commonly found in DNA polymerase complexes. The combination of high mutation rates, small genome size, and high replication rates results in a population of closely related viral genotypes, commonly referred to as a quasispecies. A consequence of this genetic variation is possible variation in the viral phenotype of the quasispecies population. Furthermore, changes in viral phenotype may be a biologically important factor in disease progression. Here, we undertook a longitudinal study to describe the quasispecies nature and genetic variation of a lentivirus regulatory protein, Rev, during the course of disease in a pony experimentally infected with equine infectious anemia virus (EIAV). This study examined the rev variants that comprised the quasispecies population in sequential serum samples. Over the course of disease, novel rev variants continually appeared, and some variants grew in frequency to predominate at certain time points. Phylogenetic and cluster analyses suggested that the Rev quasispecies comprised two distinct populations that co-existed during infection. These two populations differed in their pattern of evolution: one accumulated changes in a linear, time-dependent manner, while the other evolved radially from a common variant. Changes in the population size of the two Rev quasispecies coincided with changes in the clinical stages of disease.
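Tracking variants across sequential serum samples comes down to counting which variants occur at each time point. The abstract does not describe this bookkeeping, so the following is only a minimal Python sketch under the assumption that each sample is a list of aligned variant sequences; the helper `summarize_timepoints` and the toy data are hypothetical. For each time point it reports the predominant variant, its frequency, and the variants not seen at any earlier time point.

```python
from collections import Counter

def summarize_timepoints(samples):
    """Return (time, predominant variant, its frequency, newly seen variants) per time point.

    `samples` is a list of (time, [variant sequences]) pairs in chronological order.
    Hypothetical helper for illustration only, not the thesis pipeline.
    """
    seen = set()
    summary = []
    for time, variants in samples:
        counts = Counter(variants)
        top, top_count = counts.most_common(1)[0]
        new = sorted(set(counts) - seen)  # variants first observed at this time point
        seen.update(counts)
        summary.append((time, top, top_count / len(variants), new))
    return summary

# Toy data: short strings stand in for aligned rev sequences.
samples = [
    (0, ["AAG", "AAG", "AGG"]),
    (1, ["AGG", "AGG", "AGT", "AAG"]),
]
for row in summarize_timepoints(samples):
    print(row)
```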
Rev variants from each population were biologically tested, and significant differences in Rev activity were detected between the two populations. Together, these results suggested that the distinct Rev populations differed in selective advantage. A statistical correlation was also found between Rev quasispecies activity and clinical stage: activity differed significantly between different stages of clinical disease. This study suggests that distinct quasispecies populations, which differed in pattern of evolution and niche advantage, co-existed during long-term persistent infection by EIAV. A multi-population quasispecies model challenges our current thinking about viral populations and may have significant biological implications.

Kara Butterworth
Home Department: Botany
Major Professor: Dr. Jonathan Wendel
Co-Major Professor: Dr. Dean Adams
Title: Initiation and early development of fibers in wild and cultivated cotton
Abstract: Gossypium (Malvaceae) is a diverse genus best known for cultivated cotton. It includes about 50 species, 45 diploid and 5 allopolyploid, which occur in arid and semi-arid regions throughout the world (Vollesen, 1987; Fryxell, 1992). The diploids are divided into eight genome groups based on chromosome pairing and size and on fertility between species (Endrizzi, Turcotte, and Kohel, 1985). These groups comprise natural lineages within the genus and correspond to geographic regions: A, B, E, and F in Africa and Arabia; C, G, and K in Australia; and D in the New World. Allopolyploid members are found in the New World and contain the A and D genomes (Wendel, 1995; Wendel et al., 1998; Brubaker, Bourland, and Wendel, 1999; Percival, Wendel, and Stewart, 1999; Cronn et al., 2002). This understanding of the evolutionary history of the genus allows many evolutionary differences in development and morphology to be studied in a phylogenetic context.

Feng Cui
Home Department: Mathematics
Major Professor: Dr. Zhijun Wu
Title: Distance-based NMR Structure determination and refinement
Abstract: X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy are two widely used experimental techniques for protein structure determination. In the Protein Data Bank (PDB), about 85% of deposited protein structures were determined by X-ray crystallography and the rest by NMR spectroscopy. The main difference between the two approaches lies in the state of the protein samples to which they are applied: X-ray crystallography requires the protein to be in the crystalline state, while NMR can study it in solution. Both approaches have their own pros and cons. X-ray crystallography is a mature technique that provides a more objective interpretation of the data. It has quality indicators such as resolution and R-factor for assessing structures, it can be applied to large molecules such as virus particles, and it produces a single model that is easy to visualize and interpret. Raw data processing is highly automated. In contrast, NMR is a relatively new technique that involves a more subjective interpretation of the data and lacks established quality indicators for data and models. In addition, it is limited to relatively small proteins (<20 kDa), produces an ensemble of possible structures rather than one model, and sometimes requires manual data processing. On the other hand, a protein has to form stable crystals for X-ray analysis, which can be time-consuming and is often impossible. The crystalline state is also not a natural, physiological environment for the protein, and X-ray crystallography is less useful for large, flexible, modular proteins. In contrast, the solution state of a protein is closer to biological conditions and relatively easy to prepare.
NMR can provide information on dynamics and identify individual side-chain motions, and it is often used to monitor conformational changes on ligand binding. Both approaches have developed dramatically during the past five years, NMR especially. Advances in data collection, spectral assignment and analysis, structure calculation, and computer graphics have removed the barriers between NMR spectral assignment, structure assessment, and visualization. Many quality indicators, such as bond lengths, bond angles, and NOE violations (inter-atomic distances that lie outside the NOE-derived ranges), have been developed and used to assess NMR structures. Novel refinement schemes aimed at increasing the accuracy of the resulting structures have been proposed and tested. As a result, proteins up to 30 kDa in size (about 260 residues) are now routinely accessible by NMR spectroscopy, at a resolution roughly equivalent to that of 2.5 Å crystal structures.
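The NOE-violation indicator mentioned above amounts to checking each restrained inter-atomic distance against its allowed range. A minimal Python sketch, assuming restraints are given as (atom i, atom j, lower bound, upper bound) in angstroms and coordinates as a dict of 3D points; the atom labels, the data layout, and the 0.5 angstrom reporting tolerance are illustrative assumptions rather than details from the thesis.

```python
import math

def noe_violations(coords, restraints, tolerance=0.5):
    """Return (atom_i, atom_j, distance, violation) for restraints violated by more than `tolerance`.

    coords: dict mapping an atom label to an (x, y, z) tuple in angstroms.
    restraints: list of (atom_i, atom_j, lower, upper) distance bounds in angstroms.
    The violation is how far the model distance lies outside [lower, upper].
    """
    violations = []
    for a, b, lower, upper in restraints:
        d = math.dist(coords[a], coords[b])
        excess = max(lower - d, d - upper, 0.0)
        if excess > tolerance:
            violations.append((a, b, d, excess))
    return violations

# Toy model: H1-H2 is 5.0 A apart (within bounds), H1-H3 is 7.0 A apart (2.0 A too long).
coords = {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 4.0, 0.0), "H3": (0.0, 0.0, 7.0)}
restraints = [("H1", "H2", 1.8, 5.0), ("H1", "H3", 1.8, 5.0)]
print(noe_violations(coords, restraints))
```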
Garrett Dancik
Home Department: Statistics
Major Professor: Dr. Karin Dorman
Title: Exploring host-pathogen relationships through computer simulations of intracellular infection
Abstract: Computer simulations of infectious disease allow for the identification and estimation of important pathogen and immune parameters, the validation of theoretical biological models with experimental data, and the characterization of the host-pathogen interactions that lead to emergent and sometimes counterintuitive behavior. This dissertation describes the development, analysis, and calibration of a computer model of Leishmania major infection; the identification of correlates of escape-mutant success and of optimal escape strategies in a computer model of viral infection; and statistical software to aid in computer model analysis and calibration.

Lixia Diao
Home Department: Computer Science
Major Professor: Dr. David Fernandez-Baca
Title: Consensus properties of supertree construction methods
Abstract: Combining a set of rooted perfect phylogenetic trees on overlapping leaf sets into a single supertree is a fundamental problem in evolutionary biology. In this thesis, we present three supertree techniques (MRP, MRF, and MinCutSupertree) and compare the consensus properties of MRP and MRF against several consensus tree criteria.
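MRP (matrix representation with parsimony) starts by coding each rooted source tree as binary characters, which a parsimony search then analyzes. A minimal Python sketch of that standard coding step, assuming each source tree is supplied as its leaf set plus a list of its non-trivial clades (sets of leaf names); the function name and input format are hypothetical, and the parsimony search itself is not shown. Leaves inside a clade get 1, leaves of the same tree outside it get 0, and leaves absent from that tree get ?.

```python
def mrp_matrix(trees, all_taxa):
    """Code rooted source trees as an MRP character matrix.

    trees: list of (leaf_set, clades) pairs; clades are sets of leaves for the tree's
    non-trivial clusters (the full leaf set and singletons are omitted).
    Returns a dict taxon -> string of '0', '1', '?' characters, one per clade.
    """
    matrix = {taxon: [] for taxon in all_taxa}
    for leaves, clades in trees:
        for clade in clades:
            for taxon in all_taxa:
                if taxon not in leaves:
                    matrix[taxon].append("?")  # taxon missing from this source tree
                elif taxon in clade:
                    matrix[taxon].append("1")
                else:
                    matrix[taxon].append("0")
    return {taxon: "".join(chars) for taxon, chars in matrix.items()}

# Tree 1: ((A,B),C,D); tree 2: ((C,D),E), on overlapping leaf sets.
tree1 = ({"A", "B", "C", "D"}, [{"A", "B"}])
tree2 = ({"C", "D", "E"}, [{"C", "D"}])
for taxon, row in mrp_matrix([tree1, tree2], ["A", "B", "C", "D", "E"]).items():
    print(taxon, row)
```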
Jing Ding
Home Department: Electrical and Computer Engineering
Major Professor: Dr. Dan Berleant
Title: BOW-Based vs. Concept-Based Text Clustering for Functional Analysis of Genes
Abstract: Rapid developments in genomic technologies (e.g., microarrays) have enabled biologists to monitor the expression of hundreds or even thousands of genes simultaneously in a single experiment. Interpreting the biological meaning of the expression patterns still relies largely on biologists' domain knowledge, together with information collected from the literature and/or various public databases. An individual expert's domain knowledge is insufficient for large datasets, and manually collecting and analyzing information from the literature and public databases is tedious and time-consuming, so computer-aided functional analysis tools are highly desirable. We developed GeneNarrator, a text-mining system for functional analysis of microarray data. Given a list of genes, GeneNarrator collects functional information (MEDLINE citations) from PubMed and clusters the citations into functional topics. The genes are then mapped to the topics and clustered into groups based on the similarity of their topic distributions.

Pan Du
Home Department: Electrical and Computer Engineering
Major Professor: Dr. Julie Dickerson
Title: Multi-scale Genetic Network Inference based on Time Series Gene Expression Profiles
Abstract: This work integrates multi-scale clustering and short-time correlation to estimate genetic regulatory networks at different time resolutions and levels of detail. Gene expression data are noisy and large-scale. Clustering is widely used to group genes with similar patterns, and the cluster centers can be used to infer the genetic networks among these clusters.
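Inferring a link between two cluster-center time series can be illustrated with a time-lagged correlation: shift one profile against the other and keep the lag with the strongest correlation, whose sign indicates which profile leads. This is a minimal Python sketch of that general idea under simple assumptions (equally spaced samples, Pearson correlation, a small maximum lag); the function names and toy profiles are hypothetical, and it does not implement the thesis's short-time correlation or partial-correlation steps.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_lag(x, y, max_lag):
    """Return (lag, r) with the largest |r| between x[t] and y[t + lag].

    A positive lag means changes in x precede the matching changes in y.
    """
    n = len(x)
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            r = pearson(x[: n - lag], y[lag:])
        else:
            r = pearson(x[-lag:], y[: n + lag])
        if best is None or abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Toy cluster-center profiles: y repeats x two time points later.
x = [0, 1, 3, 2, 5, 4, 6, 8, 7, 9]
y = [0, 0] + x[:-2]
print(best_lag(x, y, max_lag=3))
```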
This work introduces the Multi-scale Fuzzy K-means clustering algorithm to uncover groups of coregulated genes and capture the networks at different levels of detail. Time series expression profiles provide dynamic information for inferring gene regulatory relationships. Major challenges of genetic network inference include large-scale network inference, identifying transient interactions and feedback loops, and differentiating direct from indirect interactions. Pairwise time correlation can detect linear interactions between genes, and the time delay and direction of causality in the inferred network can also be estimated. Partial correlation and d-separation theory are combined to differentiate direct from indirect interactions and to identify feedback loops. Gene expression regulation can occur in specific time periods and conditions rather than across the whole expression profile; short-time correlation can capture such transient interactions. The network discovery algorithm was validated using yeast cell cycle data. It successfully identified the yeast cell cycle stages, the cell cycle and negative feedback loops, and how the networks change dynamically over time. The inferred network reflects most interactions previously identified by genome-wide location analysis, matches results in the existing literature, and provides more detailed information about genes (or clusters) and the interactions among them. Interesting genes, clusters, and interactions were identified that match the literature and Gene Ontology information and provide hypotheses for further studies.

Tyra Dunn
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Xun Gu
Title: Genomic differences between humans and primates
Abstract: Scientists around the world have long wondered what underlies speciation. Of particular interest is the genetic basis for the separation of humans from other primates (chimpanzees or gorillas).
Humans and chimpanzees are about 99% identical in genomic DNA sequence and are therefore very closely related. Despite this high degree of sequence similarity, humans and other primates show a number of striking phenotypic differences. We hypothesize that sequence changes between humans and other primates have altered developmental programs. Because transcription factors alter the expression of numerous genes, we also hypothesize that changes in the expression or activity of transcription factors are responsible for the phenotypic differences between humans and other primates. Using human chromosome 22 as a model for comparing human and primate DNA, a random selection of noncoding upstream sequences, approximately 1-2 kilobases (kb) long, was sequenced. Focusing on the promoter regions in the sequence data, significant differences were detected between humans and gorillas (p < 0.01) and between gorillas and chimpanzees (p < 0.01), suggesting limited similarity between these species. No significant difference was detected between humans and chimpanzees (p > 0.1). Using this information, transcription factors were analyzed in the human and chimpanzee data to determine whether transcriptional regulation differs between the species. The results indicated no significant difference between humans and chimpanzees at the single-nucleotide level, even though the species differ at the genetic and phenotypic levels. The results also indicated that changes in transcriptional regulation have played a major role in speciation. This research opens new avenues for investigating how many of these differences have functional consequences and the relative contributions of these transcription factors to expression differences.

Tyra Dunn
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Heather Greenlee
Thesis Presentation: June 12, 2007
Title: Characterizing and Influencing Differentiation Of Retinal Progenitor Cells

Scott Emrich
Home Department: Electrical and Computer Engineering
Major Professor: Dr. Srinivas Aluru
Title: Assembly and Analysis of Complex Plant Genomes
Presentation: June 8, 2007
Abstract: Concurrent advances in high-throughput sequencing and assembly have led to the completion of many complex genomes. Even so, these assemblies require substantial computational resources. In this dissertation, we present a massively parallel approach that scales to thousands of processors without duplicating the biological expertise present in conventional assembly software. Additional bioinformatics techniques, including novel repeat detection, were required to accurately assemble the maize genome, and the resulting framework has been strongly supported by maize experimental data. More recently, this framework has been generalized to fruit fly, sorghum, soybean, and environmental sequence assemblies. Questions in plant genome analysis were also addressed. For example, we have discovered an estimated 350 “orphan” maize genes and have shown that approximately 1% of all maize genes were recently duplicated, many into at least two functional copies. LCM-454 sequencing is introduced, and analyses will be presented indicating that this approach can discover rare, potentially tissue-specific transcripts and thousands of SNPs. This dissertation combines high-performance computing, computational biology, and high-throughput sequencing in our ongoing work on the maize genome project. We conclude by describing how these contributions can be useful for any species, including non-model organisms that are unlikely to be fully sequenced.

Joset Etzel
Home Department: Electrical and Computer Engineering
Major Professor: Dr. Julie Dickerson
Title: Algorithms and Procedures to Analyze Physiological Signals in Psychophysiological Research
Abstract: This dissertation presents analytical techniques that allow more information to be derived from psychophysiological data than would otherwise be possible. The techniques include an implemented algorithm for analyzing chest strain-gauge respiration signals and a permutation testing method for evaluating changes over time in physiological signals. These methods are applied to three data sets, each examining physiological correlates of emotional experience. In the first study, physiological correlates of moods induced using music were identified, although respiration entrainment confounds the question of whether the mood or the music caused the observed patterns. The second study examined physiological responses while subjects watched an emotional movie under three conditions; changes related to both the movie scenes and the condition were identified. Finally, the third study evaluates short-term changes in heart rate while viewing words, in terms of the type of word viewed and later word recall.

Fang Fang
Home Department: Statistics
Major Professor: Dr. Karin Dorman
Title: Virus Recombination: Modeling and Data Analysis
Abstract: As a key evolutionary process, recombination shapes the genetic structure of virus populations. The dramatic increase in full-length virus sequences provides an opportunity to study virus recombination through molecular data. Many statistical methods have been developed, and many of them are phylogeny-based. My research focuses on recombination modeling and data analysis. I first apply an existing phylogeny-based method, the Bayesian dual change-point model (DMCP), to investigate which representative data types are best suited to recombination studies. We conclude that consensus data is overall the best data type to represent virus genotypes.
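The consensus data referred to above can be illustrated with a simple majority-rule consensus of aligned sequences. The abstract does not say how the consensus sequences were built, so this Python sketch assumes equal-length aligned sequences, takes the most common character in each column, and writes 'N' (an assumed convention) where the top characters tie.

```python
from collections import Counter

def majority_consensus(aligned, tie_char="N"):
    """Majority-rule consensus of equal-length aligned sequences.

    At each column the most frequent character wins; ties produce `tie_char`.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        ranked = Counter(column).most_common(2)
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append(tie_char)  # no single majority character
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)

print(majority_consensus(["ACGTA", "ACGTT", "ATGTA", "ACCTT"]))  # ACGTN
```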
Using consensus data, we studied recombination in all full-length hepatitis B virus (HBV) sequences and set up a system for applying the DMCP model to large-scale sequence analysis. We discovered that HBV has an extremely high recombination rate. For the first time, we reported circulating recombinant forms of hepatitis B virus, and we identified one potential recombination hotspot. One important goal of studying recombination is to find potential recombination hotspots and to reveal the molecular mechanism of recombination. This goal requires identifying all recombinants generated by different recombination events, which is not trivial when recombinant sequences have similar mosaic structures. Extending the DMCP model, I developed a method to identify the number of recombination events that produced multiple recombinants. I apply this method to several HBV recombinants that have identical mosaic structures and find at least two recombination events.

Jianmin Feng
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Volker Brendel
Title: A new approach for discovering protein motifs
Abstract: Motif recognition is a powerful homology-based sequence analysis tool for classifying new protein sequences into families based on characteristic motifs. Compared to BLAST, these approaches typically have lower false positive rates and can reveal more remotely related family members. However, current motif databases do not cover all the sequences in protein sequence databases. One of the major reasons for this low coverage is that, for many gene families, only a small set of known member sequences is available for constructing protein motifs. I have designed a new algorithm, “mFISHER”, to detect protein motifs from only 2-5 known member sequences by artificially evolving the given sequences under a position-specific PAM evolution model.
Based on my tests on 160 motif families, the overall average recall, or sensitivity (true positives/(true positives + false negatives)), and specificity (true positives/(true positives + false positives), a quantity more often called precision) are 88% and 95%, respectively. Compared with MEME (Multiple EM for Motif Elicitation), mFISHER has a better recall rate, especially when only 2 or 3 members are available. Both approaches have similar specificity. mFISHER is promising for constructing protein motifs when only a few known members are available.

Xiang Gao
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Daniel Voytas
Title: Studying the replication mechanism of the yeast retrotransposon Ty5 by molecular and computational approaches
Abstract: The yeast retrotransposon Ty5 is a Ty1/copia element; officially, it belongs to the Hemivirus genus of the Pseudoviridae family. The ability to genetically manipulate both retrotransposons and the yeast host cell was exploited to explore replication mechanisms unique to Ty5 as well as those common to most retrotransposons. Because retroelement sequences are abundant and diverse, and many retroelement enzymes have evolved unique functional specificities, computational approaches were also developed to study functional divergence in replication. Screening a randomly mutagenized Ty5 library identified two mutations (Y68C, D252N) that caused higher transposition frequencies. Both mutations increased Ty5 cDNA levels but had no dramatic effects on the steps after cDNA synthesis (i.e., integration and recombination) or on protein synthesis, processing, or solubility. The D252N mutation increased the hydrogen bonding potential of the CCHC zinc finger of the nucleocapsid protein (NCp), making the Ty5 NCp zinc finger more like Ty1/copia consensus zinc fingers in terms of hydrogen bonding potential.
Other mutations that increased the hydrogen bonding potential (D252R, D252K) produced the same fold increase in Ty5 reverse transcription, and naturally occurring mutations in the Ty5 zinc finger repress this function. Hydrogen bonding is suggested to be a universal requirement for the function of retroviral-type and cellular zinc fingers. A half-tRNA priming mechanism for Ty5 reverse transcription was also demonstrated. Mutations in the anticodon of the tRNA (IMT) and in the putative PBS of Ty5 decreased transposition, but transposition was restored when complementarity between the IMT and the PBS was restored. A tree-based method and supplemental Split Tester software were developed to study the functional divergence of reverse transcriptase (RT) with respect to half-tRNA and full-tRNA priming mechanisms. The domains identified by this computational approach had previously been shown experimentally to bind the tRNA primer/template in HIV RT. Using this software, another domain related to integrase functional specificity (namely, whether integrase carries out 3’-end processing during integration) was also consistently identified in different integrase datasets. A model describing this functional divergence is proposed.

Zhong Gao
Home Department: Computer Science
Major Professor: Dr. Vasant Honavar
Title: Genome wide recognition of Tumor Necrosis Factor (TNF) related ligands in human and Arabidopsis genomes: A structural genomics approach
Abstract: Tumor necrosis factors (TNFs) play a crucial role in mammalian signal transduction pathways for cell proliferation, survival, and differentiation. Genome sequencing projects for humans and other species (such as Arabidopsis) provide a unique opportunity for genome-wide recognition of TNF-related ligand proteins and for discovering potential TNF-TNFR signal transduction mechanisms in plants.
Genome-wide recognition of TNF-related proteins in human and Arabidopsis was carried out using secondary structure prediction and protein fold recognition. In the protein fold recognition scheme, sequence-structure models are evaluated using a contact energy score based on the Miyazawa-Jernigan and Li-Tan-Wingreen models. Initial screening based on secondary structure composition not only reduces the search space for protein fold recognition but also shifts the score distribution of the selected candidates toward higher scores. To investigate the influence of sequence length on threading results, protein fold recognition was conducted on human and Arabidopsis genome sequences of different lengths. A test on known TNFs from diverse species indicates that about 83% of TNFs can be identified; a test on human genome sequences shows that about 80% of known TNFs can be recognized. Integrating secondary structure profiling into the scheme can improve performance by adjusting local sequence-structure relationships, but this improvement depends largely on the accuracy of secondary structure prediction. Average scoring performs better than maximal scoring in model evaluation and selection. Pattern classification algorithms, such as decision trees, neural networks, the Naïve Bayes classifier, and support vector machines (SVMs), are applied to discriminate TNF-related proteins from competing false positives, which have a secondary structure composition similar to that of known TNFs and also have high fold recognition scores. Both known TNFs and false positive sequences are represented by the twenty q values corresponding to the twenty amino acids in the Li-Tan-Wingreen model. Cross-validation results show that the Naïve Bayes classifier performs better than the SVM, neural network, and decision tree, and that it is suitable for stringent control of false positives. This genome-wide search scheme was used to search for potential TNF-like signal proteins in the Arabidopsis genome.
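The secondary-structure-composition screen described above can be pictured as a simple filter: compute the fractions of helix (H), strand (E), and coil (C) in a predicted secondary structure string and keep candidates whose composition lies close to that of known TNFs. The abstract does not give the procedure, so this Python sketch rests on assumptions: the three-state alphabet, the Euclidean distance, the reference composition, and the 0.15 cutoff are all hypothetical choices, and the candidate strings are made up.

```python
import math

def ss_composition(ss):
    """Fractions of helix (H), strand (E), and coil (C) in a 3-state prediction string."""
    n = len(ss)
    return tuple(ss.count(state) / n for state in "HEC")

def passes_screen(ss, reference, cutoff=0.15):
    """Keep a candidate whose composition is within `cutoff` (Euclidean) of the reference."""
    return math.dist(ss_composition(ss), reference) <= cutoff

# Hypothetical strand-rich reference composition and toy predictions (illustrative values only).
reference = (0.05, 0.50, 0.45)
candidates = {
    "cand1": "CCEEEEECCCEEEEEECCCCEEEECC",
    "cand2": "HHHHHHHHHHHHCCCHHHHHHHHHCC",
}
kept = [name for name, ss in candidates.items() if passes_screen(ss, reference)]
print(kept)  # ['cand1']
```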
The possible roles of candidates in the human and Arabidopsis genomes are discussed. These results demonstrate that structure-based methods can facilitate functional prediction on a genome scale.

Aspen Garry
Home Department: Ecology, Evolution, & Organismal Biology
Major Professor: Dr. Dean Adams
Title: Geometric Morphometric analysis of shark teeth of the genus Rhizoprionodon: The modern, the ancient, and the hypothetical. Modern tooth shape analysis and test of ancestry prediction methods by comparison to fossil shapes
Abstract: Shark teeth are extremely common in the fossil record, and they can potentially provide insight into the evolutionary history of sharks. However, isolated fossil teeth are difficult to assign to the correct jaw, position, and taxon without organismal context, because individual sharks exhibit a variety of tooth shapes: tooth shape varies across jaws, across positions within each jaw, and across taxa. Fortunately, tooth shape is quantifiable, and shapes can be compared using the techniques of geometric morphometrics, which measure shape and its covariation with other variables. Modern tooth shapes were analyzed to understand patterns of modern tooth shape variation. These results could then be applied to fossils to improve their identification and make better use of the extensive shark fossil record. To quantify modern patterns of tooth shape variation, teeth of five Rhizoprionodon species and representatives of three closely related genera (Loxodon, Eusphyra, and Sphyrna) were analyzed using geometric morphometric methods. Ancestral tooth shapes were estimated by mapping the modern shape data onto a phylogeny built from molecular data, using a Brownian motion model of evolution. These estimated shapes were compared to fossil teeth of Rhizoprionodon sp. and Sphyrna spp. to evaluate the accuracy of the estimated ancestral shapes.
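Geometric morphometric comparisons like those described above usually begin by removing location, size, and rotation from landmark coordinates (Procrustes superimposition). The abstract does not list the landmarks or software used, so this Python sketch is only illustrative: it assumes two-dimensional landmarks recorded in the same order on each tooth and computes the partial Procrustes distance between two configurations. The landmark coordinates are made up.

```python
import cmath
import math

def normalize(landmarks):
    """Center 2D landmarks on their centroid and scale them to unit centroid size.

    Landmarks are (x, y) pairs; complex numbers turn a 2D rotation into a multiplication.
    """
    zs = [complex(x, y) for x, y in landmarks]
    centroid = sum(zs) / len(zs)
    centered = [z - centroid for z in zs]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]

def procrustes_distance(shape_a, shape_b):
    """Partial Procrustes distance between two landmark configurations.

    Both shapes must list corresponding landmarks in the same order. Shape b is
    rotated onto shape a (no reflection); returns (distance, aligned b as (x, y)).
    """
    a, b = normalize(shape_a), normalize(shape_b)
    c = sum(zb * za.conjugate() for za, zb in zip(a, b))
    rotation = c.conjugate() / abs(c) if abs(c) > 0 else 1  # optimal unit rotation
    aligned = [rotation * zb for zb in b]
    distance = math.sqrt(max(0.0, 2.0 - 2.0 * abs(c)))  # clamp tiny negative rounding error
    return distance, [(z.real, z.imag) for z in aligned]

# A toy 4-landmark outline and a copy that is scaled, rotated 30 degrees, and shifted.
tooth = [(0.0, 0.0), (2.0, 0.0), (1.2, 3.0), (0.4, 1.0)]
moved = [cmath.rect(3.0, math.radians(30)) * complex(x, y) + complex(5.0, -2.0)
         for x, y in tooth]
moved = [(z.real, z.imag) for z in moved]
d, _ = procrustes_distance(tooth, moved)
print(round(d, 6))  # 0.0: same shape once location, size, and rotation are removed
```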
Modern teeth at the front of the jaw displayed the most dramatic shape differences between jaws and positions. Teeth from each genus could be distinguished, but species within Rhizoprionodon could not. Fossil tooth shapes most closely resembled those of modern teeth, indicating that tooth shape did not change according to the Brownian motion model used to predict ancestral shapes.

Jianying Gu
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Xun Gu
Title: Functional divergence and genome evolution of vertebrate protein kinases
Abstract: The emerging complete and nearly complete genome sequences have provided a wealth of material for large-scale comparative genomic analysis. Novel methods have been developed to elucidate the functions of gene products and their functional interaction networks. Many of these post-genomic efforts have focused on unveiling the evolutionary forces that have shaped network organization. Among these forces, duplication of a functional domain, an individual gene, a chromosomal segment, or an entire genome has long been considered a primary source of functional novelty in a vast number of gene families. It is therefore intriguing to trace quantitatively how evolutionary constraints change after a duplication event. This study explores functional divergence and evolutionary patterns in vertebrate kinase complements (denoted kinomes) and kinase-regulated signal transduction pathways, using a combined statistical and evolutionary approach. Analysis of an individual kinase gene family (Jak), the protein tyrosine kinase superfamily, and a kinase-mediated signal transduction pathway (TGF-β) showed that functional divergence (altered functional constraint) after domain or gene duplication is a general pattern.
Moreover, the age distribution of the vertebrate kinomes showed that (1) the major kinase-related, animal-specific signal transduction pathways were generated through ancient, continuous domain shuffling (or duplication) from the early stages of eukaryotic evolution through metazoan evolution; (2) vertebrate tissue-specificity of signal transduction was facilitated by large-scale duplication event(s) early in vertebrate evolution; and (3) the kinase pseudogenes were generated very recently, through either segmental duplication or retrotransposition.

Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Patrick Schnable
Title: Adaption of Multiclustering to the Analysis of Microarray Data
Presentation Date: Thursday, May 10, 2007
Abstract: Clustering has become an integral part of microarray data analysis and interpretation. It helps reduce the information generated by a microarray experiment to a scale at which biologists can generate hypotheses. However, artifacts induced by clustering methods can cause misinterpretation of the data. A clustering method that accurately captures the natural structure of the data would be a useful tool for biologists to discover the biological meaning buried in the data. To this end, a new clustering algorithm, called K-means multiclustering, is introduced. The method avoids the artifacts induced by distance or similarity metrics by amalgamating the results of many K-means clusterings. Results: The multiclustering algorithm is a model-free clustering method. It is found to be reliable and consistent in capturing the underlying data structure, with an accuracy that is competitive with model-based clustering and superior to other methods on synthetic microarray data generated in a manner consistent with the hypothesis of model-based clustering. The algorithm has a high level of immunity to artifacts introduced by the metric used to measure the distance between data points.
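The abstract says multiclustering amalgamates the results of many K-means clusterings but does not give the combination rule. One common way to pool runs is a co-association matrix that records how often two points land in the same cluster; the Python sketch below assumes that approach, which is not necessarily the thesis's algorithm and does not produce the cut plot. The function names and the toy data are hypothetical.

```python
import random

def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means on tuples of floats; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def coassociation(points, k_values, runs_per_k, seed=0):
    """Fraction of k-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    runs = 0
    for k in k_values:
        for _ in range(runs_per_k):
            labels = kmeans(points, k, rng)
            runs += 1
            for i in range(n):
                for j in range(n):
                    if labels[i] == labels[j]:
                        together[i][j] += 1
    return [[count / runs for count in row] for row in together]

# Two tight, well-separated toy groups of "expression profiles".
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (100.0, 100.0), (100.1, 100.0), (100.0, 100.1)]
matrix = coassociation(points, k_values=[2], runs_per_k=20)
print(matrix[0][1], matrix[0][3])  # 1.0 0.0
```

Pooling runs this way replaces a single run's dependence on one initialization with a frequency that can be thresholded or fed to a hierarchical method.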
It can successfully cluster data sets that are designed to have different shapes and variation and that cannot be correctly clustered by traditional clustering methods. The cut plot computed by this method is a very simple and useful summary of the data structure. A detailed view of how the clustering forms can also be generated to reveal the underlying hierarchical structure of a data set.

Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Daniel Voytas
Title: Characterization of the Sireviruses: A unique group of Ty1/copia LTR retrotransposons in plants
Abstract: Plant genomes have allowed the expansion of many types of mobile genetic elements. LTR retrotransposons are a subclass of mobile genetic elements that replicate using an RNA intermediate. The Pseudoviridae (Ty1/copia) are a family of LTR retrotransposons, and the Sireviruses are one of three genera in the Pseudoviridae. The Sireviruses have features that set them apart from classical retrotransposons. Different members of the Sireviruses show great variability in their genomic structures and in the translational tricks they use to express their encoded proteins. For example, we have shown that the SIRE1 elements of soybean use stop codon suppression to express their Env-like protein. In addition, some monocot members of the Sireviruses may use a bypass mechanism to translate Pol.

Home Department: Biochemistry, Biophysics and Molecular Biology
Major Professor: Dr. Mark Hargrove
Title: Structural Characterization of Ligand Binding in Hexacoordinate Hemoglobins
Presentation: Thursday, August 17, 2006
Abstract: The goal of biophysics is to study the structures of the components of living organisms and to understand the mechanics of the processes of life. Hemoglobin is a well-suited model for this study.
Because it is an essential component of mammalian blood and is easy to obtain in large quantities, hemoglobin, together with its monomeric counterpart myoglobin, is among the most thoroughly studied and characterized components of life. Yet hemoglobin studies continue to reveal new forms of hemoglobin, raising new questions, functional possibilities, and research opportunities. My research focuses on hemoglobins classified as hexacoordinate, and in particular on the structural characterization of these proteins upon ligand binding. Below is a list of abbreviations and terms used in my talk, with their definitions:
Hbs -- hemoglobins
List of publications:
Hoy, J. A., Kundu, S., Trent, J. T., 3rd, Ramaswamy, S., and Hargrove, M. S. (2004). The crystal structure of Synechocystis hemoglobin with a covalent heme linkage. J Biol Chem. 279, 16535-16542.
Trent, J. T., 3rd, Kundu, S., Hoy, J. A., and Hargrove, M. S. (2004). Crystallographic analysis of Synechocystis cyanoglobin reveals the structural changes accompanying ligand binding in a hexacoordinate hemoglobin. J Mol Biol. 341, 1097-1108.
Smagghe, B. J., Kundu, S., Hoy, J. A., Halder, P., Weiland, T. R., Savage, A., Venugopal, A., Goodman, M., Premer, S., and Hargrove, M. S. (2006). Role of phenylalanine B10 in plant nonsymbiotic hemoglobins. Biochemistry. 45(32), 9735-9745.
Hoy, J. A., Smagghe, B. J., Halder, P., and Hargrove, M. S. (2006). Covalent heme attachment in Synechocystis hemoglobin is required to prevent ferrous heme dissociation. Manuscript in preparation.
Hoy, J. A., Robinson, H., Trent, J. T., Kakar, S., Smagghe, B. J., and Hargrove, M. S. (2006). Crystal structure of a nonsymbiotic plant hemoglobin: implications for the evolution of oxygen transport. Manuscript in preparation.
Bio:
BA in Physics and BA in Humanities, Wartburg College, Waverly, Iowa, 1996
MS in Physics, Iowa State University, 1999
Temporary Instructor of Physics, ISU, 1999-2000
PhD studies in Biophysics, ISU, 2000-2006
Postdoc in Hargrove Lab
LaRon Hughes - M.S.
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Karin Dorman
Title: EIAV DB: A comprehensive Equine Infectious Anemia (EIAV) Virus database
M.S. Abstract: A major problem in biology is storing and retrieving biological data in a meaningful and efficient manner. With the advent of mass sequencing projects, such as the Human Genome Project, the need to store, retrieve, and analyze sequence data is stronger than ever.
This thesis tackles a small part of this problem by presenting techniques, models, and applications for productively storing and retrieving a set of related viral sequences in a central data bank. The thesis begins with an overview of relational databases and their role in storing biological data. Its main chapter describes a novel relational database application, EIAV DB, a central repository of Equine Infectious Anemia Virus sequence and feature information. The models and application provide insight into technologies that help alleviate the storage and retrieval problem.
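To make the storage-and-retrieval idea concrete, here is a minimal relational sketch using Python's built-in sqlite3 module: one table of sequences and one table of annotated features (such as a gene's coordinates), joined to pull out every copy of a named feature. The table and column names, the 1-based inclusive coordinate convention, and the toy records used to exercise it are hypothetical; they are not EIAV DB's actual schema or data.

```python
import sqlite3

# Hypothetical two-table schema: sequences, plus named features located on them.
SCHEMA = """
CREATE TABLE sequences (
    id        INTEGER PRIMARY KEY,
    accession TEXT UNIQUE NOT NULL,
    isolate   TEXT,
    residues  TEXT NOT NULL
);
CREATE TABLE features (
    id          INTEGER PRIMARY KEY,
    sequence_id INTEGER NOT NULL REFERENCES sequences(id),
    name        TEXT NOT NULL,      -- e.g. a gene name such as 'rev'
    start_pos   INTEGER NOT NULL,   -- 1-based, inclusive
    end_pos     INTEGER NOT NULL    -- 1-based, inclusive
);
"""


def build_db(path=":memory:"):
    """Open a database and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sequence(conn, accession, isolate, residues, features):
    """Store one sequence and its (name, start, end) features in a single transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO sequences (accession, isolate, residues) VALUES (?, ?, ?)",
            (accession, isolate, residues),
        )
        seq_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO features (sequence_id, name, start_pos, end_pos) VALUES (?, ?, ?, ?)",
            [(seq_id, name, start, end) for name, start, end in features],
        )
    return seq_id


def feature_sequences(conn, name):
    """Return (accession, subsequence) for every copy of a named feature."""
    query = """
        SELECT s.accession, substr(s.residues, f.start_pos, f.end_pos - f.start_pos + 1)
        FROM features AS f
        JOIN sequences AS s ON s.id = f.sequence_id
        WHERE f.name = ?
        ORDER BY s.accession
    """
    return conn.execute(query, (name,)).fetchall()
```

Keeping features in their own table, rather than as columns of the sequence table, lets one sequence carry any number of annotations and lets a single join retrieve, say, every stored variant of one gene.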
LaRon Hughes - PhD
Home Department: Animal Science
Major Professor: Dr. Jim Reecy
Title: Hypothesis building using the Animal Trait Ontology
PhD Abstract: With the advent of sequencing projects in model organisms, humans, and domesticated livestock species, the storage, retrieval, and analysis of genomic information for these animals has become important. The Animal Trait Ontology (ATO) was created to store the relationships among traits for several domesticated farm animal species. The Collaborative Ontology Building (COB) editor was used to create and edit the ATO. An online ontology browser has been developed to search and browse the ontology and to view the relationships between its terms. Some of the traits in the ontology are linked to associated quantitative trait loci (QTL) information for each species through the Comparative Animal QTL (CAQ) tool, which allows users to compare QTL experiments in livestock species based on 1) one trait given one species, and 2) two traits given one species. The effectiveness of the tool is documented through a data and statistical analysis that demonstrates its use in examining pleiotropic effects for traits in the pig. In addition, the Human and Animal Trait Ontology is discussed; it will bring together several species ontologies, including the ATO, to form a consensus for describing phenotypes and traits across different disease models.
Cizhong Jiang
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Thomas Peterson
Title: Computational and molecular analysis of Myb gene family
Abstract: Myb proteins are defined by a highly conserved DNA-binding domain, termed the Myb domain, which is composed of approximately 50 amino acids with regularly spaced tryptophan residues. Multiple copies of the Myb domain often exist as tandem repeats within a single protein.
Myb proteins identified to date contain up to four tandem Myb repeats (termed R0R1R2R3 hereafter). In our study, we collected additional Myb genes and performed a series of phylogenetic analyses to explore the evolutionary origin of Myb genes. The results suggest that the Myb gene family originated from an ancient gene with a single Myb box. One and two intragenic duplications produced R2R3 and R1R2R3 Myb genes, respectively, which then co-existed in primitive eukaryotes and gave rise to the currently extant Myb genes. Based on our results, we propose that plant R1R2R3 Myb genes were derived from R2R3 Myb genes by gain of the R1 repeat through an ancient intragenic duplication; this gain model is more parsimonious than the previous proposal that plant R2R3 Myb genes were derived from R1R2R3 Myb genes by loss of the R1 repeat. Phylogenetic analysis of isolated individual Myb repeats indicates that the R2 repeat has evolved more slowly than the R1 and R3 repeats. However, it is not clear which repeat is the most ancient. Another goal of our project is to classify Myb genes and predict their functions. We clustered closely related Myb genes from Arabidopsis and rice into subgroups on the basis of sequence similarity and phylogeny. Gene structure analysis revealed that both the positions and the phases of introns are conserved within a subgroup, although they differ between subgroups. Conserved motifs were detected in the C-terminal coding regions within subgroups, and these motifs are specific to Myb genes. We also found that Myb genes with similar functions cluster together. In contrast, no conserved regulatory elements were identified in the divergent non-coding regions. Additionally, the distribution pattern of introns in the phylogenetic tree indicates that Myb domains originally had a compact size without introns; non-coding sequences were inserted later, and the splice sites were conserved during evolution.
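The Myb domain definition (about 50 residues with regularly spaced tryptophans) lends itself to a quick sequence scan. The Python sketch below is our illustration, not the thesis's method: it assumes the spacing commonly cited for Myb repeats, 18 or 19 residues between consecutive tryptophans, and reports the start positions of W-x(18,19)-W-x(18,19)-W candidates. Real repeats can deviate from this strict pattern (for example, with another hydrophobic residue in place of a tryptophan), so such a scan will miss some genuine repeats and is no substitute for profile-based domain searches.

```python
import re

# Assumed spacing: 18 or 19 residues between consecutive tryptophans (W).
# The lookahead makes each match zero-width, so overlapping candidates are all reported.
TRP_REPEAT = re.compile(r"(?=(W.{18,19}W.{18,19}W))")


def find_trp_repeats(protein):
    """Return 0-based start positions of W-x(18,19)-W-x(18,19)-W candidates."""
    return [m.start() for m in TRP_REPEAT.finditer(protein.upper())]
```

In a protein with tandem repeats, hits spaced roughly one repeat length apart would be the signal to look at more closely.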
Brent Kronmiller
Home Department: Plant Pathology
Major Professor: Dr. Roger Wise
Title: Assembly And Annotation Tools For Analysis Of Large Contiguous Regions Of The Maize Genome
Abstract: LTR retrotransposons make up significant portions of many of the larger grass genomes. Their repeated sequences across the genome, their terminal repeats, and their nested cluster configuration make assembly of sequence clones challenging and identification of gene regions difficult. In this thesis I provide tools for both the assembly and the annotation of highly repetitive genomes, and I use these tools to construct what are currently the two longest maize sequence contigs.
Alain Laederach
Home Department: Chemical and Biological Engineering
Major Professor: Dr. Peter Reilly
Title: Protein-Carbohydrate and Protein-Protein interactions: Using models to better understand and predict specific molecular recognition
Abstract: Any molecular recognition event results in a change in the free energy of the system. The extent of this change is related to the association constant: the more negative the free energy change, the tighter the interaction between receptor and ligand. Protein-carbohydrate interactions play a critical role in signal transduction, innate immunity, and metabolism. Modeling these interactions is somewhat complicated by the inherent flexibility of carbohydrates and by their relatively large number of functional groups. An empirical scoring function for docking carbohydrates to proteins will be presented, specifically tailored to predict both the correct binding orientation and the free energy of binding of the carbohydrate-ligand/protein-receptor complex. This new scoring function predicts free energies of binding with a residual standard error of 1.1 kcal/mol, a clear improvement over existing scoring functions, whose standard errors are well over 2 kcal/mol.
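The link between free energy and association constant is the standard thermodynamic relation ΔG° = -RT ln Ka, with Ka in M^-1 relative to a 1 M standard state. The short Python sketch below (our illustration, not part of the thesis) converts between the two and shows what the quoted errors mean for affinity: because the relation is logarithmic, an additive error in ΔG becomes a multiplicative error in Ka. At 298.15 K, 1.1 kcal/mol corresponds to roughly a 6.4-fold error in Ka, while 2 kcal/mol corresponds to roughly a 29-fold error.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(ka, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from an association constant Ka (1/M)."""
    return -R_KCAL * temp_k * math.log(ka)


def association_constant(delta_g, temp_k=298.15):
    """Inverse relation: Ka = exp(-dG / RT)."""
    return math.exp(-delta_g / (R_KCAL * temp_k))


def fold_error(delta_g_error, temp_k=298.15):
    """Multiplicative error in Ka implied by an additive error in dG (kcal/mol)."""
    return math.exp(delta_g_error / (R_KCAL * temp_k))
```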
The application of automated docking to determine the carbohydrate recognition specificity of the C-type lectin human Surfactant Protein D will also be presented. In the second part of the thesis, the role of π-stacking interactions (e.g., between Tyr side chains) in stabilizing protein folds will be discussed. A 17-residue peptide derived from the naturally occurring antimicrobial peptide Tachyplesin I is investigated using NMR spectroscopy; the NOE cross peaks observed confirm that this interaction exists in solution. In the final part of the thesis, a quantitative NMR investigation into the self-association of the regulatory domains of several Tec family kinases will be presented. Of particular interest, self-association of the Bruton's tyrosine kinase (Btk) regulatory domains occurs through the formation of an asymmetric homodimer. Together, this work demonstrates the importance of rigorous biophysical characterization of biomolecular recognition events and the interdependence of computational modeling and experimentation.
Michael Lawrence
Home Department: Statistics
Major Professor: Dr. Dianne Cook
Title: Interactive graphics, graphical user interfaces and software interfaces for the analysis of biological experimental data and networks
Abstract: Biologists need to analyze and comprehend increasingly large and complex experimental data. These data are multivariate: each row corresponds to a biological entity, and each column corresponds to the level of an experimental treatment. Biological experiments often produce multiple data sets, each describing one aspect of the system, such as the transcriptome recorded by a microarray or the metabolome recorded using gas chromatography-mass spectrometry (GC-MS). A biochemical network model provides a conceptual, system-level framework for integrating data from different sources.
Effective use of graphics enhances the comprehension of data, and interactive graphics permit the analyst to actively explore data, check their integrity, satisfy curiosity, and reveal the unexpected. Interactive graphics have not been widely applied as a means of understanding data from biological experiments. This thesis addresses these needs by providing new methods and software that apply interactive graphics in coordination with numerical methods to the analysis of biological data, in a manner that is accessible to biologists.
Nicole Leahy
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Daniel Ashlock
Title: Pseudophyte evolutionary algorithm: A simple computational model of parapatric speciation
Abstract: The Pseudophyte Evolutionary Algorithm (PEA) is an individual-based computer model of a population of haploid, annual plants, used to examine the process of speciation in a patchy environment. The model incorporated both pre-mating and post-zygotic mechanisms for the evolution of reproductive isolation, via pollen selection and offspring inviability, respectively. In the PEA, speciation is an emergent property rather than an explicit feature of the model, which made it possible to study how environmental patchiness, the number and arrangement of loci, and the reproductive output of individuals affected the strength of isolating mechanisms and the rate at which these evolve. The effect of the genotype-to-phenotype mapping was also explored, to examine the sensitivity of the PEA to alternative representations.
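The abstract lists the PEA's ingredients without giving its rules. As a purely hypothetical illustration of how such an individual-based model can be wired together, the Python sketch below places one haploid annual plant on each cell of a ring divided into two habitat types, implements pollen selection as a similarity-weighted choice of pollen donor (pre-mating isolation), and implements offspring inviability as habitat-dependent seedling survival (post-zygotic isolation). The genome layout (ecological loci first, mating loci last), the weighting and survival rules, the fallback when no seedling survives, and every parameter value are our assumptions, not the PEA's.

```python
import random


def habitat_map(width, patch_size):
    """Ring of cells divided into alternating habitat patches of type 0 and 1."""
    return [(i // patch_size) % 2 for i in range(width)]


def habitat_match(genome, habitat, eco_loci):
    """Fraction of the first eco_loci alleles that match the habitat type."""
    return sum(g == habitat for g in genome[:eco_loci]) / eco_loci


def mating_similarity(a, b, mate_loci):
    """Fraction of the last mate_loci alleles shared by two genomes."""
    return sum(x == y for x, y in zip(a[-mate_loci:], b[-mate_loci:])) / mate_loci


def next_generation(pop, world, rng, eco_loci=4, mate_loci=4, radius=2,
                    mutation=0.01, tries=10):
    """One generation of haploid annuals: every cell is refilled by a new seedling."""
    width = len(pop)
    offspring = []
    for i in range(width):
        neighbours = [(i + d) % width for d in range(-radius, radius + 1)]
        child = None
        for _ in range(tries):
            mother = pop[rng.choice(neighbours)]
            # Pre-mating isolation (assumed form): pollen donors are weighted by
            # their similarity to the mother at the mating loci.
            weights = [mating_similarity(mother, pop[j], mate_loci) + 1e-6
                       for j in neighbours]
            father = pop[rng.choices(neighbours, weights=weights)[0]]
            # Haploid offspring: each locus copied from a random parent, then mutated.
            genome = [rng.choice(pair) for pair in zip(mother, father)]
            genome = [1 - g if rng.random() < mutation else g for g in genome]
            # Post-zygotic isolation (assumed form): survival probability equals
            # the seedling's match to the habitat of cell i.
            if rng.random() < habitat_match(genome, world[i], eco_loci):
                child = genome
                break
        # Simplification: if no seedling survives, the cell keeps its old genotype.
        offspring.append(child if child is not None else list(pop[i]))
    return offspring
```

Varying patch_size, the number of loci, or tries (a rough stand-in for reproductive output), and tracking how far mating-locus frequencies diverge between patches, would be one way to pose the kinds of questions the abstract describes within this toy setup.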
Jae-Hyung Lee
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Drena Dobbs
Title: Analysis of protein-RNA and protein-peptide interactions in Equine Infectious Anemia Virus (EIAV) infection
Abstract: Macromolecular interactions are essential for virtually all cellular functions, including signal transduction, metabolism, regulation of gene expression, and immune responses. This dissertation focuses on the characterization of two important macromolecular interactions involved in the relationship between Equine Infectious Anemia Virus (EIAV) and its equine host cell: i) the interaction between the EIAV Rev protein and its binding site, the Rev-responsive element (RRE), and ii) interactions between equine MHC class I molecules and epitope peptides derived from EIAV proteins. EIAV, one of the most divergent members of the lentivirus family, has a single-stranded RNA genome and carries several regulatory and structural proteins within its viral particle. Rev is an essential EIAV-encoded regulatory protein that interacts with the viral RRE, a specific binding site in the viral mRNA. Using a combination of experimental and computational methods, the interactions between EIAV Rev and the RRE were characterized in detail. EIAV Rev was shown to have a bipartite RNA-binding domain containing two arginine-rich motifs (ARMs). The RRE secondary structure was determined, and specific structural motifs that act as cis-regulatory elements for the EIAV Rev-RRE interaction were identified. Interestingly, a structural motif located in the high-affinity Rev binding site is well conserved in several diverse lentiviral genomes, including HIV-1. Macromolecular interactions involved in the immune response of the horse to EIAV infection were investigated by analyzing complexes between MHC class I proteins and epitope peptides derived from the EIAV Rev, Env, and Gag proteins.
Computational modeling results provided a mechanistic explanation for the experimental finding that a single amino acid change in the peptide-binding domain of the equine MHC class I molecule differentially affects the recognition of specific epitopes by EIAV-specific CTL. Together, the findings in this dissertation provide novel insights into the strategy EIAV uses to replicate itself and new details about how the host cell responds to and defends against EIAV infection. Moreover, they contribute to our understanding of the macromolecular recognition events that regulate these processes.
Darrin Lemmer
Home Department: Biochemistry, Biophysics & Molecular Biology
Major Professor: Dr. Gloria Culver
Title: CAVEMol: an immersive 3D molecule viewer
Abstract: As the number of solved molecular structures deposited in the Protein Data Bank (PDB) increases, so too does the desire for more advanced ways of using these data. Traditional applications for viewing and manipulating molecular structures create a computer-generated model on a standard desktop computer screen. The display may employ some method of stereography to create the illusion of depth, but generally the user just sees a flat image. The user can interact with the molecule by magnifying it to get a closer look at a particular area of interest, or by rotating it about an arbitrary axis, so that all sides of the molecule can be seen, though only one side is in view at any given time. The user may also be able to watch changes in the molecule over time, with each conformation shown as a separate frame of an animation, or may even be able to modify the structure in real time. Regardless of the amount of control the user has over the molecule, however, one thing remains the same: the user experiences the molecule as though it were an object floating behind the monitor screen, which they can control only indirectly using a mouse or other pointing device.
This thesis presents the design and implementation of CAVEMol, a molecular visualization application for immersive environments. I will also give an overview of molecular visualization and immersive environments, and then discuss future work that can be done in this area, as well as applications for which molecular visualization in an immersive environment can be particularly valuable.
Haining Lin
Home Department: Computer Science
Major Professor: Dr. Xiaoqiu Huang
Title: BACAP: An assembly program for hierarchical shotgun sequencing
Abstract: We propose BACAP, a sequence-based algorithm for assembling BAC sequences generated by hierarchical shotgun sequencing. Our approach relies on sequence similarity rather than physical mapping. It follows the “overlap-layout-consensus” framework used for shotgun sequencing data and uses heuristic methods to achieve efficiency and accuracy. BACAP was tested on four simulated data sets of 200 BAC-size sequences each and on one real data set of 228 rice BACs from TIGR. The average running time was 25 minutes on a single 900 MHz Intel Itanium (IA-64) machine. Our results show that BACAP can quickly and accurately accomplish some BAC assembly tasks without physical mapping information.
Yuan Lin
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Xun Gu
Title: The Relationship of Sequence Similarity and Expression Pattern Similarity between Yeast Genes within Gene Families
Abstract: After gene duplication, the sequences and expression patterns of the duplicated genes diverge. It is known that the functional divergence of duplicated genes can be related to the divergence of both their coding sequences and their expression profiles, the latter caused mainly by sequence changes in regulatory regions. However, it is not known whether sequence divergence and expression pattern divergence are correlated. Previous research by Andreas Wagner showed at most a very weak correlation between them.
In contrast, our research shows a strong correlation between sequence similarity and expression profile similarity when the sequences are highly conserved; moreover, the degree of coexpression of duplicated genes is consistent with their duplication order.
Patricia Lonosky
Home Department: Botany
Major Professor: Dr. Steve Rodermel
Title: Proteomics of the developing chloroplast in maize
Abstract: Chloroplast protein expression profiles during the light-induced biogenesis of the maize plastid were determined by 2D gel analysis. At five time points of this ‘greening’ process (0, 2, 4, 12, and 48 hours post-illumination), maize plant tissue was collected, plastids were isolated, and proteins were precipitated and separated on two-dimensional protein gels. From these proteome maps, spot quantities were analyzed by principal components analysis, hierarchical pairwise average-linkage cluster analysis, Adaptive Resonance Theory 2 (ART2) cluster analysis, and self-organizing map cluster analysis to determine chloroplast protein expression profiles. Fifty-four spots representing 26 proteins were identified by MALDI-TOF mass spectrometry and used to verify the protein expression profiles. Two main conclusions were drawn from these data: 1) ART2 may be a useful clustering tool for expression data, and 2) different forms or modifications of the same protein show different expression patterns.
Wiesia Mentzen
Home Department: Genetics, Development & Cell Biology
Major Professor: Dr. Eve Wurtele
Abstract: I apply combined bioinformatic approaches using genomic and transcriptomic data to investigate the fatty acid biosynthesis pathway, both at the molecular level and in the context of the systems biology of Arabidopsis. Fatty acids are essential components of all known bacterial and eukaryotic cells, with critical roles as energy reserves and as metabolic precursors of biological membranes. The pathway for fatty acid synthesis appears to be conserved across all living systems.
Acetyl-CoA carboxylase, a member of a superfamily of biotin-dependent enzymes, catalyzes the first committed step of the fatty acid biosynthesis pathway. A phylogenetic study revealed complex and intertwined evolutionary histories in this family, with multiple domain fusions and rearrangements. As revealed by a meta-analysis of a wide array of Arabidopsis transcriptomic data, fatty acid biosynthesis is transcriptionally regulated, and this regulation extends not only across all pathway reactions but also to some substrate- and cofactor-producing reactions, thus defining a major transcriptionally co-regulated pathway. I extend the meta-analysis of the transcriptome to find groups of coexpressed genes (also called modules, or regulons) in the Arabidopsis genome. Major functionally coherent gene groups were identified. These relate to development, information processing, defense, and metabolism, as well as to tissue- and organelle-specific processes.
Erin Myers
Home Department: Ecology, Evolution and Organismal Biology
Major Professor: Dr. Fred Janzen
Title: Post-orbital color pattern variation and the evolution of a radiation of turtles (Graptemys)
Abstract: One of the most deeply studied areas in evolutionary biology is the formation and maintenance of new species, as well as the variation in the rate and extent to which taxa radiate. A range of evolutionary processes, from ecological adaptation to sexual selection and reinforcement, can lead to the formation of new species. However, the formation of new species likely results from several isolating mechanisms acting in concert. The map turtle complex (genus Graptemys) is an excellent model system for exploring the nature of speciation, given its exceptional species richness and high levels of morphological diversity, particularly in facial coloration patterns. This research uses an integrative approach to establish the role of post-orbital color patterns in species diversification and maintenance.
This multifaceted approach incorporates aspects of phylogenetics, population and quantitative genetics, morphometrics, and behavior to assess morphological evolution within species and across the genus. The phylogeny of map turtles was characterized by a hard polytomy, indicating rapid speciation. Across the genus, morphological evolution occurred in a parsimonious manner. Within species, both morphology and genetics exhibited a pattern of isolation by distance. Temperature significantly influenced coloration patterns, and multivariate heritability was generally low. Finally, in behavior trials, neither males nor females spent significantly more time with members of their own species. In all projects, the signatures of sexual selection or reinforcement were absent or equivocal where they would be expected if these were the main forces continuing to shape interactions among map turtle species. The results of this research indicate that the role of past and ongoing selection on coloration pattern within the map turtle clade has been limited, and that post-orbital coloration was not the driving factor in the radiation of this turtle clade. Alternative explanations for map turtle species richness are explored.
Myron Peto
Home Department: Biochemistry, Biophysics and Molecular Biology
Major Professor: Dr. Robert Jernigan
Title: Studies of Protein Designability using Reduced Models
Presentation: July 9, 2007
Abstract: One of the most important problems in computational structural biology is protein designability, that is, why protein sequences are not random strings of amino acids but instead show regular patterns that encode protein structures. Many previous studies that have attempted to solve this problem have relied upon reduced models of proteins.
In particular, the 2D square and 3D cubic lattices, together with reduced amino acid alphabets, have been examined extensively and have led to interesting results that shed some light on evolutionary relationships among proteins. Here, in addition to the 2D square lattice, we study the 2D triangular and 3D face-centered cubic (fcc) lattices; we perform designability studies using different shapes embedded in the 2D square lattice; and we use machine learning algorithms to classify binary sequences folding to highly- or poorly-designable conformations. In the first part of the thesis we extend the transfer matrix method to the 2D triangular lattice. The transfer matrix method, developed earlier for the 2D square and 3D cubic lattices, is a highly efficient method for enumerating all conformations within a compact lattice area. We also enumerated all compact conformations within simple geometries on the 2D triangular and 3D fcc lattices using a standard backtracking algorithm. In the second part of the thesis we describe protein designability studies on various shapes in the 2D square lattice using a reduced hydrophobic-polar (HP) amino acid alphabet. We used a simple energy function that counted the numbers of H-H, H-P, and P-P interactions within a restricted set of protein shapes that have the same number of residues and non-bonded contacts. We found that designability differs among protein shapes. Finally, in the third part of the thesis we used standard machine learning algorithms to classify protein sequences into two classes. We first performed a designability study for two shapes on the 2D triangular lattice, using a binary HP alphabet, and separated highly- and poorly-designable conformations. Highly-designable conformations had many sequences folding to them as their lowest-energy state, whereas poorly-designable conformations had few or no sequences folding to them.
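Designability in this sense can be computed by brute force for very short chains. The Python sketch below enumerates self-avoiding walks on the 2D square lattice with a simple backtracking search (keeping one representative per rotation/reflection class), scores every H/P sequence with the standard HP energy of -1 per non-bonded H-H contact, and counts, for each conformation, the sequences for which it is the unique lowest-energy state. That energy function, the use of all walks rather than compact shapes, and the function names are our assumptions for illustration; they are not the thesis's energy function, lattices, or transfer matrix method.

```python
from itertools import product

MOVES = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def canonical_walks(n):
    """Self-avoiding walks of n >= 2 residues on the square lattice, one per
    rotation/reflection class: the first step is +x and the first turn is +y."""
    walks = []

    def extend(path, turned):
        if len(path) == n:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt in path or (not turned and dy == -1):
                continue  # occupied site, or the mirror image of an earlier walk
            extend(path + [nxt], turned or dy != 0)

    extend([(0, 0), (1, 0)], False)
    return walks


def contacts(walk):
    """Non-bonded nearest-neighbour pairs (i, j) with j > i + 1."""
    index = {pos: i for i, pos in enumerate(walk)}
    pairs = []
    for i, (x, y) in enumerate(walk):
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1:
                pairs.append((i, j))
    return pairs


def designability(n):
    """Map each conformation to the number of HP sequences that fold uniquely to it."""
    walks = canonical_walks(n)
    contact_lists = [contacts(w) for w in walks]
    counts = dict.fromkeys(walks, 0)
    for seq in product("HP", repeat=n):
        # Assumed HP energy: -1 for every non-bonded H-H contact.
        energies = [-sum(seq[i] == seq[j] == "H" for i, j in pairs)
                    for pairs in contact_lists]
        lowest = min(energies)
        ground = [w for w, e in zip(walks, energies) if e == lowest]
        if len(ground) == 1:  # sequences with degenerate ground states are not counted
            counts[ground[0]] += 1
    return counts
```

For a 4-residue chain only the U-shaped walk has a contact, so exactly the four sequences with H at both ends fold uniquely to it; every other conformation has designability 0. Longer chains grow exponentially in cost, which is why efficient enumeration schemes such as transfer matrices matter.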
Sequences were classified as highly- or poorly-designable depending on whether they folded to highly- or poorly-designable structures. Using several machine learning algorithms, such as decision trees, naïve Bayes, and support vector machines, we were able to classify highly- and poorly-designable sequences with high accuracy.
Bradley Powers
Home Department: Mathematics
Major Professor: Dr. Dan Ashlock
Title: The Effect of Tags on Non-Local Adaptation
Abstract: This project investigates in greater depth the phenomenon of non-local adaptation previously observed in an evolutionary model based on the iterated Prisoner’s Dilemma game. Non-local adaptation is the ability of an agent or population of agents to perform well against other agents that share no common history or ancestry with them. Populations of agents both with and without identifying tags are evolved to play noisy iterated Prisoner’s Dilemma on a toroidal grid. Each agent consists of a finite state machine specialized for playing iterated Prisoner’s Dilemma, together with a simple tag-recognition capability. The populations are allowed to evolve for 10,000 generations, and the state of the world is stored every 500 generations. Populations from these samples are placed in competition with populations from generation 10,000. This procedure is repeated for varying overall mutation rates, with and without tags, and for varying frequencies of tag-related mutations. Non-local adaptation is seen in these populations; however, tags seem to slow its acquisition. Although