Bioinformatics & Computational Biology

BCB and IGERT Graduates
Iowa State University

Name & Email | Degree/Major | Major/Co-Major Professors | Dissertation/Thesis Title | Position, Employer | Semester/Year of Grad

 

Prasith Baccam

Ph.D. in

Applied Math and Immunobiology

James Cornette
Susan Carpenter

Genetic Variation and evolution of equine infectious anemia virus rev quasispecies during long term persistent infection
Innovative Emergency Mgmt. Inc.
Lead Scientist
Bel Air, MD

Past:

Postdoctoral Research Associate
Iowa State University

http://www.t10.lanl.gov/profiles/baccam.html

http://www.t10.lanl.gov/pbaccam/

Spring, 2000

Lisa Borsuk

M.S. in BCB

Dr. Patrick Schnable
Dr. Hui-Hsien Chou

To be Determined

Summer, 2007

 

Kara Butterworth

M.S. in

Botany

Jonathan Wendel
Dean Adams

Initiation and early development of fibers in wild
and cultivated cotton
Middle school science teacher
Apache Junction, AZ

 

Fall, 2003

Feng Cui

Ph.D. - Co-Majors in

BCB and
Physical Chemistry

Dr. Zhijun Wu
Dr. Robert Jernigan
Distance-based NMR Structure determination
and refinement

Visiting Fellow
National Cancer Institute
Center for Cancer Research Nanobiology Program (CCRNP)
Frederick, MD

http://ccr.cancer.gov/Labs/staff.asp?labid=91

 

Summer, 2005

Garrett Dancik

Ph.D. in BCB

Dr. Karin Dorman and Dr. Doug Jones

Exploring host-pathogen relationships through computer simulations of intracellular infection

Assistant Professor
Northwestern State
Departments of Biology and Math Sciences
Louisiana

Will begin bioinformatics concentration there.

Summer, 2008

Amy Determan

MGET Student -

PhD in CBE

      Fall, 2005

Lixia Diao

M.S. in

BCB

David Fernandez-Baca
Xun Gu

Consensus properties of supertree construction methods

Ph.D. Student in Statistics
Iowa State University
Ames, IA

http://perl.hs.iastate.edu/lixia.htm

Summer, 2002

Jing Ding

Ph.D. Co-major in

BCB and ComE

Dan Berleant
Eve Wurtele

BOW-Based vs. Concept-Based Text Clustering for Functional Analysis of Genes

Staff Specialist
Ohio State University
Columbus, OH
Spring, 2006

Pan Du
Ph.D. Co-major
in BCB & EE
Julie Dickerson
Eve Wurtele

Multi-scale Genetic Network Inference based on Time Series Gene Expression Profiles
 

Research Associate / Senior Bioinformatics Analyst position
Robert H. Lurie Comprehensive Cancer Center
Northwestern University
Chicago, IL

Fall, 2005

Tyra Dunn

M.S. in
BCB

PhD in BCB

Xun Gu
Dan Voytas

Greenlee; Honavar

Genomic differences between humans and primates

Characterizing and Influencing Differentiation Of Retinal Progenitor Cells

To be determined

Fall, 2004 - MS

Summer, 2007 - PhD

Scott Emrich

Ph.D. in
BCB

Srinivas Aluru
Patrick Schnable

Assembly and Analysis of Complex Plant Genomes

Assistant Professor
University of Notre Dame
Notre Dame, IN

Summer, 2007

Joset Etzel

Ph.D. in

BCB

Julie Dickerson
Ralph Adolphs

Algorithms and Procedures to Analyze Physiological Signals in Psychophysiological Research

Postdoctoral Fellow
University of Groningen
Netherlands
Spring, 2006

Fang Fang

Ph.D. in

BCB

Karin Dorman
Drena Dobbs

Virus Recombination: Modeling and Data Analysis

Postdoctoral Fellow
Dr. Arlene Auerbach
Lab of Human Genetics & Hematology
The Rockefeller University
New York City

Spring, 2006

Jianmin Feng

M.S. in

BCB

Volker Brendel
Zhijun Wu

A new approach for discovering protein motifs

Research Scientist
Dr. Ed Yeung
Iowa State University
Ames, IA
Fall, 2002

Xiang Gao

PhD in

MCDB and BCB

Dan Voytas
Leslie Miller
Studying the replication mechanism of the yeast retrotransposon Ty5 by molecular and computational approaches

Postdoctoral Fellow
With Dr. Michael Lynch
Biology Department
Indiana University
Bloomington, IN
Fall, 2001

Zhong Gao

M.S. in

BCB

Vasant Honavar and
Kai-Ming Ho
Genome wide recognition of Tumor Necrosis
Factor (TNF) related ligands in human and
Arabidopsis genomes: A structural genomics
approach

Postdoctoral Fellow
The Center for Cardiovascular Bioinformatics and Modeling
Johns Hopkins University
Baltimore MD

Summer, 2003

Aspen Garry

MS in

EEB

Dean Adams
Gavin Naylor
Geometric Morphometric analysis of shark teeth of the genus Rhizoprionodon: The modern, the ancient, and the hypothetical. Modern tooth shape analysis and test of ancestry prediction methods by comparison to fossil shapes

Fall, 2003

Jianying Gu

PhD in

BCB

Xun Gu
Dan Nettleton

Functional divergence and genome evolution of vertebrate protein kinases

Assistant Professor
City University of New York
Summer, 2003

Ling Guo

Ph.D. in BCB

Patrick Schnable
Daniel Ashlock
Adaption of Multiclustering to the Analysis of Microarray Data

To Be Determined

Summer, 2007

Ericka Havecker

Ph.D. in IG

(IGERT Fellow)

Dan Voytas
Mei Hong
Characterization of the Sireviruses:  A unique group of Ty1/copia LTR retrotransposons in plants

Postdoctoral Research Associate
David Baulcombe
Sainsbury Lab
Norwich, England

Spring, 2005

Julie Hoy

Ph.D. in IG

(IGERT Fellow)

Dan Voytas
Mei Hong
Structural Characterization of Ligand Binding in Hexacoordinate Hemoglobins

Postdoctoral Research Associate
Mark Hargrove Laboratory
Iowa State University

Summer, 2006

LaRon Hughes

M.S. in BCB and a Ph.D. in

BCB

M.S.--Karin Dorman and Susan Carpenter; PhD--Jim Reecy
Vasant Honavar

M.S.- EIAV DB: A comprehensive equine infectious anemia (EIAV) virus database

Ph.D. - Hypothesis building using the Animal Trait Ontology

GenomeQuest
Field Application Scientist
Westborough, MA

Summer 2004;

Summer, 2007

Junli Ji

M.S. in

Genetics and BCB

Madan Bhattacharyya
Adam Bogdanove
  Pioneer Hi-Bred
Des Moines, IA
Fall, 2004

Cizhong Jiang

PhD in IG with BCB minor

Tom Peterson
Xun Gu

Computational and molecular analysis of Myb gene family

Postdoctoral Research Associate
VCU (Virginia Commonwealth University)
Richmond, VA

Project: SNPs in mammals

Summer, 2004

Brent Kronmiller

PhD in BCB

Dr. Roger Wise and Dr. Xun Gu

Assembly And Annotation Tools For Analysis Of Large Contiguous Regions Of The Maize Genome

 

Summer, 2008

Alain Laederach

PhD - Co-Major in

BCB and Chemical Engineering

Peter Reilly
Amy Andreotti

Protein-carbohydrate and protein-protein interactions: Using models to better understand and predict specific molecular recognition

Postdoctoral Fellow
Dr. Russ Altman, MD, PhD
Helix Bioinformatics Group
Department of Genetics
Stanford School of Medicine
CA

http://helix-web.stanford.edu/people/alain/

Summer, 2003

Michael Lawrence

PhD in BCB

Dianne Cook
Eve Wurtele
Interactive graphics, graphical user interfaces and software interfaces for the analysis of biological experimental data and networks

Postdoctoral Fellow
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024
Seattle, WA 98109
http://www.fhcrc.org/

Fred Hutchinson Cancer Research Center is a world leader in research to understand, treat and prevent cancer, HIV/AIDS and other life-threatening diseases. Founding members of the center are credited with pioneering bone-marrow transplantation as a successful treatment for leukemia and other blood diseases.

Spring, 2008

Nicole Leahy

PhD in

BCB

Daniel Ashlock
John Mayfield
  Postdoctoral Fellow
Genetics Department
University of GA
Athens, GA
Spring, 2004

Jae-Hyung Lee

PhD in BCB

Drena Dobbs
Kai-Ming Ho

Analysis of protein-RNA and protein-peptide interactions in Equine Infectious Anemia Virus (EIAV) infection

Postdoctoral Fellow
Drena Dobbs Lab
Iowa State University
Fall, 2007

Darrin Lemmer

M.S. in BCB

Gloria Culver
Drena Dobbs

CAVEMol: an immersive 3D molecule viewer

IBM
Rochester, MN
Spring, 2006

Yuan Lin

M.S. in

BCB

Xun Gu
Xiaoqiu Huang
The relationship of sequence similarity and expression pattern similarity between yeast genes within gene families

Staff
J. Craig Venter Institute
9704 Medical Center Drive
Rockville, MD 20850

phone: 240-268-2767
email: press@venterinstitute.org

J. Craig Venter Institute is a not-for-profit research institute dedicated to the advancement of the science of genomics; the understanding of its implications for society; and the communication of those results to the scientific community, the public, and policymakers. http://www.venterinstitute.org/

Fall, 2001

Haining Lin

M.S. in

BCB

Xiaoqiu Huang
Daniel Voytas
BACAP: An assembly program for hierarchical shotgun sequencing

The Institute for Genomic Research (TIGR)
Rockville, MD

And PhD Student in BCB
Iowa State University

Fall, 2004

Patricia Lonosky

M.S. in

Genetics

Steve Rodermel
Vasant Honavar
Proteomics of the developing chloroplast in maize

Scientist
Nanosphere, Inc.
Northbrook, IL

Fall, 2002

Wiesia Mentzen

Ph.D. in

BCB

Eve Wurtele
Xun Gu

From Pathway to Regulon in Arabidopsis

Senior Scientist with
Alberto de la Fuente at
CRS4 Bioinformatica
Pula, Italy

Summer, 2006

Erin M. Myers

PhD in EEB

Major Professors: Fred Janzen & Dean Adams

Post-orbital color pattern variation and the evolution of a radiation of turtles (Graptemys)

Brooke Peterson-Burch

PhD in

Genetics

Daniel Voytas
Vasant Honavar
Characterization of plant LTR retrotransposon diversity and host genome survival strategies

Bioinformatics Scientist
Pioneer Hi-Bred
Des Moines, IA
Spring, 2003

Myron Peto

Ph.D. in
BCB

Robert Jernigan
Drena Dobbs

Studies of Protein Designability using Reduced Models

Postdoctoral Associate
Crop Genome Informatics Laboratory
USDA Agricultural Research Service
On the Campus of Iowa State University

Summer, 2007

Bradley Powers

M.S. in

BCB

Daniel Ashlock
Kirk Moloney

The Effect of Tags on Non-Local Adaptation

Bioinformatics Scientist
NewLink Genetics
Ames, IA
Spring, 2004

Justin Recknor

Ph.D. in BCB and Co-Major in Statistics

Dan Nettleton
Jim Reecy

Identification of Differentially Expressed Functional Categories in Microarray Studies Using Nonparametric Multivariate Analyses

Eli Lilly
Associate Statistician
Toxicology Department
Working with Microarray Analysis
Indianapolis, IN
Fall, 2006

Kyoungmin Roh

M.S. in PhD

Steve Proulx

Evolutionary variance of gene network via simulated annealing algorithm

Ph.D. Student
University of California

Summer, 2008

Ph.D. in

BCB

Volker Brendel
Randy Shoemaker
Plant genome informatics: evaluation and analysis of genomic DNA features involved in the transcriptional processing of protein coding genes

Assistant Professor
Department of Computer and Information Technology
Purdue University
West Lafayette, IN
Fall, 2006

Justin Schonfeld

Ph.D. in

BCB

Dan Ashlock
Dan Voytas

A modular data analysis pipeline for the discovery of novel RNA motifs

Postdoctoral Fellow
Cognitive Information Processing group
Computer Science and Engineering Department
University of Nevada
Reno, NV
Spring, 2006

Sachet Shukla

M.S. in

BCB

Srinivas Aluru
Charles Link

Identification of regional motifs in the 5' UTR and their implication in translational control mechanisms

Bioinformatics Scientist
NewLink Genetics
Ames, IA
Summer, 2003

Michael Sparks

Ph.D.
in BCB

MGET Fellow

Volker Brendel
Jonathan Wendel

Computational annotation of eukaryotic gene structures: algorithms
development and software systems

Postdoctoral Fellow
Volker Brendel's Lab
Iowa State University
Fall, 2007

Robert Thompson

M.S. in

Genetics

Susan Carpenter
Daniel Ashlock

Application of computational tools to analyze evolution of equine infectious anemia virus

Spring, 2001

Peter Vedell
PhD in BCB
Co-Major in Math
Zhijun Wu
Robert Jernigan
Boundary Value Approaches To Molecular Dynamics Simulation

Postdoctoral Fellow
The Jackson Laboratory
Bar Harbor, Maine

The Jackson Laboratory is designated by the National Cancer Institute as a "Cancer Center" to conduct basic cancer research. At the time of the Laboratory's initial designation in 1983, NCI noted, "The Jackson Laboratory is not only important to the national cancer effort but critical to its success."

http://www.jax.org/about/jax_facts.html

Spring, 2007

Kent Vander Velden

M.S. in

BCB

Gavin Naylor
Vasant Honavar
Spatial Clustering of differences in measured homoplasy with respect to protein structure

Current PhD student in BCB
Research Scientist,
Pioneer Hi-Bred
Des Moines, IA

Spring, 2002

Thomas Vigdal

M.S. in

BCB

Daniel Voytas
Volker Brendel
Insertion site similarities in the Tc1/mariner element family

Law Student
UC, Davis

Recently received an MS at Stanford

Summer, 2001

Jianmin Wang

Ph.D. in BCB

Xiaoqiu Huang
Xun Gu

Computational studies of ESTs: assembly, SNP detection, and applications in alternative splicing

Staff
Roswell Park Cancer Institute
Buffalo, NY

Roswell Park Cancer Institute (RPCI), founded in 1898 by Dr. Roswell Park, is America's first cancer center. RPCI holds the National Cancer Center designation of "comprehensive cancer center" and serves as a member of the prestigious National Comprehensive Cancer Network.

Over its long history, Roswell Park Cancer Institute has made fundamental contributions to reducing the cancer burden and has successfully maintained an exemplary leadership role in setting the national standards for cancer care, research and education.

The campus spans 25 acres in downtown Buffalo and consists of 15 buildings with about one million square feet of space. A new hospital building, completed in 1998, houses a comprehensive diagnostic and treatment center. In addition, the Institute built a new medical research complex and renovated existing education and research space to support its future growth and expansion. http://www.roswellpark.org/

For more information about Roswell Park and cancer in general, please contact the Cancer Call Center at 1-877-ASK-RPCI (1-877-275-7724).

Summer, 2006

Xiangyun Wang

M.S. in

BCB

Vasant Honavar
Drena Dobbs
Data-driven discovery of rules for protein function classification based on sequence motifs

Postdoctoral Research Associate
AstraZeneca Pharmaceutical
Wilmington, DE
Spring, 2002

Yingchun Wang

PhD in

Genetics and
BCB

Parag Chitnis
Suresh Kothari

Identification and functional analysis of thylakoid membrane proteome

Research Associate
Klemke Laboratory
Scripps Research Institute
La Jolla, CA

http://www.scripps.edu/imm/klemke/barry.htm

The role of the SDF-1/CXCR-4 receptor system in breast cancer metastasis.

In May 2005, received a three year Fellowship from Susan Komen Breast Cancer Foundation to continue his research in proteomics and cancer metastasis.

Fall, 2003

Yufeng Wang

Ph.D. in

BCB

Xun Gu
Daniel Ashlock
Functional divergence and age distribution of vertebrate gene families

Assistant Professor
Bioinformatics and Computational Biology
Department of Biology
University of Texas
San Antonio, TX
(210) 458-6492

http://www.bio.utsa.edu/faculty/wang.html

Research in my laboratory focuses on the comparative genomics, molecular evolution, and population genetics of gene families. 

Summer, 2001

Matthew Wilkerson

PhD in BCB

Volker Brendel and
Thomas Peterson

Genesis of gene structures and computational analysis of U12-type introns

Postdoctoral Research Associate
D. Neil Hayes Laboratory
Lineberger Comprehensive Cancer Center
The University of North Carolina at Chapel Hill
Chapel Hill, North Carolina
Fall, 2007

Di Wu

PhD Co-major in

BCB and Math

Zhijun Wu and
Robert Jernigan
Distance-based Protein Structure Modeling

Assistant Professor
Department of Mathematics
Western Kentucky University
Bowling Green, KY

 

Summer, 2006

Shiquan Wu

PhD in

BCB

Xun Gu
Zhijun Wu
Comparative genomics: Multiple genome rearrangement and efficient algorithm development

Postdoctoral Research Associate
Virtual Reality Application Center
with Dr. Zhijun Wu
Iowa State University
Ames, IA

 

Fall, 2004

Wu Xu

M.S. in

BCB

Parag Chitnis
Suresh Kothari
DNA sequence-specific recognition by transcriptional factors

Postdoctoral Fellow
Biochemistry Department
St. Jude Hospital
Memphis, TN
Summer, 2003

Aimin Yan

Ph.D. in BCB

Dr. Robert Jernigan; Dr. Zhijun Wu

Analysis on protein structures using statistical and computational methods

Postdoctoral Associate
Dr. Jack Dekkers
Department of Animal Science
Iowa State University
Summer, 2008

Changhui Yan

Ph.D. Co-Major in BCB and Computer Science

Vasant Honavar
Drena Dobbs

Identification of interface residues involved in
protein-protein and protein-DNA interactions from sequence using machine learning approaches

Assistant Professor
Computer Science Department
Utah State University
Logan, UT
Fall, 2005

Lei Yang

Ph.D. in BCB

Robert Jernigan and Zhijun Wu

Understanding protein motions by computational modeling and statistical approaches

 

Summer, 2008

Liang Ye

Ph.D. in BCB

Xiaoqiu Huang and Gavin Naylor

Sequence comparison methods, statistics, and applications

Senior Scientist
Genome Sequencing Center
School of Medicine
Washington University
St. Louis, MO

Summer, 2006

Hailong Zhang

M.S. in

BCB

Eve Wurtele
Julie Dickerson
MetNet DB: A comprehensive metabolic and regulatory network database

Bioinformatics Research Scientist/PhD Student
Chemistry Department
University of New Hampshire
Durham, NH
Summer, 2002

Wuyan Zhang

Co-Major PhD
Stat and BCB
Alicia Carriquiry
Jack Dekkers
The design and analysis of microarray experiments using pooled samples for the study of quantitative traits

Research Statistician
Abbott Laboratory
Chicago, IL
Spring, 2007

Xiaosi Zhang

M.S. in

BCB

Vasant Honavar
Xun Gu

Gene expression pattern analysis

System Engineer
Meredith Corporation
Des Moines, IA
Fall, 2002

Zhongqi Zhang

PhD - Co-Majors:

Statistics and
BCB

Ken Koehler
Xun Gu
Statistical analysis of gene expression profiles

Assistant Professor
Tsinghua University
Tsinghua, PR China

Summer, 2004

Hua Zhou

M.S. in

BCB

Karin Dorman
Susan Carpenter
Branching process models for HIV-1 drug resistant mutants

Ph.D. Student
Statistics department
Stanford University
CA
Fall, 2003

Huaijun Zhou

M.S. in

BCB

Xun Gu
Susan Lamont
Statistical Analysis of Functional Divergence in Gene Families

Assistant Professor
Department of Poultry Science
Texas A&M University
College Station, TX

Fall, 2003

Wei Zhu

PhD in

BCB

Volker Brendel
Srinivas Aluru
Spliced alignment and its application in Arabidopsis thaliana

TIGR
Rockville, MD

The Institute for Genomic Research (TIGR) is a not-for-profit center dedicated to deciphering and analyzing genomes – the complex molecular chains that constitute each organism’s unique genetic heritage.

Since it was founded in 1992, TIGR has been at the forefront of the genomics revolution, deepening the understanding of life and producing results with wide-ranging applications in medicine, agriculture, energy, the environment and biodefense.

Spring, 2003


Prasith Baccam

Home Departments: Math and Immunobiology

Major Professor: Dr. Cornette
Co-Major Professor: Dr. Susan Carpenter

Title: Genetic Variation and evolution of equine infectious anemia virus rev quasispecies during long term persistent infection

Abstract: Genetic variation has been observed in many viruses. Viruses that carry their genetic information in the form of RNA exhibit high mutation rates because the viral polymerase lacks proof-reading mechanisms commonly found in DNA polymerase complexes. The combination of high mutation rates, small genome size, and high replication rates results in a population of closely related viral genotypes, which are commonly referred to as a quasispecies. A consequence of the genetic variation in viruses is possible variation in viral phenotype of the quasispecies population. Furthermore, changes in viral phenotype may be a biologically important factor in progression of disease. Here, we undertook a longitudinal study to describe the quasispecies nature and genetic variation in a lentivirus regulatory protein, Rev, during the course of disease in a pony experimentally infected with equine infectious anemia virus (EIAV). This study examined rev variants that comprised the quasispecies population in sequential sera samples. Over the course of disease, there was continual appearance of novel rev variants, with some variants growing in frequency to predominate at certain time points. Phylogenetic and cluster analyses suggested that the Rev quasispecies was comprised of two distinct populations that co-existed during infection. These two quasispecies populations differed in their pattern of evolution, with one population accumulating changes in a linear, time-dependent manner, while the other population evolved radially from a common variant. Changes in the population size of the two Rev quasispecies coincided with changes in the clinical stages of disease. Rev variants from each population were biologically tested, and significant differences in Rev activity were detected between the two populations. Together, these results suggested that the distinct Rev populations differed in selective advantage.
Rev quasispecies activity differed significantly between stages of clinical disease, indicating a statistical correlation between Rev activity and disease stage. This study suggests that distinct quasispecies populations, which differed in pattern of evolution and niche advantage, co-existed during long term persistent infection by EIAV. A multi-population quasispecies model challenges our current thinking of viral populations and may have significant biological implications.


Kara Butterworth

Home Department: Botany

Major Professor: Dr. Jonathan Wendel
Co-Major Professor: Dr. Dean Adams

Title: Initiation and early development of fibers in wild and cultivated cotton

Abstract: Gossypium (Malvaceae) is a diverse genus best known for cultivated cotton. It includes about 50 species, 45 diploid and 5 allopolyploid, which occur in arid and semi-arid regions throughout the world (Vollesen, 1987; Fryxell, 1992). The diploids are divided into eight genome groups based on chromosome pairing and size, and fertility between species (Endrizzi, Turcotte, and Kohel, 1985). These groups comprise natural lineages within the genus and correspond to geographic locations: A, B, E, F- Africa and Arabia; C, G, K- Australia; and D- New World. Allopolyploid members are found in the New World and contain the A and D genomes (Wendel, 1995; Wendel et al., 1998; Brubaker, Bourland, and Wendel, 1999; Percival, Wendel, and Stewart, 1999; Cronn et al., 2002). This understanding of the evolutionary history of the genus allows many aspects of evolutionary differences in development and morphology to be studied in a phylogenetic context.


Feng Cui

Home Department: Mathematics

Major Professor: Dr. Zhijun Wu
Co-Major Professor: Dr. Robert Jernigan

Title: Distance-based NMR Structure determination and refinement

Abstract: X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy are two widely used experimental techniques for protein structure determination. In the Protein Data Bank (PDB), about 85% of deposited protein structures are determined by X-ray crystallography. The rest of the structures are determined by NMR spectroscopy. The main difference between these two approaches lies in the state of protein samples to which they are applied: for X-ray crystallography, a protein has to be in the crystalline state while in NMR, it may be in the solution state. Both approaches have their own pros and cons. For example, X-ray crystallography is a mature technique capable of providing more objective interpretation of data. This approach has various quality indicators such as resolution and R-factor to assess the structures. It can be applied to large molecules, e.g., virus particles, and produce a single model that is easy to visualize and interpret. Raw data processing is highly automatic. In contrast, NMR is a relatively new technique and provides more subjective interpretation of the data. It lacks established quality indicators of data and models. In addition, it is limited to determination of relatively small proteins (<20 kDa) and produces an ensemble of possible structures rather than one model. Data sometimes have to be manually processed. On the other hand, a protein has to form stable crystals for X-ray analysis, which could be time-consuming and often impossible. The crystalline state is not a natural and physiological environment for the protein either. In addition, X-ray crystallography is less useful for large flexible modular proteins. In contrast, the solution state of a protein is closer to biological conditions and relatively easy to prepare. NMR can provide information on dynamics and identify individual side-chain motion, and is often used to monitor conformational change on ligand binding.
With these pros and cons, both approaches have undergone dramatic development during the past five years, especially NMR. Advances in data collection, spectral assignment and analysis, structure calculation, and computer graphics have removed the barriers between NMR spectral assignment, structure assessment, and visualization. Many quality indicators such as bond length, angle and NOE violations (inter-atomic distances that lie outside of NOE ranges) have been developed and used for quality assessment of NMR structures. Novel refinement schemes aimed at increasing the accuracy of the resulting structures have been proposed and tested. As a result, proteins up to 30 kDa in size (about 260 residues) are now routinely accessible by NMR spectroscopy with increased resolution, equivalent to approximately 2.5 Å resolution crystal structures.


Garrett Dancik

Home Department: Statistics

Major Professor: Dr. Karin Dorman
Co-Major Professor: Dr. Doug Jones

Title: Exploring host-pathogen relationships through computer simulations of intracellular infection

Abstract: Computer simulations of infectious disease allow for the identification and estimation of important pathogen and immune parameters, the validation of theoretical biological models with experimental data, and the characterization of the host-pathogen interactions that lead to emergent and sometimes counterintuitive behavior. This dissertation describes the development, analysis, and calibration of a computer model of Leishmania major infection, the identification of correlates of escape mutant success and optimal escape strategies in a computer model of a viral infection, and statistical software to aid in computer model analysis and calibration.

In an agent-based model of L. major infection, sensitivity analysis reveals that increasing growth rates can favor or suppress parasite load, depending on the stage of the infection and the ability of the pathogen to avoid detection. Calibration of the computer model suggests that the pathogen has a relatively slow growth rate and can grow for an extended time before damaging the host cell.

In a computer model of viral infection, we find that the relative overall importance of the cellular (or humoral) response consistently correlates with both the success of immune escape and the optimal escape strategy, and that correlation is relatively robust to the time the escape mutant arises. Mutants that simultaneously escape both responses perform substantially better than humoral or cellular escape mutants alone, highlighting the importance of both responses in controlling infection. Interestingly, loss of infectiousness of humoral escape mutants favors the virus, likely because decreasing infectivity weakens the cellular response.

Finally, Gaussian processes (GP) are commonly used as fast predictors of computer model output and are essential tools for computer model calibration and analysis. We describe the R package mlegp, which fits GPs to scalar or multivariate computer model output and performs sensitivity analysis to identify and characterize the effects of important model parameters.


Lixia Diao

Home Department: Computer Science

Major Professor: Dr. David Fernandez-Baca
Co-Major Professor: Dr. Xun Gu

Title: Consensus properties of supertree construction methods

Abstract: The combination of a set of rooted perfect phylogenetic trees on overlapping leaf sets into one supertree is important and fundamental for evolutionary biology. In this thesis, we will present three supertree techniques – MRP, MRF, MinCutSupertree – and compare the consensus properties of MRP and MRF with some consensus tree criteria.


Jing Ding

Home Department: Electrical and Computer Engineering

Major Professor: Dr. Dan Berleant
Co-Major Professor: Dr. Eve Wurtele

Title: BOW-Based vs. Concept-Based Text Clustering for Functional Analysis of Genes

Abstract: The rapid development in genomic technologies (e.g. microarray) has enabled biologists to simultaneously monitor expression of hundreds or even thousands of genes in a single experiment. Interpreting the biological meaning of the expression patterns still relies largely on biologists' domain knowledge, as well as information collected from the literature and/or various public databases. Individual experts' domain knowledge is insufficient for large datasets, and manually collecting and analyzing information from the literature and/or public databases is tedious and time-consuming. Computer-aided functional analysis tools are highly desirable. We developed GeneNarrator, a text-mining system for functional analysis of microarray data. Given a list of genes, GeneNarrator collects functional information (MEDLINE citations) from PubMed, and clusters the citations into functional topics. The genes are then mapped to the topics and clustered into groups based on their similarities in topic distribution.
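
As a rough illustration of the bag-of-words (BOW) representation named in the title, the sketch below compares short texts by the cosine similarity of their word-count vectors. It is not GeneNarrator's implementation, which the abstract does not describe in detail; the function names and the sample citation snippets are hypothetical and made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in a text (hypothetical helper)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up citation snippets, for illustration only.
citations = [
    "kinase signaling and kinase phosphorylation",
    "kinase signaling pathway",
    "chloroplast membrane photosynthesis",
]
vectors = [bag_of_words(c) for c in citations]
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

Texts that share vocabulary score higher than texts that share none, which is the property a BOW-based clustering step would rely on when grouping citations into topics.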


Pan Du

Home Department: Electrical and Computer Engineering

Major Professor: Dr. Julie Dickerson
Co-Major Professor: Dr. Eve Wurtele

Title: Multi-scale Genetic Network Inference based on Time Series Gene Expression Profiles

Abstract: This work integrates multi-scale clustering and short-time correlation to estimate genetic regulatory networks with different time resolutions and detail levels. Gene expression data are noisy and large scale. Clustering is widely used to group genes with similar patterns. The cluster centers can be used to infer the genetic networks among these clusters. This work introduces the Multi-scale Fuzzy K-means clustering algorithm to uncover groups of coregulated genes and capture the networks in different levels of detail.
Time series expression profiles provide dynamic information for inferring gene regulatory relationships. Large scale network inference, identifying the transient interactions and feedback loops as well as differentiating direct and indirect interactions are among the major challenges of genetic network inference. Pairwise time correlation can detect linear interactions between genes. Estimates of the time delay and direction of causality in the inferred network can also be made. Partial correlation and d-separation theory are combined to differentiate the direct and indirect interactions and identify feedback loops. Gene expression regulation can happen in specific time periods and conditions instead of across the whole expression profile. Short-time correlation can capture transient interactions.
The network discovery algorithm was validated using yeast cell cycle data. The algorithm successfully identified the yeast cell cycle development stages, cell cycle and negative feedback loops, and indicated how the networks dynamically change over time. The inferred network reflects most interactions previously identified by genome-wide location analysis and matches extant literature results. The inferred network provides more detailed information about genes (or clusters) and the interactions among them. Interesting genes, clusters and interactions were identified, which match the literature and the gene ontology information and provide hypotheses for further studies.
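
As a rough illustration of the pairwise time correlation idea described above, the following sketch scans a range of time lags and reports the lag at which the Pearson correlation between a candidate regulator and a target is strongest in absolute value. It is a minimal sketch and not the dissertation's algorithm: the function names, the `max_lag` parameter, and the toy expression profiles are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_lag(regulator, target, max_lag):
    """Return (lag, r) with the largest |r| between regulator[t] and target[t + lag].

    max_lag is a hypothetical tuning parameter, not a value from the source.
    """
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        x = regulator[:len(regulator) - lag]
        y = target[lag:]
        if len(x) < 3:  # too few points left to correlate
            break
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Toy profiles (made up): the target repeats the regulator two time points later.
regulator = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11]
target = [0, 0] + regulator[:-2]
print(best_lag(regulator, target, max_lag=4))
```

Restricting `x` and `y` to a sliding window of time points would give a short-time variant of the same calculation, which is one way transient interactions could be probed.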


Tyra Dunn

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Xun Gu
Co-Major Professor: Dr. Daniel Voytas

Title: Genomic differences between humans and primates

Abstract: Scientists around the world have wondered for many years what distinguishes speciation. Of particular interest is the genetic basis for human/primate (chimpanzee or gorilla) separation. Humans and chimpanzees are 99% identical in their genomic DNA sequence, thus making them very closely related. Despite this high degree of sequence similarity, humans and primates have a number of striking phenotypic differences. We hypothesize that sequence changes that have occurred between humans and primates have altered developmental programs. Because transcription factors alter the expression of numerous genes, we also hypothesize that changes in the expression or activity of transcription factors are responsible for the different phenotypic traits among humans and primates.

Using human chromosome 22 as a model for comparison between human and primate DNA, a random selection of noncoding genes approximately 1-2 kilobases (kb) long upstream was sequenced. In the promoter regions of the sequence data, significant differences were detected when comparing humans and gorillas (p < 0.01) and gorillas and chimpanzees (p < 0.01), suggesting that limited similarities existed between the species. When comparing humans and chimpanzees, no significant difference was detected (p > 0.1). Using this information, transcription factors were analyzed between the human and chimpanzee data to determine if transcription regulation was different between the species. The results indicated no significant difference between humans and chimpanzees at the single-nucleotide level even though the species differ at the genetic and phenotypic levels. The results also indicated that changes in transcription regulation have played a major role in determining speciation. This research opens new avenues in investigating how many of the differences have functional consequences and the relative contributions of these transcription factors to expression differences.


Tyra Dunn

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Heather Greenlee
Co-Major Professor: Dr. Vasant Honavar

Thesis Presentation: June 12, 2007

Title: Characterizing and Influencing Differentiation Of Retinal Progenitor Cells

Abstract: The vertebrate neural retina is a complex organ that is well suited for studying development of the central nervous system. Blinding degenerative retinal diseases including retinitis pigmentosa, macular degeneration and glaucoma are characterized by loss of retinal neurons. At this time there is no way to replace retinal cell loss due to disease or injury since differentiated retinal cells are unable to regenerate. As a potential approach for treating retinal injury, neural progenitor cells have been proposed as a unique source of transplantable cells to replace lost cells in the damaged retina.

Previous studies have transplanted a variety of neural stem cells to the eye in hopes of developing a therapy to replace retinal neurons lost to disease.  Successful integration, survival and differentiation of the cell types have been variably successful.  At the moment little is known about the fundamental biological differences between stem cell or progenitor cell types.

We have used proteomic profiling to begin to identify unique characteristics of retinal progenitor cells. Our results demonstrate that expanded retinal progenitor cells express higher levels of stress-response proteins compared to their brain-derived counterparts. Further, we have described the dynamic expression of stress-response proteins during in vivo retinal development. Finally, we have demonstrated that changing the oxidative environment by addition of the antioxidant vitamin E to retinal progenitor cells differentiated in vitro decreases expression of stress-response proteins and alters their differentiation. These studies are the first to describe the expression of stress-response proteins during in vitro and in vivo retinal cellular development. Our results demonstrate the importance of understanding the oxidative nature of a host environment and how differentiation of transplanted cells might be affected.


Scott Emrich

Home Department: Electrical and Computer Engineering

Major Professor: Dr. Srinivas Aluru
Co-Major Professor: Dr. Patrick Schnable

Title: Assembly and Analysis of Complex Plant Genomes

Presentation: June 8, 2007

Abstract: Concurrent advances in high-throughput sequencing and assembly have led to the completion of many complex genomes. Even so, these assemblies require substantial computational resources. In this dissertation, we present a massively parallel approach that scales to thousands of processors without duplicating the biological expertise present in conventional assembly software. Additional bioinformatics techniques were required to accurately assemble the maize genome including novel repeat detection, and the resulting framework has been strongly supported by maize experimental data. More recently, this framework has been generalized for fruit fly, sorghum, soybean and environmental sequence assemblies. Questions in plant genome analysis were also addressed. For example, we have discovered an estimated 350 “orphan” maize genes and have shown that approximately 1% of all maize genes were recently duplicated, many of them into at least two functional copies. LCM-454 sequencing is introduced, and analyses are presented indicating that this approach can discover rare, potentially tissue-specific transcripts and thousands of SNPs. This dissertation combines high performance computing, computational biology and high-throughput sequencing for our ongoing work on the maize genome project. We conclude by describing how these contributions can be useful for any species, including non-model organisms that are unlikely to be fully sequenced.


Joset Etzel

Home Department: Electrical and Computer Engineering

Major Professor: Dr. Julie Dickerson
Co-Major Professor: Dr. Ralph Adolphs

Title: Algorithms and Procedures to Analyze Physiological Signals in Psychophysiological Research

Abstract: This dissertation presents analytical techniques which allow more information to be derived from psychophysiological data than otherwise possible. The techniques include an implemented algorithm for chest strain-gauge respiration signal analysis and a permutation testing method for evaluating changes over time in physiological signals. These methods are applied to three data sets, each examining physiological correlates of emotional experience. In the first study physiological correlates of moods induced using music were identified, although respiration entrainment confounds the issue of whether mood or the music caused the observed patterns. The second study examined physiological responses while subjects watched an emotional movie under three conditions; changes relating both to the movie scenes and condition were identified. Finally, the third study evaluates short term changes in heart rate while viewing words in terms of the type of word viewed and later word recall.


Fang Fang

Home Department: Statistics

Major Professor: Dr. Karin Dorman
Co-Major Professor: Dr. Drena Dobbs

Title: Virus Recombination: Modeling and Data Analysis

Abstract: As a key evolutionary process, recombination shapes the genetic structure of virus populations. The dramatic increase in full-length virus sequences provides a chance to study virus recombination through molecular data. Many statistical methods have been developed, and many of them are phylogeny-based. My research focuses on recombination modeling and data analysis. I first apply an existing phylogeny-based method, the Bayesian dual change-point model (DMCP), to investigate the role of representative data types for recombination study. We conclude that consensus data is overall the best data type to represent virus genotypes. Using consensus data we studied recombination on all full-length hepatitis B virus (HBV) sequences, and set up a system for using the DMCP model for large-scale sequence analysis. We discovered that HBV has an extremely high recombination rate. For the first time we reported circulating recombinant forms of hepatitis B virus, and identified one potential recombination hotspot. One important goal of studying recombination is to find potential recombination hotspots and to reveal the molecular mechanism of recombination. This goal requires identification of all recombinants generated by different recombination events, which is not trivial when recombinant sequences have similar mosaic structures. Extending the DMCP model, I developed a method to identify the number of recombination events producing multiple recombinants. I apply this method to several HBV recombinants that have identical mosaic structure and find at least two recombination events.


Jianmin Feng

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Volker Brendel
Co-Major Professor: Dr. Zhijun Wu

Title: A new approach for discovering protein motifs

Abstract: Motif recognition is a powerful homology based sequence analysis tool for clustering new protein sequences into different families based on characteristic motifs. Compared to BLAST, these approaches typically have lower false positive rates and can reveal more remotely related family members. However, the current motif databases do not cover all the sequences in protein sequence databases. One of the major reasons for the low coverage of motif databases is that there is only a small set of known member sequences available for constructing protein motifs for many gene families. I have designed a new algorithm, “mFISHER”, to detect protein motifs from only 2-5 known member sequences by artificial evolution of given sequences based on a position specific PAM evolution model. Based on my test results on 160 motif families, the overall average recall rate or sensitivity (true/(true + false negative)) and specificity (true/(true + false positive)) are 88% and 95%, respectively. Compared with MEME (Multiple EM for Motif Extraction), mFISHER is better based on the recall rate, especially when only 2 or 3 members are available. Both approaches have similar sensitivity. mFISHER is promising for constructing protein motifs when only a few known members are available.
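
The two evaluation ratios defined in the abstract can be written out directly. The sketch below is only an illustration of those formulas: the function names and the counts are hypothetical and are not the thesis's data. Note that the second ratio, true/(true + false positive), which the abstract labels specificity, is often called precision or positive predictive value elsewhere.

```python
def recall(true_pos, false_neg):
    """Recall (sensitivity): true / (true + false negative)."""
    return true_pos / (true_pos + false_neg)


def true_fraction_of_hits(true_pos, false_pos):
    """true / (true + false positive), the ratio the abstract calls specificity."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts chosen only to show the arithmetic.
print(recall(true_pos=88, false_neg=12))
print(true_fraction_of_hits(true_pos=95, false_pos=5))
```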


Xiang Gao

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Daniel Voytas
Co-Major Professor: Dr. Les Miller

Title: Studying the replication mechanism of the yeast retrotransposon Ty5 by molecular and computational approaches

Abstract: The yeast retrotransposon Ty5 is a Ty1/copia element. Officially, it is in the Hemivirus genus of the Pseudoviridae family. The ability to genetically manipulate retrotransposons and the yeast host cell was taken advantage of to explore replication mechanisms unique to Ty5 and common to most retrotransposons. Because of the abundance and diversity of retroelement sequences, along with the fact that many retroelement enzymes have evolved unique functional specificities, computational approaches were also developed to study functional divergence in replication. By screening a randomly mutagenized Ty5 library, two mutations (Y68C, D252N) that caused higher transposition frequencies were identified. Both mutations increased Ty5 cDNA levels, but did not have dramatic effects on the steps after cDNA synthesis (i.e. integration and recombination), or protein synthesis, processing, or solubility. The D252N mutation increased the hydrogen bonding potential of the CCHC zinc finger of nucleocapsid protein (NCp), making the Ty5 NCp zinc finger more like Ty1/copia consensus zinc fingers in terms of hydrogen bonding potential. Other mutations that increased the hydrogen bonding potential (D252R, D252K) provided the same fold increase in Ty5 reverse transcription, and naturally occurring mutations in the Ty5 zinc finger repress this function. Hydrogen bonding is suggested to be a universal requirement for the function of retroviral type zinc fingers and cellular zinc fingers. A half-tRNA priming mechanism for Ty5 reverse transcription was also demonstrated. Mutations in the anticodon of tRNA (IMT) and the putative PBS of Ty5 decreased transposition, but transposition was restored when complementarity between the IMT and PBS was restored. A tree-based method and supplemental Split Tester software were developed to study the functional divergence of reverse transcriptase (RT) with respect to half-tRNA and full-tRNA priming mechanisms.
The domains identified by this computational approach were previously experimentally demonstrated to bind with the tRNA primer/template in HIV RT. Using this software, another domain related to integrase functional specificity, namely whether or not integrase carries out 3’-end processing during integration, was also consistently identified in different integrase datasets. A model describing this functional divergence is proposed.


Zhong Gao

Home Department: Computer Science

Major Professor: Dr. Vasant Honavar
Co-Major Professor: Dr. Kai-Ming Ho

Title: Genome wide recognition of Tumor Necrosis Factor (TNF) related ligands in human and Arabidopsis genomes: A structural genomics approach

Abstract: Tumor necrosis factors (TNFs) play a crucial role in mammalian signal transduction pathways for cell proliferation, survival, and differentiation. Human and other species (such as Arabidopsis) genome sequencing projects provide a unique opportunity for genome-wide recognition of TNF related ligand proteins and discovery of potential TNF-TNFR signal transduction mechanism in plants. Genome-wide recognition of TNF related proteins in human and Arabidopsis was carried out using secondary structure prediction and protein fold recognition. In the protein fold recognition scheme, sequence-structure models are evaluated using contact energy score based on Miyazawa-Jernigan and Li-Tan-Wingreen models. Secondary structure composition based initial screening not only reduces search space of protein fold recognition but also shifts the score distribution of the selected candidates to a higher score region. In order to investigate influence of sequence length on threading results, protein fold recognition was conducted on human and Arabidopsis genome sequences of different length. The test on known TNFs from diverse species indicates that about 83% of TNFs are able to be identified; the test on human genome sequences shows that about 80% of known TNFs can be recognized. Integration of secondary structure profiling into the scheme can improve performance by adjusting local sequence-structure relationship. However, this improvement largely depends on accuracy of secondary structure prediction. Average scoring performs better than maximal scoring in model evaluation and selection. Pattern classification algorithms such as decision tree, neural network, Naïve Bayes classifier, and support vector machine are applied to discriminate TNF related proteins from the competitive false positives which have similar secondary structure composition to known TNFs and also have high fold recognition scores. 
Both known TNF and false positive sequences are represented with the twenty q values corresponding to twenty amino acids in Li-Tan-Wingreen model. Cross-validation results show that Naïve Bayes classifier performs better than SVM, neural network, and decision tree, and Naïve Bayes classifier is suitable for stringent control of false positive. This genome-wide search scheme was used to search potential TNF-like signal proteins in Arabidopsis genome. Possible role of candidates in human and Arabidopsis genomes is discussed. These results demonstrate that structure based methods can facilitate functional prediction in a genome scale.


Aspen Garry

Home Department: Ecology, Evolution, & Organismal Biology

Major Professor: Dr. Dean Adams
Co-Major Professor: Dr. Gavin Naylor

Title: Geometric Morphometric analysis of shark teeth of the genus Rhizoprionodon: The modern, the ancient, and the hypothetical. Modern tooth shape analysis and test of ancestry prediction methods by comparison to fossil shapes

Abstract: Shark teeth are extremely common in the fossil record, and they can potentially provide insight into the evolutionary history of sharks. However, isolated fossil teeth are difficult to assign to the correct jaw, position, and taxon without organismal context because individual sharks exhibit a variety of tooth shapes. Tooth shape varies across jaws, positions within each jaw, and taxa.

Fortunately, tooth shape is quantifiable, and shapes can be compared using the techniques of geometric morphometrics, which measure shape and its covariation with other variables. Analysis of modern tooth shapes was performed in order to gain understanding of patterns of modern tooth shape variation. These results could then be applied to fossils to provide better identification of fossils in order to make use of sharks’ extensive fossil record.

To quantify modern patterns of tooth shape variation, teeth of five Rhizoprionodon species and representatives of three closely related genera (Loxodon, Eusphyra, and Sphyrna) were quantified and analyzed using geometric morphometric methods. Ancestral tooth shapes were estimated using the modern shape data mapped onto a phylogeny created using molecular data, and a Brownian motion model of evolution. These shapes were compared to fossil teeth from Rhizoprionodon sp. and Sphyrna spp. to evaluate the accuracy of the estimated ancestral shapes.

Modern teeth at the front of the jaw displayed the most dramatic shape differences between jaws and positions. Teeth from each genus could be distinguished, but species within Rhizoprionodon could not. Fossil tooth shapes most closely resembled those of modern teeth, indicating that tooth shape did not change according to the Brownian motion model used to predict ancestral shapes.


Jianying Gu

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Xun Gu
Co-Major Professor: Dr. Dan Nettleton

Title: Functional divergence and genome evolution of vertebrate protein kinases

Abstract: The emerging complete and nearly complete genome sequences have provided a significant amount of material for large-scale comparative genomic analysis. Novel methods have been developed to elucidate the function of gene products and functional interacting networks. Many of these post-genomic attempts have focused on unveiling the evolutionary forces that have shaped the network organization. Among various evolutionary forces, duplication of a functional domain, individual gene, chromosomal segment, or entire genome has long been regarded as the primary source of functional novelty in a vast number of gene families. It is therefore intriguing to quantitatively trace the changes of evolutionary constraints after a duplication event.

This study is focused on the exploitation of the functional divergence and evolutionary patterns in vertebrate kinase complements (denoted as kinomes) and kinase-regulated signaling transduction pathways, using a combinatorial statistical and evolutionary approach. The analysis of an individual kinase gene family (Jak), protein tyrosine kinase superfamily, and a kinase mediated signaling transduction pathway (TGF-β) showed that functional divergence (altered functional constraint) after (domain or gene) duplication is a general pattern. Moreover, the age distribution of the vertebrate kinomes showed that (1) The major kinase-related animal specific signal-transduction pathways have been generated through an ancient continuous domain shuffling (or duplications) during the time period from early stage of eukaryotes to metazoan evolution; (2) Vertebrate tissue-specificity of signal-transduction is facilitated by large-scale duplication event(s) in the early stage of vertebrates; and (3) The kinase pseudogenes are generated through either segmental duplication or retrotransposition very recently.


Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Patrick Schnable
Co-Major Professor: Dr. Dan Ashlock

Title: Adaption of Multiclustering to the Analysis of Microarray Data

Presentation Date: Thursday, May 10, 2007

Abstract: Clustering has become an integral part of microarray data analysis and interpretation. It helps reduce the scale of information generated by a microarray experiment to a level at which biologists can generate hypotheses. There is a danger that artifacts induced by clustering methods can cause misinterpretation of the data. A clustering method that can accurately capture the natural structure of the data would be a useful tool for biologists to discover the biological meaning buried in the data. To this end, a new clustering algorithm, called K-means multiclustering, is introduced. The method can avoid the artifacts induced by distance or similarity metrics by amalgamating the results of many K-means clusterings.

Results: The multiclustering algorithm is a model-free clustering method. It is found to be reliable and consistent in capturing the underlying data structure, with accuracy that is competitive with model-based clustering and superior to other methods on synthetic microarray data generated in a manner consistent with the hypothesis of model-based clustering. The algorithm has a high level of immunity to artifacts introduced by the metric used to measure the distance between data points. It can successfully cluster data sets that are designed to have different shapes and variation and that cannot be correctly clustered by traditional clustering methods. The cut plot computed by this method is a very simple and useful summary of the data structure. A detailed view of the formation of clusters can also be generated by the method to reveal the underlying hierarchical structure of the data set.
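
One common way to amalgamate many K-means runs is a co-association matrix that records how often each pair of points lands in the same cluster. The sketch below illustrates that general idea only; it is an assumption that it resembles the thesis's multiclustering algorithm, and it does not reproduce the cut plot. All function names, the fixed `k`, the number of runs, and the toy data are hypothetical.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        # Recompute each center as the mean of its members; keep it if empty.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_association(points, k, runs, seed=0):
    """Fraction of k-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def threshold_groups(coassoc, threshold):
    """Connected components of the graph linking pairs with frequency >= threshold."""
    n = len(coassoc)
    group = [-1] * n
    next_id = 0
    for start in range(n):
        if group[start] != -1:
            continue
        group[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if group[j] == -1 and coassoc[i][j] >= threshold:
                    group[j] = next_id
                    stack.append(j)
        next_id += 1
    return group


# Toy data (made up): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
coassoc = co_association(points, k=2, runs=10)
print(threshold_groups(coassoc, threshold=0.5))
```

Varying `k` across runs, rather than fixing it, would make the pairwise frequencies reflect structure at several scales.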


Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Daniel Voytas
Co-Major Professor: Dr. Mei Hong

Title: Characterization of the Sireviruses:  A unique group of Ty1/copia LTR retrotransposons in plants

Abstract: Plant genomes have allowed the expansion of many types of mobile genetic elements.  LTR retrotransposons are a subclass of mobile genetic elements that replicate using an RNA intermediate.  The Pseudoviridae (Ty1/copia) are a family of LTR retrotransposons, and the Sireviruses are one of three genera in the Pseudoviridae.  The Sireviruses have features that set them apart from classical retrotransposons.  Different members of the Sireviruses show great variability in their genomic structures and the translational tricks they use to express their encoded proteins.  For example, we have shown that the SIRE1 elements of soybean use stop codon suppression to express their Env-like protein.  Secondly, some monocot members of the Sireviruses may use a bypass mechanism to translate Pol.
 
Another notable feature of the Sireviruses is that most carry additional coding information in the form of an open reading frame (ORF) referred to as an env-like ORF, and all have encoded extra coding information in their gag gene.  The env-like ORF has caused speculation that these elements are plant retroviruses, although no experimental evidence has determined this to be true.  However, using a yeast two-hybrid screen, we have discovered an interaction between multiple Sirevirus Gags and a family of related host cell proteins referred to as dynein light chain LC8 and LC6.  The LC8 and LC6 proteins are highly conserved in eukaryotes and are components of the dynein and myosin-V motors.  LC8 can bind cargo (cell proteins or virus particles) to allow movement along the cytoskeleton.  Thus, one hypothesis is that the interaction of the Sirevirus Gags with LC8 or LC6 may allow for movement of the Sirevirus virus-like particles or transposition intermediates within a cell (for example, from cytoplasmic to nuclear compartments).  If true, this would not only represent the first example of a movement mechanism for any retrotransposon, but it also illustrates how plant retrotransposons and plant viruses use similar mechanisms to achieve a common goal.  In addition, an initial characterization of the expression and localization of the Arabidopsis thaliana LC8/LC6 gene family was completed.


Home Department: Biochemistry, Biophysics and Molecular Biology

Major Professor: Dr. Mark Hargrove

Title: Structural Characterization of Ligand Binding in Hexacoordinate Hemoglobins

Presentation: Thursday, August 17, 2006

Abstract: The goal of biophysics is to study the structures of the components of living organisms and to understand the mechanics of the processes of life. Hemoglobin is a well suited model for this study. As an essential component of the life blood of mammals, and easy to obtain in large quantities, hemoglobin and its monomeric partner myoglobin are two of the most well studied and characterized components of life. Yet hemoglobin studies continue to reveal new forms of hemoglobin, raising new questions, functional possibilities, and research opportunities. My research focuses on hemoglobins classified as hexacoordinate. I have focused particularly on the structural characterization of these proteins upon ligand binding. Included below for your benefit is a list of abbreviations and terms used in my talk along with their definitions.

Hbs -- hemoglobins
hxHbs -- hexacoordinate hemoglobins
trHbs -- truncated hemoglobins
nsHbs -- nonsymbiotic hemoglobins
sHbs -- symbiotic hemoglobins
SynHb -- Hb from Synechocystis
ferric -- oxidized (3+ iron)
ferrous -- reduced (2+ iron)
ligand -- small binding molecule like oxygen
k' -- rate of ligand binding
K -- equilibrium binding association constant
soret -- optical peak around 390-440nm

List of publications:
Hoy, J. A., Kundu, S., Trent, J. T., 3rd, Ramaswamy, S., and Hargrove, M. S. (2004). The crystal structure of Synechocystis hemoglobin with a covalent heme linkage. J Biol Chem. 279, 16535-16542.
Trent, J. T., 3rd, Kundu, S., Hoy, J. A., and Hargrove, M. S. (2004). Crystallographic analysis of synechocystis cyanoglobin reveals the structural changes accompanying ligand binding in a hexacoordinate hemoglobin. J Mol Biol. 341, 1097-1108.
Smagghe, B. J., Kundu, S., Hoy, J. A., Halder, P., Weiland, T. R., Savage, A., Venugopal, A., Goodman, M., Premer, S., Hargrove, M. S. (2006). Role of Phenylalanine B10 in Plant Nonsymbiotic Hemoglobins. Biochemistry Aug 15;45(32):9735-9745.
Hoy, J. A., Smagghe, B. J., Halder, P., Hargrove, M. S. (2006). Covalent heme attachment in Synechocystis hemoglobin is required to prevent ferrous heme dissociation. Manuscript in preparation.
Hoy, J. A., Robinson, H., Trent, J. T., Kakar, S., Smagghe, B. J., Hargrove, M. S. (2006). Crystal structure of a nonsymbiotic plant hemoglobin; implications for the evolution of oxygen transport. Manuscript in preparation.

Bio:
BA in Physics and BA in Humanities from Wartburg College, Waverly, Iowa, 1996
MS in Physics from Iowa State University, 1999
Temporary Instructor of Physics, ISU, 1999 - 2000
PhD studies in Biophysics, ISU, 2000 - 2006
Postdoc in Hargrove Lab


LaRon Hughes - M.S.

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Karin Dorman
Co-Major Professor: Dr. Susan Carpenter

Title: EIAV DB: A comprehensive Equine Infectious Anemia Virus (EIAV) database

M.S. Abstract: A major problem in biology is the storage and retrieval of biological data in a meaningful and efficient manner. With the advent of mass sequencing projects, such as the human genome project, the need to store, retrieve, and analyze sequence data is stronger than ever before. The following thesis tackles a small part of this problem by presenting techniques, models, and applications for productively storing and retrieving a set of related viral sequences in a central data bank. The thesis begins by providing an overview of the relational database and its role in storing biological data. The main chapter of the thesis is a description of a novel relational database application (EIAV DB). EIAV DB is a central repository of Equine Infectious Anemia Virus sequence and feature information. The models and application provide insight into technologies that help alleviate the storage and retrieval problem.


LaRon Hughes - PhD

Home Department: Animal Science

Major Professor: Dr. Jim Reecy
Co-Major Professor: Dr. Vasant Honavar

Title: Hypothesis building using the Animal Trait Ontology

PhD Abstract: With the advent of sequencing projects in model organisms, humans, and domesticated livestock species, the need for storage, retrieval, and analysis of genomics information for these animals has become important.  The Animal Trait Ontology (ATO) is an ontology that has been created to store the relationships between farm animal traits for several domesticated farm animals.  The Collaborative Ontology Building (COB) editor was used to create and edit the ATO.  An online ontology browser has been developed to search and browse the ontology and to view the relationships between the terms.  Some of the traits in the ontology are linked to associated quantitative trait loci (QTL) information for each species through a tool called the Comparative Animal QTL (CAQ) tool, which allows users to compare QTL experiments in livestock species.  The tool allows QTL experiments to be compared based on 1) one trait given one species, and 2) two traits given one species.  The effectiveness of the tool is recorded in the form of a data and statistical analysis which demonstrates its use in examining pleiotropic effects for traits in the pig.  In addition, the Human and Animal Trait Ontology is discussed; it will combine several species ontologies, including the ATO, to form a consensus for describing phenotypes and traits across different disease models.


Cizhong Jiang

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Thomas Peterson
Co-Major Professor: Dr. Xun Gu

Title: Computational and molecular analysis of Myb gene family

Abstract: Myb proteins are defined by a highly conserved DNA-specific binding domain termed Myb, which is composed of approximately 50 amino acids with constantly spaced tryptophan residues. Multiple copies of Myb domains often exist as tandem repeats within a single protein. There are up to four tandem Myb repeats present in Myb proteins identified to date (termed R0R1R2R3 hereafter). In our study, we collected additional Myb genes, and performed a series of phylogenetic analyses to explore the evolutionary origin of Myb genes. The results suggest that the Myb gene family originated from an ancient one Myb-box gene. One and two intragenic duplications produced R2R3 and R1R2R3 Myb genes, respectively, which then co-existed in the primitive eukaryotes and gave rise to the currently extant Myb genes. Based on our results, we proposed that plant R1R2R3 Myb genes were derived from R2R3 Myb genes by gain of the R1 repeat through an ancient intragenic duplication; this gain model is more parsimonious than the previous proposal that plant R2R3 Myb genes were derived from R1R2R3 Myb genes by loss of the R1 repeat. The phylogenetic analysis of isolated individual Myb repeats indicates that R2 repeat has evolved more slowly than the R1 and R3 repeats. However, it is not clear which repeat is the most ancient one.

Another goal of our project is to classify Myb genes and predict their functions. We clustered closely related Myb genes from Arabidopsis and rice into subgroups on the basis of sequence similarity and phylogeny. The gene structure analysis revealed that both the positions and phases of introns are conserved within a subgroup, although they differ between subgroups. Conserved motifs were detected in C-terminal coding regions within subgroups, and these motifs exist specifically in Myb genes. We also found that Myb genes with similar functions cluster together. In contrast, no conserved regulatory elements were identified in the divergent non-coding regions. Additionally, the distribution pattern of introns in the phylogenetic tree indicates that Myb domains originally had a compact size without introns. Non-coding sequences were later inserted, and the splicing sites were conserved during evolution.


Brent Kronmiller

Home Department: Plant Pathology

Major Professor: Dr. Roger Wise
Co-Major Professor: Dr. Xun Gu

Title: Assembly And Annotation Tools For Analysis Of Large Contiguous Regions Of The Maize Genome

Abstract: LTR retrotransposons make up significant portions of many of the longer grass genomes. Their repeat sequences across the genome, their terminal repeats, and their nested cluster configuration make assembly of sequence clones challenging and identification of gene regions difficult.  In this thesis I provide tools necessary for both assembly and annotation of highly repetitive genomes and use these tools to construct the two currently longest maize sequence contigs.
      In the first part of the thesis I present TEnest, annotation and visualization software for transposable elements in grass genomes.  TEnest identifies all fragmented transposable elements within the input sequence and reconstructs each to the original insertion state.  This provides a chronological display of the nesting pattern of clustered transposable elements.  For LTR retrotransposons TEnest calculates an estimated age since insertion based on the divergence of its paired LTRs.  I also provide a case study of TEnest on the available maize genome sequence.  TEnest shows the distribution of transposon families, ages of insertion, and frequencies of solo LTRs.  In addition I provide a phylogenetic analysis of retrotransposon families showing that the estimated ages since insertion of LTR retrotransposons cluster with their sequence identity, which indicates that LTR retrotransposons experience specific intervals of extreme proliferation to expand across the genome.
      In the second part of this thesis I introduce our two contiguous maize sequences, rf1-associated contigs rf1-C1 and rf1-C2 sequenced from maize B73.  These are the two longest contiguous maize sequences and provide previously unmatched sequence quality for answering many questions surrounding the makeup of the maize genome.  Here, using TEnest, we propose two maize assembly techniques for highly repetitive regions.  The use of these processes has allowed us to provide the high quality contiguous sequences of the rf1-associated region and will assist researchers with assembly of difficult sequence clones.  We show definite separation between gene and repeat regions.  The rf1-associated contigs, when compared to the rice and sorghum genomes, show conserved macro-colinearity between genes across the long sequences.  However, a closer look at individual gene islands shows micro-non-colinearity across the analyzed grass species.
      The third section of this thesis compares the B73 rf1-associated sequence contigs with two BACs sequenced from Wf9-BG, an Rf1-containing maize line.  Here we identify four genes in an island corresponding to a similar gene island in B73; however, a fifth gene is missing from Wf9-BG.  Two repeat clusters surround the gene island; one matches its counterpart in B73, while the second does not align to B73.  Leading up to this area of recombination we observe a drastically increased frequency of polymorphisms.
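
The age estimate mentioned in the first part of this abstract can be illustrated with a short sketch. The two LTRs of a retrotransposon are identical at insertion, so their divergence K grows with time and a common estimate is T = K / (2r). This is not TEnest's code: the function name, the Jukes-Cantor correction, and the default substitution rate r are assumptions chosen for illustration only.

```python
import math

def ltr_insertion_age(ltr5, ltr3, rate=1.3e-8):
    """Estimate years since insertion from two pre-aligned LTR sequences.

    `rate` (substitutions per site per year) is an assumed default,
    not a value taken from the thesis.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to equal length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("divergence too high for Jukes-Cantor correction")
    # Jukes-Cantor distance corrects for multiple hits at the same site.
    k = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
    # Both LTRs accumulate substitutions independently, hence the factor 2.
    return k / (2.0 * rate)
```

Under these assumptions, one mismatch in 100 aligned sites corresponds to an insertion roughly 0.39 million years ago.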


Alain Laederach

Home Department: Chemical and Biological Engineering

Major Professor: Dr. Peter Reilly
Co-Major Professor: Dr. Amy Andreotti

Title: Protein-Carbohydrate and Protein-Protein interactions: Using models to better understand and predict specific molecular recognition

Abstract: Any molecular recognition event results in a change in the free energy of the system. The extent of this change is related to the association constant, such that the more negative the free energy change is, the tighter the interaction between receptor and ligand. Protein-carbohydrate interactions play a critical role in signal transduction, innate immunity and metabolism. Modeling these interactions is somewhat complicated by the inherent flexibility of carbohydrates as well as their relatively large number of functional groups. An empirical scoring function for docking carbohydrates to proteins will be presented, specifically tailored to predict both the correct binding orientation and the free energy of binding of the carbohydrate-ligand/protein-receptor complex. This new scoring function can predict free energies of binding to within 1.1 kcal/mol residual standard error, a definite improvement over existing scoring functions, which result in standard errors well over 2 kcal/mol. Application of automated docking methodology to determine the carbohydrate recognition specificity of the C-type lectin human Surfactant Protein D will also be presented. In the second part of the thesis, the role of π-stacking interactions (e.g. between Tyr side chains) in stabilizing protein folds will be discussed. A 17-residue peptide derived from the naturally occurring anti-microbial peptide Tachyplesin I is investigated using NMR spectroscopy. NOE cross peaks were observed, confirming the existence of this interaction in solution. In the final part of the thesis, a quantitative NMR investigation into the self-association behavior of the regulatory domains of several Tec family member kinases will be presented. Of particular interest, self-association within Bruton's Tyrosine Kinase (Btk) regulatory domains occurs through the formation of an asymmetric homodimer.
Together this work demonstrates the importance of rigorous biophysical characterization of bio-molecular recognition events and how interdependent computational modeling and experimentation are.
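
The relation between free energy and association constant stated at the start of this abstract is the standard thermodynamic identity ΔG = -RT ln(Ka). The sketch below only illustrates that identity; the function names and the default temperature of 298.15 K are assumptions, and it is not the scoring function described in the thesis.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(k_assoc, temperature=298.15):
    """Standard free energy of binding (kcal/mol) from an association constant."""
    return -R_KCAL * temperature * math.log(k_assoc)

def association_constant(delta_g, temperature=298.15):
    """Association constant implied by a binding free energy in kcal/mol."""
    return math.exp(-delta_g / (R_KCAL * temperature))
```

At the assumed temperature, a 1.1 kcal/mol difference in free energy corresponds to roughly a 6.4-fold difference in association constant, which gives a sense of scale for the standard errors quoted above.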


Michael Lawrence

Home Department: Statistics

Major Professor: Dr. Dianne Cook
Co-Major Professor: Dr. Eve Wurtele

Title: Interactive graphics, graphical user interfaces and software interfaces for the analysis of biological experimental data and networks

Abstract: Biologists need to analyze and comprehend increasingly large and more complex experimental data. These experimental data are multivariate, where each row corresponds to a biological entity, and each column corresponds to the level of an experimental treatment. Biological experiments often produce multiple data sets, each describing one aspect of the system, such as the transcriptome recorded by a microarray or metabolome recorded using gas chromatography mass spectrometry (GC-MS). A biochemical network model provides a conceptual system-level framework for integrating data from different sources. Effective use of graphics enhances the comprehension of data, and interactive graphics permit the analyst to actively explore data, check its integrity, satiate curiosities and reveal the unexpected. Interactive graphics have not been widely applied as a means for understanding data from biological experiments. This thesis addresses these needs by providing new methods and software that apply interactive graphics in coordination with numerical methods to the analysis of biological data, in a manner that is accessible to biologists.


Nicole Leahy

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Daniel Ashlock
Co-Major Professor: Dr. John Mayfield

Title: Pseudophyte evolutionary algorithm: A simple computational model of parapatric speciation

Abstract: The Pseudophyte Evolutionary Algorithm (PEA) is an individual-based computer model of a population of haploid, annual plants used to examine the process of speciation in a patchy environment. The model incorporates both pre-mating and post-zygotic mechanisms for the evolution of reproductive isolation, via pollen selection and offspring inviability, respectively. The PEA treats speciation as an emergent property rather than an explicit feature of the model, making it possible to study how environmental patchiness, the number and arrangement of loci, and the reproductive output of individuals affect the strength of isolating mechanisms as well as the rate at which they evolve. The effect of how genotypes are mapped to phenotypes was also explored to examine the sensitivity of the PEA to alternate representations.


Jae-Hyung Lee

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Drena Dobbs
Co-Major Professor: Dr. Kai-Ming Ho

Title: Analysis of protein-RNA and protein-peptide interactions in Equine Infectious Anemia Virus (EIAV) infection

Abstract: Macromolecular interactions are essential for virtually all cellular functions including signal transduction processes, metabolic processes, regulation of gene expression and immune responses. This dissertation focuses on the characterization of two important macromolecular interactions involved in the relationship between Equine Infectious Anemia Virus (EIAV) and its host cell in the horse: i) the interaction between the EIAV Rev protein and its binding site, the Rev-responsive element (RRE), and ii) interactions between equine MHC class I molecules and epitope peptides derived from EIAV proteins. EIAV, one of the most divergent members of the lentivirus family, has a single-stranded RNA genome and carries several regulatory and structural proteins within its viral particle. Rev is an essential EIAV-encoded regulatory protein that interacts with the viral RRE, a specific binding site in the viral mRNA. Using a combination of experimental and computational methods, the interactions between EIAV Rev and RRE were characterized in detail. EIAV Rev was shown to have a bipartite RNA binding domain containing two arginine-rich motifs (ARMs). The RRE secondary structure was determined, and specific structural motifs that act as cis-regulatory elements for the EIAV Rev-RRE interaction were identified. Interestingly, a structural motif located in the high affinity Rev binding site is well conserved in several diverse lentiviral genomes, including HIV-1. Macromolecular interactions involved in the immune response of the horse to EIAV infection were investigated by analyzing complexes between MHC class I proteins and epitope peptides derived from EIAV Rev, Env and Gag proteins. Computational modeling results provided a mechanistic explanation for the experimental finding that a single amino acid change in the peptide binding domain of the equine MHC class I molecule differentially affects the recognition of specific epitopes by EIAV-specific CTL.
Together, the findings in this dissertation provide novel insights into the strategy used by EIAV to replicate itself, and provide new details about how the host cell responds to and defends against EIAV upon infection. Moreover, they have contributed to our understanding of the macromolecular recognition events that regulate these processes.


Darrin Lemmer

Home Department: Biochemistry, Biophysics & Molecular Biology

Major Professor: Dr. Gloria Culver
Co-Major Professor: Dr. Drena Dobbs

Title: CAVEMol: an immersive 3D molecule viewer

Abstract: As the number of solved molecular structures deposited with the Protein Data Bank (PDB) increases, so too does the desire for more advanced ways of using this data. Traditional applications for viewing and manipulating molecular structures create a computer-generated model on a standard desktop computer screen. The display may employ some method of stereography to create the illusion of depth, but generally the user just sees a flat image. The user is able to interact with the molecule by magnifying it to get a closer look at a particular area of interest, or by rotating it along an arbitrary axis, thus allowing all sides of the molecule to be seen, though only one side is in view at any given time. The user may also be able to see changes in the molecule over time, whereby each conformation of the molecule is a separate frame of an animation, or they may even be able to make modifications to the structure in real time. Regardless of the amount of control the user has over the molecule, however, one thing remains the same: the user experiences the molecule as though it were an object floating behind the monitor screen which they can indirectly control using a mouse or other pointing device.
An immersive environment, on the other hand, provides a new paradigm for molecular visualization, allowing the user a much more realistic interaction with the molecule. The user becomes part of the viewing experience, traversing a molecule as though walking or flying within it. The molecule can completely surround them on all sides, giving them a true sense of the size and shape of the molecule in three dimensions. The user may also interact with the object directly, moving and rotating it with their hands rather than a mouse.
This approach should prove particularly valuable for operations such as “interactive docking,” which allows a user to manipulate the interface between two molecules to identify favorable interaction sites. While this can be done to a degree in today’s desktop molecule viewers, the operation is difficult and time consuming. Because today’s viewers are limited to a flat screen display, a user can only attempt to dock two molecules in two dimensions at a time. When the structure is rotated, more often than not the third dimension is not properly aligned. Realigning the third dimension invariably breaks one or both of the first two. The result is a long and frustrating cycle of alignment, rotation, and realignment. By allowing direct manipulation in all three dimensions simultaneously, the immersive perspective eliminates this cycle.

This thesis presents the design and implementation of CAVEMol, a molecular visualization application for immersive environments. I will also give an overview of molecular visualization and immersive environments, and then discuss future work that can be done in this area as well as applications where molecular visualization in an immersive environment can be particularly valuable.


Haining Lin

Home Department: Computer Science

Major Professor: Dr. Xiaoqiu Huang
Co-Major Professor: Dr. Daniel Voytas

Title: BACAP: An assembly program for hierarchical shotgun sequencing

Abstract: We propose a sequence-based algorithm BACAP to assemble BAC sequences generated from hierarchical shotgun sequencing. Our approach relies on sequence similarity rather than physical mapping. It follows the “overlap-layout-consensus” framework used for shotgun sequencing data. BACAP uses heuristic methods to achieve efficiency and accuracy. It was tested on four simulated data sets of 200 BAC-size sequences each and one real data set of 228 rice BACs from TIGR. The average running time was 25 minutes on one 900 MHz IA-64 GenuineIntel Itanium machine. Our results show that BACAP can quickly and accurately accomplish some BAC assembly tasks without physical mapping information.


Yuan Lin

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Xun Gu
Co-Major Professor: Dr. Xiaoqiu Huang

Title: The Relationship of Sequence Similarity and Expression Pattern Similarity between Yeast Genes within Gene Families

Abstract: After gene duplication, the sequence and expression patterns of duplicated genes diverge. It is known that the functional divergence of duplicated genes can be related to the divergence of both their coding sequences and their expression profiles, the latter caused mainly by sequence changes in regulatory regions. But it is not known whether sequence divergence and expression pattern divergence are correlated. Earlier research by Andreas Wagner showed at most a very weak correlation between them. In contrast, our research shows a strong correlation between sequence similarity and expression profile similarity when the sequences are well conserved; the degree of coexpression of duplicated genes is consistent with their duplication order.


Patricia Lonosky

Home Department: Botany

Major Professor: Dr. Steve Rodermel
Co-Major Professor: Dr. Vasant Honavar

Title: Proteomics of the developing chloroplast in maize

Abstract: Chloroplast protein expression profiles during the light-induced biogenesis of the maize plastid were determined from 2D gel analysis. At five time points of this ‘greening’ process (0, 2, 4, 12, and 48 hours post-illumination), maize plant tissue was collected, plastids were isolated, and protein was precipitated and separated in two dimensions using 2D protein gels. From these proteome maps, spot quantities were analyzed by Principal Components Analysis, hierarchical pairwise average linkage cluster analysis, Adaptive Resonance Theory 2 (ART2) cluster analysis, and Self Organizing Map cluster analysis to determine chloroplast protein expression profiles. 54 spots representing 26 proteins were identified by MALDI-TOF mass spectrometry and used to verify the protein expression profiles. Two main conclusions were drawn from these data: 1) ART2 may be a useful clustering tool for expression data, and 2) different forms or modifications of the same protein show different expression patterns.


Wiesia Mentzen

Home Department: Genetics, Development & Cell Biology

Major Professor: Dr. Eve Wurtele
Co-Major Professor: Dr. Xun Gu

Title: From Pathway to Regulon in Arabidopsis

Abstract: I apply combined bioinformatic approaches using genomic and transcriptomic data to investigate the fatty acid biosynthesis pathway, at the molecular level and in the context of the systems biology of Arabidopsis.  Fatty acids are essential components of all known bacterial and eukaryotic cells, with critical roles as energy reserves and as metabolic precursors for biological membranes. The pathway for fatty acid synthesis seems to be conserved across all living systems. Acetyl-CoA carboxylase, a member of a superfamily of biotin-dependent enzymes, catalyzes the first committed step of the fatty acid biosynthesis pathway. Phylogenetic study exposed complex and intertwined evolutionary histories of this family, with multiple domain fusions and rearrangements. As revealed by meta-analysis of a wide array of Arabidopsis transcriptomic data, fatty acid biosynthesis is transcriptionally regulated, and this regulation extends not only across all pathway reactions but also to some substrate- and cofactor-producing reactions, thus defining a major transcriptionally co-regulated pathway. I extend the meta-analysis of the transcriptome to find groups of coexpressed genes (also called modules, or regulons) in the Arabidopsis genome. Major functionally coherent gene groups were identified. These comprise development, information processing, defense, and metabolism, as well as tissue- and organelle-specific processes.


Erin Myers

Home Department: Ecology, Environment and Organismal Biology

Major Professor: Dr. Fred Janzen
Co-Major Professor: Dr. Dean Adams

Title: Post-orbital color pattern variation and the evolution of a radiation of turtles (Graptemys)

Abstract: One of the most deeply studied areas in the field of evolutionary biology is the formation and maintenance of new species, as well as the variation in the rate and extent to which taxa radiate. A range of evolutionary processes, from ecological adaptation to sexual selection and reinforcement, can lead to the formation of new species. However, the formation of new species likely results from several isolating mechanisms acting in concert. The map turtle complex (genus: Graptemys) is an excellent model system for exploring the nature of speciation given its exceptional species richness and high levels of morphological diversity, particularly in facial coloration patterns. This research utilizes an integrative approach to establish the role of post-orbital color patterns in species diversification and maintenance. This multi-faceted approach incorporates aspects of phylogenetics, population and quantitative genetics, morphometrics, and behavior to assess morphological evolution within species and across the genus. The phylogeny of map turtles was characterized by a hard polytomy indicating rapid speciation. Across the genus, morphological evolution occurred in a parsimonious manner. Within species, both morphology and genetics exhibited a pattern of isolation by distance. Temperature significantly influenced coloration patterns, and multivariate heritability was generally low. Finally, in behavior trials, neither males nor females spent significantly more time with members of their own species. In all projects, the signatures of sexual selection or reinforcement were absent or equivocal where they would be expected if they were the main forces continuing to shape interactions among map turtle species. The results of this research indicate that the role of past and ongoing selection on coloration pattern within the map turtle clade has been limited, indicating that post-orbital coloration was not the driving factor in the radiation of this turtle clade.
Alternative explanations for map turtle species richness are explored.


Myron Peto

Home Department: Biochemistry, Biophysics and Molecular Biology

Major Professor: Dr. Robert Jernigan
Co-Major Professor: Dr. Drena Dobbs

Title: Studies of Protein Designability using Reduced Models

Presentation: July 9, 2007

Abstract: One of the most important problems in computational structural biology is protein designability, that is, why protein sequences are not random strings of amino acids but instead show regular patterns that encode protein structures. Many previous studies that have attempted to solve the problem have relied upon reduced models of proteins. In particular, the 2D square and the 3D cubic lattices together with reduced amino acid alphabets have been examined extensively and have led to interesting results that shed some light on evolutionary relationships among proteins. Here, in addition to the 2D square lattice, we study the 2D triangular and 3D face centered cubic (fcc) lattices, we perform designability studies using different shapes embedded in the 2D square lattice, and we use machine learning algorithms to classify binary sequences folding to highly- or poorly-designable conformations. In the first part of the thesis we extend the transfer matrix method to the 2D triangular lattice. The transfer matrix method is a highly efficient method of enumerating all conformations within a compact lattice area that was developed earlier for the 2D square and 3D cubic lattices. In addition we also enumerated all compact conformations within simple geometries on the 2D triangular and 3D fcc lattices using a standard backtracking algorithm. In the second part of the thesis we describe protein designability studies on various shapes in the 2D square lattice using a reduced hydrophobic-polar (HP) amino acid alphabet. We used a simple energy function that counted the number of H-H, H-P and P-P interactions within a restricted set of protein shapes that have the same number of residues and non-bonded contacts. We found a difference in the designabilities of different protein shapes. Finally, in the third part of the thesis we used standard machine learning algorithms to classify two classes of protein sequences.
We first performed a designability study for two shapes, using a binary HP alphabet, on the 2D triangular lattice and separated highly- and poorly-designable conformations. Highly-designable conformations had many sequences folding to them with the lowest energy and poorly-designable conformations had few or no sequences folding to them. Sequences were classified as highly- or poorly-designable depending on whether they folded to highly- or poorly-designable structures. Using several machine learning algorithms such as Decision Tree, Naïve Bayes, and Support Vector Machine, we were able to classify highly- and poorly-designable sequences with high accuracy.
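
The designability idea described in this abstract can be sketched on the 2D square lattice: score each HP sequence on every candidate conformation by its non-bonded contacts, and count how many sequences have a given conformation as their unique lowest-energy state. This is a minimal illustration, not the thesis code; the function names are hypothetical and the contact energies (in arbitrary integer units) are assumed values.

```python
from itertools import product

# Assumed contact energies in arbitrary integer units (illustrative only).
CONTACT_ENERGY = {("H", "H"): -23, ("H", "P"): -10,
                  ("P", "H"): -10, ("P", "P"): 0}

def contact_energy(sequence, path):
    """Sum contact energies over non-bonded nearest neighbors.

    `path` is a self-avoiding walk: one (x, y) lattice site per residue.
    """
    if len(sequence) != len(path) or len(set(path)) != len(path):
        raise ValueError("path must be self-avoiding and match the sequence")
    index_at = {site: i for i, site in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        # Looking only right and up visits each lattice edge exactly once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and abs(i - j) > 1:  # skip chain-bonded pairs
                total += CONTACT_ENERGY[(sequence[i], sequence[j])]
    return total

def designability(paths, length):
    """Count, per conformation, the HP sequences for which it is the
    unique lowest-energy conformation among `paths`."""
    counts = [0] * len(paths)
    for seq in product("HP", repeat=length):
        energies = [contact_energy(seq, p) for p in paths]
        best = min(energies)
        if energies.count(best) == 1:  # ties do not count for any structure
            counts[energies.index(best)] += 1
    return counts
```

For a four-residue chain, comparing a compact square with a straight line shows the compact shape winning for every sequence whose two end residues are not both polar.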


Bradley Powers

Home Department: Mathematics

Major Professor: Dr. Dan Ashlock
Co-Major Professor: Dr. Kirk Moloney

Title: The Effect of Tags on Non-Local Adaptation

Abstract: This project investigates in greater depth the phenomenon of non-local adaptation previously observed in an evolutionary model based on the game iterated Prisoner’s Dilemma. Non-local adaptation is the ability of an agent or population of agents to perform well against other agents that share no common history or ancestry with them. Populations of agents both with and without identifying tags are evolved to perform noisy iterated prisoner’s dilemma on a toroidal grid. The agents consist of a finite state machine specialized for playing iterated prisoner’s dilemma and a simple tag recognition capability. The populations are allowed to evolve for 10,000 generations and the state of the world is stored every 500 generations. Populations from these samples are placed in competition with populations from generation 10,000. This procedure is repeated for varying levels of overall mutation rate, with and without tags, and varying frequencies of tag-related mutations. Non-local adaptation is seen in these populations; however, tags seem to slow the acquisition of non-local adaptation. Although