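The MDIBL Bioinformatics Core facilitates data management and analysis for comparative functional genomics research by providing sequence analysis software, guides to online analysis tools and biological databases, access to biomedical journals, and data storage resources.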
The Core also supports communication networks that facilitate collaborative research, resource sharing, and the confidential exchange of data.
Significant sequence analysis software resources are available to MDIBL and visiting investigators.
Online Sequence Analysis Server
The DeCypher suite of sequence analysis tools from TimeLogic® is publicly available at http://decypher.mdibl.org. Custom nucleotide or protein sequence data sets may be included upon request (contact Dr. Carolyn Mattingly). The suite includes the following software programs:
| Tool | Description |
| --- | --- |
| BLAST | Sequence similarity search tools |
| Smith-Waterman | Sequence similarity search tool |
| HMM (Hidden-Markov Model) | Domain search tool |
| HMM Frame Search | Domain search tool that allows for frame shifts in nucleotide sequences |
| ClustalW | Sequence alignment tool |
| Gene Detective | Genomic sequence analysis tool |
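
For readers unfamiliar with the Smith-Waterman method listed above, the following is a minimal textbook sketch of local alignment scoring. It is not the DeCypher implementation, which is hardware accelerated; the function name and the scoring values (match, mismatch, and a linear gap penalty) are illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Textbook dynamic programming with a linear gap penalty.
    The scoring values are illustrative assumptions, not DeCypher defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the matrix are kept because this sketch reports the score alone; recovering the aligned residues would require the full matrix and a traceback step.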

Other Sequence Analysis Software (local installations available at MDIBL)
| Sequence Analysis Tools | Description |
| --- | --- |
| Washington University BLAST | A freely available sequence analysis program used for data curation. |
| NCBI BLAST | A freely available sequence analysis program used for data curation. |
| TimeLogic DeCypher | Hardware-accelerated sequence analysis software (BLAST, HMM, and Smith-Waterman) used for data curation. |
| ClustalW | A freely available multiple sequence alignment program used for data curation. |
| ClustalX | A freely available graphical user interface for ClustalW. |
| MUSCLE | A freely available multiple sequence alignment program used for data curation. |
| T-coffee | A freely available multiple sequence alignment software package used for data curation. |
| PHYLIP | A freely available phylogenetic analysis software package used for data curation. |
| Lasergene | Comprehensive sequence analysis software package. |
| PAUP | A phylogenetic analysis software package used for data curation. |
| TreeView | A freely available phylogenetic tree display and editing software package used for data curation. |
| Vector NTI Suite | A sequence analysis software package used for data curation. |
| SeaView | Graphical tools for sequence alignment and molecular phylogeny. |
| MEME/MAST | MEME is a motif detection program using expectation/maximization algorithms. MAST is a program that searches databases for a particular motif. Also available online at: MEME |
| PipMaker | PipMaker computes alignments of similar regions in two DNA sequences. Also available online at: MultiPipMaker |
| RepeatMasker | RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. |
| Taverna | Provides a language and software tools to facilitate easy use of workflow and distributed compute technology. |
| Gibbs | Allows for the detection of multiple conserved regions, called motifs, within either DNA or protein sequences. |
| Weeder | A program for detecting transcription factor binding sites (TFBS) in coregulated genes. |
| Phrap | A program for base calling, sequence comparisons and sequence assembly. |
| Phred | A program that reads DNA sequencing trace files, calls bases and assigns a quality value to each called base. |
| trace2dbest | A tool for processing EST sequencing traces. |
| Partigene | A tool for analysis of EST data sets. |
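
As a note on the quality values that Phred assigns to each called base: a Phred quality score Q is conventionally related to the estimated base-calling error probability P by Q = -10 log10(P). The short sketch below converts between the two; the function names are illustrative and are not part of the Phred program itself.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred score Q = -10*log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability implied by a few common quality scores
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_probability(q))
```

Under this convention, a quality value of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a value of 30 to 1 in 1,000.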
Contact Dr. Carolyn Mattingly if you have questions about sequence analysis software use or availability.
| Tool | Description |
| --- | --- |
| Basic Local Alignment Search Tool [BLAST] | A sequence similarity search program (NCBI). |
| VISTA [VISTA] | A suite of programs and databases for comparative analysis of genomic sequences. |
| Z-Picture [ZPicture] | A dynamic alignment and visualization tool that is based on the blastz alignment program used by PipMaker. |
| Multi/PipMaker [Pipmaker] | A tool that computes alignments of similar regions in two DNA sequences. |
| UCSC Genome Browser [UCSC] | A tool to visualize and explore cross-species genomic sequences. |
| Ensembl Genome Browser [Ensembl] | A tool to visualize and explore cross-species genomic sequences. |
| RepeatMasker [RepeatMasker] | A tool that screens DNA sequences in FASTA format against a library of repetitive elements and returns a masked query sequence ready for database searches. |
| TRANSFAC [TRANSFAC] | A database of eukaryotic transcription factors, their genomic binding sites and DNA-binding profiles. |
Biological Databases
The following is a guide to many valuable, publicly available biological databases that support aspects of comparative functional genomics research.
Sequences
| Database | Description |
| --- | --- |
| DDBJ/EMBL/GenBank® [DDBJ] [EMBL] [GenBank] | Mirror database repositories for nucleotide sequences. |
| Ensembl [Ensembl] | A resource for annotated eukaryotic genomes. |
| NCBI Gene [NCBI Gene] | A gateway that integrates information from LocusLink and from genes annotated on Reference Sequences from completely sequenced genomes. |
| NCBI GenPept [GenPept] | A database of automated translations of GenBank® nucleotide sequences. |
| NCBI Reference Sequences [RefSeq] | A curated, non-redundant collection of reference genomic, transcript and protein sequences. |
| TIGR Gene Indices [TIGR Gene Indices] | EST clusters for diverse species. |
| UniGene [UniGene] | An experimental system that automatically partitions GenBank® sequences into non-redundant sets of gene-oriented clusters. |
| UniProt [UniProt] | UniProt (Universal Protein Resource) is a comprehensive catalogue of information on proteins. It is a central repository of protein sequence and function created by joining the information contained in UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, and PIR. |
Domains and Structures
| Database | Description |
| --- | --- |
| InterPro [InterPro] | A database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. |
| PROSITE [PROSITE] | A database of protein families and domains consisting of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. |
| Protein Data Bank [PDB] | A repository for processing and distribution of 3-D biological macromolecular structure data. |
| Protein Domain Families [ProDom] | A comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases. |
| Protein Families Database of Alignments and HMMs [Pfam] | A large collection of multiple sequence alignments and hidden Markov models covering common protein domains and families. |
| Protein Fingerprints [PRINTS] | A compendium of protein fingerprints or groups of conserved motifs used to characterize a protein family. |
| Simple Modular Architecture Research Tool [SMART] | A resource that allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. |
Protein-Protein Interactions
| Database | Description |
| --- | --- |
| Biomolecular Interaction Network Database (BIND) | A resource for curated protein-protein interactions. |
| Database of Interacting Proteins (DIP) | A catalog of experimentally determined interactions between proteins. |
Environmental/Toxicology
| Database | Description |
| --- | --- |
| Comparative Toxicogenomics Database [CTD] | A resource to enhance understanding about the effects of environmental chemicals on human health. |
| Environment, Drugs and Gene Expression [EDGE] | A scientific resource for toxicology-related gene expression information. |
| NLM® Toxicogenomics [Toxicogenomics] | A compendium of Internet resources related to toxicogenomics. |
Species Resources
| Database | Description |
| --- | --- |
| ArkDB [ArkDB] | A comprehensive public repository for genome mapping data from farmed and other animal species. |
| Ciona intestinalis [Ciona intestinalis] | DOE Joint Genome Institute Ciona intestinalis genome project. |
| FlyBase [FlyBase] | Database of the Drosophila genome. |
| Fugu Genome Project [Fugu Genome Project] | Provides access to the whole-genome sequence of the Fugu. |
| The Fugu Genomics Project [The Fugu Genomics Project] | Resources for interpreting sequence data generated by the Human Genome Project through study of the Fugu rubripes genome. |
| Genomic Research on Atlantic Salmon Project [GRASP] | A project that aims to coordinate all aspects of genomics research on Atlantic salmon. |
| Medakafish Homepage [Medakafish Homepage] | A database for the medaka model organism. |
| Mouse Genome Informatics [MGI] | A resource that provides integrated access to data on the genetics, genomics and biology of the laboratory mouse. |
| Online Mendelian Inheritance in Man [OMIM] | A catalog of human genes and genetic disorders. |
| Rat Genome Database [RGD] | A resource that curates and integrates rat genetic and genomic data and provides access to these data to support research using the rat as a genetic model for the study of human disease. |
| Wanda [Wanda] | A database of duplicated genes in fish. |
| Xiphophorus [Xiphophorus] | A resource dedicated to the Poeciliid fish genus Xiphophorus. |
| The Zebrafish Information Network [ZFIN] | The zebrafish model organism database. |
Other
| Database | Description |
| --- | --- |
| Gene Ontology Consortium [GO] | A collaborative effort to provide structured vocabularies that describe gene products consistently across organisms and databases. |
| HomoloGene [HomoloGene] | A resource of curated and calculated orthologs for genes as represented by UniGene or by annotation of genomic sequences. |
| Homologous Vertebrate Genes Database [HOVERGEN] | A database of homologous vertebrate genes allowing visualization of multiple alignments and phylogenetic trees. |
| National Center for Biotechnology Information [NCBI] | A US national resource for molecular biology information. |
| NCBI Taxonomy [NCBI Taxonomy] | The taxonomy of organisms represented in NCBI sequence databases. |
Biomedical Journals
Subscribed
Electronic subscriptions to the Nature suite of journals are available on the MDIBL campus.
Publicly Available Reference Resources
PubMed
PubMed includes over 16 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s.
- Academic Search Premier
- Biomedical Reference Collection: Basic
- Business Source Premier
- Clinical Pharmacology
- ERIC
- Health Source – Consumer Edition, MasterFILE Premier
- MEDLINE
- Nursing and Allied Health Collection: Basic
- Regional Business News
Data storage resources are available to facilitate data-heavy research programs (e.g., computational biology and microscopy). Please contact Dr. Carolyn Mattingly for more information.
MDIBL Bioinformatics resources are supported by grants from the National Institute of Environmental Health Sciences (ES014065 and ES003828) and the National Center for Research Resources (RR016463) of the National Institutes of Health.