Available Academic Analysis Software
Duke Software for Gene Expression Data Analysis
ChipComparer
This program identifies the genes represented in common on different microarrays. It first maps each probe set ID on your two selected microarray chips (A and B) to the corresponding LocusID using the LocusLink and UniGene databases. It then reports each pair of probe set IDs (one from A, one from B) that refer to the same gene locus (if the chips are from the same organism) or to orthologs (if they are from different organisms, using NCBI HomoloGene). To access ChipComparer, go to this web site.
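The pairing step can be sketched in Python as follows. This is a minimal illustration, not ChipComparer's code. It assumes each chip's probe-set-to-LocusID map has already been built, since the LocusLink/UniGene lookup is not shown. The function name `match_probesets`, the optional `orthologs` dictionary (standing in for a HomoloGene-style mapping), and all IDs are hypothetical.

```python
from collections import defaultdict


def match_probesets(chip_a, chip_b, orthologs=None):
    """Pair probe-set IDs from chips A and B that point to the same gene locus.

    chip_a, chip_b: dicts mapping probe-set ID -> LocusID (assumed pre-built).
    orthologs: optional dict mapping a chip-A LocusID to its chip-B ortholog
        LocusID, for chips from different organisms (hypothetical input).
    Returns sorted (probe_a, probe_b, locus_a) tuples.
    """
    # Index chip B by locus so each chip-A lookup is a single dict access.
    probes_by_locus = defaultdict(list)
    for probe_b, locus_b in chip_b.items():
        probes_by_locus[locus_b].append(probe_b)

    pairs = []
    for probe_a, locus_a in chip_a.items():
        # Same organism: match the locus itself; otherwise translate it first.
        target = locus_a if orthologs is None else orthologs.get(locus_a)
        for probe_b in probes_by_locus.get(target, []):
            pairs.append((probe_a, probe_b, locus_a))
    return sorted(pairs)
```

A probe set whose locus has no partner on the other chip (or no ortholog) simply produces no pair, and several probe sets mapping to one locus yield one pair per combination.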
Duke Integrated Genomics (DIG) Annotation System
A web-based data management and information system for retrieving a variety of functional information linked to the genes on most microarrays used within the Duke Microarray Center. The system also provides access to a powerful literature-search method.
File Merger
This program merges the contents of the Source and Target files, matching rows by their shared identifiers or by the correspondence defined in the Bridging file.
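A merge of this kind can be sketched as below. The sketch assumes tab-delimited files with a header row, an identifier column named `id`, and inner-join behavior, meaning rows without a match are dropped. The column name, the join semantics, and the helper names `read_table` and `merge_files` are assumptions for illustration; File Merger's actual handling of unmatched rows and duplicate columns is not described here.

```python
import csv
import io


def read_table(text):
    """Parse tab-delimited text with a header row into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge_files(source, target, key="id", bridge=None):
    """Inner-join source rows to target rows on `key`.

    bridge: optional dict mapping a source ID to the target ID it corresponds to
        (a stand-in for the contents of a Bridging file).
    Target columns other than `key` are added to each matching source row;
    a target column with the same name as a source column overwrites it.
    """
    target_index = {}
    for row in target:
        target_index.setdefault(row[key], []).append(row)

    merged = []
    for row in source:
        # Without a bridge, IDs must match directly; with one, translate first.
        target_id = row[key] if bridge is None else bridge.get(row[key])
        for match in target_index.get(target_id, []):
            combined = dict(row)
            combined.update({col: val for col, val in match.items() if col != key})
            merged.append(combined)
    return merged
```

With a bridge, each source row keeps its own identifier but is looked up under the bridged ID, so sources and targets that use different ID systems can still be joined.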
GATHER
GATHER (Gene Annotation Tool to Help Explain Relationships) is a computational tool that analyzes lists of genes identified in high-throughput experiments. It identifies significant Gene Ontology functions, biological pathways, interacting proteins, microRNA regulation, transcription factor regulation, and other biological systems, giving deeper insight into the biology underlying a gene signature. It can also infer novel functions; in an evaluation over a broad range of gene groups, it correctly predicted 90% of the functions.
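A common way to score whether an annotation is over-represented in a gene list is a one-sided hypergeometric (Fisher's exact) test. The sketch below shows that generic test. It is not necessarily the statistic GATHER reports, and the function names and inputs (a set of genes per annotation term, plus a genome size) are assumptions for illustration.

```python
from math import comb


def enrichment_p_value(n_hits, list_size, annotated, genome_size):
    """One-sided hypergeometric P(X >= n_hits).

    X counts list genes carrying the annotation when list_size genes are drawn
    from genome_size genes, of which `annotated` carry it.
    """
    upper = min(annotated, list_size)
    tail = sum(comb(annotated, k) * comb(genome_size - annotated, list_size - k)
               for k in range(n_hits, upper + 1))
    return tail / comb(genome_size, list_size)


def rank_annotations(gene_list, annotations, genome_size):
    """Score every annotation term against a gene list, most significant first."""
    genes = set(gene_list)
    results = []
    for term, members in annotations.items():
        hits = len(genes & members)
        p = enrichment_p_value(hits, len(genes), len(members), genome_size)
        results.append((term, hits, p))
    return sorted(results, key=lambda r: r[2])
```

Exact integer arithmetic via `math.comb` (Python 3.8+) avoids underflow in the binomial coefficients; a real analysis would also correct for testing many terms at once.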
Profiler
Work in programs within the Center for Applied Genomics and Technology has focused on developing statistical methods for supervised analysis that classify and predict breast cancer outcomes from gene expression data. One of these, named Profiler, was developed to identify gene expression profiles that correlate with the phenotype of interest. This group of genes is then used in a binary regression analysis to identify gene expression patterns, expressed as principal components, that represent underlying structure in the data. The goal is to identify the patterns of gene expression that most highly correlate with, and define, the cellular state of interest. Programming work in the Duke center has focused on building a graphical user interface that gives investigators access to the program. Two files are loaded into Profiler: a tab-delimited text file of the raw expression values and a text file of the gene names. All samples are normalized within the program, and the binary regression model is built by testing many genes, which define the principal components used to predict the phenotype of interest.
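The overall flow (screen genes by correlation with a binary phenotype, summarize them as a principal component, then fit a binary regression on that component) can be sketched in pure Python. This is a simplified illustration under stated assumptions, not Profiler's implementation. It keeps only the first principal component, and a ridge-penalized logistic regression fitted by Newton's method stands in for Profiler's binary regression model. All function names are hypothetical.

```python
import math


def _standardize(values):
    """Center to mean 0 and scale to sample SD 1 (constant rows stay at 0)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)) or 1.0
    return [(v - mean) / sd for v in values]


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def select_genes(expression, phenotype, n_genes):
    """Indices of the n_genes rows most correlated (in absolute value) with the phenotype."""
    n = len(phenotype)
    zy = _standardize(phenotype)
    scored = []
    for idx, row in enumerate(expression):
        r = sum(a * b for a, b in zip(_standardize(row), zy)) / (n - 1)
        scored.append((abs(r), idx))
    scored.sort(reverse=True)
    return sorted(idx for _, idx in scored[:n_genes])


def first_pc_scores(rows, iterations=200):
    """Sample scores on the first principal component, via power iteration on X^T X."""
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def fit_logistic(x, y, ridge=1.0, steps=25):
    """Ridge-penalized logistic regression y ~ b0 + b1*x, fitted by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0, g1 = -ridge * b0, -ridge * b1  # gradient of the penalized log-likelihood
        h00 = h11 = ridge                  # Hessian of the penalized negative log-likelihood
        h01 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def profiler_sketch(expression, phenotype, n_genes=2):
    """Screen genes, summarize them as PC1 scores, and return fitted class-1 probabilities."""
    genes = select_genes(expression, phenotype, n_genes)
    rows = [_standardize(expression[g]) for g in genes]
    scores = _standardize(first_pc_scores(rows))
    b0, b1 = fit_logistic(scores, phenotype)
    return genes, [_sigmoid(b0 + b1 * s) for s in scores]
```

The sign of a principal component is arbitrary, so the regression slope absorbs it; the ridge penalty keeps the fit finite when the classes are perfectly separated, as they often are in small training sets.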
Tree Profiler
Tree Profiler uses classification and regression tree methods for binary classification. One approach that has proven useful in a number of studies, in cancer and other contexts, is to use multiple metagene summaries as predictors of a phenotype. The metagenes are simply gene expression signatures representing patterns of co-expression, generated by an initial clustering of the expression data. The classification tree strategy provides a mechanism to draw on many sources of data to predict a phenotype, such as ER status in breast tumors. The advantage of this approach is its ability to use multiple forms of data: multiple metagenes (clusters), other genomic data such as DNA methylation or DNA copy number patterns, protein profiles, or other biological and clinical data.
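The sketch below illustrates the two ingredients named above: metagenes that summarize clusters of co-expressed genes, and a tree split chosen among several kinds of predictors. It is a simplification, not Tree Profiler's algorithm. Each metagene is assumed to be the average of the standardized genes in a pre-computed cluster, only the root split of a tree is found, and Gini impurity is an assumed split criterion. Function names, data, and the `copy_number` predictor are hypothetical.

```python
import math


def _standardize(values):
    """Center to mean 0 and scale to sample SD 1 (constant rows stay at 0)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1)) or 1.0
    return [(v - mean) / sd for v in values]


def metagene(cluster_rows):
    """Summarize a cluster of co-expressed genes as the per-sample mean of standardized rows."""
    z = [_standardize(row) for row in cluster_rows]
    return [sum(col) / len(z) for col in zip(*z)]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(predictors, labels):
    """Return (impurity, predictor name, threshold) for the purest split 'value <= threshold'."""
    n = len(labels)
    best = None
    for name, values in predictors.items():
        for t in sorted(set(values))[:-1]:  # the largest value would leave the right side empty
            left = [lab for v, lab in zip(values, labels) if v <= t]
            right = [lab for v, lab in zip(values, labels) if v > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, t)
    return best
```

Because metagenes and non-expression variables enter as entries in the same `predictors` dictionary, the split search treats them uniformly, which is the flexibility the paragraph above describes.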
Other Academic Software for Gene Expression Data Analysis
Cluster
A program developed by Michael Eisen for analyzing gene expression data using hierarchical clustering, self-organizing maps (SOMs), k-means clustering, and principal component analysis. The hierarchical clustering methods are described in Eisen et al. (1998) PNAS 95:14863.
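For illustration, the following pure-Python sketch performs agglomerative hierarchical clustering of gene profiles with average linkage and a distance of 1 minus the Pearson correlation, one of the metric and linkage combinations available in Cluster. It is a naive version intended for small inputs, not Cluster's code, and it assumes no gene profile is constant (the correlation would be undefined). The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Centered Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Merge clusters bottom-up; return (cluster_a, cluster_b, distance) per step."""
    n = len(profiles)
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster gene pairs.
                pairs = [(min(i, j), max(i, j)) for i in clusters[a] for j in clusters[b]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

The merge list encodes the dendrogram: reading it from first to last reproduces the tree that a viewer such as TreeView draws next to the ordered expression matrix.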
dChip
A program, known as DNA-Chip Analyzer (dChip), developed by Wing Hung Wong's lab at Harvard. dChip performs model-based analysis of oligonucleotide expression arrays: it uses a probe-sensitivity index to capture the response characteristic of each probe pair and calculates model-based expression indexes. dChip is described in Genome Biol. 2001;2(8):RESEARCH0032 and J Cell Biochem Suppl 2001;Suppl 37:120-5.
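The two indexes can be illustrated with a multiplicative model, assumed here in the form y_ij = theta_i * phi_j + e_ij. In this form, y_ij is the PM minus MM difference for array i and probe pair j, theta_i is the expression index, phi_j is the probe-sensitivity index, and the phi_j are scaled so that their squares sum to J, the number of probe pairs. The sketch below fits that model by alternating least squares. dChip's outlier handling and standard errors are omitted, and the function name is hypothetical.

```python
import math


def li_wong_fit(y, iterations=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y[i][j] is assumed to be the PM - MM difference for array i, probe pair j.
    phi is rescaled so that sum(phi_j ** 2) equals the number of probe pairs.
    Assumes y is not all zeros.
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes
    theta = [0.0] * n_arrays
    for _ in range(iterations):
        # Expression indexes given probe sensitivities (least squares per array).
        ss_phi = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / ss_phi
                 for i in range(n_arrays)]
        # Probe sensitivities given expression indexes (least squares per probe pair).
        ss_theta = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / ss_theta
               for j in range(n_probes)]
        # Identifiability constraint on phi; rescale theta so the products are unchanged.
        scale = math.sqrt(sum(p * p for p in phi) / n_probes)
        phi = [p / scale for p in phi]
        theta = [t * scale for t in theta]
    return theta, phi
```

Without the scaling constraint, theta and phi would be determined only up to a reciprocal constant factor, so the expression indexes would not be comparable across probe sets.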
Gene Set Enrichment Analysis (GSEA)
The Broad Institute has developed a computational method, GSEA, that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g., phenotypes). A quick tutorial and further information are available on their web site.
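The core GSEA statistic is a running-sum enrichment score over a ranked gene list. Moving down the list, the sum increases at genes in the set (weighted by the absolute correlation raised to a power p) and decreases at genes outside it, and the enrichment score is the maximum deviation from zero. The sketch below computes only that score, with p = 1 by default. Permutation testing, normalization, and multiple-testing correction are omitted, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Weighted running-sum enrichment score.

    ranked_genes: gene IDs sorted by their correlation with the phenotype (descending).
    correlations: the matching correlation values, in the same order.
    gene_set: the a priori defined set S.
    Assumes S overlaps the list, does not cover all of it, and has nonzero weights.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    n_r = sum(abs(c) ** p for c, hit in zip(correlations, in_set) if hit)
    running = es = 0.0
    for c, hit in zip(correlations, in_set):
        # Step up by the gene's normalized weight on a hit, down by 1/n_miss on a miss.
        running += (abs(c) ** p / n_r) if hit else (-1.0 / n_miss)
        if abs(running) > abs(es):
            es = running
    return es
```

A set concentrated at the top of the ranking gives a score near +1, one concentrated at the bottom gives a score near -1, and a set scattered through the list stays near 0.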
PAM
Prediction Analysis for Microarrays (PAM) was developed at Stanford. It provides class prediction and survival analysis for mining genomic expression data. PAM performs sample classification from gene expression data via the "nearest shrunken centroid" method of Tibshirani, Hastie, Narasimhan and Chu (2002), "Diagnosis of multiple cancer types by shrunken centroids of gene expression", PNAS 2002 99:6567-6572 (May 14). For survival outcomes, it implements the "supervised principal components" method; see "Semi-supervised methods to predict patient survival from gene expression data" (Bair and Tibshirani), PLoS Biology, and "Prediction by supervised principal components" (Bair, Hastie, Paul and Tibshirani), Stanford technical report. Version 2.0 (Mar 7, 2005) features survival analysis via supervised principal components. PAM estimates prediction error via cross-validation and provides a list of significant genes whose expression characterizes each diagnostic class. It works with data from both cDNA and oligo microarrays and can also be applied to protein expression data and SNP chip data.
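The nearest shrunken centroid rule can be sketched as follows, based on the formulation of Tibshirani et al. (2002). For each gene i and class k, the class mean is expressed as a standardized difference from the overall mean, d_ik = (class mean - overall mean) / (m_k (s_i + s_0)), with m_k = sqrt(1/n_k - 1/n). Each d_ik is soft-thresholded by a shrinkage amount delta. A new sample is then assigned to the class with the smallest standardized squared distance to its shrunken centroid, adjusted by the log class prior. This is a minimal sketch: choosing delta by cross-validation, as PAM does, is not shown, s_0 is taken as the median of the s_i, and all names are hypothetical.

```python
import math
import statistics


def train_nsc(X, labels, delta):
    """Train a nearest-shrunken-centroid classifier.

    X: list of samples, each a list of gene values; labels: class label per sample;
    delta: shrinkage applied to the standardized centroid differences.
    Assumes at least two classes and more samples than classes.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(labels))
    members = {k: [x for x, lab in zip(X, labels) if lab == k] for k in classes}
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    centroids = {k: [sum(x[i] for x in m) / len(m) for i in range(p)] for k, m in members.items()}
    # Pooled within-class standard deviation s_i for each gene.
    s = [math.sqrt(sum((x[i] - centroids[k][i]) ** 2 for k in classes for x in members[k])
                   / (n - len(classes))) for i in range(p)]
    s0 = statistics.median(s)
    shrunk, active = {}, set()
    for k in classes:
        m_k = math.sqrt(1.0 / len(members[k]) - 1.0 / n)
        row = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroids[k][i] - overall[i]) / scale
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)  # soft threshold
            if d_shrunk != 0.0:
                active.add(i)
            row.append(overall[i] + scale * d_shrunk)
        shrunk[k] = row
    priors = {k: len(members[k]) / n for k in classes}
    return {"centroids": shrunk, "s": s, "s0": s0, "priors": priors, "genes": sorted(active)}


def predict_nsc(model, x):
    """Assign x to the class with the smallest prior-adjusted standardized distance."""
    s, s0 = model["s"], model["s0"]

    def score(k):
        c = model["centroids"][k]
        dist = sum((x[i] - c[i]) ** 2 / (s[i] + s0) ** 2 for i in range(len(x)))
        return dist - 2.0 * math.log(model["priors"][k])

    return min(model["centroids"], key=score)
```

Genes whose shrunken differences are zero for every class drop out of the classifier entirely; the surviving `genes` list corresponds to the list of significant genes that characterize each class.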
SAM
This software, known as Significance Analysis of Microarrays (SAM), was developed at Stanford. SAM identifies genes with statistically significant changes in expression by assimilating a set of gene-specific t-tests.
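SAM's gene-specific statistic is a relative difference, d_i = (mean_1 - mean_2) / (s_i + s_0). Here s_i is the pooled standard error, and s_0 is a small positive constant that keeps genes with tiny variance from dominating. The false discovery rate is estimated by permuting sample labels. The sketch below uses a simplified symmetric cutoff on |d_i| rather than SAM's comparison of ordered statistics against their expected values. SAM itself chooses s_0 from the data; the fixed s_0 = 0.1, the permutation count, and the function names here are assumptions.

```python
import math
import random


def relative_difference(row, labels, s0):
    """SAM-style statistic d = (mean of group 1 - mean of group 0) / (s + s0)."""
    a = [v for v, lab in zip(row, labels) if lab == 1]
    b = [v for v, lab in zip(row, labels) if lab == 0]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    pooled = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    s = math.sqrt((1 / len(a) + 1 / len(b)) / (len(a) + len(b) - 2) * pooled)
    return (ma - mb) / (s + s0)


def sam_sketch(expression, labels, threshold, s0=0.1, n_perm=200, seed=0):
    """Call genes with |d| >= threshold and estimate the FDR by permuting labels."""
    observed = [relative_difference(row, labels, s0) for row in expression]
    called = [i for i, d in enumerate(observed) if abs(d) >= threshold]
    rng = random.Random(seed)
    false_calls = 0
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)  # break the link between samples and groups
        false_calls += sum(abs(relative_difference(row, perm, s0)) >= threshold
                           for row in expression)
    expected_false = false_calls / n_perm
    fdr = min(expected_false / len(called), 1.0) if called else 0.0
    return called, fdr
```

The permutations estimate how many genes would pass the cutoff by chance alone; dividing that expected count by the number actually called gives the FDR estimate for the chosen threshold.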
TIGR TM4 Microarray Software Suite (developed at TIGR)
The TM4 suite of tools consists of four major applications, Microarray Data Manager (MADAM), TIGR_Spotfinder, Microarray Data Analysis System (MIDAS), and Multiexperiment Viewer (MeV), as well as a Minimal Information About a Microarray Experiment (MIAME)-compliant MySQL database, all of which are freely available to the scientific research community at TIGR's Software Download Site. Although these tools were developed for spotted two-color arrays, many of the components can be easily adapted to work with single-color formats such as filter arrays and GeneChips™ (Affymetrix).
TreeView
A program developed by Michael Eisen for visualizing the results of microarray data analysis. It lets users graphically browse the results of clustering and other analyses from Cluster, supports tree-based and image-based browsing of hierarchical trees, and offers multiple output formats for generating publication-quality images.



