Title: | Demo Data for the NGCHM R Package |
---|---|
Description: | Package of demo data for NGCHM vignettes. |
Authors: | Bradley M Broom [aut], Mary A Rohrdanz [aut, cre], Chris Wakefield [ctb], James Melott [ctb], MD Anderson Cancer Center [cph] |
Maintainer: | Mary A Rohrdanz <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-11-08 06:13:37 UTC |
Source: | https://github.com/MD-Anderson-Bioinformatics/NGCHMDemoData |
This package provides several relatively large datasets that can be used to demonstrate the capabilities of the Next-Generation Clustered Heat Map (NG-CHM) system.
The included data is a small subset of data from The Cancer Genome Atlas (TCGA) project. There are five data files from three groups of cancer samples:
200 breast cancer (BRCA) samples.
169 glioblastoma (GBM) samples.
547 additional glioblastoma (GBM) samples.
The two glioblastoma groups were characterized using different technologies (RNASeq and microarrays, respectively).
Note: the NG-CHM system can work with data from any domain, not just biological data.
Note: the included data has been preprocessed, subsetted, and manipulated in multiple, undocumented ways. It should only be used for evaluating and demonstrating the NG-CHM system and not for deriving any scientific conclusions.
The different data sets overlap with each other in several ways that are documented in the data sets concerned. These overlaps can be easily used to generate NG-CHMs that integrate multiple data sets in a variety of ways to further demonstrate the capabilities of NG-CHMs.
Installation
This package can be installed from the MD Anderson Bioinformatics R-universe repository:
install.packages("NGCHMDemoData",
                 repos = c("https://md-anderson-bioinformatics.r-universe.dev",
                           "https://cran.r-project.org"))
Maintainer: Mary A Rohrdanz [email protected]
Authors:
Bradley M Broom [email protected] (ORCID)
Other contributors:
Chris Wakefield [email protected] [contributor]
James Melott [email protected] [contributor]
MD Anderson Cancer Center [copyright holder]
TCGA.BRCA.Demo, TCGA.GBM.Demo, TCGA.GBM.EXPR
This dataset is loaded automatically when the package is loaded. It consists of two related parts:
A matrix of gene expression data.
A vector containing the TP53 mutation status of each sample in the matrix.
TCGA.BRCA.ExpressionData, TCGA.BRCA.TP53MutationData
A subset of the breast cancer (BRCA) expression data from TCGA
A numeric data matrix with 3437 rows and 200 columns.
Row labels are gene symbols (e.g. TSPAN6). The NG-CHM label type is bio.gene.hugo.
Column labels are TCGA barcodes up to the sample/vial field (16 characters total, e.g. TCGA-AO-A0JJ-01A). The NG-CHM label type is bio.tcga.barcode.sample.vial.
Data has been log-transformed (min 1, max 21.75322).
TCGA.BRCA.Demo, TCGA.BRCA.TP53MutationData
TP53 mutation data for TCGA breast cancer (BRCA) samples
A length 200 character vector.
Each element of the vector is either "WT" or "MUT".
Element names are TCGA barcodes up to the sample/vial field (16 characters total, e.g. TCGA-AO-A0JJ-01A)
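Because the mutation vector is keyed by barcode, it can be matched to the matrix columns by name rather than by position. The following is a minimal sketch of that idea, not code from the package: `align_status` is a hypothetical helper, the toy matrix values are invented, and the barcode `TCGA-XX-0000-01A` is made up for illustration.

```r
# Hypothetical helper (not part of NGCHMDemoData): reorder a named status
# vector so that it follows the column order of an expression matrix.
align_status <- function(expr, status) {
  idx <- match(colnames(expr), names(status))
  if (anyNA(idx)) stop("some matrix columns have no status entry")
  status[idx]
}

# Toy stand-in data using the 16-character barcode format described above.
# "TCGA-XX-0000-01A" is an invented barcode; the values are arbitrary.
expr <- matrix(c(5.1, 7.3, 6.2, 8.0), nrow = 2,
               dimnames = list(c("TSPAN6", "TP53"),
                               c("TCGA-AO-A0JJ-01A", "TCGA-XX-0000-01A")))
status <- c("TCGA-XX-0000-01A" = "MUT", "TCGA-AO-A0JJ-01A" = "WT")

aligned <- align_status(expr, status)  # now in matrix column order
counts <- table(aligned)               # number of WT and MUT samples

# With the package installed, the same helper can be applied to the real data.
if (requireNamespace("NGCHMDemoData", quietly = TRUE)) {
  library(NGCHMDemoData)
  brca_status <- align_status(TCGA.BRCA.ExpressionData,
                              TCGA.BRCA.TP53MutationData)
  print(table(brca_status))
}
```

Matching by name avoids assuming that the vector and the matrix columns are stored in the same order, which the documentation does not state.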
TCGA.BRCA.Demo, TCGA.BRCA.ExpressionData
This dataset is loaded by calling data(TCGA.GBM.Demo).
The loaded data consists of two related parts:
A matrix of gene expression data.
A vector containing the TP53 mutation status of each sample in the matrix.
TCGA.GBM.ExpressionData, TCGA.GBM.TP53MutationData
Load using data('TCGA.GBM.EXPR').
A numeric data matrix with 2000 rows and 547 columns.
The data was generated using microarray platforms.
Row labels are gene symbols (e.g. KRT19). The NG-CHM label type is bio.gene.hugo.
Column labels are TCGA barcodes up to the center field (28 characters total, e.g. TCGA-02-0001-01C-01R-0177-01). The NG-CHM label type is bio.tcga.barcode.sample.vial.portion.analyte.aliquot.
Data has been log-transformed (min 2.196606, max 14.41321).
The data has no column labels in common with the data in TCGA.GBM.ExpressionData (as expected), but at the participant level (first 12 characters) there are 158 columns in common and at the vial level (first 16 characters) there are 152 in common. There are 1098 genes in common. This permits several types of NG-CHMs integrating the two data sets.
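The participant-level and vial-level comparisons described above amount to truncating each barcode to its first 12 or 16 characters before intersecting. A small sketch of that truncation follows, assuming nothing beyond base R. `shared_prefixes` is a hypothetical helper, and two of the four toy barcodes are invented (marked in the comments), so the counts below describe the toy data only, not the package data.

```r
# Hypothetical helper: distinct barcode prefixes of length n found in both sets.
shared_prefixes <- function(a, b, n) {
  intersect(substr(a, 1, n), substr(b, 1, n))
}

# Toy labels in the 28-character format. The first entry of each vector is an
# example taken from the text; the second entry of each is invented.
set_a <- c("TCGA-02-0001-01C-01R-0177-01",
           "TCGA-06-0178-01B-01R-0000-01")   # invented
set_b <- c("TCGA-06-0178-01A-01R-1849-01",
           "TCGA-02-0001-01C-01R-0000-07")   # invented

full_level        <- shared_prefixes(set_a, set_b, 28)  # whole barcode
participant_level <- shared_prefixes(set_a, set_b, 12)  # e.g. "TCGA-02-0001"
vial_level        <- shared_prefixes(set_a, set_b, 16)  # e.g. "TCGA-02-0001-01C"
```

With the package installed, the same function could be applied to the column names of the two glioblastoma matrices. Note that it counts distinct prefixes, which may differ from a per-column count if several columns share a participant.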
Load using data('TCGA.GBM.Demo').
A numeric data matrix with 3540 rows and 169 columns.
Row labels are gene symbols (e.g. SYK). The NG-CHM label type is bio.gene.hugo.
Column labels are TCGA barcodes up to the center field (28 characters total, e.g. TCGA-06-0178-01A-01R-1849-01). The NG-CHM label type is bio.tcga.barcode.sample.vial.portion.analyte.aliquot.
Data has been log-transformed and row centered (min -6.373672, max 9.701261).
This data set and TCGA.BRCA.ExpressionData have 1225 genes (rows) in common.
See TCGA.GBM.EXPR for commonalities with that data set.
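Shared genes between two matrices can be found by intersecting their row names, after which both matrices can be restricted to the same rows in the same order. The sketch below illustrates this on toy matrices. `common_rows` is a hypothetical helper, the gene symbols serve only as labels, and the values and column names are invented.

```r
# Hypothetical helper: restrict two matrices to the rows (genes) they share,
# returned in the same row order.
common_rows <- function(x, y) {
  genes <- intersect(rownames(x), rownames(y))
  list(x = x[genes, , drop = FALSE],
       y = y[genes, , drop = FALSE])
}

# Toy matrices with arbitrary values.
x <- matrix(1:6, nrow = 3,
            dimnames = list(c("SYK", "TSPAN6", "KRT19"), c("s1", "s2")))
y <- matrix(1:6, nrow = 3,
            dimnames = list(c("TSPAN6", "SYK", "TP53"), c("t1", "t2")))

shared <- common_rows(x, y)
n_common <- nrow(shared$x)  # number of genes present in both matrices
```

Applied to TCGA.GBM.ExpressionData and TCGA.BRCA.ExpressionData, `n_common` would be expected to match the gene count quoted above, although that has not been verified here.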
TCGA.GBM.Demo, TCGA.GBM.TP53MutationData, TCGA.GBM.EXPR
Load using data('TCGA.GBM.Demo').
A length 169 character vector.
Each element of the vector is either "WT" or "MUT".
Element names are TCGA barcodes up to the center field (28 characters total, e.g. TCGA-06-0178-01A-01R-1849-01).
TCGA.GBM.Demo, TCGA.GBM.ExpressionData