We use MRI and statistics to identify neuroanatomical regions that may be involved with the effects of genotype, treatment, etc. Our goal is to identify candidate genes that may have a role in driving these effects. We designed the ABIgeneRMINC package for this goal: to integrate gene expression data from the Allen Brain Institute and neuroanatomy data with the RMINC package. We identify candidate genes by finding which genes have preferential expression in neuroanatomical regions of interest. These regions are decided based on thresholding statistical significance in RMINC. Although the package was designed to work harmoniously with RMINC, it is itself flexible enough to handle general manipulation of Allen Brain Institute gene expression data, including functions to query Allen's API and download/read data – all within the R environment.
Easiest way to install is using devtools
ABIgeneRMINC uses several tidyverse packages. It also works best with MINC toolkit version 1.9.18 () and the RMINC package.
Neuroanatomy statistics can be performed usual way: MRI to obtain images, MBM.py to register to consensus (Iterative Model Building (MBM 2.0)(MBM.py)), RMINC to compute statistics on the log determinants at every voxel (RMINC). We can also load some useful libraries.
library(tidyverse) # for convenient syntax library(RMINC) # for conducting neuroanatomy statistics library(MRIcrotome) # for visualisation library(ABIgeneRMINC) # for gene expression analysis
The following is data from Scholz et al. (2015) looking at the neuroanatomical effect of environmental enrichment.
# Anatomy file and mask. This is typically the consensus average in the study anatfile_study = "/projects/egerek/jscholz/enriched/Exp2/hr_reg/hr_masks/template.mnc" maskfile_study = "/projects/egerek/jscholz/enriched/Exp2/hr_reg/hr_masks/mask_dil.mnc" # Statistic file. This is in the same space as the anatomy file whose numerical value at each voxel represents a quantity of statistical interest (i.e. t-statistic) statsfile = "/projects/egerek/jscholz/enriched/Exp2/hr_stats/standard_vs_maze_0.2_tvalue-conditionMaze.mnc" # Visualise the statistics to illustrate the effect of environmental enrichment on the mouse brain. dev.new(width=7.0, height=4.5) sliceSeries(nrow = 5, ncol=5, begin=54, end=254) %>% anatomy( mincArray(mincGetVolume(anatfile_study)), low=300, high=5600) %>% overlay( mincArray(mincGetVolume(statsfile)), low=2.0, high=6.5, symmetric = T) %>% anatomySliceIndicator( mincArray(mincGetVolume(anatfile_study)), low=300, high=5600) %>% legend("t-statistics") %>% draw()
Before running gene expression analysis, we need to register the consensus average (defining the study space) to the Allen Gene Expression Atlas template from the Allen Brain Institute (ABI). The transformation mapping the consensus average to the ABI template is stored as an XFM file.
# Register anatomy to ABI template anatomy_transform_path = ABI_template_align( anatfile_study, maskfile_study, xfm_files_dir = "study_to_abi_transforms", keep_files = T ) anatomy_transform_path # "study_to_abi_transforms/anatomy_to_abi.xfm"
There is also an image created to evaluate the quality of the registration. Contours of the ABI template are mapped onto the resampled consensus average.
The statistics can now be transformed to the gene expression atlas space.
resampled_statistics_50um = ABI_resample( source_volume = statsfile, xfm_transform = anatomy_transform_path, source_volume_resampled_file = 'resampled_statistics_50um.mnc' )
Remember that ABI gene expression data is in 200um voxels, where as the template we downloaded and used for registration is 50um. Thus, we need to downsample the statistics to 200um before we can run gene expression analysis. Just as the " target_space_name = 'adult gene expression' " parameter to the above command and you are good to go for gene expression analysis.
resampled_statistics = ABI_resample( source_volume = statsfile, target_space_name = 'adult gene expression', xfm_transform = anatomy_transform_path, source_volume_resampled_file = 'resampled_statistics.mnc' )
Note that we can save the resampled statistics in a file (ex. 'resampled_statistics.mnc') so that, in the future, we can simply start the gene expression analysis from here.
Let us run the gene expression analysis. If you are running this at MICe, all the gene expression files have been downloaded and saved. Use the following commands to get the paths from the CSV file.
gene_expression_df = read_csv('/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/adult_gene_expression.csv') gene_expression_paths = gene_expression_df$zip_files
We will set the target threshold to be 2 (i.e. regions with statistics above 2 are part of the ROI). We will also consider the statistics to be symmetric (i.e. -2 and below is also part of the ROI). We will use the whole brain as the contrast region. We will also run the jobs locally and in parallel on 4 cores.
gene_fold_change = adult_gene_expression_analysis( anatomy_statistics = 'resampled_statistics.mnc' , gene_expression_paths = gene_expression_paths , target_threshold = 2.0 , contrast_threshold = NULL , symmetric_statistics = T , parallel = c('local',4) ) # If you are NOT at MICe, set "gene_expression_paths = NULL" and the function with automatically download the data from the ABI
To run Gene Ontology Enrichment Analysis, use the GOrilla_run function. In this example, we will run ontology on the top 7000 genes with the highest fold-change.
gene_fold_change_df = gene_expression_df %>% select(acronym,Section_Data_ID) %>% mutate(fold_change = gene_fold_change) background_genes = gene_fold_change_df %>% arrange(desc(fold_change)) %>% pull(acronym) target_genes = background_genes[1:7000] gene_ontology_enrichment_df = GOrilla_run( target.genes=target_genes, background.genes=background_genes )
You can also find human diseases enriched in these mouse genes using the disease_ontology_enrichment_analysis function.
disease_ontology_enrichment_df = disease_ontology_enrichment_analysis( genes_target=target_genes, genes_background=background_genes ) disease_ontology_enrichment_df # # A tibble: 11,703 x 6 # disease_id disease p q `Enrichment (N, B,... Genes # <chr> <chr> <dbl> <dbl> <chr> <list> # 1 C0596887 mathematical abili... 9.36e-8 0.000583 1.29 (12174,679,39... <chr [... # 2 C0557874 Global development... 9.96e-8 0.000583 1.19 (12174,1434,3... <chr [... # 3 C1305855 Body mass index 2.09e-7 0.000817 1.25 (12174,843,39... <chr [... # 4 C0454644 Delayed speech and... 5.47e-7 0.00160 1.31 (12174,517,39... <chr [... # 5 C4048268 Cortical visual im... 7.13e-7 0.00167 1.68 (12174,115,39... <chr [... # 6 C3810365 Central visual imp... 1.29e-6 0.00252 1.72 (12174,98,397... <chr [... # 7 C1858120 Generalized hypoto... 3.97e-6 0.00664 1.21 (12174,899,39... <chr [... # 8 C0344482 Hypoplasia of corp... 4.97e-6 0.00708 1.35 (12174,335,39... <chr [... # 9 C0232466 Feeding difficulti... 5.94e-6 0.00708 1.3 (12174,445,397... <chr [... # 10 C0856975 Autistic behavior 6.05e-6 0.00708 1.44 (12174,222,39... <chr [... # # ... with 11,693 more rows
If you want to download them for yourself, you can query the ABI API and download them. The ABIgeneRMINC package has a couple convenient tricks though you may want to try. To find all the gene in the Allen Brain Institute you can use the "list.all.allen.genes" function. The output is a list of gene acronyms that the Allen Brain Institute has data for.
library(ABIgeneRMINC) ## List all the genes in the Allen Gene Expression dataset ## Save the output in "genes.txt" ## Perform API queries (needs xml2 package) in parallel (parallel package) list.all.allen.genes(outfile="genes.txt",parallel=4) #should take about 10min
Then, you can use the "find.gene.experiment" function, to find all the experiments regarding the genes and the URL to download them. For example, if I want all gene expression data associated with Brain-Derived Neurotrophic factor (BDNF), I do the following.
df=find.gene.experiment('Bdnf') df # gene slices ExperimentID URLs # 1 Bdnf coronal 79587720 http://api.brain-map.org/grid_data/download/79587720?include=energy # 2 Bdnf sagittal 75695642 http://api.brain-map.org/grid_data/download/75695642?include=energy
Simply copy the URL into a browser to download the gene expression file. Unzip the downloaded file and the gene expression data is in the "energy.raw" file. Using both the "list.all.allen.genes" and "find.gene.experiment", you can download all the gene expression data (~17G).
In the slices column, "coronal" are experiments generated from taking coronal slices across the whole brain, and sagittal are experiments generated from taking sagittal slices across half the brain.
We can use the "read.raw.gene" function to read gene expression data into R. For example, if I am interested in Bdnf expression, I know where the gene expression data is located online. At MICe, the data has been downloaded and unzipped. The filenames are:
### Reading Gene Expression from downloaded and unzipped data files library(ABIgeneRMINC) genefile1="/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/zip_files/Bdnf_sid79587720.zip" genefile2="/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/zip_files/Bdnf_sid75695642.zip" ## Read gene expression data as a 1D-vector genedata1=read.raw.gene(genefile1) genedata2=read.raw.gene(genefile2)
If you do not want to manually download and unzip the data, "read.raw.genes" function will read data from URLs as well. However, it is recommended you download the data to make it easier to debug.
### Reading Gene Expression from URL library(ABIgeneRMINC) geneURL1="http://api.brain-map.org/grid_data/download/79587720?include=energy" geneURL2="http://api.brain-map.org/grid_data/download/75695642?include=energy" ## Read gene expression data as a 1D-vector genedata1=read.raw.gene(geneURL1,url=TRUE) genedata2=read.raw.gene(geneURL2,url=TRUE)
Finally, to analyse gene expression we need to convert from the orientation used in the Allen Brain Institute, to the orientation used in MINC files. This can be done quickly using the "allenVectorTOmincVector" function. For those who want to know exactly how the orientations differ, please read the info following the code.
## Convert Allen orientation to MINC orientation genedata1=allenVectorTOmincVector(genedata1) genedata2=allenVectorTOmincVector(genedata2)
Allen Gene Expression Data is stored as a 1-D array going from X=Anterior-to-Posterior, Y=Superior-to-Inferior, and Z=Left-to-Right (dimensions written from fastest changing index to slowest).
MINC Data is stored as a 1-D array going from X=Left-to-Right, Y=Posterior-to-Anterior, Z=Inferior-to-Superior (dimensions written from fastest changing index to slowest)
Converting between the two orientations simply requires reordering of indices which is done by the following function and its inverse:
allenVectorTOmincVector : To go from Allen orientation to MINC orientation
mincVectorTOallenVector : To go from MINC orientation to Allen orientation
Let's use RMINC to visualise the gene expression data.
anatfile = system.file('extdata/Dorr_resampled_200um.mnc',package="ABIgeneRMINC") mincPlotSliceSeries( anatomy=mincArray(mincGetVolume(anatfile)), statistics=mincArray(genedata1), col=colorRampPalette(c("darkgreen","yellowgreen"))(255), symmetric=FALSE,legend="Bdnf Expression",low=2,high=6.5)
Bdnf Expression (Allen Experiment 79587720)
mincPlotSliceSeries( anatomy=mincArray(mincGetVolume(anatfile)), statistics=mincArray(genedata2), col=colorRampPalette(c("darkgreen","yellowgreen"))(255), symmetric=FALSE,legend="Bdnf Expression",low=2,high=6.5)
Bdnf Expression (Allen Experiment 75695642)
Notice, how the data from Experiment 75695642, encompasses only half the brain. This is true of most of the gene expression data from ABI, which were generated from sagittal slices spanning half the brain. A subset of genes have expression data generated from coronal slices spanning the whole brain (like Experiment 79587720). While gene expression in the left and right hemispheres of the mouse brain is likely not completely identical, it might be useful to reflect the data to fill in the missing hemisphere in the sagittal experiments for exploratory analysis. This can be done using the 'midplane.reflect' function. Below is code that reads Experiment 75695642, and reflects it across the sagittal midplane.
## Read Gene Data genedata=read.raw.gene("http://api.brain-map.org/grid_data/download/75695642?include=energy",url=TRUE) ## Reflect it across sagittal midplane genedata.reflected=midplane.reflect(genedata) ## Make image mincPlotSliceSeries( anatomy=mincArray(mincGetVolume(anatfile)), statistics=mincArray(allenVectorTOmincVector(genedata.reflected)), col=colorRampPalette(c("darkgreen","yellowgreen"))(255), symmetric=FALSE,legend="Bdnf Expression",low=2,high=6.5)
Figure 3: Bdnf Expression (Allen Experiment 75695642) after sagittal reflection
## Write Reflected Data write.raw.gene(genedata.reflected,filename="Bdnf_sid75695642_reflected.raw")
If you prefer to do gene expression analysis by structure instead of voxels, you can try the "unionize" function. This function computes the sum, mean, and standard deviations of gene expression for annotated structures.
## Find which structures have highest gene expression ## Read annotations (can be downloaded from ABI) labelfile="/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_grid_labels.raw" labels=read.raw.gene(labelfile,labels=TRUE) ## unionize overall the labels. Sort by highest mean expression labels.to.sum=sort(unique(labels)) labels.to.sum=labels.to.sum[labels.to.sum!=0] df=unionize(grid.data=genedata.reflected, #vector to unionize labels.to.sum=labels.to.sum, #sum all labels labels.grid=labels #the vector of labels ) df=df[order(df$mean,decreasing=TRUE),] ## Read csv which contains definition of labels (can be downloaded from ABI) labeldefs=read.csv("/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_gridlabels_structures.csv") ## Sort sort structures by highest gene expression df$structures=labeldefs[match(labeldefs$id,df$labels),'name'] ## View head(df) ##Return # labels sum mean stdev structures #555 1059 190.89058 10.605032 5.197725 Field CA2, pyramidal layer #154 267 45.97817 9.195634 4.589667 Orbital area, lateral part, layer 2/3 #258 479 312.93908 7.632661 4.986437 Piriform area, polymorph layer #625 10704 684.76912 7.284778 4.978902 Posterior parietal association areas, layer 6b #237 440 113.66420 7.104013 2.756221 posteromedial visual area, layer 1 #553 1054 340.99053 7.103969 4.993027 Visceral area, layer 5
Dorr, A. E., Lerch, J. P., Spring, S., Kabani, N., & Henkelman, R. M. (2008). High resolution three-dimensional brain atlas using an average magnetic resonance image of 40 adult C57Bl/6J mice. Neuroimage, 42(1), 60-69.
Scholz, J., Allemang-Grand, R., Dazai, J., & Lerch, J. P. (2015). Environmental enrichment is associated with rapid volumetric brain changes in adult mice. NeuroImage, 109, 190-198.