We use MRI and statistics to identify neuroanatomical regions that may be involved with the effects of genotype, treatment, etc. Our goal is to identify candidate genes that may have a role in driving these effects. We designed the ABIgeneRMINC package for this goal: to integrate gene expression data from the Allen Brain Institute and neuroanatomy data with the RMINC package. We identify candidate genes by finding which genes have preferential expression in neuroanatomical regions of interest. These regions are decided based on thresholding statistical significance in RMINC. Although the package was designed to work harmoniously with RMINC, it is itself flexible enough to handle general manipulation of Allen Brain Institute gene expression data, including functions to query Allen's API and download/read data – all within the R environment.


Easiest way to install is using devtools



ABIgeneRMINC uses several tidyverse packages. It also works best with MINC toolkit version 1.9.18 (https://bic-mni.github.io/) and the RMINC package.


Neuroanatomy statistics can be performed usual way: MRI to obtain images, MBM.py to register to consensus (Iterative Model Building (MBM 2.0)(MBM.py)), RMINC to compute statistics on the log determinants at every voxel (RMINC). We can also load some useful libraries.

library(tidyverse)       # for convenient syntax
library(RMINC)           # for conducting neuroanatomy statistics
library(MRIcrotome)      # for visualisation
library(ABIgeneRMINC)    # for gene expression analysis

The following is data from Scholz et al. (2015) looking at the neuroanatomical effect of environmental enrichment.

# Anatomy file and mask. This is typically the consensus average in the study
anatfile_study = "/projects/egerek/jscholz/enriched/Exp2/hr_reg/hr_masks/template.mnc"
maskfile_study = "/projects/egerek/jscholz/enriched/Exp2/hr_reg/hr_masks/mask_dil.mnc"

# Statistic file. This is in the same space as the anatomy file whose numerical value at each voxel represents a quantity of statistical interest (i.e. t-statistic)
statsfile = "/projects/egerek/jscholz/enriched/Exp2/hr_stats/standard_vs_maze_0.2_tvalue-conditionMaze.mnc"

# Visualise the statistics to illustrate the effect of environmental enrichment on the mouse brain.
dev.new(width=7.0, height=4.5)
sliceSeries(nrow = 5, ncol=5, begin=54, end=254) %>%
           low=300, high=5600) %>%
           low=2.0, high=6.5, symmetric = T) %>%
           low=300, high=5600) %>% 
       legend("t-statistics") %>%

Before running gene expression analysis, we need to register the consensus average (defining the study space) to the Allen Gene Expression Atlas template from the Allen Brain Institute (ABI). The transformation mapping the consensus average to the ABI template is stored as an XFM file.

# Register anatomy to ABI template
anatomy_transform_path = ABI_template_align(
    anatfile_study, maskfile_study,
    xfm_files_dir = "study_to_abi_transforms", 
    keep_files = T

# "study_to_abi_transforms/anatomy_to_abi.xfm"

There is also an image created to evaluate the quality of the registration. Contours of the ABI template are mapped onto the resampled consensus average.

The statistics can now be transformed to the gene expression atlas space.

resampled_statistics_50um = ABI_resample(
         source_volume = statsfile, 
         xfm_transform = anatomy_transform_path, 
         source_volume_resampled_file = 'resampled_statistics_50um.mnc'

Remember that ABI gene expression data is in 200um voxels, where as the template we downloaded and used for registration is 50um. Thus, we need to downsample the statistics to 200um before we can run gene expression analysis. Just as the " target_space_name = 'adult gene expression' " parameter to the above command and you are good to go for gene expression analysis.

resampled_statistics = ABI_resample(
         source_volume = statsfile, 
         target_space_name = 'adult gene expression',
         xfm_transform = anatomy_transform_path, 
         source_volume_resampled_file = 'resampled_statistics.mnc'

Note that we can save the resampled statistics in a file (ex. 'resampled_statistics.mnc') so that, in the future, we can simply start the gene expression analysis from here.

Let us run the gene expression analysis. If you are running this at MICe, all the gene expression files have been downloaded and saved. Use the following commands to get the paths from the CSV file.

gene_expression_df = read_csv('/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/adult_gene_expression.csv')
gene_expression_paths = gene_expression_df$zip_files

We will set the target threshold to be 2 (i.e. regions with statistics above 2 are part of the ROI). We will also consider the statistics to be symmetric (i.e. -2 and below is also part of the ROI). We will use the whole brain as the contrast region. We will also run the jobs locally and in parallel on 4 cores.

gene_fold_change = adult_gene_expression_analysis(
   anatomy_statistics = 'resampled_statistics.mnc' ,
   gene_expression_paths = gene_expression_paths   ,
   target_threshold = 2.0                          ,
   contrast_threshold = NULL                       ,
   symmetric_statistics = T                        ,
   parallel = c('local',4)
# If you are NOT at MICe, set "gene_expression_paths = NULL" and the function with automatically download the data from the ABI

To run Gene Ontology Enrichment Analysis, use the GOrilla_run function. In this example, we will run ontology on the top 7000 genes with the highest fold-change.

gene_fold_change_df = gene_expression_df %>% select(acronym,Section_Data_ID) %>% mutate(fold_change = gene_fold_change)
background_genes = gene_fold_change_df %>% arrange(desc(fold_change)) %>% pull(acronym)
target_genes = background_genes[1:7000]

gene_ontology_enrichment_df = GOrilla_run(

You can also find human diseases enriched in these mouse genes using the disease_ontology_enrichment_analysis function.

disease_ontology_enrichment_df = disease_ontology_enrichment_analysis(

# # A tibble: 11,703 x 6
#    disease_id disease                     p        q `Enrichment (N, B,... Genes  
#    <chr>      <chr>                   <dbl>    <dbl> <chr>               <list> 
#  1 C0596887   mathematical abili...   9.36e-8 0.000583 1.29 (12174,679,39... <chr [...
#  2 C0557874   Global development...   9.96e-8 0.000583 1.19 (12174,1434,3... <chr [...
#  3 C1305855   Body mass index       2.09e-7 0.000817 1.25 (12174,843,39... <chr [...
#  4 C0454644   Delayed speech and...   5.47e-7 0.00160  1.31 (12174,517,39... <chr [...
#  5 C4048268   Cortical visual im...   7.13e-7 0.00167  1.68 (12174,115,39... <chr [...
#  6 C3810365   Central visual imp...   1.29e-6 0.00252  1.72 (12174,98,397... <chr [...
#  7 C1858120   Generalized hypoto...   3.97e-6 0.00664  1.21 (12174,899,39... <chr [...
#  8 C0344482   Hypoplasia of corp...   4.97e-6 0.00708  1.35 (12174,335,39... <chr [...
#  9 C0232466   Feeding difficulti...   5.94e-6 0.00708  1.3 (12174,445,397... <chr [...
# 10 C0856975   Autistic behavior     6.05e-6 0.00708  1.44 (12174,222,39... <chr [...
# # ... with 11,693 more rows

Other useful functions

Downloading ABI gene expression data

If you want to download them for yourself, you can query the ABI API and download them. The ABIgeneRMINC package has a couple convenient tricks though you may want to try. To find all the gene in the Allen Brain Institute you can use the "list.all.allen.genes" function. The output is a list of gene acronyms that the Allen Brain Institute has data for.

## List all the genes in the Allen Gene Expression dataset
## Save the output in "genes.txt"
## Perform API queries (needs xml2 package) in parallel (parallel package)

list.all.allen.genes(outfile="genes.txt",parallel=4) #should take about 10min

Then, you can use the "find.gene.experiment" function, to find all the experiments regarding the genes and the URL to download them. For example, if I want all gene expression data associated with Brain-Derived Neurotrophic factor (BDNF), I do the following.


#   gene   slices ExperimentID URLs
# 1 Bdnf  coronal     79587720 http://api.brain-map.org/grid_data/download/79587720?include=energy
# 2 Bdnf sagittal     75695642 http://api.brain-map.org/grid_data/download/75695642?include=energy

Simply copy the URL into a browser to download the gene expression file. Unzip the downloaded file and the gene expression data is in the "energy.raw" file. Using both the "list.all.allen.genes" and "find.gene.experiment", you can download all the gene expression data (~17G). 

In the slices column, "coronal" are experiments generated from taking coronal slices across the whole brain, and sagittal are experiments generated from taking sagittal slices across half the brain.

Reading gene expression data

We can use the "read.raw.gene" function to read gene expression data into R. For example, if I am interested in Bdnf expression, I know where the gene expression data is located online. At MICe, the data has been downloaded and unzipped. The filenames are: 

The following commands reads them into R.

### Reading Gene Expression from downloaded and unzipped data files



## Read gene expression data as a 1D-vector


If you do not want to manually download and unzip the data, "read.raw.genes" function will read data from URLs as well. However, it is recommended you download the data to make it easier to debug.

### Reading Gene Expression from URL



## Read gene expression data as a 1D-vector


Finally, to analyse gene expression we need to convert from the orientation used in the Allen Brain Institute, to the orientation used in MINC files. This can be done quickly using the "allenVectorTOmincVector" function. For those who want to know exactly how the orientations differ, please read the info following the code.

## Convert Allen orientation to MINC orientation

Allen Gene Expression Data is stored as a 1-D array going from X=Anterior-to-Posterior, Y=Superior-to-Inferior, and Z=Left-to-Right (dimensions written from fastest changing index to slowest).

MINC Data is stored as a 1-D array going from X=Left-to-Right, Y=Posterior-to-Anterior, Z=Inferior-to-Superior (dimensions written from fastest changing index to slowest)

Converting between the two orientations simply requires reordering of indices which is done by the following function and its inverse:

allenVectorTOmincVector : To go from Allen orientation to MINC orientation

mincVectorTOallenVector : To go from MINC orientation to Allen orientation

Visualising gene expression

Let's use RMINC to visualise the gene expression data.

anatfile = system.file('extdata/Dorr_resampled_200um.mnc',package="ABIgeneRMINC")

     symmetric=FALSE,legend="Bdnf Expression",low=2,high=6.5)

Bdnf Expression (Allen Experiment 79587720)

     symmetric=FALSE,legend="Bdnf Expression",low=2,high=6.5)

Bdnf Expression (Allen Experiment 75695642)

Gene expression data processing

Notice, how the data from Experiment 75695642, encompasses only half the brain. This is true of most of the gene expression data from ABI, which were generated from sagittal slices spanning half the brain. A subset of genes have expression data generated from coronal slices spanning the whole brain (like Experiment 79587720). While gene expression in the left and right hemispheres of the mouse brain is likely not completely identical, it might be useful to reflect the data to fill in the missing hemisphere in the sagittal experiments for exploratory analysis. This can be done using the 'midplane.reflect' function. Below is code that reads Experiment 75695642, and reflects it across the sagittal midplane.

## Read Gene Data
## Reflect it across sagittal midplane
## Make image
     symmetric=FALSE,legend="Bdnf Expression",low=2,high=6.5)

Figure 3: Bdnf Expression (Allen Experiment 75695642) after sagittal reflection

It is worth noting there is also a "write.raw.gene" function which does the opposite as "read.raw.gene". To write the reflected file:

## Write Reflected Data

If you prefer to do gene expression analysis by structure instead of voxels, you can try the "unionize" function. This function computes the sum, mean, and standard deviations of gene expression for annotated structures.

## Find which structures have highest gene expression
## Read annotations (can be downloaded from ABI)

## unionize overall the labels. Sort by highest mean expression
df=unionize(grid.data=genedata.reflected,     #vector to unionize
           labels.to.sum=labels.to.sum,      #sum all labels
           labels.grid=labels               #the vector of labels
## Read csv which contains definition of labels (can be downloaded from ABI)
## Sort sort structures by highest gene expression
## View
#    labels       sum      mean    stdev                                     structures
#555   1059 190.89058 10.605032 5.197725                     Field CA2, pyramidal layer
#154    267  45.97817  9.195634 4.589667          Orbital area, lateral part, layer 2/3
#258    479 312.93908  7.632661 4.986437                 Piriform area, polymorph layer
#625  10704 684.76912  7.284778 4.978902 Posterior parietal association areas, layer 6b
#237    440 113.66420  7.104013 2.756221             posteromedial visual area, layer 1
#553   1054 340.99053  7.103969 4.993027                         Visceral area, layer 5


Dorr, A. E., Lerch, J. P., Spring, S., Kabani, N., & Henkelman, R. M. (2008). High resolution three-dimensional brain atlas using an average magnetic resonance image of 40 adult C57Bl/6J mice. Neuroimage42(1), 60-69.

Scholz, J., Allemang-Grand, R., Dazai, J., & Lerch, J. P. (2015). Environmental enrichment is associated with rapid volumetric brain changes in adult mice. NeuroImage109, 190-198.

  • No labels