Child pages
  • Preferential Spatial Gene Expression in Neuroanatomy
Skip to end of metadata
Go to start of metadata

We use MRI and statistics to identify neuroanatomical regions that may be involved with the effects of genotype, treatment, etc. Our goal is to identify candidate genes that may have a role in driving these effects. We designed the ABIgeneRMINC package for this goal: to integrate gene expression data from the Allen Brain Institute and neuroanatomy data with the RMINC package. We identify candidate genes by finding which genes have preferential expression in neuroanatomical regions of interest. These regions are decided based on thresholding statistical significance in RMINC. Although the package was designed to work harmoniously with RMINC, it is itself flexible enough to handle general manipulation of Allen Brain Institute gene expression data (steps 2-6), including functions to query Allen's API and download/read data – all within the R environment.

Step 0: Obtain statistics map and visualize

Our first step is to look at the effect on neuroanatomy by treatment, genotype etc. We do this the usual way: MRI to obtain images, MBM.py to register to consensus (Iterative Model Building (MBM 2.0)(MBM.py)), RMINC to compute statistics on the log determinants at every voxel (RMINC). The following is data from Scholz et al.[1] looking at the neuroanatomical effect of environmental enrichment. At MICe, the MRI template can be found here–"/projects/egerek/jscholz/enriched/Exp2/hr_reg/hr_masks/template.mnc"–and the statistics (positive is bigger in enriched mice) is here–"/projects/egerek/jscholz/enriched/Exp2/hr_stats/standard_vs_maze_0.2_tvalue-conditionMaze.mnc".

Let's take a look at what we got. You can use any visualization tools, but in this demo I'll be using RMINC's handy mincPlotSliceSeries function:

library(plotrix)
library(RMINC)
anatfile="/projects/egerek/jscholz/enriched/Exp2/hr_reg/hr_masks/template.mnc"
statsfile="/projects/egerek/jscholz/enriched/Exp2/hr_stats/standard_vs_maze_0.2_tvalue-conditionMaze.mnc"

mincPlotSliceSeries(
     anatomy=mincArray(mincGetVolume(anatfile)),
     statistics=mincArray(mincGetVolume(statsfile)),
     anatLow=1500,anatHigh=7592,symmetric=TRUE,
     legend="t-statistics",low=2,high=6.5)

Step 1: Transform to Allen Brain Atlas

To find genes with preferential spatial expression, we need to incorporate gene expression data from the Allen Brain Institute. To transform our consensus average to the Allen Brain Institute space, we concatenate two transforms.

TRANSFORM 1: Consensus average to Dorr et al[2] atlas. You can use a program of your choice to register; I used the "atlas_to_atlas.pl" script (minctracc program).

> atlas_to_atlas.pl /projects/egerek/jscholz/enriched/Exp2/hr_reg/hr_masks/template.mnc /axiom2/projects/software/mouse-brain-atlases/Dorr_2008/ex-vivo/Dorr_2008_average.mnc

TRANSFORM 2: Dorr et al[2] atlas to Allen Nissl Atlas. This transform can be found here–/projects/egerek/dfernandes/allenCCFV3_to_dorr_registration/dorr_to_allen.xfm

By concatenating the 2 transforms, we get the transform from our registration to the allen atlas.

> xfmconcat [TRANSFORM 1] [TRANSFORM 2] reg_to_allen.xfm

Now that we have the transformation, we can transform our statistics data to Allen Space. I included these transformed statistics in the 'ABIgeneRMINC' package, so you can continue with this demo.

> mincresample -like /projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_grid_mask.mnc -transform reg_to_allen.xfm /projects/egerek/jscholz/enriched/Exp2/hr_stats/standard_vs_maze_0.2_tvalue-conditionMaze.mnc stats.mnc

For good measure, let us transform the template in the same way as well.

> mincresample -like /projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_grid_mask.mnc -transform reg_to_allen.xfm /axiom2/projects/software/mouse-brain-atlases/Dorr_2008/ex-vivo/Dorr_2008_average.mnc template.mnc

Lets take a look at our transformed template and statistics:

library(plotrix)
library(RMINC)
anatfile="template.mnc"
statsfile="stats.mnc"

mincPlotSliceSeries(
     anatomy=mincArray(mincGetVolume(anatfile)),
     statistics=mincArray(mincGetVolume(statsfile)),
     anatLow=1500,anatHigh=7592,symmetric=TRUE,
     legend="t-statistics",low=2,high=6.5)

Figure 1: Neuroanatomical Changes due to Environmental Enrichment

Looks a little blurrier, doesn't it? That is because the Allen gene expression data has an isotropic resolution of 200 microns, versus our MRI data with 40-56 microns. It is worth keeping in mind the spatial resolution we will be limited by.

Step 2: Loading ABIgeneRMINC package

With our neuroanatomy data in the same space as the Allen Gene Expression data, we can now find preferential gene expression. A convenient package for this is the "ABIgeneRMINC" package (https://github.com/DJFernandes/ABIgeneRMINC). You can install it any number of ways. I like using the "devtools" package.

library(devtools)
install_github(repo="DJFernandes/ABIgeneRMINC")

 

Step 3: Download Gene Expression Data

We need to download gene expression data from the Allen Brain Institute (ABI). If you are at MICe, the data was downloaded a year ago and can be found here – /projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/raw_data/ .

If you want to download them for yourself, you can query the ABI API and download them. The ABIgeneRMINC package has a couple convenient tricks though you may want to try. To find all the gene in the Allen Brain Institute you can use the "list.all.allen.genes" function. The output is a list of gene acronyms that the Allen Brain Institute has data for.

library(ABIgeneRMINC)
## List all the genes in the Allen Gene Expression dataset
## Save the output in "genes.txt"
## Perform API queries in parallel
## Parallel requires foreach & doMC packages

list.all.allen.genes(outfile="genes.txt",parallel=4)

Then, you can use the "find.gene.experiment" function, to find all the experiments regarding the genes and the URL to download them. For example, if I want all gene expression data associated with Brain-Derived Neurotrophic factor (BDNF), I do the following.

df=find.gene.experiment('Bdnf')

The result is the following data frame.

  gene   slices ExperimentID URLs
1 Bdnf  coronal     79587720 http://api.brain-map.org/grid_data/download/79587720?include=energy
2 Bdnf sagittal     75695642 http://api.brain-map.org/grid_data/download/75695642?include=energy

Simply copy the URL into a browser to download the gene expression file. Unzip the downloaded file and the gene expression data is in the "energy.raw" file. Using both the "list.all.allen.genes" and "find.gene.experiment", you can download all the gene expression data (~17G). 

In the slices column, "coronal" are experiments generated from taking coronal slices across the whole brain, and sagittal are experiments generated from taking sagittal slices across half the brain.

Step 4: Reading Raw Gene Expression Data into R

We can use the "read.raw.gene" function to read gene expression data into R. For example, if I am interested in Bdnf expression, from step 3, I know where the gene expression data is located online. At MICe, the data has been downloaded and unzipped. The filenames are: 

/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/raw_data/coronal/Bdnf_sid79587720/energy.raw
and 
/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/raw_data/sagittal/Bdnf_sid75695642/energy.raw
The following commands reads them into R.

 

### Reading Gene Expression from downloaded and unzipped data files
library(ABIgeneRMINC)

genefile1="/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/raw_data/coronal/Bdnf_sid79587720/energy.raw"

genefile2="/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/raw_data/sagittal/Bdnf_sid75695642/energy.raw"

## Read gene expression data as a 1D-vector
genedata1=read.raw.gene(genefile1)

genedata2=read.raw.gene(genefile2)

 

If you do not want to manually download and unzip the data, "read.raw.genes" function will read data from URLs as well. However, it is recommended you download the data to make it easier to debug.

### Reading Gene Expression from URL
library(ABIgeneRMINC)

geneURL1="http://api.brain-map.org/grid_data/download/79587720?include=energy"

geneURL2="http://api.brain-map.org/grid_data/download/75695642?include=energy"

## Read gene expression data as a 1D-vector
genedata1=read.raw.gene(geneURL1,url=TRUE)

genedata2=read.raw.gene(geneURL2,url=TRUE)

 

Finally, to analyse gene expression we need to convert from the orientation used in the Allen Brain Institute, to the orientation used in MINC files. This can be done quickly using the "allenVectorTOmincVector" function. For those who want to know exactly how the orientations differ, please read the info following the code.

## Convert Allen orientation to MINC orientation
genedata1=allenVectorTOmincVector(genedata1)
genedata2=allenVectorTOmincVector(genedata2)

Allen Gene Expression Data is stored as a 1-D array going from X=Anterior-to-Posterior, Y=Superior-to-Inferior, and Z=Left-to-Right (dimensions written from fastest changing index to slowest).

MINC Data is stored as a 1-D array going from X=Left-to-Right, Y=Posterior-to-Anterior, Z=Inferior-to-Superior (dimensions written from fastest changing index to slowest)

Converting between the two orientations simply requires reordering of indices which is done by the following function and its inverse:

allenVectorTOmincVector : To go from Allen orientation to MINC orientation

mincVectorTOallenVector : To go from MINC orientation to Allen orientation

Step 5: Basic Gene Expression Data manipulations (not necessary)

Let's use RMINC to visualise the gene expression data, on our MRI template (from step 1). 

mincPlotSliceSeries(
     anatomy=mincArray(mincGetVolume(anatfile)),
     statistics=mincArray(genedata1),
     col=colorRampPalette(c("darkgreen","yellowgreen"))(255),
     anatLow=1500,anatHigh=7592,symmetric=FALSE,
     legend="Bdnf Expression",low=2,high=6.5)

 

Figure 2: Bdnf Expression (Allen Experiment 79587720)

 

mincPlotSliceSeries(
     anatomy=mincArray(mincGetVolume(anatfile)),
     statistics=mincArray(genedata2),
     col=colorRampPalette(c("darkgreen","yellowgreen"))(255),
     anatLow=1500,anatHigh=7592,symmetric=FALSE,
     legend="Bdnf Expression",low=2,high=6.5)

Figure 3: Bdnf Expression (Allen Experiment 75695642)

 

 

Notice, how the data from Experiment 75695642, encompasses only half the brain. This is true of most of the gene expression data from ABI, which were generated from sagittal slices spanning half the brain. A subset of genes have expression data generated from coronal slices spanning the whole brain (like Experiment 79587720). While gene expression in the left and right hemispheres of the mouse brain is likely not completely identical, it might be useful to reflect the data to fill in the missing hemisphere in the sagittal experiments for exploratory analysis. This can be done using the 'midplane.reflect' function. Below is code that reads Experiment 75695642, and reflects it across the sagittal midplane.

## Read Gene Data
genedata=read.raw.gene("http://api.brain-map.org/grid_data/download/75695642?include=energy",url=TRUE)
 
## Reflect it across sagittal midplane
genedata.reflected=midplane.reflect(genedata,reflect.dim=1)
 
## Make image
mincPlotSliceSeries(
     anatomy=mincArray(mincGetVolume(anatfile)),
     statistics=mincArray(allenVectorTOmincVector(genedata.reflected)),
     col=colorRampPalette(c("darkgreen","yellowgreen"))(255),
     anatLow=1500,anatHigh=7592,symmetric=FALSE,
     legend="Bdnf Expression",low=2,high=6.5)

 

Figure 3: Bdnf Expression (Allen Experiment 75695642) after sagittal reflection


It is worth noting there is also a "write.raw.gene" function which does the opposite as "read.raw.gene". To write the reflected file:

 

## Write Reflected Data
write.raw.gene(genedata.reflected,filename="Bdnf_sid75695642_reflected.raw")

 

If you prefer to do analysis by structure instead of voxels, you can try the "unionize" function. This function computes the sum, mean, and standard deviations of gene expression for annotated structures.

## Find which structures have highest gene expression
 
## Read annotations (can be downloaded from ABI)
labelfile="/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_grid_labels.raw"  

labels=read.raw.gene(labelfile,labels=TRUE)
 
 
## unionize overall the labels. Sort by highest mean expression
labels.to.sum=sort(unique(labels))
labels.to.sum=labels.to.sum[labels.to.sum!=0]
df=unionize(grid.data=genedata.reflected,     #vector to unionize
           labels.to.sum=labels.to.sum,      #sum all labels
           labels.grid=labels               #the vector of labels
           )
df=df[order(df$mean,decreasing=TRUE),]
 
## Read csv which contains definition of labels (can be downloaded from ABI)
labeldefs=read.csv("/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_gridlabels_structures.csv")
 
## Sort sort structures by highest gene expression
df$structures=labeldefs[match(labeldefs$id,df$labels),'name']
 
## View
head(df)
##Return
#    labels       sum      mean    stdev                                     structures
#555   1059 190.89058 10.605032 5.197725                     Field CA2, pyramidal layer
#154    267  45.97817  9.195634 4.589667          Orbital area, lateral part, layer 2/3
#258    479 312.93908  7.632661 4.986437                 Piriform area, polymorph layer
#625  10704 684.76912  7.284778 4.978902 Posterior parietal association areas, layer 6b
#237    440 113.66420  7.104013 2.756221             posteromedial visual area, layer 1
#553   1054 340.99053  7.103969 4.993027                         Visceral area, layer 5

 

Step 6: Preferential Expression

BDNF is known to be associated with learning in mice and humans. We also know environmental enrichment is also associated with better learning performance in mice. We can then ask the question: is neuroanatomical changes in mice (Figure 1) have preferential expression of BDNF (Figure 2-4)? We can quantify preferential expression as a fold change metric: 

<math>[Fold Change]=frac{[mean expression in target region]}{[mean expression in contrast region]}</math>

Fold-change above 1 indicates preferential expression in target region. The function to compute this value is "geneFoldChange". You must also know two things before analysis. The first is the brain mask. The Allen annotations can be found here (http://download.alleninstitute.org/informatics-archive/current-release/mouse_annotation/P56_Mouse_gridAnnotation.zip) and I define the mask as anywhere annotations exist. The 'read.raw.gene' can read annotations just fine, provided you give it a 'label' flag. You should download the annotations for yourself and then read them into R. At MICe, I stored the annotations filename in:

"/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_grid_labels.raw"

The second is Allen Brain Institute gives voxels where data doesn't exist a value of -1. It is important to know this detail, although 'geneFoldChange' handles this detail for you as it ONLY considers positive gene expression voxels (i.e where data exists) for analysis.

To answer the question at hand, the following code is used:

## Find Brain Mask
labelfile="/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_grid_labels.raw"

brainmask=read.raw.gene(labelfile,labels=TRUE)>0
 
## Read BDNF Gene Data
genedata=read.raw.gene("http://api.brain-map.org/grid_data/download/79587720?include=energy",url=TRUE)
 
## Read Neuroanatomical statistics data (from step 1)
statsdata=mincGetVolume("stats.mnc")
 
## Convert it from MINC orientation to Allen Orientation
statsdata=mincVectorTOallenVector(statsdata)
 
## Find preferential expression 
##   Target regions |t-statistic|>2
##   Contrast regions: whole brain
geneFoldChange(abs(statsdata),genedata,maskvector=brainmask,tgt.thresh=2)
 

 

The fold-change value returned is 1.13, indicating that there is 1.13 times greater expression in regions with neuroanatomical differences that the mean expression in the whole brain.  

You can repeat this command in a loop for ALL the genes in the Allen Brain Institute. I did so in the example below, where I had previously downloaded all the gene expression data from the Allen brain Institute and put the filenames in a csv file. In the example, I read the csv file, extract the filenames vector, and apply the "geneFoldChange" function to every gene. Furthermore, I only take coronal experiments, or reflect the sagittal experiments across the sagittal midplane so it covers the whole brain (I had previously done this for all the sagittal experiments and included them in the csv file).

## Find Brain Mask and convert it into MINC orientation
labelfile="/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/labels/allen_grid_labels.raw"

brainmask=read.raw.gene(labelfile,labels=TRUE)>0
brainmask=allenVectorTOmincVector(brainmask)

## Read CSV containing filenames
df=read.csv("/projects/egerek/matthijs/2015-07-Allen-Brain/Allen_Gene_Expression/gene_ids_stats2.csv")
 
## Subset to extract only the coronal and reflected sagittal experiments
df=subset(df,cutting_plane %in% c('sagittal_reflected','coronal'))
 
## Extract filenames, and convert to characters
genefiles=as.character(df$raw_data)
 
## Read Neuroanatomical statistics data (from step 1)
statsdata=mincGetVolume("stats.mnc")
 
## Find preferential expression 
##   Target regions |t-statistic|>2
##   Contrast regions: whole brain
## NOTE: when "geneFoldChange" receives filenames instead of vectors,
## It reads the filenames and does computations in MINC space
 
## Apply "geneFoldChange" on all genes. 
## I like using the "pbapply" library because it gives a progress bar
## but you don't need to. Just use the "lapply" instead of "pbapply"
## if you don't feel like loading another library
## This will take about 40 minutes.

library(pbapply)
fcs=pblapply(genefiles,function(file)
            geneFoldChange(abs(statsdata),file,maskvector=brainmask,tgt.thresh=2))
 
## Create Dataframe with the gene acronym, experiment ID, and
##  and foldchange value for easy viewing
## Order by fold change
retdf=data.frame(gene=df$acronym,experiment=df$Section_Data_ID,foldchange=unlist(fcs))
retdf=retdf[order(retdf$foldchange,decreasing=TRUE),]

 

Voilà! A candidate list of genes with preferential expression in neuroanatomy.

References

[1] Scholz, J., Allemang-Grand, R., Dazai, J., & Lerch, J. P. (2015). Environmental enrichment is associated with rapid volumetric brain changes in adult mice. NeuroImage109, 190-198.

[2]Dorr, A. E., Lerch, J. P., Spring, S., Kabani, N., & Henkelman, R. M. (2008). High resolution three-dimensional brain atlas using an average magnetic resonance image of 40 adult C57Bl/6J mice. Neuroimage42(1), 60-69.
  • No labels