Capcalc

capcalc is a suite of Python programs used to perform coactivation pattern analysis on time series data. It uses K-Means clustering to find a set of “activation states” that represent the covarying patterns in the data.

HTML documentation is here: http://capcalc.readthedocs.io/en/latest/


Citing capcalc

Frederick, B, capcalc [Computer Software] (2016-2022). Available from https://github.com/bbfrederick/capcalc. doi:10.5281/zenodo.7035806

Contents

Introduction


NOTE

This is an evolving code base. I’m constantly tinkering with it. That said, now that I’m releasing this to the world, I’m being somewhat more responsible about locking down stable release points. In between releases, however, I’ll be messing with things. It’s very possible I could break something while doing this, so check back for status updates if you download the code in between releases. I’ve finally become a little more modern and started adding automated testing, so as time goes by hopefully the “in between” releases will be somewhat more reliable. Check back often for exciting new features and bug fixes!

Ok, I’m sold. What’s in here?

  • roidecompose - This program uses an atlas to extract timecourses from a 4D nifti file, producing a text file with the averaged timecourse from each region in the atlas (each integer value in the file) in each column. This can be input to capfromtcs. There are various options for normalizing the timecourses.

  • capfromtcs - This does the actual CAP calculation, performing a k-means cluster analysis on the set of timecourses to find the best representative set of “states” in the file. Outputs the states found and the dominant state in each timepoint of the timecourse (see the sketch after this list).

  • maptoroi - The inverse of roidecompose. Give it a set of cluster timecourses and a template file, and it maps the values back onto the ROIs.

  • statematch - Use this for aligning two state output files. Takes two state timecourse files, and determines which states in the second correspond to which states in the first. Generates a new ‘remapped’ file with the states in the second file expressed as states in the first.
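
To make the workflow concrete, here is a minimal sketch of the core ideas behind roidecompose and capfromtcs, using nibabel and scikit-learn. This is an illustration of the method, not the programs' actual code, and all filenames are hypothetical:

import nibabel as nib
import numpy as np
from sklearn.cluster import KMeans

# roidecompose idea: average the 4D data over each atlas region
func = nib.load("func.nii.gz").get_fdata()              # hypothetical 4D file (x, y, z, t)
atlas = nib.load("atlas.nii.gz").get_fdata().astype(int)
regions = [r for r in np.unique(atlas) if r != 0]
timecourses = np.column_stack([func[atlas == r].mean(axis=0) for r in regions])

# capfromtcs idea: normalize each timecourse, then k-means cluster the timepoints
timecourses = (timecourses - timecourses.mean(axis=0)) / timecourses.std(axis=0)
km = KMeans(n_clusters=8).fit(timecourses)   # 8 is capfromtcs's default cluster count
states = km.labels_                          # the dominant state at each timepoint
centers = km.cluster_centers_                # one "activation state" per row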

Installation

Required dependencies

capcalc requires some external libraries to be installed first:

  • Python 3.x

  • numpy

  • scipy

  • matplotlib

  • scikit-learn

  • statsmodels

  • nibabel

Installing from pypi

I’ve finally gotten pypi deployment working, so the new easiest way to install capcalc is to simply type:

pip install capcalc

That’s it, I think.

Installing with conda

The other simple way to get this all done is to use Anaconda python from Continuum Analytics. It’s a free, curated scientific Python distribution that takes a lot of the headaches out of maintaining your own. It also already comes with almost all of the dependencies for capcalc installed by default. You can get it here: https://www.continuum.io. You should download the most recent Python 3 version.

After installing Anaconda python, install the remaining dependencies. To do this most easily, you should have conda-forge as one of your source channels. To add conda-forge, type:

conda config --add channels conda-forge

Then install the additional dependencies:

conda install pyqtgraph nibabel pillow

Done.

Installing capcalc

Once you have installed the prerequisites, cd into the package directory, and type the following:

python setup.py install

to install all of the tools in the package. You should then be able to run them from the command line (after rehashing).

Updating

If you’ve previously installed capcalc and want to update, cd into the package directory and do a git pull first:

git pull
python setup.py install

Docker installation

There is a Docker container with a full capcalc installation. To use this, first make sure you have docker installed and properly configured, then run the following:

docker pull fredericklab/capcalc:VERSIONNUMBER

This will download the docker container from dockerhub. It’s around 2GB, so it may take some time, but it caches the file locally, so you won’t have to do this again unless the container updates. To use a particular version, replace VERSIONNUMBER with the version of the container you want (currently the newest version is 1.0.0rc2).

If you like to live on the edge, just use:

docker pull fredericklab/capcalc:latest

This will use the most recent version on dockerhub.

Now that the container is downloaded, you can run any capcalc command in it. For example, to run capfromtcs itself, you would use the command shown below (you can do this all in one step - docker will simply pull the container at run time if the version you request hasn’t already been downloaded).

Docker runs completely in its own self-contained environment. If you want to be able to interact with files outside of the container, you map an external directory to a mount point in the container using the --volume=EXTERNALDIR:MOUNTPOINT[,ANOTHERDIR:ANOTHERMOUNTPOINT] option to docker.

One complication of Docker - if you’re running a program that displays anything (and we do), you’ll have to add a few extra arguments to the docker call. Docker is a little weird about X forwarding - the easiest thing to do is find the IP address of the machine you’re running on (let’s call it MYIPADDRESS), and do the following:

xhost +

This disables X11 security - this is almost certainly not the best thing to do, but I don’t have a better solution at this time, and it works.

If you’re on a Mac using Xquartz, prior to this you’ll also have to do three more things.

  1. In Xquartz, go into the security preferences, and make sure “Allow connections from network hosts” is checked.

  2. Tell Xquartz to listen for TCP connections (this is not the default). Go to a terminal window and type:

defaults write org.macosforge.xquartz.X11 nolisten_tcp 0

  3. Log out and log back in again (you only need to do this once - it will stay that way until you change it).

Then you should be good to go, with the following command:

docker run \
    --network host\
    --volume=INPUTDIRECTORY:/data_in,OUTPUTDIRECTORY:/data_out \
    -it \
    -e DISPLAY=MYIPADDRESS:0 \
    -u capcalc \
    fredericklab/capcalc:VERSIONNUMBER \
        capfromtcs \
            -i manyfiles.txt \
            -o output/manyfiles \
            --sampletime=0.72 \
            --varnorm \
            -m \
            -b 4800 \
            -S 1200 \
            --quality \
            -E default \
            --minout=2 \
            [otheroptions]

You can replace the capfromtcs command and its arguments with any other program in the package (currently only “grader”, which classifies timecourses) - after fredericklab/capcalc:VERSIONNUMBER, just specify the command and arguments as you usually would.

Singularity installation

Many times you can’t use Docker, because of security concerns. Singularity, from LBL, offers containerized computing that runs entirely in user space, so the amount of mischief you can get up to is significantly less. Singularity containers can be created from Docker containers as follows (stealing from the fMRIprep documentation):

singularity build /my_images/capcalc-VERSIONNUMBER.simg docker://fredericklab/capcalc:VERSIONNUMBER

Running the container is similar to Docker. The “-B” option is used to bind filesystems to mountpoints in the container.

singularity run \
    --cleanenv \
    -B INPUTDIRECTORY:/data_in,OUTPUTDIRECTORY:/data_out \
    capcalc-VERSIONNUMBER.simg \
        capfromtcs \
            -i manyfiles.txt \
            -o output/manyfiles \
            --sampletime=0.72 \
            --varnorm \
            -m \
            -b 4800 \
            -S 1200 \
            --quality \
            -E default \
            --minout=2 \
            [otheroptions]

Usage information

roidecompose

Description:

Inputs:

Outputs:

Usage:

capfromtcs

Description:

This script takes a file containing a number of timecourses, does some preprocessing on them, and then calculates the coactivation patterns found in the data.

Inputs:

There are three required inputs:

The input file is a text file containing 2 or more timecourses in columns separated by whitespace. Each row is a time point.

The outputname is the prefix of all output files.

The samplerate (or sampletime) is the frequency (or timestep) of the input data.
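
For example, a three-timecourse input file is just three whitespace separated columns of numbers, which you can sanity check with numpy (a sketch; the filename is hypothetical):

import numpy as np

# manyfiles.txt might look like:
#   0.12  -0.33  0.51
#   0.15  -0.41  0.48
#   0.09  -0.38  0.55
data = np.loadtxt("manyfiles.txt")
n_timepoints, n_timecourses = data.shape
# --samplefreq and --sampletime are reciprocals: --sampletime=0.72 means
# the sample frequency is 1.0 / 0.72, or about 1.39 Hz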

Outputs:

Outputs are space or space by time NIFTI files (depending on the file), and some text files containing textual information, histograms, or numbers. Output spatial dimensions and file type match the input dimensions and file type (NIFTI1 in, NIFTI1 out). Depending on the type of map, there can be no time dimension, a time dimension that matches the input file, or something else, such as a time lag dimension for a correlation map.

OUTPUT_normalized.txt

This is a text file containing the input timecourses after applying the selected normalization.

OUTPUT_clustercenters.txt

This is a text file with NUMCLUSTER lines, one for each state, specifying the center location of the cluster (each line has the same number of columns as the number of input timecourses).

OUTPUT_statelabels.txt

This is a text file with one line per timepoint. Each line is the assigned state for the given timepoint in the input data.

OUTPUT_silhouettesegmentstats.csv

OUTPUT_overallsilhouettemean.csv

This text file has one line per cluster, giving the mean of the mean silhouette score of all the segments that spent any time in that state.
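
If you want to recompute comparable per-cluster silhouette summaries yourself, something along these lines should work (a sketch using scikit-learn's silhouette_samples on the output files described above; this is not necessarily how capcalc aggregates the scores):

import numpy as np
from sklearn.metrics import silhouette_samples

data = np.loadtxt("OUTPUT_normalized.txt")                # timepoints x timecourses
labels = np.loadtxt("OUTPUT_statelabels.txt", dtype=int)  # state per timepoint
scores = silhouette_samples(data, labels)                 # one score per timepoint
for state in np.unique(labels):
    print(state, scores[labels == state].mean())          # mean silhouette per state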

OUTPUT_pctsegsinstate.txt

This text file has one line per cluster, indicating what percentage of subjects (segments) spent any time in this state.

OUTPUT_seg_XXXX_instate_YY.txt

This is a text file with one line per timepoint in the segment. The value is 1 if the system is in state YY, 0 otherwise.

OUTPUT_seg_XXXX_statestats.csv

This is a text file with one line per cluster (state), with the following columns (see the sketch after this list):

  • percentage of segment spent in state

  • number of continuous runs in state

  • total number of TRs in state

  • minimum number of TRs spent in state

  • maximum number of TRs spent in state

  • average number of TRs spent in state

  • median number of TRs spent in state

  • standard deviation of the number of TRs spent in state
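
These statistics can all be derived directly from a state label timecourse. Here is a sketch (not capcalc's implementation; the segment number in the filename is just an example):

import numpy as np

labels = np.loadtxt("OUTPUT_seg_0000_statelabels.txt", dtype=int)

def state_runs(labels, state):
    """Lengths, in TRs, of each continuous run spent in the given state."""
    runs, current = [], 0
    for s in labels:
        if s == state:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return np.array(runs)

runs = state_runs(labels, state=0)
print(100.0 * runs.sum() / len(labels))      # percentage of segment spent in state
print(len(runs), runs.sum())                 # number of runs, total TRs in state
print(runs.min(), runs.max(), runs.mean())   # min, max, average run length
print(np.median(runs), runs.std())           # median and standard deviation
# multiplying the TR counts by the sample time gives the statetimestats values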

OUTPUT_seg_XXXX_statetimestats.csv

This is the equivalent of the statestats file, but with the units in seconds rather than TRs.

OUTPUT_seg_XXXX_statelabels.txt

This is a text file with one line per timepoint. Each line is the assigned state for the given timepoint in the segment.

OUTPUT_seg_XXXX_silhouetteclusterstats.csv

This is a text file with one line per cluster. Each line has four columns:

  • the mean silhouette score for that cluster in that segment.

  • the median silhouette score for that cluster in that segment.

  • the minimum silhouette score for that cluster in that segment.

  • the maximum silhouette score for that cluster in that segment.

OUTPUT_seg_XXXX_rawtransmat.nii.gz

This is a NIFTI file with dimensions n_states by n_states. The number of transitions from state a to state b is in location [a, b].

OUTPUT_seg_XXXX_normtransmat.nii.gz

This is a NIFTI file containing the same information as the rawtransmat file, but with each row normalized to sum to 1, so the numbers represent transition probabilities rather than total numbers of transitions.

OUTPUT_seg_XXXX_offdiagtransmat.nii.gz

This is a NIFTI file containing the same information as the normtransmat file, except that the diagonal elements have been set to zero. The values are therefore the relative probabilities of transitioning to each possible destination state, given that the state does not simply persist.

OUTPUT_seg_XXXX_rawtransmat.csv OUTPUT_seg_XXXX_normtransmat.csv OUTPUT_seg_XXXX_offdiagtransmat.csv

The same data as the above NIFTI files, but as CSV files.
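
The three matrices are related in a simple way, and can be reconstructed from a state label sequence. A sketch based on my reading of the descriptions above (not capcalc's code):

import numpy as np

def transition_matrices(labels, n_states):
    # raw: count of transitions from state a (row) to state b (column)
    raw = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        raw[a, b] += 1
    # norm: each row scaled to sum to 1, giving transition probabilities
    rowsums = raw.sum(axis=1, keepdims=True)
    norm = np.divide(raw, rowsums, out=np.zeros_like(raw), where=rowsums > 0)
    # offdiag: norm with the self-transitions (the diagonal) zeroed out
    offdiag = norm.copy()
    np.fill_diagonal(offdiag, 0.0)
    return raw, norm, offdiag

raw, norm, offdiag = transition_matrices(np.array([0, 0, 1, 2, 1, 1, 0]), 3)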

Usage:

capfromtcs - calculate and cluster coactivation patterns for a set of timecourses

usage: capfromtcs -i timecoursefile -o outputfile --samplefreq=FREQ --sampletime=TSTEP
                  [--nodetrend] [-s STARTTIME] [-D DURATION]
                  [-F LOWERFREQ,UPPERFREQ[,LOWERSTOP,UPPERSTOP]] [-V] [-L] [-R] [-C]
                  [-m] [-n NUMCLUSTER] [-b BATCHSIZE] [-S SEGMENTSIZE] [-E SEGMENTTYPE] [-I INITIALIZATIONS]
                  [--noscale] [--nonorm] [--pctnorm] [--varnorm] [--stdnorm] [--ppnorm] [--quality]
                  [--pca] [--ica] [-p NUMCOMPONENTS]

required arguments:
    -i, --infile=TIMECOURSEFILE  - text file containing multiple timeseries
    -o, --outfile=OUTNAME        - the root name of the output files

    --samplefreq=FREQ            - sample frequency of all timecourses is FREQ
           or
    --sampletime=TSTEP           - time step of all timecourses is TSTEP
                                   NB: --samplefreq and --sampletime are two ways to specify
                                   the same thing.

optional arguments:

  Data selection/partition:
    -s STARTTIME                 - time of first datapoint to use in seconds in the first file
    -D DURATION                  - amount of data to use in seconds
    -S SEGMENTSIZE[,SEGSIZE2,...SEGSIZEN]
                                 - treat the timecourses as segments of length SEGMENTSIZE for preprocessing.
                                   If there are multiple, comma separated numbers, treat these as
                                   subsegment lengths.  The default segment size is the entire length.
    -E SEGTYPE[,SEGTYPE2,...SEGTYPEN]
                                 - group subsegments for summary statistics.  All subsegments in the
                                   same group must be the same length.
  Clustering:
    -m                           - run MiniBatch Kmeans rather than conventional - use with very large datasets
    -n NUMCLUSTER                - set the number of clusters to NUMCLUSTER (default is 8)
    -b BATCHSIZE                 - use a batchsize of BATCHSIZE if doing MiniBatch - ignored if not.  Default is 1000
    --dbscan                     - perform dbscan clustering
    --hdbscan                    - perform hdbscan clustering
    -I INITIALIZATIONS           - Restart KMeans INITIALIZATIONS times to find best fit (default is 1000)

  Preprocessing:
    -F                           - filter data and regressors from LOWERFREQ to UPPERFREQ.
                                   LOWERSTOP and UPPERSTOP can be specified, or will be calculated automatically
    -V                           - filter data and regressors to VLF band
    -L                           - filter data and regressors to LFO band
    -R                           - filter data and regressors to respiratory band
    -C                           - filter data and regressors to cardiac band
    --nodetrend                  - do not detrend the data before correlation
    --noscale                    - don't perform vector magnitude scaling
    --nonorm                     - don't normalize timecourses
    --pctnorm                    - scale each timecourse to its percentage of the mean
    --varnorm                    - scale each timecourse to have a variance of 1.0 (default)
    --stdnorm                    - scale each timecourse to have a standard deviation of 1.0
    --ppnorm                     - scale each timecourse to have a peak to peak range of 1.0
    --pca                        - perform PCA dimensionality reduction prior to analysis
    --ica                        - perform ICA dimensionality reduction prior to analysis
    -p NUMCOMPONENTS             - set the number of p/ica components to NUMCOMPONENTS (default is 8).  Set to -1 to estimate
    --noscale                    - do not apply standard scaler before cluster fitting

  Other:
    --GBR                        - apply gradient boosting regressor testing on clusters
    -d                           - display some quality metrics
    --quality                    - perform a silhouette test to evaluate fit quality
    -v                           - turn on verbose mode

These options are somewhat self-explanatory. I will be expanding this section of the manual going forward, but I wanted to put something here to get this out the door.
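
As a rough guide, here is my interpretation of what the main normalization options do to each timecourse column (a sketch of the flag descriptions above, not the package's exact code):

import numpy as np

def pctnorm(tc):
    # express each point as a percentage of the timecourse mean
    return 100.0 * (tc - tc.mean()) / tc.mean()

def varnorm(tc):
    # demean and scale to unit variance (the default)
    return (tc - tc.mean()) / tc.std()

def ppnorm(tc):
    # demean and scale to a peak-to-peak range of 1.0
    return (tc - tc.mean()) / (tc.max() - tc.min())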

maptoroi

Description:

maptoroi takes ROI values from a text file and maps them back onto a NIFTI image for display.

Inputs:

maptoroi requires an input text file with one column per region, giving the value of each ROI. If there are multiple rows, each row corresponds to a time point. It also requires a template NIFTI file.

Outputs:

maptoroi writes a NIFTI file in which each region of the template has been filled with its corresponding value from the input file, with one output volume per input row.

Usage:

usage: maptoroi inputfile templatefile outputroot

required arguments:
    inputfile        - the name of the file with the roi values to be mapped back to image space
    templatefile     - the name of the template region file
    outputroot       - the root name of the output nifti file
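
The underlying operation is straightforward; here is a sketch of the idea with nibabel (not the program's actual code; filenames are hypothetical, and this assumes the input columns are ordered by ascending region label):

import nibabel as nib
import numpy as np

template_img = nib.load("template.nii.gz")            # integer-valued region file
template = template_img.get_fdata().astype(int)
values = np.atleast_2d(np.loadtxt("roivalues.txt"))   # rows: timepoints, columns: regions

out = np.zeros(template.shape + (values.shape[0],))
for i, region in enumerate(r for r in np.unique(template) if r != 0):
    out[template == region, :] = values[:, i]         # fill each region with its values
nib.save(nib.Nifti1Image(out, template_img.affine), "mapped.nii.gz")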


What’s new

Release history

Version 1.3.6 (5/11/23)

  • (Docker) Updated to python 3.11 basecontainer.

  • (Docker) Fixed testdocker.sh.

  • (package) Modernized install procedure.

Version 1.3.5 (1/11/23)

  • (capfromtcs) Fixed summary statistics outputs.

Version 1.3.4 (1/10/23)

  • (capfromtcs) Revert change to max_iter when specifying initialcenters.

Version 1.3.3 (1/10/23)

  • (capfromtcs) Changed initialization of kmeans from initialcenters, save more intermediate results.

  • (utils.py) Factored out some transition array calculations.

Version 1.3.2 (11/15/22)

  • (capfromtcs) Fixed a typo in specifying the initial k-means++ method.

  • (io) Updated libraries to match some changes in rapidtide.

Version 1.3.1 (11/10/22)

  • (capfromtcs) Added new option “--initialcenters” to allow specification of k-means cluster centers. This lets you reuse the CAPs from previous runs, but gets around the difficulty of reading back old models run under a different version of python or scikit-learn.

Version 1.3.0 (11/10/22)

  • (package) Removed rapidtide as a dependency by copying over the necessary support routines.

  • (Docker) Updated basecontainer to latest, switched over to using pip rather than mamba for env.

  • (docs) Corrected a problem that was causing readthedocs build to fail.

Version 1.2.3 (8/31/22)

  • (capfromtcs, clustercomp) Used newer, non-deprecated method to access nifti files with nibabel.

  • (package) Added docker and singularity test scripts.

  • (package) Reformatted several files with black.

  • (package) Made many (unsuccessful) attempts to get the documentation to build.

Version 1.2.2.5 (8/30/22)

  • (package) Bump to trigger github deployment.

Version 1.2.2.4 (8/30/22)

  • (package) Convert README.md to README.rst

Version 1.2.2.3 (8/30/22)

  • (package) Fixed Development status in setup.py

Version 1.2.2.2 (8/30/22)

  • (package) Syncing with rapidtide to try to get pypi deployment to work

Version 1.2.2.1 (8/29/22)

  • (package) Fixed versioneer installation.

Version 1.2.2 (8/29/22)

  • (package) Updated pyproject.toml and versioneer to try to fix pypi deployment.

Version 1.2.1 (8/22/22)

  • (Docker) Fixed Dockerfile error.

  • (package) Updated pypi authentication information.

Version 1.2.0 (8/22/22)

  • (Docker) Added Docker compatibility.

  • (package) Updated to match the current version of rapidtide.

  • (package) Added “refresh” script to simplify updates.

  • (package) Formatting changes.

  • (capfromtcs) Fixed import of joblib, updated to new NonCausalFilter calling conventions.

  • (clusternifti) Major overhaul. Added normalization, PCA and ICA dimensionality reduction, switched to argparse, added repeats.

  • (clustersort) New program to harmonize multiple cluster solutions.

  • (clusternifti, clustersort, supercluster) Harmonized mask specification arguments and internal variable names.

Version 1.1.0 (8/20/21)

  • (package) Move to versioneer.

Version 1.0.0 (2/15/17)

  • First release
