# Population Value Decomposition (PVD)

Population Value Decomposition (PVD) is a concept that grew out of our studies in which subject-level data are naturally organized as a matrix. Examples of such data include: 1) fMRI studies, where data can be represented as a V by T matrix, where V is the number of voxels in the brain (typically around 100K three-dimensional volumes of 45 mm^{3} each) and T is the number of time points in the scanner (typically hundreds of 2-second intervals); 2) EEG studies, where long time series can be Fourier transformed into an F by T matrix, where F is the number of frequencies (from 100 to 1000 or more, depending on the sampling rate) and T is the number of time windows over which the data are assumed to be quasi-stationary (from 100 to 1500 or more).
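As an illustration of the EEG case, the following minimal sketch turns a long one-dimensional signal into an F by T matrix by taking the Fourier transform of consecutive non-overlapping windows. The function name `eeg_to_matrix`, the window length, and the synthetic signal are assumptions made for this example only; actual EEG pipelines typically add tapering, overlapping windows, and artifact removal.

```python
import numpy as np

def eeg_to_matrix(signal, window_len):
    """Return an F-by-T matrix of spectral power.

    Rows index frequencies, columns index non-overlapping time windows.
    Hypothetical helper written for illustration only.
    """
    n_windows = len(signal) // window_len
    # Drop the incomplete tail so that every window has the same length.
    windows = signal[: n_windows * window_len].reshape(n_windows, window_len)
    # Real FFT of each window; rfft returns window_len // 2 + 1 frequencies.
    spectrum = np.fft.rfft(windows, axis=1)
    power = np.abs(spectrum) ** 2
    return power.T  # shape (F, T)

rng = np.random.default_rng(0)
signal = rng.standard_normal(10_000)  # synthetic stand-in for one EEG channel
Y = eeg_to_matrix(signal, window_len=200)
print(Y.shape)  # (101, 50): F = 200 // 2 + 1 frequencies, T = 10000 // 200 windows
```

Applying the same window length to every subject gives matrices with the same dimensions and the same row and column interpretation across subjects.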

Thus, subject-specific data can be represented as a matrix Y_{i}, and the sample of these matrices is the data. Some important characteristics of these matrices are that: a) the entries of each matrix have the same interpretation across subjects (e.g. the same voxel at the same time point); b) the two dimensions do not necessarily have the same interpretation (space by time in the fMRI example or frequency by time in the EEG example); and c) all matrices have the same dimensions.

Given a population of matrices Y_{i}, we are interested in understanding their structure and, possibly, their association with health outcomes. The problem is that the dimensions of Y_{i} are very large, which makes computation and visualization very difficult. One simple solution is to look for a decomposition of the type Y_{i} = P V_{i} D + E_{i}, where the left dimension of P and the right dimension of D are very large, but the dimensions of the V_{i} matrix are very small. The advantage of such a decomposition is that the matrices P and D do not depend on the subject, so the entire variability of the data is governed by the much smaller space spanned by the entries of V_{i}. Thus, both prediction and variability decomposition can be done in a much smaller space. Another advantage is that there is a one-to-one relationship between any linear model on V_{i} (the intrinsic space) and the exact same linear model on Y_{i} (the observed space).

The decomposition is not unique, and various P and D matrices can be used. In the rejoinder to our JASA discussion paper on PVD, we show that one could start with one pair of P and D matrices, calculate the residuals Y_{i} - P V_{i} D, and repeat the procedure on those residuals. This procedure is akin to more standard model searches and would result in an "additive" PVD.
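The decomposition Y_{i} = P V_{i} D + E_{i} can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not necessarily the exact procedure used in the papers: P and D are built here from subject-level SVDs followed by an SVD of the stacked singular vectors, and the function name `pvd`, the tuning parameters `L`, `A`, `B`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def pvd(Y_list, L, A, B):
    """Sketch of a population value decomposition Y_i ~ P V_i D.

    Y_list : list of F x T arrays, one per subject
    L      : number of singular vectors kept per subject (assumed tuning choice)
    A, B   : number of columns of P and number of rows of D
    """
    U_stack, R_stack = [], []
    for Y in Y_list:
        # Subject-level SVD: Y = U diag(s) Rt
        U, s, Rt = np.linalg.svd(Y, full_matrices=False)
        U_stack.append(U[:, :L])      # F x L left singular vectors
        R_stack.append(Rt[:L, :].T)   # T x L right singular vectors
    # Population-level bases from the stacked subject-level vectors.
    P = np.linalg.svd(np.hstack(U_stack), full_matrices=False)[0][:, :A]    # F x A
    D = np.linalg.svd(np.hstack(R_stack), full_matrices=False)[0][:, :B].T  # B x T
    # P has orthonormal columns and D has orthonormal rows, so the
    # least-squares estimate of V_i given P and D is P^T Y_i D^T.
    V_list = [P.T @ Y @ D.T for Y in Y_list]
    return P, V_list, D

# Simulated population: 10 subjects, 60 x 40 matrices, 3 x 3 intrinsic space.
rng = np.random.default_rng(1)
F, T, A, B, n = 60, 40, 3, 3, 10
P0 = np.linalg.qr(rng.standard_normal((F, A)))[0]    # F x A, orthonormal columns
D0 = np.linalg.qr(rng.standard_normal((T, B)))[0].T  # B x T, orthonormal rows
Y_list = [P0 @ rng.standard_normal((A, B)) @ D0
          + 0.001 * rng.standard_normal((F, T)) for _ in range(n)]

P, V_list, D = pvd(Y_list, L=3, A=A, B=B)

# Relative size of the residuals E_i = Y_i - P V_i D over the whole sample.
num = sum(np.linalg.norm(Y - P @ V @ D) ** 2 for Y, V in zip(Y_list, V_list))
den = sum(np.linalg.norm(Y) ** 2 for Y in Y_list)
rel_err = np.sqrt(num / den)
print(P.shape, V_list[0].shape, D.shape, rel_err)
```

In this simulation each subject is summarized by 9 numbers instead of 2400. Because P and D are shared across subjects, a linear model for V_{i} with coefficient matrices B_{k} corresponds to the same linear model for Y_{i} with coefficient matrices P B_{k} D.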

Papers that our group wrote on this topic are: "Two-stage decompositions for the analysis of functional connectivity for fMRI with application to Alzheimer's disease risk" and "Population Value Decomposition, a Framework for the Analysis of Image Populations". Interestingly, in their comments on the latter, Lock, Nobel and Marron show that the Candecomp/Parafac and Tucker decompositions for multi-way data can be viewed as particular cases of PVD.