Validating clustering for gene expression data bioinformatics

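The number of free parameters in an SVM is not a function of the dimensionality. Instead, it is upper-bounded by the number of training samples, since the dual formulation has one Lagrange multiplier per sample. For microarrays, the number of samples is much smaller than the number of genes (Ramaswamy et al, 2001).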
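The dual view makes this parameter count concrete. The sketch below is a kernel perceptron. It is used here only as a simple stand-in for SVM training, because an SVM chooses its multipliers by solving a quadratic program rather than by perceptron updates. The linear kernel, the sample sizes and the random data are all hypothetical. The learned vector `alpha` has one entry per training sample, however many genes each sample has.

```python
import random

def linear_kernel(a, b):
    """Linear kernel: the inner product of two feature vectors."""
    return sum(u * v for u, v in zip(a, b))

def train_kernel_perceptron(X, y, kernel=linear_kernel, max_epochs=50):
    """Learn one dual coefficient per training sample.

    Stand-in for SVM training: an SVM chooses these coefficients by solving
    a quadratic program, but the dual parameterisation has the same shape.
    """
    n = len(X)
    # The n x n Gram matrix is all the learner uses of the data. Its size
    # depends on the number of samples, not on the number of features.
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0] * n
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            if y[i] * score <= 0:  # misclassified or on the boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(X_train, y_train, alpha, x, kernel=linear_kernel):
    """Dual-form decision rule: sign of sum_i alpha_i * y_i * K(x_i, x)."""
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y_train, X_train))
    return 1 if score > 0 else -1

random.seed(0)
n_samples, n_genes = 20, 2000  # hypothetical sizes: n_genes >> n_samples
X = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
y = [1 if i < n_samples // 2 else -1 for i in range(n_samples)]
alpha = train_kernel_perceptron(X, y)
print(len(alpha))  # 20: one coefficient per sample, not per gene
```

Raising `n_genes` makes each kernel evaluation more expensive. It does not change the length of `alpha` or the size of the Gram matrix.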

However, whether linear or nonlinear kernels are used, SVMs are not immune to the curse of dimensionality.
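One facet of the problem can be shown with a plain linear model rather than an SVM. Once the number of features d is at least the number of samples n, a linear model can fit almost any labelling of the training set, including pure noise. Perfect training accuracy then says nothing about generalisation. The sketch computes the minimum-norm least-squares solution `w = X^T (X X^T)^{-1} y` in pure Python. The sizes, the Gaussian "expression" values and the random labels are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def min_norm_fit(X, y):
    """Return w = X^T (X X^T)^{-1} y, which satisfies X w = y when X X^T is invertible."""
    n, d = len(X), len(X[0])
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    coef = solve(gram, y)
    return [sum(coef[i] * X[i][k] for i in range(n)) for k in range(d)]

def accuracy(X, y, w):
    """Fraction of samples whose sign(x . w) matches the label."""
    preds = [1 if sum(a * b for a, b in zip(x, w)) > 0 else -1 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

random.seed(1)
n, d = 30, 500  # hypothetical: far fewer samples than features
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [random.choice([-1, 1]) for _ in range(n)]  # labels are pure noise
w = min_norm_fit(X, y)
print(accuracy(X, y, w))  # 1.0: the noise is fitted perfectly

# Fresh samples with fresh noise labels: accuracy is typically near 0.5.
X_new = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y_new = [random.choice([-1, 1]) for _ in range(n)]
print(accuracy(X_new, y_new, w))
```

The gap between training and held-out accuracy illustrates the variance (estimation error) side of the tradeoff.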

Added dimensions can degrade prediction performance when the sample size is small relative to the dimensionality. These observations are consistent with the ‘bias-variance dilemma’ (Jain et al, 2000). Specifically, the error of model fitting can be decomposed into two components: bias (approximation error) and variance (estimation error). Computational learning theory provides distribution-free bounds on generalisation accuracy in terms of a classifier's capacity, which is related to model complexity (Vapnik, 1998). The relevance of these bounds to the microarray domain is discussed in, e.g., …

Lastly, a separate objective is to identify cancer-associated genes and their joint effects, rather than simply to build a predictive model for the disease. Although feature selection is integral to each of these analytical tasks, an exhaustive search of all 2^N subsets of N genes … complexity tradeoff.
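Exhaustive search breaks down because N candidate genes admit 2^N subsets. The sketch below enumerates every subset of a small, hypothetical gene list with `itertools.combinations` and scores each one. The scoring function is a toy assumption. It stands in for whatever criterion a real pipeline would use, such as cross-validated accuracy, and adds a size penalty as a crude nod to the complexity tradeoff.

```python
from itertools import combinations

def all_subsets(genes):
    """Yield every subset of `genes`, from the empty set up to the full set."""
    for k in range(len(genes) + 1):
        yield from combinations(genes, k)

def exhaustive_search(genes, score):
    """Return the best-scoring subset and how many subsets were evaluated."""
    best, best_score, evaluated = None, float("-inf"), 0
    for subset in all_subsets(genes):
        evaluated += 1
        s = score(subset)
        if s > best_score:
            best, best_score = subset, s
    return best, evaluated

# Hypothetical gene names; g2 and g5 are the "informative" genes by fiat.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
INFORMATIVE = {"g2", "g5"}

def toy_score(subset):
    """Toy criterion: reward informative genes, penalise subset size."""
    return len(INFORMATIVE & set(subset)) - 0.1 * len(subset)

best, evaluated = exhaustive_search(genes, toy_score)
print(best, evaluated)  # ('g2', 'g5') 256

# The number of subsets doubles with every added gene.
for n_genes in (10, 20, 30):
    print(n_genes, 2 ** n_genes)
```

With 8 genes the search is instant. With the thousands of genes on a typical microarray, 2^N is far beyond any feasible enumeration.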
