Performance Comparison of Machine Learning Algorithms Analyzing Patterns of Neural Response to Object Categories

U. M. Prakash (Asst. Professor), Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, Tamil Nadu, INDIA
Sushant Garg, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, Tamil Nadu, INDIA
D. Mohit Jain, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Chennai, Tamil Nadu, INDIA

Abstract—Machine learning (ML) has become an established tool for decoding functional neuroimaging data, and there is now hope of performing such tasks efficiently in real time. Towards this objective, we compared the accuracy of three different ML algorithms applied to neuroimaging data from the Haxby dataset. The highest accuracy was achieved by logistic regression in most cases, followed by the ridge and support vector classifiers. For real-time decoding applications, finding a parsimonious subset of diagnostic independent components (ICs) may be useful. We then applied the optimized ML algorithms to new data instances and found that the classification accuracies were reproducible. Before applying statistical learning to neuroimaging data, standard preprocessing must be applied. For fMRI, this includes motion correction, slice timing correction, coregistration with an anatomical image and, if needed, normalization to a common template such as the MNI (Montreal Neurological Institute) one. Reference software packages for these tasks are SPM [1] and FSL [2]. A Python interface to these tools is available in the NILearn Python library [3].

I. INTRODUCTION

In the early 1950s, Shannon developed an iterated penny-matching device intended to perform straightforward brain-reading tasks [4]. Although this device performed only slightly better than chance, it produced a fascination with brain-reading technology [5].
Modern advancements in neuroimaging have provided a quantitative means of visualizing brain activity that corresponds to mental processes [6], and certain brain-reading feats have been accomplished by applying pattern classification techniques to functional magnetic resonance imaging (fMRI) data [7]. The application of machine learning (ML) to fMRI analysis has become increasingly popular, following its initial application to Haxby's visual object recognition data [8]. Neural network [8], Naive Bayes [8], and support vector machine classifiers [9] have each yielded different levels of predictive capability. However, fMRI data sets are extremely large, and one of the key issues in fMRI classification has been to mine these vast data effectively. Neuroimaging data are represented in 4 dimensions: 3 spatial dimensions and one dimension indexing time or trials. Machine learning algorithms, on the other hand, only accept 2-dimensional samples-by-features matrices. Depending on the setting, voxels and time series can be considered as features or samples. For example, in spatial independent component analysis (ICA), voxels are samples.

II. METHODS

1) Data Preparation: From MR Volumes to a Data Matrix.
2) Decoding the Mental Representation of Objects in the Brain.
3) Machine Learning Algorithms.
4) Training, Testing, and Cross Validation.
5) Comparison across Classifiers.

A. Data Preparation: From MR Volumes to a Data Matrix

The reduction from 4D images to feature vectors comes with the loss of spatial structure (Figure 1). It does, however, allow uninformative voxels to be discarded, such as those outside of the brain. Such voxels carry only noise and scanner artifacts, which would decrease the SNR and affect the quality of the estimation. The selected voxels form a brain mask. Such a mask is often provided along with the dataset or can be computed with software tools such as FSL or SPM [3].

Fig. 1. Conversion of brain scans into 2-dimensional data.
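The reduction described above can be sketched with plain NumPy, assuming a boolean brain mask is already available (in practice, nilearn's NiftiMasker handles the image loading and masking); the array sizes here are toy values, not those of a real acquisition:

```python
import numpy as np

def apply_brain_mask(volumes, mask):
    """Reduce a 4D fMRI array (x, y, z, time) to a 2D (time, voxels) matrix.

    Only voxels inside the boolean brain mask are kept, discarding the
    uninformative out-of-brain voxels mentioned in the text.
    """
    # Boolean indexing on the 3 spatial axes yields (n_voxels, n_timepoints);
    # transpose so each row is one scan (one sample).
    return volumes[mask].T

# Toy example: a 4x4x4 volume over 10 time points, with an 8-voxel mask.
rng = np.random.default_rng(0)
volumes = rng.standard_normal((4, 4, 4, 10))
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True

X = apply_brain_mask(volumes, mask)
print(X.shape)  # (10, 8): 10 samples, 8 in-mask voxel features
```

The resulting 2D matrix is what the classifiers in the following sections consume.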
B. Decoding the Mental Representation of Objects in the Brain

In the context of neuroimaging, decoding refers to learning a model that predicts behavioral or phenotypic variables from brain imaging data. The alternative, which consists in predicting the imaging data given external variables such as stimulus descriptors, is called encoding [10]; it is discussed further in the next section. First, we illustrate decoding with a simplified version of the experiment presented in Haxby et al. (2001) [11]. In the original work, visual stimuli from 8 different categories were presented to 6 subjects during 12 sessions. The goal is to predict the category of the stimulus presented to the subject given the recorded fMRI volumes. This example has already been widely analyzed [12], [13], [14], [15], [16] and has become a reference example for decoding. For the sake of simplicity, we restrict the example to one subject and to two categories: faces and houses.

C. Machine Learning Algorithms

1) Support Vector Classifier: The support-vector network is a learning machine for two-group classification problems. Conceptually, input vectors are non-linearly mapped to a very high-dimensional feature space, in which a linear decision surface is constructed. Special properties of the decision surface ensure the high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors; the result extends to non-separable training data. The high generalization ability of support-vector networks utilizing polynomial input transformations has been demonstrated.

2) Logistic Regression: Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier.
In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.

The implementation of logistic regression in scikit-learn can be accessed through the class LogisticRegression. This implementation can fit a multiclass (one-vs-rest) logistic regression with optional L2 or L1 regularization.

As an optimization problem, binary-class L2-penalized logistic regression minimizes the following cost function:

\min_{w, c} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \log\!\left(\exp\!\left(-y_i (X_i^T w + c)\right) + 1\right)

Similarly, L1-regularized logistic regression solves the following optimization problem:

\min_{w, c} \; \|w\|_1 + C \sum_{i=1}^{n} \log\!\left(\exp\!\left(-y_i (X_i^T w + c)\right) + 1\right)

LogisticRegressionCV implements logistic regression with built-in cross-validation to find the optimal C parameter. The newton-cg, sag, and lbfgs solvers are found to be faster for high-dimensional dense data, due to warm-starting. For the multiclass case, if the multi_class option is set to "ovr", an optimal C is found for each class; if it is set to "multinomial", an optimal C is found that minimizes the cross-entropy loss.

3) Ridge Regression: Ridge regression addresses some of the problems of ordinary least squares by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares:

\min_{w} \; \|Xw - y\|_2^2 + \alpha \|w\|_2^2

Here, \alpha \geq 0 is a complexity parameter that controls the amount of shrinkage: the larger the value of \alpha, the greater the amount of shrinkage, and thus the more robust the coefficients become to collinearity. For comparison, ordinary least squares computes its solution using a singular value decomposition of X; if X is a matrix of size (n, p), this method has a cost of O(np^2), assuming n >= p.

RidgeCV implements ridge regression with built-in cross-validation of the alpha parameter. The object works in the same way as GridSearchCV, except that it defaults to Generalized Cross-Validation (GCV), an efficient form of leave-one-out cross-validation.
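A minimal sketch of the scikit-learn classes named above, fit on synthetic data rather than fMRI features (the sample and feature counts are arbitrary). Note that since RidgeCV is a regressor, its classification counterpart RidgeClassifierCV is used here:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import (LogisticRegression, LogisticRegressionCV,
                                  RidgeClassifierCV)

# Synthetic two-class data standing in for the masked fMRI matrix.
X, y = make_classification(n_samples=150, n_features=50,
                           n_informative=10, random_state=0)

# Fixed-C, L2-penalized logistic regression (the default penalty).
logreg = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)

# Built-in cross-validation over a grid of 10 C values.
logreg_cv = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000).fit(X, y)

# Ridge classification with built-in cross-validation of alpha.
ridge = RidgeClassifierCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)

print(logreg.score(X, y), logreg_cv.C_[0], ridge.alpha_)
```

After fitting, the cross-validated objects expose the selected hyperparameters (C_ and alpha_), which is what makes them convenient for the optimization steps discussed later.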
D. Training, Testing, and Cross Validation

Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that merely replicated the labels of the samples it had just seen would achieve a perfect score but would fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice when performing a (supervised) machine learning experiment to hold out part of the available data as a test set (X_test, y_test). Cross-validation, sometimes called rotation estimation, is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the objective is prediction, and one wants to estimate how accurately a predictive model will perform in practice. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training set) and a dataset of unseen data against which the model is tested (called the validation set or test set). The purpose of cross-validation is to define a dataset to "test" the model during the training phase (i.e., the validation set), in order to reduce problems like overfitting and to give insight into how the model will generalize to an independent dataset (for instance, data from a real problem).

E. Comparison across Classifiers

According to the no-free-lunch theorem [17], there is no single learning algorithm that performs best across all domains. As such, a number of classifiers should be tested. Here, the two best-performing classifiers were logistic regression and ridge, with logistic regression providing the highest overall correct classification. Although SVC performance was not as high as that of the other classifiers tested here, improvements in SVC performance would likely follow from additional optimization of its hyperparameters.
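The cross-validated comparison described above can be sketched with scikit-learn's cross_val_score, again on synthetic data in place of the Haxby volumes (the sample and feature counts are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the masked fMRI matrix: 200 volumes x 100 voxels.
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=15, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "ridge": RidgeClassifier(),
    "svc": SVC(kernel="linear"),
}

# 5-fold cross-validation: each model is fit on 4 folds and scored on
# the held-out fold, so the reported accuracy reflects unseen data.
mean_scores = {}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Which classifier wins on a given dataset cannot be predicted in advance, which is exactly the point of the no-free-lunch argument above.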
Future optimization work might include implementation of a cross-validated grid search within the training set to find the most favorable parameter values. Although ridge performed well, caution should be used when applying any regression-based classification scheme, as these methods are prone to overfitting when the number of attributes is large. Given enough leaves, a regression can often fit all training data with 100% accuracy, since impurity generally decreases as the data become more finely partitioned. It is often useful to shorten the process by applying a threshold or pruning technique. Although ridge classifiers may produce high levels of classification accuracy, generalization of the results is often poor compared to logistic regression, and the addition of noise to learning subsets has proven useful in increasing model generalization [18]. The experiment was conducted on eight different visual stimuli, and the results obtained are plotted on a histogram.

III. CONCLUSION

A recent review [19] speculated that real-time fMRI classification might find clinical application in neurofeedback-based therapies. Although the current paper is not a real-time paper, it presents a number of optimization and analysis steps that can be useful in a near-real-time protocol. The dataset used in this study is not ideal for real-time classification using traditional machine learning algorithms.

ACKNOWLEDGMENT

This work was supported by the Department of Computer Science and Engineering, School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, Chennai, Tamil Nadu, INDIA.

REFERENCES

[1] J. M. Kilner, K. J. Friston, and C. D. Frith, "Predictive coding: an account of the mirror neuron system," Cognitive Processing, vol. 8, no. 3, pp. 159–166, 2007.
[2] M. T. Smith, R. R. Edwards, R. C. Robinson, and R. H. Dworkin, "Suicidal ideation, plans, and attempts in chronic pain patients: factors associated with increased risk," Pain, vol. 111, no. 1–2, pp. 201–208, 2004.
[3] A. Abraham, F. Pedregosa, M. Eickenberg, P. Gervais, A. Mueller, J. Kossaifi, A. Gramfort, B. Thirion, and G. Varoquaux, "Machine learning for neuroimaging with scikit-learn," Frontiers in Neuroinformatics, vol. 8, p. 14, 2014. [Online]. Available: https://www.frontiersin.org/article/10.3389/fninf.2014.00014
[4] C. E. Shannon, "Computers and automata," Proceedings of the IRE, vol. 41, no. 10, pp. 1234–1241, 1953.
[5] B. Budiansky and N. A. Fleck, "Compressive kinking of fiber composites: a topical review," Appl. Mech. Rev., vol. 47, no. 6, pp. S246–S270, 1994.
[6] D. D. Cox and R. L. Savoy, "Functional magnetic resonance imaging (fMRI) 'brain reading': detecting and classifying distributed patterns of fMRI activity in human visual cortex," NeuroImage, vol. 19, no. 2, pp. 261–270, 2003.
[7] P. D. Gluckman and M. A. Hanson, "Living with the past: evolution, development, and patterns of disease," Science, vol. 305, no. 5691, pp. 1733–1736, 2004.
[8] M. M. d. Abreu, L. H. G. Pereira, V. B. Vila, F. Foresti, and C. Oliveira, "Genetic variability of two populations of Pseudoplatystoma reticulatum from the upper Paraguay river basin," Genetics and Molecular Biology, vol. 32, no. 4, pp. 868–873, 2009.
[9] R. Arnold, C. Augier, A. Bakalyarov, J. Baker, A. Barabash, P. Bernaudin, M. Bouchel, V. Brudanin, A. Caffrey, J. Cailleret et al., "Technical design and performance of the NEMO 3 detector," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 536, no. 1, pp. 79–122, 2005.
[10] T. Naselaris, K. N. Kay, S. Nishimoto, and J. L. Gallant, "Encoding and decoding in fMRI," NeuroImage, vol. 56, no. 2, pp. 400–410, 2011.
[11] J. V. Haxby, M. I. Gobbini, M. L. Furey, A. Ishai, J. L. Schouten, and P. Pietrini, "Distributed and overlapping representations of faces and objects in ventral temporal cortex," Science, vol. 293, no. 5539, pp. 2425–2430, 2001.
[12] P. D. Gluckman and M. A. Hanson, "Living with the past: evolution, development, and patterns of disease," Science, vol. 305, no. 5691, pp. 1733–1736, 2004.
[13] K. A. Norman, S. M. Polyn, G. J. Detre, and J. V. Haxby, "Beyond mind-reading: multi-voxel pattern analysis of fMRI data," Trends in Cognitive Sciences, vol. 10, no. 9, pp. 424–430, 2006.
[14] G. Hudes, M. Carducci, P. Tomczak, J. Dutcher, R. Figlin, A. Kapoor, E. Staroslawska, J. Sosman, D. McDermott, I. Bodrogi et al., "Temsirolimus, interferon alfa, or both for advanced renal-cell carcinoma," New England Journal of Medicine, vol. 356, no. 22, pp. 2271–2281, 2007.
[15] S. J. Hanson and Y. O. Halchenko, "Brain reading using full brain support vector machines for object recognition: there is no face identification area," Neural Computation, vol. 20, no. 2, pp. 486–503, 2008.
[16] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2825–2830, 2011.
[17] D. H. Wolpert, W. G. Macready et al., "No free lunch theorems for search," Technical Report SFI-TR-95-02-010, Santa Fe Institute, 1995.
[18] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[19] R. Christopher deCharms, "Applications of real-time fMRI," Nature Reviews Neuroscience, vol. 9, no. 9, p. 720, 2008.