Model Selection in Regression: Application to Tumours in Childhood

Journal Title: Current Trends on Biostatistics & Biometrics - Year 2018, Vol 1, Issue 1

Abstract

We give a chronological review of the major model selection methods that have been proposed since circa 1960. These procedures include the residual mean square error (MSE), the coefficient of multiple determination (R²), the adjusted coefficient of multiple determination (Adj R²), the estimate of error variance (S²), stepwise methods, Mallows' Cp, the Akaike information criterion (AIC), and the Schwarz criterion (BIC). Some of these methods are applied to the problem of developing a model for predicting tumours in childhood using log-linear models. The theoretical review discusses the problem of model selection in a general setting; the application concerns log-linear models in particular.

The problem of model selection is at the core of progress in science. Over the decades, scientists have used various statistical tools to select among alternative models of data. A common challenge is the selection of the best subset of predictor variables in terms of some specified criterion.

Tobias Mayer (1750) established the two main methods, namely linear estimation and Bayesian analysis, by fitting models to observations. The period from 1900 to the 1930s saw a great development of regression and statistical ideas, but the work relied on hand calculations. In 1951 Kullback and Leibler developed a measure of discrepancy from information theory, which forms the theoretical basis for criteria-based model selection. In the 1960s computers enabled scientists to address the problem of model selection. Computer programmes were developed to compute all possible subsets and to apply, for example, stepwise regression, Mallows' Cp, AIC, TIC and BIC. During the 1970s and 1980s there was a huge spate of proposals to deal with the model selection problem. Linhart and Zucchini (1986) provided a systematic development of frequentist criteria-based model selection methods for a variety of typical situations that arise in practice. These included the selection of univariate probability distributions, the regression setting, the analysis of variance and covariance, the analysis of contingency tables, and time series analysis.

Bozdogan [1] gives an outstanding review showing how AIC may be applied to compare models in a set of competing models, and defines a statistical model as a mathematical formulation that expresses the main features of the data in terms of probabilities. In the 1990s Hastie and Tibshirani introduced generalized additive models. These models assume that the mean of the dependent variable depends on an additive predictor through a nonlinear link function. Generalized additive models permit the response probability distribution to be any member of the exponential family of distributions. Hastie and Tibshirani suggested in particular that, up to that date, model selection had largely been a theoretical exercise and that more practical examples were needed (see Hastie and Tibshirani, 1990).
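
To make the criteria listed in the abstract concrete, the following is a minimal sketch, not the paper's own analysis. It assumes an ordinary Gaussian linear regression rather than the log-linear models used in the application, and it runs on synthetic data generated purely for illustration. The function names (`fit_ols`, `selection_criteria`, `all_subsets`) and the variable names `x1` to `x4` are hypothetical. AIC and BIC are written in the common least-squares form, which drops additive constants shared by all models.

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Least-squares fit; returns the residual sum of squares (RSS)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def selection_criteria(rss, n, p, sigma2_full, tss):
    """Criteria for a model with p coefficients (intercept included)."""
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)
    mse = rss / (n - p)                      # residual mean square, S^2
    cp = rss / sigma2_full - n + 2 * p       # Mallows' Cp
    aic = n * np.log(rss / n) + 2 * p        # AIC up to an additive constant
    bic = n * np.log(rss / n) + p * np.log(n)
    return dict(p=p, mse=mse, r2=r2, adj_r2=adj_r2, cp=cp, aic=aic, bic=bic)


def all_subsets(X, y, names):
    """Evaluate every subset of the columns of X (all-possible-subsets search)."""
    n, k = X.shape
    ones = np.ones((n, 1))
    tss = float(((y - y.mean()) ** 2).sum())
    # Error variance estimated from the full model, as Cp requires.
    sigma2_full = fit_ols(np.hstack([ones, X]), y) / (n - k - 1)
    rows = []
    for size in range(k + 1):
        for idx in itertools.combinations(range(k), size):
            Xs = np.hstack([ones, X[:, list(idx)]])
            row = selection_criteria(fit_ols(Xs, y), n, size + 1, sigma2_full, tss)
            row["vars"] = tuple(names[i] for i in idx)
            rows.append(row)
    return rows


# Synthetic data (illustrative only): y depends on x1 and x3, not on x2 or x4.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

rows = all_subsets(X, y, ["x1", "x2", "x3", "x4"])
best_aic = min(rows, key=lambda r: r["aic"])
best_bic = min(rows, key=lambda r: r["bic"])
print("AIC choice:", best_aic["vars"])
print("BIC choice:", best_bic["vars"])
```

R² never decreases as predictors are added, so on its own it always favours the largest model; the adjusted R², Cp, AIC and BIC each add a penalty that grows with the number of coefficients. For a log-linear (Poisson) model the same comparison would use the model's own log-likelihood in place of the RSS-based terms.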

Authors and Affiliations

Annah Managa

Download PDF file
  • EP ID EP640187
  • DOI 10.32474/CTBB.2018.01.000101

How To Cite

Annah Managa (2018). Model Selection in Regression: Application to Tumours in Childhood. Current Trends on Biostatistics & Biometrics, 1(1), 1-12. https://www.europub.co.uk/articles/-A-640187