Documentation For R Package: ada

  • Purpose:

    The ada package provides a straightforward, well-documented, and broad boosting routine for classification, ideally suited to small and moderate-sized data sets. It is an R implementation of Discrete, Real, and Gentle Stochastic Boosting under both logistic and exponential loss (see reference 9 below). Its extensive documentation and ease of use make ada a natural choice for anyone who wants to become familiar with the boosting methodology quickly and to assess how boosting would perform on their data. The diagram below is an interactive R-help archive documenting the proper usage of the functions in this package. For additional information and several examples, refer to the article "ada: an R Package for Stochastic Boosting" [pdf]. Updates to this package will be reported and documented at this site.
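
    As a rough illustration of the options named above, the following is a minimal sketch of a typical fit-and-predict session. It assumes the ada package (and its dependency rpart) is installed; the two-class subset of the built-in iris data, the train/test split, and the values chosen for iter, nu, and bag.frac are illustrative assumptions rather than recommendations from the authors.

```r
# Assumes the ada package is installed, e.g. via install.packages("ada").
library(ada)

set.seed(1)

# Illustrative two-class problem built from the built-in iris data.
dat <- droplevels(subset(iris, Species != "setosa"))
train_idx <- sample(nrow(dat), 70)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Discrete boosting under exponential loss.
# type can also be "real" or "gentle"; loss can also be "logistic".
# nu is the shrinkage parameter and bag.frac the subsampling fraction.
fit <- ada(Species ~ ., data = train,
           iter = 50, loss = "exponential", type = "discrete",
           nu = 0.1, bag.frac = 0.5)

# Predicted class labels for the held-out rows.
pred <- predict(fit, newdata = test, type = "vector")

# Confusion table and misclassification rate on the held-out rows.
print(table(observed = test$Species, predicted = pred))
test_error <- mean(as.character(pred) != as.character(test$Species))
print(test_error)
```

    Switching the type and loss arguments is one simple way to compare the boosting variants on the same split.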

  • Authors:
    • Mark Culp, Kjell Johnson, and George Michailidis
    • Please send comments, suggestions or problems to me, Mark Culp. I will do my best to improve this package or documentation in a reasonable amount of time.

 Figure:

This is an interactive functional flow diagram.  Click on an individual function for R help on that process. 


  • Manual/Paper:
    Culp M, Johnson K, Michailidis G (2006). "ada: an R Package for Stochastic Boosting." Journal of Statistical Software, 17(2). [pdf][code]

    Culp M, Johnson K, Michailidis G (2006). "On Regularized Stochastic Boosting." In revision.

  • Updates/Package:
    • 11-04-2007: Release of ada-2.0-2 [windows][UNIX]
      • Update: This release fixes several bugs found over the past year and improves the variable importance function.
      • Updates in version 2: The algorithm performs Stochastic Boosting with exponential and logistic loss, similar to stochastic gradient boosting (SGB) (Friedman, 2002). Discrete, Real, and Gentle boosting versions are provided.
    • 9-10-2005: Fixed the problem with the fits in the predict function.
    • 7-13-2005: Release of ada-1.0.0 (no longer available) [windows][UNIX]

  • Additional Boosting Resources:

    Free R packages for advanced boosting, such as gbm and mboost [14, 20], efficiently build regression trees, smoothing splines, and additive models. The gbm package provides an internal regression tree engine, marginal plots, and additional utilities for optimizing a wider range of loss functions beyond classification. The mboost package is a newer advanced boosting tool that supports several base learners and arbitrary loss functions. In our experience, these packages can be powerful tools for modeling a regression or count outcome, and we recommend them as additional tools for advanced boosting procedures. Both packages are available from R's main web site.


  • References:

    1. Becker R, Chambers J, Wilks A (1988). The New S Language: A Programming Environment for Data Analysis and Graphics. Wadsworth and Brooks/Cole Advanced Books & Software, Monterey, CA.
    2. Boonyanunta N, Zeephongsekul P (2003). "Improving the Predictive Power of AdaBoost: A Case Study in Classifying Borrowers." In "Proceedings of the 16th International Conference on Developments in Applied Artificial Intelligence", pp. 674--685. Springer Verlag Inc.
    3. Breiman L (1996). "Bagging Predictors." Machine Learning, 24(2), 123--140.
    4. Breiman L (2001). "Random Forests." Machine Learning, 45(1), 5--32.
    5. Breiman L, Friedman J, Olshen R, Stone C (1984). Classification and Regression Trees. Chapman & Hall, New York.
    6. Cohen J (1960). "A Coefficient of Agreement for Nominal Scales." Educational and Psychological Measurement, 20, 37--46.
    7. Dettling M (2004). "BagBoosting for Tumor Classification with Gene Expression Data." Bioinformatics, 20(18), 3583--3593.
    8. Freund Y, Schapire R (1996). "Experiments with a New Boosting Algorithm." In "International Conference on Machine Learning," pp. 148--156.
    9. Freund Y, Schapire R (1997). "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting." Journal of Computer and System Sciences, 55(1), 119--139.
    10. Friedman J (2001). "Greedy Function Approximation: A Gradient Boosting Machine." The Annals of Statistics, 29(5), 1189--1232.
    11. Friedman J (2002). "Stochastic Gradient Boosting." Computational Statistics & Data Analysis, 38(4), 367--378.
    12. Friedman J, Hastie T, Tibshirani R (2000). "Additive Logistic Regression: A Statistical View of Boosting." The Annals of Statistics, 28(2), 337--407.
    13. Hastie T, Tibshirani R, Friedman J (2001). The Elements of Statistical Learning (Data Mining, Inference and Prediction). Springer Verlag.
    14. Hothorn T, Bühlmann P (2006). mboost: Model-Based Boosting. R package version 0.4-13.
    15. Huang K, Murphy R (2004). "Boosting Accuracy of Automated Classification of Fluorescence Microscope Images for Location Proteomics." BMC Bioinformatics, 5, 78.
    16. Kawakita M, Minami M, Eguchi S, Lennert-Cody C (2005). "An Introduction to the Predictive Technique AdaBoost with a Comparison to Generalized Additive Models." Fisheries Research, 76(6), 323--343.
    17. Lemmens A, Croux C (2005). "Bagging and Boosting Classification Trees to Predict Churn." Journal of Marketing Research, 43(2), 276--286.
    18. Liaw A, Wiener M (2002). "Classification and Regression by randomForest." R News, 2(3), 18--22. http://CRAN.R-project.org/doc/Rnews/.
    19. R Development Core Team (2006). "R: A Language and Environment for Statistical Computing". R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org.
    20. Ridgeway G (2006). gbm: Generalized Boosted Regression Models. R package version 1.5-7, http://www.i-pensieri.com/gregr/gbm.shtml.
    21. Rosset S, Zhu J, Hastie T (2004). "Boosting as a Regularized Path to a Maximum Margin Classifier." Journal of Machine Learning Research, 5, 941--973.
    22. Schapire R (1990). "The Strength of Weak Learnability." Machine Learning, 5(2), 197--227.
    23. Segal M (2004). "Machine Learning Benchmarks and Random Forest Regression." Technical report, Center for Bioinformatics & Molecular Biostatistics, University of California, San Francisco, CA. http://repositories.cdlib.org/cbmb/bench_rf_regn.
    24. Sugata S, Abe Y (2001). "Computer Simulation of Hydrodynamic Models for Chemical/Pharmaco-Kinetics." Journal of Chemical Software, 7(2).
    25. Therneau T, Atkinson B (2005). rpart: Recursive Partitioning Software. R package version 3.1-32.
    26. Ulintz P, Zhu J, Qin Z, Andrews P (2006). "Improved Classification of Mass Spectrometry Database Search Results Using Newer Machine Learning Approaches." Molecular and Cellular Proteomics, 5(3), 497--509.
    27. Valiant L (1984). "A Theory of the Learnable." In "Proceedings of the 16th Annual ACM Symposium on Theory of Computing," pp. 436--445. ACM Press, New York, NY.

***Disclaimer: This code illustrates the performance of this technique. The published results may have been produced with modified versions of this code. Modify and use as desired. Users are encouraged to email Mark Culp about any noticeable errors in this code or with questions about its usage.