Predictive Model Markup Language

The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format conceived by Dr. Robert Lee Grossman, then the director of the National Center for Data Mining at the University of Illinois at Chicago. PMML provides a way for analytic applications to describe and exchange predictive models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and feedforward neural networks. Version 0.9 was published in 1998.

Since PMML is an XML-based standard, the specification comes in the form of an XML schema. PMML is a mature standard, and over 30 organizations have announced products that support it. [1]
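Because a PMML file is ordinary XML, generic XML tooling can already inspect it. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree; the helper name pmml_info and the sample document are hypothetical. It checks only well-formedness and the root element name, and reports the namespace, the version attribute and the top-level elements. It does not validate the document against the PMML XML schema, which would require a schema-aware parser.

```python
import xml.etree.ElementTree as ET

def pmml_info(pmml_text):
    """Return namespace, version and top-level element names of a PMML document.

    Hypothetical helper: checks well-formedness and the root element name only,
    not conformance to the PMML XML schema.
    """
    root = ET.fromstring(pmml_text)  # raises ET.ParseError if the XML is malformed
    # ElementTree spells namespaced tags as "{namespace-uri}LocalName".
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = "", root.tag
    if local != "PMML":
        raise ValueError("root element is %r, expected 'PMML'" % local)
    children = [child.tag.split("}", 1)[-1] for child in root]
    return {"namespace": namespace, "version": root.get("version"), "children": children}

# Hypothetical skeleton (not necessarily schema-valid): a PMML root holding
# a Header and a one-field DataDictionary.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header description="example"/>
  <DataDictionary numberOfFields="1">
    <DataField name="x" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""

info = pmml_info(SAMPLE)
```

Here info reports the PMML 4.3 namespace, version "4.3", and the children ["Header", "DataDictionary"]; a document whose root is not PMML raises ValueError.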

Components

A PMML file can be described by the following components: [2] [3]

  • Header : contains general information about the PMML document, such as copyright information for the model, its description, and information about the application used to generate the model. It also contains an attribute for a timestamp, which can be used to specify the date of model creation.
  • Data Dictionary : contains definitions for all possible fields used by the model. It is here that a field is defined as continuous, categorical, or ordinal (attribute optype). Depending on this definition, the appropriate value ranges are then defined, as well as the data type (such as string or double).
  • Data Transformations : Transformations for the mapping of user data into a more desirable form to be used by the mining model. PMML defines several kinds of simple data transformations.
    • Normalization: map values to numbers; the input can be continuous or discrete.
    • Discretization: map continuous values to discrete values.
    • Value mapping: map discrete values to discrete values.
    • Functions (custom and built-in): derive a value by applying a function to one or more parameters.
    • Aggregation: used to summarize or collect groups of values.
  • Model : contains the definition of the data mining model. For example, a multi-layered feedforward neural network is represented in PMML by a “NeuralNetwork” element, which contains attributes such as:
    • Model Name (attribute modelName)
    • Function Name (attribute functionName)
    • Algorithm Name (attribute algorithmName)
    • Activation Function (attribute activationFunction)
    • Number of Layers (attribute numberOfLayers)
    This information is then followed by three kinds of neural layers, which specify the architecture of the neural network. These are represented by the NeuralInputs, NeuralLayer, and NeuralOutputs elements. Besides neural networks, PMML allows for the representation of many other kinds of models, including support vector machines, association rules, Naive Bayes classification, clustering models, text models, decision trees, and different regression models.
  • Mining Schema : a list of all fields used in the model. This can be a subset of the fields as defined in the data dictionary. It contains specific information about each field, such as:
    • Name (attribute name): must refer to a field in the data dictionary
    • Usage type (attribute usageType): defines the way in which the field is used in the model. Typical values are: active, predicted, and supplementary. Predicted fields are those whose values are predicted by the model.
    • Outlier Treatment (attribute outliers): defines the outlier treatment to be used. In PMML, outliers can be treated as missing values, as extreme values (based on the definition of high and low values for a particular field), or as is.
    • Missing Value Replacement Policy (attribute missingValueReplacement): if this attribute is specified, then a missing value is automatically replaced by the given value.
    • Missing Value Treatment (attribute missingValueTreatment): indicates how the missing value replacement was derived (e.g., as value, mean or median).
  • Targets : allows for post-processing of the predicted value in the form of scaling if the output of the model is continuous. Targets can also be used for classification tasks. In this case, the attribute priorProbability specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result. This can happen, e.g., if an input value is missing and there is no other method for treating missing values.
  • Output : this element can be used to name all the desired output fields expected from the model. These are features of the predicted field and so are typically the predicted value itself, the probability, cluster affinity (for clustering models), standard error, etc. PMML 4.1 extended Output to allow for generic post-processing of model outputs: all the built-in and custom functions that were originally available only for pre-processing became available for post-processing too.
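Several of the components above come together when a model is scored. The following Python sketch, using only the standard library, reads a small hand-written RegressionModel (all field names, coefficients and scaling values are hypothetical) and applies three of the pieces described: the Mining Schema's missingValueReplacement for absent inputs, the linear equation intercept + Σ coefficient·x^exponent from the RegressionTable, and a Targets rescaling assumed to take the form value × rescaleFactor + rescaleConstant. It is a simplified illustration, not a conforming PMML scoring engine: it ignores data types, transformations, outlier treatment and Output fields.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_3"}

# Hand-written, hypothetical PMML document used only for illustration.
TOY_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="toy" functionName="regression">
    <MiningSchema>
      <MiningField name="age" missingValueReplacement="40"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <Targets>
      <Target field="score" rescaleFactor="10" rescaleConstant="5"/>
    </Targets>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

def score(pmml_text, record):
    """Score one record (a dict of field values) with the first RegressionModel found."""
    model = ET.fromstring(pmml_text).find("p:RegressionModel", NS)
    inputs = dict(record)
    # Mining Schema: fill in missing active inputs from missingValueReplacement, if given.
    for field in model.findall("p:MiningSchema/p:MiningField", NS):
        if field.get("usageType", "active") != "active":
            continue  # skip predicted and supplementary fields
        name = field.get("name")
        if inputs.get(name) is None:
            replacement = field.get("missingValueReplacement")
            if replacement is None:
                raise ValueError("missing value for %r and no replacement policy" % name)
            inputs[name] = float(replacement)
    # Model: linear predictor from the RegressionTable.
    table = model.find("p:RegressionTable", NS)
    value = float(table.get("intercept", "0"))
    for predictor in table.findall("p:NumericPredictor", NS):
        x = float(inputs[predictor.get("name")])
        value += float(predictor.get("coefficient")) * x ** int(predictor.get("exponent", "1"))
    # Targets: optional post-processing by linear rescaling.
    target = model.find("p:Targets/p:Target", NS)
    if target is not None:
        value = value * float(target.get("rescaleFactor", "1")) + float(target.get("rescaleConstant", "0"))
    return value
```

With these hypothetical numbers, scoring {"age": 30, "income": 2000} gives (1.5 + 0.5·30 + 0.001·2000) × 10 + 5 = 190, and a record with "age" set to None is scored as if age were 40.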

PMML 4.0, 4.1, 4.2 and 4.3

PMML 4.0 was released on June 16, 2009. [4] [5] [6]

Examples of new features included:

  • Improved Pre-Processing Capabilities: Adds Boolean operations and an If-Then-Else function.
  • Time Series Models: New exponential smoothing models; also placeholders for ARIMA, Seasonal Trend Decomposition, and Spectral density estimation, which are to be supported in the near future.
  • Model Explanation: Saving of evaluation and model performance measures to the PMML file itself.
  • Multiple Models: Capabilities for model composition, ensembles, and segmentation (e.g., combining regression and decision trees).
  • Extensions of Existing Elements: Addition of multi-class classification for Support Vector Machines, improved representation for Association Rules, and the addition of Cox Regression Models.

PMML 4.1 was released on December 31, 2011. [7] [8]

New features included:

  • Scorecards, k-Nearest Neighbors (KNN) and Baseline Models.
  • Simplification of multiple models. In PMML 4.1, the same element is used to represent model segmentation, ensemble, and chaining.
  • Overall definition of field scope and field names.
  • A new attribute that identifies, for each model, whether or not it is ready for production deployment.
  • Enhanced post-processing capabilities (via the Output element).

PMML 4.2 was released on February 28, 2014. [9] [10]

New features include:

  • Transformations: New elements for text mining
  • New built-in functions for implementing regular expressions: matches, concat, and replace
  • Simplified outputs for post-processing
  • Enhancements to Scorecard and Naive Bayes model elements
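To show how built-in functions such as matches, concat and replace are applied to field values, here is a small Python interpreter for nested Apply expressions. It is a hypothetical sketch, not a conforming PMML engine: it supports only FieldRef, Constant and these three functions, it assumes that matches behaves like a regular-expression search, and it assumes Python's re syntax is close enough to the regular-expression dialect PMML specifies. The helper names and the "phone" and "code" fields are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

def local_name(tag):
    # Strip an ElementTree namespace prefix such as "{http://www.dmg.org/PMML-4_2}".
    return tag.split("}", 1)[-1]

def evaluate(expr, record):
    """Evaluate a FieldRef, Constant or Apply element against a dict of field values."""
    kind = local_name(expr.tag)
    if kind == "Constant":
        return expr.text or ""  # an empty <Constant/> is treated as the empty string
    if kind == "FieldRef":
        return record[expr.get("field")]
    if kind != "Apply":
        raise ValueError("unsupported expression element: " + kind)
    args = [evaluate(child, record) for child in expr]  # evaluate nested arguments first
    function = expr.get("function")
    if function == "concat":
        return "".join(str(arg) for arg in args)
    if function == "matches":
        text, pattern = args
        return re.search(pattern, str(text)) is not None  # assumption: search semantics
    if function == "replace":
        text, pattern, replacement = args
        # Note: Python's re.sub interprets backslash escapes in the replacement string.
        return re.sub(pattern, replacement, str(text))
    raise NotImplementedError("function not covered by this sketch: " + str(function))

# Strip non-digits from a hypothetical "phone" field, then append a suffix.
CLEAN_PHONE = """<Apply xmlns="http://www.dmg.org/PMML-4_2" function="concat">
  <Apply function="replace">
    <FieldRef field="phone"/>
    <Constant>[^0-9]</Constant>
    <Constant></Constant>
  </Apply>
  <Constant>-checked</Constant>
</Apply>"""

# Test whether a hypothetical "code" field consists only of digits.
IS_NUMERIC = """<Apply xmlns="http://www.dmg.org/PMML-4_2" function="matches">
  <FieldRef field="code"/>
  <Constant>^[0-9]+$</Constant>
</Apply>"""
```

Evaluating CLEAN_PHONE against {"phone": "(555) 123-4567"} yields "5551234567-checked". Arguments are evaluated before the function is applied, which is what lets Apply elements nest.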

PMML 4.3 was released on August 23, 2016. [11] [12]

New features include:

  • New Model Types:
    • Gaussian Process
    • Bayesian Network
  • New built-in functions
  • Usage clarifications
  • Documentation improvements

Release history

Version    Release date
0.7        July 1997
0.9        July 1998
1.0        August 1999
1.1        August 2000
2.0        August 2001
2.1        March 2003
3.0        October 2004
3.1        December 2005
3.2        May 2007
4.0        June 2009
4.1        December 2011
4.2        February 2014
4.2.1      March 2015
4.3        August 2016

Data Mining Group

The Data Mining Group is a consortium managed by the Center for Computational Science Research, Inc., a nonprofit founded in 2008. [13] The Data Mining Group also developed a standard called the Portable Format for Analytics, or PFA, which is complementary to PMML.

References

  1. “The management and mining of multiple predictive models using the predictive modeling markup language”. ResearchGate. doi:10.1016/S0950-5849(99)00022-1. Retrieved 2015-12-21.
  2. A. Guazzelli, M. Zeller, W. Chen, and G. Williams. PMML: An Open Standard for Sharing Models. The R Journal, Volume 1/1, May 2009.
  3. A. Guazzelli, W. Lin, T. Jena (2010). PMML in Action (2nd Edition): Unleashing the Power of Open Standards for Data Mining and Predictive Analytics. CreateSpace.
  4. Data Mining Group website | PMML 4.0 – Changes from PMML 3.2
  5. Zementis website | PMML 4.0 is here!
  6. R. Pechter. What’s PMML and What’s New in PMML 4.0? The ACM SIGKDD Explorations Newsletter, Volume 11/1, July 2009.
  7. Data Mining Group website | PMML 4.1 – Changes from PMML 4.0
  8. Predictive Analytics Info website | PMML 4.1 is here!
  9. Data Mining Group website | PMML 4.2 – Changes from PMML 4.1
  10. Predictive Analytics Info website | PMML 4.2 is here!
  11. Data Mining Group website | PMML 4.3 – Changes from PMML 4.2.1
  12. Predictive Model Markup Language product website | Project activity
  13. “2008 EO 990”. Retrieved 2014-10-16.
