This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Scoring ML.NET models in NimbusML

Najeeb Kazmi edited this page Aug 15, 2019 · 1 revision

While scoring NimbusML models in ML.NET (the most common scenario) is fully supported, the converse, i.e. scoring ML.NET models in NimbusML, comes with a few caveats.

ML.NET models are produced in two different formats: the legacy PredictorModel format and the new TransformerChain format. The legacy format is fully supported for scoring in NimbusML. The TransformerChain model format has limited support in NimbusML. Below are the limitations and considerations to keep in mind when scoring this model format in NimbusML.

ML.NET models can be loaded into NimbusML as follows:

from nimbusml import Pipeline

pipeline = Pipeline()
pipeline.load_model(mlnet_model_path)

Once the model is loaded, the following operations can be performed on datasets in NimbusML:

  1. Pipeline.predict(): This is fully supported on data loaded as a FileDataStream:

    from nimbusml import FileDataStream

    data = FileDataStream.read_csv(data_path)
    scores = pipeline.predict(data)

    If, instead, the data is loaded as a pandas.DataFrame, the dtypes for the columns must be explicitly specified, and they must correspond to the types used to define the model input in ML.NET. For example, if the age column in the UCI Adult Income dataset was loaded as float (i.e. Single) in ML.NET, it must be explicitly specified as dtype numpy.float32; otherwise, the dtype inferred by pandas will be used and cause a type mismatch in ML.NET.

    import numpy
    import pandas

    data_df = pandas.read_csv(data_path, dtype={'age': numpy.float32})
    scores = pipeline.predict(data_df)
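    To see why the explicit dtype matters, here is a minimal sketch using an inline CSV as a stand-in for the actual dataset; it compares the dtype pandas infers by default with the explicitly specified one:

    ```python
    import io
    import numpy
    import pandas

    # Hypothetical inline CSV standing in for the UCI Adult Income data
    csv_text = "age,label\n39,0\n50,1\n38,0\n"

    # Default inference: pandas picks int64 for the integer-valued age column,
    # which would not match an ML.NET input column declared as Single (float32)
    inferred = pandas.read_csv(io.StringIO(csv_text))
    print(inferred['age'].dtype)  # int64

    # Explicit dtype: matches the float32 input the ML.NET model expects
    explicit = pandas.read_csv(io.StringIO(csv_text), dtype={'age': numpy.float32})
    print(explicit['age'].dtype)  # float32
    ```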
  2. Pipeline.predict_proba(): An ML.NET TransformerChain model will only produce a Probability column if a calibrator is explicitly added to it. Otherwise, there will only be the PredictedLabel and Score columns.

    Moreover, ML.NET does not automatically add a normalizer for the features, while NimbusML does. If there is no normalizer present in the TransformerChain model, the scores produced by it will have very large magnitudes, and consequently the calibrated Probability will be close to 0 or 1, which is not very meaningful. Therefore, it is a good idea to add a normalizer when training your ML.NET model if you want to use predict_proba in NimbusML.

    ML.NET code:

    var trainingPipeline = mlContext.Transforms.Concatenate("Features", new[] { "age" })
                 // Add normalizer to features
                 .Append(mlContext.Transforms.NormalizeMinMax(outputColumnName: "Features"))
                 .Append(mlContext.BinaryClassification.Trainers.AveragedPerceptron(labelColumnName: "label", 
                                                                                    featureColumnName: "Features"))
                 // Add calibrator to produce Probability column from Score column
                 .Append(mlContext.BinaryClassification.Calibrators.Platt(labelColumnName: "label"));

    The model file saved after fitting the above ML.NET pipeline can be used with predict_proba on FileDataStream data without any caveats, and on pandas.DataFrame data as long as the correct dtypes are passed to pandas when loading the data (as described in item 1 above).
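    The saturation effect described in item 2 can be illustrated with a small plain-Python sketch of the sigmoid used by Platt calibration (the coefficients a and b below are arbitrary placeholders, not values from any trained calibrator):

    ```python
    import math

    def platt_probability(score, a=-1.0, b=0.0):
        # Platt calibration maps a raw score to a probability via a sigmoid:
        # P(y=1 | score) = 1 / (1 + exp(a * score + b)),
        # computed here in a numerically stable way for large |score|
        z = a * score + b
        if z >= 0:
            e = math.exp(-z)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(z))

    # With normalized features, scores stay in a moderate range and the
    # calibrated probabilities are informative
    print(platt_probability(0.5))   # ~0.62
    print(platt_probability(-0.5))  # ~0.38

    # Without a normalizer, scores can have very large magnitudes, so the
    # calibrated probabilities saturate at (nearly) 0 or 1
    print(platt_probability(500.0))   # ~1.0
    print(platt_probability(-500.0))  # ~0.0
    ```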

  3. Pipeline.decision_function(): The model file saved after fitting the above ML.NET pipeline can be used with decision_function on FileDataStream data without any caveats, and on pandas.DataFrame data as long as the correct dtypes are passed to pandas when loading the data (as described in item 1 above).

  4. Pipeline.get_feature_contributions(): This does not work with TransformerChain models loaded into NimbusML.