Scoring ML.NET models in NimbusML
While scoring NimbusML models in ML.NET (the most common scenario) is fully supported, the converse, i.e. scoring ML.NET models in NimbusML, comes with a few caveats.
ML.NET models are produced in two different formats: the legacy PredictorModel format and the new TransformerChain format. The legacy format is fully supported for scoring in NimbusML. The TransformerChain model format has limited support in NimbusML. Below are the limitations and considerations to keep in mind when scoring this model format in NimbusML.
ML.NET models can be loaded into NimbusML as follows:
from nimbusml import Pipeline

# mlnet_model_path: path to the model file saved by ML.NET
pipeline = Pipeline()
pipeline.load_model(mlnet_model_path)
Once the model is loaded, the following operations can be performed on datasets in NimbusML:
- Pipeline.predict(): This is fully supported on data loaded as a FileDataStream:
data = FileDataStream.read_csv(data_path)
scores = pipeline.predict(data)
If, instead, the data is loaded as a pandas.DataFrame, the dtypes for the columns must be explicitly specified, and should correspond to the types that were used to define the model input in ML.NET. For example, if the age column in the UCI Adult Income dataset was loaded as float or Single in ML.NET, it must be explicitly specified to be of dtype numpy.float32; otherwise the dtype inferred by pandas will be used and cause a type mismatch in ML.NET.
data_df = pandas.read_csv(data_path, dtype={'age': numpy.float32})
scores = pipeline.predict(data_df)
- Pipeline.predict_proba(): An ML.NET TransformerChain model will only produce a Probability column if a calibrator is explicitly added to it. Otherwise, there will only be the PredictedLabel and Score columns.
Moreover, ML.NET does not automatically add a normalizer for the features, while NimbusML does. If there is no normalizer present in the TransformerChain model, the scores produced by it will have very large magnitudes, and consequently the calibrated Probability will be close to 0 or 1, which is not very meaningful. Therefore, it is a good idea to add a normalizer when training your ML.NET model if you want to use predict_proba in NimbusML.
ML.NET code:
var trainingPipeline = mlContext.Transforms.Concatenate("Features", new[] { "age" })
    // Add normalizer to features
    .Append(mlContext.Transforms.NormalizeMinMax(outputColumnName: "Features"))
    .Append(mlContext.BinaryClassification.Trainers.AveragedPerceptron(labelColumnName: "label", featureColumnName: "Features"))
    // Add calibrator to produce Probability column from Score column
    .Append(mlContext.BinaryClassification.Calibrators.Platt(labelColumnName: "label"));
The model file saved after fitting the above ML.NET pipeline can be used with predict_proba on FileDataStream data without any caveats, and on pandas.DataFrame data as long as the correct dtypes are passed to pandas when loading the data (as described for Pipeline.predict() above).
- Pipeline.decision_function(): The model file saved after fitting the above ML.NET pipeline can be used with decision_function on FileDataStream data without any caveats, and on pandas.DataFrame data as long as the correct dtypes are passed to pandas when loading the data (as described for Pipeline.predict() above).
- Pipeline.get_feature_contributions(): This does not work with TransformerChain models loaded into NimbusML.
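The dtype requirement described for pandas.DataFrame input can be made less error-prone by deriving the dtype argument from the ML.NET input schema instead of typing it by hand. The following is a minimal sketch, not part of NimbusML or ML.NET: the MLNET_TO_NUMPY_DTYPE table and the build_read_csv_dtypes helper are hypothetical names, and the table covers only a few common .NET types.

```python
# Hypothetical helper: the table and function names are illustrative only
# and are not provided by NimbusML or ML.NET.
MLNET_TO_NUMPY_DTYPE = {
    "Single": "float32",   # C# float
    "Double": "float64",   # C# double
    "Int32": "int32",
    "Int64": "int64",
    "Boolean": "bool",
    "String": "str",
}


def build_read_csv_dtypes(mlnet_schema):
    """Translate {column: ML.NET type name} into {column: numpy dtype name}."""
    dtypes = {}
    for column, mlnet_type in mlnet_schema.items():
        if mlnet_type not in MLNET_TO_NUMPY_DTYPE:
            # Fail loudly rather than let pandas infer a mismatching dtype.
            raise ValueError(
                f"No dtype mapping for ML.NET type {mlnet_type!r} "
                f"(column {column!r})"
            )
        dtypes[column] = MLNET_TO_NUMPY_DTYPE[mlnet_type]
    return dtypes


# The age column was declared as Single in the ML.NET model input.
dtypes = build_read_csv_dtypes({"age": "Single"})
print(dtypes)
```

The resulting dictionary can be passed as the dtype argument of pandas.read_csv, which accepts dtype names given as strings.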
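To illustrate why scores with very large magnitudes lead to a Probability close to 0 or 1, the sketch below applies a sigmoid (Platt-style) calibration of the form p = 1 / (1 + exp(a * score + b)) to a few scores. The parameter values a = -1 and b = 0 are assumptions chosen for illustration; a fitted calibrator learns its own parameters from the training data, and platt_probability is a hypothetical helper, not an ML.NET or NimbusML function.

```python
import math


def platt_probability(score, a=-1.0, b=0.0):
    """Sigmoid calibration: p = 1 / (1 + exp(a * score + b)).

    a and b are assumed values for illustration only.
    """
    z = a * score + b
    # Pick the algebraically equivalent form that keeps exp() from overflowing.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Small scores give graded probabilities; large ones saturate at 0 or 1.
for score in (-0.5, 0.5, 2.0, -40.0, 40.0, 1000.0):
    print(f"score={score:>8}: p={platt_probability(score):.6f}")
```

With scores in a small range, the probabilities vary smoothly between 0 and 1. Once the magnitude of the score grows, the sigmoid saturates and every example is reported with near-total confidence, which is the behavior the normalizer is meant to avoid.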