ML_SCORE scores a model by generating predictions using the feature columns in a labeled dataset as input and comparing the predictions to ground truth values in the target column of the labeled dataset.
The dataset used with ML_SCORE should have the same feature columns as the dataset used to train the model, but the data sample should be different from the data used to train the model; for example, you might reserve 20 to 30 percent of a labeled dataset for scoring.
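One way to reserve such a sample is with ordinary SQL before training. The following sketch assumes a labeled table heatwaveml_bench.census_labeled with a numeric id column; these names are illustrative and not part of the HeatWave ML API:

```sql
-- Illustrative sketch: hold out about 25 percent of a labeled table
-- for scoring by splitting on a deterministic expression.
-- Table and column names are placeholders.
CREATE TABLE heatwaveml_bench.census_validate AS
  SELECT * FROM heatwaveml_bench.census_labeled
  WHERE MOD(id, 4) = 0;      -- ~25% reserved for ML_SCORE

CREATE TABLE heatwaveml_bench.census_train AS
  SELECT * FROM heatwaveml_bench.census_labeled
  WHERE MOD(id, 4) <> 0;     -- remaining ~75% used for training
```

Splitting on a deterministic expression rather than RAND() keeps the two tables disjoint and the split reproducible.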
ML_SCORE returns a computed metric indicating the quality of the model. A value of None is reported if a score for the specified or default metric cannot be computed. If an invalid metric is specified, the following error message is reported: Invalid data for the metric. Score could not be computed.
Models with a low score can be expected to perform poorly, producing predictions and explanations that cannot be relied upon. A low score typically indicates that the provided feature columns are not a good predictor of the target values. In this case, consider adding more rows or more informative features to the training dataset.
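If you retrain after expanding the training dataset, you can load and score the new model in the same way. The following sketch follows the census example used later in this section; the training table name and task option are illustrative:

```sql
-- Illustrative sketch: retrain on the expanded training table,
-- load the new model, and score it again. Table names follow the
-- census example in this guide.
CALL sys.ML_TRAIN('heatwaveml_bench.census_train', 'revenue',
     JSON_OBJECT('task', 'classification'), @census_model);
CALL sys.ML_MODEL_LOAD(@census_model, NULL);
CALL sys.ML_SCORE('heatwaveml_bench.census_validate', 'revenue', @census_model,
     'balanced_accuracy', @score);
```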
You can also run ML_SCORE on the training dataset and a labeled test dataset and compare the results to ensure that the test dataset is representative of the training dataset. A high score on the training dataset and a low score on the test dataset indicates that the test dataset is not representative of the training dataset. In this case, consider adding rows to the training dataset that better represent the test dataset.
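As a sketch, you might score the same model against both tables and compare the resulting values; the heatwaveml_bench.census_test table name here is illustrative:

```sql
-- Illustrative sketch: score the same loaded model on the training
-- table and on a held-out test table, then compare the two metrics.
CALL sys.ML_SCORE('heatwaveml_bench.census_train', 'revenue', @census_model,
     'balanced_accuracy', @train_score);
CALL sys.ML_SCORE('heatwaveml_bench.census_test', 'revenue', @census_model,
     'balanced_accuracy', @test_score);
SELECT @train_score, @test_score;
```

A large gap between the two values suggests the test data is not representative of the training data.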
HeatWave ML supports a variety of scoring metrics to help you understand how your model performs across a series of benchmarks. For ML_SCORE parameter descriptions and supported metrics, see Section 3.10.8, “ML_SCORE”.
Before running ML_SCORE, ensure that the model you want to use is loaded; for example:
CALL sys.ML_MODEL_LOAD(@census_model, NULL);
For information about loading models, see Section 3.9.3, “Loading Models”.
The following example runs ML_SCORE to compute model quality using the balanced_accuracy metric:
CALL sys.ML_SCORE('heatwaveml_bench.census_validate', 'revenue', @census_model,
'balanced_accuracy', @score);
where:
- heatwaveml_bench.census_validate is the fully qualified name of the validation dataset table (schema_name.table_name).
- revenue is the name of the target column containing ground truth values.
- @census_model is the session variable that contains the model handle.
- balanced_accuracy is the scoring metric. For other supported scoring metrics, see Section 3.10.8, “ML_SCORE”.
- @score is the user-defined session variable that stores the computed score. The ML_SCORE routine populates the variable. User variables are written as @var_name. The examples in this guide use @score as the variable name. Any valid name for a user-defined variable is permitted (e.g., @my_score).
To retrieve the computed score, query the @score session variable:
SELECT @score;
+--------------------+
| @score |
+--------------------+
| 0.8188666105270386 |
+--------------------+