ML.NET Tutorial - Get started in 10 minutes

Train your model

Now, you'll train your model with the yelp_labelled.txt dataset.

Model Builder evaluates many models with varying algorithms and settings, within the training time you give it, to find the best-performing model.

  1. Change the Time to train, which is the amount of time you'd like Model Builder to spend exploring various models, to 60 seconds (you can try increasing this number if no models are found after training). Note that for larger datasets, the training time will be longer. Model Builder automatically adjusts the training time based on the dataset size.

  2. You can update the optimization metric and algorithms used in Advanced training options, but it is not necessary for this example.

  3. Select Start training to start the training process. Once training starts, you can see the time remaining.


Training results

Once training is done, you can see a summary of the training results.


  • Best MacroAccuracy - This shows you the accuracy of the best model that Model Builder found. Higher accuracy means the model predicted more correctly on test data.
  • Best model - This shows you which algorithm performed the best during Model Builder's exploration.
  • Training time - This shows you the total amount of time that was spent training / exploring models.
  • Models explored (total) - This shows you the total number of models explored by Model Builder in the given amount of time.
  • Generated code-behind - This shows you the names of the files generated to help consume the model or train a new model.
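To make the MacroAccuracy figure more concrete, here is a minimal Python sketch of a macro-averaged accuracy: accuracy is computed per class and then averaged, so each class counts equally regardless of how many rows it has. The function name `macro_accuracy` and the sample labels are hypothetical, and this is an illustration of the metric's general definition, not the code Model Builder runs.

```python
from collections import defaultdict

def macro_accuracy(y_true, y_pred):
    """Average of per-class accuracy, so each class counts equally."""
    correct = defaultdict(int)  # correct predictions per true class
    total = defaultdict(int)    # number of rows per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    per_class = [correct[label] / total[label] for label in total]
    return sum(per_class) / len(per_class)

# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

# Class 1 accuracy is 3/4, class 0 accuracy is 1/2, so the macro average is 0.625.
score = macro_accuracy(y_true, y_pred)
print(f"MacroAccuracy: {score:.3f}")
```

With these sample labels, plain accuracy would be 4/6, which is higher than the macro average because the larger class is also the better-predicted one.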

If you want, you can view more information about the training session in the Machine Learning Output window.

After model training finishes, go to the Evaluate step.

In your terminal, run the following command (in your myMLApp folder):

Terminal
mlnet classification --dataset "yelp_labelled.txt" --label-col 1 --has-header false --name SentimentModel --train-time 60

What do these commands mean?

The mlnet classification command runs ML.NET with AutoML to explore many iterations of classification models within the given training time. It tries varying combinations of data transformations, algorithms, and algorithm options, and then chooses the highest-performing model.

  • --dataset: You chose yelp_labelled.txt as the dataset (internally, the CLI will split the one dataset into training and testing datasets).
  • --label-col: You must specify the target column you want to predict (the Label). In this case, you want to predict the sentiment in the second column (columns are zero-indexed, so this is column 1).
  • --has-header: Use this option to specify if the dataset has a header. In this case, the dataset doesn't have a header, so it's false.
  • --name: Use this option to provide a name for your machine learning model and related assets. In this case, all assets associated with this machine learning model will have SentimentModel in the name.
  • --train-time: You must also specify the amount of time you'd like the ML.NET CLI to explore different models. In this case, it's 60 seconds (you can try increasing this number if no models are found after training). Note that for larger datasets, you should set a longer training time.
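As a rough illustration of what the --label-col, --has-header, and --dataset options describe, the Python sketch below reads tab-separated rows, takes the label from zero-indexed column 1, and splits the rows into training and testing sets. The helper names, the sample rows, and the 80/20 split ratio are all assumptions made for this example; the CLI's actual split logic isn't described here.

```python
import random

def load_rows(lines, label_col=1, has_header=False):
    """Split each tab-separated line into (features, label)."""
    rows = []
    for line in (lines[1:] if has_header else lines):
        fields = line.rstrip("\n").split("\t")
        label = fields[label_col]  # zero-indexed, so 1 is the second column
        features = [f for i, f in enumerate(fields) if i != label_col]
        rows.append((features, label))
    return rows

def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical rows in the same shape as the dataset: text, tab, 0 or 1.
sample = [
    "The food was great.\t1",
    "Service was slow.\t0",
    "Would come back again.\t1",
    "Not worth the price.\t0",
    "Friendly staff.\t1",
]

rows = load_rows(sample, label_col=1, has_header=False)
train, test = split_rows(rows)
print(len(train), "training rows,", len(test), "testing rows")
```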

Progress

While the ML.NET CLI is exploring different models, it displays the following data:

  • Start training - This section shows each model iteration, including the trainer (algorithm) used and evaluation metrics for that iteration.
  • Time left - This, along with the progress bar, shows how many seconds are left in the training process.
  • Best algorithm - This shows you which algorithm has performed the best so far.
  • Best score - This shows you the performance of the best model so far. Higher accuracy means the model predicted more correctly on test data.
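The progress fields above amount to keeping a running best while iterating under a time budget. The Python sketch below shows that idea in its simplest form. The trainer names, their fixed scores, and the `explore` function are hypothetical stand-ins; how the CLI actually picks its next candidate is not covered here.

```python
import time

def explore(candidates, train_time_s, clock=time.monotonic):
    """Evaluate candidates until the time budget runs out; track the best."""
    deadline = clock() + train_time_s
    best_name, best_score, explored = None, float("-inf"), 0
    for name, evaluate in candidates:
        if deadline - clock() <= 0:
            break  # budget spent, stop exploring
        score = evaluate()
        explored += 1
        if score > best_score:
            best_name, best_score = name, score
        time_left = max(0.0, deadline - clock())
        print(f"{name}: score={score:.3f} | best={best_name} "
              f"({best_score:.3f}) | time left={time_left:.0f}s")
    return best_name, best_score, explored

# Hypothetical trainers with fixed scores standing in for real training runs.
candidates = [
    ("TrainerA", lambda: 0.71),
    ("TrainerB", lambda: 0.80),
    ("TrainerC", lambda: 0.76),
]

best_name, best_score, explored = explore(candidates, train_time_s=60)
```

If the budget were too short to finish even one candidate, `best_name` would stay `None`, which mirrors the advice to increase the training time when no models are found.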

If you want, you can view more information about the training session in the log file generated by the CLI.
