Industrial IoT/ML Workshop

Classify Data

Ingest classified data directly into AWS IoT Analytics

For supervised learning, you need classified (labelled) data in order to build an ML model. We have a classified data set that you can now ingest alongside the data coming from the PLC.

  • Open your Cloud9 IDE and open the script batch-upload-to-iot-analytics-channel.py in the folder “iot-analytics”
  • Check that the configured channel name matches the name of the AWS IoT Analytics channel you created earlier
  • Open a new terminal in the Cloud9 IDE, and execute the following commands:
    cd iot-analytics
    python batch-upload-to-iot-analytics-channel.py

  • The data in the CSV file will be uploaded in batches directly into the channel

Unlike the data produced by the PLC, the data in the CSV file also contains a classification of each record. You will see later how this classification is used to train the ML model.

Add a new Data Set to query the classified data

Let's take a look at the data:

  • Open the IoT Analytics Console
  • Create a new SQL Data Set (“Data sets” → “Create”) and call it “classified_data”
  • Enter the following SQL query:
SELECT * FROM drilldata WHERE classified = 1
  • Use default values for the Data selection window and the Frequency
  • Use an infinite retention period
  • Now manually trigger a run (“Actions” → “Run now”)
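The console steps above can also be scripted. The sketch below assumes a boto3 `iotanalytics` client, whose `create_dataset` and `create_dataset_content` calls correspond to "Create" and "Run now". It also assumes the console defaults mean no data selection window (no `deltaTime` filter) and no schedule (no `triggers`); the action name `query_classified` is a hypothetical choice.

```python
DATASET_NAME = "classified_data"
QUERY = "SELECT * FROM drilldata WHERE classified = 1"


def dataset_request(name=DATASET_NAME, query=QUERY):
    """Build CreateDataset arguments mirroring the console settings:
    a single SQL action, no filters, no triggers, unlimited retention."""
    return {
        "datasetName": name,
        "actions": [{
            "actionName": "query_classified",  # hypothetical action name
            "queryAction": {"sqlQuery": query},
        }],
        "retentionPeriod": {"unlimited": True},
    }


def create_and_run(client, name=DATASET_NAME, query=QUERY):
    """Create the data set, then trigger one run ("Actions" -> "Run now").
    Returns the version ID of the data set content being generated."""
    client.create_dataset(**dataset_request(name, query))
    response = client.create_dataset_content(datasetName=name)
    return response["versionId"]
```

For example, `create_and_run(boto3.client("iotanalytics"))` would create the data set and start a run. Leaving out `triggers` keeps the data set unscheduled, so it only refreshes when a run is requested.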

The data set content now also shows the classification columns, which indicate whether an error occurred (hasanomaly) and which error it was (e.g. spindlehigh). The hasanomaly flag is the target that we will eventually try to predict using a machine learning approach. This data set is used for training and has been labelled with the correct labels by humans (or by using a service like Amazon SageMaker Ground Truth).