Industrial IoT/ML Workshop

Store and Analyze Data in the Cloud

Forwarding data to AWS IoT Analytics

While data from the drill now arrives in AWS IoT Core, we are not yet storing or using it in any way. AWS IoT Analytics provides a managed service to prepare IoT data for analytics and machine learning, so as a next step we use this service to properly handle our drilling data. Open a new browser tab with the AWS Management Console and open IoT Analytics to perform the following steps (a scripted alternative using the AWS SDK is sketched after the list):

  • Open the AWS IoT Analytics Console
  • Open “Channels” and click “Create a channel”
  • Name the channel “drilldata”; if you use this name, you will not need to change anything later
  • Select “Service-managed store” for storing the data. Optionally, you could specify an S3 bucket here instead.
  • AWS IoT Analytics also supports data retention policies to save cost and manage space, but for this workshop we will keep the data indefinitely. Thus, keep the selection “Indefinitely” for the data retention.
  • Finally, click “Next” at the bottom of the page.
  • On the next page, enter the topic where the CombineEvents Lambda publishes its events as the “IoT Core topic filter”
  • Click “See Messages” to ensure there are messages coming in. This may take up to 30 seconds.
  • Click “Create new” to create a new IAM Role to give IoT Analytics access to AWS IoT Core
  • Click “Create Channel”
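
These console steps can also be scripted with the AWS SDK. The following is a minimal boto3 sketch of the same channel and forwarding rule; the topic filter (“drill/events”), the rule name, and the role ARN are placeholders for illustration, so substitute the topic the CombineEvents Lambda publishes to and the IAM role created above.

    import boto3

    iotanalytics = boto3.client("iotanalytics")
    iot = boto3.client("iot")

    # Channel with a service-managed store and unlimited retention
    iotanalytics.create_channel(
        channelName="drilldata",
        retentionPeriod={"unlimited": True},
    )

    # Topic rule that forwards incoming messages to the channel.
    # "drill/events" is a placeholder topic; the role must allow
    # iotanalytics:BatchPutMessage on the channel.
    iot.create_topic_rule(
        ruleName="forward_drilldata",
        topicRulePayload={
            "sql": "SELECT * FROM 'drill/events'",
            "actions": [{
                "iotAnalytics": {
                    "channelName": "drilldata",
                    "roleArn": "arn:aws:iam::123456789012:role/IoTAnalyticsRole",
                }
            }],
        },
    )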

Process and Store the Data

Now that the data is being forwarded from IoT Core to IoT Analytics, we need to create a pipeline that processes the data so that it can be used for machine learning and inference.

  • Change to the “Pipelines” section of IoT Analytics
  • Click “Create Pipeline”
  • To keep things simple, we will also call this pipeline “drilldata”
  • Click “Edit” to select the source for the pipeline and select the “drilldata” entry (this is the channel we created above)
  • Click “Next”

After clicking next, IoT Analytics will automatically try to determine the format of the data that is coming from the channel into the pipeline. This may take a few seconds, but once it is finished, a sample of the data is displayed as shown below. It is sufficient here to click “Next”.

  • Click “Next”
  • Do not add any pipeline activities and click “Next” again

The end of a pipeline is always a datastore. Since we have not created one so far, we have to create one and then configure it to be the end of the pipeline:

  • Click “Create new data store” and call it “drilldata” as well
  • Click “Create data store”

Click “Create pipeline” to finish the creation of the pipeline.
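
Behind the scenes, the wizard makes two API calls: one to create the data store and one to create the pipeline connecting the channel to it. A minimal boto3 sketch of the same steps, assuming the “drilldata” names used above:

    import boto3

    iotanalytics = boto3.client("iotanalytics")

    # Service-managed data store with unlimited retention; this is
    # where the pipeline output ends up
    iotanalytics.create_datastore(
        datastoreName="drilldata",
        retentionPeriod={"unlimited": True},
    )

    # Minimal pipeline: read from the channel, write straight to the
    # data store. Transformation activities (filter, math, attribute
    # renaming, ...) could be inserted between these two steps.
    iotanalytics.create_pipeline(
        pipelineName="drilldata",
        pipelineActivities=[
            {"channel": {
                "name": "read_channel",
                "channelName": "drilldata",
                "next": "write_datastore",
            }},
            {"datastore": {
                "name": "write_datastore",
                "datastoreName": "drilldata",
            }},
        ],
    )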

Query the Data Store

In order to view the data, you need to create a data set. So far, the data store only stores the incoming data at the end of the pipeline. The data set is created as follows:

  • In the IoT Analytics main screen click on “Analyze”
  • Click on “Create a data set”

For our application, we will use a SQL-based data set:

  • Give it a name (call it “all_data”; you then won't need to change anything later)
  • Select the previously created data store (click on “Edit”). It should be called “drilldata”.

While it is possible to select only some parts of the data, or only data that meets certain conditions, this is not necessary for our application. Thus, we can accept the default generated query (similar to SELECT * FROM drilldata) and click “Next”.

For “data selection filter” select “None” and click “Next” 

Specify a schedule, for example every 15 minutes; the data set will then be updated automatically at the selected frequency.

For the retention period, leave the configuration at “Indefinitely” (you can also define a time period if you want) and click “Create data set”.
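
For reference, the same data set can be created with a single boto3 call. This sketch assumes the names used above; the cron expression corresponds to the 15-minute schedule:

    import boto3

    iotanalytics = boto3.client("iotanalytics")

    # SQL data set over the "drilldata" data store, refreshed every
    # 15 minutes; retention defaults to indefinite
    iotanalytics.create_dataset(
        datasetName="all_data",
        actions=[{
            "actionName": "query_all",
            "queryAction": {"sqlQuery": "SELECT * FROM drilldata"},
        }],
        triggers=[{
            "schedule": {"expression": "cron(0/15 * * * ? *)"},
        }],
    )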

The data will now travel from AWS IoT Greengrass to AWS IoT Core and from there to AWS IoT Analytics. There it is stored internally, and every 15 minutes a data set (the result of the SQL query) will be made available.

Do the following to test it:

  • Go to “Analyze” → “Data sets”
  • Select the created data set
  • If 15 minutes have not passed yet, click “Actions” → “Run now” to trigger a data set update manually (a scripted version of this step follows the list)
  • You will see the result in the “Result preview”, and you can download a CSV file with the data set content. It may take a couple of seconds for the result preview to appear.
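
The manual update and the CSV download can also be scripted. A short boto3 sketch, assuming the “all_data” data set from above:

    import boto3

    iotanalytics = boto3.client("iotanalytics")

    # Trigger a data set update manually (the console's "Run now")
    iotanalytics.create_dataset_content(datasetName="all_data")

    # Once the run has succeeded, fetch a pre-signed URL to the CSV
    # result (in practice, poll until the content status is SUCCEEDED)
    content = iotanalytics.get_dataset_content(
        datasetName="all_data",
        versionId="$LATEST_SUCCEEDED",
    )
    print(content["entries"][0]["dataURI"])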

An example of the result in the preview window is shown below: