2018.6.11

Getting started with Machine Learning in MAGELLAN BLOCKS

Hello, everyone! I'm Satoshi, a consultant here at Groovenauts.

My job is to work with our customers to create Machine Learning systems for their businesses using MAGELLAN BLOCKS. Today I'd like to share some of the things I've learned from my many experiences helping people get started with Machine Learning and BLOCKS.




“Why do you want to use Machine Learning?”

I’ve talked about this question with many different customers, but most of their responses boil down to one of the following:

  • Our company is facing an issue that we want to solve with Machine Learning.
  • We want to try something new.
  • My boss said he wants us to do something with AI.

Actually, I hear that last response pretty often! But whatever the reason for starting with Machine Learning, the first steps don’t change much:

First, you need to start by thinking about your business problems or goals.

The actual process for getting started with BLOCKS Machine Learning is as follows:

  1. Break down the issue
  2. Decide what to predict
  3. Think about how you’ll use the results for your business
  4. Prepare your data
  5. Train, predict, and evaluate

I’ll explain each of the steps in more detail below.


1. Break down the issue

Every business has goals it wants to reach and challenges it faces. In this step, we break down those goals and challenges into formulas for how we can achieve or overcome them.

For example, let's take a simple goal like, "we want to raise our profits." We could break this goal down into the issues that stand in its way:

  • Sales are too low
  • Cost for materials is too high
  • Our loss rate is too high

Let’s take a closer look at “our loss rate is too high.”

If we consider this loss rate to mean [Unsold Products ÷ All Products], we could improve our loss rate by lowering the number of unsold products.
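
As a rough illustration, the loss rate defined above could be computed as follows. The function name and all of the numbers are made up for this example.

```python
def loss_rate(unsold_products: int, all_products: int) -> float:
    """Loss rate = unsold products / all products."""
    if all_products <= 0:
        raise ValueError("all_products must be positive")
    return unsold_products / all_products


# Hypothetical numbers: 1,000 products made, 120 of them left unsold.
before = loss_rate(120, 1000)

# Suppose a better sales forecast lets us make 50 fewer products,
# all of which would otherwise have gone unsold.
after = loss_rate(70, 950)

print(f"{before:.1%} -> {after:.1%}")  # prints "12.0% -> 7.4%"
```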


2. Decide what to predict

Personally, I think this might be the most important step. Rushing through it can lead to problems later on, so be careful here. When deciding what to predict, we should consider the following three points:

  1. Think about what to predict and actions as a set
  2. Think about the unit of your prediction
  3. Think about things that affect what you will predict

a) Think about what to predict and actions as a set
In order to reduce our unsold stock and lower our loss rate, we could predict our sales and produce a more appropriate amount of product. Here, the prediction (sales) and the action (adjusting production) form a set. Sometimes people decide what to predict and get started before they’ve considered the actions they will take afterwards, and end up with predictions they can’t act on.

b) Think about the unit of your prediction
This varies depending on the kind of thing you will predict, but the unit for your prediction could be something like the following:

  • For demand predictions:
    Predict for each event, hour, day, week, month, or year
  • For a manufacturer/factory:
    Predict for each product, manufacturing lot, or type of product
  • For sales contracts:
    Predict for each contract, person, company, or region

Of course, it’s also common to predict for combinations like product type × day.
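
To make the idea of a prediction unit concrete, here is a minimal sketch that rolls individual sales records up into product type × day units. The record layout and the sample values are assumptions made for this example, not a BLOCKS data format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical point-of-sale records: (timestamp, product type, quantity sold).
transactions = [
    ("2018-03-01 09:15", "A", 2),
    ("2018-03-01 13:40", "A", 1),
    ("2018-03-01 18:05", "B", 4),
    ("2018-03-02 10:30", "A", 3),
]


def daily_sales_by_product(records):
    """Sum quantities so that each (day, product type) pair becomes one unit."""
    totals = defaultdict(int)
    for timestamp, product_type, quantity in records:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").date()
        totals[(day, product_type)] += quantity
    return dict(totals)


daily_totals = daily_sales_by_product(transactions)
for (day, product_type), quantity in sorted(daily_totals.items()):
    print(day, product_type, quantity)
```

If you wanted to predict per week or per region instead, only the key used for grouping would change.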

c) Think about things that affect what you will predict
It’s important to think about this in terms of your business. Some business-related things that could have an effect include:

  • The time (month, day of the week, whether it is a holiday, hour)
  • Weather (temperature, humidity, sunlight hours, daytime hours)
  • Some sort of amount (# of people, # of things, # of times, amount of money, age, distance, size)
  • Various types (gender, industry, marital status, region)
  • Yes/no (has experience or not, possesses something or not, etc.)

Again, it’s important to think about these things in terms of your business. Also, it’s okay if the data you have at this point isn’t perfect.


3. Think about how you will use the results for your business

Once you’ve made a predictive model, how will you use the results of your predictions in your business?

For example, when predicting daily sales, you could predict the number of sales per day at one location, or create a 30-day sales forecast at your headquarters.

It’s also important to remember that Machine Learning won’t give you 100% accurate predictions, and you’ll need to plan how to handle any inaccuracy. For example, will you use the predictions exactly as they are, or will you leave the final judgement up to your employees?

As you think about these things, you will need to make certain adjustments to your data. If you planned to create a one-month sales prediction, you wouldn’t be able to get an accurate weather forecast that far ahead. As such, you would probably want to either remove weather data from the input features used to train your model, or use the model as more of a simulator.
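
The weather example can be sketched as a simple rule that drops weather features when the prediction reaches further ahead than a usable forecast. The feature names and the 7-day limit are assumptions chosen for illustration; use whatever limit fits the forecasts you can actually obtain.

```python
# Hypothetical feature names and forecast limit, chosen only for illustration.
WEATHER_FEATURES = {"weather", "high_temperature", "low_temperature", "precipitation"}
RELIABLE_WEATHER_FORECAST_DAYS = 7  # assumption, not a fixed rule


def features_for_horizon(all_features, horizon_days):
    """Drop weather features when predicting further ahead than forecasts reach."""
    if horizon_days <= RELIABLE_WEATHER_FORECAST_DAYS:
        return list(all_features)
    return [name for name in all_features if name not in WEATHER_FEATURES]


all_features = ["month", "day_of_week", "holiday", "weather", "high_temperature", "sale"]
print(features_for_horizon(all_features, horizon_days=1))   # keeps weather features
print(features_for_horizon(all_features, horizon_days=30))  # weather features removed
```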


4. Prepare your data

For the next step, you take the factors you think will affect what you are predicting and turn them into input features for your training data. You may also need to adjust your data if it is recorded in a different unit than the prediction you want to make.

If you will predict daily sales for various products, you could gather and combine point-of-sale data with your product master data. You could then align it with input feature data for things like the date, weather, whether there was an event nearby, and the like. The following is an example list of these kinds of input features, ending with the sales numbers you want to predict:

  • Product type (A,B,C,D,E, etc.)
  • Month (1–12)
  • Day of the week (0–6)
  • Holiday or not (0,1)
  • Weather (sunny, cloudy, rainy, snowy)
  • High temperature
  • Low temperature
  • Precipitation
  • Running a sale or not (0,1)
  • Event happening nearby or not (0,1)
  • Sales numbers (daily for each product)
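
As a sketch of what one row of such training data might look like, the snippet below builds a single record from raw values. The field names, the holiday calendar, and the sample values are all hypothetical, and this is not a required BLOCKS format.

```python
from datetime import date

# Hypothetical holiday calendar; in practice this would come from your own data.
HOLIDAYS = {date(2018, 1, 1)}


def make_training_row(day, product_type, weather, high_temp, low_temp,
                      precipitation, on_sale, event_nearby, sales):
    """Turn one day's raw facts for one product into a flat training record."""
    return {
        "product_type": product_type,
        "month": day.month,               # 1-12
        "day_of_week": day.weekday(),     # 0-6; Python counts Monday as 0
        "holiday": int(day in HOLIDAYS),  # 0 or 1
        "weather": weather,               # sunny, cloudy, rainy, or snowy
        "high_temperature": high_temp,
        "low_temperature": low_temp,
        "precipitation": precipitation,
        "sale": int(on_sale),             # 0 or 1
        "event_nearby": int(event_nearby),
        "sales": sales,                   # the value we want to predict
    }


row = make_training_row(date(2018, 1, 1), "A", "snowy", 3.5, -2.0, 1.2,
                        on_sale=False, event_nearby=True, sales=42)
print(row)
```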

Once you’ve decided on your input features, you’ll need to actually create your training data from the initial data that you have. There are various methods and tools you can use for preparing your data, a few of which are listed below:

  • Excel/Google Sheets
  • Various business intelligence/data analysis tools
  • The MAGELLAN BLOCKS Data Editor
  • The MAGELLAN BLOCKS Flow Designer

5. Train, predict, and evaluate

In MAGELLAN BLOCKS, you can use a Model Generator to train a model, then use that model to make predictions from a Flow Designer. I won’t get into the specifics of how to do those things in this blog post. Rather, I want to talk about evaluating your model, which is a very important step.

Just because you’ve trained a Machine Learning model and can make predictions doesn’t mean you’re ready to immediately use that model for your business. First, you need to evaluate whether or not the model is accurate enough to support your business needs.

If you had data for Jan 1, 2015 through Mar 31, 2018, you shouldn’t use all of that data to train your model. Rather, you should set some of the data aside and use it for making predictions with your model in order to evaluate it. For example, you could use the first three years of data to train your model, then use the model to predict for the last 3 months of data. That way you can evaluate your model’s accuracy.
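
The split described above could look something like this minimal sketch. The rows are generated placeholder data, and the function name is made up for the example.

```python
from datetime import date, timedelta

# Placeholder rows: one (day, sales) pair per day for the period in the text.
start, end = date(2015, 1, 1), date(2018, 3, 31)
rows = [(start + timedelta(days=i), 100 + i % 7)
        for i in range((end - start).days + 1)]


def split_by_date(rows, first_evaluation_day):
    """Rows before the cutoff are for training; the rest are held out."""
    training = [row for row in rows if row[0] < first_evaluation_day]
    evaluation = [row for row in rows if row[0] >= first_evaluation_day]
    return training, evaluation


training, evaluation = split_by_date(rows, date(2018, 1, 1))
print(len(training), "training days,", len(evaluation), "evaluation days")
# prints "1096 training days, 90 evaluation days"
```

Splitting by date rather than at random keeps the evaluation close to real use, where the model always predicts days it has not seen yet.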

With our example of daily sales, we could try making a graph like the following to show the predicted sales versus actual sales for that 3-month period.

By making a graph like this, you can more easily see where your model is accurate and where it isn’t. Then, the next important step is to ask yourself, “Why isn’t the model accurate here? What could be happening on these days?”

As you answer those questions, you might decide to add or change your input feature data and retrain your model. Then you can make predictions again and evaluate your revised results.

However, even if our second graph clearly shows better results, we can only get a general sense of the model’s accuracy this way. We’ll need to use some other methods to evaluate our model.

For a sales forecast, we could calculate the average error for each of our models and compare them to see how much the model improved. In the example graphs I posted above, the average error improved from 160 to 110. There are various other methods for evaluating models that I will probably introduce in later blog posts.
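
This post does not say exactly how the average error was calculated. One common choice is the mean absolute error, sketched below as an assumption. The sales figures are invented for the example and are not the results from the graphs.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all evaluation days."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers for five evaluation days.
actual_sales = [1000, 1200, 900, 1500, 1100]
first_model = [1100, 1100, 1000, 1400, 1150]
revised_model = [1050, 1150, 950, 1450, 1100]

print(mean_absolute_error(actual_sales, first_model))    # prints 90.0
print(mean_absolute_error(actual_sales, revised_model))  # prints 40.0
```

A lower number means the predictions were closer to the actual sales on average, which gives you a single value to compare from one training round to the next.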

Prepare your data. Train a model and make predictions. Evaluate your results. Reconsider your input data. Then repeat.

[Figure: Example Machine Learning cycle]

By repeating this process, you can create a model that is able to help you solve a business issue or meet your business goals.


In conclusion

With MAGELLAN BLOCKS, you can easily do each of the steps in this cycle, even if you don’t have experience with Machine Learning. If you’re interested in Machine Learning but haven’t been sure how to start, I hope you’ll refer to this post and give BLOCKS a try!