Date 03/26/19

The Way You Collect Data Can Make or Break Your Next AI Project

If you want to successfully solve a problem using machine learning, the foundation of your work should not be a set of fancy algorithms. Rather, your first step should be the collection of high quality data. After all, data is what the machine learning model will learn from. And if the data is incorrect or bad, or inaccurately labelled, there’s no fancy algorithm in the world that can save you.

We know this at Imagimob because we’ve tried and failed plenty of times before finally cracking the formula for successful AI projects. And, according to our learnings, the first set of priorities for anyone looking to solve a problem using machine learning should be figuring out what data to collect and what tools to use to collect it.

Let's start with the tools
When we first started running AI projects, we had no tools. So we built some extremely crude ones. At Imagimob, we build machine learning models that act on time-series information. Time-series information, or data, is basically made up of signals or events happening over time, like signals from sensors, an engine, or a radar.

So, we built tools to collect and store this kind of data in a very efficient manner only to find out that humans are not very good at spotting patterns in this kind of data. It's very difficult to understand what raw sensor data actually mean, as you can see in the picture below. So now, we had a lot of high quality data and no way of accurately labelling it.

This meant that the machine learning model had the right data, but with the wrong annotation. Hence the model was being deceived, by design. What we learned the hard way was that, in our branch of machine learning, we need to collect a lot of meta data. And not for the algorithms, but to enable us humans to teach the algorithms.

So we developed data collection tools that can collect video and audio together with the time-series data, and even annotate the data on the fly. This means that we can start labelling on the spot. Additionally, if we need to collect data over the course of days or months, we now have the ability to go through the data as if we're there in real-time, with the ability to freeze time, rewind, and get those labels just right.

But even with the perfect tools, you need to know what data to collect
Here, it becomes interesting. Of course, what data you should collect highly depends on your specific problem, and machine learning can be applied to so many areas that it's difficult to generalize. But, in any case, we’ve got some derived wisdom to share with you: Measure performance from the end-user perspective.

In some cases, the "real" world differs from the academic world and machine learning academia is a good example. In machine learning academia, accurate numbers are everything.You "win" if you create a machine learning model that solves a problem with the highest degree of accuracy, regardless of whether or not the collected data is representative of a real world scenario, or if an accuracy number even makes sense as a measurement in that scenario.

Here’s an example:
Let's say that you are designing a machine learning model that detects whether or not a person is falling by reading the motion sensor information from a sensor attached to their body. Let's say that you’re measuringt he performance of your solution in the standard academic way. Your measurement might look like this:

Correctly detecting a fall, when an actual fall has occurred: 95% accuracy
Correctly detecting a fall did not occur, when an actual fall did not occur: 99.5% accuracy

These numbers look impressive—95% and 99.5%! Almost unheard of. Too bad it doesn't mean anything in the real world.

What you should have done, is collect data from real people for extended periods of time. That way, you can translate the second piece of information into another measurement, the number of false triggers per user per day. By doing so, you would see 99.5% accuracy translated into several false triggers per day and a product that is, quite frankly, not good enough for the market.

The most important tip we can offer for your next AI project is to collect this kind of data and measurements as early as possible. Not only will this result in a better product. It will also save you time. A lot of time.

And, of course, the best way to collect this data is using Imagimob Capture, our tool for data collection. Contact us to learn more.

Alexander Samuelsson, CTO and Co-Founder

About the video below: In a project with Husqvarna Group we are doing data collection together with professional foresters.

The Way You Collect Data Can Make or Break Your Next AI Project

LATEST ARTICLES

October 2025 Studio Release

Behind the Scenes: An Interview with the CEO and M...

Generative AI on the Edge for DEEPCRAFT™ users—exp...

Behind the Scenes: An Interview with the Product D...

Starter Models are here! But what are they?

Generative AI on the Edge: What Does the Future Ho...

February 2025 Studio Release

4 Ways to Leverage Generative AI on the Edge

Delivering world class edge AI - watch the video

November release of DEEPCRAFT™ Studio

New research on data quality's role in model effic...

September Release of Imagimob Studio

Imagimob at tinyML Innovation Forum 2024

Imagimob Studio 5.0 has arrived!

May release of Imagimob Studio

2024 State of Edge AI Report

What is Edge AI?

March release of Imagimob Studio

What is tinyML?

February release of Imagimob Studio

Introducing Graph UX: A new way to visualize your ...

Imagimob Ready Models are here. Time to accelerate...

Deploying Quality SED models in a week

An introduction to Sound Event Detection (SED)

Imagimob condition monitoring AI-demo on Texas Ins...

Alert Vest – connected tinyML safety vest by Swanh...

Video recording from tinyML AutoML Deep Dive

Edge ML Project time-estimates

An introduction to Fall detection - The art of mea...

Imagimob to exhibit at Embedded World 2022

The past, present and future of Edge AI

Recorded AI Tech Talk by Imagimob and Arm on April...

The Future is Touchless: Radical Gesture Control P...

How to build an embedded AI application

Don’t build your embedded AI pipeline from scratch...

Imagimob @ CES 2022

Imagimob AI in Agritech

Deploying Edge AI Models - Acconeer example

Imagimob AI used for condition monitoring of elect...

Tips and Tricks for Better Edge AI models

DEEPCRAFT™ Studio (formerly Imagimob Studio) integ...

Recorded Webinar - Imagimob at Arm AI Tech Talks o...

Gesture Visualization in Imagimob Studio

New team members

Imagimob featured in Dagens Industri

Customer Case Study: Increasing car safety through...

Veoneer, Imagimob and Pionate in joint research pr...

Edge computing needs Edge AI

Imagimob video from tinyML Talks

Agritech: Monitoring cattle with IoT and Edge AI

Arm Community Blog: Imagimob - The fastest way fro...

Imagimob video from Redeye AI seminar

Webinar - Gesture control using radar and Edge AI

tinyML article with Nordic Semiconductors

Edge AI for techies, updated December 11, 2019

Article in Dagens Industri: This is how Stockholm-...

The New Path to Better Edge AI Applications

Edge Computing in Modern Agriculture

Our Top 3 Highlights from Hannover Messe 2019

The Way You Collect Data Can Make or Break Your Ne...

AI Research and AI Safety

Imagimob and Autoliv demo at CES 2018

Wearing Intelligence On Your Sleeve