OpenCV AI Kit - An Introduction to OAK-1 and OAK-D

Introduction

Okay, so I just got my OpenCV AI Kit and I am really excited to play with it. But even though I’ve seen a bunch of features on Kickstarter, I don’t quite understand the true capabilities of this device and why I should even care. Well, in this video we are going to take a deep dive into this device and find out exactly what it’s capable of, and at the end of the video I will share my thoughts on whether this device is worth all the hype, as well as how it matches up to the competition.

What is the OpenCV AI Kit

So before we get into that, let’s get into the first question: what exactly is the OpenCV AI Kit, or OAK? According to the creators, it is a tiny, powerful, open source Spatial AI system. So you can think of it as if an embedded 4K camera and a neural compute stick had a baby, make that two babies: OAK-1 and OAK-D.

Difference between the OAK-1 and OAK-D

Now what sets these two devices apart is that the OAK-1 has automatic motion-based lossless digital zooming, which means that the sensor has a higher resolution than the final display resolution of the image. The OAK-D, on the other hand, has stereo depth cameras, which allow for 3D object localization and object tracking in 3D space, which is really cool. In particular, this is what enables Spatial AI.

Spatial AI? Ritz, what is that?

Well my friend, Spatial AI is the capability of an AI system to reason based on not just what it is looking at but also how far away things are. So the OpenCV AI Kit, specifically the depth version (OAK-D), allows for real-time Spatial AI, utilizing its RGB camera for deep neural inference and a stereo camera pair for depth estimation.
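
To make the depth estimation part a bit more concrete, here is a minimal sketch of the standard stereo relation, depth = focal length x baseline / disparity, for an idealized, rectified stereo pair. The 1280-pixel image width is the stereo resolution quoted later in this post; the 72-degree horizontal field of view and the 7.5 cm baseline are illustrative assumptions of mine, not official specs, and the function names are hypothetical. This is not how the OAK-D computes depth internally, just the geometry behind the idea.

```python
import math


def focal_length_px(image_width_px: int, hfov_deg: float) -> float:
    """Focal length in pixels for an ideal pinhole camera with the given horizontal FOV."""
    return (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative assumptions, not official OAK-D specifications.
IMAGE_WIDTH_PX = 1280   # stereo camera width mentioned in this post
HFOV_DEG = 72.0         # assumed horizontal field of view
BASELINE_M = 0.075      # assumed distance between the two stereo cameras

focal = focal_length_px(IMAGE_WIDTH_PX, HFOV_DEG)
for disparity in (8, 32, 128):
    depth = depth_from_disparity(disparity, focal, BASELINE_M)
    print(f"disparity {disparity:>3} px -> depth {depth:.2f} m")
```

Nearby objects produce a large disparity and far objects a small one, so under this model depth gets less precise the further away things are.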

Capabilities

Speaking of capabilities, let’s see what this device is capable of. First up, in terms of device specifications for the camera:

Both OAK devices have:

The Sony IMX378 image sensor, which allows a max frame rate of
60FPS with a resolution of
12MP, which is higher than 4K (roughly 8.3MP).
It also has a diagonal field of view, or DFOV, of 81 degrees, and
Autofocus
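
Just to put numbers on the 12MP versus 4K comparison, here is a quick back-of-the-envelope check. I am assuming that 4K means UHD (3840x2160) and that the 12MP mode is 4056x3040; that second figure is my assumption and does not come from the Kickstarter page.

```python
def megapixels(width: int, height: int) -> float:
    """Pixel count of a frame, in millions of pixels."""
    return width * height / 1_000_000


UHD_4K = (3840, 2160)        # common consumer definition of 4K
SENSOR_12MP = (4056, 3040)   # assumed full-resolution mode of the sensor

mp_4k = megapixels(*UHD_4K)
mp_sensor = megapixels(*SENSOR_12MP)

print(f"4K UHD: {mp_4k:.2f} MP")
print(f"Sensor: {mp_sensor:.2f} MP")
print(f"The sensor has about {mp_sensor / mp_4k:.2f}x the pixels of a 4K frame")
```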

On the OAK-D, however, we have:

Additional stereo cameras with synchronized global shutters, so that they capture the image at exactly the same time
The image sensor is a bit different, using an OmniVision OV9282
which has a lower resolution of 1280x800
which can run at over 120FPS. It has the same FOV as the main camera
but with an F-number of 2.2.

Myriad X Specs

Now having these high-end cameras is great, but what’s the use if you can’t utilize their full power? This is where the brains of the kits come into play. They are using the Myriad X Vision Processing Unit, or VPU, for processing the visual information from the cameras. If you have ever used the Neural Compute Stick from Intel, you should be familiar with the power that this AI chip provides.

So all OAK modules with the Myriad X allow for a
Compute capacity of 4 trillion ops/sec (4 TOPS). Compare this to the
Jetson Xavier NX, which has 21 TOPS, and the
Google Edge TPU, which has 4 TOPS
It has 16 high-performance SHAVE cores. SHAVE stands for Streaming Hybrid Architecture Vector Engine, an architecture designed primarily for the acceleration of machine vision processing
20+ vision accelerators
And 450GB/sec of memory bandwidth.

Compatibility

Awesome, okay, I got this device. Now what can I use it with?

Okay, Windows - check
Ubuntu - check
Mac, uuuhh, I don’t have one, too expensive for me, but from sources - check
Raspberry Pi, ROS2 and Jetsons - check

Hardware Layout

Looking at the physical hardware, at this time I could only get my hands on the OAK-1, due to the popular demand for the OAK-D. But essentially the device is quite small, with a size of around 65x36mm.

First things first, this thing popping out here is the 12MP RGB camera sensor that we spoke about. It seems bulging cameras are the in thing these days.

Over here is the USB Type-C connector, which allows for power and transmission of data. Sometimes you’ll also want to hit reset, so there’s an app, I mean there’s a button for that.

Underneath the heat sink, you will find the peripherals such as UART, SPI, I2C and several GPIO pins.

Software

Okay, so we had a look at the hardware, now let’s look at its software capabilities.

To get the best idea of the high-level features of the kits, let’s quickly browse through the Kickstarter page.

Okay, so they show the OAK kits here. Okay, detect and track anything. Nice to have when you are playing hide and seek at 30FPS.
Scrolling down. You can also stream your child crying in real-time 4K in the H.265 codec. Let’s click play. Oh sorry, my bad, you can stream your child running bubbles in 4K 30FPS.
Next, combine live depth and AI. Hmmm, looks like Brandon is experiencing some rapid temperature changes there. Better get that checked out.
Just kidding, the color represents the depth data from the camera.
The last one we have here is easily train your own neural networks, that’s really nice. From the demo, I can train Skynet to pick strawberries for me :P. I really like strawberries.
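
Since 4K at 30FPS in H.265 came up, here is a rough sketch of why encoding on the device matters so much. It compares the uncompressed bitrate of a 4K 30FPS stream with an encoded one. I am assuming 8-bit YUV 4:2:0 frames (12 bits per pixel) and a hypothetical 25 Mbit/s encoded stream; neither number comes from the Kickstarter page.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Uncompressed video bitrate in bits per second."""
    return width * height * fps * bits_per_pixel


BITS_PER_PIXEL_YUV420 = 12   # assumed: 8-bit luma plus subsampled chroma
ENCODED_BPS = 25_000_000     # hypothetical H.265 bitrate, for illustration only

raw_bps = raw_bitrate_bps(3840, 2160, 30, BITS_PER_PIXEL_YUV420)

print(f"Raw 4K 30FPS: {raw_bps / 1e9:.2f} Gbit/s")
print(f"Encoded:      {ENCODED_BPS / 1e6:.0f} Mbit/s")
print(f"That is roughly {raw_bps / ENCODED_BPS:.0f}x less data to move off the device")
```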

Programming Languages

Okay, you got this kit, it’s plugged in and you’re ready to code. Now what language do you use? It comes natively with Python examples, so that’s really great. But C++ is also supported, as the API was written in C++ with pybind11 for the Python bindings.

There is also support for MicroPython, particularly on the Myriad X in the Pipeline Builder, which we’ll discuss in a moment.

Out of the Box Examples

Now this is where I was most impressed. Normally, manufacturers give you a few ready-made examples and then let you venture off into unknown territory to develop other common apps. On the OpenCV kits, you get a lot right out of the box. Let’s take a look.

So we have object detection, which you can use for detecting fruit. They also have an application for mask detection; we covered this in my YOLOv4 course. Speaking of YOLOv4, Brandon mentioned that this kit will soon be able to run Tiny YOLOv4. So I can’t wait for that.
Moving on, they have face detection. Hey look, it’s Brandon and Satya. Wonder what Brandon saw that made him so surprised.
Vehicle detection and number plate detection with OCR… wow, I really like this.
Pedestrian detection with reidentification, nice.
Pose estimation with 3D location. This would be really great if we could integrate it into Unity. You know, for avatar overlaying.
Text detection with OCR, for when you would rather read wrestling comic books than watch WWE.
Lastly, semantic segmentation, but it’s depth assisted. Cool.

I must say I’m really impressed with what comes right out of the box. What more do you need? I mean, of course, it is also fully OpenVINO compatible should you wish to go deeper with the tools.

Pipeline Builder

When I spoke to Brandon during my interview, he mentioned that the kits have a pipeline builder which allows you to drag and drop blocks and then generate a script that runs your image processing pipeline. Now instead of developing this builder from scratch, they leveraged PyFlow, which is a general-purpose visual scripting framework for Python.

So for example, if I want to build a face detection app, I can just drop in a face detection block and then drag in any other transformations and parameters for customization. I think this will be quite useful for rapid prototyping.

Stretch Goals

Lastly, let’s look at all the unlocked goals.

So, at the $250k mark, the OAK-D kits will get an IMU. This is really nice, I mean, IMUs can be used to assist with image stabilization and motion-based deblurring.
They also plan to have a Power over Ethernet variant, meaning that you can run a long cable to connect to your OAKs. By long I mean the length of a football field, whereas USB is limited to just 5 meters.
At the $1 million mark, the kits will have an aluminum or aluminum/ case. I think I will just 3D print mine.
There are options for Wi-Fi and Bluetooth versions. Now I wonder if they will send the video over Wi-Fi or just use it to transfer information, like for IoT applications.
And the final milestone is the FREE model suite. Wait, what do they mean by high quality? Do they mean models with high accuracy and frame rate? Or state-of-the-art models?

Conclusion

Awesome, so we covered a lot of features, and my opinion of this device is that it has a lot, I mean a lot, of features and out-of-the-box items, so I would classify this hardware as one of the best options for embedded computer vision AI. Not only for its capabilities but also for its form factor and flexibility. It’s easy to use and get started with, which we shall see in the upcoming tutorials, and because of its partnership with OpenCV, I can only imagine how disruptive this device will be to the market.

If you are interested in enrolling in my upcoming course on YOLOv4 and OAK, then sign up over here when it gets released - Click Here
