The AI conductor: machine learning on machine learning by Veritone
Posted: 7 August 2017 | By Charlie Moloney
Veritone, a company which uses artificial intelligence (AI) to gain actionable insights from unstructured data like video and audio, announced last week that their system, Conductor, is operating at 82% accuracy, as opposed to the previous best of 75%.
This success may herald a new dawn for Veritone, which disclosed concerns in May this year over whether it would ever achieve or sustain profitability. Chad Steelberg, Veritone's founder and CEO, has focused on investing in an AI platform and moved away from Veritone's media buying and placement business, traditionally seen as more lucrative.
Developers told Access-AI that Veritone’s improvements are due to Conductor, a technology which acts as a kind of managing editor, optimising the results of the 70 third-party engines that form the Veritone platform, in a seemingly unique instance of machine learning (ML) being applied to machine learning.
The screenshot below shows an example of Veritone collecting instances of media figure Emily Chang: the platform recognises her face in public media and stores the appearances in a database, along with a transcription of what she says.
How does the Veritone platform work?
The Veritone platform can take audio and video clips, even those featuring heavy background noise or informal speech, and extract a transcript of what is said; it can also highlight specific objects, faces, logos, and even sentiments. A user can skip ahead to any part of a media file to find precisely what they're looking for.
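The "skip ahead" capability described above implies an index of timestamped segments, each carrying a transcript snippet and any detected labels. As a rough illustration only (the data model and field names here are invented, not Veritone's actual API), a minimal sketch of such a searchable index might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    start: float                      # seconds into the media file
    end: float
    text: str                         # transcript of this segment
    labels: list = field(default_factory=list)  # detected faces, objects, logos

def find_mentions(transcript, query):
    """Return start timestamps of segments whose text or labels match the query,
    so a player could seek directly to them."""
    q = query.lower()
    return [seg.start for seg in transcript
            if q in seg.text.lower()
            or any(q in lab.lower() for lab in seg.labels)]

# Toy index for a short clip.
transcript = [
    Segment(0.0, 5.2, "welcome back to the show"),
    Segment(5.2, 11.8, "our guest today is Emily Chang", ["face:Emily Chang"]),
    Segment(11.8, 20.0, "let's talk about the markets", ["logo:Bloomberg"]),
]

find_mentions(transcript, "emily chang")  # timestamps the user can jump to
```

The point of the sketch is only that, once every detection is tied to a timestamp, "finding precisely what you're looking for" reduces to a query over that index.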
This can be used as a transcription tool, but the image and audio recognition technology can also be applied in fields such as crime prevention; Veritone believe the technology could accurately review police bodycam footage and extract information.
The screenshot below shows Conductor being used for object recognition on police body camera footage.
Why is Conductor useful?
The orchestration tool, Conductor, can decide which technology should be applied to which part of a clip. The various tools on the Veritone platform have different strengths and weaknesses, and Conductor attempts to harmonise their abilities by deciding which one should be applied when and where.
To give a real-world example: in the financial compliance arena, billions of calls take place every year. "In fact we figure that there's about 4 billion hours of those calls," said John Ward, Global Marketing Executive at Veritone.
The calls are often in the vernacular of traders, or conversational language during calls with customers. Certain tools in the Veritone platform are specifically adept at interpreting how traders talk, and others are designed to understand informal, conversational language. There are even specific engines which specialise in accurately recording people’s names.
The Conductor tool should be able to determine which of the engines in the Veritone platform are best for these three specific tasks, then identify at which point in the call they need to be applied, and thereby get a complete picture of what is said in the financial trading call, which will help determine whether it is in compliance.
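The routing decision described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not Veritone's implementation: the engine names and the scoring functions (which stand in for Conductor's learned accuracy predictors) are all invented.

```python
def route_segments(segments, engines):
    """For each call segment, pick the engine with the highest predicted accuracy.

    `segments` is a list of dicts like {"kind": "trader-jargon"};
    `engines` maps an engine name to a function returning a predicted
    accuracy in [0, 1] for a given segment.
    """
    plan = []
    for seg in segments:
        best = max(engines, key=lambda name: engines[name](seg))
        plan.append((seg["kind"], best))
    return plan

# Toy scorers standing in for learned per-engine accuracy predictors.
engines = {
    "jargon-engine": lambda s: 0.90 if s["kind"] == "trader-jargon" else 0.50,
    "conversation-engine": lambda s: 0.90 if s["kind"] == "conversational" else 0.50,
    "names-engine": lambda s: 0.95 if s["kind"] == "proper-names" else 0.40,
}

segments = [
    {"kind": "trader-jargon"},
    {"kind": "conversational"},
    {"kind": "proper-names"},
]

route_segments(segments, engines)
# each segment is routed to the specialist engine best suited to it
```

In the real system the "predicted accuracy" would itself be learned, which is what makes Conductor an instance of machine learning applied to machine learning; the sketch only shows the routing step that sits on top of those predictions.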
What will happen now?
"Conductor is training itself through supervised and unsupervised learning over the course of time to look for exactly how to optimise across multiple transcription outputs," Tyler Schulze, Strategic Development Executive with Veritone, told Access-AI. "We believe that we've just scratched the surface, and we will continue to inch higher."
"What we anticipate happening as we get closer and closer to the 100% accuracy level," Schulze continued, is that "the third-party engine providers" who build on the Veritone platform "will need to compete for speed, accuracy, and cost in the new ecosystem we've created."