Yandex: letting the cat out of the bag
Posted: 18 July 2017 | By Charlie Moloney
Yandex, a Russian technology company whose search engine, translation, weather, photo analysis, and traffic information services rival Google’s, announced today that it’s open-sourcing a machine learning (ML) technology.
The technology is called CatBoost and is based on a form of ML known as gradient boosting. It can be downloaded and installed here (for all you ML developers) on the 64-bit version of Python (a programming language).
But what is gradient boosting, and when is it used? We asked Misha Bilenko, head of AI and research at Yandex, who said that “Gradient boosting is the unsung hero of the machine learning world”.
“It’s not as fun to talk about in the world of latest and greatest, but it has been around for many years”, said Bilenko, who joined Yandex in January of this year after a decade with Microsoft.
“In industrial machine learning solutions”, Bilenko explained, “you will have multiple inputs. Take image search: somebody enters the text query, and there are lots of images that we can serve them.
“For every image, we could extract information from just the contents of the image, from the pixels”, as in the case of image recognition technology. This, said Bilenko, is where deep learning and neural networks shine.
“But there’s also text surrounding it, there’s also the domain on which the image is located, there is the history of people’s clicks on the image or on the domain, and so on.
“And as a result, you get these multiple inputs. In the end, you want the final ranking of the images to be driven not by just how relevant you think the contents of the image are, but also by these additional factors.
“To take these multiple inputs and combine them, where some of them are deep learning based, and some of them could be text based, and so on, to bind it all together, that’s where gradient boosting usually shines.
“It is very heavily used, [from] freelance data scientists, students, and people just playing Kaggle [an online platform for data-mining and predictive-modelling competitions], to being a really core tool at the large companies like Microsoft and Google.”
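For readers who want to see the idea Bilenko describes in code, the following is a minimal sketch of gradient boosting in plain Python. It is not CatBoost's implementation and does not use the CatBoost library; it only illustrates the general technique of combining several input signals by repeatedly fitting small decision rules to the remaining error. The feature names (pixel_score, text_score, click_rate), the relevance numbers, and the helper functions are all invented for illustration.

```python
# Minimal gradient boosting sketch: squared-error loss, one-split trees ("stumps").
# Illustrative only. Feature names and data are hypothetical, not from Yandex.

def fit_stump(rows, residuals):
    """Find the single feature/threshold split that best fits the residuals."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [e for r, e in zip(rows, residuals) if r[f] <= threshold]
            right = [e for r, e in zip(rows, residuals) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((e - left_mean) ** 2 for e in left)
                     + sum((e - right_mean) ** 2 for e in right))
            if best is None or error < best[0]:
                best = (error, f, threshold, left_mean, right_mean)
    if best is None:
        # No usable split (all rows identical): fall back to a constant.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_value, right_value = stump
    return left_value if row[f] <= threshold else right_value


def fit_boosted(rows, targets, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then add stumps that each correct the current error."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [t - p for t, p in zip(targets, predictions)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, r)
                       for p, r in zip(predictions, rows)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical inputs per image: [pixel_score, text_score, click_rate].
rows = [
    [0.9, 0.8, 0.30],
    [0.9, 0.1, 0.02],
    [0.2, 0.9, 0.25],
    [0.1, 0.2, 0.01],
    [0.7, 0.6, 0.20],
    [0.3, 0.3, 0.05],
]
targets = [1.0, 0.4, 0.6, 0.0, 0.8, 0.2]  # made-up relevance labels

model = fit_boosted(rows, targets)
ranked = sorted(range(len(rows)),
                key=lambda i: predict(model, rows[i]), reverse=True)
print("Images ranked by predicted relevance:", ranked)
```

In this toy set-up, the pixel_score column stands in for the output of a deep learning model, while the other two columns stand in for text and click signals. The boosted model sits on top and learns how much weight each signal deserves, which is the "bind it all together" role described above.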
Yandex claims that this is not a commercial move and that its aim is to encourage innovation in gradient boosting. “Its role as a workhorse in the machine learning industry is why we feel like it’s important to put state-of-the-art, industrial-grade tooling out there”, said Bilenko.