There are so many companies offering services related to big data. How are you different?

First of all, we focus on a very specific field in the big data universe: machine learning technologies. They make it possible to get measurable results from the available data by optimising and automating decision-making processes.
Second, we base our solutions on a set of advanced proprietary technologies that have been developed and refined over many years to personalise search results, predict user interests and target ads with the highest click-through probability. Yandex, as a search engine, started working with big data long before it became a buzzword, and we now apply these technologies to solve other businesses’ challenges.
Third, we make everything measurable. Our solutions are designed to bring tangible, easy-to-demonstrate business value. We simply help you make money out of the data you have, improving the business’s profitability, revenue and operations.

How are Yandex Data Factory (YDF) solutions implemented? Are they tools, products, projects, or services?

Yandex Data Factory provides machine learning-based services that solve specific problems in an automated manner, using clients’ data. We do not provide tools or scripts for data scientists, descriptive analytical reports or project-based consulting. We provide end-to-end services that are integrated into existing business processes and create direct value through cost optimisation or sales growth. We offer a range of pre-packaged services, and also deliver highly customised solutions tailored to clients’ requests.

What is needed to get started?

You need available data and a clearly defined problem with a known KPI, be it higher conversions due to targeted recommendations, or a decrease in production costs due to reduced use of expensive materials. We can help you identify the range of potential applications and the critical areas of your industry and business that machine learning could improve. Contact us to find out more.

Do I need to have an internal data science team to use Yandex Data Factory services?

No. You do not need any specific data science expertise in-house to start working with us. We take care of every task, from data cleaning and preparation to model development and regular updates.

Which data is used for the projects?

We use data provided by the client and, where applicable or required by the nature of the task, external data such as open web sources (accessible via the Yandex search engine index) or data from third-party sources, from weather forecasts to stock market quotes.

How much data do I need to have to successfully apply machine learning solutions?

The more the better, but there is no prerequisite volume. As a rule of thumb, for machine learning techniques to be applicable, the size of the data should start from tens of thousands of entries (e.g. customers). For each class or category there should be at least hundreds of examples of objects, with dozens of different features belonging to each one. Of course, we can easily handle much larger volumes.

Do I need to transfer the data to Yandex Data Factory?

In most cases, yes – at least some portion of the data.

Our default approach is to provide services in SaaS mode. In this case, the services are deployed in the YDF cloud and expose APIs that can be used to transfer data, generate and apply analytical models, and provide integration endpoints for the client’s internal systems.

In cases when transferring the data to the YDF cloud is not feasible (due to performance issues or privacy regulations), we can use a hybrid approach, where only a portion of anonymised data is transferred to the YDF cloud for training new predictive models. The resulting models can then be deployed on the client’s premises, reusing existing infrastructure.

How is data transfer organised?

We can either provide a “push” API for the client to send the data to, or use the client’s own web services API to query the data. For on-premises installations, we ship our products with connectors for querying data directly from Hadoop or SQL-based data storage.

All remote data transfers are conducted through a secure VPN channel.
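
As an illustration of the “push” option, the sketch below batches client records and prepares one HTTP POST request per batch. It is only a sketch under stated assumptions: the endpoint URL, the JSON payload layout, the bearer token and the helper names are all hypothetical and are not taken from any YDF documentation. The code builds the requests but does not send them, since in practice they would travel over the VPN channel mentioned above.

```python
import json
import urllib.request

# Hypothetical endpoint and token: placeholders, not real YDF values.
PUSH_ENDPOINT = "https://ydf.example.invalid/push"
API_TOKEN = "replace-with-real-token"


def batches(records, size):
    """Yield consecutive slices of `records`, each at most `size` items long."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_push_request(endpoint, records, api_token):
    """Prepare (but do not send) a JSON POST request carrying one batch."""
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


# Illustrative records; the field names are assumptions.
records = [{"customer": f"c{i}", "purchases": i % 7} for i in range(5)]

requests_to_send = [
    build_push_request(PUSH_ENDPOINT, batch, API_TOKEN)
    for batch in batches(records, size=2)
]
# Each prepared request could then be sent with urllib.request.urlopen(request).
```

Batching keeps individual requests small, which makes retries cheaper if a transfer is interrupted.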

What about data privacy and security?

To reduce concerns related to data transfer, we offer several ways to decrease data sensitivity.
Personally identifiable information is removed through hashing or the use of random identifiers. This kind of anonymisation still makes it possible to recognise that events in the data are linked to the same person, but rules out connecting them to a real individual. Other potentially sensitive data (e.g. product prices or sales volumes) can undergo a one-way monotonic transformation on the client’s premises, for example by replacing real values with percentiles. In this case, the algorithm can still capture the necessary properties of the data, such as the relative ordering of the original values, but the actual sensitive values are never exposed. Upon request, we advise our clients on such techniques.
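
The two techniques described above can be sketched in a few lines of Python. This is a minimal illustration, not YDF’s actual procedure. It assumes a keyed hash (HMAC-SHA256) with a secret kept on the client’s side, since an unkeyed hash of a guessable identifier such as an email address can be reversed by brute force. It also assumes percentile ranks are computed within a single batch. The function names, the key and the sample events are hypothetical.

```python
import hashlib
import hmac
from bisect import bisect_right


def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed hash; equal inputs give equal outputs."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def to_percentiles(values):
    """Replace each value with its percentile rank (0-100] within the batch.

    The mapping is monotonic: a larger value never receives a smaller rank,
    so ordering is preserved while the real values are dropped.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [100.0 * bisect_right(ordered, v) / n for v in values]


# Hypothetical secret that never leaves the client's premises.
secret_key = b"client-side-secret"

# Illustrative events: (user identifier, purchase price).
events = [
    ("alice@example.com", 120.0),
    ("bob@example.com", 80.0),
    ("alice@example.com", 200.0),
    ("carol@example.com", 80.0),
]

ids = [pseudonymise(user, secret_key) for user, _ in events]
prices = to_percentiles([price for _, price in events])
anonymised = list(zip(ids, prices))
```

Both of Alice’s events receive the same pseudonym, so they can still be linked to each other, while neither the email address nor the real prices appear in `anonymised`.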

We use end-to-end encryption for data transfer and provide secure storage options.

For regionally sensitive data, we have several data centres in the CIS, EU and Turkey.

How are services integrated?

We provide REST APIs for integration with client systems. For on-premises installations, we ship our products with connectors for querying data directly from Hadoop or SQL-based data storage.

How does the service interface look?

In the majority of cases, our services do not have visual interfaces. They are integrated with the client systems via REST APIs and embedded in the existing workflow.

What support do you provide?

With time, every model needs to be updated based on new data. We include regular model re-training as part of our service, to improve quality and continuously learn from new data.

What is the pricing scheme?

We charge a subscription fee for the service. But note: you pay only if you get results. We agree on a quality metric and a minimum guaranteed effect (e.g. sales lift based on our recommendations), which are included in the service-level agreement. The fee is charged only if the agreed service quality is delivered.

How do I know that I am actually getting results?

To demonstrate the results, we conduct A/B testing, which measures the effect delivered by our service against existing in-house or competing solutions that solve the same problem. A/B testing is repeated regularly for quality control purposes.
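
One common way to evaluate such an A/B test on conversion rates is a two-proportion z-test. The sketch below is an illustration under assumptions, not a description of the statistical procedure YDF actually uses. The function name and the visitor and conversion counts are invented for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Returns (relative lift of B over A, z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("conversion rate is 0% or 100% in both groups")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    lift = (p_b - p_a) / p_a
    return lift, z, p_value


# Hypothetical counts: 10,000 visitors per group,
# 500 conversions for the existing solution and 600 for the new service.
lift, z, p_value = two_proportion_z_test(500, 10_000, 600, 10_000)
print(f"lift={lift:.1%}, z={z:.2f}, p={p_value:.4f}")
```

With these made-up numbers the relative lift is 20%, and the p-value falls well below the conventional 0.05 threshold, so a difference this large would be unlikely to arise by chance alone.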