Scale AI

Scale AI’s vision is to be the foundational infrastructure behind AI/ML applications. The company began with data labeling and annotation used in building AI/ML models. Data labeling and data annotation involve tagging relevant information or metadata in a dataset to use for training an ML model. To train and build any ML algorithm, the model needs to be grounded on accurate data that is correctly labeled. Scale AI’s core value proposition is built around ensuring companies have correctly labeled data that allows them to build effective ML models. By building comprehensive datasets to train AI/ML applications, Scale AI seeks to enable developers to build accurate applications with increased capability and limited vulnerability.

Founding Date

Jun 1, 2016

Headquarters

San Francisco, California

Total Funding

$2B

Stage

Series F

Employees

501-1000


Updated

September 6, 2024


Thesis

It has been over ten years since Marc Andreessen’s famous pronouncement that software was eating the world. Whether in shopping, entertainment, healthcare, or education, software has become a key component of almost every aspect of life. Now, however, artificial intelligence (AI) and machine learning (ML) are starting to eat software. There are early examples of this, such as Tesla’s Autopilot, Waymo’s driverless taxis, GitHub Copilot, TikTok content recommendations, and on-the-fly context-aware automated website guides like Ramp’s AI tour guide. This trend could accelerate, as generative AI has been estimated to improve software engineering productivity by 20-45%, primarily by reducing the time required for tasks such as generating initial code drafts, code correction and refactoring, root-cause analysis, and generating new system designs. But all of this stems from one key thing: data. As Clive Humby pointed out in 2006, “Data is the new oil.”

A persistent issue with building AI/ML applications has been a lack of the well-organized data necessary for developing models. Data advantages are self-reinforcing: initial data collection leads to improved AI models, enhancing user experience; better user experiences attract more users, leading to more data collection. Over time, this cycle continually upgrades the quality of the AI model and the user experience. A scarcity of data, by contrast, extends the timelines required to build AI models and reduces their accuracy. Without a strong dataset to train on, applications can often exhibit decreased capabilities and increased vulnerabilities. Moreover, a lack of data can prevent an application from being developed altogether. For example, in medical research, the limited availability of data for diagnosing rare diseases and conditions makes building an accurate AI application for identifying such conditions challenging and often unreliable.

One meaningful tailwind for building AI applications has been the dramatic increase in the volume of data available. The data being generated doesn’t just stem from the digital world, but from the physical world as well. As early as the 1960s, technology started impacting the physical world through projects like the Stanford Cart. Through advances like computer vision, sensor fusion, robotics, and autonomous vehicles, the volume of physical data has increased significantly. However, to leverage these types of data in building AI applications, the data needs to be not just available, but organized.

That’s where Scale AI comes in. Scale AI’s vision is to be the foundational infrastructure behind AI/ML applications. The company began with data labeling and annotation used in building AI/ML models. Data labeling and data annotation involve tagging relevant information or metadata in a dataset to use for training an ML model. To train and build any ML algorithm, the model needs to be grounded on accurate data that is correctly labeled. Scale AI’s core value proposition is built around ensuring companies have correctly labeled data that allows them to build effective ML models. By building comprehensive datasets to train AI/ML applications, Scale AI seeks to enable developers to build accurate applications with increased capability and limited vulnerability.


Founding Story

Scale AI was founded in 2016 by Alexandr Wang (CEO), an MIT dropout, and Lucy Guo, a Carnegie Mellon dropout and Thiel Fellow.

In 2015, Wang enrolled at MIT to study computer science, where he received perfect grades in his first year. Wang had the insight that artificial intelligence and machine learning were going to transform the world.

"First we built machines that could do arithmetic, but the idea that you could have them do these more nuanced tasks that required what we view as humanlike understanding was this very exciting technological concept."

To further explore AI, Wang started with a fairly small-stakes problem: knowing when to restock his fridge. That project eventually led to the creation of Scale AI. Obsessed with the grocery problem, Wang decided to build a camera inside his fridge to tell him when he was running low on milk. At this point, he realized that there was not enough data available to train his system to quantify the contents of the fridge properly. He also noticed his peers weren’t building AI products despite their training because there was a lack of well-organized data available for them to develop models.

Wang extrapolated this problem to the implications for AI in general, realizing that data would clearly be a meaningful hurdle. That’s where Scale AI was born. Wang had identified a hole in the market: in order to bridge the gap between human and machine-learning capabilities, there was a need for accurately labeled datasets that could train AI models.

During this time, Lucy Guo was at Carnegie Mellon studying computer science and human-computer interaction. In her second year, she applied for the Thiel Fellowship, which awards $100K to help motivated recipients build a business, on the condition that they drop out of school. So in 2014, her senior year, she dropped out.

After her initial company failed due to legal issues surrounding food delivery from non-commercial kitchens, she interned at Facebook and worked as a product designer at Quora and Snapchat. The two met at Quora, and Scale AI was accepted into Y Combinator, shortly thereafter raising a $120K seed round. After exploring several aspects of the infrastructure needed for AI, the team narrowed its efforts down to autonomous vehicles. Self-driving cars needed humans to label images so that the AI the cars used could be trained on those labeled images.

As the company got started, the team attended the Computer Vision and Pattern Recognition (CVPR) conference. There, according to Wang, they went “booth to booth with a laptop with a demo on it.” As the company grew, its products became applicable to other AI-related industries, including satellite imagery, ecommerce, and others.

By 2018, the company had grown significantly, and both Wang and Guo were named to the Forbes 30 Under 30 list. At this time, Guo left Scale AI to start a venture capital firm called Backend Capital. According to Guo, the separation was due to a “division in culture and ambition alignment.”

Product

To get a foundational understanding of Scale AI, it's important to understand the lifecycle of building a machine-learning model for any given industry vertical. That process begins with data and its sources before moving to data engineering, which is a component of data science.

Scale AI’s core value proposition is built around the data engineering component of this lifecycle. Specifically, Scale AI helps companies with data annotation and labeling of “ground truth” data. Ground truth data refers to data correctly labeled in an expected format, such as a picture of a cat tagged as a “cat”, or labels that differentiate a dog from a cat in an image.

As of September 2024, Scale segments its products into three sections, Build AI, Apply AI, and Evaluate AI, each with its own set of capabilities. Under Build AI is the Scale Data Engine, which is further segmented into three use cases: generative AI, government, and automotive. Apply AI comprises two products, Scale Donovan and the Scale GenAI Platform. Finally, under Evaluate AI there is Scale Evaluation, which also has three use cases: model development, public sector, and enterprise.

Source: Scale AI

Scale AI’s core products include Scale Data Engine, Scale Donovan, and Scale GenAI, while Scale Evaluation is embedded into the workflows of these three products.

Scale Data Engine

Scale AI’s core product is its data engine, which relies on a workforce of 240K people across Kenya, the Philippines, and Venezuela, managed through a subsidiary, Remotasks, to label data. Companies use the data engine to build and train ML algorithms; it enables customers to collect, curate, and annotate data to train and evaluate models. Companies including Lyft, Toyota, Airbnb, and General Motors pay Scale AI for high-quality annotated data labeled by human contractors, an ML algorithm, or a mixture of both.

Scale AI offers a comprehensive approach to data labeling by offering automated data labeling, human-only labeling, and human-in-the-loop (HITL) labeling, each with distinct advantages. Automated data labeling utilizes custom machine learning models to efficiently label large datasets with well-known objects, significantly accelerating the labeling process. However, it requires high-quality ground-truth datasets to ensure accuracy and struggles with edge cases.

Human-only labeling, on the other hand, relies on the nuanced understanding and adaptability of human annotators, providing superior quality in complex domains like vision and natural language processing, albeit at a higher cost and slower pace. HITL labeling combines both methods: automated systems handle the bulk of the labeling, and human experts review and refine the outputs. This hybrid approach leverages the strengths of both automation and human expertise to produce accurate data labels for machine learning applications with high efficiency.
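
To make the HITL pattern concrete, the sketch below routes each item through an automated pre-labeler and escalates low-confidence predictions to a human queue. It is a minimal illustration with stand-in functions and an assumed confidence threshold, not Scale AI’s implementation:

```python
import random

CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff; tuned per project in practice

def model_predict(item):
    """Stand-in for an automated pre-labeling model."""
    return "cat", random.random()  # (label, confidence)

def send_to_annotator(item, suggested):
    """Stand-in for queueing the item to a human annotator."""
    return suggested  # a human would confirm or correct the label here

def route_item(item):
    label, confidence = model_predict(item)
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"item": item, "label": label, "source": "auto"}
    # Low-confidence predictions are escalated for human review.
    return {"item": item, "label": send_to_annotator(item, label), "source": "human"}

labeled = [route_item(f"image_{i}.jpg") for i in range(5)]
print(labeled)
```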

Scale AI annotates many different types of data including 3D sensor fusion, image, video, text, audio, and maps. Although image, video, text, and audio products could be generalized across several industries, 3D sensor fusion and map labeling are typically relevant for autonomous driving, robotics, and augmented and virtual reality (AR/VR).

As part of the data engine offering, Scale AI offers three distinct data annotation solutions tailored to different needs: Scale Rapid, Scale Studio, and Scale Pro.

  1. Scale Rapid is designed for machine learning teams to quickly develop production-quality training data. It allows users to upload data, set up labeling instructions, and get feedback and calibration on preliminary labels within a few hours, enabling a rapid scale-up of the data labeling process to larger volumes. As a self-serve platform with no minimums, it lets users upload data, select or create annotation use cases, send tasks to the Scale workforce, and receive high-quality labeled data within hours. Scale AI provides the annotator workforce to ensure the data is labeled accurately and efficiently.

  2. Scale Studio focuses on maximizing the efficiency of users' own labeling teams. It allows users to upload data, choose or create annotation use cases, use their own workforce, and monitor project performance, making it ideal for organizations that want to manage labeling internally and boost productivity. The platform tracks and visualizes annotator metrics such as throughput, efficiency, and accuracy, and provides ML-assisted annotation tooling to speed up annotations.

  3. Scale Pro caters to AI-enabled businesses requiring scalable, high-quality data labeling for complex data formats. It features API integration, dedicated engagement managers for customized project setup, and the ability to handle large volumes of production data, and it guarantees quality through service-level agreements (SLAs), providing a premium, fully-managed labeling experience.

The key difference between these offerings is who labels the data: with Scale Rapid and Scale Pro, the data is annotated by Scale AI’s workforce, while Scale Studio requires the company to bring its own annotators. Each offering falls under the umbrella of the Scale Data Engine.

Key Terms

Task: A task is an individual unit of work to be completed. Each task corresponds directly to the data that needs to be labeled, ensuring a one-to-one relationship. For instance, there is a separate task for every image, video, or text that requires labeling.

Project: Within a given project, similar tasks can be organized based on instructions and the use case. All tasks within the project will share the same guidelines and annotation rules. A project is linked to a specific annotation use case, corresponding to a task type. Multiple projects can exist for each use case. For instance, one project might be dedicated to categorizing scenes, while another focuses on annotating images. Each task is explicitly associated with a project to maintain organization.

Batches: On Scale Rapid, projects allow batches of data to be launched to the Scale workforce for labeling, with three types of batches available: self-label, calibration, and production batches. On Scale Studio, projects enable batches of data to be labeled by a customer’s in-house annotation team, with all batches being standard production batches that can be used for various purposes such as self-labeling, experimental batches, or large-scale production pipelines. On Scale Pro, batches can be used to further divide work within high-volume projects, associating tasks with specific internal datasets or marking tasks as part of a weekly submission.

On Scale Rapid, three types of batches can be launched for data labeling. A self-label batch allows users to test their taxonomy setup or labeling experience by creating a batch of data for themselves or team members to label. A calibration batch is a smaller set of tasks sent to the Scale workforce for labeling, providing labeler feedback and enabling quick iteration on taxonomy and instructions; it undergoes fewer quality controls and is primarily used to refine labeling processes. Finally, a production batch involves scaling to larger volumes after iterating through calibration batches and refining quality tasks; it includes rigorous onboarding, training, and periodic performance checks for labelers to ensure high-quality labeling.

Taxonomy: A taxonomy in data annotation is a structured collection of labels, known as annotations, and associated information defined at the project level. Annotations can include various types such as boxes, polygons, points, ellipses, cuboids, events, text responses, list selections, tree selections, dates, linear scales, and rankings. Within a taxonomy, there are classes of annotations (different types of annotation), global attributes (information about the entire task), annotation attributes (details linked to a specific annotation), and link attributes (relationships between two annotations).

For example, a project may involve drawing boxes around all cats and dogs in an image, indicating the total number of cats and dogs. Each cat's box annotation would include an attribute for "sleeping or not sleeping," and each dog's box annotation would include a link attribute to indicate which cat it is looking at. Additionally, a global attribute would ask the labeler to specify the total number of cats and dogs in the image.
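
Expressed as data, that example taxonomy might look something like the following sketch. The field names here are illustrative assumptions, not Scale’s exact schema:

```python
# Hypothetical taxonomy for the cats-and-dogs example; field names are
# illustrative and do not mirror Scale's exact schema.
taxonomy = {
    "annotations": [
        {"name": "cat", "type": "box",
         "attributes": [{"name": "sleeping", "choices": ["sleeping", "not_sleeping"]}]},
        {"name": "dog", "type": "box",
         "link_attributes": [{"name": "looking_at", "links_to": "cat"}]},
    ],
    "global_attributes": [
        {"name": "total_cats_and_dogs", "type": "number"},
    ],
}
```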

Source: Scale AI

Workflows

When initiating a project on Scale, a user begins by selecting a template that fits the project’s use case or creating their own.

Source: Scale AI

Following this, the user must upload their data. The platform supports various data formats, including images, videos, text, documents, and audio. Once the data is uploaded, the user can choose from a list of available use cases tailored to their data format. Each use case provides a set of labels for building the project's taxonomy and a set of pipelines, which are sequences of stages that a task will go through before being delivered back to the user. Each project is assigned a single pipeline, and all tasks within the project will follow this same pipeline.

  1. Upload Data: The user starts by uploading their data in any of the supported formats.

  2. Select Use Case: The user chooses a use case relevant to their data format, which will define the available labels and pipelines.

  3. Build Taxonomy: The user utilizes the labels provided by the use case to create a comprehensive taxonomy for the project.

  4. Choose Pipeline: The user selects the pipeline that the project will use, ensuring that all tasks will follow the same sequence of stages.

Use Cases

Scale supports various data formats and use cases, including text (content classification, text generation, transcription, named entity recognition, content collection), images (object detection, semantic segmentation, entity extraction), video (object and event detection), PDFs/documents (entity extraction), and audio (same as text use cases except named entity recognition).

Object Detection: Comprehensive annotation for 2D images supports various geometric shapes, such as boxes, polygons, lines, points, cuboids, and ellipses. This task type is ideal for annotating images with vector geometric shapes.

Source: Scale AI

Source: Scale AI

Once a user selects the use cases, they are prompted to create a taxonomy via a set of labels and attributes. These labels and attributes can be added via a visual label maker or via a JSON editor for the API.
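
As a rough illustration of the API path, an image annotation task with a box taxonomy can be created along the following lines. This follows the general shape of Scale’s documented REST API, but the project name, attachment URL, and key are placeholders, and field details should be checked against the current API reference:

```python
import requests

# Sketch of creating an image annotation task via Scale's REST API.
# Payload shape follows Scale's public docs at a high level; treat the
# details as assumptions and consult the current API reference.
payload = {
    "project": "pets_demo",                       # placeholder project name
    "attachment": "https://example.com/img.jpg",  # image to be labeled
    "geometries": {"box": {"objects_to_annotate": ["cat", "dog"]}},
}
resp = requests.post(
    "https://api.scale.com/v1/task/imageannotation",
    json=payload,
    auth=("YOUR_SCALE_API_KEY", ""),  # API key sent as the basic-auth username
)
print(resp.json())
```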

Source: Scale AI

Instruction Writing

Instructions guide the labelers on how to handle each task accurately. After the taxonomy is created, an auto-generated instructions outline is produced. In each case, whether the project is net-new or based on a template, the outline includes the following sections:

  • Summary: Introduce the task, providing useful context like scenery, number of frames, objects to look for, and any unusual aspects.

  • Workflow: Provide a step-by-step guide on task completion, noting initial observations, deductive reasoning, and annotation impacts.

  • Rules: Describe annotation rules applicable to multiple labels or attributes, including well-labeled and poorly-labeled examples.

  • Label/Attribute/Field-Specific Sections: Detail unique rules with examples for each label/attribute/field.

  • Adding Examples: Separate well-labeled and poorly-labeled examples, highlighting differences.

Source: Scale AI

Calibration & Self-Label Batches

After setting up the project, taxonomy, and instructions, the user can proceed with one of the following:

  • Launch a Calibration Batch: This batch is reviewed by Scale labelers who provide feedback on the instructions and deliver the first set of task responses. This step helps identify discrepancies between the instructions and the task responses, allowing the user to refine their instructions.

  • Launch a Self-Label Batch: This option allows the user to test their own taxonomy setup and experience labeling on the Rapid platform firsthand.

Auditing & Improving

Once feedback from the calibration batch is received, the user analyzes the discrepancies and improves their instructions. During the auditing process, the user can use the labeled tasks to create examples embedded in the instructions and quality tasks to enhance the overall project setup.

Source: Scale AI

Project

Once a project has been created, there are different views a user can interact with. In the main view, a user can see the project’s definition, along with its tasks, labels, and instructions.

Source: Scale AI

Under the batches view, a user can see the batches of data that are in progress or completed. Labelers are those who have been assigned the tasks; assignment can be to a group or an individual user, depending on the admin’s configuration. The percentage completed refers to the number of tasks that are done, and it can be expanded to show the individual tasks still to be addressed.

Source: Scale AI

The quality lab is a way to evaluate taskers for their accuracy and view other statistics about the tasks they are handling. Tasks can be audited individually or within a date range. An admin can also set up training tasks to help onboard labelers to the project prior to starting labeling.

Source: Scale AI

There are advanced tools that can be used to support high-quality completed tasks. Some of these include:

Nucleus

In August 2020, Scale AI launched Nucleus, a “data debugging SaaS product.” Nucleus provides advanced tooling for understanding, visualizing, curating, and collaborating on a company’s data, allowing teams to build better ML models. Specifically, Nucleus allows for data exploration, debugging of bad labels, comparing accuracy metrics of different versions of ML models, and finding failure cases. This product offering also falls under the Scale Data Engine offering.

Source: Not Boring

Scale Nucleus is a dataset management platform that helps machine learning teams build better datasets by visualizing data, curating interesting slices within datasets, reviewing and managing annotations, and measuring and debugging model performance. By integrating data, labels, and model predictions, Nucleus enables teams to debug models and improve the overall quality of their datasets.

Source: Scale AI

The Nucleus dataset dashboard contains all the datasets associated with a user, whether public or private. As a user clicks into a dataset, a dashboard view provides an overview of its contents and associated statistics. Datasets can be created by importing images/tasks that a user already has in Scale, via the command-line interface, or via the platform’s UI.

Once the data has been uploaded, a preview is created. Datasets can be classified as one of three types: image, video, or lidar. Each type has specific properties that may or may not be included while creating the dataset. Once the dataset’s contents are uploaded, Nucleus supports uploading metadata that associates each dataset item with a scene, ground truth annotation, model prediction, and segmentation mask.
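
For programmatic workflows, Scale also publishes a Python client for Nucleus (scale-nucleus). A minimal upload might look like the sketch below; the dataset name, file path, and metadata are placeholders, and method names should be verified against the current SDK documentation:

```python
import nucleus  # pip install scale-nucleus

# Sketch of creating a Nucleus dataset and appending items with metadata.
# Method names follow the public SDK at a high level; verify against docs.
client = nucleus.NucleusClient("YOUR_API_KEY")
dataset = client.create_dataset("traffic-scenes-demo")  # placeholder name

items = [
    nucleus.DatasetItem(
        image_location="s3://my-bucket/frame_0001.jpg",  # placeholder path
        reference_id="frame_0001",
        metadata={"camera": "front", "time_of_day": "night"},
    )
]
dataset.append(items)
```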

Source: Scale AI

Under the explore tab, a user can see all the images and items that belong to the dataset. In this view, a user will also see the type of annotation being applied to the dataset items: a geometric annotation, segment annotation, or category annotation.

When a dataset item is clicked, more detail is provided on the right-hand side, including the item’s properties, metadata, annotations, and any effects applied to it, along with any tasks associated with the specific item and any charts or statistics the item might be a part of.

Source: Scale AI

The Nucleus query bar filters dataset items or annotations based on given conditions, which can be combined with AND or OR statements. This type of search is a structured query search, while a natural language search is akin to what users would type into Google. Natural language search enables users to locate images using simple English text queries like "a pedestrian wearing a suit" or "a chaotic intersection."

This functionality applies to all datasets, regardless of their image content, though the quality of the results may vary. Unlike structured querying, which depends on annotations, autotags, or predictions, natural language search allows users to search their datasets in plain English. Users can also search by similarity in this view by selecting an image and clicking “Find Similar”.

Source: Scale AI

Autotag is a machine learning feature that adds refined visual similarity scores as metadata to images or objects within a dataset. It starts with an initial set of example items to seed the search.

Source: Scale AI

Through multiple refinement steps, the search is fine-tuned by identifying relevant items from the returned results. Once the search is finalized, Autotag commits the metadata, assigning search scores (the higher, the more relevant) to the top twenty thousand most relevant items in the dataset. These items can then be queried within the Nucleus grid dashboard.

Source: Scale AI

Models

In Nucleus, models represent actual models in the inference pipeline. There are two types of models:

  1. Shell Models: These are empty models without associated artifacts. They are used to upload model predictions into Nucleus, suitable for when inference is done externally and only the results are needed in Nucleus (see the sketch after this list).

  2. Hosted Models: These are real models with associated artifacts (e.g., a trained TensorFlow model). They are used to host models and run inference on a chosen dataset in Nucleus, with predictions automatically linked to the hosted model.
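
A shell-model workflow, sketched with the same Python client (class and method names here are best-effort assumptions to verify against the SDK docs), might look like:

```python
import nucleus

client = nucleus.NucleusClient("YOUR_API_KEY")
dataset = client.get_dataset("ds_abc123")  # placeholder dataset ID

# A shell model: no hosted artifact, just a handle to attach predictions to.
model = client.create_model(name="offline-detector-v2", reference_id="det-v2")

# Predictions computed externally are uploaded and linked to the model.
predictions = [
    nucleus.BoxPrediction(
        label="car", x=120, y=80, width=60, height=40,
        reference_id="frame_0001", confidence=0.93,
    )
]
dataset.upload_predictions(model, predictions)
```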

Source: Scale AI

Under the Models View, a user will see additional information and many graphs. Two of the common ones are a precision-recall curve and a confusion matrix.

Source: Scale AI

A precision-recall curve offers a detailed view of a model's performance, showing the tradeoff between precision and recall.

  • High Recall, Low Precision: The model returns many results, but many are incorrect.

  • High Precision, Low Recall: The model returns few results, but most are correct.

  • High Precision and High Recall: The model returns many results, all correctly labeled.

The balance between precision and recall depends on the application's needs. For example, in cancer detection, high recall is crucial to avoid missing cancerous tumors. In contrast, spam filters prioritize high precision to avoid misclassifying important emails as spam.

Intersection over Union (IoU) is an evaluation metric used in computer vision to confirm label quality. It measures the ratio of the area of overlap between the predicted label and the ground truth label to the area of their union. A ratio closer to 1 indicates a better-trained model.
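
As a concrete illustration, IoU for two axis-aligned boxes reduces to a few lines of generic code (not Scale-specific):

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143
```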

Source: Scale AI

Confusion matrices are a powerful tool for understanding class confusion in models. They compare predicted and actual classifications, helping to identify misclassifications, such as predicting a traffic sign when the ground truth is a train.

By combining confusion matrices with confidence scores, one can prioritize addressing misclassifications where the model is highly confident but incorrect. These confusions may be due to incorrect or missing labels or insufficient data for certain classes, like traffic signs and trains.
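
A minimal way to surface such cases, again as generic code rather than Scale’s tooling, is to collect the misclassified items and sort them by model confidence:

```python
# Generic sketch: surface confident misclassifications worth auditing first.
records = [  # (ground_truth, prediction, confidence); toy data
    ("train", "traffic_sign", 0.97),
    ("traffic_sign", "traffic_sign", 0.91),
    ("train", "train", 0.88),
    ("train", "traffic_sign", 0.64),
]
errors = [r for r in records if r[0] != r[1]]
# The highest-confidence mistakes often point to label errors or data gaps.
for truth, pred, conf in sorted(errors, key=lambda r: -r[2]):
    print(f"predicted {pred!r} for ground truth {truth!r} (confidence {conf:.2f})")
```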

For either of the two graphs, clicking on a data point navigates the user to a filtered view of the images behind that data point.

In the scenario tests view, users can regression-test the models they are using. Regression testing ensures that model quality does not degrade over time or with the addition of more dataset items. A Validate ScenarioTest is used to monitor model performance in key scenarios. It operates on a subset of data (a Slice) and includes various evaluation metrics. Users can compare the model's performance against baseline models or evaluate whether it meets specific thresholds, such as checking if the IoU of a model is greater than 0.8 on the selected data slice.

Source: Scale AI

The item-level performance view shows the performance of each evaluation function, which can be specified by the user. The list under the graph indicates which model ran, which data was used, and the overall metrics of the run.

Jobs

Jobs are specific tasks that are running in the pipeline, and their progress is displayed in this view.

Source: Scale AI

Generative AI Platform

The Scale GenAI Platform is a solution designed to enhance the development, deployment, and optimization of Generative AI applications. It leverages advanced Retrieval Augmented Generation (RAG) pipelines to transform proprietary data into high-quality training datasets and embeddings, supporting the fine-tuning of large language models (LLMs) with both proprietary and expert data to improve performance and reduce latency.
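
At its core, a RAG pipeline retrieves the documents most relevant to a query and injects them into the model’s prompt. The sketch below illustrates the idea with a stand-in embedding function; it is a conceptual outline, not Scale’s implementation:

```python
import numpy as np

def embed(text):
    """Stand-in embedder: hashes words into a fixed-size vector.
    A real pipeline would call a trained embedding model."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

docs = [
    "Q3 revenue grew 12% on new enterprise contracts.",
    "The data engine supports image, video, and lidar labeling.",
]
doc_vecs = np.stack([embed(d) for d in docs])

query = "How did revenue change in Q3?"
scores = doc_vecs @ embed(query)        # cosine similarity (vectors are unit norm)
context = docs[int(np.argmax(scores))]  # retrieve the best-matching document

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to a fine-tuned or off-the-shelf LLM.
```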

Source: Scale AI

The platform includes a range of tools for evaluating and monitoring AI models, ensuring their reliability and accuracy. This includes the ability to compare base models, perform automated and human-in-the-loop benchmarking, and manage test cases with detailed evaluation metrics. These features help in pinpointing model weaknesses and improving accuracy, which is important for applications in various domains, including defense and intelligence.

Source: Scale AI

In terms of deployment, Scale AI's infrastructure allows for the management and monitoring of custom models, with enterprise-grade safety and security built-in. Users can create new deployments, adjust settings, and monitor token consumption and API usage through convenient dashboards. The platform supports both commercial and open-source models, maintaining flexibility and avoiding vendor lock-in.

Source: Scale AI

Security and privacy also apply to the Scale GenAI Platform. It ensures data remains private and secure within virtual private clouds (VPCs) and supports rigorous testing to maintain the integrity of AI applications. By leveraging human-in-the-loop testing and extensive evaluation tools, the platform ensures that AI models are safe, responsible, and effective for various use cases, from employee productivity to customer support and data analysis.

Source: Scale AI

Scale Spellbook is Scale AI’s product intended for developers to build, compare, and deploy large language model apps. Spellbook was announced in November 2022. Its features include scaling CPU and GPU computing, managing model deployments and A/B testing, and monitoring real-time metrics such as uptime, latency, and performance. Spellbook also includes structured testing for ML models through regression tests and model comparisons.

A large language model (LLM) app consolidates prompts, models, and parameters for a specific use case, and users should create separate apps for each use case. Examples include apps for converting text to SQL, generating marketing copy, summarizing tweets, and classifying products.

Source: Scale AI

Prompt templates help users generate text by using structured prompts. These templates include pre-defined prompts with placeholders for specific information. Users can select a template and input the necessary details, enabling the model to create text based on that information.
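
Conceptually, a prompt template is a string with named placeholders that get filled in at request time, as in this simple illustration (not necessarily Spellbook’s exact syntax):

```python
# A prompt template with placeholders, filled in per request.
TEMPLATE = (
    "Summarize the following tweet in one sentence for a {audience} audience:\n"
    "{tweet}"
)

prompt = TEMPLATE.format(
    audience="non-technical",
    tweet="Scale AI raised a $1B Series F at a $13.8B valuation.",
)
# `prompt` is what actually gets sent to the selected model.
print(prompt)
```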

Source: Scale AI

Different versions of the created app can be viewed on the App Variants page. Users can also perform actions such as forking, which copies the variant's settings into a new variant; deploying, which creates a deployment from a variant; and viewing or running evaluations to assess the variant's performance through programmatic, human, or AI evaluation.

Source: Scale AI

While crafting a good prompt and providing context examples can enhance LLM performance, it may not always be sufficient or cost-effective. Fine-tuning the model on a larger set of task-specific examples offers several advantages: it eliminates the need for specific prompts, reduces token costs, improves inference latency by not requiring input examples, and allows the model to learn more deeply from numerous examples. As of September 2024, Spellbook supports fine-tuning on OpenAI models and FLAN-T5.

Source: Scale AI

For fine-tuning a model, two key parameters need to be set: epochs and the learning rate modifier. Epochs determine how many times the model trains on the entire dataset. The learning rate modifier adjusts the recommended learning rate multiplicatively. A smaller modifier (<1) results in slower training but, when combined with more epochs, can enhance training quality compared to using a larger modifier.
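
In other words, the modifier scales a recommended base rate multiplicatively; for example (illustrative numbers only):

```python
base_learning_rate = 1e-4      # hypothetical recommended rate
learning_rate_modifier = 0.5   # <1 trains more slowly but can improve quality
epochs = 4                     # full passes over the fine-tuning dataset

effective_lr = base_learning_rate * learning_rate_modifier  # 5e-5
print(f"training for {epochs} epochs at learning rate {effective_lr}")
```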

Source: Scale AI

Evaluations in Spellbook provide quantitative metrics to determine the best variant for specific use cases. Options include human evaluations and programmatic evaluations. For generative applications, users can employ Scale's human evaluation integrations: they select a variant and a dataset with at least 20 rows, choosing between Scale's global workforce and their internal workforce, and must define the task and evaluation criteria for the human evaluators.

Users may opt for programmatic evaluations for other use cases:

  • Classification: Compares ground truth with model outputs, generating F1 scores and accuracy metrics. Rather than looking only at overall accuracy, the F1 score evaluates a model's predictive ability class by class, combining precision and recall into a single balanced measure of the model's effectiveness (see the sketch after this list).

  • Mauve: Measures distribution similarities for longer generations, requiring a dataset of at least 50 rows.
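
For reference, precision, recall, and F1 can be computed directly from prediction counts, as in this generic sketch:

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 is the harmonic mean of precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Example: 80 correct positives, 10 false alarms, 20 misses.
print(f1_score(80, 10, 20))  # roughly 0.842
```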

Scale Donovan

Source: Scale AI

Scale Donovan is an AI suite for the federal government. Donovan ingests data from cloud, hybrid, and on-prem sources, organizes that data to make it interactive, and enables operators and analysts to ask questions of sensor feeds and map/model data. Donovan then produces courses of action, summary reports, and other actionable insights to help operators achieve mission objectives.

Information Retrieval

Source: Scale AI

Donovan integrates Retrieval-Augmented Generation (RAG), allowing users to use large language models (LLMs) to interact with mission-related information. Donovan has a chat interface that can extract information from documents and translate it using natural language semantics. Because Donovan is model-agnostic, users can easily identify which model works best for a given use case.

Source: Scale AI

Geospatial Chat

Donovan features geospatial chat, combining geographic filtering with LLM capabilities. Users can interact with a map, select specific areas, and pose location-based questions.

Source: Scale AI

Donovan provides responses relevant to the chosen area and pins locations on the map with enriched metadata from citations, offering detailed and contextual information.

Source: Scale AI

Text-to-API

Donovan's text-to-API feature allows for natural language queries to be translated into API requests, facilitating integration with other applications or systems. This enables Donovan to fetch and relay information from connected systems in natural language, streamlining interactions with complex databases and enhancing productivity and decision-making through partnerships with Flashpoint, Strider, and 4DV.
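
A common way to implement this pattern is to have the LLM emit a structured call that the application validates and executes. The sketch below is generic and invented for illustration; the tool names and schema are not Donovan’s interface:

```python
import json

def llm_to_api_call(question):
    """Stand-in for an LLM that maps a natural language question
    to a structured API call (hypothetical schema)."""
    return json.dumps({"endpoint": "search_reports",
                       "params": {"region": "Black Sea", "days": 7}})

def execute(call):
    # The application validates the call and queries the connected system.
    registry = {"search_reports": lambda p: f"3 reports found for {p['region']}"}
    return registry[call["endpoint"]](call["params"])

call = json.loads(llm_to_api_call("What happened in the Black Sea this week?"))
print(execute(call))  # the result would be relayed back in natural language
```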

Report Generation

Users can also generate reports using LLMs, incorporating new data as it becomes available. This feature significantly reduces manual effort, cutting workflow times from hours to minutes and freeing users to focus on cognitive tasks requiring human context and creativity.

Source: Scale AI

Donovan supports customized report generation through templates designed for common information requests or specific operational documents. Once the report’s general information is added, the chosen model acts on that information to generate a draft report or project that users can then refine to fit their needs.

Source: Scale AI

Scale AI frequently experiments with new products to determine market viability, leading to the partial launch or discontinuation of offerings like Synthetic, Document AI, Ecommerce AI, and Chat. This iterative approach allows Scale AI to identify and focus on products that gain traction and meet customer needs effectively.

Scale Evaluation

The state of AI evaluations limits progress due to several key challenges. One of the main issues is the shortage of high-quality, trustworthy datasets that haven't been overfitted, resulting in less reliable evaluation outcomes. Additionally, the lack of effective tools to analyze and iterate on evaluation results further hampers the ability to improve models. This comes alongside inconsistencies in model comparisons and unreliable reporting, making it difficult to draw meaningful insights. Together, these challenges create a bottleneck that restricts advancement in AI development.

According to the company, Scale Evaluation is built to help model developers gain insights into their models, offering detailed analyses of LLM performance and safety across various metrics to support continuous improvement.

Scale Evaluation identifies several key risks associated with LLMs, including the spread of misinformation, where false or misleading information is produced, and unqualified advice on sensitive topics like medical or legal issues that can cause harm. It also highlights concerns about bias, where harmful stereotypes are reinforced, and privacy risks involving the disclosure of personal data. Additionally, Scale Evaluation looks to combat how LLMs can be exploited for cyberattacks and may aid in the acquisition or creation of dangerous substances like bioweapons.

Scale AI has established a reputable evaluation platform through several strategic measures, demonstrated by its selection by the White House to conduct public assessments of AI models from leading developers. Scale AI’s research division, SEAL (Safety, Evaluations, and Alignment Lab), supports model-assisted research, enhancing the platform's evaluation capabilities. The company has trained thousands of red team members in advanced tactics and in-house prompt engineering, facilitating high-quality vulnerability testing.

Source: Scale AI

In May 2024, Scale AI introduced the SEAL Leaderboards, a ranking system for LLMs developed by its Safety, Evaluations, and Alignment Lab (SEAL). These leaderboards provide an unbiased comparison of frontier LLMs using curated, private datasets that cannot be exploited, ensuring that rankings accurately reflect model performance and safety. By covering domains such as coding, instruction following, math, and multilinguality, the SEAL Leaderboards offer a comprehensive evaluation process led by vetted domain experts.

Unlike most public benchmarks, Scale's leaderboards maintain integrity through private datasets and limited access to prevent gaming or overfitting. The platform also emphasizes transparency in its evaluation methods, providing insights beyond just rankings. Scale’s goal is to drive better evaluations and promote responsible AI development by continually updating leaderboards, expanding coverage to new domains, and working with trusted third-party organizations to ensure the quality and rigor of assessments.

Market

Customer

Source: Scale AI

As of September 2024, Scale AI defines its customer base as falling into three segments: generative AI, US government, and enterprise.

Scale AI’s generative AI customers include OpenAI, Nvidia, Cohere, and Adept.

In the enterprise segment, the customer base is split by industry. For example, in the automotive space Scale supports a number of companies, including General Motors’ Cruise, Zoox, Nuro, and other autonomous driving companies that require sizable volumes of labeled camera data. Scale AI’s customers include not only autonomous driving companies but also robotics companies, including Kodiak Trucks, Embark, Skydio, and Toyota Research Institute.

Under the government segment, Scale AI serves the federal government and defense contractors. Key customers include the US Army, the US Air Force, and the Defense Innovation Unit.

Market Size

The rise of AI can be attributed to several key factors, including increased computing power in AI chips, a growing volume of training data, breakthroughs that addressed technological bottlenecks (such as vanishing gradients, a problem whose study helped lead to the transformer architecture), and a decrease in cloud storage and compute costs. With its data labeling and annotation products, Scale AI started off by targeting data annotation. The data collection and labeling market is estimated to reach $17.1 billion by 2030, growing at a CAGR of 28.9% from 2023 to 2030. Since its early days, Scale AI has evolved into a fuller AI infrastructure service provider, ultimately helping companies build production models over time, representing a $27 billion market growing at over a 20% CAGR.

Source: Generational

Competition

Within the data collection and labeling market, Scale AI faces competition from players such as Amazon Mechanical Turk, Labelbox, Appen, and Hive. These competitors also utilize humans to label data for companies that don't have the resources to do it themselves. Given the commoditized nature of the data labeling industry, companies find it challenging to establish unique competitive advantages beyond operational efficiency.

Scale AI’s long-term competitive advantage comes from improving its in-house ML labeling algorithms to make human labeling more automated and cheaper, deriving economies of scale. As Scale AI expands its operations into different domains, the diversity of its datasets plays a crucial role in training these ML models and gives Scale AI a significant edge in data quality and variety. In a 2021 overview of Scale AI’s business, Packy McCormick explained:

“Scale would agree that a human-heavy approach isn’t the right one in the long-term, but it’s crucial to the data flywheel. As Scale’s human teams label data, they’re also training Scale’s labeling models. Over time, the ratio of human-to-machine has decreased; more work is being done by the algorithms. The move to more algorithmic tagging is actually a boon for Scale, which has trained its models on more human labels than nearly anyone in the world. It’s much worse for competitors like Appen, which are more akin to Upwork-for-labelers than an AI company.”

With the introduction of new products, Scale AI has evolved beyond data collection and labeling, attempting to become a comprehensive ML infrastructure company. The established players it faces in this category are much less commoditized. This space primarily follows two archetypes: ML companies and enterprise cloud platforms.

  • ML Companies: Companies like Databricks build ML products on top of a key differentiated wedge. For Databricks in particular, this wedge is its data lakehouse, which stores the data that their AI workflows and model training systems consume. Other companies like this include C3, H2O, and Dataiku.

  • Enterprise Cloud Platforms: Companies like AWS have an ML ecosystem as part of their product line, including everything from Mechanical Turk to label data, S3 and Redshift to store that data, and SageMaker to train ML models on top of that data. Microsoft and Google are building similar platforms on Azure and GCP.

Among these, Scale AI falls in the first category, attempting to build ML tools on top of its wedge of data labeling. However, since Scale AI doesn’t provide its own storage, it relies on external solutions like AWS’s S3, which can make Scale AI's subsequent ML products more expensive compared to AWS’s integrated offerings. If a company wanted to use Scale AI for labeling but SageMaker for model training, there isn’t much Scale AI could do to prevent it without offering competitive features of its own.

Data Collection & Labeling Market

Labelbox: Founded in 2018, Labelbox is a training data platform for machine learning applications. As of September 2024, the company had raised a total of $188 million in funding from investors such as Andreessen Horowitz and Snowpoint Ventures. In January 2022, Labelbox raised a $110 million Series D led by Softbank at an undisclosed valuation. Like Scale AI, Labelbox offers a platform for training data for AI models but differs in its more exclusive focus on machine learning applications.

Hive: Founded in 2013, Hive offers cloud-based AI solutions for understanding content, similar to Scale AI. As of September 2024, Hive has raised a total of $121 million from investors including General Catalyst and 8VC. In April 2021, the company raised a $50 million Series D at a $2 billion valuation. While Scale AI has an emphasis on government and enterprise cloud services as its customer base, Hive promotes prebuilt models for marketplaces, dating apps, and other B2C and peer-to-peer oriented companies. As a result, Hive focuses more on real-time content tagging for moderating user-generated content. Scale AI’s government and enterprise focus likely makes its product more useful for companies developing complex cloud services.

Appen: Founded in 1996, Appen collects and labels content to build and improve AI models. In January 2015, Appen was listed on the Australian Securities Exchange. As of September 2024, Appen was trading at a market capitalization of $170 million. Like Scale AI, Appen focuses on enterprise AI solutions, including extracting information from paperwork, object detection for autonomous vehicles, and various other data types. Appen highlights its partnerships with AWS, Nvidia, and Salesforce. Both Scale AI and Appen have been able to land enterprise and long-term contracts, but the fact that customers use both companies highlights a lack of product differentiation and limited moat.

V7 Darwin: Founded in 2018, V7 Darwin helps collect and label image and video content to improve computer vision models. As of September 2024, V7 Darwin had raised $43 million, with a $33 million Series A raise in November 2022. Scale AI focuses on large-scale, enterprise-level data labeling with a mix of human and automated efforts for high accuracy, while V7 Darwin provides an integrated platform for computer vision projects, ideal for smaller teams and individual data scientists.

ML SaaS Market

Databricks: Founded in 2013, Databricks helps companies build ML products and has a custom data storage solution that its AI workflows and model training systems consume. In February 2024, Databricks raised a funding round, following a $500 million Series I at a $43 billion valuation in September 2023. As of September 2024, the company had raised a total of $4 billion in funding across 12 rounds. In comparison to Scale AI, Databricks' unique selling point is its data lakehouse infrastructure, which serves as the foundation for all its ML products, whereas Scale AI has a broader ML focus.

Humanloop: Founded in 2020, Humanloop helps companies fine-tune LLMs in a simplified way through prompt and response ratings. In July 2022, the company raised a $2.6 million seed round led by Index Ventures, bringing its total funding to $2.7 million as of September 2024. Unlike Scale AI, which focuses on a wide range of data labeling services through an engineer-first API and platform, Humanloop focuses on natural language processing (NLP) models with no-code-first and API-second training solutions, indicating a more narrow and beginner-oriented focus in the AI space. Scale AI’s platform is more robust, offering data labeling for videos and documents, helping the company solve beyond generative text.

AWS Machine Learning Suite: The AWS ML Suite is a collection of more than 27 machine learning services provided by AWS, which competes with Scale AI but is part of Amazon's larger suite of cloud services. Amazon introduced its ML initiatives in 2015. Based on Scale AI’s past partnerships, however, Scale AI can be used alongside AWS or even integrate with it.

C3 AI: Founded in 2009, C3 AI helps companies build custom enterprise AI applications. Its flagship product offers the development, deployment, and operation of AI applications, driving efficiency and cost-effectiveness with a focus on enterprise data management. Scale AI, in contrast, provides customizable solutions tailored to specific needs; its platform can be adapted to various industries and use cases, making it versatile for different AI projects.

Business Model

Scale AI does not publicly disclose its pricing. It has two pricing tiers: one for enterprise clients and one for self-serve users.

Enterprise

Source: Scale AI

Scale AI provides data annotation for the enterprise on a custom pricing basis. Companies pay Scale AI to label data, with prices varying depending on the volume and the data type (image, video, text, 3D LiDAR, etc.). Scale AI labels the data using a labor source of more than 100K contractors and builds in-house algorithms to ensure data quality. Scale AI also automates parts of the labeling process using its own ML algorithms.

Self-Serve Data Engine

Source: Scale AI

For Scale AI’s self-serve data engine, a client can manage and annotate data for ML projects in one place while using its own workforce. Scale AI prices this product on a pay-as-you-go basis by credit card. For annotation, the first 1K labeling units are free, while for data management the first 10K images are at no cost.

Traction

One unverified estimate indicated that Scale’s annualized run rate grew from $290 million in 2022 to $760 million in 2023, up 162% year-over-year. This places it well within the pack of large AI startups: OpenAI brought in $1.3 billion in ARR, while Anthropic brought in $200 million. However, in January 2023, Scale AI laid off 20% of its workforce following excessive hiring in 2021 and 2022. As of September 2024, the company had 900 employees, had completed 13 billion annotations, and had labeled 87 million generative AI data points. The average gross margin of a software company is 75%; Scale’s gross margins are closer to 50-60% due to the heavy service component of data labeling.

Source: Sacra

Scale AI has gone beyond the autonomous vehicle labeling market to pick up large government contracts to label geospatial data. In addition, Scale AI has garnered enterprise contracts with companies like Brex and OpenAI for natural language processing, and in July 2024 it entered a multi-year strategic partnership with AWS to increase generative AI adoption via the AWS Marketplace. The company has ramped up its release of products in recent years, growing what was previously an exclusively annotation-based product line into one that includes model training, collection, and debugging.

Valuation

In May 2024, Scale reported raising a $1 billion Series F round at a $13.8 billion valuation. This fundraising round brought Scale’s total amount of funding to $1.6 billion over eight rounds as of September 2024. The financing is a mix of primary and secondary, led by existing investor Accel with nearly all existing investors including YC, Nat Friedman, Index Ventures, Founders Fund, Coatue, Thrive Capital, Spark Capital, NVIDIA, Tiger Global Management, Greenoaks, and Wellington Management. This round also included new investors Cisco Investments, DFJ Growth, Intel Capital, ServiceNow Ventures, AMD Ventures, WCM, Amazon, Elad Gil, Meta, and Qualcomm Ventures. This follows a $325 million Series E round at a $7.3 billion valuation co-led by Dragoneer, Greenoaks Capital, and Tiger Global in April 2021.

Key Opportunities

Data Labeling for Specific Industries

Scale AI has focused on developing data labeling and annotation services for specific industries including autonomous driving. Acquiring new customers and expanding to new industries is a key opportunity. Scale AI has already proven itself by labeling a variety of data types; in 2018, Scale AI focused on autonomous driving companies such as GM, Cruise, Lyft, Zoox, and nuTonomy.

In 2024, its customers include government agencies like the Department of Defense, marketplaces like Airbnb, fintech companies like Brex, and AI developer OpenAI. Each has very different data labeling needs, but Scale AI has proven it can win contracts and deliver quality service to each of them.

Partnerships

Creating strategic partnerships with large organizations can significantly drive Scale AI's growth by providing access to extensive and diverse datasets, enhancing credibility, expanding market reach, and enabling joint innovation. For example, Scale AI's partnership with Toyota Research Institute has allowed it to access vast amounts of autonomous driving data, improving the accuracy and performance of its data labeling services.

Additionally, collaborations with companies like OpenAI have bolstered its reputation and technological capabilities, enabling it to innovate and develop new AI solutions tailored to industry needs. These partnerships not only enhance Scale AI's operational efficiency but also facilitate global expansion by leveraging the established international presence of these large organizations.

Geographical Expansion

Scale AI generates most of its revenue from the US but has significant opportunities to expand into other regions. The European AI software market is projected to grow to $191 billion by 2026. In China, while much of the AI market growth has been driven by consumer internet giants like Alibaba and ByteDance, traditional sectors are expected to take the lead in the future. By 2030, AI is anticipated to contribute $600 billion annually to the Chinese economy, with automotive, transportation, and logistics—areas where Scale specializes—expected to account for 64% of that growth.


Key Risks

Regulatory Exposure

One key risk to Scale AI is legislation in the EU, such as the General Data Protection Regulation (GDPR) and the AI Act, which requires data collected on EU citizens to be stored in the EU and limits certain types of AI applications. This legislation means Scale AI may not use data collected in the EU in other geographic areas, requiring it to build additional services to ensure compliance. It may also lead to fewer AI applications being built in the EU, where Scale AI’s customers operate.

Competition

Scale AI is expanding to different parts of the ML stack beyond data labeling, including ML model debugging and evaluation. However, there are many more competitors in each of these areas of ML infrastructure, including Databricks, Labelbox Model, and Snorkel Flow. Scale AI’s core differentiator is its lower cost of human-in-the-loop data labeling at scale. Competition may have contributed to major customers, including Samsung, Nvidia, and Airbnb, leaving Scale AI in January 2023. As it expands into other parts of ML infrastructure against stiff competition, Scale AI may not enjoy the same product moat.

Margin Risk

Competition is intensifying in Scale's core data labeling market. As companies encounter financial challenges, the competitive focus may shift from features and efficiency to price, potentially reducing any margin expansion Scale gains from increased use of pre-labeling software. Scale's new products are entering a market dominated by established players and may not offer immediate margin benefits, as the company might need to price them competitively to attract new users.

Summary

Scale AI has established itself in the AI space with a focus on extensive data solutions that help fuel the best models a company can build. The company has expanded into multiple new spaces, and its products have evolved to fit them. Despite making progress and inroads with some sizable companies, the bulk of its business is still based on data labeling. Competition in the ML infrastructure space is fierce, and it puts Scale AI in the crosshairs of AWS, GCP, and Microsoft, which benefit from the economies of scale that come with owning their own data systems. Scale AI’s future success will be determined by its ability to execute in newer aspects of the machine learning lifecycle.

Disclosure: Nothing presented within this article is intended to constitute legal, business, investment or tax advice, and under no circumstances should any information provided herein be used or considered as an offer to sell or a solicitation of an offer to buy an interest in any investment fund managed by Contrary LLC (“Contrary”) nor does such information constitute an offer to provide investment advisory services. Information provided reflects Contrary’s views as of a time, whereby such views are subject to change at any point and Contrary shall not be obligated to provide notice of any change. Companies mentioned in this article may be a representative sample of portfolio companies in which Contrary has invested in which the author believes such companies fit the objective criteria stated in commentary, which do not reflect all investments made by Contrary. No assumptions should be made that investments listed above were or will be profitable. Due to various risks and uncertainties, actual events, results or the actual experience may differ materially from those reflected or contemplated in these statements. Nothing contained in this article may be relied upon as a guarantee or assurance as to the future success of any particular company. Past performance is not indicative of future results. A list of investments made by Contrary (excluding investments for which the issuer has not provided permission for Contrary to disclose publicly, Fund of Fund investments and investments in which total invested capital is no more than $50,000) is available at www.contrary.com/investments.

Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by Contrary. While taken from sources believed to be reliable, Contrary has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Please see www.contrary.com/legal for additional important information.

Authors

Vardan Sawhney

Senior Fellow


Sachin Maini

Editor


© 2024 Contrary Research · All rights reserved
