SenseTime Releases AI Model That Needs 10 Times Less Data to Train
SenseTime has released an open-source AI model called NEO that requires 90% less training data than competing systems. The model comes in two sizes and could make it easier for smaller organizations to train and customize their own AI models.

SenseTime, a Chinese AI company, has released a new model called NEO and opened up its source code. The company built it with researchers at Nanyang Technological University. The unusual thing about NEO is that it trains on 390 million image-text pairs, roughly one-tenth of what similar AI models normally need.
Think of training an AI model like teaching a child to recognize animals. Normally you'd show them thousands of pictures of cats, dogs, and birds along with labels. NEO learns to do the same job with about one-tenth as many labeled pictures.
The company has released two versions of NEO: a smaller one (2 billion parameters) and a larger one (9 billion parameters). The smaller version works on basic computers; the larger one is for companies that need better performance.
Why This Matters
One of the biggest challenges in AI development right now is that training these systems eats up enormous amounts of data and computing power. Getting billions of image-text pairs, checking they match correctly, and processing them all costs money and time. Many organizations don't have access to that kind of data or computing resources.
NEO's reduced data requirements could change that. Smaller companies and organizations with limited budgets could potentially train or customize their own AI models without needing as much data.
SenseTime's Bigger Picture
SenseTime is a major AI player in China. According to 2024 market data, the company holds about 12% of China's large model market and ranks third in the country. Its ModelStudio platform came in second place in a recent industry ranking for AI services in the first half of 2024.
The company has been active lately. In April 2024, its stock price jumped 36% after it unveiled a new model called SenseNova 5.0. That same month, SenseTime raised billions in funding specifically to expand its AI computing infrastructure.
Two Types of Models, Two Uses
The smaller 2-billion-parameter version of NEO is designed for devices with limited computing power — think smartphones, edge devices, or places where you can't rely on powerful servers. The 9-billion-parameter version targets businesses that need higher performance and can spare the computing resources.
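For teams weighing the two sizes, the practical difference shows up at load time. The sketch below assumes a standard Hugging Face-style release with a text-generation interface; the model IDs are hypothetical placeholders, since SenseTime has not published official names, and the real NEO interface presumably accepts images as well.

```python
# Minimal sketch of loading either NEO variant, assuming a standard
# Hugging Face-style release. The model IDs are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick the variant to match the hardware: the 2B model targets modest
# machines, while the 9B model assumes a capable GPU.
model_id = "sensetime/NEO-9B" if torch.cuda.is_available() else "sensetime/NEO-2B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision roughly halves memory use
    device_map="auto",          # spread layers across whatever devices exist
)

prompt = "Describe the scene in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The half-precision and device-map settings shown are the usual knobs for squeezing a larger model onto limited hardware, which is exactly the trade-off the two NEO sizes are aimed at.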
Both versions use the same underlying design principles that make the reduced data requirements possible. SenseTime has not yet detailed how it plans to integrate NEO into its existing products like SenseChat (a conversational AI) or its image generation tools.
What This Means Going Forward
The broader context here is that training giant AI models on ever-bigger datasets raises real questions about cost, availability, and how sustainable the approach is long-term. The field has seen this kind of shift before, when earlier breakthroughs showed that smarter model design could get better results without simply throwing more data at the problem. NEO's approach suggests there may be other ways to improve AI performance besides collecting more training data.
By releasing NEO openly, SenseTime is making it available for anyone to use and study, not just for the company's own products. This is partly a research contribution and partly a business move. In China's competitive AI market, where companies like Baidu, Alibaba, and ByteDance are all building their own large language models, standing out matters. A model that needs less data to work well could appeal to companies operating in regions where data is harder to come by or where regulations limit what data they can use.
For businesses considering whether to build their own AI systems, NEO's lower data needs could make custom AI projects more feasible. A hospital, factory, or insurance company might be able to train a model tailored to their specific work without needing to amass billions of labeled examples first.
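In practice, "training a tailored model" for a hospital or factory would look less like training from scratch and more like parameter-efficient fine-tuning of a released base model. Here is a minimal sketch of one common technique, LoRA, assuming NEO weights load through the standard transformers and peft libraries; the model ID and target module names are hypothetical, not confirmed details of NEO.

```python
# Sketch of parameter-efficient fine-tuning (LoRA), a common way to
# customize a base model on domain data without retraining it fully.
# The model ID and target module names are hypothetical placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("sensetime/NEO-2B")  # hypothetical ID

config = LoraConfig(
    r=8,                                  # rank of the low-rank adapters
    lora_alpha=16,                        # scaling applied to adapter updates
    target_modules=["q_proj", "v_proj"],  # typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
# From here, the adapted model trains with a standard transformers Trainer
# loop on the organization's own labeled examples.
```

Because only a small fraction of weights are updated, this kind of customization needs far less data and compute than full training, which is where NEO's lower baseline requirements would compound the savings.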
What Organizations Should Know
Even 390 million image-text pairs is still a substantial amount of data. Companies interested in using NEO will need decent data pipelines: systems to organize images, match them with accurate descriptions, and check the quality of that matching. The smaller 2B model and larger 9B model will perform differently depending on what job you're asking them to do, so real-world testing would be necessary.
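As one illustration of what "checking the quality of that matching" involves, a common technique is to score each image-caption pair with an off-the-shelf CLIP model and drop weak matches. The sketch below is a generic example of that filter, not part of NEO's pipeline, and the 0.25 cutoff is an illustrative assumption.

```python
# Sketch of a basic image-caption quality filter using an off-the-shelf
# CLIP model: keep only pairs whose image and text embeddings agree.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def filter_pairs(paths, captions, threshold=0.25):
    """Return only (path, caption) pairs whose image-text cosine
    similarity clears the threshold. The 0.25 cutoff is illustrative."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # Normalize the projected embeddings, then take the dot product of
    # each matched pair to get its cosine similarity.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    scores = (img * txt).sum(dim=-1).tolist()
    return [(p, c) for p, c, s in zip(paths, captions, scores) if s >= threshold]
```

Filtering of roughly this shape is standard practice when assembling image-text training sets, whatever model ultimately consumes them.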
If NEO's approach holds up in the real world — across different industries and uses — other AI teams and companies may adopt the same techniques. That could shift how the whole field approaches building these systems.


