Explore this must-read post from TechCrunch
Category: AI, artificial intelligence, Fyxer, training data, vision model
Main takeaway:
For one week this summer, Taylor and her roommate wore GoPro cameras strapped to their foreheads while drawing, sculpting and doing housework. They were training an AI vision model, carefully synchronizing their shots so the system could get multiple angles of the same behavior. It was difficult work in many ways, but they were paid well for it, and it allowed Taylor to spend most of her day making art.
“We would wake up, do our usual routine, then strap cameras to our heads and sync the times together,” she told me. “Then we’d make breakfast and clean the dishes. Then we’d go our separate ways and work on art.”
They were expected to produce five hours of synchronized footage each day, but Taylor soon realized she needed to set aside seven hours a day for the job to leave enough time for breaks and physical recovery.
“It’ll give you a headache,” she said. “You take it off and there’s just a red square on your forehead.”
Taylor, who asked that her last name not be used, was working as a freelance data contributor for Turing, the AI company that put her in touch with TechCrunch. Turing’s goal was not to teach AI how to make oil paintings, but to help it acquire more abstract skills around sequential problem solving and visual reasoning. Unlike a large language model, Turing’s vision model will be trained entirely on video, most of which will be collected directly by Turing.
Along with artists like Taylor, Turing contracts with chefs, construction workers and electricians: anyone who works with their hands. Sudarshan Sivaraman, chief AGI officer at Turing, told TechCrunch that manual collection is the only way to get a sufficiently diverse data set.
“We do this for many different types of work so that we have a variety of data in the pre-training phase,” Sivaraman told TechCrunch. “After we capture all this information, the models will be able to understand how to perform a particular task.”
Turing’s work on vision models is part of a growing shift in how AI companies handle data. Where training sets were once scraped freely from the web or collected from low-paid annotators, companies are now paying big bucks for carefully curated data.
With the raw power of AI already established, companies are looking to proprietary training data as a competitive advantage. Instead of outsourcing the task to contractors, they often do the work themselves.
One example is the email company Fyxer, which uses AI models to sort emails and draft replies.
After some early experimentation, founder Richard Hollingsworth found that the best approach was to use an array of small models with tightly focused training data. Unlike Turing, Fyxer builds on someone else’s foundation model, but the underlying idea is the same.
“We realized that data quality, not quantity, really determines performance,” Hollingsworth told me.
In practice, this has meant some unconventional staffing choices. In the early days, Hollingsworth says, Fyxer’s engineers and managers were sometimes outnumbered four to one by the executive assistants needed to train the model.
“We used a lot of experienced executive assistants, because we needed to train on the fundamentals of whether an email should be responded to,” he told TechCrunch. “It’s a very people-oriented problem. Finding great people is very difficult.”
The pace of data collection never slowed, but over time Hollingsworth became more selective about his data sets, preferring smaller, more tightly curated sets when it came time for post-training. As he puts it, “Data quality, not quantity, really determines performance.”
This is especially true when using synthetic data, which expands the range of possible training scenarios but also magnifies the impact of any flaws in the original data set. On the vision side, Turing estimates that 75% to 80% of its data is synthetic, derived from the original GoPro videos. That makes it all the more important to keep the quality of the original data set as high as possible.
“If the pre-training data itself is not of good quality, then everything you do with the synthetic data will not be of good quality either,” says Sivaraman.
Beyond quality concerns, there is a strong competitive logic behind keeping data collection in-house. For Fyxer, the hard work of data collection is one of the best moats the company has against competitors. As Hollingsworth sees it, anyone can build an open source model into their product, but not everyone can find expert annotators to train it into a working product.
“We think the best way to do this is through data, through building custom models, through high-quality human-led data training,” he told TechCrunch.
Correction: An earlier version of this piece referred to Turing by an incorrect name. TechCrunch regrets this error.
Posted on October 16, 2025 (Unix timestamp 1760658271)
