We all know how important data quality is. That’s where the timeless phrase “garbage in, garbage out” came from. As the pace of data creation accelerates, and as that data gets used across a company to make business decisions, deliver quality customer experiences and drive machine learning models, making sure it is accurate has become ever more crucial.
Lightup wants to put data quality in the spotlight, and today the company announced a $9 million Series A investment to continue building its data quality solution. The company has now raised over $20 million, according to Crunchbase data.
Lightup builds on the work that CEO and co-founder Manu Bansal began with his first startup, Uhana, which was acquired by VMware in 2019, shortly before he launched his latest venture. He said companies building data pipelines didn’t have much choice when it came to data quality beyond building it themselves, and he saw his next opportunity.
“We were building data pipelines with Kafka and Spark and whatnot, and all the tools that we could find to solve the data quality problem were built for spreadsheets worth of data, which were being processed interactively, as opposed to millions of events per second moving through an automated pipeline,” Bansal told TechCrunch.
As he and his co-founders looked for a solution and spoke to other companies, they realized they had stumbled onto an unmet need and decided to build a product to fill it. “We looked around for solutions and all we heard was that people were just building their own ad hoc solutions when they could afford to do it. And others were just running on prayers. And we said, ‘yeah, this needs solving.’”
He said their secret sauce is leaving the data in place, regardless of whether it’s stored in Snowflake, Databricks or another data storage solution. This sharply reduces the overhead of checking the data because Lightup isn’t making a copy of it, and it takes advantage of resources that are already in place from the service provider.
“We leverage the compute fabric of those scalable data warehouses and data lakehouses instead of moving data. So the big difference for us is that traditional systems would take out data from where it lives, and we leave it in place,” Bansal said.
Lightup then looks for anomalies in the data that would suggest a problem, creates a report and delivers it to a human, who makes the final data quality decisions. The product doesn’t have write access to the data, and that’s by design.
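To make the in-place idea concrete, here is a minimal sketch of how such a check might look. It is not Lightup’s implementation: it assumes a hypothetical `events` table with a `day` column, uses SQLite as a stand-in for a warehouse like Snowflake or Databricks, and swaps in a simple z-score test for whatever anomaly detection the product actually uses. The aggregation runs inside the database, only a small summary leaves it, and the connection is switched to read-only.

```python
import sqlite3
from statistics import mean, stdev


def build_demo_db():
    """Create a stand-in 'warehouse' with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (day TEXT, payload TEXT)")
    # Row counts per day; the last day drops sharply (illustrative data).
    counts = [100, 102, 98, 101, 99, 100, 10]
    rows = [
        (f"2024-01-{d:02d}", "x")
        for d, n in enumerate(counts, start=1)
        for _ in range(n)
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    # From here on the connection rejects writes, mirroring the
    # "no write access by design" stance described above.
    conn.execute("PRAGMA query_only = ON")
    return conn


def daily_row_counts(conn, table):
    """Aggregate inside the database; only one small row per day comes back.

    The table name is interpolated into the SQL, so it must come from
    trusted configuration, never from user input.
    """
    sql = f"SELECT day, COUNT(*) FROM {table} GROUP BY day ORDER BY day"
    return conn.execute(sql).fetchall()


def flag_anomalies(series, threshold=3.0, min_history=5):
    """Flag points more than `threshold` standard deviations away
    from the mean of all earlier points."""
    flagged = []
    for i in range(min_history, len(series)):
        day, value = series[i]
        history = [v for _, v in series[:i]]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is suspicious.
            if value != mu:
                flagged.append((day, value))
        elif abs(value - mu) / sigma > threshold:
            flagged.append((day, value))
    return flagged


if __name__ == "__main__":
    conn = build_demo_db()
    report = flag_anomalies(daily_row_counts(conn, "events"))
    # The report goes to a person; nothing is written back to the data.
    for day, count in report:
        print(f"Possible data quality issue on {day}: {count} rows")
```

The design point the sketch tries to capture is that the raw rows never leave the store. Only the per-day counts do, and the decision about what to do with a flagged day stays with a human.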
The company launched in 2019 and it took a couple of years to build the solution. Today, the startup has 20 employees with plans to hire more with the new money. Bansal believes that being a distributed company is helping him build a more diverse workforce.
“We love that because we’re distributed, we’re seeing people from different time zones, different cultures, and different ways of approaching problems. And if you look around the team right now, you will find some very impressive diversity already,” he said.
The $9 million investment was led by Andreessen Horowitz and Newland Ventures with participation from Spectrum 28 Capital, Shasta Ventures, Vela Partners and Incubate Fund. The deal closed at the end of March.