Why are we starting a 'data' company?

December 25, 2025

4 min read

by Satyam Singh & Sai Niranjan

non-technical
multiple-authors
company

Data systems live and die with the infrastructure that shapes them. Before analytics surface insights, before pipelines orchestrate flows, before an organization can trust a single result, the groundwork must be engineered with care. The software that defines this foundation shapes the entire experience of building, scaling, and understanding modern data.

Yet despite the critical role these systems play in our work, we’ve seen their quality decline a little more each year. Software services often lock users in, making it hard to switch or integrate with other tools. The idea that software should be flexible, easy to update, and reliable over time seems to be fading. It’s a strange reality for something we rely on so heavily.

Our first goal is simple: build infrastructure that earns trust. Tools that stay fast under pressure. Tools that help you understand your data, but stay out of your way. Tools that make debugging, auditing, and navigating massive pipelines feel less like wrestling with a black box and more like working with a clear, predictable partner. Anything less isn’t worth building.

Making data systems transparent, navigable, and collaborative is the principle guiding our journey. With this first post, we want to describe our goals in precise terms and share a bit of the story that led us here.

Where are data systems right now?

Looking back at the history of data systems, it’s striking how many ideas we still rely on were born decades ago. In 1970, Ted Codd introduced the relational model, an abstraction so clean and powerful it reshaped computing. Early pioneers built entire worlds: filesystems, storage engines, query planners, transactional semantics, all without a cloud or a playbook for scaling beyond a single machine. Out of those constraints came systems that felt almost impossibly durable.

Over the decades, new layers arrived: distributed databases, Hadoop, Spark, and then the SaaS explosion, APIs, hosted pipelines, managed workflows. These innovations made “big data” manageable, but also shifted the focus from craftsmanship to convenience. Systems grew larger, not always better, and layers of abstraction piled up faster than teams could understand them.

As the cloud matured, new challenges emerged. Data landed in half a dozen warehouses before anyone asked why. Tools that once felt liberating grew brittle, and debugging became an archaeological exercise. Infrastructure worked, but it didn’t earn trust, and it felt like everyone was quietly fighting the same battles.

Challenges

The decline of data software isn’t a mystery. It’s the result of an ecosystem pulled apart by fragmented infrastructure and warped by late-stage capital that demands infinite returns from companies still running on borrowed time. Two forces shape this landscape more than anything else: the incentive structures that push data vendors toward lock-in instead of craftsmanship, and the technical complexity that makes building reliable data systems harder than it should be. These pressures compound, and they are why we are in this mess today. There are bright spots: Apache DataFusion, for example, shows how an open, reusable query engine can create a virtuous ecosystem. But sustainable, economically viable open source in data remains an unsolved problem.

Our Approach

Our approach is to build a quality-focused company: one that creates a network of experts who help other teams with their data challenges, and that builds and maintains trusted open solutions that serve the wider ecosystem.

Our Commitments

We are committed to building modern software that is performant, secure, and modular. We want to provide clean, powerful ways of handling your data and help bring the latest research into reality.

What to expect from our blogs

Over the coming months we’ll publish concise, engineering-driven posts on query engines, storage and metadata layers, and observability pipelines, with an emphasis on real-world performance, failure modes, and architecture trade-offs.