Pratyush Kumar has a habit of working on problems before the world is ready to care about them. When he and Vivek Raghavan started AI4Bharat in 2019, the idea of building AI specifically for Indian languages was considered niche at best, quixotic at worst. The dominant view in the AI research community was that multilingual models trained on large English corpora would eventually handle other languages well enough. Kumar disagreed, and the five years since have proved him right in ways that reshaped the Indian AI landscape.
The AI4Bharat Years
AI4Bharat began as an academic initiative at IIT Madras, where Kumar was a faculty member. The project's initial goal was modest: build open-source datasets and tools for Indian language AI that researchers could use as a foundation for their work. What it became was something much larger — a community of researchers, engineers, and language experts who collectively built the most comprehensive open-source Indic AI resource in the world.
The work was painstaking and unglamorous. Collecting and cleaning text data in 22 languages. Recording speech samples from speakers of dozens of dialects. Building evaluation benchmarks that could measure AI performance on Indian language tasks. "We spent years building the foundation that nobody else wanted to build," Kumar says. "The exciting work — the model training, the product development — depends entirely on having good data and good evaluation tools. Without that foundation, you are building on sand."
The Decision to Start Sarvam
The release of ChatGPT in late 2022 was a catalyst. The product demonstrated, more clearly than any research paper, that large language models could be genuinely useful to ordinary people. It also demonstrated that those models were not yet useful to ordinary Indians — the Hindi responses were stilted, the cultural knowledge was thin, and the voice interface did not work for Indian accents. "We had spent years building the research foundation," Kumar says. "We had the data, we had the models, we had the evaluation tools. What we did not have was the product. Starting Sarvam was about closing that gap."
Building the Team
One of the most important decisions Kumar and Raghavan made in Sarvam's early days was to build a team that combined research depth with product experience. The AI4Bharat network provided the research talent. But building a product that millions of people would actually use required engineers who had built products at scale, product managers who understood Indian user behaviour, and business development professionals who knew how to sell to Indian enterprises.
The OpenHathi Release
The decision to open-source OpenHathi was not universally popular within the company. Some team members argued that the model represented years of work and significant competitive advantage. Kumar's counter-argument was strategic. "The Indian AI ecosystem needs to grow for Sarvam to succeed. If we open-source, we accelerate the ecosystem, and we benefit from the community contributions and the goodwill that generates." Within weeks of the release, hundreds of developers had downloaded the model and begun building applications. Several of these applications became Sarvam enterprise customers.
What Comes Next
Kumar is clear about the direction. "The next frontier is multimodal AI — systems that can understand and generate not just text but speech, images, and video in Indian languages." He is also clear about what he sees as the most important long-term challenge: building AI that is not just linguistically Indian but culturally Indian. "Language is the surface. Underneath language is culture — the values, the assumptions, the ways of thinking that shape how people communicate. Building AI that genuinely understands Indian culture is a much harder problem than building AI that speaks Indian languages. But it is the problem that matters most."