
For far too long, businesses have grappled with the limitations of traditional customer service and internal communication. Think about your last call to book an appointment or track a delivery. You navigate frustrating IVR menus, wait on hold, and finally, when you reach a human, you might find yourself repeating information you’ve already provided. This is inefficient and a drain on both the customer’s patience and the company’s resources.
Cliqq AI, a Voice AI company co-founded by Vikramaditya (Vikram) Shekhar and Kowshik Chilamkurthy, is building a product that helps businesses answer their clients instantly, resolve queries without a hitch, and handle routine tasks with the warmth and efficiency of a human, but with the tireless precision of a machine.
Their mission? To build 10X voice agents that fundamentally change how work gets done, empowering field and factory operations, delivering personalization at scale, and collapsing multiple workflows into a single, effortless call.
Vikramaditya Shekhar, co-founder of Cliqq AI, vividly paints this picture.
He says, "There are a lot of transactional calls that happen in business. You’re booking an appointment, you’re trying to resolve a quick query. Now that is you talking, just for getting something done, right? You’re not really talking to build a relationship or get insights. These are the lowest hanging fruits for voice agentic systems, tasks that are repetitive, predictable, and don't require the nuanced empathy of a human."
Vikram’s Journey From Consulting to Building Cutting-Edge AI
Born and raised in Bokaro, a steel city in Jharkhand, Vikram’s early focus on education led him to pursue computer science engineering at BIT, Mesra. After a stint at Goldman Sachs, where he developed an interest in finance, Vikram pursued an MBA at IIM-A.
This led him to a career in consulting, first with a US-based financial services firm, then with Booz & Company, and finally Bain. Across these roles, he gained invaluable global exposure and worked extensively with tech companies, including IT services and SaaS firms.
It was during his final projects at Bain in 2023 that the transformative power of Generative AI truly captivated him. He realized the best way to immerse himself in this burgeoning field was to dive in, working with “unreal use cases.” This led him to a period of AI consulting, focusing on the practicalities of moving AI from proof-of-concept to deployment, a challenge the industry was only beginning to overcome.
The pivot from consulting to product building was a natural evolution.
Vikram says, "Consulting is itself services-oriented. You do want to productize stuff and really build something that is world-class and can stand on its own by just focusing on that specific product."
The Voice AI Revolution
Why now? Why is 2025 the year of voice AI, after years of clunky chatbots and robotic voices? Vikram points to a confluence of technological breakthroughs that have transformed the landscape in just the past six to eight months.
"Last six months, I think, have been amazing for voice AI," says Vikram.
He highlights three critical advancements:
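1. Speech-to-Text (STT) Accuracy: The ability of AI to accurately recognize spoken words has skyrocketed. “The word error rate in the state of the art case has moved from like maybe 90 to 97-98%,” he explains. Strictly speaking, the figures he quotes describe word accuracy rather than error rate: accuracy rising from about 90% to 97-98% corresponds to the word error rate falling from about 10% to 2-3%. This means AI can now understand human speech with minimal mistakes, eliminating the frustrating need for repetition that plagued earlier systems.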
2. Powerful, Smaller LLMs: Large Language Models (LLMs) are not only becoming more sophisticated but also more efficient. Vikram cites examples like GPT-4o mini, which, despite its smaller size, is a “workhorse model.” This means that the “brain” of the voice agent can process information faster and more intelligently, leading to more natural and coherent conversations.
3. Natural-Sounding Text-to-Speech (TTS): The robotic, monotone voices of the past are rapidly becoming obsolete. “TTS has become so natural now,” Vikram enthuses. This is crucial because the voice is the first point of interaction, and a natural-sounding agent significantly enhances user experience. The progress is evident even in Indic languages, where high-quality, natural-sounding options are now abundant.
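For readers unfamiliar with the word error rate (WER) metric mentioned in the first point, here is a minimal Python sketch of how it is commonly computed: the word-level edit distance between what was said and what the system transcribed, divided by the number of words actually said. The sample sentences are made up for illustration, and nothing here reflects how Cliqq AI or any particular STT vendor measures accuracy.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (invented): one word out of ten is misrecognized.
reference = "i would like to book an appointment for friday morning"
hypothesis = "i would like to book an appointment for friday mourning"

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, word accuracy: {1 - wer:.0%}")  # WER: 10%, word accuracy: 90%
```

At 90% accuracy, roughly one word in ten is wrong, which on a phone call usually means the caller has to repeat themselves. At 97-98%, that drops to one word in forty or fifty.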
These improvements in the core components of voice AI (STT, LLMs, and TTS) have had a profound impact. As Vikram observes, “Because this whole naturalness improves, the agent starts sounding intelligent, I think we are seeing longer and longer conversations possible now.”
This increased engagement means voice AI can handle a wider range of tasks, moving beyond simple, short interactions to more complex and extended dialogues.
Beyond these core advancements, innovations in noise cancellation, turn-taking, and graceful interruption management further enhance the naturalness of these interactions. The entire voice AI pipeline is evolving at an unprecedented pace, making it an incredibly exciting time for companies like Cliqq AI, who act as an “orchestration layer,” benefiting from every state-of-the-art improvement.
Cliqq AI's ICP Sweet Spot
So, who stands to benefit most from Cliqq AI’s transformative technology? Vikram describes their ideal customer profile as “medium-sized businesses, which probably have somewhere around five to maybe 50 staff in either call centres or internal staff that are calling.”
This “sweet spot” is where Cliqq AI can deliver maximum impact.
Why this specific range? Vikram explains that businesses in this segment are large enough to realize significant benefits from automation but also agile enough to adopt new technologies quickly. Beyond roughly 50 calling staff, the complexities of larger organizations, such as multilingual requirements, extensive legacy system integrations, and longer budget and planning cycles, tend to slow down adoption.
While Cliqq AI can work with larger enterprises on specific workflows, their greatest impact and fastest traction are found among businesses with 5 to 50 calling staff.
Cliqq AI operates across geographies, with core markets in the US and India, and early POCs in the UAE and parts of Southeast Asia. Their offerings primarily fall into two categories: Voice AI over Telephony and Interactive Voice Experiences.
1. Voice AI over Telephony (80%):
This is the core of their business, enabling businesses to automate call operations. Use cases include:
- Customer Support: Handling simple queries, providing information, and routing complex issues to human agents.
- Field Sales: Qualifying leads, setting appointments, and conducting initial verifications.
- Collections: Sending payment reminders and facilitating payment arrangements.
- Coaching: Providing automated feedback and training for internal teams.
2. Interactive Voice Experiences (20%):
This caters to more autonomous and longer-form interactive discussions, particularly in recruitment. Cliqq AI offers AI interviews for both students and companies, serving as a first-round screening tool.
These interviews can involve dynamic questioning based on candidate responses, including technical elements like whiteboard discussions or coding challenges.
Vikram emphasizes that the goal is not to eliminate human jobs entirely but to augment them.
"The idea is never to probably eliminate everyone. There will always be humans for some very specific cases, complex use cases that won't be easily sorted by AI. Plus there are cases that really need empathy, which AI cannot deliver to you."
Behind the Scenes of Cliqq AI’s Orchestration Layer & Intelligent Agents
1. The Orchestration Module (80-90% of effort):
This is the engine that seamlessly integrates all the disparate components of a voice AI pipeline. Imagine an intricate dance between various AI models:
- STT (Speech-to-Text): Converts spoken words into text.
- LLM (Large Language Model): Processes the text, understands the intent, and generates a response.
- TTS (Text-to-Speech): Converts the LLM’s text response back into natural-sounding audio.
- Supporting Models: Additional ML models handle crucial elements like turn-taking, interruption management, and noise cancellation.
The challenge lies in making these six distinct models, each with its own code and data handling mechanisms, “behave nicely with each other.” Cliqq AI’s orchestration layer provides the “model swapability” to easily integrate and switch between open-source and closed-source models for each component.
This flexibility allows them to balance critical parameters like cost, accuracy (or throughput), and latency, offering a bespoke optimization for each client’s unique needs.
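The “model swapability” idea can be illustrated with a small Python sketch. It is not Cliqq AI’s implementation: every class and method name below is hypothetical, the components are toy stand-ins rather than real models, and the supporting models (turn-taking, interruption management, noise cancellation) are left out for brevity. The point is only that when each stage sits behind a narrow interface, one stage can be replaced without touching the others.

```python
from dataclasses import dataclass
from typing import Protocol


# Narrow interfaces: any component with a matching method can be plugged in.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Toy stand-ins (hypothetical). Real ones would wrap a hosted API
# or a locally served open-source model.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the audio bytes are already text


class RuleBasedLLM:
    def reply(self, transcript: str) -> str:
        if "appointment" in transcript.lower():
            return "Sure, which day works for you?"
        return "Could you tell me a bit more?"


class ShoutingLLM:
    def reply(self, transcript: str) -> str:
        return transcript.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # pretend the text bytes are audio


@dataclass
class VoicePipeline:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio_in: bytes) -> bytes:
        """One caller turn: audio in, transcript, response text, audio out."""
        transcript = self.stt.transcribe(audio_in)
        response = self.llm.reply(transcript)
        return self.tts.synthesize(response)


pipeline = VoicePipeline(stt=EchoSTT(), llm=RuleBasedLLM(), tts=BytesTTS())
print(pipeline.handle_turn(b"I need an appointment"))

# Swapping one component leaves the rest of the pipeline untouched.
pipeline.llm = ShoutingLLM()
print(pipeline.handle_turn(b"I need an appointment"))
```

In a real deployment, the choice of which component to plug in for a given client would presumably be driven by the cost, accuracy, and latency trade-offs described above.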
2. Building Efficient Voice Agents:
This module focuses on the “brain” of the operation – the LLM. The goal here is to ensure the LLM is not only intelligent but also efficient, avoiding unnecessary processing time. This involves:
- Context Engineering: Providing the LLM with the right information and background to understand the conversation.
- Prompting Behavior: Crafting effective prompts to guide the LLM’s responses.
- Guardrails: Implementing mechanisms to ensure the agent stays on track and adheres to desired behavior.
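As a rough illustration of how these three pieces fit together, the Python sketch below assembles a compact prompt from call context and then checks a draft reply against a simple guardrail before it would be spoken. All names, the prompt wording, the blocked-phrase list, and the sample customer data are assumptions made for this example. Production guardrails are typically far more sophisticated than a phrase match.

```python
from dataclasses import dataclass, field


@dataclass
class CallContext:
    """The background the agent needs for this call (hypothetical fields)."""
    customer_name: str
    open_order_id: str
    allowed_topics: list[str] = field(
        default_factory=lambda: ["delivery", "appointment"]
    )


def build_prompt(ctx: CallContext, caller_utterance: str) -> str:
    """Context engineering and prompting: pass only what this call needs."""
    return (
        "You are a phone agent. Keep replies under two sentences.\n"
        f"Customer: {ctx.customer_name}. Open order: {ctx.open_order_id}.\n"
        f"Only discuss: {', '.join(ctx.allowed_topics)}.\n"
        f"Caller said: {caller_utterance}"
    )


# Illustrative guardrail: phrases the agent must never say on its own.
BLOCKED_PHRASES = ("refund approved", "legal advice")
FALLBACK_REPLY = "Let me connect you with a colleague who can help."


def apply_guardrails(draft_reply: str) -> str:
    """Replace a draft reply with a safe fallback if it crosses a line."""
    lowered = draft_reply.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return FALLBACK_REPLY
    return draft_reply


ctx = CallContext(customer_name="Asha", open_order_id="ORD-1042")
print(build_prompt(ctx, "Where is my delivery?"))
print(apply_guardrails("Your refund approved, goodbye."))
print(apply_guardrails("Your delivery arrives tomorrow."))
```

Keeping the prompt short matters for a voice agent in particular: fewer tokens to process means less delay before the caller hears a reply.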
Cliqq AI: From Discovery to Deployment
For a business considering Cliqq AI, the journey from initial interest to full deployment is a meticulously guided process.
1. ROI-First Assessment: The first and most crucial step is to determine the business case. “The first step would be to identify if you need it, and if you can make the business case,” Vikram emphasizes.
Cliqq AI works backward from the client’s budget and desired ROI, configuring the AI pipeline with the optimal balance of models (open-source for cost-efficiency, sophisticated for high-accuracy use cases) to meet specific needs.
2. Rapid Prototyping and Workshops: Once the business case is clear, Cliqq AI quickly moves to “live prototyping.” They take metadata from the client’s existing systems (e.g., CRM) and simulate a real-world context using synthetic data. This allows the client to “play around with it” for a few days, calling the prototype directly or experiencing inbound calls.
Following this, an in-depth workshop is conducted to gather feedback. Vikram highlights a key insight: “People miss out on edge cases. It might be so intuitive to them that they don’t document.”
These workshops are crucial for identifying unforeseen scenarios and fully mapping out workflows. Based on this, the agent is fine-tuned, and integration strategies (e.g., integrating with an Excel sheet for immediate impact or a full API integration for scalability) are discussed.
3. Iterative Testing with Live Calls: The next phase involves deploying the agent in a controlled environment for “at least 5,000 calls.” During this period, Cliqq AI employs a “conservative fallback,” where human agents are looped in if the AI encounters a challenging situation or seems to be going “down a sad path.”
Automated monitoring tracks KPIs such as resolution time, user frustration, and operational issues like latency.
“We would then listen to calls and then listen to the transcripts, and then kind of follow through and try and fix it,” Vikram explains. This iterative process, involving continuous tweaking of the agent during the 5,000 calls, aims to “iron off” repetitive issues and solve for a vast number of edge cases.
4. Full Deployment and Ongoing Support: Once the evaluation period is complete and the agent demonstrates consistent performance, Cliqq AI helps the client build a clear business case for management approval.
Then, the full deployment begins, with Cliqq AI assisting with all aspects, including integration with existing systems like CRMs. While clients’ IT and engineering teams are involved for authorization and security, Cliqq AI ensures a smooth and secure transition.
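The “conservative fallback” and automated KPI monitoring from the third step can be sketched in a few lines of Python. This is an assumption-heavy illustration, not Cliqq AI’s logic: the signals (transcription confidence, caller repeats), the thresholds, and all names are hypothetical, chosen only to show a rule that hands off to a human early, plus a summary of the kinds of metrics mentioned above.

```python
from dataclasses import dataclass


@dataclass
class TurnSignal:
    stt_confidence: float   # 0..1, hypothetical score from the STT component
    caller_repeated: bool   # the caller had to repeat themselves
    latency_ms: int         # time taken to produce the agent's reply


@dataclass
class CallRecord:
    duration_s: float
    resolved_by_ai: bool
    handed_off: bool
    turns: list[TurnSignal]


def should_hand_off(turns: list[TurnSignal],
                    min_confidence: float = 0.6,
                    max_repeats: int = 2) -> bool:
    """Conservative fallback: escalate to a human at early signs of trouble."""
    if not turns:
        return False
    repeats = sum(t.caller_repeated for t in turns)
    return turns[-1].stt_confidence < min_confidence or repeats >= max_repeats


def summarize(calls: list[CallRecord]) -> dict[str, float]:
    """Aggregate simple KPIs over a non-empty batch of calls."""
    latencies = [t.latency_ms for c in calls for t in c.turns]
    return {
        "ai_resolution_rate": sum(c.resolved_by_ai for c in calls) / len(calls),
        "handoff_rate": sum(c.handed_off for c in calls) / len(calls),
        "avg_duration_s": sum(c.duration_s for c in calls) / len(calls),
        "avg_latency_ms": sum(latencies) / len(latencies),
    }


# Two invented calls: one resolved by the agent, one handed to a human.
smooth = [TurnSignal(0.95, False, 400)]
rough = [TurnSignal(0.90, True, 500), TurnSignal(0.50, True, 700)]

calls = [
    CallRecord(60.0, True, should_hand_off(smooth), smooth),
    CallRecord(120.0, False, should_hand_off(rough), rough),
]
print(summarize(calls))
```

Starting with strict thresholds and relaxing them as the agent proves itself over the evaluation calls is one way to keep the fallback conservative early on.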
Should You Build or Buy a Voice AI Agent?
When every company today is urged to become a “tech company,” the build versus buy dilemma is ever-present. For voice AI, Vikram offers a compelling argument for buying from a specialized provider like Cliqq AI.
He says that if a company’s core revenue stream is contact center operations, then building an in-house voice AI solution might make sense. However, for the vast majority of businesses where voice interaction is a backend function, or a means to an end, buying is the more strategic choice.
"The moat of your business is more your operations, your brand, your people," Vikram explains, suggesting that focusing on these core strengths while leveraging external expertise for AI is the smarter play.
The rapid evolution of voice AI further strengthens the “buy” argument. Building an in-house solution means committing a dedicated team to continuous monitoring, drift detection, and integration of new models, and that commitment often wanes once the initial development phase ends: internal IT teams tend to shift into “maintenance mode,” which stalls the AI’s continued improvement.
By partnering with Cliqq AI, businesses gain access to a team exclusively focused on staying at the forefront of voice AI innovation. They benefit from continuous updates, optimizations, and the integration of the latest state-of-the-art models without having to dedicate their own engineering resources to a non-core function.
Furthermore, Cliqq AI provides a flexible alternative to highly verticalized solutions. While a vertical solution might offer deep expertise in a niche (e.g., banking collections), it often comes with an “opinionated view of the workflows.” For companies that prefer to embed their existing best practices and workflows, a horizontal platform like Cliqq AI offers greater adaptability and control.