IT giant Cloudflare has announced Workers AI, a platform that lets developers build full-stack AI applications on Cloudflare’s network. The developer platform will provide affordable inference without the need to manage infrastructure. From startups to enterprises, businesses are looking to augment their services with AI. Workers AI is designed to let developers ship production-ready applications, with use cases including large language models (LLMs), speech-to-text, image classification, sentiment analysis and more.
Matthew Prince, CEO and co-founder of Cloudflare, said, “Workers AI will empower developers to build production-ready AI experiences efficiently and affordably, in days, instead of what typically takes entire teams weeks or even months.”
The IT leader has also announced collaborations with data and AI company Databricks, as well as Microsoft, Hugging Face, and Nvidia. Through these partnerships, Cloudflare will provide access to GPUs running on its hyper-distributed edge network for a low-latency end-user experience. The company’s privacy-first approach is intended to ensure that users’ data is not used for training. Cloudflare also offers a model catalogue to help developers get started quickly.
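To give a sense of what building on the platform looks like, here is a minimal sketch of a Worker calling Workers AI for text generation. It assumes an AI binding named "AI" has been configured for the Worker and that the model identifier shown is available in Cloudflare’s catalogue; the binding interface is declared inline here purely for illustration.

```typescript
// Minimal sketch of a Cloudflare Worker using a Workers AI binding.
// Assumptions: a binding named "AI" is configured, and the model
// "@cf/meta/llama-2-7b-chat-int8" is available in the catalogue.
export interface Env {
  AI: { run(model: string, input: Record<string, unknown>): Promise<unknown> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Inference runs on Cloudflare's network; the developer manages
    // no GPUs or inference infrastructure.
    const result = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
      prompt: "Summarise the benefits of edge inference in one sentence.",
    });
    return Response.json(result);
  },
};
```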
The company also introduced Vectorize, a vector database that lets developers build full-stack AI applications entirely on Cloudflare. With Workers AI and Vectorize, developers no longer have to glue multiple pieces together to add AI and machine learning to their apps; they can do it all on one platform.
Benefiting from Cloudflare’s global network, the database allows vector queries to happen closer to users, reducing latency and overall inference time. It also integrates with the wider AI ecosystem, allowing developers to store embeddings generated with OpenAI and Cohere models.
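As a rough illustration of how that fits together, the sketch below stores and queries embeddings from a Worker. It assumes a Vectorize index is bound to the Worker as "VECTOR_INDEX" and that 1536-dimension embeddings (the size produced by common OpenAI embedding models) are being stored; the placeholder vector and the inline interface are illustrative only.

```typescript
// Minimal sketch of inserting and querying vectors with a Vectorize binding.
// Assumptions: an index bound as "VECTOR_INDEX"; 1536-dimension embeddings.
export interface Env {
  VECTOR_INDEX: {
    insert(
      vectors: { id: string; values: number[]; metadata?: Record<string, string> }[]
    ): Promise<unknown>;
    query(vector: number[], options?: { topK?: number }): Promise<unknown>;
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // An embedding produced elsewhere (OpenAI, Cohere, or Workers AI).
    const embedding: number[] = new Array(1536).fill(0); // placeholder vector

    // Store the embedding alongside an id and optional metadata.
    await env.VECTOR_INDEX.insert([
      { id: "doc-1", values: embedding, metadata: { source: "example" } },
    ]);

    // Retrieve the nearest neighbours of the same vector.
    const matches = await env.VECTOR_INDEX.query(embedding, { topK: 5 });
    return Response.json(matches);
  },
};
```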
Cloudflare is also introducing AI Gateway, which gives developers and C-suite leaders visibility into how money is being spent across AI infrastructure, as well as how many queries are happening and where they come from. AI Gateway’s observability features will help them understand AI traffic, including the number of requests, number of users, cost of running the app, and duration of requests. Additionally, developers can manage costs with caching and rate limiting, giving them more control over how they scale their applications.
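In practice, AI Gateway sits between an application and its AI provider, so requests are routed through a gateway URL rather than the provider’s endpoint directly. The sketch below shows one way that might look for an OpenAI-style chat request; the account ID and gateway name are placeholders, and the exact model and metrics behaviour depend on the developer’s own configuration.

```typescript
// Minimal sketch of routing an OpenAI-compatible request through AI Gateway.
// Assumptions: placeholder ACCOUNT_ID and GATEWAY_NAME; the gateway records
// request counts, users, cost and duration, and can apply caching and
// rate limiting before the call reaches the provider.
const ACCOUNT_ID = "your-account-id";     // placeholder
const GATEWAY_NAME = "your-gateway-name"; // placeholder

async function chatViaGateway(apiKey: string, prompt: string): Promise<unknown> {
  const url =
    `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_NAME}/openai/chat/completions`;
  const response = await fetch(url, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return response.json();
}
```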