April 12, 2024

Cloudflare challenges AWS by bringing serverless AI to the edge

Cloudflare, the leading connectivity cloud company, recently announced the general availability of its Workers AI platform, as well as several new capabilities aimed at simplifying the way developers build and deploy AI applications. This announcement represents a significant step forward in Cloudflare’s efforts to democratize AI and make it more accessible to developers around the world.

After months of open beta, Cloudflare’s Workers AI platform has now reached general availability status. This means that the service has undergone rigorous testing and improvements to ensure greater reliability and performance.

Cloudflare’s Workers AI is an inference platform that allows developers to run machine learning models on Cloudflare’s global network with just a few lines of code. It provides a serverless and scalable solution for GPU-accelerated AI inference, allowing developers to use pre-trained models for tasks such as text generation, image recognition, and speech recognition without the need to manage infrastructure or GPUs.

With Workers AI, inference runs on Cloudflare’s distributed infrastructure, so requests are served from GPUs close to end users and latency stays low.
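
As an illustration, here is a minimal sketch of what calling Workers AI looks like over its REST API, using Python’s requests library. The account ID and API token are placeholders, and the model name follows Cloudflare’s published catalog; verify the exact response shape against the current documentation.

```python
import requests

ACCOUNT_ID = "your-account-id"   # placeholder
API_TOKEN = "your-api-token"     # placeholder; needs Workers AI permissions

def run(model: str, payload: dict) -> dict:
    """POST an inference request to a Workers AI model."""
    url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{model}"
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Text generation with one of the hosted chat models.
out = run("@cf/meta/llama-2-7b-chat-int8", {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain serverless inference in one sentence."},
    ],
})
print(out["result"]["response"])
```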

Cloudflare currently has GPUs operational in more than 150 of its data center locations, with plans to expand to nearly all of its 300+ data centers worldwide by the end of 2024.

Cloudflare is expanding its partnership with Hugging Face, now offering a curated list of popular open source models suited to serverless GPU inference across its global network. Developers can deploy models from Hugging Face with one click, making Cloudflare one of the few providers to offer serverless GPU inference for Hugging Face models.

Currently, there are 14 curated Hugging Face models optimized for Cloudflare’s serverless inference platform, supporting tasks such as text generation, embeddings, and sentence similarity. Developers simply choose a model on Hugging Face, click “Deploy to Cloudflare Workers AI,” and the model is immediately available across Cloudflare’s global network of more than 150 cities with deployed GPUs.

Developers can communicate with LLMs such as Mistral, Llama 2 and others via a simple REST API. They can also use advanced techniques, such as retrieval-augmented generation, to create domain-specific chatbots that access contextual data.
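
To make the retrieval-augmented generation idea concrete, here is a hedged sketch that reuses the run() helper from the earlier snippet. The embedding and chat model names follow Cloudflare’s catalog; the “document store” is just an in-memory list for illustration.

```python
import math

# Toy knowledge base standing in for a real vector store.
DOCS = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
]

def embed(texts):
    # Workers AI embedding model; returns one vector per input text.
    out = run("@cf/baai/bge-base-en-v1.5", {"text": texts})
    return out["result"]["data"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def answer(question: str) -> str:
    doc_vecs = embed(DOCS)
    q_vec = embed([question])[0]
    # Pick the most similar document and inject it as context for the LLM.
    best = max(range(len(DOCS)), key=lambda i: cosine(q_vec, doc_vecs[i]))
    out = run("@cf/mistral/mistral-7b-instruct-v0.1", {
        "messages": [
            {"role": "system", "content": f"Answer using this context: {DOCS[best]}"},
            {"role": "user", "content": question},
        ],
    })
    return out["result"]["response"]

print(answer("How long do I have to return an item?"))
```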

One of the key benefits of Workers AI is its serverless nature, which allows developers to pay only for the resources they consume, without the need to manage or scale GPUs or infrastructure. This pay-as-you-go model makes AI inference more affordable and accessible, especially for smaller organizations and startups.

As part of the GA release, Cloudflare has introduced several performance and reliability improvements to Workers AI. Load balancing has been upgraded so that requests can be routed to more GPUs across Cloudflare’s global network: if a request would otherwise queue in one location, it can be seamlessly routed to another city, reducing latency and improving overall performance.

Additionally, Cloudflare has increased the rate limits for most major language models to 300 requests per minute, up from 50 requests per minute during the beta phase. Smaller models now have rate limits ranging from 1,500 to 3,000 requests per minute, further improving the platform’s scalability and responsiveness.
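
Clients that approach these per-minute limits still need to handle throttling gracefully. Below is a small client-side sketch, building on the run() helper above, that retries on HTTP 429 with exponential backoff; the retry counts and delays are illustrative, not Cloudflare guidance.

```python
import time
import requests

def run_with_backoff(model: str, payload: dict, retries: int = 5) -> dict:
    delay = 1.0
    for attempt in range(retries):
        try:
            return run(model, payload)
        except requests.HTTPError as err:
            # Only retry rate-limit responses; re-raise anything else.
            if err.response.status_code != 429 or attempt == retries - 1:
                raise
            time.sleep(delay)  # wait before retrying the throttled request
            delay *= 2
    raise RuntimeError("unreachable")
```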

One of the most requested features for Workers AI has been support for fine-tuned inference. Cloudflare has taken a step in this direction by enabling Bring Your Own Low-Rank Adaptation (BYO LoRA). The LoRA technique adapts a small subset of a model’s parameters to a specific task, rather than updating all of them as a full fine-tune would.

Cloudflare’s support for custom LoRA weights and adapters enables efficient multi-tenant model hosting, allowing customers to deploy and access fine-tuned models built on their own datasets.
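
A hedged sketch of what inference with a custom adapter might look like, again via the run() helper: the “-lora” model variant and the lora request field follow the pattern in Cloudflare’s announcement, but treat both names as assumptions to verify against the docs.

```python
# Inference against a LoRA-capable base model, referencing a previously
# uploaded adapter. "my-custom-adapter" is a hypothetical adapter name.
out = run("@cf/mistral/mistral-7b-instruct-v0.2-lora", {
    "prompt": "Summarize this support ticket: ...",
    "lora": "my-custom-adapter",  # assumption: field name per BYO LoRA docs
})
print(out["result"]["response"])
```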

While there are currently some limitations, such as no support for quantized LoRA models and caps on adapter size and rank, Cloudflare plans to expand its fine-tuning capabilities over time and eventually support running fine-tuning jobs and fully fine-tuned models directly on the Workers AI platform.

Cloudflare also offers AI Gateway, a control plane for managing and governing the use of AI models and services across an organization.

It sits between applications and AI providers such as OpenAI, Hugging Face and Replicate, allowing developers to connect their applications to these providers with as little as a one-line code change.

The gateway provides centralized control, offering a single interface to multiple AI services and simplifying their integration. It adds observability through extensive analytics and monitoring, giving teams transparency into application performance and usage, and it addresses security and governance by enabling policy enforcement and access control.
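
The “one line of code change” is essentially the provider base URL. A sketch with the OpenAI Python SDK, where the base_url follows AI Gateway’s documented URL pattern and the account ID and gateway name are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # your existing OpenAI key, unchanged
    # Only this line changes: route requests through Cloudflare's AI Gateway.
    base_url="https://gateway.ai.cloudflare.com/v1/your-account-id/your-gateway/openai",
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello through the gateway!"}],
)
print(resp.choices[0].message.content)
```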

Finally, Cloudflare has added Python support to Workers, its serverless platform for deploying web functions and applications. Since its inception, Workers has supported only JavaScript for writing functions that run at the edge. With the addition of Python, Cloudflare is now targeting the large community of Python developers, allowing them to leverage the power of Cloudflare’s global network in their applications.
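
A minimal Python Worker, in the spirit of Cloudflare’s launch examples: the Pyodide-based runtime invokes on_fetch for each request and exposes JavaScript platform objects through the js module. The exact handler signature and import surface may differ across runtime versions.

```python
from js import Response

async def on_fetch(request, env):
    # Handle an incoming HTTP request at the edge and return a response.
    return Response.new("Hello from a Python Worker!")
```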

Cloudflare challenges AWS by continuously improving the capabilities of its edge network. Amazon’s serverless platform, AWS Lambda, has yet to support GPU-based model inference, while its load balancers and API gateway have not been updated for AI inference endpoints. Interestingly, Cloudflare’s AI Gateway offers built-in support for Amazon Bedrock API endpoints, giving developers a consistent interface.

As Cloudflare expands the availability of GPU nodes across its points of presence, developers gain access to state-of-the-art AI models with low latency and a compelling price/performance ratio. AI Gateway adds proven API management and governance controls to the AI endpoints offered by various providers.
