In this article, I will cover two of the major debates in the NLP community, debates that also come up constantly when working with clients:
- Do you use the API of a proprietary company (e.g. OpenAI’s GPT-4), or do you use a custom or off-the-shelf open-source LLM?
- If using an open-source LLM, do you deploy your own nodes, or do you rely on services such as Amazon Bedrock or SageMaker?
The answer to these questions is not definitive and, naturally, varies depending on the application. The fact that this field is constantly evolving makes it difficult to provide a straightforward response. While there are certainly advantages to utilizing external APIs (GPT-4’s capabilities are still the gold standard), there are also numerous pitfalls that must be taken into account. In many cases, opting for a custom, self-deployed LLM is the wisest and most cost-efficient choice.
Spoiler alert: I will obviously discuss how working with Datameister provides you with the best of both worlds.
Risk of API dependency
There are many possible applications for LLMs. Chat-like, creative applications have taken most of the spotlight recently, but in industry, LLMs are mainly used in much more constrained contexts.
- Typically, clients want to generate text (reports, mailings, ...) based on structured data (scores, categorical data, ...).
- Or the other way around: the LLM receives unstructured data (freeform text) and is asked to extract specific information from it (a minimal sketch follows this list).
- Or alternatively, they want to serve end users predefined knowledge from their resource center in a controlled manner.
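As a minimal sketch of the second pattern, the snippet below asks a model to extract a few fields from freeform text and return them as JSON. The model name and field names are placeholders I chose for illustration; the same prompt would work against a self-hosted model exposed behind a compatible interface.

```python
# Sketch of the extraction pattern: pull specific fields out of freeform
# text and return them as JSON. Field names are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_fields(report: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",   # or any self-hosted model behind the same interface
        temperature=0,   # reproducibility matters more than creativity here
        messages=[
            {"role": "system",
             "content": "Extract customer_name, order_id and sentiment "
                        "from the text. Answer with a JSON object only."},
            {"role": "user", "content": report},
        ],
    )
    return response.choices[0].message.content

print(extract_fields("Ticket from Jane Doe about order #4521: very unhappy..."))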
Reproducibility is key in production environments: there are few cases where surprise outcomes are welcome. The creative factor of generative AI has been its most attention-grabbing feature lately, but it is arguably not the most useful one in most industries.
Picture the following. You’ve made the investment and put considerable engineering work into a specific OpenAI model. Your app runs fine, the client is happy. You were able to be cost-efficient by using one of the older OpenAI models, since it was fine for your use case. Life is good. You close up shop for the weekend, and then an email from OpenAI appears, announcing that your model is scheduled for deprecation.
I don't think your client was counting on having to update models every two years. Wouldn't it be nice if you had a specific version of a specific LLM running? A version that you knew would never suddenly change? An LLM for which you know exactly what works and what doesn't?
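With a self-hosted model, you can pin an exact snapshot so its behaviour never changes underneath you. Below is a minimal sketch using the Hugging Face transformers library; the model ID is just an example open-source model, and the revision hash is a placeholder for whichever snapshot you validated.

```python
# Pin a self-hosted model to an exact snapshot so its behaviour never
# changes underneath you. The revision hash below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # example open-source model
REVISION = "abc123..."  # exact, validated commit on the model hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=REVISION)
```

Every load now resolves to the same weights, which is exactly the guarantee an external API cannot give you.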
LLMs are hard to validate. In theory, changing models is easy, but in practice, each model behaves uniquely. Changing models costs time, and thus money.
At Datameister, we ensure the deployment and continuous operation of your LLM for as long as you need.
Building your own IP
If I had a penny for every startup idea in 2023 that was along the following lines, then, well, I would have a lot of pennies:
My app will be a chatbot front-end built on ChatGPT that takes in resources from field X to provide users faster, easier, and more human-readable info about X.
Now OpenAI has released GPTs, with which people can do exactly this without any code. Can you spot the problem in the above business plan? The main issue is that the company does not hold the IP behind the core functionality of the product; the IP lies entirely in the data. In some cases, this will still be a valid plan, e.g. when you are the sole proprietor of uniquely copyrighted material, but in most cases it isn’t. OpenAI came in and destroyed 90% of these LLM startups in a single product release.
I haven’t come across many open-ended applications built on top of GPT-4 that delivered unique IP to a company. They mainly rely on the GPT-4 magic, which you do not own.
At Datameister, you own your LLM; we only deploy it for you.
One of the key advantages of deploying your own LLM is the freedom to customize the model according to your specific needs and preferences. Unlike external APIs, which may limit your customization options, full control over the model allows you to tailor it precisely to your requirements. Of course, you can fine-tune models on AWS or OpenAI, but these are very costly operations that still leave you with vendor lock-in. When you customize, you build intellectual property, and your IP is what sets you apart from your competition.
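As an illustration, here is a sketch of parameter-efficient fine-tuning (LoRA) with the Hugging Face peft library. The base model and hyperparameters are assumptions rather than recommendations; the point is that the resulting adapter is an artifact you own outright.

```python
# Sketch of parameter-efficient fine-tuning (LoRA) on a self-hosted model.
# Base model and hyperparameters are illustrative, not a recommendation.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only a small fraction of weights train

# ...train on your own dataset (e.g. with transformers.Trainer), then save:
model.save_pretrained("./my-company-adapter")  # this adapter is IP you own
```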
Managing your own LLM provides an opportunity for deeper understanding and learning within your team or organization. By taking ownership of the model's management and maintenance, you can gain valuable insights into how it works and potentially drive innovation in natural language processing within your company.
Cost Control
For large-scale or long-term use, deploying your own LLM can be more cost-effective than paying for API usage, even compared to platforms like AWS SageMaker. While there may be an initial setup cost involved, the savings add up over time as you avoid recurring API fees. Note that, in general, throughput and latency requirements are the two main cost drivers.
Inference
Most LLM applications do not have strict time constraints, allowing for various cost optimizations. At Datameister, we focus on two key drivers for reducing costs:
- Scale-to-zero: Running an LLM continuously can be expensive. With our deployment platform, compute nodes are only active when necessary. While there may be a short delay when starting a node that was previously inactive, this is not an issue for non-time-critical applications, and it can result in up to 95% cost savings.
- Spot instances: On platforms like AWS, spot instances are cheaper but less predictable than on-demand instances, which are more expensive but stable. With a robust job scheduling system like ours, applications that do not require real-time processing can run on spot instances and save at least 50% on compute costs without worrying about their inherent unpredictability (see the sketch after this list).
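For illustration, a spot request for a GPU node via boto3 might look like the sketch below. The instance type, AMI, and price cap are placeholders; the essential caveat is that spot capacity can be reclaimed at any time, so the job scheduler must tolerate interruptions.

```python
# Sketch: requesting a GPU spot instance for a batch inference job.
# ImageId, InstanceType and SpotPrice are placeholders; spot capacity can
# be reclaimed at any time, so the scheduler must tolerate interruptions.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

response = ec2.request_spot_instances(
    SpotPrice="0.50",  # maximum hourly price we are willing to pay (USD)
    InstanceCount=1,
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",  # placeholder AMI with your LLM image
        "InstanceType": "g5.xlarge",         # placeholder GPU instance type
    },
)
print(response["SpotInstanceRequests"][0]["SpotInstanceRequestId"])
```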
When you use SageMaker to deploy your own LLMs, you cannot use spot instances for inference (only for training); only on-demand capacity is available, which leads to higher costs. SageMaker does, however, allow for serverless inference and offline batch transforms, although pricing remains relatively high.
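For completeness, here is roughly what a serverless SageMaker deployment looks like with the sagemaker Python SDK. The container image, model artifact, and IAM role are placeholders you would fill in for your own setup.

```python
# Sketch of SageMaker serverless inference, which bills per request rather
# than per hour. All three Model arguments below are placeholders.
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

model = Model(
    image_uri="<your-inference-container>",       # placeholder
    model_data="s3://your-bucket/model.tar.gz",   # placeholder
    role="<your-sagemaker-execution-role>",       # placeholder
)

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,  # the maximum allowed at the time of writing
    max_concurrency=5,
)
predictor = model.deploy(serverless_inference_config=serverless_config)
```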
Fine-tuning
Fine-tuning through external APIs such as OpenAI’s can be very expensive. From our experience, fine-tuning GPT-3.5 on a few thousand (large) samples can rack up costs of hundreds of dollars, which does not leave much room for experimentation. Plus, even if you wanted to spend thousands of dollars on fine-tuning, you need to be in the right usage tier to be allowed to do so.
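As a back-of-the-envelope illustration (the token counts and per-token training price below are assumptions; check current pricing before relying on this), the cost of a single fine-tuning run can be estimated like this:

```python
# Rough estimate of one API fine-tuning run. All numbers are assumptions
# chosen for illustration; actual pricing and token counts will differ.
n_samples = 5_000
tokens_per_sample = 2_000   # "large" samples
epochs = 3
price_per_1k_tokens = 0.008  # assumed training price per 1K tokens (USD)

total_tokens = n_samples * tokens_per_sample * epochs
cost = total_tokens / 1_000 * price_per_1k_tokens
print(f"~{total_tokens:,} training tokens -> ~${cost:,.0f} per run")  # ~$240
```

Multiply that by the dozens of runs a serious R&D phase requires, and the bill climbs quickly.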
For the same cost-cutting reasons as with inference, it can be more cost-efficient to deploy your own LLM if you need many iterations in your R&D phase. Furthermore, you are not locked in to a specific vendor that could change pricing at will.
Deploying your own LLM at Datameister can offer significant cost savings and predictability compared to using external APIs or AWS SageMaker.
Full Control Over Data and Privacy
This is a harder topic to cover. OpenAI is now GDPR compliant, and you can run the OpenAI API in Europe through Azure’s OpenAI Service. You can also run Claude or Jurassic-2 models using Amazon Bedrock in your preferred region. Oh wait, no you can’t. At the time of writing, most models are not available in every AWS region, but it probably won’t take much time before they are.
However, many clients still prefer to have the LLM running either in their own cluster or in the Datameister cluster. This gives them greater control over their data, ensuring enhanced privacy and security, which is especially crucial when dealing with sensitive or proprietary information. By keeping the data in-house, clients can guarantee its protection and confidentiality. That said, in theory this shouldn't be a problem if external APIs are used correctly within the designated regions and all applicable regulations are followed.
Datameister: Tackling the Challenges Associated with Self-Deployed LLMs
While there are many benefits to deploying your own LLM, it's important to consider the challenges that come with it:
- High initial setup costs and complexity: Setting up your own LLM requires a significant investment. You need the right infrastructure and resources to ensure smooth operation and optimal performance.
- Scalability challenges: Managing scalability in-house can be challenging, especially if there are sudden spikes in demand for your LLM.
- Limited knowledge, resources and support: Unlike established external APIs, deploying your own LLM may limit your access to support and resources. If you encounter unique challenges or bugs, you might have to rely on internal expertise or community forums for assistance.
And of course, there is the risk of obsolescence: NLP is rapidly evolving, with new advancements being made regularly. If you're deploying your own LLM, there's a risk that it becomes outdated if you're unable to keep up with these advancements. However, as mentioned above, in many use cases it is actually preferable to have a system with known limitations over a solution that is constantly evolving and needs constant validation. Furthermore, most of your power lies in your data: in the case of fine-tuning, you can fine-tune newer models with the same dataset.
While deploying your own LLM comes with its fair share of challenges, most of these are offloaded to Datameister:
- Initial Setup Cost: Datameister has already covered the initial setup cost by building the Datameister LLM deployment platform.
- Complexity: Datameister manages the complexity associated with setting up an LLM.
- Scalability: With Datameister's solution, scaling is taken care of by automatically spinning up instances as needed, removing the burden of managing scalability in-house. Scaling to zero machines is one of our unique offerings: no machines are running when no workload is present. No machines = no costs.
- Support: Datameister offers experienced support to help you overcome any unique challenges or bugs you may encounter.
- Risk of Obsolescence: By relying on Datameister's expertise and continuous updates, you can mitigate the risk of your LLM becoming outdated.
Conclusion
In conclusion, here are the key points to consider when deciding between using an external proprietary API or deploying your own LLM with Datameister:
- Opting for a custom and self-deployed LLM provides greater control, ownership, and customization options.
- With Datameister, you can ensure the deployment and continuous operation of your LLM for as long as you need.
- Building your own IP is crucial in most cases, as relying solely on external APIs may result in a lack of ownership over the core functionality of your product.
- Deploying your own LLM allows for cost control and potential savings compared to recurring API fees or platforms like AWS SageMaker.
- Datameister offers cost-cutting measures such as scale-to-zero instances and leveraging spot instances for inference.
- You have full control over data privacy when running your LLM in either your own cluster or the Datameister cluster.
- While deploying your own LLM comes with challenges, many of these are addressed by relying on Datameister's expertise and support.