Minimizing LLM latency in code generation

Optimizing Frontier’s Code Generation for Speed and Quality

Introduction

Creating Frontier, our generative front-end coding assistant, posed a significant challenge. Developers demand both fast response times and high-quality code from AI code generators. This dual requirement necessitates using the “smartest” large language models (LLMs), which are often slower. While GPT-4 Turbo is faster than GPT-4, it doesn’t meet our specific needs for generating TypeScript and JavaScript code snippets.

Challenges

  1. Balancing Speed and Intelligence:

    • Developers expect rapid responses, but achieving high-quality code requires more advanced LLMs, typically slower in processing.
  2. Code Isolation and Assembly:

    • We need to generate numerous code snippets while keeping them isolated. This helps us identify each snippet’s purpose and manage their imports and integration.
  3. Browser Limitations:

    • Operating from a browser environment introduces challenges in parallelizing network requests, as Chromium browsers restrict the number of concurrent fetches.

Solutions

To address these challenges, we implemented a batching system and optimized LLM latency. Here’s how:

Batching System

  1. Request Collection:

    • We gather as many snippet requests as possible and batch them together.
  2. Microservice Architecture:

    • These batches are sent to a microservice that authenticates and isolates the front-end code from the LLM, ensuring secure and efficient processing.
  3. Parallel Request Handling:

    • The microservice disassembles the batch into individual requests, processes them through our regular Retrieval-Augmented Generation (RAG), multi-shot, and prompt template mechanisms, and issues them in parallel to the LLM.
  4. Validation and Retries:

    • Each response is analyzed and validated by a guardrail system. If a response is invalid or missing, the LLM is prompted again; unsuccessful requests are retried, and the validated snippets are finally batched together and returned to the front end.
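
To make the flow concrete, here is a minimal TypeScript sketch of the batch fan-out with validation and a single retry. The names (`callLLM`, `validateSnippet`) and the retry policy are illustrative assumptions, not Frontier’s actual implementation.

```typescript
// Illustrative sketch: fan a batch out to the LLM in parallel, validate each reply, retry once.
// `callLLM` and `validateSnippet` are hypothetical stand-ins for the real services.

interface SnippetRequest { id: string; prompt: string; }
interface SnippetResult { id: string; code?: string; error?: string; }

async function callLLM(prompt: string): Promise<string> {
  // Placeholder for the real LLM call (RAG context, multi-shot examples, prompt template).
  throw new Error("not implemented");
}

function validateSnippet(code: string): boolean {
  // Placeholder guardrail check; a real one would parse and analyze the snippet.
  return code.trim().length > 0;
}

async function processBatch(batch: SnippetRequest[], maxRetries = 1): Promise<SnippetResult[]> {
  return Promise.all(
    batch.map(async ({ id, prompt }): Promise<SnippetResult> => {
      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
          const code = await callLLM(prompt);
          if (validateSnippet(code)) return { id, code };
        } catch {
          // invalid or absent response: fall through and retry
        }
      }
      return { id, error: "failed validation after retries" };
    })
  );
}
```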

Micro-Caching

We implemented micro-caching to enhance efficiency further. By hashing each request and storing responses, we can quickly reference and reuse previously generated snippets or batches. This reduces the load on the LLM and speeds up response times.
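
A hedged sketch of what request-level micro-caching could look like: hash the serialized request, check the cache before calling the LLM, and store the response afterwards. The SHA-256 hash and the in-memory `Map` are illustrative assumptions; a real system would also need to normalize requests and invalidate stale entries.

```typescript
import { createHash } from "node:crypto";

// Hypothetical micro-cache: the key is a hash of the serialized request payload.
const cache = new Map<string, string>();

function requestKey(payload: unknown): string {
  // Assumes a stable serialization; a real system would normalize field order first.
  return createHash("sha256").update(JSON.stringify(payload)).digest("hex");
}

async function cachedGenerate(
  payload: { prompt: string },
  generate: (prompt: string) => Promise<string>
): Promise<string> {
  const key = requestKey(payload);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;      // reuse a previously generated snippet
  const result = await generate(payload.prompt);
  cache.set(key, result);                 // store for future identical requests
  return result;
}
```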

Conclusion

The impact of parallelization and micro-caching is substantial, allowing us to use a more intelligent LLM without sacrificing performance. Despite slower individual response times, the combination of smart batching and caching compensates for this, delivering high-quality, rapid code generation.

Guard rails for LLMs

Implementing Guard Rails for LLMs

Large Language Models (LLMs) have made a profound leap over the last few years, and with each iteration, companies like OpenAI, Meta, Anthropic and Mistral have been leapfrogging one another in general usability and, more recently, in the ability of these models to produce useful code. One of the critical challenges in using LLMs is ensuring that the output is reliable and functional. This is where guard rails for LLMs become crucial.

Challenges in Code Generation with LLMs

However, because LLMs are trained on a wide variety of coding techniques, libraries and frameworks, getting them to produce a specific piece of code that runs as expected is still quite hard. Our first attempt at this was with our Anima Figma plugin, which has multiple AI features. In some cases, we wanted to support new language variations and new styling mechanisms without creating inefficient heuristic conversions that simply wouldn’t scale. We also wanted users to be able to personalize the generated code and add state, logic and other capabilities to the code we produce from Figma designs. This proved much more difficult than originally anticipated: LLMs hallucinate, a lot.

Fine-tuning helps, but only to a degree: it reinforces languages, frameworks, and techniques the LLM is already familiar with, but that doesn’t stop the LLM from suddenly turning “lazy” (leaving /* todo */ comments instead of implementing, or merely repeating the code we asked it to mutate or augment). It’s also difficult to avoid plain hallucinations, where the LLM invents its own instructions and alters the developer’s original intent.

As the industry progresses, LLM laziness fluctuates, and we can use techniques like multi-shot prompting and, yes, emotional blackmail to keep the LLM on plan. In our case, though, we are measured by how usable the generated code is and how well it visually represents the original design. So we built a build tool that evaluates the differences and feeds any build and visual errors back to the LLM. If the LLM hallucinates a file or instruction, the build process catches it and the error is fed back to the LLM to correct, just like the normal fix-and-rebuild loop a human developer would run. Setting this as a target also lets us measure how well we optimized our prompt engineering and Retrieval-Augmented Generation (RAG) operations, and which model is best suited for each task.
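
As a rough sketch (not the actual tool), that feedback loop could look like the following, where `runBuild` and `askLLMToFix` are hypothetical placeholders for the real build step and LLM call:

```typescript
// Illustrative self-correction loop: build the generated files, feed errors back to the LLM.

interface BuildResult { ok: boolean; errors: string[]; }

async function runBuild(files: Record<string, string>): Promise<BuildResult> {
  // Placeholder: a real implementation would run the project's build (tsc, bundler, etc.)
  // and collect build and visual-diff errors.
  return { ok: Object.keys(files).length > 0, errors: [] };
}

async function askLLMToFix(
  files: Record<string, string>,
  errors: string[]
): Promise<Record<string, string>> {
  // Placeholder: prompt the LLM with the current files plus the reported errors.
  console.log(`asking LLM to fix ${errors.length} errors`);
  return files;
}

async function generateWithFeedback(
  initial: Record<string, string>,
  maxRounds = 3
): Promise<Record<string, string>> {
  let files = initial;
  for (let round = 0; round < maxRounds; round++) {
    const result = await runBuild(files);
    if (result.ok) return files;                      // builds cleanly: done
    files = await askLLMToFix(files, result.errors);  // same loop a human developer would run
  }
  return files; // out of rounds; the caller surfaces any remaining errors
}
```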

 

Strategies for Implementing Guard Rails

 
This problem arose again when we approached our newest offering: Frontier, the VSCode extension that uses your design system and code components when converting Figma designs to code.
In this case, a single code segment can have multiple implementations that take additional code sections as child components or props, which calls for much tighter guardrails around the LLM. Not only do we need all the previous tools, we also need to validate that the results it produces are valid code. This has to happen very quickly, which meant a “self-healing” approach wouldn’t work. Instead, we identify props and values from the existing codebase and parse the TypeScript of the generated code to ensure that it makes sense and is valid against the code component we have chosen to embed at that point in the codebase. Interestingly, even though the LLM generates very small function calls and receives a fair amount of context and multi-shot examples, it hallucinates more often than expected. Fine-tuning might help with that, but we assume this is an inherent part of the technology and requires tight guardrails.
 
This means that for each reply from the LLM we first validate that it is a well-formed response; if it isn’t, we explain to the LLM what’s wrong and ask it to correct itself. In our experience a single retry usually does the trick, and if that fails, subsequent rounds are likely to fail as well. Once the initial validation passes, we go through the reply and check that it actually makes sense; a few simple validation heuristics improve the success rate dramatically.
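
One way such a validation heuristic could work is to walk the generated TSX with the TypeScript compiler API and check that every prop passed to the chosen component is one we already know from the codebase scan. The sketch below is an illustration under that assumption, not Frontier’s actual guardrail code.

```typescript
import * as ts from "typescript";

// Check that the generated code only passes known props to the target component.
function usesOnlyKnownProps(generatedCode: string, component: string, knownProps: Set<string>): boolean {
  const source = ts.createSourceFile(
    "generated.tsx", generatedCode, ts.ScriptTarget.Latest, true, ts.ScriptKind.TSX
  );
  let valid = true;

  const visit = (node: ts.Node): void => {
    if (
      (ts.isJsxOpeningElement(node) || ts.isJsxSelfClosingElement(node)) &&
      node.tagName.getText(source) === component
    ) {
      for (const attr of node.attributes.properties) {
        if (ts.isJsxAttribute(attr) && !knownProps.has(attr.name.getText(source))) {
          valid = false; // likely hallucinated prop: ask the LLM to correct itself
        }
      }
    }
    ts.forEachChild(node, visit);
  };

  visit(source);
  return valid;
}
```

For example, `usesOnlyKnownProps(code, "Button", new Set(["variant", "onClick"]))` would flag a hallucinated `color` prop before the snippet ever reaches the user.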
 

Conclusion: The Necessity of Guard Rails for LLMs

Hallucinations are an inherent challenge with LLMs that cannot be ignored; they are part of how the technology works and require dedicated code to overcome. In our case, we give the user a way to provide even more context to the LLM, in which case we explicitly ask it to be more creative in its responses. This is an opt-in solution for users and often generates better placeholder code for components based on existing usage patterns. Interestingly, when we apply this to component libraries that the LLM was trained on (MUI, for example, is quite popular), hallucinations increase because the LLM has a prior bias towards those component implementations, and the guard rails are particularly useful there.
 
Start using Frontier for free and experience the benefits of robust guard rails for LLMs in your code generation process.

Pluggable design system – Figma to your design system code

Design to code is a difficult problem to crack; there are many variations to consider. On the Figma side, we have to handle auto layouts, design tokens, component sets, instances and Figma variables. On the code side, we have to assume that the codebase could contain both local and external components that could come from anywhere.

That’s why, when we created Frontier, we didn’t want to stick to just one coding design system. MUI, for example, is a very popular React Design System, but it’s one of very many design systems that rise and fall. Ant Design is still extremely popular, as is the TailwindCSS library, and we’re seeing the rapid rise of Radix-based component libraries like ShadCN, along with Chakra and NextUI. We knew that if we wanted to reach a wide audience we could not rely on a limited subset of Design Systems; we had to create a “pluggable design system”.

Key Challenges in Implementing a Pluggable Design System

There are a few challenges to accomplishing this:

    1. Existing Project Integration:

      You have an existing project that already uses a design system. In this case, we are expected to scan the codebase, then understand and reuse that design system. Frontier does this when it starts: it looks through your codebase for local and external components and for usages of those components (you can restrict where it scans and control how deeply it looks at the code).

    2. Design and Code Component Mismatch:

      When we look at the Figma design, we don’t assume the designer knows which component system will be used to implement it. Typically, in an enterprise with a design system team, the components in the design will match visually, but not necessarily in name or variants, and there is rarely a 1:1 mapping between the Figma components and their code counterparts. In fact, the same design could be implemented with different design systems’ code components and be fully expected to match and work.

    3. Flexible Implementation:

      Once applied, components could have multiple ways to implement overrides and children:

      1. Props / variants
      2. Component children
      3. Named slots
    4. The “Cold start” problem

      Even if you solve scanning the project’s repo, what happens when you encounter a brand new project and want to use a new library with it? In this case, you would have zero code usage examples and zero components that you are aware of…

To overcome these problems we started with a few assumptions:

    1. Leverage Usage Examples:

      If the project has a robust set of usage examples, we can take inspiration from them and understand how this particular project uses those components, which helps us solve the props/overrides/children/named-slots issue.

    2. Custom Matching Model

      We had to create a custom model that understands how designers implement components in design systems and how developers write the corresponding code components. This matching model was trained on a large set of open-source Design System repos and open Figma design systems, and it reached a surprisingly high matching rate on all our tests. It seems many designers and developers think in similar ways despite using very different conventions and designs.

    3. Cross-System Matching

      Once we were able to match within the same design system, the next challenge was to make the model more robust with matching across design systems – take a design that relies on AntD components and train the model to implement it using MUI components, or vice versa. This made the model much more versatile.

    4. Local Storage for Privacy and Security

      For security and privacy purposes, we have to encode and store our RAG embeddings database locally, on the user’s machine. This allows us to perform much of the work locally without having to send the user’s code to the cloud for processing.

       

Interestingly, the fact that we can store bits and pieces of this database also opens up possibilities for cold starts. An empty project can simply state that it wants to use MUI, then download and use the corresponding embeddings. That gives the LLM all the usage context it needs to produce much more robust results, even when the codebase is completely empty of any actual context.
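
A minimal sketch of what a downloadable embeddings bundle and local retrieval could look like; the file format, field names and cosine-similarity ranking are assumptions for illustration, not the actual Frontier format.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical shape of a prebuilt embeddings bundle (e.g. for MUI) that an empty
// project could download instead of scanning its own codebase.
interface ComponentEmbedding {
  component: string; // e.g. "Button"
  snippet: string;   // a thin usage example, not the implementation
  vector: number[];  // embedding of the usage example
}

async function loadPrebuiltEmbeddings(path: string): Promise<ComponentEmbedding[]> {
  // Stored locally in the workspace, so nothing leaves the user's machine.
  return JSON.parse(await readFile(path, "utf8")) as ComponentEmbedding[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Retrieve the closest usage examples to a query embedding to hand the LLM as context.
function topMatches(query: number[], db: ComponentEmbedding[], k = 3): ComponentEmbedding[] {
  return db
    .map((entry) => ({ entry, score: cosineSimilarity(query, entry.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ entry }) => entry);
}
```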

The result is that Frontier can now generate code components in projects even when the Figma design system doesn’t match the code component library, and even when the codebase is completely devoid of actual examples.

Does Frontier support NextJS?

Short answer: Yes!

Long answer:

NextJS is an extremely popular framework for ReactJS that provides quite a few benefits, one of which is the mix between server and client-side components. 

Server-only components are components that do not use/need state and can pull their data from external APIs without worrying about credentials falling into the wrong hands. They can only be rendered on the server. Server components may contain server and/or client components.

Client-only components are components that have the “use client” directive defined. A component that uses state and other React APIs needs to be a client component, but a client component doesn’t require state to function.

In Next.js, components are server components by default. This ensures that fully formed HTML is sent to the user on page load. It’s up to the developer to set the client boundaries. Components that don’t use state and don’t make outward API calls can be implemented as either client or server components, which is ideal.

Since it can be quite complex to determine which type a particular React component is (server-only, client-only, or agnostic), Frontier generates client components by default when it detects NextJS. It does this by adding the ‘use client’ directive at the top of the component file.
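
For illustration, a generated client component would carry the directive at the very top of the file; the component and props below are made-up examples rather than actual Frontier output.

```tsx
// Illustrative example only: the component name and props are invented.
// The 'use client' directive must appear before any imports or other code.
"use client";

import { useState } from "react";

export function LikeButton({ initialCount = 0 }: { initialCount?: number }) {
  // useState is a client-only API, so this component needs the directive above.
  const [count, setCount] = useState(initialCount);
  return <button onClick={() => setCount(count + 1)}>Likes: {count}</button>;
}
```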

We default to client components because it can be challenging to identify whether the rendered component tree includes descendants that must be rendered on the client side; without a ‘use client’ directive covering those components, runtime errors may occur.

If you remove the ‘use client’ directive and the code still builds with no errors, the client boundaries have been set correctly and you can let Next.js determine whether the component is rendered on the client or the server. If, on the other hand, removing it causes a build error, one or more of the descendants uses client-only APIs without declaring itself a client component. In that case, you can keep the ‘use client’ directive in the code we’ve generated, or add the directive directly inside the offending descendant.

So, what’s the bottom line?

Short answer: Yes, Frontier supports NextJS!

Start here!

Generative code: how Frontier solves the LLM Security and Privacy issues

When it comes to generative AI and LLMs, the first question we get is how we approach the security and privacy aspects of Frontier. This is a reasonable question given the copyright issues that many AI tools are plagued with. AI tools, after all, train on publicly available data and so could expose companies to potential copyright liability.

But it’s not just that: companies have invested heavily in their design language and design systems, which they would never want to expose externally, and their codebase is also a critical asset that they would never want included in LLM or AI training.
 
When designing Frontier, privacy and security were foremost concerns from day one. First, it was clear to us that Frontier users cannot expose their codebase to anyone, including us. That means much of the data processing had to take place on the user’s device, which is quite difficult given that we run in a sandbox inside a VSCode extension. Second, we needed to expose the minimum amount of data and design to the cloud. Additionally, any data that needed to be stored had to be stored in a way that multiple team members could share, but not on the cloud. Finally, none of our models could have any way to train on the user’s design or codebase.
The first part was isolating the Figma designs. By building a simplified data model in memory, from within VSCode, using the user’s own credentials, we effectively facilitate an isolated connection between the user and the Figma APIs, without us in between and without our servers ever seeing a copy of the design.
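
Conceptually, the extension can call Figma’s REST file endpoint directly with the user’s own access token, along these lines; the simplification step is a placeholder, and the overall shape is an assumption about how such an isolated connection could be built rather than Frontier’s actual code.

```typescript
// Sketch: fetch the Figma file straight from the extension with the user's own token
// (global fetch, available in recent Node/VSCode runtimes), so the design never
// passes through Anima's servers.
async function fetchFigmaFile(fileKey: string, userToken: string): Promise<unknown> {
  const res = await fetch(`https://api.figma.com/v1/files/${fileKey}`, {
    headers: { "X-Figma-Token": userToken }, // the user's personal access token, kept on-device
  });
  if (!res.ok) throw new Error(`Figma API error: ${res.status}`);
  return res.json();
}

// Build the simplified in-memory model from the raw document (details intentionally omitted).
function simplifyDesign(figmaFile: unknown): unknown {
  return figmaFile; // placeholder: real code would keep only layout, token and component info
}
```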
 
The typical implementation for generative code generation is to collect the entire codebase, break it into segments, encode the segments into embeddings and store them in a vector database. This approach is effective, but it won’t work well in our case, since storing the data on our servers would mean we are exposed to it. In addition, the codebase is continually evolving and would need to be re-encoded and stored regularly, which would make the process slow and ineffective.
 
Instead, our approach was to develop an in-memory embedding database that can be stored and retrieved locally and rebuilds extremely quickly, even on large codebases. To secure this data, we store it in the user’s workspace, where it can be included in the git repository and shared between users, or simply rebuilt per user.
 
But this would be useless if we had to send a large code sample to an LLM for each line of code we generate. Instead, we implemented a local model that runs in VSCode, so when we do need to use an LLM, we share the interface of the components instead of their code. Users can improve the results by opting in to include a few real-world usage examples, sharing with the LLM thin, simplified code that shows how the Button component is used in the codebase, but not how Button is implemented or what it actually looks like or does.
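
As an illustration of how little needs to be shared, the context handed to the LLM might be shaped roughly like this; the `ComponentContext` structure and its field names are hypothetical, not Frontier’s actual payload.

```typescript
// Hypothetical minimal context: the component's public interface plus optional,
// opt-in usage examples; never the implementation itself.
interface ComponentContext {
  name: string;             // e.g. "Button"
  propsInterface: string;   // e.g. "interface ButtonProps { variant?: 'primary' | 'ghost' }"
  usageExamples?: string[]; // opt-in, e.g. ['<Button variant="primary">Save</Button>']
}

function buildPromptContext(ctx: ComponentContext): string {
  const usage = ctx.usageExamples?.length
    ? `\nUsage examples:\n${ctx.usageExamples.join("\n")}`
    : "";
  return `Component: ${ctx.name}\nInterface:\n${ctx.propsInterface}${usage}`;
}
```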
 
By limiting the amount of data and anonymizing it, we can guarantee that the LLM is not trained on the user’s code and does not store it in any way.
 
But how do we guarantee that data doesn’t get “leaked” back into the codebase from outside sources the LLM was trained on, exposing the company to potential copyright risk? First, we limit the type of code the LLM can generate to specific component implementations, and only accept it after it passes a guard rail system. The guard rail validates that the code makes sense and can identify hallucinations that might invalidate the code or introduce copyright liability into the codebase. If the code passes the guard rail system, we can be confident that the results correlate with what the user expects from the component code.
 
Finally, for full transparency, we store the data in open JSON files inside the .anima folder in your project’s workspace. Different workspaces have different settings and components. Sharing this information between users can be done through git (or any shared file system), which keeps Anima from being exposed to any of the cached component, usage, codebase or Figma design data.


LLMs Don’t Get Front-end Code

I see this pattern repeat every few months: a new multimodal LLM comes out, someone on Twitter takes a screenshot of a game or app and feeds it to the LLM, and out comes working code that actually runs.
 
Hence the meme: Front End Developers, you will soon be replaced by AI…
 
After so many years of managing software, I should know better. The variation between teams, and between projects within each team, is infinite. Each team uses a different combination of tools, frameworks, libraries, coding style and CSS language/framework, all of which are constantly changing. Small startups will typically adopt a public design system and adapt it to their needs, while larger companies have their own customized design system components maintained by a dedicated team. Good luck asking an LLM to conform to these requirements when it has zero context for that combination of tools and components.
 
So, good luck trying to get an LLM to code in your style, use your front-end components and show an in-depth understanding of design. At best, it can take a 2D image of your screens and make it do something… Turning that result into production code will likely take you longer than starting from scratch.
 
What’s more, as the tools evolve, the level of complexity and thought that goes into these combinations makes front-end developers professional problem solvers. They typically get an impossible Figma design, which they have to fully understand, then negotiate changes with the designer until they can hopefully adapt it to the design system. These are very human problems, and they require human operators to drive them.

Enter: Useful generative coding

But LLMs are revolutionary and will make a huge impact on developers. Given the right context, AI can locate and correct bugs, help design the software, and turn developers into 10x individual contributors (10xIC). This is precisely what GitHub Copilot does: it learns from your project and, given the huge amount of relevant context, attempts to predict what you’re trying to accomplish and generates the code for that prediction. Developers get an efficiency boost using Copilot, but there’s just one problem…
 
Copilot understands concepts like functionality, components, and state. It fundamentally does not understand design. Why would it? It has no context for the design the front-end developer is working from, so when you start creating React components, it will just give you boilerplate code that it most likely learned from your project or from other designs. I often see it generate endless runs of meaningless HTML gibberish; its chance of actually predicting your design is infinitesimally small. As for matching your particular components and giving you code of real value, that’s sci-fi…
 
That’s why many front-end developers either don’t use GitHub Copilot at all, or use it for everything apart from design. But what if you could extract context from the design? That’s where Anima Frontier comes in. Frontier has context for the Figma design, including a deep understanding of the Figma components, overrides and Figma design system, as well as your codebase and your design system’s code components. By matching those, and by generating scaffolding code based on the designer’s specifications (not a static snapshot of their design), the result is a code companion made specifically for front-end developers. It works together with GitHub Copilot to fill the void that is design.
 
We don’t really think that designers or front-end developers are going away any time soon, and we don’t think it’s realistic that they’ll be replaced by automated tools. Tools like Frontier are intended to work like Copilot, making front-end development easier and more approachable. By providing context and assistance, we can make front-end developers more productive. This is exactly the type of tool I wish I had when I started coding: it’s the perfect way to extract the most from what the designer has already embedded in the design, sometimes without even realizing it.
