Rapid Prototyping with Claude Code

One very useful thing you can do with Claude Code is create rapid prototypes. Recently I was asked to create a small project to translate the contents of various file types between languages. By rapid prototyping with Claude Code, I was able to present a rough first draft in less than 30 minutes.

Even this first draft proved very interesting to the user. I was able to use the limitations of the draft to start a discussion. This allowed me to delve deeper into their requirements. One valid concern they had was that they needed to know if a translation might not be correct. This became one of the core requirements for the prototype.

Rapid Research

Using a Claude skill, I researched methods for assessing the accuracy of translations. To do this I instructed Claude to carry out iterative rounds of online research on translation quality assessment. My first thought was to run a back translation with a different LLM and then use a third LLM to compare the result with the original text. However, Claude’s literature search of translation evaluation methods suggested that this check on its own might not be robust enough. Many of the methods Claude suggested require a human translation for comparison, which was not practical for this use case. However, some other methods were suggested, including BLEU (bilingual evaluation understudy), Semantic Textual Similarity, BERTScore, cosine similarity of semantic embeddings, and COMET-QE.
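To make one of these methods concrete, here is a minimal sketch of cosine similarity between two embedding vectors. It is not the code from the prototype. The function name and the toy vectors are illustrative assumptions, and in practice the vectors would come from an embedding model applied to the original text and its back translation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction, so treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
original_embedding = [0.2, 0.7, 0.1]
back_translation_embedding = [0.25, 0.65, 0.05]
print(cosine_similarity(original_embedding, back_translation_embedding))
```

A value close to 1 suggests the back translation kept the meaning of the original, while a lower value suggests the meaning drifted.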

Working Prototype

Armed with this knowledge, I took less than a day working with Claude to produce a moderately robust translator which can handle a variety of input file formats. I would have been even faster but for the token limits on the Claude Pro plan I was using. The program includes retry logic for poor translations. It outputs the translations in the specified languages, along with the back translations and quality scores so that the prototype can be evaluated.
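The retry logic can be pictured roughly as follows. This is a sketch under assumptions rather than the prototype's actual code: the `translate` and `score` callables, the threshold of 0.8 and the limit of three attempts are all hypothetical placeholders.

```python
from typing import Callable, Tuple


def translate_with_retry(
    text: str,
    translate: Callable[[str], str],       # hypothetical: calls an LLM
    score: Callable[[str, str], float],    # hypothetical: quality in [0, 1]
    threshold: float = 0.8,                # assumed acceptance threshold
    max_attempts: int = 3,                 # assumed retry limit
) -> Tuple[str, float]:
    """Retry a translation until its score passes the threshold.

    Returns the best candidate seen and its score, even if no attempt
    reached the threshold.
    """
    best_translation, best_score = "", float("-inf")
    for _ in range(max_attempts):
        candidate = translate(text)
        candidate_score = score(text, candidate)
        if candidate_score > best_score:
            best_translation, best_score = candidate, candidate_score
        if best_score >= threshold:
            break  # good enough, stop spending tokens
    return best_translation, best_score
```

Keeping the best candidate rather than the last one means a failed run still returns the least bad translation together with a score that flags it for review.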

Translations are scored by taking the harmonic mean of the scores from four different evaluation methods: cosine similarity, BERTScore, BLEU and LLM evaluation. Despite uv’s best efforts at package resolution, I could not find a version of COMET that relied on a version of numpy compatible with Python 3.13, so this was left out of the prototype. I chose a harmonic mean because it is pulled towards the lowest of the scores, so a single poor score drags the combined score down more than it would with an ordinary average. This should help in case one method picks up issues the others did not.
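A small worked example shows the effect of the harmonic mean. This is a sketch that assumes all four scores have been scaled to the range 0 to 1, and the score values themselves are made up for illustration.

```python
from typing import Sequence


def harmonic_mean(scores: Sequence[float]) -> float:
    """Harmonic mean of non-negative scores; any zero gives zero."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if any(s == 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical scores: three methods are happy, one is not.
scores = [0.9, 0.9, 0.9, 0.3]
arithmetic = sum(scores) / len(scores)
harmonic = harmonic_mean(scores)
print(f"arithmetic mean: {arithmetic:.2f}")
print(f"harmonic mean:   {harmonic:.2f}")
```

With these numbers the arithmetic mean is 0.75 while the harmonic mean is 0.60, so the one dissenting method has a much larger say in the combined score.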

This system works quite well with modern LLMs accessed over OpenRouter. I also trialled it with small models accessed via Ollama. This proved much more problematic, since Ollama has no direct way to clear its KV cache. As a result, models have to be constantly loaded and unloaded to avoid cache pollution, which made the Ollama models impractically slow. The smaller local models did not produce good quality translations in any case.

The prototype can be seen at https://github.com/JustinMatters/translator

Conclusions

The prototype still has a lot of rough edges and is only a command line tool at present. However, it took less than a day to build, and it has enough features and documentation to be genuinely useful. It can now be evaluated in practical use to see whether we want to pursue the idea or go with a commercial offering.

The really nice thing is that Claude was able to assist at all stages, making the entire process quick and painless. The prototype is also well documented, meaning that any further work, whether progressing from the prototype or starting with a clean sheet, will have useful information and a point of comparison to work with. Overall, rapid prototyping with Claude Code shows great promise for investigating possibilities for useful computer-based tools in much less time than was previously required.

Published by justinmatters
