Domain Adaptation of Small Language Models: Fine-Tuning SmolLM-135M for High-Fidelity Paraphrase Generation

[Figure: SLM vs LLM]

I recently embarked on an adventure to fine-tune HuggingFace’s SmolLM-135M, one of the smallest language models available. The goal was to tame this lightweight model and explore its performance on paraphrasing tasks.

Dataset Choice

Instead of the usual Quora Question Pairs (QQP) dataset, I chose PAWS, since I wanted plain declarative statements rather than questions to train my model on. While the dataset size was sufficient, I found that quality mattered more than quantity; ultimately, I augmented the data by creating reverse pairs (swapping each sentence with its paraphrase) to improve paraphrasing coverage.
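To make the augmentation concrete, here is a minimal sketch, assuming the public PAWS labeled_final config on the Hugging Face datasets hub (columns sentence1, sentence2, label). It is illustrative rather than the exact code from my notebook.

```python
# Minimal sketch of the reverse-pair augmentation, assuming the public
# PAWS "labeled_final" config from the Hugging Face datasets hub.
from datasets import load_dataset

paws = load_dataset("paws", "labeled_final", split="train")

# Keep only true paraphrase pairs (label == 1); non-paraphrases would
# teach the model to change meaning.
pairs = [(ex["sentence1"], ex["sentence2"]) for ex in paws if ex["label"] == 1]

# Paraphrasing is symmetric, so (B -> A) is as valid a training example
# as (A -> B). Adding the reversed pairs roughly doubles coverage.
augmented = pairs + [(tgt, src) for (src, tgt) in pairs]
print(f"{len(pairs)} original pairs -> {len(augmented)} after augmentation")
```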

Training Experience

Once the appropriate module versions and dependencies were aligned, the training process ran fairly smoothly. I documented the full process in my training notebook: https://www.kaggle.com/code/finalepoch/fine-tuning-smollm2-135m-for-paraphrasing-tasks

Results and Observations

The results were modest but insightful. The model learned to paraphrase, although the output quality highlighted the limitations of the training data. One thing I can vouch for is that hallucinations were totally under control: in the worst case, the model simply repeated the input sentence verbatim. Experimenting with LoRA fine-tuning and adjusting tokenizer settings, temperature, top-k sampling, and the maximum number of generated tokens gave me a deeper understanding of the model's behavior and the nuances of fine-tuning.
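To make those knobs concrete, here is a minimal sketch of a LoRA setup and a sampling call, assuming the Transformers and PEFT libraries. The rank, target modules, and sampling values are illustrative placeholders, not the exact settings from my notebook.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative checkpoint: the 135M SmolLM2 base model on the Hub.
checkpoint = "HuggingFaceTB/SmolLM2-135M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
base = AutoModelForCausalLM.from_pretrained(checkpoint)

# LoRA adapters on the attention projections. Rank, alpha, and dropout
# here are placeholder values, not tuned settings.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)  # ready for fine-tuning

# The sampling knobs discussed above: temperature, top-k, and a cap on
# how many new tokens the model may generate.
prompt = "Paraphrase: The meeting was postponed until next week.\nParaphrase:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        max_new_tokens=40,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```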

Here are some results:
[Figure: Results post LoRA]

Inference Workflow

I implemented a workflow where the fine-tuned model is converted to ONNX, and inference is performed using ONNX Runtime, yielding outputs comparable to the original PyTorch model. This workflow is fully demonstrated in my Kaggle notebook: https://www.kaggle.com/code/finalepoch/smollm-360-lora-onnx-inference
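As a rough sketch of that convert-then-infer flow, here is one way to do it using Hugging Face Optimum's ONNX Runtime integration; the notebook's approach may differ, and the checkpoint path below is a placeholder.

```python
# Sketch of ONNX export + inference via Optimum's ONNX Runtime wrapper.
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Placeholder path: the fine-tuned model with LoRA weights merged in.
checkpoint = "./smollm-paraphrase-merged"

# export=True converts the PyTorch checkpoint to ONNX on the fly;
# save_pretrained keeps the ONNX artifacts for reuse.
ort_model = ORTModelForCausalLM.from_pretrained(checkpoint, export=True)
ort_model.save_pretrained("./smollm-paraphrase-onnx")

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(
    "Paraphrase: The storm delayed every flight.\nParaphrase:",
    return_tensors="pt",
)

# generate() on the ORT model mirrors the PyTorch API, so outputs can
# be compared directly against the original model's.
output = ort_model.generate(
    **inputs, do_sample=True, temperature=0.7, top_k=50, max_new_tokens=40
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```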

Challenges and Takeaways

I faced a significant roadblock with running ONNX models in Node.js. My vision was to train a small LM, quantize it, and deploy it with a low-footprint runtime (Node + ONNX Runtime) on consumer-grade CPUs in under 1 GB of RAM, but practical examples are scarce. If you have pointers or references for running LLMs in JavaScript via ONNX, I'd love to hear them.
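For the quantization half of that pipeline, here is a minimal sketch using ONNX Runtime's dynamic quantization API; the file names are placeholders, and the Node.js serving side is deliberately omitted since that is exactly the part I could not find good examples for.

```python
# Sketch of post-export quantization with ONNX Runtime. Dynamic
# quantization rewrites the fp32 weights as int8, shrinking the file
# (and runtime memory) roughly 4x at some cost in output quality.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="smollm-paraphrase.onnx",       # placeholder: exported model
    model_output="smollm-paraphrase-int8.onnx",  # placeholder: quantized model
    weight_type=QuantType.QInt8,
)
```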

Despite the hurdles, this project was extremely rewarding. It reinforced my understanding of lightweight LLMs, LoRA, inference optimization, dataset quality, and controlling hallucinations.

I hope you enjoy reading about these experiments and perhaps find inspiration for your own small-scale LLM projects.