
An (Over-Engineered) Clone of Apple's Scribble

Jan 26, 2026 (updated Jan 26, 2026)

Intro

At the 2016 WWDC, Apple premiered a new way to communicate on Apple Watch called Scribble. Scribble lets you draft text messages by using your finger to draw text directly onto the small screen of the watch. While not the fastest way to communicate, it enabled quiet, accurate writing under the constraints of the device: a small screen, weak processor, and 273 mAh battery.
This feature was super satisfying when it first came out. After 2 minutes of playing with it, I just wanted to know how they did it! And to be honest, the precise answer to that question still eludes me. But in this post, I’ll present one way to recreate the same functionality, using the path of your mouse cursor as the input.
It's important to caveat that I'm a hammer, and to me every problem looks like a machine-learning-shaped nail. In any case, I hope you'll find my solution fun and straightforward. I'll continue building on this project as a test-bed for some ideas in meta learning I'm eager to play with.

Try it out!

Move mouse over canvas to draw, pausing briefly between characters. You can click to reset a stroke.
This widget records your cursor's velocity as you drag it across the canvas and forwards that data to api.hectorastrom.com, where the mouse velocity data is processed by a 1.2MB CNN hosted on an AWS t3.small instance in a couple of milliseconds. Most of the latency comes from the back-and-forth between AWS and your browser.

How It Works

I don't actually know how Scribble works on the Apple Watch - that's all proprietary. It's certainly different from my version; after all, the engineers at Apple were presented with much tighter constraints when building their version than I was. However, that's also the reason this project could be built in a few evenings of excited hacking, rather than by a team of paid engineers over XX months.
In this section, I'll describe how my version of scribble (the one you tried above) works. This is a simple, fullstack ML project, but its principles & process are close to universal. To see the code and read about my attempts, visit the repo.

Pipeline

Scribble records mouse velocity data (either on device using pynput, or in browser with JavaScript), integrates the velocity data into positional data, converts that positional data into a 28x28 image, and passes it through a 300K-parameter CNN with 53 classes (a-z, A-Z, and space). We take the most likely class as the letter prediction.
There are a couple oddities in that pipeline that deserve some justification:
  1. Why do we record with velocity, if we're integrating to position anyway?
  2. Why do we convert timeseries data to images, effectively throwing out a rich source of information about the path of the cursor?
The first oddity is a vestige of a constraint I was previously under: this project started for a different purpose, where all data I had access to was mouse velocities. The second oddity actually has a good explanation, and will become clear in the training section.
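To make that pipeline concrete, here's a minimal sketch of the velocity-to-image step, assuming numpy and Pillow. The function name and details are illustrative, not the repo's actual code.

```python
import numpy as np
from PIL import Image, ImageDraw

def velocities_to_image(vx, vy, size=28, pad=2):
    """Integrate per-sample mouse velocities into a path, then rasterize it."""
    # Cumulative sum turns velocity samples into relative positions.
    xs = np.cumsum(vx)
    ys = np.cumsum(vy)

    # Normalize the path into the drawable area, preserving aspect ratio.
    xs -= xs.min(); ys -= ys.min()
    scale = (size - 2 * pad) / max(xs.max(), ys.max(), 1e-6)
    xs = xs * scale + pad
    ys = ys * scale + pad

    # Draw the stroke as a connected polyline on a black canvas.
    img = Image.new("L", (size, size), 0)
    draw = ImageDraw.Draw(img)
    draw.line(list(zip(xs.tolist(), ys.tolist())), fill=255, width=2)
    return np.asarray(img, dtype=np.float32) / 255.0  # 28x28, values in [0, 1]
```

The resulting 28x28 array is what the CNN sees, which is why the image route slots so neatly into the pretraining story below.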

Training

The training of the stroke classification model is broken into two phases:
  1. Pretraining on EMNIST
  2. Finetuning on my own mouse strokes

Pretraining

I wanted to get a prototype of scribble up fast, which meant offloading as much of the work as possible to existing assets. Originally, I thought I could do that by using lightweight OCR models. This worked terribly, because as it turns out OCR models are built specifically for clean, typed characters - not messy, continuous mouse strokes.
There was unfortunately no alphabetic character recognition model I could find online, either. So, the laziest thing I could do next was use data that had already been collected. Thankfully, there's a dataset called EMNIST that contains 145,600 28x28 B&W images of lowercase handwritten characters.
EMNIST example images
This dataset was perfect for my use case, and explains why I chose to use image representations of mouse strokes instead of the timeseries versions: that's all I had available for pretraining. So once I locked in that I'd be using a CNN for this task, I could happily pretrain away.
Pretraining takes 5 minutes on my M2 Mac, and reaches a 94.9% offline test accuracy in classifying the 26 characters available. I only needed to run this once.
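For the curious, the pretraining loop looks roughly like this, assuming PyTorch and torchvision's EMNIST wrapper; the placeholder model and hyperparameters here are illustrative, not my exact setup.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# EMNIST "letters" split: 26 merged classes, labels 1..26.
train_ds = datasets.EMNIST(root="data", split="letters", train=True,
                           download=True, transform=transforms.ToTensor())
train_dl = DataLoader(train_ds, batch_size=256, shuffle=True)

# Placeholder CNN; the actual architecture is sketched in "The Model" below.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 7 * 7, 26),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for imgs, labels in train_dl:
        loss = loss_fn(model(imgs), labels - 1)  # shift 1..26 labels to 0..25
        opt.zero_grad(); loss.backward(); opt.step()
```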
EDIT: Upon further inspection, it turns out that EMNIST Letters is not 145,600 lowercase letters, but actually 145,600 lowercase + uppercase letters (where e.g. A and a correspond to the same class). This doesn't change a whole lot, as the pretraining priors learned are still (evidently) valuable. However, another dataset -- EMNIST ByClass -- distinguishes between the two, with 62 classes covering uppercase, lowercase, and the digits. Pretraining on this dataset would lead to substantially better performance on uppercase letters, and shrink the domain shift between pretraining and finetuning.

Finetuning

I unfortunately couldn't stop right after pretraining because my desired domain -- of lowercase + uppercase mouse strokes -- was slightly different from lowercase, handwritten characters. Mouse strokes are continuous, messy, and harder to differentiate. I sadly couldn't avoid collecting my own data.
After collecting around 25 samples for each of the 53 characters, applying data augmentation through the 200 epochs of training, and running a partial finetune with a new classifier head, the model was ready!
wandb run for finetuning
Finetuning takes less than a minute, and boasts 92.9% offline test accuracy on my mouse strokes. Admittedly, the widget above feels worse than that. I attribute this discrepancy to the slightly different domain in which the strokes were collected (see data collection below) and to the fact that they have to be drawn back-to-back in the widget above. With a centered mouse and patient stroke, the widget feels really accurate!
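Here's roughly what the partial finetune looks like, assuming a PyTorch model that exposes its last conv block and classifier as .block3 and .head (names are my own convention; the architecture sketched in The Model section below follows it). The head sizes, augmentation, and learning rate are illustrative, not my exact settings.

```python
import torch
from torch import nn
from torchvision import transforms

def prepare_for_finetune(model: nn.Module, num_classes: int = 53):
    """Freeze the early conv blocks, unfreeze block 3, and attach a fresh head."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.block3.parameters():        # assumed attribute name
        p.requires_grad = True
    model.head = nn.Sequential(                # new two-layer classifier head
        nn.Flatten(), nn.Dropout(0.3),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, num_classes),
    )
    # Only the unfrozen block + the new head are handed to the optimizer.
    return torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=3e-4
    )

# Light augmentation so ~25 strokes per class stretch across 200 epochs.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
])
```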

Data Collection

I really really hate data collection, but I do like making good tools. So I tried to make the script for data collection as streamlined as possible. After launching the data collection script, all you have to do is start a new stroke with space, draw the stroke you're told to draw, pause your mouse for 200ms, and repeat. You can stim out pretty easily on this data collection, and it's rewarding to know you're building a better model :)
Data collection script UI
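For a flavor of how the pause-based stroke segmentation works, here's a stripped-down sketch of the collection loop using pynput. The real script does more (prompts, labeling, the space-bar reset), and save_stroke here is a made-up stand-in.

```python
import time
from pynput import mouse

PAUSE_S = 0.2                      # 200 ms of stillness ends the current stroke
stroke, last_time, last_pos = [], None, None

def save_stroke(samples):
    # Stand-in for the real script, which labels and writes samples to disk.
    print(f"captured stroke with {len(samples)} samples")

def on_move(x, y):
    global last_time, last_pos
    now = time.time()
    if last_pos is not None:
        dt = now - last_time
        if dt > PAUSE_S and stroke:
            # The cursor sat still long enough: flush the finished stroke.
            save_stroke(list(stroke))
            stroke.clear()
        else:
            dx, dy = x - last_pos[0], y - last_pos[1]
            stroke.append((dx, dy, dt))   # velocity-like deltas, not raw positions
    last_time, last_pos = now, (x, y)

with mouse.Listener(on_move=on_move) as listener:
    listener.join()
```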

The Model

Although I'm a deep learning guy at heart, this model is really simple. The deployed version in the widget is a 300K parameter CNN, composed of three convolutional blocks with max pooling layers gluing them together. It takes the input image from 28x28 -> 1x1, and uses a two-layer classifier head trained with dropout. Finetuning is done partially (relatively arbitrarily, I must admit; I didn't compare many methods here), updating only block 3 and the classifier head in that phase. The finetuning of the CNN reaches 92.9% performance from ~1000 training samples.
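Here's one plausible reconstruction of that architecture, assuming PyTorch. The channel widths and dropout rate are guesses that happen to land near the stated ~300K parameters, and the block/head attribute names match the finetuning sketch above.

```python
import torch
from torch import nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class StrokeCNN(nn.Module):
    def __init__(self, num_classes=53):
        super().__init__()
        self.block1 = conv_block(1, 32)       # operates on 28x28
        self.block2 = conv_block(32, 64)      # 14x14 after pooling
        self.block3 = conv_block(64, 128)     # 7x7 after pooling
        self.pool = nn.MaxPool2d(2)
        self.squash = nn.AdaptiveAvgPool2d(1)  # collapse to 1x1 spatial
        self.head = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.3),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):                      # x: (batch, 1, 28, 28)
        x = self.pool(self.block1(x))
        x = self.pool(self.block2(x))
        x = self.squash(self.block3(x))
        return self.head(x)
```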
Since it feels so bad to be throwing out the rich timeseries information from a stroke, I did try using an LSTM, too. Unfortunately, this LSTM only has the ~1000 samples that I personally collected for full training, as there's no public, large-scale timeseries mouse stroke dataset I know of (HMU!).
This is an ongoing effort (attempt 3 here), which -- at a dataset size of ~1000 -- attains 62% test accuracy. I would have thought 1000 timeseries samples would be enough to match CNN performance, but that was apparently too optimistic. I'll continue working on this, but most likely the best action I could take to improve LSTM performance is to put up a page for all of you to collect lots and lots of data for me ;) (coming soon...)
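For reference, the LSTM variant looks something like this, assuming PyTorch and (dx, dy, dt) input sequences; the layer sizes are illustrative, not my exact configuration.

```python
import torch
from torch import nn

class StrokeLSTM(nn.Module):
    """Classify a stroke directly from its (dx, dy, dt) sequence, no rasterization."""
    def __init__(self, num_classes=53, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden,
                            num_layers=2, batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):            # x: (batch, seq_len, 3)
        _, (h_n, _) = self.lstm(x)   # h_n: (num_layers, batch, hidden)
        return self.head(h_n[-1])    # classify from the last layer's final hidden state
```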

Future of Scribble

scribble, in its current state, is a cool demo, but it doesn't provide value to anyone. I could see scribble becoming a useful medium for interacting with computers for those with physical disabilities or fine motor impairments.
scribble could also become a cool tool for everyone else. Effectively, this software transforms the mouse into an additional input medium (imagine being able to quickly execute a macro with a swipe of your mouse!)
Here are what I think the biggest limitations of scribble are, and how I intend to solve them:
  1. Images are a poor medium for rich, timeseries mouse velocity data
  2. Character recognition is best paired with a natural language parser
    • Pair the outputs & confidence levels from scribble with a language model, which votes on the plausibility of certain strings based on its knowledge of complete words.
    • Making this fast & on device is the real challenge -- I actually think this is a very hard problem.
  3. Data collection needs to scale, a lot
    • Make this software public & directly useful to automate large-scale data collection, so that people will want to use it and give the model feedback (training data)
    • Executing macros is one idea for a use-case
To be honest, though, scribble is in a fun, functional state, so I felt comfortable making it public to you all. It was an exciting thing to build after school over a week, but it's not the most impactful project of all time, nor anything I intend to push much further on its own.
The greatest value in scribble for me now is as a test-bed for some ideas in meta-learning I want to try. It's a lot easier to run and get a feel for a model experiment here than with a bloated LLM or BFM.

Last Thoughts

Like I said at the start, the process of building scribble is identical to the process of building a model for any machine-learning-amenable task. Whether it's mouse velocities, language tokens, electrical brain activity, or photos, you'll discover this process is ubiquitous.

THE ML ENGINEERING RECIPE

  1. Decide what your task and goal are
  2. Figure out what data you have access to, and the cost to attain different types of data
    • Select the type that is both easiest to scale and highest signal-to-noise
  3. Design good representations & features for that data
  4. Select the best architecture for your data
    • Mess around with this! It's the main researchy part, allowing for intellectual creativity
  5. Train the model (pretrain on whatever you can)
  6. Deploy & actually let people use it
Executing this process is just a muscle, but one that's starting to get really strong for me. These are the fundamentals I used to dream about having when I first started learning machine learning, and they let me work on much more complicated, experimental research with confidence. That's the real reason I wrote this blog: to document this recipe, applied to a task that everyone can play with.
Shoot me an email if you have thoughts for how to make scribble useful, if you want to build anything out of it, or (especially!) to give feedback on my approach to solving the problem.