
How to Deploy Machine Learning on GitHub Pages: Spinning Up Browser-Based Inference

Learn how to deploy machine learning on GitHub Pages by spinning up browser-based inference with TensorFlow.js and ONNX Runtime Web. A complete guide to serverless ML hosting.


Deploying machine learning (ML) models traditionally involves setting up a Flask or FastAPI backend, containerizing with Docker, and managing cloud infrastructure costs on AWS or GCP. For many edge use cases, however, browser-based inference is a faster, cheaper, and more private alternative. If you want to showcase your research, build a portfolio project, or ship a low-latency tool that needs no server round-trip, learning how to deploy machine learning on GitHub Pages is the optimal path.

GitHub Pages is a static site hosting service. While it cannot run Python or server-side code, advancements in WebAssembly (Wasm) and JavaScript-based ML frameworks like TensorFlow.js and ONNX Runtime have made it possible to "spin up" sophisticated models directly in the user's browser.

The Architecture: Static Hosting vs. Dynamic Inference

The fundamental challenge with GitHub Pages is its static nature. To deploy ML here, you must shift from a Request-Response architecture (where a server processes data) to a Client-Side Inference architecture.

1. Model Conversion: You must convert your Python-trained models (PyTorch, Keras, Scikit-Learn) into a format the browser understands (a `model.json` topology plus `.bin` weight shards, or a single `.onnx` file).
2. Static Storage: These model weights are hosted as static files on the GitHub repo.
3. Client Execution: The browser downloads the model and runs the inference using the user’s local CPU or GPU (via WebGL or WebGPU).

Step 1: Converting Your Model for the Web

You cannot upload a `.pkl` or `.pt` file and expect it to work. Depending on your framework, follow these conversion paths:

For TensorFlow/Keras

Use the `tensorflowjs` converter. This splits your model into a `model.json` file (topology) and several binary weight files.
```bash
pip install tensorflowjs
tensorflowjs_converter --input_format keras model.h5 ./web_model
```

For PyTorch/Scikit-Learn (ONNX)

ONNX (Open Neural Network Exchange) is the industry standard for cross-platform deployment. Use `torch.onnx.export` to convert your PyTorch model to an `.onnx` file. This single file can then be loaded using ONNX Runtime Web.
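
A minimal export sketch follows; the model, the `models/` output path, and the `input_node` name are placeholders (MobileNetV2 stands in for your own trained module, and the dummy shape assumes a single 224x224 RGB image):

```python
import torch
import torchvision

# Placeholder: any trained torch.nn.Module works here.
# weights="DEFAULT" assumes torchvision >= 0.13.
model = torchvision.models.mobilenet_v2(weights="DEFAULT")
model.eval()

# Dummy input fixing the shape the browser will feed in: 1 RGB 224x224 image.
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "models/model.onnx",
    input_names=["input_node"],   # must match the feed key used in ONNX Runtime Web
    output_names=["output"],
    opset_version=13,
)
```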

Step 2: Setting Up the GitHub Repository

To spin up your deployment, initialize a standard frontend project structure:

  • `index.html`: The UI.
  • `script.js`: The inference logic.
  • `/models`: A folder containing your converted model files.
  • `style.css`: Basic styling.
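
A minimal `index.html` tying these files together might look like the sketch below. The jsDelivr URL pulls the latest published TensorFlow.js build (pin an exact version in practice), and the element IDs are placeholders; swap in onnxruntime-web if you exported to ONNX:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Browser ML Demo</title>
  <link rel="stylesheet" href="./style.css" />
  <!-- TensorFlow.js from a CDN; use onnxruntime-web instead for .onnx models -->
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
</head>
<body>
  <h1>In-Browser Inference Demo</h1>
  <button id="run">Run model</button>
  <pre id="output"></pre>
  <script src="./script.js"></script>
</body>
</html>
```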

Critical Note on File Sizes: GitHub rejects individual files larger than 100MB. If your model weights exceed this, you may need to use Git LFS (Large File Storage) or, better yet, quantize your model to FP16 or INT8 to reduce the footprint without significantly impacting accuracy.
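
For Keras models, the `tensorflowjs_converter` can apply quantization at conversion time. A sketch, assuming a recent converter release that supports the `--quantize_float16`/`--quantize_uint8` flags (older versions used `--quantization_bytes` instead):

```bash
# FP32 -> FP16: roughly halves the weight download
tensorflowjs_converter --input_format keras --quantize_float16='*' model.h5 ./web_model

# FP32 -> UINT8: roughly a quarter of the original size
tensorflowjs_converter --input_format keras --quantize_uint8='*' model.h5 ./web_model
```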

Step 3: Writing the Frontend Inference Code

With your model hosted as a static asset, you need a JS library to load and run it. Here is a boilerplate example using TensorFlow.js:

```javascript
async function runInference() {
  // 1. Load the model from the static URL (the relative path keeps it working
  //    under a project sub-path like username.github.io/repo-name/)
  const model = await tf.loadLayersModel('./models/model.json');

  // 2. Prepare input (e.g., from an HTML canvas or input field)
  const inputTensor = tf.tensor2d([1.0, 2.0, 3.0, 4.0], [1, 4]);

  // 3. Execute and get results
  const prediction = model.predict(inputTensor);
  prediction.print();
}
```

If you are using ONNX Runtime Web, the flow is similar:
```javascript
// Load the .onnx file hosted as a static asset in the repo
const session = await ort.InferenceSession.create('./models/model.onnx');

// The feed key ('input_node') must match the input name used during
// torch.onnx.export; the shape here assumes a single 224x224 RGB image
const feeds = { input_node: new ort.Tensor('float32', data, [1, 3, 224, 224]) };
const results = await session.run(feeds);
```

Step 4: Configuring GitHub Pages for Model Loading

Once you push your code to GitHub, head to Settings > Pages. Choose your branch (usually `main`) and the folder (`/ (root)` or `/docs`).
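
The push itself is standard Git; `username/repo-name` below is a placeholder for your own account and repository (the Pages toggle still happens in the web UI):

```bash
git init
git add index.html script.js style.css models/
git commit -m "Spin up browser-based ML demo"
git branch -M main
git remote add origin https://github.com/username/repo-name.git  # replace with your repo
git push -u origin main
```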

Handling MIME Types and CORS

When "spinning up" the site, you might encounter issues where the browser refuses to load binary files. Since GitHub Pages serves files with standard headers, ensure:
1. SharedArrayBuffer: If you use advanced multi-threading in ONNX, you might need specific COOP/COEP headers. GitHub Pages does not support custom headers natively. To fix this, you can use a service worker like `coi-serviceworker` to enable these features on static sites.
2. Relative Paths: Always use relative paths (`./models/...`) to ensure the assets resolve correctly regardless of the sub-path of your GitHub Pages URL (e.g., `username.github.io/repo-name/`).
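
If you go the `coi-serviceworker` route, its documented usage is to commit the script to your own repo (it must be served from your origin) and load it before any code that needs `SharedArrayBuffer`. A sketch of the relevant `index.html` lines:

```html
<!-- Served from your own origin; a copy committed to the repo works on GitHub Pages -->
<script src="./coi-serviceworker.js"></script>
<!-- Inference code, loaded after the COOP/COEP shim -->
<script src="./script.js"></script>
```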

Optimizing Performance for Indian Users

In India, where mobile adoption is high but device capability and network speeds vary widely across regions, optimization is key:

  • Model Quantization: Reducing a model from 32-bit to 8-bit can shrink the size by 75%, leading to faster "spin up" times over mobile data.
  • Progressive Loading: Show a loading bar while the model weights are being cached in the browser's IndexedDB. This prevents users from thinking the site is frozen.
  • WebGL/WebGPU Acceleration: Target a GPU backend so inference runs faster and puts less sustained load on the device's CPU; a combined sketch of these points follows this list.
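
Here is one way to combine the last two points using TF.js APIs (`onProgress` in the load options, the `indexeddb://` save target, and `tf.setBackend`); the model name and `progress` element ID are placeholders:

```javascript
async function loadModelWithProgress() {
  // Prefer the GPU-accelerated WebGL backend; TF.js falls back to CPU if unavailable.
  await tf.setBackend('webgl');
  await tf.ready();

  try {
    // Fast path: a copy cached in the browser's IndexedDB from a previous visit.
    return await tf.loadLayersModel('indexeddb://my-model');
  } catch (e) {
    // First visit: download the weights and report progress to the UI.
    const model = await tf.loadLayersModel('./models/model.json', {
      onProgress: (fraction) => {
        document.getElementById('progress').textContent =
          `Loading model: ${Math.round(fraction * 100)}%`;
      },
    });
    await model.save('indexeddb://my-model'); // cache for the next visit
    return model;
  }
}
```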

Limitations to Consider

While GitHub Pages is excellent for cost-free deployment, it isn't a silver bullet:

  • No Private Models: Everything in a GitHub Pages repo (public) is accessible. Do not host proprietary weights if you are sensitive about IP.
  • RAM Constraints: Large Language Models (LLMs) like Llama 3 often exceed the browser's available memory. Stick to vision models (ResNet, MobileNet), NLP (BERT-tiny), or custom regression models.
  • Cold Start: The very first time a user visits, they must download the weights. Subsequent visits are near-instant due to browser caching.

FAQ: Deploying ML on GitHub Pages

Q: Can I run Python on GitHub Pages?
A: No. GitHub Pages only serves static files; there is no server-side execution. You must use Pyodide (Python compiled to Wasm) or convert your models to JavaScript-compatible formats.

Q: Is there a cost to hosting ML models on GitHub Pages?
A: No, it is completely free, making it ideal for Indian startups and students building MVPs.

Q: Does it work on mobile browsers?
A: Yes, as long as the mobile browser supports WebGL or WebGPU, which most modern versions of Chrome and Safari do.

Q: How do I handle large models above 100MB?
A: Use Git LFS for storage or host the weights on an external CDN and fetch them via URL in your JavaScript code.
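
For example, with TF.js the model URL can point anywhere that serves the files with permissive CORS headers; the hostname below is a placeholder:

```javascript
// Inside an async function; the CDN must send Access-Control-Allow-Origin headers.
const model = await tf.loadLayersModel('https://cdn.example.com/web_model/model.json');
```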

Apply for AI Grants India

Are you an Indian founder building the next generation of AI-native applications? At AI Grants India, we provide the resources, mentorship, and equity-free funding needed to scale your vision from a GitHub repo to a global product. If you are spinning up innovative ML solutions, we want to hear from you. Apply today at https://aigrants.in/ and join the ecosystem of world-class AI creators in India.
