20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU

update: 22 June 2024
 

TL;DR: Using ONNX Runtime with WebGPU and WebAssembly leads to a 20x speedup over multi-threaded and a 550x speedup over single-threaded CPU performance, achieving interactive speeds for state-of-the-art background removal directly in the browser.

Removing the background from an image is a common task in creative editing. We have come a long way from manually knocking out the background of an image to full automation with neural networks.

Most state-of-the-art background removal solutions offload the task to a server with a GPU, because running the neural network on the client was simply infeasible.

However, running background removal directly in the browser offers several advantages over server-side processing:

Reduced server load and infrastructure costs by offloading heavy lifting to the client.
Enhanced scalability by distributing the workload across client devices.
Easier compliance with data protection and security policies by not transferring data across a network to a server.
Offline processing without needing a reliable internet connection.

In-browser background removal caters to a wide range of use cases, including but not limited to:

E-commerce applications that need to remove backgrounds from product images in real time.
Image editing applications that require background removal capabilities for enhancing user experience.
Web-based graphic design tools that aim to simplify the creative process with in-browser background removal.

In general, two factors influence the feasibility of running background removal directly on the client:

The execution performance, and
the download size of the neural network.

Performance, or overall runtime, is the deciding factor for interactive applications: if a user has to wait several minutes or hours for a neural network to execute, that is in most cases far too long for a good user experience. From experience, there are three factors to consider:

The initial, first-time execution. Neural networks generally come with the drawback of being several MB to GB in size, so the time to download the network into the browser cache is considerable. On subsequent page reloads, this no longer has an impact.
The neural network or session initialization time. This cannot be cached and has to be paid on every reload of the page in the browser.
The neural network or session inference time. This largely depends on the longest path through the neural network and, most importantly, on the execution time of each operator in the network.
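To make the interplay of these three factors concrete, here is a minimal TypeScript sketch of how they might add up to the wait a user experiences. The names PhaseTimings, Scenario and waitTimeMs are hypothetical and not part of ONNX Runtime or any other library, and the sketch assumes the three phases run one after another without overlap.

```typescript
// Hypothetical per-phase timings in milliseconds. The field names are
// illustrative only and do not come from any library.
interface PhaseTimings {
  downloadMs: number;  // fetching the model file over the network
  initMs: number;      // session initialization, repeated on every page load
  inferenceMs: number; // running the network on a single image
}

// The three situations described in the list above.
type Scenario = "firstVisit" | "reload" | "warmSession";

// Assumption: the phases are strictly sequential, so the user-perceived
// wait is the plain sum of the phases that apply to the scenario.
function waitTimeMs(t: PhaseTimings, scenario: Scenario): number {
  switch (scenario) {
    case "firstVisit":
      // Nothing is cached yet: download, initialize, then infer.
      return t.downloadMs + t.initMs + t.inferenceMs;
    case "reload":
      // The model comes from the browser cache, but the session
      // still has to be initialized again.
      return t.initMs + t.inferenceMs;
    case "warmSession":
      // The session already exists, so only inference remains.
      return t.inferenceMs;
  }
}
```

With placeholder numbers (not measurements) such as 4000 ms download, 500 ms initialization and 200 ms inference, the first visit would cost 4700 ms, a reload 700 ms, and every further image in the same session 200 ms. This is why download size dominates the first impression while inference time dominates sustained interactive use.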
