You can access it here, but it is on a cheap deployment system, so please don’t pile on with more than three users at a time 😉.
The Object Detector
We used the SSD implementation for PyTorch that Max deGroot and Ellis Brown created here.
There may have been some niggles getting it going, but they were so minor that I’ve forgotten what they were.
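For the record, a minimal sketch of what a single detection with that repo looks like; this is from memory and assumes the repo’s layout, and the checkpoint filename is just an example:

```python
import cv2
import torch
from ssd import build_ssd  # from the ssd.pytorch repo

net = build_ssd('test', 300, 21)  # 300x300 input, 20 VOC classes + background
net.load_weights('weights/ssd300_mAP_77.43_v2.pth')  # example checkpoint name
net.eval()

image = cv2.imread('example.jpg')
x = cv2.resize(image, (300, 300)).astype('float32')
x -= (104.0, 117.0, 123.0)                # VOC channel means used by the repo
x = torch.from_numpy(x).permute(2, 0, 1)  # HWC -> CHW

with torch.no_grad():
    detections = net(x.unsqueeze(0))      # per-class (score, x1, y1, x2, y2) rows
```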
The Front end
We used Streamlit – which is brilliant because it allows us to concentrate on the actual computation part rather than on all of the boring infrastructure and boilerplate around traditional front-end and back-end web frameworks.
However, using Streamlit does not come for free – it was easy to overwhelm the cheap Heroku dyno that we used. We assume this is because Tornado running on a cheap Heroku dyno is not capable of handling all the WebSocket traffic. Streamlit by itself was a minor contributor to RAM usage, which we will get to below.
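For the curious, the Streamlit side really is only a handful of lines; something like this sketch, where detect and draw_boxes stand in for hypothetical wrappers around the SSD model:

```python
import streamlit as st
from PIL import Image

st.title("SSD Object Detection Demo")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded)
    boxes = detect(image)               # hypothetical wrapper around the SSD net
    st.image(draw_boxes(image, boxes))  # hypothetical box-drawing helper
```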
Heroku
Heroku is great because it hides many of the messy details around the tooling and infrastructure associated with web apps.
However, we have come to the conclusion that Heroku is not the best platform to run your ML tools on.
Space Issues
An SSD PyTorch object detector that depends on the torch, torchvision, matplotlib, and cv2 (OpenCV) libraries is big.
It couldn’t be installed on Heroku the normal way, because the slug size was close to 950MB, and their limit is 500MB.
Plan B was to use Docker on Heroku. A Docker image using the libraries above ended up being between 4 and 5 gigabytes on my local machine. Incredible. And for roughly one in two updates, the whole torchvision layer gets rebuilt, multiple image prunes have to be done, etc, etc. Before testing whether this could be cached we tried plan C, because we were not going to upload 4.5GB to Heroku every time the code changed (well, we did try this a couple of times, but we didn’t want to wait a week for the 3GB tranche to get there, so we gave up after a few attempts at an initial test deployment).
Plan C was to simply get Heroku to do the Docker image build.
That worked beautifully: relatively quick to upload (given 100MB of SSD weights), quick to build, and finally we were up and running!
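If memory serves, the moving parts were roughly a heroku.yml telling Heroku to build from our Dockerfile, plus switching the app onto the container stack. A sketch from memory, not our exact production config:

```yaml
# heroku.yml (sketch): tell Heroku to build the web process from the Dockerfile
build:
  docker:
    web: Dockerfile
```

After a heroku stack:set container, a plain git push heroku master then triggers the image build on Heroku’s side, instead of uploading a multi-gigabyte image.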
RAM
However, the cheaper Heroku dynos have a RAM limit of 500MB (they are lenient; you can go 10-20% over), but running deGroot and Brown’s SSD took us to that limit with only one browser client. It fell over with two.
We assumed part of the issue was that two clients doubled the RAM (and the log messages did suggest that: two clients took around 950MB of RAM before causing a crash).
So we pulled most of the SSD out and put it into a synchronous Flask service. Streamlit still uploaded images, and annotated and drew on them once the SSD Flask service returned its detections.
FastAPI looks good; however, weirdly, we needed to bottleneck detection to one client at a time, so we used Flask.
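The Flask side of the split is tiny. A minimal sketch of its shape, where load_ssd and run_detection are hypothetical helpers around the ssd.pytorch code and the endpoint name is illustrative:

```python
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
model = load_ssd()  # hypothetical helper: build the net and load weights once

@app.route("/detect", methods=["POST"])
def detect():
    image = Image.open(io.BytesIO(request.files["image"].read()))
    boxes = run_detection(model, image)  # hypothetical wrapper returning box lists
    return jsonify({"detections": boxes})
```

Run under a single synchronous worker (for example, gunicorn -w 1 app:app), this naturally serialises detection to one request at a time, which is exactly the bottleneck we wanted.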
This architecture works quite well for 4-5 clients. After that, the hobby dyno starts to get memory warnings and timeouts occur.
We assume that we could run 20 cheap dynos running the Flask SSD service, and maybe, if we gave the Streamlit app a big hearty dyno (minimum $50 a month), the whole thing could operate quite stably for a small audience.
Often we hear talk about managing thousands of concurrent connections in web apps.
Have a think about how much RAM 1,000 simultaneous clients would use doing object detection, with each using approximately 600MB of RAM: roughly 600GB in total. I’m sure AWS would happily scale that for you, but it is also a sure bet that it would cost a minimum of tens of thousands of dollars a week.
Lessons from the Demo
Streamlit is amazing – but perhaps better on your intranet than as a public service.
High-usage ML in the cloud seems like a world of hurt for your wallet. Putting as much computation as you can on a smartphone in a native mobile app (not a web app) seems like the most obvious way to manage costs and reduce architectural effort. We can retain some of the wonderful Python ecosystem (and training would all be in Python), but deployment would use CoreML, and whatever the Android equivalent is (ML Kit?).
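For what it’s worth, the PyTorch-to-CoreML path looks something like this with coremltools; a sketch that assumes the model can be traced, with an illustrative input shape and filename:

```python
import coremltools as ct
import torch

# net is the trained PyTorch detector; trace it with a dummy input
example = torch.rand(1, 3, 300, 300)
traced = torch.jit.trace(net.eval(), example)

# Convert the traced model to CoreML and save it for the iOS app
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=example.shape)])
mlmodel.save("ssd.mlmodel")
```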