We’ll start with Parler’s case against Amazon, which was dismissed a few minutes ago. Basically, after warnings and no action, Amazon enforced its terms of service, and there’s really no legal way you can force a company to let you violate its terms of service and act against its own interests. Talk radio and politicians have been jabbering on about how much power these service providers have and to what degree they should be able to censor sites.
But the reporting and the rhetoric on who gets to control speech is based on very little understanding of how the net works, how a website works, or how a social network works.
TL;DR – I’m going to go over the basics of creating a social network in your basement that could scale to Parler’s size.
So, we’ll start with how Parler worked – Parler rented servers and bandwidth from Amazon on the AWS platform. They ran on Amazon’s computers, they used Amazon’s bandwidth, they used Amazon’s processing power, and they were under Amazon’s terms of service. All eggs, one basket. As went Amazon, so went they. They went.
Now, this is how a lot of places do this. It isn’t a bad thing, and as for the media and anyone who wants to complain they’re being censored, well, maybe they are at that level of service host, but Amazon and the big hosting providers are not the only way you can do this. You can host your own social network (seriously, look at the dark web; I ran a message board on a cell phone for a while). Parler just chose the easiest path.
Just for cred’s sake, if you didn’t know, I worked on scaling a website from 2 million to 47 million users at its peak back in 2005. We owned and operated everything except the bandwidth. I’m still uncertain whether the NDA applies, so I’m still vaguebooking it.
So, I’m going to design, very poorly I will note, a platform that’s not based on AWS/cloud hosting for a social network like Parler based on them owning the equipment and operating it themselves.
The following will make IT people cringe, so avert your eyes.
Parler was small, essentially. You hear 57 TB of data pulled and that sounds like a crapton of data, but at $100 per 1 TB SSD that’s backend storage of roughly $6,000. That’s of course without redundancy, no RAID, just fast raw storage missing a lot of things (seriously, fear my tinkertoys), but we’re going to start by just building a crappy functioning clone and then we can talk about what we need to be a real network. So let’s say $10,000 in storage media to own it and host it yourself (and have no redundancy; lemme stress, this is building a tinkertoy network).
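That storage math, as a quick sketch (the per-TB price and data size are the ballpark figures above, not quotes):

```python
# Rough storage-cost math from the figures above: raw SSDs only,
# no redundancy, no RAID, no enclosures.
PRICE_PER_TB_SSD = 100   # dollars, the article's ballpark for a 1 TB SSD
DATA_TB = 57             # the reported size of the Parler data pull

raw_storage_cost = DATA_TB * PRICE_PER_TB_SSD
print(raw_storage_cost)  # 5700 -- call it $6,000 with a couple of spares
```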
You’ll need PHP, MySQL, Apache, and a wide variety of software that’s going to cost you in the… $0’s of dollars. Yeah, you can build this free (minus programmers).
You need a few database servers (say $15K, as all we need is memory, network cards, and processors). They don’t have to be insanely powered, as we’re looking at a very limited data set, but a lot of memory and a plan for how to pull data from the random storage units you have lying around, rather than just throwing everything on wide storage and serving it up (like Parler did on AWS), will let you prioritize accessed content (fast storage) vs stale content that can be put on much cheaper spinner storage.
Maybe spread the databases across different servers instead of sharing one, just so each server is more efficient and working with a smaller slice of the database.
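Spreading the database out can be as dumb as hashing the user ID to pick a server. A minimal sketch; the shard count and server names are made up for illustration:

```python
# Naive user sharding: each database server owns a slice of the users,
# so no single box has to hold or serve the whole data set.
NUM_SHARDS = 3  # hypothetical: one shard per database server

def shard_for_user(user_id: int) -> str:
    """Map a user to the (made-up) database server holding their rows."""
    return f"db{user_id % NUM_SHARDS}"

print(shard_for_user(1001))  # db2
```

The catch with modulo sharding is that changing the server count reshuffles everyone, which is why bigger shops use consistent hashing, but for a basement build this gets the job done.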
You’ll need some NAS/drive bays for all that storage. Let’s say $40K above the $10K for media… $50K.
Keep in mind any social network with only 2 million or so users’ active data is small; it’s all within a few days. It’s not like you have to access all 70 TB of data all the time. You can pull that stale data from archives back to active serving as needed. Sorry Helen, your poodle pictures aren’t getting accessed, off to the cheap spinner farm until they are.
Keep your old puppy videos on cheap storage and keep the stuff that’s accessed frequently on the good storage (just copy it over as it’s accessed and let it fall off as it’s not). Move your old pet photos and posts to archived storage and, when needed, pull them back out to main. This reduces the need for a huge high-speed front-end investment: your database is not dealing with 57 TB, it’s dealing with 2-5 TB of active data, and if something’s not in the active set we go pull it into the active set.
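That promote-on-access, demote-when-stale idea looks roughly like this. The names, the one-week threshold, and the in-memory stand-ins for actual storage tiers are all invented for the sketch:

```python
import time

# Toy hot/cold tiering: content sits on cheap "spinner" storage until it's
# accessed, gets promoted to the fast tier, and falls back off when stale.
STALE_AFTER = 7 * 24 * 3600   # demote after a week without access (arbitrary)

hot = {}      # item_id -> last access time (stand-in for the fast SSD tier)
cold = set()  # item_ids parked on cheap spinners (stand-in for archive)

def fetch(item_id, now=None):
    now = time.time() if now is None else now
    if item_id in cold:        # pull stale content back into active serving
        cold.discard(item_id)
    hot[item_id] = now         # promote / refresh on every access

def demote_stale(now=None):
    now = time.time() if now is None else now
    for item_id, last in list(hot.items()):
        if now - last > STALE_AFTER:
            del hot[item_id]
            cold.add(item_id)  # off to the cheap spinner farm

fetch("helens_poodle.jpg", now=0)
demote_stale(now=STALE_AFTER + 1)
print("helens_poodle.jpg" in cold)  # True -- sorry, Helen
```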
You’ll need a few front-facing computers for web servers. Throw in $5K more (we need NICs and a few processors, not much else), plus the back-end database servers (listed above at $15K). And let some of the NAS units you’re using for storage do some of the back-end maintenance, like figuring out what hasn’t been accessed in forever and needs to be moved off.
Now, we’re going to look back at Net Neutrality for a second. If it were still around, there’s absolutely nothing an internet service provider could do to you unless you were violating laws. But it’s not here, some jerk killed it, and you’re dealing with physical equipment you own and operate in your basement. Shame Net Neutrality ain’t around, but let’s look at what you can do.
First off, you can hide and obscure your data. There’s nothing to say that if you’re on AT&T business fiber ($400 a month) the data is being served to the public via AT&T business fiber. You go for a service like Cloudflare (paid plan), DDoS-Guard, or something else. Hell, Pocketables uses Cloudflare… it would take you all of three seconds to find out who our web hosts are by searching “web hosts,” but it might take you a little longer to get our IPs (I’m betting 13 seconds, for anyone who knows how hard I tried to hide them). At the point where AT&T isn’t the front-facing IP or associated with you, they don’t care. They also might not ever have any clue. You probably wouldn’t have any clue if I worked at hiding them.
You may be recoiling at the thought of trying to cram 2 million users down a 1 Gb pipe. Yeah, we’re not. We’re serving up static content off Cloudflare or somewhere else; our only connection to the outside world is Cloudflare. We’ve got some PHP generation for new content going on, but there’s not a huge amount of brand-new content being generated, until there is.
Got fiber? Get a few connections (Google ($250 a month), AT&T ($400), Comcast ($299), etc.), or you can rent space in an office building with multiple fiber inputs. Hell, you could rent an office and host it there for $1,000 a month plus whatever your fiber connections are costing. Let’s say $2K a month plus electricity.
And there’s more: there’s also pretty much nothing that says your storage of videos (the bulk of the Parler pull) has to even be associated with your service. There are video hosting services you can pay if you want, and bunches of places you can stash data if you’re willing to pay.
Serve up cached content via Cloudflare or the like so your databases aren’t even being hit unless someone responds to a thread. Your home/building bandwidth isn’t being touched for much other than logging in, checking what your user has access to view, and generating some new pages that get pushed up and served as static later on.
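Getting Cloudflare or any CDN to do that is mostly a matter of the Cache-Control headers your pages send back. A sketch; the paths and lifetimes here are illustrative, not anyone’s real config:

```python
# Sketch: mark pages cacheable so a CDN serves them from its edge and your
# basement bandwidth barely gets touched. Paths and TTLs are invented.
def cache_headers(path: str) -> dict:
    if path.startswith("/static/") or path.endswith((".jpg", ".mp4")):
        # Static media: let the edge hold it for a day.
        return {"Cache-Control": "public, max-age=86400"}
    if path.startswith("/thread/"):
        # Generated thread pages: cache briefly so replies show up soon.
        return {"Cache-Control": "public, max-age=60"}
    # Logins and anything per-user must never be cached at the edge.
    return {"Cache-Control": "private, no-store"}

print(cache_headers("/static/logo.jpg"))
```

With headers like these, the only traffic that reaches your origin is logins, cache misses, and the occasional revalidation; everything else dies at the edge.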
Parler had all its eggs in one basket. This was an easy basket. It’s an industry standard basket. It’s a damned cozy basket. And the basket owners told them they were going to take the basket away.
But yeah, let’s look at a system that can handle maybe 2 TB of active database and video content, because that’s probably what they were hitting per day. I very much doubt old content was a hotbed of activity, and with only 2.3 million users in the database and barely half of them active, you’re looking at having to build a couple of million sessions a day. If that.
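A couple of million sessions a day sounds scary until you spread it across the day. Back-of-envelope, assuming (my guess, not a Parler stat) about two sessions per active user:

```python
# Back-of-envelope: what "a couple of million sessions a day" works out to.
users = 2_300_000
active = users // 2             # "barely half of them active"
sessions_per_day = active * 2   # assumed ~2 sessions per active user

per_second = sessions_per_day / 86_400  # seconds in a day
print(round(per_second))        # 27 -- a few dozen logins a second, average
```

Peaks will be several times the average, but even 5-10x that is well within what a handful of front-end boxes can authenticate.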
Storage arrays, a plan to constantly move stale data off, a set of database servers, a set of front-facing web servers, an obfuscation/bandwidth caching service such as Cloudflare… I mean, it’s not a pretty site, but you can cobble this together inside a house and be running until you can move portions of your site back onto real hosting.
I suspect with decent programming you’d need 3-5 front-facing web servers, 3 database servers with umpteen RAM, 3 primary NAS servers with SSDs out the butt, and 3-6 spinner NAS servers, plus ongoing dual fiber connections to the net (for Cloudflare and non-cached data) at about $1K a month.
By spreading out, hosting the database and generated content in your basement and hosting video and static content elsewhere, you can build a really tin-can functioning social network for not a whole lot of cash and run it on your own terms of service. Spreading out to other services will also reduce the initial equipment cost, but if your company is persona non grata you might have to pay up front for your own toys, as they might not let you play.
Yeah, I realize there’s a lot more involved, but the basics are relatively simple to get up and running; it just takes some effort and planning on how to not throw everything into one basket, and how to optimize for the now rather than for the everything (which it looks like Parler did). Hire security that’s capable of doing things like implementing rate limits, hire programmers who can build to a limit based on what we can spend and serve well (2 TB or so of active data), and then operate within that limit. Design to reality.
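The rate limit that security hire would implement is classically a token bucket. A minimal sketch, with invented rates:

```python
import time

# Minimal token-bucket rate limiter: each client gets a bucket that refills
# at a steady rate and allows short bursts. Rates here are made up.
class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate = rate              # tokens added per second
        self.capacity = burst         # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # over budget: reject or queue

limiter = TokenBucket(rate=5, burst=10)  # e.g. 5 requests/sec per client IP
print(limiter.allow())  # True
```

In practice you’d keep one bucket per IP or per session and bounce anyone who drains theirs, which is exactly the sort of thing that keeps a basement build from being trivially scraped or flooded.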
Want redundancy? You’ll be amazed at how much easier it is to deal with companies like MS and Amazon if they’re not involved in serving your data and also have no access to it. Alternatively, colo and double the cost.
Now, there are a TON of things that AWS does for you. I’m not saying my $150K tinkertoy operation comes without its perils: you’re in charge of security, you’re responsible for bandwidth, backups, intrusion, all sorts of things that AWS offered. But it’s doable. It just takes some cash, some different design, and planning for the fact that on any day some of your content may be missing.
Now, without Net Neutrality you are at the mercy of whoever provides your bandwidth, but from what I’ve seen, Cloudflare doesn’t care. I don’t know what you’d have to do to get dropped by them.
So yeah, AWS and the hosting providers have the power to enforce their rules. Nothing says you have to play with their toys; they just have neat and useful ones for people who don’t want to go the more difficult route that sometimes ends up with you hanging out at a colocation spot holding a vacuum cleaner over your head at 3 AM because someone stole a ladder and you had to improvise.