Saturday, July 18, 2009

Site Architecture of a Rapidshare Site

When the site started up, I knew very little about MySQL or proper database design. Thankfully I got pretty lucky and what I designed worked, though it's far from ideal.

You may have noticed that at SinlessLinks we have a few features that other websites don't have. The most noticeable is the new way to select just what websites you want to view. This is made possible because of something that I've called a "linkset". I can't remember for the life of me if I heard that word somewhere else or coined it as I created the site, but it's that part of the database design that makes it all possible.

A LinkSet is intended to be a collection of links that contains all the links necessary to download and unrar a particular item. Or in the case of YouTube, it would contain all the links necessary to watch a full tv-episode or movie.

Often on other sites you'll see them post a "primary" set of links, and then some alternate sets. On a few "news only" blogs, you'll see a lot of different sets of links in the comments, like you have started to see here more often. This is the "problem" I was trying to solve with LinkSets. Just because the main links that the original poster provided go dead, doesn't mean that the post should be deleted. And anyone that's trusted should be able to add links of any type to a post, and have them show up, but not clutter up the page. It needed to be easy to look at and quickly grab the links you want.

And in TV-Series, I had seen many posts where there were 3 sets of alternates for each episode. Now this is great for making sure there's always working links, but when you're looking at something like Seinfeld or The Daily Show, it's just not practical to have that many different sets of links all visible at the same time. So it was decided that while many linksets can be added, only one will show up per episode. Then as soon as one set goes dead, the other would come into place. (Now mind you this has a long way to go before it's perfect -- we need an easy way to check and flag the links in a single episode or season so that the dead links will go away and working ones will show up -- and we also need a way to view all the links available for a particular episode, a page devoted to that episode mind you.)


Okay, so now that what we were trying to achieve has been mentioned, a few words about how I achieved it, what I learned, and how I would do it differently if I could start over.

The main post is stored in a table of its own. However, this table only stores information like the url, the title, the year, the creator of the post, when it was created, etc. It doesn't actually store the body of the post. Instead, it is linked to a "linksets" table, which stores cohesive sets of links. These linksets have a "Type" which a number standing for Rapidshare, MegaUpload, etc., they have an author, and they have a "status" (which is normally 1 for good-working links, but the linkchecker can switch it to 2 for dead, and we also have 3 for "submitted" but not yet approved).

In addition to linksets, their is a table for series, seasons, and episodes, which help store all the information to make up a proper TV-Series post.

All posts are created when you view them. This has it's up sides and down sides. On the up side, it means that as soon as a change occurs (such as adding a linkset, or adding/deleting an episode) everyone sees that change immediately. The downside is that even one little portion of bad code can slow down or even crash the whole site, because it is run hundreds if not thousands of times a minute. Eventually I hope to create a "cache" system that builds out the post anytime a change is made, and then that is used to display the post to the user.


Some things that I would change about this architecture...

There's no need for the "series" table. It contains redundant information to the posts table, and needs to be eliminated eventually. Not sure if I will ever do this though, as it would be tons of work, with very little speed improvement.

There's no need for the "links" table that I use. Every line in a linkset is stored in a separate row in a links table. The original thought was to be able to store the status of a single link as being dead and show that every time someone views a post. But after trying that, it was just overkill. Besides, if one link in a linkset is dead, it's normally safe to assume that people can't properly use the content at all.

I originally connected a linkset to a post (if it was a regular movie, etc.), or an episode (if it belonged to a tv-series). This was a nightmare, because in order to get all the linksets for the main page, I had to do queries that involved the posts, series, seasons, episodes, and linksets table. While that may be a good normalized setup, it makes a lot more sense for optimization to just have all the linksets directly connected to a post, and then just get the rest of the tv-series information when you have to. This is one of the biggest changes I made in the past week.




Those are really the only things I see wrong with the current database design. Though I see TONS wrong with the code that builds out the posts... the php is very redundant and extremely inefficient. To the point where it always amazes me how fast pages can still load most of the time. I hope to fix this eventually, and make everything object orientated in the process (it's currently procedural), but it's such a big project that I want to make sure all other important features are finished first.


Well those were just some thoughts I wanted to share... it's probably not well written at all, but I hope those of you that were interested actually learned something about how the site works.


~SinlessLinks

No comments: