November 23, 2009

Rainbow Google and Annoying Google

I launched two more Google parodies: Rainbow Google and Annoying Google!

Rainbow Google is just a pretty demonstration of dynamic stylesheet modification. It was actually extremely hard to code — it took me about 8 hours of JavaScript hell. On another day, I’ll go over how I did it. This site adds a colorful spray of colors to the text on the page (see screenshot).

rainbowgoogle.com

RainbowGoogle.com

Annoying Google was about 10 minutes of work since it was just a super simple version of Rainbow Google’s code. :) Search queries and results are jumbled so that their letters are in random capitalized states… LiKe ThiS.

annoyinggoogle.com

AnnoyingGoogle.com

If you have suggestions or ideas, please let me know!

Filed under: Geeky Stuff, Javascript — Michi @ 10:03 pm

July 23, 2008

The Basics on Using Models and Controllers in PHP

Today I want to talk about passing in objects as arguments in PHP methods. Many PHP developers do not have this patience. This is obvious when studying libraries written for Java versus those in PHP. It is a horridly underused programming style in PHP, and since PHP supports argument type prototyping in methods, I thought it would be good to go over this particular style of development.

First of all, let’s start with a few “PHP” way of doing a mundane task (this will seem extremely familiar):

function login($username, $password, $remember);

function processMail($to, $from, $subject, $body);

function editPost($id, $title, $body, $newTime);

Of course, good developers would make sure these types of examples are part of a class:

$user = new User;
$user->username = $_POST["username"];
$user->password = $_POST['password'];
$user->login();

The examples can continue, and I hope that you good developers use this type of basic OOP development when doing using PHP. :) But there are examples where the standard “newbie” OOP model seemingly falls apart. The system quickly breaks down when new requirements are added to the system. For example, what if we want to do a “remember me” option in the login? What about logging in as an administrator versus a regular user? Okay, now what if we have different login session lengths depending on the user type? How long do you think that login function will be? Think about how it will accommodate IP bans, login logging, banned users, suspended users, max failed attempt lockouts, etc. The list goes on, and depending on your implementation, things can get very ugly. Your login function might become a huge bloated monster sitting in your User class.

The problem is that what you’re actually doing is mixing a data model [user data] with controller logic [how to login using the user data]. The solution is to separate these two entities into two classes, which is what you would see in most modern MVC frameworks.

Here is the seemingly more complex, but far more elegant solution:

$authenticationController = new UserAuthenticationController;
$user = new User;
$user->username = $_POST['username'];
$user->password = $_POST['password'];
$user->rememberMe = true;
$authenticationController->login($user);

The prototype for the login method would look like this:

function login(User $userObject);

In this example, I am hoping to show you possibilities. First, notice that the arguments for login() are down to one. But the more interesting part of the implementation comes with the proper abstraction between the User data and the authentication process. In my example, I Just logged in a regular user. So if I had to map out my class structure, it might look like this:

(Abstract class) AuthenticationController
=> GuestAuthenticationController
  => UserAuthenticationController
    => AdminAuthenticationController

The old way of doing things would look something like these:

function adminLogin($username, $password, $remember);

$user->adminLogin();

But using the Java-esk model, we’d end up with something like this:

$authenticationController = new AdminAuthenticationController;
$user = new User;
$user->username = $_POST['username'];
$user->password = $_POST['password'];
$user->rememberMe = true;
$authenticationController->login($user);

This means the login method is likely broken up in a few pieces inside the AuthenticationController. The Guest user’s login() method would always return false. The UserAuthenticationController would piggy back on AuthenticationController::login() by looking at the User::rememberMe variable and take it into account. But the AdminAuthenticationController doesn’t allow people’s logins to be remembered due to security reasons, so it doesn’t take that variable into account. And in that crazy case where there is nothing different about the admin login, the method would remain untouched (inherited from the parent), but any other changes (such as session length) would still kick in for the admin user with no additional coding.

All of this is done without modifying the core User class. The user class remains clean for its own further abstraction possibilities. New fields such as profile, name, DOB, etc., could be added with no modifications to the controller.

Yes, my version requires the most lines of code, but it is also the easiest to maintain and understand. Why? Because it isn’t cluttering up the User data class with methods that have nothing to do with the user. If you’ve ever written a generic “user” class, you know how large and cluttered such a class can become when you start piling in the methods for login, logout, preferences, session management, lookups, and other needs. I haven’t even talked about the fact that virtually all “operations” that involve a user also involve the database, which adds its own headaches. Being able to keep the hard work in other more data-manipulation-oriented classes is for your own good.

If down the road, it is determined that logging in should also require pre-approved IP addresses, what will your code need? Will your login method need an IP address passed in too? Or will the IP address be generated inside the login method? In my example, I would update the core Authentication class and be done. What happens when a new requirement is added that requires that the login also passes a CAPTCHA test? What about when logins need to be logged in a flat file? What happens when we change logins to use the email field instead of the user field?

Today, I only talked about logins. The ideas I propose here are not new; they are simply good ideas that get ignored by web developers. Remember, you’re application developers that happen to work in a browser. Don’t think that regular application design principles don’t apply to you: they do, more than ever.

Filed under: PHP — Michi @ 3:19 am

May 31, 2008

Getting Around Overwriting form.submit()

Since my dear reader Sameer requested it, I’m here making an update. I’ve got a cool JavaScript fix for everybody! I mentioned in a post a long time ago, but JavaScript has this semi-unexpected “feature” where you can accidentally overwrite the submit() function from a form. As in:

<form id=”myform”>
<input name=”submit” value=”submit me” type=”submit” />
</form>
<script>
document.getElementById(‘myform’).submit(); // THIS FAILS – Object not a method
</script>

Apparently, by creating a form element called “submit” you overwrite the native function that exists in every form element in JavaScript. Because it’s native, it also means you can’t just willy-nilly redefine it. And to make things worse, you cannot (at least not in a cross browser manner), successfully re-assign the submit() method because some browsers will disregard any attempt to reassign its value. As in:

<script>
document.getElementById(‘myform’).submit = ‘This gets ignored’;
</script>

Fortunately, there is a fix. This fix requires modifying the actual DOM. Because this tends to be inconsistent across browsers, I’m doing this fix in MooTools (which is my JS library of choice). However, the fix is fairly straight forward and can easily be done with (or without) any JS framework, as you will see. The steps are:

  1. REMOVE the form element in question. This is an absolute requirement to make the solution cross browser compatible. This can be skipped, but it will cause quirks. However, the good news is that we can assume that 99.99% of all form elements named “submit” are due to designers being ignorant — thus, such cases are exclusive to submit buttons. Luckily, these are almost NEVER needed in the server side code and really just act as wall flowers.
  2. Check if step #1 completed successfully
  3. If it did not, create a new Form element and copy its submit function over
  4. Submit

The code looks like this:

<script>
var formObject = document.getElementById(‘myform’);
// Removes the node
formObject.submit.remove();
// Functions don’t have tagName defined

if(‘undefined’ == (typeof formObject.submit.tagName)) {
    // create a form and assign its submit function
    formObject.submit = new Element(‘form’).submit;
}
formObject.submit()
</script>

Let me know if you encounter any problems.

Filed under: Javascript — Michi @ 1:54 am

February 17, 2008

Neat Idea: Creating Alphanumeric IDs

UPDATE: For those of you looking for a great way to generate highly unique ID that is shorter than what you might get using a hex number, try this (it will generate a ~17 character ID):

list($hex, $dec) = explode(‘.’, uniqid(null, true));
$id = (base_convert($hex, 16, 36) . base_convert($dec, 10, 36));

Ever needed to create an ID has that looked something like f39a2xm91? You might not have, but some day you’ll want to. The easy way out is to use the native md5() function, but that creates a long 32 character hash which may be a total waste of (database) space. These types of IDs are often used to mask integer IDs so that your users can’t just type in user_id=10000, user_id=10002, user_id=10003, and so forth to look at your records. Some might even call it security through obscurity. Well, let’s be clear: this sort of activity does not add security, but it does make for making “browsing” behavior more difficult.

Either way, if you desire to move away from the classic integer format IDs, I have a different solution for you:

base_convert($someId, 10, 36);

This will convert the number 10,001 into 7pt, 10,002 into 7pu, and 100,000,000 into 1njchs. As you can see, you can store a heck of a lot of numeric information in a very tiny amount of (character) space. I am not saying this will save you database storage space, but it will make your URLs shorter.

One of the main benefits is this is that you can store more data in a smaller human-readable space, thereby allowing you to create smaller unique IDs. So for example, in our logging system at work, I use this method to generate issue IDs that end-users can send to us when they have a problem. This issue ID is based on the last eight digits of current time with microseconds concatenated with (using “.”) a random seven digit number in front. I then base-36 encode this resulting number (stripping out the decimal point).

Note: my solution is specific to the problem I was facing. It’s not necessarily a full proof way to generate unique values, but it’s what you would call “good enough”. Do not use the solution unless your solution does not hinge on absolute unique values.

A warning about base_convert is that large numbers breaks down in PHP, so be careful (we’re talking very large numbers). This means pasting together the current timestamp, the user ID, the session ID, and a fourteen digit number into one 50 character long number will probably result in some precision errors (not a huge deal for most implementations, but be warned). From the PHP manual:

This is related to the fact that it is impossible to exactly express some fractions in decimal notation with a finite number of digits. For instance, 1/3 in decimal form becomes 0.3333333. . ..

 

So never trust floating number results to the last digit and never compare floating point numbers for equality.

The “base” refers to the numbering system used to convert the number. In a base 11 system, the counting goes to 10, then the letter A, and then loops around back to 1. So in other words, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, A, 1, 2… In base a base 16 system, you would go all the way to F before going back. So the larger the base, the more “compressed” a number can become.

I used base a 36 scheme (0 – 9, A – Z), but you can use smaller bases to come up with longer conversions. For example, a base 21 conversion (0 – 9, A – K) will convert 10,001 into 11e5, 10,002 into 11e6 and 100,000,000 into 13a3k7g. So in short, if you have a database where your record IDs start at above 7 or 8 digits, maybe you can think about base encoding them into shorter IDs.

Just a neat idea I wanted to share.

Filed under: PHP — Michi @ 4:41 pm

January 28, 2008

Debugging Tips for Database Abstraction

Today I want to talk about database script debugging in large systems. The main problem is that in large applications, it becomes difficult to find the source of rogue queries that, for example, broke in a recent system update.This may not readily apply to most of you, but bear with me: some day it will.

Pretend for a moment you have a database architecture where you have 2 masters (dual replication) and 2 read-only slaves. Now pretend that you have a large application with 100 different pages/scripts. You have 5 web servers with mirror copies of the application. This would be a fairly typical setup for a small, but growing company.

One day, you come into work and find out that you had a bad transaction lock that caused your system to hang all weekend. So you look at the process list and you know what query is causing the problem (because it’s still stuck). The problem is that it looks suspiciously like the queries you’d find on virtually every page in your application. How do you fix this problem? An different (but related) problem is when an update initially executed on one master database server replicated to a slave and got stuck on the slave but executed fine elsewhere. What happened? Which master server got the initial query? This sort of debugging is very difficult to track down without more information such as where the query was initially sent and from what page it originated.

The primary challenge is figuring out which query came from what page in your application. The solution is to add logging straight into your queries. The implemented looks something like this:

//Get the current page or script file
$source = $_SERVER['REQUEST_URI'] ? $_SERVER['REQUEST_URI'] : $_SERVER['SCRIPT_FILENAME'];
//Replace out any comment tags and add in the database being connected to
$metaData = str_replace(array('/*', '*/'), array('/ *', '* /'), $source) . " ($databaseHost)");
//Escape the query so the URI can't be used to inject data
$metaData = mysql_real_escape_string($metaData);
//Execute the query
$result = mysql_query("/* $metaData */ " . $query, $connection);

This solution inserts a comment into your query that gives you useful information that can be seen when looking at the raw query. MySQL uses C++ style comment blocks (the /* */) which are ignored by the parsing engine. This means you can pass data to the engine which can be useful for debugging. These comments are also replicated down to the slaves, which can be useful when you find a slave having problems with a query that came from a master server. For those of you unaware, the “URI” refers to the full URL that was typed in the address bar to access a page.

But make sure that you correctly sanitize the URI so that somebody can’t arbitrarily end your comment block (with a */) and inject their own nonsense into your query. Also, considering issues like multi-byte character attacks, I don’t even want to take the risk of not further escaping the data with a call to mysql_real_escape_string.

The solution we use at my work logs the web server IP, database server IP, and script path/URI. Other potential ideas are local timestamps, version information, user IDs, and session IDs.

In conclusion, this solution will help you identify the source (and sometimes the destination) of queries that are causing problems. This has been used in our production environment at work often when trying to determine what pages are producing extremely slow queries. This solution should work with any database, although my example is written for MySQL.

Happy debugging!

Filed under: MySQL, PHP — Michi @ 1:24 pm

January 23, 2008

PHP Best Practice: Don’t use INC extensions

I have been bad about updating, and this goes back to an old habit that probably has to do with human nature: as time between updates increases, there’s a desire to write a “big” update, which is increasingly difficult as news-worthy events happen and are ignored. There’s so many things for me to update about that I could touch on, such as the iPhone SDK update, news on IE8 passing the ACID2 test, my predictions from a year ago that were spot on (until about a week ago when all stocks tanked), and Sun buying MySQL. But I wont. Perhaps next time. So this post is small, but serves as a feeler post to help me get back into the routine. The truth is that I have several programming post drafts setting on my machine that could have been posted a long time ago if I had given them a final read-over. Those things take a lot longer than they look from the casual observer.

Today’s post is a best practices post. The tip is simple: When creating a naming convention, never rely on the .inc extension. The .inc is used in some shops to denote files that serve as libraries. This is a terrible practice for a number of reasons.

First, it means deploying your library ANYWHERE requires adding the extension to your server’s configurations so that it knows these files are for PHP executables. This isn’t a deal breaker in most cases, but beware that if you use shared hosting environments, this sort of thing can be annoying and stall development.

The second far more practical reason is for security. When these library files are moved to a new server which has yet to be configured, they are wide open for public viewing. Because the server doesn’t know they are PHP files, they are served up as text files, essentially exposing your code base for the world to see. I’ve seen this issue pop up in production environments where a new web server was brought online without being fully configured, causing pages to become exposed. This is the sort of business that helps cause source code leaks (remember the Facebook code leak late last year?).

Of course, this points to the greater issue that library files shouldn’t be web accessible, but I have also seen this paradigm used in common CMS applications where you have a .php file include a .inc file that contains the bulk of the page logic. Here, again, you would be exposing highly sensitive application logic to the world.

If you really want to denote files differently, I prefer to use file prefixes. As in, classes might get a prefix like “class.[rest-of-filename]“. Or perhaps “function.[rest-of-filename]“. There’s even “include.[rest-of-filename]“. The point is, a prefix can’t kill you because the files retain the .php extension. :) Happy coding!

Filed under: PHP — Michi @ 1:38 pm

December 10, 2007

BUG: Constructors, Interfaces, and Abstracts Don’t Mix Well

I just discovered a bug today in PHP 5.1 (haven’t confirmed if it was fixed in newer versions). When trying to enforce interface arguments on constructors, PHP behaves unexpectedly. Normally, interfaces allow you to enforce argument counts or types in child class methods, but not with the constructor (and probably destructor).

Crash course on interfaces: An interface lets you as a developer dictate a standard for a class. For example, you might write an interface class for interacting with your class. Then other people who want to interact with your class would “implement” your interface class. This would force their classes to have a certain set of methods, of which you dictate their names and argument counts (and types). This way, your class is always guaranteed these implementer classes have certain key methods. In the real life example, it’s like saying an interface for a Car would have methods like brake($amount), gas($amount), steer($direction), etc, and the User class would be able to have a guaranteed way of interacting with the Car object (i.e., $user->getCar(‘Ferrari’)->steer(‘left’)). Abstract methods exist in abstract classes and are essentially the same thing. Read more about these here and here.

First, here is an example of a typical interface:

class ExampleClass {}

interface TestInterface {
	public function output(ExampleClass $var);
}

class Test implements TestInterface {
	// error, no output() method was defined
}

The following fails too:

class ExampleClass {}

interface TestInterface {
	public function output(ExampleClass $var);
}

class Test implements TestInterface {
	public function output($var) {} // error, wrong argument type
}

Here is the same example but with the __construct method instead:

class ExampleClass {}

interface TestInterface {
	public function __construct(ExampleClass $var);
}

class Test implements TestInterface {
	// error, no __construct() method was defined
}

Up to here, it works as expected. However, if you define the constructor, the __construct method argument datatype/count checks go out the window:

class ExampleClass {}

interface TestInterface {
	public function __construct(ExampleClass $var);
}

class Test implements TestInterface {
	public function __construct() {} // NO ERROR
}

Despite the data types and argument count being off, PHP doesn’t care. Even if I define an argument in the constructor, the datatype check is ignored. So the best you can do is force a __construct() definition to be required, but you can’t dictate its arguments (i.e., interfaces for constructor methods are useless). And finally, for those of you really astute readers:

class ExampleClass {}

abstract class AbstractTest {
	abstract public function __construct(ExampleClass $var);
}

class Test extends AbstractTest {
	public function __construct() {} // NO ERROR
}

This problem produces the SAME results if instead of an interface, abstract methods in an abstract parent class are used.

Filed under: PHP — Michi @ 4:04 pm

November 9, 2007

Improving Your JavaScript Load Time

On our production website at work, I noticed that there was considerable lag time when loading the page due to a high number of JavaScript files. For those of you who don’t know, when a JavaScript file is loaded into a page, the rest of the page will hang until that file is completely downloaded. So unlike an image on a page, a slow JavaScript file can completely bog down your page. This is similar to an issue I noticed many months ago with FeedBurner. Each JavaScript file that is pulled requires the full overhead of firing up Apache and serving an HTTP request. This can be slower for you and painful for the web server if you have a high traffic website.

Additionally, once the JavaScript files load, if the code is full of asynchronous snippets (AJAX, event handlers, etc), the pieces can load in the wrong order! This has caused me headaches when unexplainable and random JavaScript errors began popping up (undefined variables and functions that are clearly defined in another file that should have loaded prior). This issue became increasingly common as the number of files being loaded increased. While I admit I don’t understand browser physiology enough to explain why this problem is more common with more files, I concluded there is some correlation that likely is attributed to the rendering order.

So after some thought, I came up with a solution. The goal was simple: decrease the number of web calls and try to make the JavaScript code render in 100% reliable and linear matter. Additionally, the hack would need to be easy to implement and take issues such as caching into account. The solution is a PHP file that looks like this:

/*
 * This file compiles a collection of JS files and then dumps them collectively
 * to the page, thereby reducing overall request overhead
 * Copyright 2007 Michi Kono (www.michikono.com)
 * Feel free to modify this however you want.
 */
header("content-type: application/x-javascript");
foreach(explode(",", $_GET["files"]) as $filename) {
   /*
    * prevent malicious attacks, only allow JS files
    */
   $filename = basename(trim($filename), ".js") . ".js";
   if($filename && file_exists($filename)) {
       $handle = fopen($filename, "r");
       fpassthru($handle);
       /*
        * in case there is no trailing ;
        */
       echo ";";
   }
}

Put this file in your JavaScript folder next to the rest of your JavaScript files. I called mine render.php.

Then, you put this in your HTML:

<script type="text/javascript" src="/javascript/render.php?files=firstfile.js, secondfile.js, thirdfile.js, etc.js"></script>

Ta-da! Faster JavaScript loading for everybody. :) Oh, and the issue of caching? Just put a timestamp on the end of the JavaScript URL string (like, "&<?php echo substr(time(), 0, -2) ?>" — cache changes every 100 seconds)

Other useful ideas: JS code could be compressed during this step, with comments and extra spaces being removed. Because code is being run through PHP, server side macros are now possible, rather than relying on cryptic JavaScript functions (such as for date management or database integration).

Filed under: Javascript, PHP — Michi @ 11:50 am

August 7, 2007

The Secret of SQL_CALC_FOUND_ROWS

Today, I wanted to go over a relatively simple MySQL feature that a lot of people don’t understand: SQL_CALC_FOUND_ROWS. To use this mystical key word, simply put it in your query right after the SELECT statement. For example:

SELECT * FROM USER WHERE id > 10 LIMIT 2,1 –see just second record

Becomes

SELECT SQL_CALC_FOUND_ROWS * FROM USER WHERE id > 10 LIMIT 2,1

This won’t change your results. It may, however, make your query run slower than when you select just one row the regular way. What this statement does is tell MySQL to find out just how many total records exist that match your criteria (in this case, where id is bigger than 10). For example, let’s assume that the user table has 100 records that have an id bigger than 10, then the query will take as long as it would have taken for the engine to find those 100 records.

The returned result will still be one the records you are expecting (in this case, the second record it found). But here is where the magic starts: If the very next query you run is a special select statement, you will have access to the total that was found. As in:

SELECT FOUND_ROWS(); –returns 100

The MySQL documentation on this subject says:

[SELECT FOUND_ROWS()] returns a number indicating how many rows the first SELECT would have returned had it been written without the LIMIT clause. In the absence of the SQL_CALC_FOUND_ROWS option in the most recent SELECT statement, FOUND_ROWS() returns the number of rows in the result set returned by that statement.

No matter what your LIMIT clause looks like (such as LIMIT 10, 1), this second query will still return the same number (in this example, 100). Why is this useful? Pagination. Often times, beginners (including me a few years ago) are stuck doing something like this:

SELECT count(*) FROM USER WHERE id > 10 –figure out how many total records there are
SELECT * FROM USER WHERE id > 10 LIMIT 50, 1 –get to record #50

People do this because you need the total to know if other matching results exist or what the last page number is.

This requires the engine to run the same query twice. This can be disastrous in cases where that query already takes a very long time to run. By including SQL_CALC_FOUND_ROWS, the overhead of running that count is grouped up with the process of actually retrieving the row of interest. So while the initial query might take a little longer to run than if you hadn’t tried to do a count, it is definitely faster than running the same query twice.

To take this to the next level, your pagination code should omit the use of SQL_CALC_FOUND_ROWS in subsequent page loads by caching the total count in the URL or session.

Happy hunting!

Filed under: MySQL — Michi @ 1:38 pm

August 3, 2007

A Great Web Developer is a Great Application Developer

After being a part of the developer hiring process for a while now, I have begun to see what distinguishes an exceptional web developer (the ones that get hired faster than we can even make an offer) and everybody else. Things have changed since 1995 – web development requires actual knowledge of programming!

The problem is that people think web development is somehow easier than other types of development. Nothing is inherently easier about developing an application in PHP than in C++. While I could accuse people who say this as being naive, the truth is that web developers themselves seem to hold this as the truth. In order to excel at being a web developer, one must understand how to develop an application. In other words, there are very different skills required in programming something like Facebook or Digg than for updating Mom’s store website.

First, you must actually understand object oriented programming (OOP). Everybody these days says they “know” OOP. I don’t mean “you read about it in a book” or that you can use a class when necessary; I mean being able to write extensible, modular libraries. This takes time and practice. When people first begin writing classes, they mistake the process for “grouping up functions under one class name in one file.” This is dead wrong and actually counter productive in some cases.

Classes are about automation and simplification. The classic analogy for OOP is a car because it has many parts that make it a whole. But just like the driver has no idea what a crank shaft, piston, or fuel injector does, people using your class should have no idea (nor need to learn why) about the inner workings of your class. All they need to know is that calling one of the few public methods (turn_ignition(), step_on_pedal(), steer()) does a whole lot of cool stuff behind the scenes that makes their life easier. And just like how you wouldn’t want the entire car engine welded together (instead of using bolts and screws), you shouldn’t write humongous class methods that do 1000 things — break it up into digestible parts so that other people can enhance just the pieces they want (see “overloading” below).

When you design a class, you start by listing out things that the class will be used for. Then, you figure out the lowest common denominators between these actions to remove “special, one time cases.” You now have a list of your primary public functions. Everything else in your class should pretty much remain protected or private (discretion is learned with experience). The entire point is that classes are not a group of similar functions, but rather a single conceptual unit.

And when you’re all done with understanding why object oriented programming exists, you must then learn to use Inheritance and Overloading.

I would link to a Wikipedia article on overloading, but the topic is poorly explained. [EDIT: This section applies more to PHP] In short, overloading (also known as polymorphism) is the practice of “overwriting” a function of a parent class in a child class. Let’s look at a metaphor where Inheritance and Overloading are used:

A Mammal would be a parent level concept (class). All Mammals sleep() and eat(). A Cow is a Mammal, so it inherits (extends) the Mammal concept (class) and has all attributes of a Mammal, including sleep() and eat(). But its eat() function is different because eat() does what all other mammals do, but also calls regurgitate(). A Bat is a Mammal, but its sleep() requires hanging upside down in addition to doing whatever all other Mammals do (such as recovering health). Overloading allows us to change a behavior (function) to suite the child’s specific behavior that is different. It is easiest to think of it as “overwriting.”

Second, you must understand and demonstrate competency with design patterns. At the very least, you should be aware of the Singleton and Factory patterns, which are very commonly used, and for (arguably) good reasons. Knowledge of basic design patterns goes a long way because a time will come when you may need to borrow from one of these concepts when developing your web application. It also means you have a wider exposure to the different types of architectures available to someone writing a library or application.

Last, you must understand scaling issues in growing web applications and how to resolve them. This is probably the rarest trait, but if you have it, you will shine like a rock star. It is important to realize that certain operations are much more costly than others. In today’s world, the CPU is rarely the bottleneck, but there are still bottlenecks. A common instance of this problem is when Word Press blogs go offline because it is slammed by a spike in traffic. That said, the most common type of bottleneck involves the database and often manifests in one of three ways:

  • Each query sent to the database involves some small amount of lag between the web server and the database server. Hundreds of queries on a page will slow down your page loads. On high traffic pages, one less query means thousands of queries a second. Make sure you learn how to do JOIN statements.
  • Each query must use an index in the database. Without this index, a query looking at a large table will bottleneck the hard drive (because it has to search large portions of the disk). Learn how to use the MySQL command: explain.
  • Database tables using MyISAM (instead of InnoDB) risk facing table locking issues. When you run a big or complex update on MyISAM, no other updates can occur on that table until the first one is done. If that table is very large, this may mean someone’s page sits there and loads for a long time (or times out). Typically, one should use a transactional database such as InnoDB.

Other types of common scaling issues are:

  • Session management on the database instead of letting PHP handle it. This is an issue because any large web application has more than one web server which means when a person hits a new page they might be loading from a different web server each time.
  • Managing files generated on the server – such as when a user uploads a file. The issue here is that if a file is created on one server (in a cluster of 100), how do the other servers know it exists? Obviously the solution is to somehow centralize this with a synchronization process or store all of the files on the database using blob fields. :)
  • How queries can start acting different (much slower) when your tables reach 1M+ rows without proper precautions. Larger tables change things because indexes sometimes stop getting used or slowness that wasn’t apparent before is now very noticeable. (A book could be written here…)

Merely reading and understanding these issues has put you ahead of a bunch of people out there. :)

In conclusion, you must re-think the “web developer” as an application developer that happens to work in a browser. I often hear the excuse that these types of advanced concepts can’t be readily applied in small-time applications (such as a simple store-front), but I believe this is simply not true. Virtually every site requires a database connection or a template system (for easy inclusion of header/footer files). Database abstraction and template systems are probably the single best place to use OOP concepts. It simply makes no sense that one would use the raw mysql_query() function in any website, even if it’s just Mom’s baby pictures. Putting these ideas in classes reduces security vulnerabilities and makes development much faster. If the notion of abstracting templates or databases is totally foreign to you, I suggest you start by looking at Smarty or EzSQL.

Happy studying. :)

Filed under: Jobs, PHP — Michi @ 12:36 pm
Next Page »