May 31, 2008

Getting Around Overwriting form.submit()

Since my dear reader Sameer requested it, I’m here making an update. I’ve got a cool JavaScript fix for everybody! I mentioned this in a post a long time ago, but JavaScript has a semi-unexpected “feature” where you can accidentally overwrite a form’s submit() function. As in:

<form id="myform">
<input name="submit" value="submit me" type="submit" />
</form>
<script>
document.getElementById('myform').submit(); // THIS FAILS - Object not a method
</script>

Apparently, by creating a form element called “submit” you overwrite the native function that exists on every form in JavaScript: form.submit now refers to the button instead of the method. Because the method is native, you can’t just willy-nilly redefine it. To make things worse, you cannot reliably re-assign submit() in a cross-browser manner, because some browsers disregard any attempt to reassign its value. As in:

<script>
document.getElementById('myform').submit = 'This gets ignored';
</script>

Fortunately, there is a fix. This fix requires modifying the actual DOM. Because this tends to be inconsistent across browsers, I’m doing this fix in MooTools (which is my JS library of choice). However, the fix is fairly straightforward and can easily be done with (or without) any JS framework, as you will see. The steps are:

  1. REMOVE the form element in question. This is required to make the solution cross-browser compatible; skipping it will cause quirks. The good news is that we can assume 99.99% of all form elements named “submit” are submit buttons that a designer named without knowing better. Luckily, these are almost NEVER needed in the server-side code and really just act as wallflowers.
  2. Check if step #1 completed successfully
  3. If it did not, create a new form element and copy its submit function over
  4. Submit

The code looks like this:

<script>
var formObject = document.getElementById('myform');
// Removes the node ($() extends the element so remove() also exists in IE)
$(formObject.submit).remove();
// Functions don't have tagName defined

if('undefined' == (typeof formObject.submit.tagName)) {
    // create a form and assign its submit function
    formObject.submit = new Element('form').submit;
}
formObject.submit();
</script>
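
For comparison, here is a rough framework-free sketch of the same idea. It assumes a browser that exposes HTMLFormElement.prototype (older versions of IE do not), and submitForm is a made-up helper name, not part of MooTools or the DOM. Instead of repairing form.submit, it borrows the native method from the prototype and calls it against the form:

```javascript
// Hypothetical helper. `nativeSubmit` is injectable only so the sketch can be
// exercised outside a browser; in a browser it defaults to the prototype method.
function submitForm(form, nativeSubmit) {
  var submit = nativeSubmit || HTMLFormElement.prototype.submit;
  // Call the native method with the form as `this`, ignoring whatever
  // form.submit currently points at (for example, a button named "submit").
  submit.call(form);
}
```

In a page this would be called as submitForm(document.getElementById('myform')), with the offending button left in place.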

Let me know if you encounter any problems.

Filed under: Javascript — Michi @ 1:54 am

February 17, 2008

Neat Idea: Creating Alphanumeric IDs

UPDATE: For those of you looking for a great way to generate a highly unique ID that is shorter than what you might get using a hex number, try this (it will generate a ~17 character ID):

list($hex, $dec) = explode('.', uniqid('', true));
$id = (base_convert($hex, 16, 36) . base_convert($dec, 10, 36));

Ever needed to create an ID hash that looked something like f39a2xm91? You might not have, but some day you’ll want to. The easy way out is to use the native md5() function, but that creates a long 32 character hash which may be a total waste of (database) space. These types of IDs are often used to mask integer IDs so that your users can’t just type in user_id=10000, user_id=10001, user_id=10002, and so forth to look at your records. Some might even call it security through obscurity. Well, let’s be clear: this sort of activity does not add security, but it does make “browsing” behavior more difficult.

Either way, if you desire to move away from the classic integer format IDs, I have a different solution for you:

base_convert($someId, 10, 36);

This will convert the number 10,001 into 7pt, 10,002 into 7pu, and 100,000,000 into 1njchs. As you can see, you can store a heck of a lot of numeric information in a very tiny amount of (character) space. I am not saying this will save you database storage space, but it will make your URLs shorter.
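
The same conversion can be tried in a browser console. This is only a sketch of the idea in JavaScript (the post’s own code is PHP), and encodeId and decodeId are made-up names for illustration:

```javascript
// Encode a non-negative integer ID as a base-36 string, and decode it again.
function encodeId(id) {
  return id.toString(36);
}

function decodeId(text) {
  return parseInt(text, 36);
}

console.log(encodeId(10001));     // 7pt
console.log(encodeId(100000000)); // 1njchs
console.log(decodeId('7pt'));     // 10001
```

Decoding matters as much as encoding: the short ID in the URL has to be turned back into the integer before it can be used to look up the record.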

One of the main benefits is that you can store more data in a smaller human-readable space, thereby allowing you to create smaller unique IDs. So for example, in our logging system at work, I use this method to generate issue IDs that end-users can send to us when they have a problem. This issue ID is based on the last eight digits of the current time with microseconds, with a random seven-digit number concatenated in front (using “.”). I then base-36 encode the resulting number (stripping out the decimal point).

Note: my solution is specific to the problem I was facing. It’s not necessarily a foolproof way to generate unique values, but it’s what you would call “good enough”. Do not use it if your application hinges on absolutely unique values.

A warning about base_convert is that it breaks down for large numbers in PHP, so be careful (we’re talking very large numbers). This means pasting together the current timestamp, the user ID, the session ID, and a fourteen-digit number into one 50 character long number will probably result in some precision errors (not a huge deal for most implementations, but be warned). From the PHP manual:

This is related to the fact that it is impossible to exactly express some fractions in decimal notation with a finite number of digits. For instance, 1/3 in decimal form becomes 0.3333333. . ..

 

So never trust floating number results to the last digit and never compare floating point numbers for equality.
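
The same kind of precision loss is easy to see in JavaScript, which also stores ordinary numbers as floating point. This is an analogy rather than a description of PHP’s internals, and the 17-digit value is an arbitrary example:

```javascript
// A 17-digit number is already past Number.MAX_SAFE_INTEGER (2^53 - 1).
var digits = '12345678901234567';

var viaNumber = Number(digits).toString(36); // value is rounded before encoding
var viaBigInt = BigInt(digits).toString(36); // arbitrary precision, exact

console.log(viaNumber === viaBigInt); // false
```

The two encodings disagree because the floating point version was rounded before it was ever converted, which is exactly the kind of silent error to watch for with very long numeric strings.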

The “base” refers to the number of distinct digits in the numbering system. In a base 11 system, the digits run from 0 to 9 and then the letter A before the count rolls over to two digits. In other words: 1, 2, 3, 4, 5, 6, 7, 8, 9, A, 10, 11… In a base 16 system, you would go all the way to F before rolling over. So the larger the base, the more “compressed” a number can become.

I used a base 36 scheme (0 - 9, A - Z), but you can use smaller bases to come up with longer conversions. For example, a base 21 conversion (0 - 9, A - K) will convert 10,001 into 11e5, 10,002 into 11e6 and 100,000,000 into 13a3k7g. So in short, if you have a database where your record IDs run to 7 or 8 digits or more, maybe you can think about base encoding them into shorter IDs.

Just a neat idea I wanted to share.

Filed under: PHP — Michi @ 4:41 pm

February 3, 2008

My Thoughts on Microsoft Buying Yahoo for $44.6B

The big news of Friday morning was that Microsoft offered Yahoo $44.6 billion for the company. On a financial level, this is a sweet deal for Yahoo. It’s not the most financially sound offer Microsoft has made, which is why its stock dipped 6% on Friday. Yahoo has not replied yet, but I can definitely see them taking this offer seriously. My thoughts are summed up in three bullet points:

  • Yahoo’s management will possibly accept the offer since it is so lucrative.
  • The purchase will piss off some of Yahoo’s top talent and cause them to defect, quite possibly to Google.
  • The purchase will help Google gain a greater lead during one of the most crucial eras since the Internet began: the rise of mobile computing.

The internal culture of Yahoo is not exactly friendly to Microsoft. Yahoo is seen as an ally to the open source community while Microsoft is exactly the opposite. Yahoo is a major contributor to open source (e.g., PHP’s lead developer is on Yahoo’s payroll), has an open philosophy which has shown itself in their JS frameworks, Flickr, Pipes, and various other projects, and is a major user of and contributor to the open source stack in general. Microsoft is clearly not on the same page.

I’ve read speculation that the looming recession will cause developers to stick around despite a takeover by a boss they don’t like. However, my belief is that great developers aren’t scared to leave since they are in high demand no matter what is going on in the economy. Some of the very best and brightest at Yahoo will leave. Any sort of exodus of major talent would destroy the current internal direction. Worse, some of these great minds would likely go knocking on Google’s doors, which is straight up ironic considering Microsoft’s intentions. What remains are gutted, possibly begrudging or de-motivated teams, which is a recipe for not producing innovation.

Which leads to my final point: Microsoft’s goal is to beat Google by merging with Yahoo’s resources. It is my belief that this move could ultimately prove counterproductive. The integration process of merging departments, axing un-needed employees, changing internal processes, shifting internal priorities, introducing new management, and replacing fleeing key talent will cause major stalls over at Yahoo, to Google’s benefit. Microsoft is no stranger to mergers and acquisitions, but Yahoo would be a major, major purchase with a sizeable employee count. Microsoft will have its hands full for months.

All this is going to happen during a period I consider to be a key moment in the rise of mobile computing. A large chunk of search traffic will begin to come from mobile browsers, and the web will shift to the mobile platform. During such a crucial stage of computing, this sort of disruptive purchase may help Microsoft and Yahoo miss the bus.

So while I wouldn’t be surprised if the floundering leadership at Yahoo took the offer, I also expect this to work out as the most counterproductive and costly purchase in Microsoft’s history.

Filed under: Business — Michi @ 5:58 pm

January 28, 2008

Debugging Tips for Database Abstraction

Today I want to talk about database script debugging in large systems. The main problem is that in large applications, it becomes difficult to find the source of rogue queries that, for example, broke in a recent system update. This may not readily apply to most of you, but bear with me: some day it will.

Pretend for a moment you have a database architecture where you have 2 masters (dual replication) and 2 read-only slaves. Now pretend that you have a large application with 100 different pages/scripts. You have 5 web servers with mirror copies of the application. This would be a fairly typical setup for a small, but growing company.

One day, you come into work and find out that you had a bad transaction lock that caused your system to hang all weekend. So you look at the process list and you know what query is causing the problem (because it’s still stuck). The problem is that it looks suspiciously like the queries you’d find on virtually every page in your application. How do you fix this problem? A different (but related) problem is when an update initially executed on one master database server replicated to a slave and got stuck on the slave but executed fine elsewhere. What happened? Which master server got the initial query? This sort of problem is very difficult to track down without more information, such as where the query was initially sent and from what page it originated.

The primary challenge is figuring out which query came from what page in your application. The solution is to add logging straight into your queries. The implementation looks something like this:

//Get the current page or script file
$source = !empty($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : $_SERVER['SCRIPT_FILENAME'];
//Break up any comment delimiters and add in the database being connected to
$metaData = str_replace(array('/*', '*/'), array('/ *', '* /'), $source) . " ($databaseHost)";
//Escape the metadata so the URI can't be used to inject data
$metaData = mysql_real_escape_string($metaData, $connection);
//Execute the query with the metadata prepended as a comment
$result = mysql_query("/* $metaData */ " . $query, $connection);

This solution inserts a comment into your query that gives you useful information that can be seen when looking at the raw query. MySQL supports C-style comment blocks (the /* */), which are ignored by the parsing engine. This means you can pass data to the engine which can be useful for debugging. These comments are also replicated down to the slaves, which can be useful when you find a slave having problems with a query that came from a master server. For those of you unaware, the “URI” here refers to the path and query string of the address that was requested (everything after the host name).

But make sure that you correctly sanitize the URI so that somebody can’t arbitrarily end your comment block (with a */) and inject their own nonsense into your query. Also, considering issues like multi-byte character attacks, I don’t even want to take the risk of not further escaping the data with a call to mysql_real_escape_string.
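
To make the sanitizing step concrete, here is a small sketch of just the delimiter handling, written in JavaScript so it can be tried in a console. tagQuery is a made-up helper name, and the sketch deliberately leaves out the additional escaping that the PHP version performs:

```javascript
// Hypothetical helper: prepend "source (host)" to a query as a SQL comment.
function tagQuery(query, source, databaseHost) {
  var meta = (source + ' (' + databaseHost + ')')
    .replace(/\/\*/g, '/ *')  // break up any comment opener
    .replace(/\*\//g, '* /'); // break up any comment closer
  return '/* ' + meta + ' */ ' + query;
}

console.log(tagQuery('SELECT 1', '/page.php?x=*/ DROP TABLE users; /*', 'db1'));
// /* /page.php?x=* / DROP TABLE users; / * (db1) */ SELECT 1
```

Whatever the visitor puts in the address bar, the only comment opener and closer left in the final string are the ones the helper wrote itself.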

The solution we use at my work logs the web server IP, database server IP, and script path/URI. Other potential ideas are local timestamps, version information, user IDs, and session IDs.

In conclusion, this solution will help you identify the source (and sometimes the destination) of queries that are causing problems. We often use it in our production environment at work when trying to determine which pages are producing extremely slow queries. This solution should work with any database that supports comments in queries, although my example is written for MySQL.

Happy debugging!

Filed under: MySQL, PHP — Michi @ 1:24 pm

January 26, 2008

The Wonders of Makeup (non-geeky post)

This is totally un-techy, but I came across a very interesting post about putting on makeup.

To summarize the article… Take a look at the “after” picture.

Now look at the “before” picture. I still can’t believe it’s the same person.

Being a guy, I was never consciously aware that makeup could change someone’s looks so dramatically (aside from the professional jobs on movie sets). Amazing!

Filed under: Off-beat — Michi @ 11:17 pm

January 23, 2008

PHP Best Practice: Don’t use INC extensions

I have been bad about updating, and this goes back to an old habit that probably has to do with human nature: as time between updates increases, there’s a desire to write a “big” update, which is increasingly difficult as news-worthy events happen and are ignored. There are so many things I could write about, such as the iPhone SDK update, news on IE8 passing the ACID2 test, my predictions from a year ago that were spot on (until about a week ago when all stocks tanked), and Sun buying MySQL. But I won’t. Perhaps next time. So this post is small, but serves as a feeler post to help me get back into the routine. The truth is that I have several programming post drafts sitting on my machine that could have been posted a long time ago if I had given them a final read-over. Those things take a lot longer than they appear to the casual observer.

Today’s post is a best practices post. The tip is simple: When creating a naming convention, never rely on the .inc extension. The .inc is used in some shops to denote files that serve as libraries. This is a terrible practice for a number of reasons.

First, it means deploying your library ANYWHERE requires adding the extension to your server’s configuration so that it knows these files are PHP scripts. This isn’t a deal breaker in most cases, but beware that if you use shared hosting environments, this sort of thing can be annoying and stall development.

The second, far more serious reason is security. When these library files are moved to a new server which has yet to be configured, they are wide open for public viewing. Because the server doesn’t know they are PHP files, they are served up as text files, essentially exposing your code base for the world to see. I’ve seen this issue pop up in production environments where a new web server was brought online without being fully configured, causing the source of pages to become exposed. This is the sort of thing that helps cause source code leaks (remember the Facebook code leak late last year?).

Of course, this points to the greater issue that library files shouldn’t be web accessible, but I have also seen this paradigm used in common CMS applications where you have a .php file include a .inc file that contains the bulk of the page logic. Here, again, you would be exposing highly sensitive application logic to the world.

If you really want to denote files differently, I prefer to use file prefixes. As in, classes might get a prefix like “class.[rest-of-filename]”. Or perhaps “function.[rest-of-filename]”. There’s even “include.[rest-of-filename]”. The point is, a prefix can’t kill you because the files retain the .php extension. :) Happy coding!

Filed under: PHP — Michi @ 1:38 pm

December 10, 2007

BUG: Constructors, Interfaces, and Abstracts Don’t Mix Well

I just discovered a bug today in PHP 5.1 (haven’t confirmed if it was fixed in newer versions). When trying to enforce interface arguments on constructors, PHP behaves unexpectedly. Normally, interfaces allow you to enforce argument counts or types in child class methods, but not with the constructor (and probably destructor).

Crash course on interfaces: An interface lets you as a developer dictate a standard for a class. For example, you might write an interface for interacting with your class. Then other people who want to interact with your class would “implement” your interface. This would force their classes to have a certain set of methods, of which you dictate the names and argument counts (and types). This way, your class is always guaranteed that these implementer classes have certain key methods. As a real-life example, an interface for a Car would have methods like brake($amount), gas($amount), steer($direction), etc., and the User class would have a guaranteed way of interacting with the Car object (i.e., $user->getCar('Ferrari')->steer('left')). Abstract methods exist in abstract classes and are essentially the same thing. Read more about these here and here.

First, here is an example of a typical interface:

class ExampleClass {}

interface TestInterface {
	public function output(ExampleClass $var);
}

class Test implements TestInterface {
	// error, no output() method was defined
}

The following fails too:

class ExampleClass {}

interface TestInterface {
	public function output(ExampleClass $var);
}

class Test implements TestInterface {
	public function output($var) {} // error, wrong argument type
}

Here is the same example but with the __construct method instead:

class ExampleClass {}

interface TestInterface {
	public function __construct(ExampleClass $var);
}

class Test implements TestInterface {
	// error, no __construct() method was defined
}

Up to here, it works as expected. However, if you define the constructor, the __construct method argument datatype/count checks go out the window:

class ExampleClass {}

interface TestInterface {
	public function __construct(ExampleClass $var);
}

class Test implements TestInterface {
	public function __construct() {} // NO ERROR
}

Despite the data types and argument count being off, PHP doesn’t care. Even if I define an argument in the constructor, the datatype check is ignored. So the best you can do is force a __construct() definition to be required, but you can’t dictate its arguments (i.e., interfaces for constructor methods are useless). And finally, for those of you really astute readers:

class ExampleClass {}

abstract class AbstractTest {
	abstract public function __construct(ExampleClass $var);
}

class Test extends AbstractTest {
	public function __construct() {} // NO ERROR
}

This problem produces the SAME results if instead of an interface, abstract methods in an abstract parent class are used.

Filed under: PHP — Michi @ 4:04 pm

December 6, 2007

Google Chart API Released

Google just released a Chart API. It lets you link to a dynamically generated image, which makes it easy to embed graphs and charts in a page. The API is amazingly robust. It supports all sorts of charts. It lets you make an image like the one below using a simple URL (see the image URL for an example):

Why is this better than other hosted solutions? For 99% of webmasters out there, Google’s up-time will beat the pants off their own. There’s really little to no question about the availability of their solution. And if it’s really an issue, these images could easily be cached by your application after generating them. The biggest draw, of course, is that unlike other solutions, this one doesn’t use proprietary formats (Flash), doesn’t introduce security vulnerabilities (from installing some foreign server-side package), and doesn’t add CPU or memory overhead to your application.
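
To give a feel for what “a simple URL” means here, this is a rough sketch that assembles one in JavaScript. chartUrl is a made-up helper name; the parameters it fills in (cht for chart type, chs for size, chd for data, chl for labels) come from the Chart API documentation, and the values are arbitrary examples:

```javascript
// Hypothetical helper that assembles a Chart API image URL.
// Labels are assumed to be plain words that need no URL encoding.
function chartUrl(type, size, values, labels) {
  return 'http://chart.apis.google.com/chart' +
    '?cht=' + type +               // chart type, e.g. p3 for a 3D pie
    '&chs=' + size +               // image size as WIDTHxHEIGHT in pixels
    '&chd=t:' + values.join(',') + // data in the plain-text encoding
    '&chl=' + labels.join('|');    // one label per value
}

console.log(chartUrl('p3', '250x100', [60, 40], ['Hello', 'World']));
// http://chart.apis.google.com/chart?cht=p3&chs=250x100&chd=t:60,40&chl=Hello|World
```

The resulting string goes straight into the src attribute of an img tag, which is all the integration there is.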

Charts like these don’t use Flash or JavaScript, which means they work in mobile browsers and RSS readers. Since that’s where things are going now, this is Google’s way of getting a small but important piece of the web off the proprietary Flash format. This is especially important given the iPhone’s recent arrival and its lack of Flash support.

“To control and organize the world’s information.” This project is certainly a reflection of their motto.

Filed under: News — Michi @ 11:54 pm

November 29, 2007

Google Maps Adds Terrain View

I just noticed this today. Google Maps now has a Terrain View. This mode makes the map extremely easy to read and highlights general terrain features. See the image.

Terrain mode

Filed under: News — Michi @ 10:44 am

November 26, 2007

Falling Dollar Kills Off-shoring

For anybody following economic news, the US dollar is in shambles. Initially, I thought this would bring about a depression in the IT sector and possibly a bubble collapse. Don’t get me wrong: I believe there will be a softening in our industry due to things like the crashing housing market (think of all the Adwords purchases that will evaporate). However, another unintended consequence is that off-shoring (the practice of hiring foreign programmers), which was all the rage 2 years ago, is becoming impractical.

As evidence, check out the exchange rate between the Dollar and Rupee: it’s down 15%, with most of that drop coming in just the past few months. We can sit here and argue if the drop will continue, but we can probably agree that the dollar isn’t going to rise back up anytime soon.

Dollar vs Rupee

This sort of exchange rate suicide means off-shoring practices are now 15% more expensive across the board compared to 24 months ago. Any company doing this has more and more reason to boot the practice, especially considering the added cost of managing remote, non-native English speaking employees. Considering things like term contracts and the threat of a further falling dollar, the practice actually becomes quite dangerous. Just think: you sign a $3M two year off-shoring agreement only to have the dollar drop 10% next year — suddenly you owe an extra $300K.

So aside from all the arguments that exist about money flowing from the housing market back into IT in the form of capital investment (rather than through ad purchases), we can also safely assume that in-house IT operations themselves will become more valuable in 2008. :)

Filed under: Politics, Predictions — Michi @ 12:09 pm