  #4  
Zero Signal
Senior Member
 
Join Date: Feb 2003
Location: /dev/null
Jul 13th, 2003, 09:26 PM
This is really interesting, too.

http://www.google-watch.org/broken.html



Quote:
Let's speculate. Most of Google's core software was written in 1998-2000. It was written in C and C++ to run under Linux. As of July 2000, Google was claiming one billion web pages indexed. By November 2002, they were claiming 3 billion. At this rate of increase, they would now be at 3.5 billion, even though the count hasn't changed on their home page since November. If you search for the word "the" you get a count of 3.76 billion. It's unclear what role other languages would have, if any, in producing this count. Perhaps each language has its own lexicon and its own web page IDs. But any way you cut it, we're approaching 4 billion very soon, at least for English. With some numbers presumably set aside for the freshbot, it would appear that they are running out of available web page IDs.

If you use an ID number to identify each new page on the web, there is a problem once you get to 4.2 billion. Numbers higher than that require more processing power and different coding. Our speculation makes three major assumptions: a) Google uses standard functions for the C language in their core programming; b) when Google's programs were first developed four or more years ago, a unique ID was required for every web page; and c) it seemed reasonable and efficient at that time to use an unsigned long integer in ANSI C. In Linux, this variable is four bytes long, and has a maximum of 4.2 billion before it rolls over to zero. The next step up in numeric variables under Linux requires different standard functions in ANSI C, and more CPU cycles for processing. When the core programs were developed for Google several years ago, it's reasonable to assume that the 4.2 billion upper limit was not seen as a potential problem.
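
For anyone who wants to see the rollover the quote is talking about, here is a minimal sketch in C. It is only an illustration of 32-bit unsigned wraparound, not anything known about Google's actual code. The names page_id_t, wide_page_id_t, next_page_id, and next_wide_page_id are made up for this example. It uses uint32_t rather than unsigned long because unsigned long is only guaranteed to be at least 32 bits, and on 64-bit Linux it is 8 bytes, so the fixed-width type shows the four-byte case the article assumes.

```c
#include <stdint.h>

/* Hypothetical page-ID type: exactly 32 bits, as the article assumes. */
typedef uint32_t page_id_t;

/* Hypothetical wider ID type, the "next step up" in size. */
typedef uint64_t wide_page_id_t;

/*
 * Return the ID that follows `current`.
 * Unsigned arithmetic is defined to wrap modulo 2^32, so the ID after
 * UINT32_MAX (4,294,967,295) is 0 rather than 4,294,967,296.
 */
page_id_t next_page_id(page_id_t current)
{
    return (page_id_t)(current + 1u);
}

/*
 * Same operation on a 64-bit ID: it keeps counting past 2^32 - 1
 * instead of rolling over to zero.
 */
wide_page_id_t next_wide_page_id(wide_page_id_t current)
{
    return current + 1u;
}
```

Calling next_page_id(UINT32_MAX) gives 0, which is the "rolls over to zero" case, while next_wide_page_id on the same value gives 4,294,967,296. The "4.2 billion" in the quote is this 2^32 limit rounded down.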
__________________
I-Mockery Forums: Turn-based stupidity in a real-time world