


The Critical Anatomy of Google


A press release was issued on September 2, 2004, and within a couple of months Google had addressed the issue it raised. The exact reason that triggered the update of Google’s database is not apparent. The index grew by a staggering 3,772,844,877 pages, and that certainly raised a few eyebrows, not only among Web marketing companies but also among people at Google.

Because the press release brought to light the number of web pages Google had indexed over time, it made many things clearer.

According to it, between August 4, 2003 and August 25, 2003 (just 21 days), Google added a little over 1.2 billion web pages to its index. Over the following year, however, right up to the press release, Google did not add a single web page to its index (at least according to Google’s own figures).

This is apparent from the table below:

Date | No. of pages indexed
Nov 10, 2004 (2 months after the press release) | 8,058,044,651
Sep 02, 2004 (at the time of the press release) | 4,285,199,774
Aug 25, 2003 (1 year before the press release) | 4,285,199,774
Aug 04, 2003 (21 days earlier) | 3,083,324,652
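The growth figures quoted in the text can be checked directly against the table's four snapshots. This is a small Python sketch. The counts and dates are copied from the table, and the helper name `growth` is only illustrative.

```python
from datetime import date

# Snapshots copied from the table: (date, pages indexed), oldest first.
SNAPSHOTS = [
    (date(2003, 8, 4), 3_083_324_652),
    (date(2003, 8, 25), 4_285_199_774),
    (date(2004, 9, 2), 4_285_199_774),
    (date(2004, 11, 10), 8_058_044_651),
]


def growth(snapshots):
    """Yield (start, end, days_between, pages_added) for consecutive snapshots."""
    for (d0, n0), (d1, n1) in zip(snapshots, snapshots[1:]):
        yield d0, d1, (d1 - d0).days, n1 - n0


for start, end, days, added in growth(SNAPSHOTS):
    print(f"{start} -> {end}: {days:>3} days, {added:,} pages added")
```

The three intervals come out as 21 days with 1,201,875,122 pages added, 374 days with no change at all, and 69 days with 3,772,844,877 pages added.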

So what does this mean? Either Google has been lying to us all, or it has been dropping as many pages as it has been adding.

Our guess is that by August 25, 2003, Google’s index was full. Why do we say this? Because Google’s early white papers were freely available to anyone. You could read the actual documents the Google founders published before Google went public and get a glimpse of how Google was built. According to these documents, Google was written in C and C++ (ANSI C) and ran on Linux. The database was built around a Document_ID associated with each web page, and the Document_ID was published as a 4-byte unsigned long integer. This means that every single web page in Google’s index was given an ID to identify it. But like everything, there is a limit. A 4-byte unsigned integer can take 2^32 = 4,294,967,296 distinct values, so its maximum value is 4,294,967,295 (2^32 - 1).
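To make the limit concrete, here is a minimal Python sketch. It assumes, as the article states, that a Document_ID is a 4-byte unsigned integer. It uses the standard `struct` module's `"<I"` format (standard-size, 4-byte unsigned) to show exactly which values fit, and measures how close the reported page count sits to that ceiling. The names `DOCID_BITS` and `fits_in_docid` are illustrative, not taken from any Google document.

```python
import struct

# Assumption from the article: a DocID is a 4-byte unsigned integer.
DOCID_BITS = 32
DISTINCT_IDS = 2 ** DOCID_BITS   # how many different IDs exist
MAX_DOCID = DISTINCT_IDS - 1     # largest ID that fits (IDs start at 0)

REPORTED_PAGES = 4_285_199_774   # count Google reported, from the table


def fits_in_docid(n: int) -> bool:
    """Return True if n can be stored as a standard-size 4-byte unsigned int."""
    try:
        struct.pack("<I", n)
        return True
    except struct.error:         # raised for values outside 0..2**32 - 1
        return False


headroom = DISTINCT_IDS - REPORTED_PAGES       # IDs still unused
used_fraction = REPORTED_PAGES / DISTINCT_IDS  # share of the ID space in use
```

With these numbers, `headroom` is 9,767,522 and `used_fraction` is about 0.9977. In other words, the reported count uses roughly 99.77% of the 32-bit ID space.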

Does this number strike you as remarkably close to the second and third entries of the table above? I am sure it does: the reported 4,285,199,774 pages fall less than 10 million short of the 2^32 limit. So if no changes have been made to the database structure, Google has probably reached this threshold, and as new pages are added, old pages are removed. Quite alarming, isn’t it?
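The "as new pages are added, old pages are removed" scenario can be modelled as an index with a fixed pool of IDs. The Python sketch below is purely hypothetical. It assumes the oldest page is evicted first (FIFO) and its ID is recycled, and nothing in the article says which pages Google would actually drop. `capacity` stands in for 2^32 so the demo stays small.

```python
from __future__ import annotations

from collections import OrderedDict


class FixedCapacityIndex:
    """Hypothetical model of an index whose DocID space is exhausted.

    Assumption for illustration only: once every DocID is in use, the oldest
    page is evicted (first in, first out) and its DocID is reused for the new
    page. `capacity` stands in for 2**32; a tiny value keeps the demo readable.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.next_id = 0                # next never-used DocID
        self.url_to_id = OrderedDict()  # insertion order doubles as page age

    def add(self, url: str) -> tuple[int, str | None]:
        """Index url; return (doc_id, evicted_url), or (doc_id, None) if nothing was dropped."""
        if url in self.url_to_id:
            return self.url_to_id[url], None
        evicted = None
        if self.next_id < self.capacity:
            doc_id = self.next_id
            self.next_id += 1
        else:
            # No free IDs left: drop the oldest page and recycle its DocID.
            evicted, doc_id = self.url_to_id.popitem(last=False)
        self.url_to_id[url] = doc_id
        return doc_id, evicted

    def __len__(self) -> int:
        return len(self.url_to_id)
```

With `capacity=3`, adding four URLs gives the fourth URL DocID 0 and evicts the first URL. The index size stays at 3 no matter how many pages are crawled.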

This may also be one of the reasons pages appear to be dropping from Google’s index at an alarming rate (I can point to tens of thousands of search results where this is happening). Google may already have run out of space. In that case a Document_ID would no longer be associated with the content stored in the database, and the search would return empty results for a particular URL.
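The failure mode described here is a dangling reference. A URL still points at a Document_ID, but the content behind that ID is gone. The Python sketch below assumes a hypothetical two-table layout (URL to DocID, DocID to text). The table names and sample data are invented for illustration.

```python
# Hypothetical layout for illustration: one table maps URLs to DocIDs, and a
# separate store maps DocIDs to the page text shown in results.
url_to_docid = {"example.com/a": 7, "example.com/b": 8}
content_store = {7: "Widgets for sale", 8: "Gadget reviews"}


def drop_content(doc_id: int) -> None:
    """Discard a page's stored text without touching the URL table."""
    content_store.pop(doc_id, None)


def result_text(url: str) -> str:
    """Return the text for url's result, or '' when its DocID has no content."""
    doc_id = url_to_docid.get(url)
    if doc_id is None:
        return ""                         # URL is not indexed at all
    return content_store.get(doc_id, "")  # DocID still listed, content gone -> ''
```

After `drop_content(7)`, the URL `example.com/a` is still listed in `url_to_docid`, but `result_text("example.com/a")` returns an empty string. That is the "empty page in the results" symptom.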

Can this problem be corrected? Certainly, but Google has 15,000+ Linux servers and 4.2 billion Document_IDs to convert. This will not be an easy task at this point, and it would add to Google’s expenses. Moreover, every entry for every word in Google’s inverted index refers to a Document_ID, so the conversion will probably take months, if not a great deal longer.
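The reason the conversion reaches into the inverted index is that every posting list stores DocIDs. Widening the ID type means decoding and rewriting every list. Here is a minimal Python sketch, assuming uncompressed posting lists packed with `struct` (`"I"` = 4 bytes, `"Q"` = 8 bytes, standard sizes). Real indexes typically compress postings, so the exact size change there would differ.

```python
from __future__ import annotations

import struct


def encode(postings: list[int], fmt: str) -> bytes:
    """Pack DocIDs with a standard-size format: 'I' = 4 bytes, 'Q' = 8 bytes."""
    return struct.pack(f"<{len(postings)}{fmt}", *postings)


def decode(blob: bytes, fmt: str) -> list[int]:
    """Unpack a posting list previously packed with encode(..., fmt)."""
    count = len(blob) // struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", blob))


def widen_index(index32: dict[str, bytes]) -> dict[str, bytes]:
    """Rewrite every posting list from 32-bit to 64-bit DocIDs.

    Each word's list has to be decoded and re-encoded, so widening the DocID
    touches the whole inverted index, not just the table of documents.
    """
    return {word: encode(decode(blob, "I"), "Q") for word, blob in index32.items()}
```

Take a toy index of `{"google": [3, 17, 42], "search": [17, 99]}`. It occupies 20 bytes as 32-bit IDs and 40 bytes after `widen_index`, and every one of its entries had to be rewritten.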

Because Google returns currently “popular” pages at the top of its search results, one could argue that Google unfairly penalizes newly created pages that are not yet “popular.” While this statement may be an exaggeration, it does contain an alarming bit of truth.

While Google takes more than 100 different factors into account in determining the final ranking of a web page, the core of its ranking algorithm is a metric called PageRank, which is essentially a “link popularity” metric. It is important to understand the distinction between the “importance” or “quality” of a web page and its “popularity.”
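To see why PageRank measures link popularity rather than quality, consider a textbook power-iteration PageRank in Python. This is the commonly published normalized form with damping `d = 0.85`, not Google's production ranking. Dangling pages spreading their score evenly is one common convention and is assumed here.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], d: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Power-iteration PageRank, normalized so the scores sum to 1.

    links maps each page to the pages it links to. A page with no outlinks
    spreads its score evenly over all pages (one common convention).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page starts from the same baseline, whatever its content.
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for src, targets in links.items():
            for t in targets:
                new[t] += d * rank[src] / len(targets)
        rank = new
    return rank
```

In a five-page graph where four pages link to `"popular"`, a brand-new page with no inbound links ends up with only the baseline `(1 - d) / n = 0.03`, however good its content. The page's text never enters the calculation.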

Since popular pages are repeatedly returned by Google as top results, they are also the easiest for users to discover, which increases their popularity even further.
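This feedback loop can be shown with a deliberately simple toy simulation in Python. The per-position link gains are made-up numbers standing in for "results shown higher get seen, and linked to, more often". They are assumptions, not measurements of Google traffic.

```python
from __future__ import annotations


def simulate(links: dict[str, int], position_gain: tuple[int, ...], rounds: int) -> dict[str, int]:
    """Toy feedback loop: rank pages by inbound links, then reward visibility.

    Each round the pages are sorted by link count (ties broken by name) and the
    page at position i gains position_gain[i] new links. The gains are made-up
    numbers standing in for "higher results get seen, and linked, more often".
    """
    links = dict(links)  # do not mutate the caller's dict
    for _ in range(rounds):
        ranking = sorted(links, key=lambda page: (-links[page], page))
        for page, gain in zip(ranking, position_gain):
            links[page] += gain
    return links
```

Start with `{"old": 10, "mid": 9, "new": 8}` and gains `(6, 3, 1)`. After 100 rounds the counts are 610, 309 and 108. The newcomer began only two links behind and never catches up, because ranking and popularity reinforce each other.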
As is evident from many sources, 98% of Google’s revenue comes from its advertisers, mostly through AdWords and AdSense. All it would take is for a firewall vendor, an antivirus company, AOL or Microsoft to create a Google ad blocker, and it would be the end of Google overnight. These companies, as well as Google itself, already provide pop-up and pop-under blockers, and writing a Google ad blocker would be even simpler.
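To show how little code the core of such a blocker needs, here is a host-based URL filter in Python. The blocklist is illustrative only: `pagead2.googlesyndication.com` and `googleadservices.com` are Google ad-serving hosts, but this is not a complete or current list. A real blocker would also need to hook into the browser or proxy, which is omitted here.

```python
from urllib.parse import urlsplit

# Illustrative blocklist (an assumption, not a complete or current list).
BLOCKED_HOSTS = {"pagead2.googlesyndication.com", "googleadservices.com"}


def is_blocked(url: str) -> bool:
    """True if the URL's host is a blocked host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS)
```

The suffix check requires a leading dot. That way `www.googleadservices.com` is blocked, but an unrelated host such as `notgoogleadservices.com` is not.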

Google was built on, and still uses, cheap Linux desktop machines (about 15,000 of them), running open-source software written in C and C++ as well as Python. These were, and most likely still are, 32-bit CPU machines. In effect, a DocID stored in a single 32-bit unsigned integer gives you 32 bits of data to play with, and every document has its own unique DocID. Unfortunately, such an integer cannot represent fractions or numbers greater than 4,294,967,295 (2^32 - 1).
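What happens at the boundary matters. In C, an unsigned 32-bit value that goes past 4,294,967,295 silently wraps around to 0 instead of failing. The Python sketch below uses `ctypes.c_uint32`, which does no overflow checking, to mimic a hypothetical DocID counter kept in a 32-bit field. A 64-bit field (`c_uint64`) does not wrap at 2^32.

```python
import ctypes


def store_uint32(n: int) -> int:
    """Value that survives storing n in a 32-bit unsigned C integer.

    ctypes does no overflow checking, so this wraps modulo 2**32 like C does.
    """
    return ctypes.c_uint32(n).value


def next_docid(current: int) -> int:
    """Hypothetical DocID counter kept in a 32-bit field."""
    return store_uint32(current + 1)
```

`next_docid(4_294_967_295)` returns 0. Under this assumption, the 4,294,967,297th page would silently reuse the first page's ID rather than triggering an error.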

Just search Google for almost anything, and I am sure you’ll find empty pages in the results.

Google keeps incredible numbers of pointless pages that were created only to spam it and, probably, to generate some click-through business (including AdSense), while content-rich, well-focused pages sometimes disappear. If Google’s capacity to identify pages had an upper limit, why would it keep so many duplicates of the same content on different domains in its index?

Beyond content duplication, Google is the only engine that can afford to display aliases (http://domain and http://www.domain) for sites that serve content at both addresses. Would a search engine near the limit of its index capacity accumulate pages that no longer exist, broken links, different versions of the same URL, and the like? Would it drop pages with hundreds or even thousands of inbound links while keeping tons of pages from totally unpopular sites?
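An index that was short of IDs could avoid spending two of them on the same page by canonicalizing URLs before assigning DocIDs. Here is a minimal Python sketch using `urllib.parse`. The rules are assumptions chosen for illustration, not Google's actual policy: lowercase the host, strip a leading `www.`, drop default ports and fragments, and treat an empty path as `/`. Stripping `www.` is not always safe, since a few sites serve different content on the two hostnames.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_key(url: str) -> str:
    """Collapse common aliases of the same page into one index key.

    Illustrative rules (assumptions, not Google's): lowercase the host, drop a
    leading 'www.', drop default ports, treat an empty path as '/', and ignore
    the fragment.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

Under these rules, `http://domain.com`, `http://www.domain.com/` and `HTTP://WWW.Domain.com:80/#top` all map to `http://domain.com/`. They would consume one DocID instead of three.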

Do the Google guys have a technically reasonable explanation that would not ruin the 32-bit theory?

A few questions that sum up the story so far:

    1. Why, after several months, does Google still proudly display 4,285,199,774 web pages, while it seems to find the time to update its logo on a daily basis?
    2. Why are pages that are still valid and active dropped after being in Google’s index for years?
    3. Why does Google return empty pages for the same URLs for months at a time?

It’s not that we are asking Google to share its trade secrets or anything like that. We just want to know why Google is lying and hurting so many businesses that rely on Google traffic. Google placed itself in the public eye and asked us all to invest in it, which we have. Google should now answer to the public and tell us why it is destroying our businesses.

Continued in Part II.
