


The Critical Anatomy of Google


A few lines from Google’s Webmaster Guidelines:

  • If no other site links to yours, it may be difficult for our crawler to find you. Conversely, if many sites link to your site, there is a good chance we will find you.
  • If we have not picked up your site and it has been several months, then it is likely that our spiders are not able to find your site. If you increase the links pointing to the page, Google is likely to find your site in the future.

What the two points above imply:

So even though Google tells you NOT to request links in order to increase your PageRank, it also states that increasing your links may be the only way to be included in Google.

So, if you are a new site with tons of useful information, chances are you won’t be found in Google until a site that is already in Google’s index links to you. Now that’s what makes a great search engine, right?

If Google’s capacity to identify links had a limit, why would it keep so many duplicates of the same page in its index under different domains?

  • As per one of their publications: It is important for our crawler to visit “important” pages first, so that the fraction of the web that is visited (and kept up to date) is more meaningful. So, basically, only a “fraction” of the index is kept up to date, and only the “popular” pages get refreshed often. In addition, new pages are discovered during those crawls and are placed in a queue. This method of crawling can cause chaos, simply because a single web page can often be reached in many different ways. Example: http://123.456.789.012, http://www.somedomain.com, http://somedomain.com. Let’s take this example and walk through how these three URLs get discovered and indexed.

Google starts out by finding, say, http://123.456.789.012 first. It doesn’t matter here how Google discovered this page; all that matters is that Google is about to visit it. Google now visits this page and indexes it. Days, weeks or even months may go by before Google discovers http://www.somedomain.com. By the time Google visits this page, the author has made some text changes, maybe something as small as the copyright year in the footer of the page. An MD5 checksum of this page does NOT show that it is a clone or duplicate of http://123.456.789.012, simply because the content is now different. And because only “popular” pages are revisited often, http://123.456.789.012 may not be crawled or re-indexed for months or even years.

Next, Google discovers http://somedomain.com but, as with the second URL, the author has made some text changes in the meantime. Because of this, Google does not recognize it as a duplicate of either of the first two pages it has already indexed. Its index now stores three different versions of the same page. And if you continue to make changes, you may never see the index cleaned up or the duplicates removed.
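The checksum argument above can be illustrated with a minimal sketch. It assumes an exact-match MD5 fingerprint over the raw page text, as the article describes; the sample HTML and the function name `md5_fingerprint` are made up for illustration and say nothing about how Google actually implements duplicate detection.

```python
import hashlib


def md5_fingerprint(html: str) -> str:
    """Return the MD5 hex digest of a page's raw text (hypothetical helper)."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()


# Two copies of the "same" page; only the copyright year in the footer differs.
page_first_crawl = (
    "<html><body><p>Useful information</p>"
    "<footer>Copyright 2004</footer></body></html>"
)
page_later_crawl = page_first_crawl.replace("2004", "2005")

# An unchanged copy produces the same fingerprint...
same = md5_fingerprint(page_first_crawl) == md5_fingerprint(page_first_crawl)
# ...but a one-character edit produces a completely different one.
changed = md5_fingerprint(page_first_crawl) != md5_fingerprint(page_later_crawl)

print(same, changed)
```

Under this assumption, any edit, however small, makes the two copies look unrelated to an exact-match comparison.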

The problem can still exist even if you never make changes to the web page. Why? Google could easily consider two of the above pages to be clones. It will then decide, based on PageRank and content computations, which page to treat as the original, which may not be the true original, and deliver that particular page in the results. And because Google does NOT actually delete duplicate content, all three URLs, while really the same page, are still in Google’s index, and only the one with the highest PageRank ever gets revisited.
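The selection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: pages are grouped by an exact MD5 fingerprint, and the URL with the highest score in each group is served. The function `pick_representatives` and the score values are hypothetical stand-ins for PageRank, not Google's real method or data.

```python
import hashlib
from collections import defaultdict


def md5_fingerprint(html: str) -> str:
    """Exact-match fingerprint of the page text (hypothetical helper)."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()


def pick_representatives(pages: dict, scores: dict) -> dict:
    """Group URLs by fingerprint and keep the highest-scoring URL per group."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[md5_fingerprint(html)].append(url)
    return {fp: max(urls, key=lambda u: scores[u]) for fp, urls in groups.items()}


SAME_HTML = "<html><body><p>Useful information</p></body></html>"

# The three URLs from the article, all serving identical content.
pages = {
    "http://123.456.789.012": SAME_HTML,
    "http://www.somedomain.com": SAME_HTML,
    "http://somedomain.com": SAME_HTML,
}

# Made-up importance scores standing in for PageRank.
scores = {
    "http://123.456.789.012": 0.2,
    "http://www.somedomain.com": 0.7,
    "http://somedomain.com": 0.4,
}

chosen = pick_representatives(pages, scores)
print(list(chosen.values()))
```

Note that nothing in this sketch removes the two losing URLs from `pages`; they simply stop being chosen, which mirrors the article's point that duplicates stay in the index.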

  • According to another publication of Google, “Furthermore, Advertising income often provides an incentive to provide poor quality search results. For example, we noticed a major search engine would not return a large airline’s homepage when the airline’s name was given as a query. It so happened that the airline had placed an expensive ad, linked to the query that was its name. A better search engine would not have required this ad, and possibly resulted in the loss of the revenue from the airline to the search engine. In general, it could be argued from the consumer point of view that the better the search engine is, the fewer advertisements will be needed for the consumer to find what they want. This of course erodes the advertising supported business model of the existing search engines. However there will always be money from advertisers who want a customer to switch products, or have something that is genuinely new. But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.”
  • So how could it be that their advertising revenue has risen so dramatically if Google always returns the most relevant results? It’s very possible that the dropping of URLs has something to do with this dramatic increase in revenues. You be the judge. Either way you look at it, I feel Google should be required to publicly address those problems and tell us the real reason behind these sums.
  • On most broad commercial topics we search, the results often reek of link manipulation. On the other hand, very narrow topics, with rather uncommon words, often show pages that use “natural” text for links.
  • In 2004, during the Olympics, a researcher wanted to find everything he possibly could about the Olympics and turned to Google for assistance. Sites like cnn.com and usatoday.com would certainly contain a lot of authoritative information about this subject. So, instead of doing a blind search and just entering Olympics in the search box, he restricted his search to specific sites. The first one he tried was usatoday.com, by entering the following query:
    site:www.usatoday.com Olympics
  • And 50% of the top 10 results were empty results with no titles, descriptions, or excerpts. The site’s robots.txt explicitly stated that these pages were off limits to all search engines and robots: User-Agent: * Disallow: /Olympics.
  • So, Google didn’t index them, which is a good thing. But why in the world does Google show the URLs in its search results, and why were those ranked as more important than the other 29,590 results Google says are available? Didn’t usatoday.com already tell Google NOT to do anything with these URLs?
  • Now, this in itself is not a big deal for usatoday.com, but it is a big deal to the researcher. Instead of getting 10 results that he could quickly scan to decide whether to visit, he had to click the “next” button to see more content, which of course displayed more Google Ads (which were much better targeted to the query). As a web site owner, would you like Google showing URLs that you told search engines not to fetch?
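For site owners who want to check what a rule like the one quoted above actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text reproduces the rule from the article; the two page paths and the helper `is_allowed` are hypothetical and chosen only for illustration. It shows what a compliant robot is told not to fetch, not what any search engine chooses to list.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted in the article, written out as a robots.txt file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Olympics
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one under the disallowed path, one elsewhere on the site.
blocked_url = "http://www.usatoday.com/Olympics/index.htm"
open_url = "http://www.usatoday.com/news/index.htm"

print(is_allowed(ROBOTS_TXT, "Googlebot", blocked_url))
print(is_allowed(ROBOTS_TXT, "Googlebot", open_url))
```

The rule is a path-prefix match that applies to every user agent, so any URL whose path begins with /Olympics is off limits to fetching.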

A few questions that summarize the whole story so far:

    1. Why does Google give us empty results for the same URLs for months at a time?
    2. Why do empty results, for pages Google claims not to have indexed, rank higher than pages that have been indexed?
    3. Why does Google crawl sites that are clearly off limits to robots?
    4. Why does Google include URLs in its SERPs that, again, are off limits to all robots, including Google’s?

It’s not as if we are asking Google to share its trade secrets or anything like that. We just want to know why Google is lying and hurting so many businesses that rely on Google traffic. Google went public and asked us all to invest in it, which we have. Google should now answer to the public and tell us why it is destroying our businesses.
