Google [bot]; Stalker?

Talk about anything that comes to mind and isn't related to Vortex Wars

Re: Google [bot]; Stalker?

Post by >(FM)< » Sat Feb 02, 2013 9:03 pm

Fangfallen wrote:Googlebot is the search bot software used by Google, which collects documents from the web to build a searchable index for the Google search engine.
If a webmaster wishes to restrict the information on their site that is available to Googlebot, or to another well-behaved spider, they can do so with the appropriate directives in a robots.txt file,[1] or by adding the meta tag <meta name="Googlebot" content="nofollow" /> to the web page. Googlebot requests to web servers are identifiable by a user-agent string containing "Googlebot" and a host address containing "googlebot.com".
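As a rough illustration of the two ideas in that paragraph, here is a minimal Python sketch using only the standard library. The robots.txt content, the example.com URLs, and the helper names build_parser and looks_like_googlebot are made up for the example. The host check takes an already-resolved host name as input and requires it to end in googlebot.com, which is a stricter assumption than the "containing" wording above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, allow everyone else everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""


def build_parser(robots_text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def looks_like_googlebot(user_agent, reverse_dns_host):
    """Check the two signals named in the post: user-agent string and host name.

    reverse_dns_host is assumed to come from a reverse DNS lookup done elsewhere.
    The user-agent alone is easy to fake, so both checks must pass.
    """
    ua_ok = "googlebot" in user_agent.lower()
    host = reverse_dns_host.lower().rstrip(".")
    host_ok = host == "googlebot.com" or host.endswith(".googlebot.com")
    return ua_ok and host_ok


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    print(parser.can_fetch("Googlebot", "http://example.com/private/page.html"))
    print(parser.can_fetch("Googlebot", "http://example.com/index.html"))
    print(looks_like_googlebot("Mozilla/5.0 (compatible; Googlebot/2.1)",
                               "crawl-host.googlebot.com"))
```

A well-behaved spider would run a check like can_fetch before requesting a page; nothing forces a badly behaved one to do so.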
Currently, Googlebot follows HREF links and SRC links.[1] There is increasing evidence that Googlebot can execute JavaScript and parse content generated by Ajax calls as well. Googlebot discovers pages by harvesting all of the links on every page it finds. It then follows these links to other web pages. New web pages must either be linked from other known pages on the web or be manually submitted by the webmaster in order to be crawled and indexed.
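The link-harvesting idea can be sketched in a few lines of Python. This is only a toy model, not how Googlebot itself is built: the "web" is a hypothetical dictionary of pages held in memory, and the names LinkHarvester, harvest_links, crawl, and FAKE_WEB are invented for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkHarvester(HTMLParser):
    """Collect every href and src attribute on a page as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(urljoin(self.base_url, value))


def harvest_links(base_url, html):
    harvester = LinkHarvester(base_url)
    harvester.feed(html)
    return harvester.links


def crawl(pages, seed):
    """Breadth-first crawl over an in-memory dict of url -> html (no network)."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # the link points at something this toy web does not contain
        visited.append(url)
        for link in harvest_links(url, pages[url]):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical four-page site; orphan.html is not linked from anywhere.
FAKE_WEB = {
    "http://example.com/": '<a href="/a.html">A</a><img src="logo.png">',
    "http://example.com/a.html": '<a href="b.html">B</a>',
    "http://example.com/b.html": '<a href="/">home</a>',
    "http://example.com/orphan.html": "<p>nobody links here</p>",
}

if __name__ == "__main__":
    print(crawl(FAKE_WEB, "http://example.com/"))
```

The orphan page never shows up in the crawl, which is the point of the last sentence above: a page that nothing links to has to be submitted by hand.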
A problem that webmasters have often noted with Googlebot is that it takes up an enormous amount of bandwidth.[citation needed] This can cause websites to exceed their bandwidth limit and be taken down temporarily. This is especially troublesome for mirror sites, which host many gigabytes of data. Google provides "Webmaster Tools" that allow website owners to throttle the crawl rate.

-Mind Fucked-
PREACH THE TRUTH
Angels forever. AoA - Don't follow me on Tumblr, Seriously
>(FM)<

 
Posts: 3965
Joined: Thu Dec 06, 2012 2:37 am
Location: Cold Side Of The Pillow

Re: Google [bot]; Stalker?

Post by Autumnwolf17 » Sun Feb 03, 2013 1:45 am

Well, either he copied that off someplace, or he didn't, but either way, that is impressive.
~ Wolf
Autumnwolf17

 
Posts: 10764
Joined: Thu Aug 23, 2012 1:29 am
