Caching & Indexing

Tuesday, 17 May 2011


What is the difference between Indexing and Caching?

Indexing is the process that makes a web page searchable in a search engine, whereas caching refers to storing a snapshot of the page's content as it appeared when it was crawled.

For example, when we upload a new website, the search engine crawler first reads the site and then stores its contents in the search engine's index database in a different format, giving priority to elements such as the title, meta tags, h1 and h2 headings, and bold text. It does not store the content exactly as it was published. As a result, the site can appear in search results for the keywords it is optimized for.

Google also takes a snapshot of each page on a website and stores it in a separate database, known as the cache database. If you click on the “Cached” link, you will see the web page as it looked when Google indexed it. The index, in contrast, is the database of documents that Google accesses when processing a query.
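
The difference between the two stores can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how Google actually works: the page URLs and contents are made up, and the index here is a plain word-to-URL map that does not weight titles or headings.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for crawled documents.
pages = {
    "example.com/a": "<h1>Red shoes</h1><p>Buy red running shoes.</p>",
    "example.com/b": "<h1>Blue hats</h1><p>Warm blue hats for running.</p>",
}

cache = {}                 # URL -> page exactly as fetched (the snapshot)
index = defaultdict(set)   # word -> set of URLs that contain it


def crawl(url, html):
    """Store the raw snapshot, then index the words found in the text."""
    cache[url] = html                        # caching: keep the page as published
    text = re.sub(r"<[^>]+>", " ", html)     # indexing: strip the markup first
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        index[word].add(url)


for url, html in pages.items():
    crawl(url, html)


def search(query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    matches = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*matches)) if matches else []


print(search("running"))
print(cache["example.com/a"])
```

A query is answered from `index` alone, while `cache` is only read when someone asks to see the stored copy of a page.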

  =============   How long to get indexed   =================

There is no set time for Google to initially index your site.
The time it takes may vary based on factors such as:

   * Popularity of the site (whether any other pages link to it)

   * Whether the content is crawlable (server responses and content type)

   * Site structure (how pages interlink)

It is possible for a site to be crawled/indexed within 4 days to 4 weeks.
In some cases, it may take longer.
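
The site structure factor can be made concrete with a small sketch. Assuming a hypothetical internal link graph (the page paths below are invented), a breadth-first search from the home page shows how many clicks away each page is and which pages a crawler following links would never reach.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/products", "/about"],
    "/products": ["/products/shoes"],
    "/about": [],
    "/products/shoes": [],
    "/orphan": ["/"],   # no other page links to this one
}


def click_depths(links, start="/"):
    """Breadth-first search: fewest clicks needed to reach each page from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


depths = click_depths(links)
unreachable = sorted(set(links) - set(depths))

print(depths)
print(unreachable)
```

Pages that end up in `unreachable` have no inbound path from the home page, so a crawler that starts there cannot discover them by following links.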


SEARCH ENGINE FRIENDLY WEBSITE

Monday, 16 May 2011


Search Engine Optimization (SEO), the process of making a site “search engine friendly”, is probably the most important aspect of website design.

  • Pick a good Domain Name: This step is not very important, but every little bit helps. A domain name that exactly matches the search terms can help a page of our website come up #1 in the search results.

  • Pick a good Web Hosting Company: This can be very important to the search engines. Free website hosting is usually bad for search engine rankings, for several reasons. The most important factor is that your website should have its own "static" IP address.

  • Clear hierarchy & Text links: Every page should be reachable from at least one static text link. The most important thing to know about search engines is that SEARCH ENGINES ONLY INDEX TEXT. They don't index images well. Keep the links on a given page to a reasonable number. Create a useful, information-rich site, and write pages that clearly and accurately describe your content.

  • Images: Don't embed text inside images. Search engines can't read text embedded in images. If you want search engines to understand your content, keep it in regular HTML. Give your images detailed, informative file names. The file name can give Google clues about the subject matter of the image. The alt attribute is used to describe the contents of an image file.

  • Anchor Text: External anchor text (the text pages use to link to your site) reflects how other people view your pages. Descriptive anchor text improves the user experience and helps the user understand the link's destination.

  • Tags & Phrases: You need to make a short list of one or two keyword phrases for each page of the site you want to optimize. Each phrase should be no more than three or four words. Longer phrases are less effective. Single words are often useless. Tags to use for these phrases include <h1> to <h6>, <b>, <i>, <strong>, <em>, <acronym>, <dfn>, and <abbr>.

  • Sitemaps: We need two kinds of sitemap: one on the site for users and one in XML form for search engines.
    For Search Engines: Sitemaps are XML files that list the URLs available on a site. The aim is to help site owners notify search engines about the URLs on a website that are available for indexing.
    On Site: Offer a site map to your users with links that point to the important parts of your site. If the site map has an extremely large number of links, you may break the site map into multiple pages.

  • Copyright: Think about the best ways to protect your images. Make your images available under a license that requires attribution, such as a Creative Commons license.
  • Broken Links: Also known as dead links; links that are not working. These result in:
    404 Error: The web server responds, but the specific page is not found.
    DNS Error: The server that hosts the target page has stopped working or has relocated to a new domain.
  • Dynamic Pages: If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and few in number. Dynamic URLs usually don’t contain keywords, so they are less search engine friendly; avoid them where you can. If you use a script that generates dynamic pages, make sure the URL includes at least one keyword.

  • W3C Validation: W3C stands for World Wide Web Consortium, an organization that develops standards for code on the web. Validating your site means comparing the code you have written to the rules laid out by the W3C. Code that validates usually stands a better chance of working on different browsers and different operating systems. You can validate your web site at http://validator.w3.org/
  • Use Cascading Style Sheets (CSS): Using tables to structure pages is fast becoming a search engine unfriendly way to build pages. An HTML/CSS coded site also loads faster, as much of the presentation code is in separate style sheets. Using IDs to create CSS ‘hooks’ also lets you include keywords in your page's markup.

  • Root Domain Pages: Keep your pages as close to the root domain as possible. Search engines find it difficult to find and index pages that are too many levels down.

  • Page Load Time: Make sure your site loads fast and rarely goes down. Keep the size of your web pages under 50KB so that they download quickly and visitors don’t have to wait long. For good SEO, the ideal page size is 15KB.
  • Renaming of Pages: Never rename your web page unless your site is new.

  • Naming as per Keywords: Use keywords for folder names and file names. Try to include your most important keyword in hyper-linked text and text that immediately precedes or follows the hyperlink.

  • Unique Title: Each page should have its own unique page title.

  • Overuse of Ajax: Many developers try to impress their visitors by implementing massive Ajax features (particularly for navigation), but this is a big SEO mistake. Because Ajax content is loaded dynamically, it is not spiderable or indexable by search engines. Another disadvantage of Ajax: since the URL in the address bar doesn't change, your visitors cannot send the current page to their friends.

  • Versioning of Theme Design: For some reason, some designers version their theme design into sub-level folders (e.g. domain.com/v2, v3, v4) and redirect to the new folder. Constantly changing the main root location may cause you to lose backlink counts and ranking.

  • “Click Here” Link Anchor Text: Some people put just a big banner image and a "Click here to enter" link on their homepage. If you want to tell the search engine that your page is important for a topic, then instead of “Click here” or “Learn More”, use the topic or keyword in your link anchor text. In the worst case, the "enter" link is embedded in a Flash object, which makes it impossible for the spiders to follow the link.
  • Flash/Videos/Splash Pages: Flash sites are similar to sites built in frames, in which visitors cannot bookmark individual pages. This prevents visitors from locating and revisiting specific pages of interest and may prevent a conversion or sale. Also, your visitors may not have the latest version of the Flash plug-in installed, or may not have any version at all. This forces your users to download and install software they may not want in order to properly view some or all of your site. Flash sites are also slow to load, depending on your visitor's connection. Not everyone has a lightning-fast internet connection, and some Flash-heavy sites can take a minute or more to load. The more real estate of your website that you have locked up in an animation or Flash movie, the less there is to be found, evaluated and indexed by the search engines.
  • Non-Spiderable Flash Menus: Many designers make the mistake of using Flash menus, such as fade-in and animated menus. They might look cool to you, but they can’t be seen by the search engines, and thus the links in the Flash menu will not be followed.
  • Image and Flash content: Web spiders are like text-based browsers: they can't read text embedded in a graphic image or in Flash. Many designers make the mistake of embedding important content (such as target keywords) in Flash and images.
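
A few of the points in the list above (a page title, alt text on images, descriptive anchor text) can be checked mechanically. The following is a minimal sketch using Python's standard html.parser module; the `PageAudit` class, the sample page, and the set of "vague" phrases are assumptions made for illustration, not a complete SEO checker.

```python
from html.parser import HTMLParser

# Anchor phrases the list above advises against (an illustrative set).
VAGUE_ANCHORS = {"click here", "learn more"}


class PageAudit(HTMLParser):
    """Collects the page title, images without alt text, and vague links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images_missing_alt = []
        self.vague_links = []
        self._in_title = False
        self._link_href = None
        self._link_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt.append(attrs.get("src", ""))
        elif tag == "a":
            self._link_href = attrs.get("href", "")
            self._link_text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._link_href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._link_href is not None:
            # Normalise whitespace and case before comparing.
            text = " ".join("".join(self._link_text).split()).lower()
            if text in VAGUE_ANCHORS:
                self.vague_links.append(self._link_href)
            self._link_href = None


# Hypothetical page used only to exercise the audit.
sample = """<html><head><title>Red Running Shoes</title></head><body>
<img src="red-shoes.jpg" alt="Red running shoes">
<img src="banner.jpg">
<a href="/shoes">Red running shoes</a>
<a href="/enter">Click here</a>
</body></html>"""

audit = PageAudit()
audit.feed(sample)

print("Title:", audit.title)
print("Images missing alt:", audit.images_missing_alt)
print("Vague links:", audit.vague_links)
```

Running the same audit over every page and comparing the collected `title` values would be one way to spot pages that share a title.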