Originally Posted by beluga
How exactly does Google find the websites it shows you? Does it have a list of websites that it has collected into a huge directory, or when you search is it actually somehow reaching out to other servers to find your site?
Three steps: Google crawls, Google indexes, and Google serves results.
Google has a crawling program they refer to as Googlebot. It follows links from one page to another, from one site to another, and collects information about the web. It basically recreates the web - though not an exact copy of what is actually out on the web. My link above is to the Google page on their crawler, and Google's pages are actually a good starting point for learning more about how Google works. If you want a basic understanding of what they consider good practices, it's worth spending some time in their section on Google Information for Webmasters.
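That link-following behavior is easy to sketch. Here's a toy breadth-first crawler over a made-up in-memory "web" (a dict mapping URLs to the links on each page - a real crawler like Googlebot obviously fetches pages over HTTP and parses the links out of the HTML; the URLs here are invented for illustration):

```python
from collections import deque

# Toy "web": each URL maps to the links found on that page.
WEB = {
    "a.com/": ["a.com/about", "b.com/"],
    "a.com/about": ["a.com/"],
    "b.com/": ["c.com/"],
    "c.com/": [],
}

def crawl(seed):
    """Breadth-first crawl: follow links page to page, site to site,
    remembering what has already been visited."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(crawl("a.com/"))  # ['a.com/', 'a.com/about', 'b.com/', 'c.com/']
```

Notice the crawl reaches c.com/ even though nothing on the seed page links to it directly - that's the "one site to another" part.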
There are a lot of articles, whitepapers, and patents that you could look through to try to find out more about how Google indexes pages. There's an introduction to the architecture of Google, and how it works, in one of the first papers on the subject, by Sergey Brin and Lawrence Page, called The Anatomy of a Large-Scale Hypertextual Web Search Engine.
A newer paper that describes the architecture of Google's indexing (and serving) system is at: Web Search for a Planet: The Google Cluster Architecture
The patent application that jlknauff pointed to is one of many that the people of Google have filed that describe possible ways Google may collect and index information about pages. It's not the only one. And just because Google has released something into the world doesn't mean they are using it - though some parts may be in use. Keep in mind that those patents are written to protect Google's intellectual property, not as guidelines on how the search engine works.
When Google collects information about sites during its crawl, and it indexes that information, it gathers a fair amount of data: what is presented on pages, how it is presented, how it may compare to information on other pages on the web, and on the connections - the links between pages.
When someone performs a query on Google, and is given a set of results, the results have to be served fairly quickly. The collection of information, and the organization of that information, are mostly already done at that point. There's likely some sorting happening as a result of the indexing that has been done. Duplicated information and other information may be filtered out at this point, and different relevance rules may be applied to information in the index to serve what the people at Google hope are good responses to queries.
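The index-ahead-of-time, serve-at-query-time split can be sketched with a toy inverted index. The documents and the crude score-by-term-count ranking below are placeholders of my own invention, not Google's actual data structures or relevance rules - the point is only that the expensive work happens before any query arrives:

```python
from collections import defaultdict

# Toy document collection (made up for illustration).
DOCS = {
    "page1": "google crawls the web and indexes pages",
    "page2": "how search engines index the web",
    "page3": "a page about cooking pasta",
}

# Indexing (done ahead of time): term -> {doc: how often it appears}.
index = defaultdict(dict)
for doc, text in DOCS.items():
    for word in text.split():
        index[word][doc] = index[word].get(doc, 0) + 1

def serve(query):
    """Serving (at query time): look up each term in the prebuilt
    index, score matching documents, and sort best-first."""
    scores = defaultdict(int)
    for word in query.split():
        for doc, count in index.get(word, {}).items():
            scores[doc] += count
    return sorted(scores, key=lambda d: (-scores[d], d))

print(serve("index the web"))  # ['page2', 'page1']
```

Serving never rescans the documents themselves - only the index - which is why results can come back quickly.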
What do you need to know to make it more likely that your page will show up in rankings in Google?
Mel's brief overview is a good start. As I stated above, the webmaster guidelines from Google are also worth reading through carefully.
Make sure that your pages can be crawled by Googlebot. It has some limitations: potential problems with JavaScript-based links, Flash, text rendered as images, complex dynamic sites with multiple variables in URLs, and more.
Try to use semantically correct HTML, so that your page titles are unique to each page and actually describe the content of that page. Use heading tags (<h1>, <h2>, etc.) that work the way headings on a page should - as descriptions of the content they are headlines for. Use alt text for images that truly is alternative text for those images.
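A quick way to see why those elements matter: they are exactly the kind of thing an indexer can pull out of your markup without guessing. Here's a small sketch using Python's standard html.parser to extract the title, top heading, and image alt text from a made-up page (not how Google parses pages, just an illustration of what well-structured markup hands to a crawler):

```python
from html.parser import HTMLParser

# Invented example page.
PAGE = """<html><head><title>Pasta Recipes</title></head>
<body><h1>Easy Weeknight Pasta</h1>
<img src="pasta.jpg" alt="bowl of spaghetti"></body></html>"""

class Extractor(HTMLParser):
    """Collect the bits an indexer leans on: title, headings, alt text."""
    def __init__(self):
        super().__init__()
        self.found = {}
        self._tag = None
    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2"):
            self._tag = tag           # remember which element we're inside
        elif tag == "img":
            self.found["alt"] = dict(attrs).get("alt", "")
    def handle_endtag(self, tag):
        self._tag = None
    def handle_data(self, data):
        if self._tag:                 # text inside a tag we care about
            self.found[self._tag] = data.strip()

ex = Extractor()
ex.feed(PAGE)
print(ex.found)
```

If the title were the same on every page, or the heading were an image of text, or the alt attribute were missing, that dictionary would have that much less to say about the page.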
Use words on your pages that your targeted audience or audiences will actually search for, and will expect to see on your pages.
My brief answer is a little more detailed than Mel's, but it's still pretty brief.