By George Norman - Software News Editor
Added on 31 Oct 2008
If you work on something you believe may have an impact on the outside world, such as scientific research or an academic paper, you will probably want to scan it and post it on the web for everyone to see. Countless scanned documents make it onto the web every day, yet it has been quite hard for Google to index them and make them searchable. Or has it?

Evin Levey, Product Manager with Google, comments: “In the past, scanned documents were rarely included in search results as we couldn't be sure of their content. We had occasional clues from references to the document - so you might get a search result with a title but no snippet highlighting your query. Today, that changes.”


According to Levey, as long as a scanned document is saved in PDF format, Google can perform OCR (optical character recognition) on it, convert the scanned images into machine-readable text, and index that text just like the content of any other page. This is no simple feat, as it requires vast amounts of processing power, but it beats not being able to index those files at all.

The simple truth of the matter is that computers, mighty as they are, struggle with certain tasks, and text recognition is one of them. For a human being, reading a text document and looking at a picture of that text amount to the same thing: either way, the text is easy to read. Computers cannot adapt so readily, which is why indexing scanned files was a problem for Google until now. It may be easy for you to tell the letter O from the digit 0, but it is not nearly as easy for a computer.

A word of warning for anyone who posted scanned files on the web confident that no one would ever find them through a Google search: if you don't want people reading them, take them down.