Scanned Media now Google Search Friendly
Article by George Norman
On 31 Oct 2008
If you work on something that you believe may have an impact on the outside world (like scientific research or an academic paper), you will want to scan that work and post it on the web for everyone to see. Even though countless scanned documents make it to the web every day, it is quite hard for Google to index them and make them searchable. Or is it?

Evin Levey, Product Manager with Google, comments: “In the past, scanned documents were rarely included in search results as we couldn't be sure of their content. We had occasional clues from references to the document - so you might get a search result with a title but no snippet highlighting your query. Today, that changes.”

According to Evin Levey, as long as a scanned document is saved in PDF format, Google can perform OCR (optical character recognition) on it, convert the scan into digital text, and then unleash its web spiders on that text to index all the information the file contains. This is no simple feat, as it requires vast amounts of processing power, but it is better than not being able to do it at all.
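
To make the "convert, then index" idea concrete, here is a minimal sketch in Python. It is not Google's implementation: it assumes the OCR step has already produced plain text for each PDF, and the file names, sample text, and the helpers `tokenize`, `build_index`, and `search` are all made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output: the text recovered from two scanned PDFs.
ocr_output = {
    "thesis.pdf": "Optical character recognition turns scanned pages into text.",
    "report.pdf": "The scanned report was indexed after text extraction.",
}
index = build_index(ocr_output)
print(search(index, "scanned text"))
```

Once the scan has become text, it can be searched like any other page; the hard part, as the article goes on to explain, is the recognition step that comes first.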

The simple truth of the matter is that computers, mighty as they are, tend to have serious problems with some tasks, and text recognition is one of them. To a human being, reading a text document and looking at a picture of that text amount to the same thing: either way, the text is easy to read. Computers, on the other hand, cannot adapt so easily, and that is why indexing scanned files has been a problem for Google until now. It may be simple for you to tell the difference between the letter O and the number 0, but it is not that simple for a computer.
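
As a rough illustration of the O-versus-0 problem, the sketch below guesses which character was meant by looking at the rest of the word. This is a toy heuristic written for this article, not how Google or any particular OCR engine works, and the function name `fix_o_zero` is hypothetical.

```python
def fix_o_zero(token):
    """Resolve O/0 confusion in one token using its other characters."""
    # Look only at the characters that are not themselves ambiguous.
    others = [c for c in token if c not in "Oo0"]
    digits = sum(c.isdigit() for c in others)
    letters = sum(c.isalpha() for c in others)

    if digits > letters:
        # Mostly digits around it: treat every O as a zero.
        return token.replace("O", "0").replace("o", "0")
    if letters > digits:
        # Mostly letters around it: treat every zero as the letter O,
        # matching the case used by most of the other letters.
        upper = sum(c.isupper() for c in others)
        letter_o = "O" if upper > letters - upper else "o"
        return token.replace("0", letter_o)
    # No clear majority: leave the token untouched.
    return token


for word in ["2O08", "G0ogle", "0CR"]:
    print(word, "->", fix_o_zero(word))
```

A person makes this call without thinking; a program needs an explicit rule, and a rule this simple still fails on mixed tokens such as product codes.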

A word of warning for those who posted scanned files on the web on the assumption that no one would ever find them through Google search: if you don’t want people reading them, take them down.



Tags: Google
About the author: George Norman
George is a news editor.