Webpage Thumbnails — Screenshots via Page Glimpse in JavaScript

For quite some time I’ve had a desire to fetch screenshots of webpages in thumbnail form. My last round of development in the area involved a somewhat overly complex solution using Amazon’s AWS Alexa Site Thumbnail service. I chose to integrate with the AWS Alexa Thumbnail service over other services because it just returned an image, no extra crap (Snap’s thumbnails are grotesque) and no ads. Although the service wasn’t free, it only cost a few pennies to use.

The other requirement I have is to retrieve the thumbnails in JavaScript. This led to the creation of my Ajax Alexa Thumbnails project. The AWS Alexa Thumbnail service required a client to interact via an XML web service (similar to the other AWS services), which means signing each request with your AWS credentials; something I wasn’t going to do in JavaScript. The project became too complex for the task; it involved making an Ajax request to a local PHP file, which handled sending the request to Amazon and receiving the response. Crazy, I know, which is why I’m deprecating the Ajax Alexa Thumbnails project, along with the deprecation of the AWS Alexa Thumbnail service, in favor of my new solution (below) using Page Glimpse.

Page Glimpse

On March 18, 2009 I received an email from Amazon AWS telling me the Alexa Site Thumbnail service was being deprecated:

Dear Alexa Developer,

We are announcing the deprecation of the Alexa Site Thumbnail service as of March 13, 2009. After this date, the service will be closed to new subscriptions. The Alexa Site Thumbnail service will continue to be operational for existing subscribers for 90 days, until June 12, 2009.

Use of the service has been relatively low, and we have decided to focus our resources on more broadly used services in order to provide the greatest benefit to Alexa customers.

Thank you for your use of the service. We regret any inconvenience to you.

Thank you,

The Alexa Web Services Team

Luckily, that same day TechCrunch ran an article about this announcement and users posted alternative services in the comments, including Page Glimpse. Based on Marcio Castilho’s comment, I figured it was worth a look:

You should check http://www.pageglimpse.com. It is also run on scalable Amazon AWS infrastructure, and it is much more faster than Alexa and other services.

Page Glimpse is a free service providing programmatic access to thumbnails of any web page (the AWS Alexa Site Thumbnail service only provided thumbnails of hostnames, not interior pages). They have a well-documented RESTful HTTP API; to retrieve a thumbnail for a webpage you construct a URL to their API with query-string parameters. If Page Glimpse hasn’t captured the webpage you are requesting, it will do so automatically within a few seconds. They also have API methods to check whether a thumbnail already exists and to request that the service capture a thumbnail. Just as Marcio stated, Page Glimpse is very fast; I’ve been pretty amazed at the speed during my testing.

To start using Page Glimpse, sign up for a free account to get your developer key.

After signing up and trying a few requests with my shiny new developer key, I was pretty curious how they are offering this service for free. I gathered from their Terms of Use that if you are a heavy user (> 300 GB/month) you might have to pay a monthly fee (this seems completely reasonable):

If your bandwidth usage exceeds 300 GB/month, or significantly exceeds the average bandwidth usage (as determined solely by RADSense) of other PageGlimpse customers, we reserve the right to terminate your use of the service, but we will make the best effort possible to accomodate the increased bandwidth, although a monthly fee might apply in this case.

Page Glimpse Thumbnails in JavaScript

With the Page Glimpse API being HTTP based and free, I am okay with including my developer key in client-side JavaScript code; it’s just not worth the complexity to hide it.

One could easily use Page Glimpse in JavaScript by constructing a new img DOM Node and setting its src property to the Page Glimpse API URL with all the parameters. I want something more convenient. Even with Page Glimpse being a speedy service, I also want to make sure the image has fully downloaded before showing it on the page.
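The plain image-node approach can be sketched roughly as follows. This is only an illustration: the endpoint is an assumption inferred from the exists URL quoted in the comments below, the parameter names (devkey, size) are taken from the option list later in this post, and buildThumbnailUrl and createThumbnail are hypothetical helper names. Check the Page Glimpse documentation before relying on any of it.

```javascript
// ASSUMPTION: endpoint inferred from the /v1/thumbnails/exists URL quoted in
// the comments; it is not confirmed by this post.
var PAGE_GLIMPSE_THUMBNAILS = 'http://images.pageglimpse.com/v1/thumbnails';

// Build the API URL for one page. `params` holds the query-string
// parameters, for example { devkey: 'xxx', size: 'small' }.
function buildThumbnailUrl(pageUrl, params) {
	var query = ['url=' + encodeURIComponent(pageUrl)],
		name;

	for (name in params) {
		if (Object.prototype.hasOwnProperty.call(params, name)) {
			query.push(encodeURIComponent(name) + '=' + encodeURIComponent(params[name]));
		}
	}

	return PAGE_GLIMPSE_THUMBNAILS + '?' + query.join('&');
}

// Browser-only: create an img Node whose src points at the API URL.
function createThumbnail(pageUrl, params) {
	var img = document.createElement('img');
	img.src = buildThumbnailUrl(pageUrl, params);
	return img;
}
```

In a page, something like document.body.appendChild(createThumbnail('http://example.com/', { devkey: 'xxx' })) would show the image, but the Node is inserted before the browser has finished downloading it.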

Thumbnailsjs — JavaScript Interface To Page Glimpse’s API

I’ve created a small (1.2KB minified, 0.7KB gzipped) utility Class to provide a convenient interface to Page Glimpse in JavaScript and to ensure the images have been fully downloaded by the client before displaying them on the page (thanks to Luke Smith’s Is my image loaded? post).
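For a feel of the "wait until it has downloaded" idea, here is a generic preloading sketch. It is not the Thumbnailsjs source and not necessarily the technique from Luke Smith’s post; preloadImage and its createImage argument are hypothetical names, the latter added only so the function can be exercised outside a browser.

```javascript
// Generic preloading sketch; NOT the actual Thumbnailsjs implementation.
// `createImage` is a hypothetical hook that supplies the image object;
// in a browser it can be omitted and new Image() is used.
function preloadImage(src, callback, createImage) {
	var img = createImage ? createImage() : new Image(),
		done = false;

	// Run the callback exactly once, whichever signal arrives first.
	function finish() {
		if (done) {
			return;
		}
		done = true;
		img.onload = null;
		callback(src, img);
	}

	img.onload = finish;
	img.src = src;

	// A cached image may already be complete right after src is set.
	if (img.complete) {
		finish();
	}
}

// Browser usage (thumbnailUrl is whatever API URL you built):
// preloadImage(thumbnailUrl, function (src, img) {
// 	document.body.appendChild(img);
// });
```

The sketch ignores failed downloads; a fuller version would presumably also listen for onerror.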

The fully-documented source code along with a simple example is available in the files section: http://925html.com/files/thumbnailsjs/.

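Example of Client Code

// Container element that will hold the linked thumbnails
var container = document.getElementById('container'),
	thumbs = Thumbnails({ devkey:'xxx' });

// Fetch the thumbnail for a single page
thumbs.get('http://google.com', append);

// Fetch thumbnails for several pages with one call
thumbs.get([
	'http://eric.ferraiuolo.name/',
	'http://925html.com',
	'http://oddnut.com'
], append);

// Callback: receives the page URL and its fully downloaded img Node
function append ( url, img ) {
	var link = document.createElement('a');
	link.href = url;
	link.appendChild(img);
	container.appendChild(link);
}

Details of Thumbnailsjs API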

The Class’ constructor takes a config Object which can have the following properties:

  • devkey {String}: Your PageGlimpse API developer key. All requests to the PageGlimpse service require a devkey (required).
  • size {String}: The size of thumbnail. Available sizes are: small, medium, large.
  • root {Boolean}: Indicates if the thumbnails for the domain root should be displayed. The root thumbnail image will only be used if an interior page’s thumbnail hasn’t been resolved.
  • nothumb {String}: If the thumbnail for the website is not yet taken, the URL for this property will be used. If this parameter is not set a PageGlimpse default image will be returned.

The Class only has one public method, get, which takes two arguments (both required):

  1. url {String | Array}: Location of webpage(s)
  2. callback {Function}: Function to pass the webpage URL and image Node once fully downloaded
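Putting the options and the get method together, a fuller configuration might look like the sketch below. The nothumb address is a made-up placeholder, the other values are just picked for illustration, and the guard only exists so the snippet does nothing when the Thumbnailsjs script (or a browser) isn’t present.

```javascript
// Illustrative values only; devkey is the one property the post marks as required.
var config = {
	devkey: 'xxx',                                        // your developer key
	size: 'medium',                                       // small, medium, or large
	root: false,                                          // don't fall back to the domain root's thumbnail
	nothumb: 'http://example.com/images/no-thumbnail.png' // hypothetical placeholder image
};

// Browser-only: runs when the Thumbnailsjs script has defined Thumbnails.
if (typeof Thumbnails === 'function' && typeof document !== 'undefined') {
	Thumbnails(config).get([
		'http://925html.com',
		'http://oddnut.com'
	], function (url, img) {
		// img is a plain DOM Node, so it can be decorated before insertion.
		img.alt = 'Thumbnail of ' + url;
		document.body.appendChild(img);
	});
}
```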

Page Glimpse + Thumbnailsjs

I’m planning to update my personal webpage to use this setup for displaying tooltips for external links with a thumbnail of the webpage; retiring the current setup of using AWS Alexa Site Thumbnail + Ajax-Alexa-Thumbnails.

I have to say to the folks over at RADSense Software who put together Page Glimpse: thanks for developing such a cool and useful service!

Hope you find Page Glimpse + Thumbnailsjs useful; let me know if you have questions, thoughts, or requests.


20 Responses to “Webpage Thumbnails — Screenshots via Page Glimpse in JavaScript”


  • Good stuff. I’ll bookmark this; maybe I can use it later.

  • Hi, nice post, thank you. Is there a way to know whether you get a placeholder for the thumb (because it has not been taken yet) or the real thumb? I tried to pass my own placeholder with a 1 pixel difference in width and check the naturalWidth, but this only works in FF. Any better suggestion? If we can solve this, we can put a JS timer on the page to check every 5 seconds (progressive) if a thumb is available. Greetings from Berlin.

  • @Vlado Excellent question. You should use the solution you came up with, but modified slightly to get better browser compatibility:

    In your callback function passed to the get method

    function callback ( url, img ) {
    
    	var placeholderWidth = 151,
    		prop = img.naturalWidth ? 'naturalWidth' : 'width';
    
    	if ( img[ prop ] === placeholderWidth ) {
    		// img hasn't been captured
    	}
    
    }

    IE doesn’t have a naturalWidth property, so width will be used instead. When I was looking into this, I noticed a bug, which is now fixed, with setting the root property in the config Object passed to the constructor; it’s now possible to set root : false.

    Sorry I couldn’t come up with a better solution. The Page Glimpse API has an exists method, but calling that RESTful HTTP method returns a Content-Type of application/json. From the server-side, using this method would be trivial.

  • Thank you Eric, that was quick. Well, I used “naturalWidth” instead of “width” because “width” already HAS the value 150 due to the HTML img src="…" width="150" parameters to avoid page flickering. Uhh, maybe I should do this via CSS.

    You can take a look at the application in alpha-stage if you add port :81 to the URL above. Search something like “berlin” and scroll down e.g. to the blog results. The empty rectangles are the ones to be filled with the real thumbs.

    I will resend a mail to the PageGlimpse guys; I didn’t get an answer a few weeks ago.

  • PageGlimpse is free at the moment… You know that the service won’t remain free. Just the time for them to use your requests to build their thumbnails base…

  • This is some useful script!
    Thanks so much for sharing.

  • Very nice code, very well written.

    Can I ask, why did you decide to declare the class within an anonymous namespace if you’re going to assign the class to the window object at the end anyway?

    Is there some performance benefit?

    Thanks again, great work!

  • @jacob You might be right; it’s unclear what motivation RADSense Software has offering PageGlimpse as a free service. One can only imagine the number of developers using the service will increase with the AWS Alexa Site Thumbnail service going away in the coming months.

  • @Mikuso My use of an anonymous function is mainly out of convention; there isn’t a performance benefit to wrapping a Class with an anonymous function, it’s just an organizational thing. While writing this Class I had some String constants I was using that were defined as local vars within the anonymous function, which I happened to not need, so they were removed.

  • Thanks to all who have tried our PageGlimpse service. For those trying to figure out whether we are returning the placeholder image, we have a REST request you can make to verify if a thumbnail already exists. Check out the documentation on our site.

    Let me know any feedback.

  • @Marcio I was trying to figure out how to interact with that method via JavaScript and no apparent solution came to mind. Could you guys look at having the response of the http://images.pageglimpse.com/v1/thumbnails/exists? resource return a JSONP response (with a text/javascript Content-Type) if a callback query-param was passed with the request? This way I could write a script tag to the DOM and give it my callback function to execute.

    A request to: http://images.pageglimpse.com/v1/thumbnails/exists?devkey=xxx&url=http://example.com/doesnt_exist/&callback=checkExists would return a response of (as text/javascript):

    checkExists(["result", "no"]);

  • @Eric Ferraiuolo The exists method currently returns a JSON result.

    Take a look at how Adjix.com implemented the check for thumbnails via the exists method, using JavaScript.

    Maybe I didn’t understand what you were asking.

  • @Marcio Since XDRs in JavaScript are still pretty hairy, I would have to create a server-side proxy to send off the requests to you guys for the exists API call (which appears to be what Adjix.com is doing via a DNS A record) and call my proxy via XHRs (which I would prefer not to do).

    JSONP is an extension to JSON; basically adding the idea of function callbacks, making it more like JavaScript. The response’s Content-Type would also have to be text/javascript since a script node would have its src attribute making the API call. The contents of the script tag would then be the response body (JSON) as the argument to the callback function specified in the request’s query-parameter.

  • This looks fantastic. I’m pretty lame at js, is there any plan for allowing some type of hookup so a class name can be placed near any url’s we choose to have ‘pageglimpsed’ to make it even easier to integrate?

  • @Andrew Adding a class attribute to the image DOM Node would be trivial using any JavaScript library. The callback function passed to the get method will be passed a standard DOM Node representing the thumbnail; the client can do what they want with this node. In my example I’m wrapping it with an anchor (link) Node and appending it to the DOM.

  • Thanks Eric, not trivial for me! I think I’ll have to wait until someone/radsense produce a simple implementation like websnapr’s.

    :)

  • I thought I might add this: this is PHP5-only code which I used to pull the images from PageGlimpse and copy them to my server. Not that I don’t believe they will continue with the free service :D

    http://gist.github.com/127449

  • I found http://www.sitethumbshot.com/ through search engine. It looks promising. Comments or feedback. It looks cheap as well (http://www.sitethumbshot.com/premium_service.php).

  • Substantially, the article is really the greatest on this deserving topic. I fit in with your conclusions and will thirstily look forward to your upcoming updates. Just saying thanks will not just be sufficient, for the wonderful lucidity in your writing. I will directly grab your rss feed to stay abreast of any updates. Good work and much success in your business dealings!
