googlebot

v0.1.41

Express middleware that returns the resulting HTML after executing JavaScript, allowing crawlers to read the page

Downloads

22

Readme

GoogleBot ExpressJs

This module implements an Express middleware that renders a full HTML/JS/CSS version of a page when JavaScript is not available on the client and the site relies heavily on it for rendering, as sites built with Ember, Angular, jQuery, or Backbone often do. I needed this at work to deliver an SEO-friendly version of our website to the Google crawler and found no existing solution.

Docs

The Google crawler will request a different URL when certain conditions are met, so make sure your site complies with them. You have two options:

  • Replace your # with #!
  • Add the meta tag <meta name="fragment" content="!"> to your layout. This must be done server side; if the tag is not in the initial response, it won't work.

Later we will try to detect the user agent and make this available to more crawlers, or prevent crawling.

Google replaces the hashbang with ?_escaped_fragment_= (or adds it to the URL when there is no hashbang), appends the rest of the URL after it, and expects a different, completely rendered version of the site. The middleware detects this in the request and, instead of the normal response, returns the fully rendered version that PhantomJS creates.
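
The URL mapping described above can be sketched in a few lines. This is only an illustration of the crawler-side rewrite and of a simple trigger check, not the module's actual code; toEscapedFragmentUrl and isCrawlerRequest are hypothetical names, and the escaping of special characters that Google's scheme also applies is left out for brevity.

```javascript
// Hypothetical helpers illustrating the hashbang rewrite; not part of googlebot.
var TRIGGER = '_escaped_fragment_=';

// Map a "pretty" hashbang URL to the URL a crawler would request instead.
// Simplification: special characters in the fragment are not escaped here.
function toEscapedFragmentUrl(prettyUrl) {
  var index = prettyUrl.indexOf('#!');
  if (index === -1) {
    return prettyUrl; // no hashbang, nothing to rewrite
  }
  var base = prettyUrl.slice(0, index);
  var fragment = prettyUrl.slice(index + 2);
  // Use '&' when the URL already carries a query string, '?' otherwise.
  var separator = base.indexOf('?') === -1 ? '?' : '&';
  return base + separator + TRIGGER + fragment;
}

// Check whether an incoming request URL contains the trigger string.
function isCrawlerRequest(requestUrl, trigger) {
  return requestUrl.indexOf(trigger) !== -1;
}
```

Note that when the original URL already has a query string, the crawler's URL contains &_escaped_fragment_= rather than ?_escaped_fragment_=, so a trigger string that starts with ? would not match it in this sketch.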

The URL fragment that triggers rendering in PhantomJS can be customized. A string can also be appended to the request to create conditionals that restrict crawling or hide certain parts from Google; that string is customizable too.

I tried to make it as customizable as possible, so it can serve different uses without modifying the core files. You can even serve static files from a different server if needed; since this is technically a proxy, you can use it for many things. Pull requests are welcome and encouraged.

Getting Started

Installing Phantom on a server

Since we are probably hosting this on a virtual machine, installing a new program might not be as trivial as on our shiny MacBooks. This is how you download PhantomJS, uncompress it, and add the binaries to the PATH.

cd ~/
mkdir phantom
cd phantom
wget https://phantomjs.googlecode.com/files/phantomjs-1.9.2-linux-x86_64.tar.bz2
sudo mv phantomjs-1.9.2-linux-x86_64.tar.bz2 /usr/local/share/.
cd /usr/local/share
sudo tar -xf phantomjs-1.9.2-linux-x86_64.tar.bz2
sudo ln -s /usr/local/share/phantomjs-1.9.2-linux-x86_64 /usr/local/share/phantomjs
sudo ln -s /usr/local/share/phantomjs/bin/phantomjs /usr/local/bin/phantomjs

Installing GoogleBot

Remember, this is middleware for Express. I don't know how it works with other frameworks; if you do, fork it and make it better :)

There's probably no point in installing it globally, although it will install that way if you wish. To install it locally, run:

npm install --save googlebot

Alternatively, add googlebot to the dependencies in your package.json.

Configuring the middleware

In your server.coffee or server.js, where you launch the server, add the line for googlebot. In CoffeeScript:

app.use googlebot {option:value}

Or, in JavaScript:

app.use(googlebot({option:'value', option2:'othervalue'}));
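
To make the shape of such a middleware concrete, here is a minimal sketch. It is not googlebot's source: createCrawlerMiddleware and the render callback are hypothetical, with render standing in for the PhantomJS step, and the handling of allowCrawling is an assumption based on the option's description.

```javascript
// Hypothetical skeleton of an Express-style middleware; not googlebot's code.
// `render(url, callback)` stands in for the PhantomJS rendering step.
function createCrawlerMiddleware(options, render) {
  var trigger = options.trigger || '?_escaped_fragment_=';
  return function (req, res, next) {
    // Assumption: allowCrawling === false disables the special handling.
    var crawling = options.allowCrawling !== false;
    if (!crawling || req.url.indexOf(trigger) === -1) {
      return next(); // ordinary visitor: continue down the middleware stack
    }
    render(req.url, function (err, html) {
      if (err) {
        return next(err); // hand rendering errors to Express
      }
      res.send(html); // reply with the fully rendered page
    });
  };
}
```

Requests without the trigger fall through to the rest of the app untouched, which is why a middleware like this has to be registered before the routes that serve the normal page.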

More complete example

googlebot = require 'googlebot'
express = require 'express'

app = module.exports = express()
app.configure ->
  app.set 'views', __dirname + '/views'
  app.use googlebot {delay: 5000, canonical: 'http://dvidsilva.com'}
  app.use (req, res) ->
    res.render 'app/index'

app.startServer = (port) ->
  app.listen port, ->
    console.log 'Express server started on port %d in %s mode!',
      port, app.settings.env

Options

allowCrawling:

default: true

Whether to respond to Google requests, or (someday) to requests that meet a particular requirement

trigger:

default: '?_escaped_fragment_='

The string in the URL that triggers PhantomJS rendering instead of the normal response

append:

default: '&phantom=true'

A string appended to the new request. I use it to prevent Google from seeing certain content

delay:

default: 1000

Number of milliseconds to wait for the page to render before sending the response

protocol:

default: 'http'

The protocol used for the request, in case you want to redirect it to a different one

host:

default: undefined

In case you want to redirect PhantomJS requests to a different host, for example one where you store the static files

canonical:

default: undefined

Specifies the preferred host for Google to associate with the resulting page. A header is sent to tell Google which URL you would rather show to people searching for you
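
The README does not say which header is sent. One common way to express a canonical URL over HTTP is a Link header with rel="canonical", so the sketch below assumes that form; canonicalLinkHeader is a hypothetical helper, not a function exported by this module.

```javascript
// Hypothetical helper: build a Link header value pointing at the canonical URL.
// Assumes the rel="canonical" Link header form; the module may differ.
function canonicalLinkHeader(canonical, path) {
  // Trim trailing slashes from the host so the join never doubles them.
  var base = canonical.replace(/\/+$/, '');
  var suffix = path.charAt(0) === '/' ? path : '/' + path;
  return '<' + base + suffix + '>; rel="canonical"';
}

// In an Express handler this could be used as:
//   res.set('Link', canonicalLinkHeader('http://dvidsilva.com', req.path));
```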

evaluate:

default: function(){};

(Currently not supported.) The idea is to let you add more client-side JavaScript that PhantomJS will execute before returning the results to Google, without having to modify the module. For example, if you don't want empty alt attributes on your images because they are bad for SEO, you could do:

$('img').each(function(){ $(this).attr('alt',$(this).attr('src')); });
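
As a quick summary, the defaults documented above can be collected in one object and merged with whatever you pass in. This is a sketch of how such merging could work, not the module's implementation; DEFAULTS and resolveOptions are hypothetical names.

```javascript
// Defaults as documented in the Options section of this README.
var DEFAULTS = {
  allowCrawling: true,
  trigger: '?_escaped_fragment_=',
  append: '&phantom=true',
  delay: 1000,
  protocol: 'http',
  host: undefined,
  canonical: undefined,
  evaluate: function () {}
};

// Hypothetical helper: user-supplied values override the defaults.
function resolveOptions(userOptions) {
  return Object.assign({}, DEFAULTS, userOptions || {});
}

// Example: the options from the "More complete example" above.
var resolved = resolveOptions({ delay: 5000, canonical: 'http://dvidsilva.com' });
```

Here resolved.delay and resolved.canonical come from the caller, while every other key keeps its documented default.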

Dependencies and notes

  • You need to install PhantomJS and make it available in the PATH
  • Node Phantom is used to communicate between Node and the PhantomJS browser
  • ExpressJS is a Node web application framework, and this GoogleBot is a middleware for it. If you're using a different framework it might or might not work; I have no idea, but at least you can get some inspiration and copy what's useful
  • Google Ajax Crawling: Google will request a different URL if certain conditions are met; you must be compliant with them

Thanks to

Crawlme, which implements a similar module for use with ZombieJS instead of Phantom

Contact

Use GitHub for issues or questions so everybody can benefit.

Feel free to fork me, make changes, and open a pull request.

@dvidsilva