One of my job responsibilities is configuring and running reports from Google Analytics. Google Analytics is great, and it's free, but there are some limitations that we sometimes need to work around in order to generate helpful data. The biggest limitation is Google Analytics' tendency to report on sampled data. Our site receives millions of visitors each month, so sometimes the samples can be quite small relative to total traffic. The longer the date range for the report, the more sampling error can creep in. Reports over shorter date ranges are less affected by sampling, but covering the same period then requires running multiple reports. Multiply this by five or six profiles and suddenly you're talking about a lot of labor.
A couple of years ago I wrote a little bash script that would run this routine for me; however, Google has since deprecated its old Google Analytics API, and my script required a bit of manual tweaking whenever I switched Google Analytics profiles. I needed to rewrite the script using Google Analytics API 3.0 and, at the same time, make it more robust so that I could write ad hoc reports without tweaking the source each time. I started from scratch using Node.js, assuming this work might be integrated into a future analytics dashboard.
OAuth 2.0
Google Analytics API 3.0 recommends authenticating with OAuth 2.0. I used bsphere's gapitoken package to do some of the heavy lifting. The application also had to be registered with the Google API console as an installed application.
GOTCHA: although OAuth was configured correctly, a "403 Error: User does not have permission to perform this operation" was thrown when the application ran. The application was registered correctly under my Google email address, and I assumed that was enough to have permission to query the data. NOPE! The application's service account address actually needs to be added as an authorized user to each Google Analytics profile in order to perform the query.
CLI arguments
Command line arguments are handled using trentm's dashdash package. I picked some reasonable defaults for the report dimensions, metrics, and profile ID.
The big advantage to running from the command line is that multiple queries having only slightly different parameters can be run back to back without having to futz with a user interface. With dashdash I'm able to also easily provide multiple dimensions or metrics to a request.
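As a rough illustration of the defaults-plus-repeatable-flags idea, here is a small dependency-free sketch. It deliberately does not use dashdash, so it says nothing about that package's API. The flag names (--dimension, --metric, --profile) and the default values are hypothetical and not taken from the actual script.

```javascript
// Hypothetical defaults; the real script's defaults are not shown in this post.
const DEFAULTS = {
  dimension: ['ga:date'],
  metric: ['ga:visits'],
  profile: 'main'
};

// Parse "--name value" pairs. Flags listed in `repeatable` may be given more
// than once, and their values accumulate into an array.
function parseArgs(argv, defaults = DEFAULTS) {
  const repeatable = new Set(['dimension', 'metric']);
  const opts = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (!flag.startsWith('--') || value === undefined) {
      throw new Error('Expected "--name value", got: ' + flag);
    }
    const name = flag.slice(2);
    if (!(name in defaults)) {
      throw new Error('Unknown option: ' + flag);
    }
    if (repeatable.has(name)) {
      (opts[name] = opts[name] || []).push(value);
    } else {
      opts[name] = value;
    }
  }
  // Anything not given on the command line falls back to its default.
  return Object.assign({}, defaults, opts);
}
```

In a script this would be called as parseArgs(process.argv.slice(2)), so that two back-to-back runs can differ only in the flags passed.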
Our organization transitioned to a unified Google Analytics profile almost two years ago. Sometimes we want to run queries that span this transition. A JSON file stores our profile information. If a single property (say, the main website) has multiple associated profiles, the profiles are stored as key-value pairs where the key is a JSON-encoded date value signifying the first day the profile was available and the value is the profile ID. This allows queries to be run on that site as though the Google Analytics data were contiguous in one profile (you need to be careful about which metrics and dimensions are used, though!).
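The date-keyed lookup might look something like the sketch below. The profile IDs, the dates, and the names mainSite and profileFor are made up for illustration; the real JSON file's layout may differ in its details.

```javascript
// Hypothetical profile map for one property: start date (ISO string) -> profile ID.
const mainSite = {
  '2009-01-01T00:00:00.000Z': '1111111',
  '2012-03-01T00:00:00.000Z': '2222222'
};

// Return the ID of the profile that was active on `date`: the entry with the
// latest start date that is not after `date`, or null if none had started yet.
function profileFor(profiles, date) {
  let best = null;
  for (const key of Object.keys(profiles)) {
    const start = new Date(key);
    if (start <= date && (best === null || start > best.start)) {
      best = { start: start, id: profiles[key] };
    }
  }
  return best === null ? null : best.id;
}
```

A report that spans the switchover can then ask for the right profile one month at a time, rather than hard-coding a single ID.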
var d = new Date(2011, 1, 0);
This results in a new Date object set to the last day of the previous month (January 31, 2011 in this case, since months are zero-indexed and day 0 rolls back to the day before the 1st). Nice! Using this, I made a function that takes a date, finds the month, and returns the first and last day of that month.
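A minimal sketch of such a helper, using the day-0 trick, could look like this. The name monthBounds and the shape of the returned object are my own choices for illustration, not necessarily what the original script uses.

```javascript
// Return the first and last day of the month containing `date` (local time).
function monthBounds(date) {
  const year = date.getFullYear();
  const month = date.getMonth(); // zero-indexed: 0 = January
  return {
    first: new Date(year, month, 1),
    // Day 0 of the *next* month rolls back to the last day of this month.
    last: new Date(year, month + 1, 0)
  };
}
```

For example, monthBounds(new Date(2012, 1, 10)) gives February 1 and February 29, 2012. December works too, because a month index of 12 simply rolls over into the following year before day 0 rolls it back.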
Throttling asynchronous connections
Things were working well until I started issuing reports spanning more than 10 months. Google Analytics objected to too many connections, so I used queue from the async package to slow things down a little.
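To show the idea without pulling in a dependency, here is a small stand-in for a concurrency-limited queue. This is not the async package's queue and does not mirror its API; it only illustrates the same principle of keeping at most a fixed number of requests in flight. The name runThrottled and the Promise-returning worker are assumptions made for this sketch.

```javascript
// Run `worker(item)` for every item, with at most `concurrency` calls in flight.
// `worker` must return a Promise. Results come back in input order.
function runThrottled(items, concurrency, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed item until none are left.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = [];
  for (let i = 0; i < Math.min(concurrency, items.length); i++) {
    lanes.push(lane());
  }
  return Promise.all(lanes).then(() => results);
}
```

With one item per month and a concurrency of, say, 2, a long report still issues every monthly query, just never more than two at once.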
I started out writing tests in nodeunit, but as I ran into walls I stopped writing them. I'm frankly confused by the variety of testing frameworks available for Node, and I'm not sure how to write solid tests for asynchronous functions. If you have feedback on how to do this well, please let me know.