s3-to-logstore
v0.1.0
Published
Creates an AWS lambda function for parsing S3 / CloudFront / Cloudtrail logs stored on S3 and pushes those events into your log storage of choice via Winston.
Downloads
1
Maintainers
Readme
Upload AWS log files of various formats (Cloudfront, S3, Cloudtrail) to your log storage of choice via Winston in a best-effort manner using AWS Lambda and the Node.js v4.3 runtime.
Motivation
Amazon allows you to log Cloudfront activity or S3 accesses to your buckets, but only stores them as log objects on S3, sometimes gzipped (as with Cloudfront logs). While you can set up CloudWatch alerts for e.g. 4xx errors in Cloudfront traffic, there's no easy way to tail or search these logs to track down issues. Now with this bit of Lambda glue, you can!
Example Usage
The module takes the following options and returns a function to serve as our Lambda handler:
- format - required, one of:
cloudfront
,s3
, orcloudtrail
- transport - required, a Winston transport object.
- reformatter - function that takes a json object. If null, the default format is a string of key=value pairs. If you wish to log json, just return the object.* callback - function that takes an error param (may be null) and the Lambda function handler's callback. If this option is null, we log any error and call the handler's callback.
Test your script using the Lambda console, or run the function locally with a test event (step 2.3.2) that points to an object on S3:
// To run the code below, `npm install aws-sdk` then:
// node test.js [lambda_fn_file] test_event.txt
// test.js:
const lambda = require('./' + process.argv[2]);
require('fs').readFile(process.argv[3], (err, data) => {
if (err) throw err;
lambda.handler(JSON.parse(data), {}, (err) => {
if (err) console.log(`Error: ${err}`);
console.log('Finished.');
});
});
Notes
- How to set up an AWS Lambda function with S3: http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
- General docs on setting up S3 as an event source
AWS Lambda from the command line
First ensure that you've set AWS_SECRET_ACCESS_KEY
and AWS_ACCESS_KEY_ID
in
your environment corresponding to an IAM user with permissions to run these functions.
# ----- EXAMPLE -----
function=s3LogsToPapertrail
bucket=MY_BUCKET
accountid=MY_ACCOUNT_ID
region=us-west-2
indexname=papertrail
# Zip your js file and modules
zip -r $indexname.zip $indexname.js node_modules/
# Create your lambda function. Assumes you already created an execution role (see tutorial).
aws lambda create-function \
--region $region \
--function-name $function \
--zip-file fileb://`pwd`/$indexname.zip \
--role arn:aws:iam::$accountid:role/lambda-s3-execution-role \
--handler $indexname.handler \
--runtime nodejs4.3 \
--timeout 10 \
--memory-size 128
# Give S3 permission to invoke this lambda function. 'statement-id' is just some unique string.
aws lambda add-permission \
--function-name $function \
--region $region \
--statement-id $function \
--action "lambda:InvokeFunction" \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::$bucket \
--source-account $accountid
# Need to update your package?
aws lambda update-function-code \
--region $region \
--function-name $function \
--zip-file fileb://`pwd`/$indexname.zip
Caveats
For a static site hosted on S3 to publish logs to a bucket, the destination bucket must also be in the same region. For S3 to post event notifications to Lambda, the bucket must already be in a region supported by Lambda.
Therefore your static site must also be hosted in a lambda-supported region.
Currently in the US that is only us-east-1
and us-west-2
. Cloudfront
distributions fortunately can log to any S3 bucket, so it's easier to
reconfigure an existing setup to log to one of these regions.
You also can't have event notifications fire to two different lambda functions for overlapping object prefixes.
Why "best effort"?
From Amazon's docs:
The completeness and timeliness of server logging, however, is not guaranteed.
The log record for a particular request might be delivered long after the
request was actually processed, or it might not be delivered at all. The purpose
of server logs is to give you an idea of the nature of traffic against your
bucket. It is not meant to be a complete accounting of all requests. It is rare
to lose log records, but server logging is not meant to be a complete accounting
of all requests.
Acknowledgements
Adapted from WatchKeep and inspired by convox/papertrail; thanks!