juttle-elastic-adapter

v0.7.0

Juttle adapter for Elasticsearch

Downloads

5

Juttle Elastic Adapter

The Juttle Elastic Adapter enables reading and writing documents using Elasticsearch. It works with Elasticsearch version 1.5.2 (including AWS Elasticsearch Service) and above, such as version 2.1.1.

Examples

Read all documents stored in Elasticsearch timestamped with the last hour:

read elastic -from :1 hour ago: -to :now:

Write a document timestamped with the current time and containing one field, { name: "test" }, which you can then query using read elastic:

emit -limit 1 | put name="test" | write elastic

Read recent records from Elasticsearch that have field name with value test:

read elastic -last :1 hour: name = 'test'

Read recent records from Elasticsearch that contain the text hello world in any field:

read elastic -last :1 hour: 'hello world'

An end-to-end example is described here and deployed to the demo system demo.juttle.io. The Juttle Tutorial also covers using the elastic adapter.

Installation

Like Juttle itself, the adapter is installed as an npm package. Both Juttle and the adapter need to be installed side-by-side:

$ npm install juttle
$ npm install juttle-elastic-adapter

Configuration

The adapter needs to be registered and configured so that it can be used from within Juttle. To do so, add the following to your ~/.juttle/config.json file:

{
    "adapters": {
        "elastic": {
            "address": "localhost",
            "port": 9200
        }
    }
}

To connect to an Elasticsearch instance elsewhere, change the address and port in this configuration.

The value for elastic can also be an array of Elasticsearch host locations. Give each one a unique id field, and read -id and write -id will use the appropriate host.

The Juttle Elastic Adapter can also make requests to Amazon Elasticsearch Service instances, which requires a little more configuration. To connect to Amazon Elasticsearch Service, an entry in the adapter config must have {"aws": true} as well as region, endpoint, access_key, and secret_key fields. access_key and secret_key can also be specified by the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY respectively.

Here's an example Juttle Elastic Adapter configuration that can connect to a local Elasticsearch instance running on port 9200 using read/write elastic -id "local" and an Amazon Elasticsearch Service at search-foo-bar.us-west-2.es.amazonaws.com using read/write elastic -id "amazon":

{
    "adapters": {
        "elastic": [
            {
                "id": "local",
                "address": "localhost",
                "port": 9200
            },
            {
                "id": "amazon",
                "aws": true,
                "endpoint": "search-foo-bar.us-west-2.es.amazonaws.com",
                "region": "us-west-2",
                "access_key": "(my access key ID)",
                "secret_key": "(my secret key)"
            }
        ]
    }
}
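The lookup rules described above (a single object or an array under elastic, selection by id, and AWS keys that may come from the environment) can be sketched in Python. This is only an illustration of the documented behavior, not the adapter's code: the function names select_endpoint and aws_credentials are hypothetical, and the precedence of config values over environment variables is an assumption.

```python
import os

def select_endpoint(adapters_config, endpoint_id=None):
    """Return the entry matching endpoint_id, or the first configured one."""
    entries = adapters_config["elastic"]
    # The value for "elastic" may be a single object or an array of objects.
    if isinstance(entries, dict):
        entries = [entries]
    if endpoint_id is None:
        # Documented default: the first endpoint in config.json.
        return entries[0]
    for entry in entries:
        if entry.get("id") == endpoint_id:
            return entry
    raise KeyError(f"no configured Elasticsearch endpoint with id {endpoint_id!r}")

def aws_credentials(entry, env=os.environ):
    """Assumption: values in the config entry win over environment variables."""
    access_key = entry.get("access_key") or env.get("AWS_ACCESS_KEY_ID")
    secret_key = entry.get("secret_key") or env.get("AWS_SECRET_ACCESS_KEY")
    return access_key, secret_key

adapters = {
    "elastic": [
        {"id": "local", "address": "localhost", "port": 9200},
        {"id": "amazon", "aws": True, "region": "us-west-2",
         "endpoint": "search-foo-bar.us-west-2.es.amazonaws.com"},
    ]
}
print(select_endpoint(adapters)["id"])                # local
print(select_endpoint(adapters, "amazon")["region"])  # us-west-2
```

With this shape, read elastic -id "amazon" would correspond to select_endpoint(adapters, "amazon"), and omitting -id falls back to the first entry.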

Schema

To read or write data, the adapter has to know the names of the indices storing that data in Elasticsearch. By default, the adapter writes points to an index called juttle and reads from all indices.

You can choose indices to read from and write to with the -index option, or you can specify an index for each configured Elasticsearch instance the adapter is connected to.

For schemas that create indices at regular intervals, the adapter supports an indexInterval option. Valid values for indexInterval are day, week, month, year, and none. With day, the adapter will use indices formatted ${index}${yyyy.mm.dd}. With week, it will use ${index}${yyyy.ww}, where ww ranges from 01 to 53 numbering the weeks in a year. With month, it will use ${index}${yyyy.mm}, and with year, it will use ${index}${yyyy}. With none, the default, it will use just one index entirely specified by index. When using indexInterval, index should be the non-date portion of each index followed by *.
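To make the naming patterns concrete, here is a small Python sketch that derives an index name from a date for each indexInterval value. It is not taken from the adapter: the function name index_name is hypothetical, stripping the trailing * from index is my reading of the text above, and the week case assumes ISO 8601 week numbering, which the README does not specify.

```python
from datetime import date

def index_name(index, day, interval="none"):
    """Build the index name for a given date and indexInterval."""
    if interval == "none":
        # A single index, entirely specified by the index option.
        return index
    # Assumption: the date suffix replaces the trailing "*" of the index option.
    prefix = index.rstrip("*")
    if interval == "day":
        return f"{prefix}{day:%Y.%m.%d}"
    if interval == "week":
        # Assumption: ISO week numbers (01 to 53) paired with the ISO year.
        iso_year, iso_week, _ = day.isocalendar()
        return f"{prefix}{iso_year}.{iso_week:02d}"
    if interval == "month":
        return f"{prefix}{day:%Y.%m}"
    if interval == "year":
        return f"{prefix}{day:%Y}"
    raise ValueError(f"invalid indexInterval: {interval!r}")

d = date(2016, 1, 15)
print(index_name("logstash-*", d, "day"))    # logstash-2016.01.15
print(index_name("logstash-*", d, "month"))  # logstash-2016.01
print(index_name("juttle", d))               # juttle
```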

Lastly, the adapter expects all documents in Elasticsearch to have a field containing a timestamp. By default, it expects this to be the @timestamp field. This is configurable with the -timeField option to read and write.

Specifics of using the default Logstash schema are described here, including handling of analyzed vs not_analyzed string fields.

Usage

Read options

In addition to the options below, read elastic supports field comparisons of the form field = value, which can be combined into filter expressions using AND/OR/NOT operators, as well as free text search, following the Juttle filtering syntax.

Name | Type | Required | Description | Default
-----|------|----------|-------------|---------
from | moment | no | select points after this time (inclusive) | none, either -from and -to or -last must be specified
to | moment | no | select points before this time (exclusive) | none, either -from and -to or -last must be specified
last | duration | no | select points within this time in the past (exclusive) | none, either -from and -to or -last must be specified
id | string | no | read from the configured Elasticsearch endpoint with this ID | the first endpoint in config.json
index | string | no | index(es) to read from | *
indexInterval | string | no | granularity of an index. valid options: day, week, month, year, none | none
type | string | no | document type to read from | all types
timeField | string | no | field containing timestamps | @timestamp
idField | string | no | if specified, the value of this field in each point emitted by read elastic will be the document ID of the corresponding Elasticsearch document | none
optimize | true/false | no | optional flag to disable optimized reads, see Optimizations | true
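The interplay of -from, -to and -last can be sketched as follows. This is an illustration of the rule stated in the table, not the adapter's implementation: resolve_time_range and in_range are hypothetical names, and letting -last take precedence when all three are given is an assumption.

```python
from datetime import datetime, timedelta, timezone

def resolve_time_range(from_=None, to=None, last=None, now=None):
    """Return (start, end) for a read, or raise if the range is unspecified."""
    now = now or datetime.now(timezone.utc)
    if last is not None:
        # Assumption: -last wins if it is combined with -from/-to.
        return now - last, now
    if from_ is None or to is None:
        raise ValueError("either -from and -to or -last must be specified")
    return from_, to

def in_range(timestamp, start, end):
    """-from is inclusive and -to is exclusive."""
    return start <= timestamp < end

now = datetime(2016, 1, 15, 12, 0, tzinfo=timezone.utc)
start, end = resolve_time_range(last=timedelta(hours=1), now=now)
print(start.isoformat(), end.isoformat())
```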

Write options

Name | Type | Required | Description | Default
-----|------|----------|-------------|---------
id | string | no | write to the configured Elasticsearch endpoint with this ID | the first endpoint in config.json
index | string | no | index to write to | juttle
indexInterval | string | no | granularity of an index. valid options: day, week, month, year, none | none
type | string | no | document type to write to | event
timeField | string | no | field containing timestamps | @timestamp
idField | string | no | if specified, the value of this field on each point will be used as the document ID for the corresponding Elasticsearch document and not stored | none
chunkSize | number | no | buffer points until chunkSize have been received or the program ends, then flush | 1024
concurrency | number | no | number of concurrent bulk requests to make to Elasticsearch (each inserts <= chunkSize points) | 10
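The chunkSize and idField descriptions can be illustrated with a short Python sketch. It only mirrors the behavior described in the table; chunked and split_id are hypothetical helper names, and the actual bulk requests (and their concurrency) are left out.

```python
def chunked(points, chunk_size=1024):
    """Buffer points and yield a chunk whenever chunk_size is reached,
    plus one final partial chunk when the input ends."""
    buffer = []
    for point in points:
        buffer.append(point)
        if len(buffer) >= chunk_size:
            yield buffer
            buffer = []
    if buffer:
        yield buffer

def split_id(point, id_field=None):
    """Return (document_id, body). When id_field is set and present, its
    value becomes the document ID and is not stored in the body."""
    if id_field is None or id_field not in point:
        return None, dict(point)
    body = {key: value for key, value in point.items() if key != id_field}
    return point[id_field], body

sizes = [len(chunk) for chunk in chunked(range(2500), chunk_size=1024)]
print(sizes)  # [1024, 1024, 452]
print(split_id({"uid": "a1", "name": "test"}, id_field="uid"))
```

Each yielded chunk would correspond to one bulk request of at most chunkSize points; up to concurrency such requests may be in flight at once.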

Optimizations

Whenever the elastic adapter can shape the entire Juttle flowgraph, or a portion of it, into an Elasticsearch query, it will do so, handing the execution to ES so that only the matching data comes back into the Juttle runtime. The portion of the program expressed in read elastic is always executed as an ES query; the downstream Juttle processors may be optimized as well.

Fully optimized example

read elastic -last :1 hour: -index 'scratch' -type 'tag1' name = 'test'
| reduce count()

This program will form an ES query that asks it to count the documents in the scratch index with document type tag1 whose field name is set to the value test, and only a single record (count) will come back from Elasticsearch.
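For intuition, the following Python sketch builds one request that would express the same count on the Elasticsearch side, using the count endpoint with a term and a range clause. It is a guess at an equivalent request, not what the adapter actually sends: the function name count_request is hypothetical, and the adapter may well use a search with aggregations instead.

```python
from datetime import datetime, timedelta, timezone

def count_request(index, doc_type, field, value, last,
                  now=None, time_field="@timestamp"):
    """Describe a count request for documents matching field = value
    within the last `last` interval."""
    now = now or datetime.now(timezone.utc)
    start = now - last
    body = {
        "query": {
            "bool": {
                "must": [
                    {"term": {field: value}},
                    # Assumption: start inclusive, end exclusive.
                    {"range": {time_field: {"gte": start.isoformat(),
                                            "lt": now.isoformat()}}},
                ]
            }
        }
    }
    return {"method": "GET",
            "path": f"/{index}/{doc_type}/_count",
            "body": body}

request = count_request("scratch", "tag1", "name", "test", timedelta(hours=1))
print(request["path"])  # /scratch/tag1/_count
```

The point is that the response carries a single number rather than the matching documents, which is what makes this program fully optimized.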

Less optimized example

read elastic -last :1 hour: name = 'test'
| put threshold = 42
| filter value > threshold

In this case, Juttle will issue a query against ES that matches documents whose field name is set to the value test (i.e. Juttle will not read all documents from ES, only the ones that match the filter expression in read elastic). However, the rest of the program, which filters for values exceeding threshold, will execute in the Juttle runtime, as it isn't possible to hand off this kind of filtering to ES.

List of optimized operations

  • any filter expression or full text search as part of read elastic (note: read elastic | filter ... is not optimized)
  • head or tail
  • reduce count(), sum(), and other built-in reducers
  • reduce by fieldname (other than reduce by document type)
  • reduce -every :interval:

Optimization and nested objects

There are a few fundamental incompatibilities between Elasticsearch's model for nested object and array fields and Juttle's. This can lead to some odd results for optimized programs. For objects, an optimized reduce by some_object_field will return null as the only value for some_object_field. For arrays, an optimized reduce by some_array_field will return a separate value for some_array_field for every element in every array stored in some_array_field. For results conforming to Juttle's reduce behavior, disable optimization with read elastic -optimize false.
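The difference can be simulated in a few lines of Python. The optimized branch follows the behavior described above (a single null key for objects, one key per element for arrays); the unoptimized branch assumes that grouping treats the whole field value as one key, which this README does not spell out. The names es_style_keys and group_counts are hypothetical.

```python
import json
from collections import Counter

def es_style_keys(value):
    """Group keys as described for optimized reads."""
    if isinstance(value, dict):
        return [None]        # objects collapse to a single null key
    if isinstance(value, list):
        return list(value)   # one key per array element
    return [value]

def group_counts(docs, field, optimized):
    """Count documents per group key for `reduce by field`."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if optimized:
            keys = es_style_keys(value)
        else:
            # Assumption: the whole value (serialized here) is a single key.
            keys = [json.dumps(value, sort_keys=True)]
        counts.update(keys)
    return dict(counts)

docs = [{"tags": ["a", "b"]}, {"tags": ["b"]}]
print(group_counts(docs, "tags", optimized=True))   # {'a': 1, 'b': 2}
print(group_counts(docs, "tags", optimized=False))  # {'["a", "b"]': 1, '["b"]': 1}
```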

In case of unexpected behavior with optimized reads, add the -optimize false option to read elastic to disable optimizations, and kindly report the problem as a GitHub issue.

Contributing

Want to contribute? Awesome! Don’t hesitate to file an issue or open a pull request. See the common contributing guidelines for project Juttle.