Intelie Pipes

Language Reference Documentation

version 0.8

Copyright 2014 Intelie


Table of Contents

1. Introduction
1.1. Events and filters
1.2. Chained computation model
1.3. Output rates and aggregation windows
1.4. Expressions
1.4.1. Type system
1.4.2. Row type metadata
1.4.3. Scalars and aggregations
1.4.4. Aggregations state representation
1.4.5. Aggregation TTL
1.4.6. Property access
1.4.7. Special expressions
1.4.8. Function call
2. Filters
2.1. * Special filter that allows all records through.
2.2. <term> Selects records where one of the current fields matches <term>.
2.3. <term>~<number maxEdits> Selects records where one of the current fields matches <term> with at most <maxEdits> edits.
2.4. [<term lower> TO <term upper>] Selects records where one of the current fields is between <lower> and <upper>.
2.5. <field>: <filter> Sets the current field to <field>. Usually followed by <term> (e.g. somefield:someterm).
2.6. <filter> && <filter> Selects the intersection of two other filters.
2.7. <filter> || <filter> Selects the union of two other filters.
2.8. -<filter> Selects the complement of another filter.
3. Pipes
3.1. <named...> [by <named...>] [over <window>] [every <period> | at the end] Transforms or aggregates records over configurable data window and output.
3.2. <pipe> product <pipe> Computes the cartesian product of the outputs from two pipes with compatible output rates.
3.3. <pipe> union <pipe> Concatenates the outputs from two pipes with compatible output rates.
3.4. @compress <number k>, [<number k2>,] <number... y> [by <object...>] Compresses the result from the previous pipe to at most k most important rows.
3.5. @debounce <period> [by <object...>] Only outputs rows that follows <period> time without any output.
3.6. @filter <boolean condition> Filters the results from previous pipe.
3.7. @latest Keeps the latest batch of input events and output it at the end.
3.8. @sort <sortfield... expr> Sorts the results from previous pipe.
3.9. @throttle [<number k>, ] <period> [by <object...>] Limits the output of the previous pipe to at most <k> rows per <period>.
3.10. @top <number k>, <sortfield... expr> [by <object...>] Sorts the results and gets the first k rows (possibly grouped) from previous pipe.
3.11. @unsafe Marks that any pipe executed after this must run in a non-distributed environment.
3.12. @yield [<object expr>] Extracts one field of the stream to be the output event.
4. Operators
4.1. <object># → <number> Coerces the expression to number. Shorthand to <object>:number() → <number>.
4.2. <object>$ → <string> Coerces the expression to string. Shorthand to <object>:string() → <string>.
4.3. <number> + <number> → <number> Adds two numbers.
4.4. <string> + <string> → <string> Concatenates two strings.
4.5. <number> - <number> → <number> Subtracts one number from another.
4.6. <number> * <number> → <number> Multiplies two numbers.
4.7. <number> / <number> → <number> Divides one number by another (float division).
4.8. <number> // <number> → <number> Divides one number by another (integer division).
4.9. <number> ** <number> → <number> Raises one number to another's power.
4.10. <number> % <number> → <number> Returns the rest of the division of one number by another.
4.11. -<number> → <number> Negates one number.
4.12. <boolean> and <boolean> → <boolean> Returns the logical AND of two booleans.
4.13. <boolean> or <boolean> → <boolean> Returns the logical OR of two booleans.
4.14. <boolean> xor <boolean> → <boolean> Returns the logical XOR of two booleans.
4.15. not <boolean> → <boolean> Returns the logical NOT of a boolean.
4.16. <object> == <object> → <boolean> Checks whether two objects are equal.
4.17. <object> != <object> → <boolean> Checks whether two objects are not equal.
4.18. <comparable> < <comparable> → <boolean> Checks whether the left operand compares lesser than the right one.
4.19. <comparable> <= <comparable> → <boolean> Checks whether the left operand compares lesser than or equal to the right one.
4.20. <comparable> > <comparable> → <boolean> Checks whether the left operand compares greater than the right one.
4.21. <comparable> >= <comparable> → <boolean> Checks whether the left operand compares greater than or equal to the right one.
4.22. <row> -> <identifier> → <object> Extracts a field information from a strongly-typed row value.
4.23. <object> ?? <object> → <object> Returns the first if it is not null; otherwise, returns the second.
4.24. <boolean> ? <object>, <object> → <object> If the condition is true, returns the first object; otherwise, returns the second.
5. Scalar Functions
5.1. <number>:abs() → <number> Calculates the absolute value of a number.
5.2. <number>:acos() → <number> Returns the arc cosine of the argument to an angle in radians.
5.3. <number>:asin() → <number> Returns the arc sine of the argument to an angle in radians.
5.4. <number>:atan() → <number> Returns the arc tangent of the argument to an angle in radians.
5.5. <number>:bytes([<number precision>]) → <string> Formats a number as the best possible byte multiple.
5.6. <number>:ceil([<number precision>]) → <number> Returns the smallest number that is greatest than or equal to the argument.
5.7. <number>:cos() → <number> Returns the cosine of an angle in radians.
5.8. <number>:dateadd(<period>) → <number> Adds <period> to timestamp argument.
5.9. <number>:datefloor(<period>) → <number> Rounds timestamp down to the nearest date that is divisible by <period>.
5.10. <number>:dateformat([<string format>], [<string tz>]) → <string> Formats timestamp using specified format
5.11. <number>:datesub(<period>) → <number> Sutracts <period> from timestamp argument.
5.12. <number>:exp() → <number> Calculates the exponential of a number.
5.13. <number>:floor([<number precision>]) → <number> Returns the largest number that is lesser than or equal to the argument.
5.14. <number>:format([<string format>], [<string locale>]) → <string> Formats a number according to format string and locale.
5.15. <number>:log([<number base>]) → <number> Calculates the logarithm of a number.
5.16. <number>:pow(<number exp>) → <number> Raises one number to another.
5.17. <number>:round([<number precision>]) → <number> Rounds a number to <precision> decimal places.
5.18. <number>:select(<object... list>) → <object> Selects the ith element from a list of arguments. Or null if it doesn't exist.
5.19. <number>:sin() → <number> Returns the sine of an angle in radians.
5.20. <number>:spanend(<string>, [<string tz>]) → <number> Calculates end timestamp of span based on target.
5.21. <number>:spanstart(<string>, [<string tz>]) → <number> Calculates start timestamp of span based on target.
5.22. <number>:tan() → <number> Returns the tangent of an angle in radians.
5.23. <object>:boolean() → <boolean> Converts object to boolean.
5.24. <object>:decode(<object,object... pairs>) → <object> Transforms the parameter using the translation rules defined in <pairs>.
5.25. <object>:get(<object... keys>) → <object> Much like property[keys]. Works for strings, containers and arrays.
5.26. <object>:indexin(<object... list>) → <number> Returns the first index of the value in <list>, or null if <list> does not contain it.
5.27. <object>:isin(<object... list>) → <boolean> Returns true if <list> contains the value, false otherwise.
5.28. <object>:json() → <string> Converts the object to its JSON string representation.
5.29. <object>:keep([<number ttl>]) → <object> When used in a default pipe, delays or disable (if ttl not supplied or < 0) inactive group removal.
5.30. <object>:len() → <object> Tries to get <target>'s size. Works for strings, containers and arrays.
5.31. <object>:number() → <number> Converts object to number.
5.32. <object>:object() → <object> Casts any object to its canonical object representation.
5.33. <object>:string() → <string> Converts object to string.
5.34. <string>:contains(<string>) → <boolean> Returns whether the target string contains the argument.
5.35. <string>:dateparse([<string format>], [<string tz>]) → <number> Parses timestamp using specified format
5.36. <string>:endswith(<string>) → <boolean> Returns whether the target string ends with the argument.
5.37. <string>:format(<object... args>) → <string> Uses the target string as format to arguments.
5.38. <string>:hlleval() → <number> Evaluates compressed base64 HyperLogLog data.
5.39. <string>:indexof(<string s>, [<number fromIndex>]) → <boolean> Returns the index of position of <s> inside the target string. Returns null otherwise.
5.40. <string>:lower() → <string> Converts string to lowercase.
5.41. <string>:parse([<string format>], [<string locale>]) → <number> Parses a number according to format string and locale.
5.42. <string>:regex(<string regex>) → <row> Returns a strongly typed row composed by all named groups in <regex>.
5.43. <string>:regexfind(<string regex>, [<number|string group>]) → <string> Returns the matched string by <regex> in target (or one specific group).
5.44. <string>:regexmatch(<string regex>) → <boolean> Returns true if the target matches <regex>. False otherwise.
5.45. <string>:regexsub(<string regex>, <string replacement>) → <string> Replaces all matches of <regex> in target by <replacement>.
5.46. <string>:replace(<string from>, <string to>) → <string> Replaces all instances of <from> with the string <to>.
5.47. <string>:startswith(<string>) → <boolean> Returns whether the target string starts with the argument.
5.48. <string>:substring(<number from>, [<number to>]) → <string> Returns the substring between the indices <from> and <to>.
5.49. <string>:upper() → <string> Converts string to uppercase.
5.50. compare(<comparable a>, <comparable b>) → <number> Returns a number < 0 if a < b, > 0 if a > b or 0 if a = b.
5.51. hllmerge(<string... data>) → <string> Merge many instances of compressed base64 HyperLogLog data.
5.52. max(<comparable>, <comparable>, <comparable...>) → <comparable> Returns the greatest value of all supplied arguments.
5.53. min(<comparable>, <comparable>, <comparable...>) → <comparable> Returns the least value of all supplied arguments.
5.54. newlist(<object...>) → <object> Creates a instance of java.util.List with the supplied objects.
5.55. newmap(<object,object... pairs>) → <object> Creates a instance of java.util.Map with the supplied keys and values.
5.56. pi() → <number> Returns the constant value of pi.
5.57. random(<number min>, <number max>) → <number> Returns a random value between <min> and <max>.
5.58. random([<number max>]) → <number> Returns a random value of at most <max> (1 if not defined).
5.59. timestamp() → <number> Returns the most appropriate timestamp, whether in scalar or aggregation contexts.
6. Aggregation Functions
6.1. <aggregation object expr>:if(<boolean condition>) → <object> Aggregates only events that evaluates true to <condition>.
6.2. <aggregation object expr>:overall() → <object> Merges all the results from the target aggregation.
6.3. <aggregation object expr>:overlast(<number window>) → <object> Merges the results of the last <window> aggregations.
6.4. <aggregation object expr>:prev([<number prev>]) → <object> Delays and returns the previous <number>th result from target aggregation.
6.5. all(<boolean>) → <boolean> Returns true if all ocurrences evaluate true.
6.6. any(<boolean>) → <boolean> Returns true if any ocurrence evaluates true.
6.7. avg(<number>, [<number weight>]) → <number> Calculates the (possibly weighted) average of some expression.
6.8. dcount(<object>...) → <number> Estimates the field's cardinality (distinct count) using HyperLogLog.
6.9. describe(<aggregation object expr>) → <string> Yields a string json explaining the target aggregation's inner state representation.
6.10. first(<object>) → <object> Yields the ocurrence with least timestamp.
6.11. greatest(<object>, <comparable>) → <object> Yields the greatest ocurrence in the window based on some comparable.
6.12. hll(<number log2m>, <object>...) → <number> Similar to dcount, but allows configuration of log2m parameter.
6.13. hllmerge(<string>...) → <string> Performs union of many HyperLogLog encoded data in a window.
6.14. hllset(<number log2m>, <object>...) → <string> Similar to hll, but it doesn't evaluate final cardinality, just return the sketch data.
6.15. join(<string>, [<string separator>], [<string lastSeparator>]) → <string> Join all the strings in a window.
6.16. last(<object>) → <object> Yields the ocurrence with greatest timestamp.
6.17. least(<object>, <comparable>) → <object> Yields the least ocurrence in the window based on some comparable.
6.18. map(<object key>, <object value>) → <object> Creates a java.util.Map from all events in a window.
6.19. max(<comparable>) → <comparable> Yields the greatest ocurrence in the window.
6.20. median(<number>, [<number weight>]) → <number> Estimates the median value of the population using Count-Min Sketch.
6.21. min(<comparable>) → <comparable> Yields the least ocurrence in the window.
6.22. pcount(<boolean>) → <number> Aggregates the proportion of events that evaluate true to expression.
6.23. quantile(<number q>, <number>, [<number weight>]) → <number> Estimates the q (0..1) quantile of the population using Count-Min Sketch.
6.24. set(<object>) → <object> Creates a java.util.Set from all events in a window.
6.25. smooth(<aggregation number expr>, [<number alpha>], [<number beta>]) → <number> Smoothes the curve of another aggregation.
6.26. stdev(<number>, [<number weight>]) → <number> Calculates the (possibly weighted) standard deviation of some expression.
6.27. sum(<number>) → <number> Sums all evaluations of some expression.
6.28. variance(<number>, [<number weight>]) → <number> Calculates the (possibly weighted) variance of some expression.
6.29. when(<boolean expr>) → <number> Yields the latest timestamp inside window when some condition was true.
6.30. whenfirst(<boolean expr>) → <number> Yields the first timestamp inside window when some condition was true.
6.31. Window Meta-aggregations
6.31.1. WCOUNT() Yields how many outputs are merged in the current window.
6.31.2. WSTART() Yields the minimum allowed timestamp or item for the current window.
6.31.3. WEND() Yields the maximum allowed timestamp or item for the current window.
6.31.4. OSTART() Yields the minimum allowed timestamp or item for the current output.
6.31.5. OEND() Yields the maximum allowed timestamp or item for the current output.
6.31.6. OTIMESTAMP() Yields the timestamp when the output was merged (useful for item batch pipes).
7. Timespan Language
7.1. Period definitions
7.2. Span definitions
7.2.1. now|none (point, relative) Returns the reference timestamp.
7.2.2. today (interval, relative) Equivalent to current day
7.2.3. <year>-<month>[-<day> [<hour>:[<minute>:[<second>]]]] (interval, fixed) Returns the interval relative to the selected date.
7.2.4. timestamp|ts <number> (point, fixed) Returns the point with the speficied timestamp.
7.2.5. from|since <span> to|until <span> (<both>, <both>) Returns a span from the beginning of the first span to the end of the second.
7.2.6. last <period...> (interval, relative) Equivalent to <period...> before now.
7.2.7. current|this <period> (interval, relative) Returns the full period enclosing the reference timestamp.
7.2.8. previous|yester <period> (interval, relative) Returns the period before the current <period>
7.2.9. [<period...>] before <span> (<both>, <both>) The full selected period ending in the beginning of the referenced span.
7.2.10. [<period...>] after <span> (<both>, <both>) The full selected period starting at the end of the referenced span.
7.2.11. <ordinal> <period> [of] <span> (interval, <both>) Selects the nth period inside some span.
7.2.12. <period...> ago (point, relative) Select the exact timestamp of the defined period in the past.
7.2.13. <period> of <span> (interval, <both>) Selects the full period enclosing the referenced span
7.2.14. <span> shifted by <period...> (<both>, <both>) Shifts the selected span by <period> in the past.
7.2.15. <span> shifted to <span> (<both>, <both>) Calculates the span changing the reference using another span.
7.2.16. <span> extend [left|right] [by] <number>% (<both>, <both>) Extends either the start or the end of a span by a percentual value.
7.2.17. <span> align [left|right] to <span ref> (<both>, <both>) Aligns the span's total milliseconds at the left or right side of <ref>

1. Introduction

Pipes is a language and an engine to perform distributed aggregations over real-time streams. It enables stream processing with low latency and minimum memory footprint (constant for most operations).

The language is desined to be as intuitive as possible, expressing more explicitly the data flow. The goal was to provide a language you can read from left to right (no visual backtracking) and still keep the declarative way to express the computations. The name Pipes is an allusion to the unix pipeline, which enables a powerful and elegant functional programming model on unix shells.


1.1. Events and filters

Pipes language operates over events. Events are (usually immutable) sets of (key, value) tuples and may have some time information assigned to it. In the Pipes event model, events have no intrinsic type. They're all part of a single public stream of tuples. When a type information is applicable, it is usually encoded as a property of each event (e.g. type or __type).

The engine makes no assumption on the nature, frequency or schema of the incoming events. Every operation is written keeping in mind that the event may not have the referenced field or the value may not be of same type always. It is safe to say events are pretty well representable by schemaless JSONs. As a Java library, Pipes' default configuration understands instances of java.util.Map container.

It is recommended that events have a property called 'timestamp', with a Java timestamp, that is the number of milliseconds since 1 January 1970 UTC. But even that is not enforced nor required.

Unlike a SQL database, stream processing allows very little preprocessing in the events (e.g. index creation), so the heavy optimization is done in the queries. This is why, most of the times, the queries will run in the same engine: this way, the engine can optimize and share the most processing between them.

The first part of a Pipes query always consist in selecting some events from that stream, using a filter (see Chapter 2: Filters). Filters select values from the public stream. Each filter contains basically predicates about some fields and boolean operations between them. This has a great potential for optimization. For example, given the filters type:http status:404, type:http status:200 and type:http status:(2?? || 3??), the engine builds automata like the ones in the picture bellow:

Filters automata

Filters automata


1.2. Chained computation model

After filtering the public stream, you can chain the result through one or more pipes (see Chapter 3: Pipes). Each pipe may transform, filter or aggregate the results from the previous one. In the language, this chaining is represented by the operator =>.

Pipes computation model

Pipes computation model

Each pipe in the chain can have one or more output rates. For example, a pipe can receive an input every time the filter matches some event in the public stream, but only output events in batches of one minute. There are three different types of output: every time period, every item batch and at the end. For more information, see Section 1.3: Output rates and aggregation windows.

Inside each pipe, you can interact with events using expressions that access, transforms or aggregates its properties. For more information, see Section 1.4: Expressions.

Most pipes are implemented using parallel algorithms that allow seamless distribution over multiple physical machines. Usually, each pipe can be classified in three groups:

  • safe: pipes that can execute completelly in parallel without affecting the final result (e.g. filter pipe)
  • semi-safe: pipes that can execute most of their work in parallel, but must merge their output with another nodes (e.g. default pipe)
  • unsafe: pipes that must execute in a single node in order to compute the correct result (e.g. compress pipe)

When writing a typical query, it's likely your pipes will distribute as follows:

Typical pipe in single machine

Typical pipe in single machine

Typical pipe in multiple machines

Typical pipe in multiple machines


1.3. Output rates and aggregation windows

Every pipe in the chain will have one or more output rates. In this section, we will focus on how to declare them in default pipe, but the concept itself can be applied to any pipe type.

There are three types of output rates: every time period, every item batch and at the end.

  • every <period>: outputs every time the pipe's internal clock ticks a constant period. E.g. every 5 seconds
  • every <number> items: outputs every time the specified amount of items is processed by the engine. E.g. every 5 items
  • at the end: outputs only at the end of execution. Useful for summary queries.

The available period units are:

unitalso acceptsequivalent to
millisecondmilli, ms
secondsec1000 ms
minutemin60000 ms
hour360000 ms
day
weekwk
monthmon
bimester2 months
quarter3 months
semester6 months
yearyr

Keep in mind that in the case of default pipe, there may be a difference between the declared rate and the effective rate. In the query bellow, for example, the declared output rate in the last pipe is every item, but effectivelly the query outputs every 10 seconds.

* => count() every 10 seconds => count, prev(count) every item

Outputs every 10 seconds

This happens because when writing a query, you declare the output with pipe's local view. But the effective output rate considers the entire query, from the filter to the last pipe. In the example above, although the last pipe outputs every time it receives two items

Also in default pipe, it is allowed to aggregate over a period other than the output. You can, for example, output a result every minute, that corresponds to the last 5 minutes of aggregation. It is an aggregation over time window.

Important
The window concept only applies to the default pipe.

In traditional stream processing libraries, every event is stored inside a data window and the aggregation runs over them. This allows a fine grained configuration of outputs vs windows. In Pipes, the goal is to use the least resources possible. So, instead of storing events inside the window, we only store the aggregation outputs (in a mergeable format, see Section 1.4.4: Aggregations state representation). The picture below exemplifies:

Windows merges aggregations outputs

Windows merges aggregations outputs

This way, the memory footprint of a query is very low and predictable. Of course this approach introduces some limitations (e.g. the window size must be a multiple of the output rate) but the benefits pay it off.

There are three types of windows:

  • over last (<period>|<number> items): aggregates over last <period> / <output_rate> outputs. E.g. over last 5 minutes
  • over current (<period>|<number> items): aggregates since the beginning of the current period. E.g. over current day
  • over all: aggregates over all events, merges all outputs.

It is worth noticing that item-based windows can only be used with item-based outputs. And time-based windows can only be used with time-based outputs. Also, obviosly at the end pipes does not support any window definition and there is only a single output at the end. Below, there is a table with examples showing what is valid and what is not.

window/output examplesevery 2 minutesevery 2 itemsat the end
over last 10 minutesvalidinvalidinvalid
over current 10 minutesvalidinvalidinvalid
over last 10 itemsinvalidvalidinvalid
over current 10 itemsinvalidvalidinvalid
over allvalidvalidinvalid

1.4. Expressions

Expressions access and transform data from events. We will introduce them here, explaing Pipes' type system and how aggregations work. For detailed information on expressions, please refer Chapter 4: Operators, Chapter 5: Scalar Functions and Chapter 6: Aggregation Functions.


1.4.1. Type system

The Pipes language is statically typed, but has a very simple type system. There are four main types: number, string, boolean and object. All types inherit from object. There are a few other types, used in some specific functions. The table below summarizes all types:

typejava equivalentexample
numberjava.lang.Double42, 42.0, 1e42
stringjava.lang.String"42"
booleanjava.lang.Booleantrue, false
objectjava.lang.Objectnull
comparablejava.lang.Comparable
periodnet.intelie.pipes.time.Period1 day, 42 minutes, 1 month {America/Sao_Paulo}
rownet.intelie.pipes.Row
row_listnet.intelie.pipes.RowList

Pipes does not require events to be strongly typed (although it allows them to be). Because of this, the type of a property access may be not known at compile time. Still, the language is statically typed, so some type must be assigned to the expression. The chosen type depends on the configuration, but the default is to be a string. This means that even if the value of the field in the event contains a numeric type, it will be cast as string before being used by the engine.

Types are checked in compile time. For example, trying to use a function that requires a number passing a string (like avg(someproperty)) results in error:

PipeException: Error in call avg(someproperty) with types <PropertyAccess[string]>, cause(s):
AvgAggregation: The parameter #1 <someproperty>; must be 'number' instead of 'string'.

It is possible to tell the engine what type an expression must have, using the type's name as a function. E.g. number(someproperty) indicates that someproperty is a numeric value and will coerce to a number in cases it isn't. When used with other expressions, those functions also convert values: number("42") will just return 42.

The string and number types also have shorthands for these functions:

  • number(someproperty) is the same as someproperty#
  • string(someproperty) is the same as someproperty$

1.4.2. Row type metadata

The type row may have some metadata associated to it. This is particularly true when using the expression * after a non-raw pipe (see Section 1.4.7: Special expressions), but there are other expressions that may return an instance of row with metadata (e.g. <string>:regex(<string regex>) → <row>).

An instance of row with metadata has two main uses:

Examples

*
=> id, name, value#
=> last(*) every minute
=> @yield

Selects fields and outputs the last every minute. Please note that last(*) outputs a single field of type row with 3 fields in its metadata and the pipe @yield expands them into an event.

*
=> id, name, value#
=> expand last(*) every minute

Equivalent to above. But the pipe @yield accepts any scalar, while the expand keyword only works with row values with metadata.

* => expand name:regex(r'^(?<first>\w+)( \w+)* (?<last>\w+)$') as extracted_*_name

Yields an event with the fields extracted_first_name and extracted_last_name.


1.4.3. Scalars and aggregations

Every Pipes expression can be either a scalar or an aggregation. We call it a level.

Scalars are expressions that receive an event and computes its result immediately. E.g. to get the length of a string, one can write len(someproperty). In this case, the whole expression is a scalar.

Aggregations are expressions that receive several events but only computes its result at the end of the window. E.g. to get the average of a property during the window, one can write avg(someproperty#). In this case, the whole expression is an aggregation.

When a scalar has only constant values (and operations over them) we can say it is a constant. E.g. the literal value "some value" is a constant. len("some value") is also a constant, and may be optimized in compile to time to the value 10.

Most functions just assume the same level as some (sometimes all) of its parameters. E.g. in the expression len(first(someproperty)) the resulting expression is an aggregation, because first(someproperty) is also an aggregation. When a function always result in an aggregation, we refer it just as aggregation.

Some functions (mostly aggregations) require parameters to have some specific level. E.g. in the aggregation avg(<number>, [<number weight>]) → <number>, the parameter must be a scalar. For example, trying to compile avg(first(someproperty#)) result in an error:

PipeException: Error in call avg(first(someproperty#)) with types <FirstAggregation[number]>, cause(s):
AvgAggregation: The parameter #1 <first(someproperty#)> must be a scalar instead of an aggregation.

For most purposes, every constant is also an scalar (it just ignores the event) and every scalar may act like an aggregation, if required to.

Level set diagram

Level set diagram


1.4.4. Aggregations state representation

One of the main features in Pipes is the ability to run aggregations distributedly. To make this possible, most aggregations are written using parallel versions of the algorithms. Most of the solutions presented in this section only affect how semi-safe pipes (see Section 1.2: Chained computation model) represent their result over the wire. Additionally, this same mechanism is used by windows in default pipe (see Section 1.3: Output rates and aggregation windows) and meta-aggregations (e.g. overlast, overall) to merge the results from many outputs.

Lets take as example the average aggregation. Suppose there will be two machines calculating the query type:http => avg(response_time#) every minute in parallel. They share their results every minute. Suppose:

  • Machine 1 received its events and calculated the average response time is: 237.44
  • Machine 2 received its events and calculated the average response time is: 1061.08

Using only this information, it is impossible to know the global average value. If we just take the mean value from both results (649.26) this value could be wrong, because the machine 1 could have received way more events than machine 2, so its result should weight more in the global result.

That's the reason why every aggregation has an internal state representation more complex than just its result. In the case of average aggregation its state representation has two fields: "mean" and "sumw".

  • mean holds the current result
  • sumw holds the sum of all weights (or the count, in the case of unweighted average)
Aggregations state representation

Aggregations state representation

Some aggregations have a more complex inner state, like the distinct count aggregation, that uses the HyperLogLog data structure and must share the sketch content with the other nodes to merge its results. But even that state uses constant memory for its representation.

There is a meta-aggregation in pipes called describe that operates over other aggregations, but instead of yielding the result at the end of the window, it show a json representation of the aggregation's inner state. Using the same example as before, writing describe(avg(response_time#)) would return a value like:

{"sumw":262.0,"mean":416.628854962}

1.4.5. Aggregation TTL

When using default pipe with a non-empty group, the query tries to reduce memory consumption by removing any group that has not received any event in the last output period. Most aggregations do not hold any state other than the required in the last period, so this is fine. But some aggregations need additional state (e.g. overlast, prev) and may prevent this memory reclaim. There are even aggregations that need to be kept in memory forever to work properly (e.g. overall).

To solve this problem, every aggregation has a property called TTL (time to live). Which defines how long it should prevent row state from being deleted from a query if no events are received in that group. If there is no grouping, the property is ignored.

Most aggregations have TTL=1, but some aggregations with persistent state may have longer TTLs (even infinite). There is also the function <object>:keep([<number ttl>]) → <object> that allows to change the TTL of any expression to any desired value.


1.4.6. Property access

Property expressions are the way to access event's properties. In a glance, every identifier denotes a property access. Properties are scalar properties, but they can also be used in aggregation pipes, yielding the last evaluated expression.

* => last(cpu#), len(host) by host

In the example above, len(host) is a scalar, but acts as aggregations and returns the length of the host name.

Sometimes, there are property names that collide with keywords in the language. Or cases where the property name doesn't match the traditional identifier syntax. In this case, you can use curly braces to define the property name, like in the example below:

* => last({CPU usage value}) as cpu, len(host) as {product} by host

The way properties are fetched is specific in each configuration, but by default, it reads values from a java.util.Map instance, always as a string (casting to it when necessary). To access it as a number, for example, use the cast operators, e.g. prop# or prop:number().

Some configurations may allow indexing properties access. It may be done by using brackets. For example, assuming the event has a property config that is itself a java.util.Map:

* => config['hostname']

Important
This indexing syntax only applies to property access and is useful only to apply indexing before the usual cast occurs. To access indexed values in iterables, maps and strings please use the function <object>:get(<object... keys>) → <object>. In this case, config['hostname'] is somewhat equivalent to object(config):get('hostname')$ (access property config as an object, extract from it the value indexed by 'hostname' and convert it to string).

The property expression can also be used to access values from previous pipes. In this case, the value will have the same type as defined in the referenced pipe. Example:

* 
=> avg(cpu#) as avg, stdev(cpu#) as stdev every minute
=> avg-2*stdev as lower, avg+2*stdev as upper

1.4.7. Special expressions

There are also two special expressions that act like properties, but have special semantics: _ and *.

The expression _ returns the value of the default property. In the first pipe it is the first filter default property. In subsequent pipes, it is the first property of the previous pipe. Assuming "host" is the default property, the two queries bellow are equivalent.

host1 || host2
=> avg(cpu#) as avg by _ every minute
=> _*2 as doubled
host:host1 || host:host2
=> avg(cpu#) as avg by host every minute
=> avg*2 as doubled

The expression * returns the raw input event. Very useful when you don't want to repeat every field present in the event (or when you don't even know them in advance).

host1 => last(*) every second => @yield

Yields only the last event every second. In the example above, the expression last(*) has the type object and is as weakly typed as the input stream.

host1 
=> cpu#, memory# 
=> last(*) every second 
=> last->cpu, last->memory

In the (contrived) example above, the expression last(*) has the type row and is strongly typed. That's why we can use the operator 'peek' on the property last after.


1.4.8. Function call

Pipes language allows previously-configured functions to be called inside queries. The default syntax has nothing new. You can call a function by using parentheses after the function name. E.g. fn(param-1, param-2, ..., param-n). You can check the functions available by default at Chapter 5: Scalar Functions and Chapter 6: Aggregation Functions. Please note that it is possible to add other functions by configuration, so it isn't a complete list.

Other than the default syntax, there are other ways to call a function. For example, the syntax a:fn is equivalent to fn(a) and a:fn(b) is equivalent to fn(a, b). You can find a table bellow with all the alternative syntaxes for method call.

syntax equivalent to
a:fn fn(a)
a:fn(b, c) fn(a, b, c)
a#fn fn(a#)
a#fn(b, c) fn(a#, b, c)
a$fn fn(a$)
a$fn(b, c) fn(a$, b, c)
:fn fn(_)
:fn(b, c) fn(_, b, c)
#fn fn(_#)
#fn(b, c) fn(_#, b, c)
$fn fn(_$)
$fn(b, c) fn(_$, b, c)

2. Filters

Every query in Pipes is required to start with a filter. The filter runtime is optimized to allow many queries running over the same stream, sharing as much processing as possible.

The filter syntax is different from the rest of the syntax. It is inspired in the Lucene's query syntax.

Examples:

type:http status:404

Filters all records where the field "type" has a value "http" and "status" has a value "404"


2.1. *

Special filter that allows all records through.

When used in conjunction with field syntax, selects all records that contain that field.

Examples:

*

All records (no filter)

type:*

Filters records that contains any value in the field "type".


2.2. <term>

Selects records where one of the current fields matches <term>.

The match is case insensitive.

The use of wildcards (* and ?) is allowed.

Use the field syntax to change the default field.

Examples:

http

If the default field is "somefield", selects records where "somefield" has a value "http" (case insensitive)

otherfield:http

Selects records where "otherfield" has a value "http" (case insensitive)

otherfield:http*404

Selects records where "otherfield" has a value that starts with "http" and ends with "404" (case insensitive)


2.3. <term>~<number maxEdits>

Selects records where one of the current fields matches <term> with at most <maxEdits> edits.

The match is case insensitive.

Wildcards are not allowed in this filter.

Examples:

http~2

If the default field is "somefield", selects records where "somefield" has a value similar to "http" (e.g. "hxxp").


2.4. [<term lower> TO <term upper>]

Selects records where one of the current fields is between <lower> and <upper>.

The match is case insensitive.

The range filter is inclusive in both ends. To make it exclusive, change "[" to "(" and/or "]" to ")".

The TO keyword must be written in uppercase. It can also be replaced by a single comma with no change in meaning.

Wildcards are not allowed in this filter. Except for the single *. In this filter it means an unbounded end.

There are shorthand syntaxes for some versions of this query:

  • [lower TO *] can be writen as >= lower
  • (lower TO *] can be writen as > lower
  • [* TO upper] can be writen as <= upper
  • [* TO upper) can be writen as < upper

Examples:

status:[200 TO 299]

Selects records where status is between 200 and 299

status:[200, 300)

Selects records where status is between 200 and 299 (300 exclusive)

status:[200, *]

Selects records where status is greater than or equal to 200.

status:>=200

Same as the filter above.


2.5. <field>: <filter>

Sets the current field to <field>. Usually followed by <term> (e.g. somefield:someterm).

Any filter expression can be used inside a filter field. Even other filter fields.

Examples:

type:http

Selects records where the field "type" is equal to "http".

type:(http || cpu)

Selects records where the field "type" is equal to "http" or "cpu".


2.6. <filter> && <filter>

Selects the intersection of two other filters.

It's the default filter conjunction, so it can safely be omitted.

The operator && is equivalent to & and AND (must be uppercase).

Examples:

http verb:get status:404

Selects records with the default field equal to "http", "verb" equal to "get" and "status" equal to "404"

http && verb:get && status:404

Same as above, but with explicit &&

tag:(important && resolved)

Selects records with the field "tag" containing both "important" and "resolved" values


2.7. <filter> || <filter>

Selects the union of two other filters.

The operator || is equivalent to | and OR (must be uppercase).

Examples:

http || verb:get || status:404

Selects records with the default field equal to "http", "verb" equal to "get" or "status" equal to "404"

tag:(important || resolved)

Selects records with the field "tag" containing either "important" or "resolved" values


2.8. -<filter>

Selects the complement of another filter.

The operator - is equivalent to ! and NOT (must be uppercase).

Examples:

-verb:get

Selects records with the field "verb" different of "get"

tag:(a* -abnormal)

Selects records with the field "tag" starting with "a", but not equal to "abnormal"


3. Pipes

Pipes are the main building blocks in this language. No surprise the language is named after them. Pipes represent one single processing element that consumes events from the input and produces other events in the output.

The first pipe reads from the filtered public stream. Then, the output from one pipe can be chained as the input to another using the operator =>.

Each kind of pipe has its own properties, like rate of output and data window type and size. Some pipes can run on a distributed environment, some cannot. Some pipes may consume very low memory, some may need a lot of memory to run. All of these properties are derived and checked in compile time.

Examples:

type:http 
=> avg(response_time#) as time by host every minute
=> @top 10, time desc

Outputs every minute the 10 hosts with greatest average response time.


3.1. <named...> [by <named...>] [over <window>] [every <period> | at the end]

Transforms or aggregates records over configurable data window and output.

This is the default type of pipe. It receives events, aggregates them and output the result based on what is configured on the every clause. By default, the aggregation resets its state on every output, but the event retention can be configured on the over clause. The acceptable values for both clauses are discussed at Section 1.3: Output rates and aggregation windows.

The first part of the pipe defines which aggregations will be made. It accepts all kinds of expressions and treats them all as aggregations. All expressions can act as aggregations. The expression can receive an automatic name based on its contents or you can defined custom one using the keyword as. Example:

* => sum(x#)/count()**2 as value

Outputs the sum of the variable x divided by the event count squared every second.

If the expression is of type row and has metadata associated to it, you can also use the keyword expand to expand its values. Fore more info, see Section 1.4.2: Row type metadata. It's worth noting that expad keyword skips any timestamp fields the row might have.

The by clause allows one to group the results by one or more scalar values. Aggregations are not allowed in the by clause, only scalar values. When grouping, the group only outputs if at least one of its aggregations has still TTL, this usually means that it only outputs if the group received some event in the last period, but there are exceptions. For more info, see Section 1.4.5: Aggregation TTL.

All clauses are optional, so the simplest possible pipe is an empty string, which is equivalent to count() every 1 second. Examples:

* =>

Outputs how many events match the query type:http every second.

* => by host

Outputs how many events match the query type:http every second, aggregating by host.

This pipe can be classified as safe, semi-safe or unsafe, depending on which expressions, period and output are chosen. For more information on pipes safety, see Section 1.2: Chained computation model.

The default pipe is safe if and only if all the expressions are at least scalars, it has no by nor over clauses and the every clause is ommitted or exactly every item. This means in practice the pipe is safe if it is only a stateless transformation item-by-item of the input stream. Example:

* => id, name, value# * 3.14

The default pipe is unsafe if and only if the output type is every <n> items or it has some aggregation that forces the pipe to be unsafe (e.g. smooth). This means in practice the pipe is unsafe if it depends on some total order in the input, achieved running in a single node. This is true for item batch outputs and some aggregations. Please notice that unsafe pipes with no safe or semi-safe pipes before must be preceeded with @unsafe. Example:

* => @unsafe => avg(value#) every 10 items
* => @unsafe => smooth(value#) every 10 seconds

The default pipe is semi-safe in all other cases, i.e., when the output is by time period and it has no aggregation that makes it unsafe.

* => avg(value#) every 10 seconds

3.2. <pipe> product <pipe>

Computes the cartesian product of the outputs from two pipes with compatible output rates.

The outputs from two pipes are compatible if they are same type. For example every 5 seconds is compatible with every 7 seconds, but not with every 5 items. In this case, the resulting pipe would assume both outputs as its own output definition.

The resulting pipe will have all the fields from both queries (some fields may be renamed to avoid collision).

Examples:

* => [
    sum(value#) by host
    product
    sum(value#) as total
]

Computes the sum grouped by host with a field containing the total sum in each group.

The example above would generate events like:

{"timestamp":1399927869000, "host":"host1","sum_value":42.0, "total_sum":129.0}
{"timestamp":1399927869000, "host":"host2","sum_value":43.0, "total_sum":129.0}
{"timestamp":1399927869000, "host":"host3","sum_value":44.0, "total_sum":129.0}

A more practical example:

* 
=> count() by host
=> [ @yield * product avg(count) ]
=> @filter count > avg_count

Outputs all hosts that have an event count greater than the average of all hosts.


3.3. <pipe> union <pipe>

Concatenates the outputs from two pipes with compatible output rates.

The outputs from two pipes are compatible if they are same type. For example every 5 seconds is compatible with every 7 seconds, but not with every 5 items. In this case, the resulting pipe would assume both outputs as its own output definition.

If both pipes' type definitions are also compatible, the resulting pipe will assume the first pipe's types and field names. The field names must not be equal between the pipes, only their types and order.

Example:

* => [
    sum(value#) by host
    union
    'Total', sum(value#)
]

Computes the union of the sum grouped by host with the total sum. Note that the fields in the by clause come first in the final order, that's why 'Total' comes first in the second pipe.

The example above would generate events like:

{"timestamp":1399927869000, "host":"host1","sum_value":42.0}
{"timestamp":1399927869000, "host":"host2","sum_value":43.0}
{"timestamp":1399927869000, "host":"host3","sum_value":44.0}
{"timestamp":1399927869000, "host":"Total","sum_value":129.0}

3.4. @compress <number k>, [<number k2>,] <number... y> [by <object...>]

Compresses the result from the previous pipe to at most k most important rows.

Compress pipe keeps most of the previous pipe metadata, except for the output. It outputs only at the end. It is also an unsafe pipe (see Section 1.2: Chained computation model).

The pipe can optionally be grouped. In this case, it may be useful to define a global maximum to avoid that a group explosion causes the number of points to grow too much.

Examples:

* 
=> count() every minute
=> @compress 300, _

Computes the event count every minute, but compress it to the 300 most important points at end of the query.

* 
=> count() by host every minute
=> @compress 300, 1000, _ by host

Computes the event count by host every minute, but compress it to each host's 300 most important points at end of the query. Also limits to output at most 1000 points globally (even if there are 4 hosts or more).


3.5. @debounce <period> [by <object...>]

Only outputs rows that follows <period> time without any output.

The exceeding rows are discarded. Due to its nature, this is an semi-safe pipe (see Section 1.2: Chained computation model). When running in distributed environment, this pipe is used as-is both in the map and reduce pipes.

Debouncing: the red events are discarded.

Debouncing: the red events are discarded.

Examples:

* 
=> count() every minute
=> @filter _ > 100
=> @debounce 15 minutes

Alert when the number of events in a minute is greater than 100, but only outputs if there was a silence of at least 15 minutes.

* 
=> count() by host every minute
=> @filter _ > 100
=> @debounce 15 minutes by host

Same as previous, but grouping by host.


3.6. @filter <boolean condition>

Filters the results from previous pipe.

Filter pipe does not change any pipe metadata, it just repeats previous pipe info.

Examples:

* 
=> count() by host
=> @filter _ > 10

Filter only hosts that have event count greater than 10.


3.7. @latest

Keeps the latest batch of input events and output it at the end.

This pipe is a convenience to output only the latest batch of events with the output at the end. This concept of item batch doesn't exist in the syntax, so it would be harder to write queries that are only interested in the latest value for each group, for example. They are especially useful when replaying events in the past for a graph that only shows the latest value.

@latest is an unsafe pipe (see Section 1.2: Chained computation model).

Examples:

* 
=> count() by host
=> @latest

Outputs only the latest batch of counts by host at the end of the execution.


3.8. @sort <sortfield... expr>

Sorts the results from previous pipe.

Sort pipe does not change any pipe metadata, it just repeats previous pipe info. It also doesn't make any aggregation, it just outputs as soon as it finishes processing input.

The expressions can be any comparable, optionally followed by asc or desc to make the sort order explicit.

Examples:

* 
=> count() by host
=> @sort _ desc, host

Sorts the hosts by event count descending, and then by host (when the counts are equal).


3.9. @throttle [<number k>, ] <period> [by <object...>]

Limits the output of the previous pipe to at most <k> rows per <period>.

The exceeding rows are discarded. This is an unsafe pipe (see Section 1.2: Chained computation model).

Throttling: the red events are discarded.

Throttling: the red events are discarded.

Examples:

* 
=> count() every minute
=> @filter _ > 100
=> @throttle 15 minutes

Alert when the number of events in a minute is greater than 100, but throttles the output to once in every 15 minutes.

* 
=> count() by host every minute
=> @filter _ > 100
=> @throttle 2, 15 minutes by host

Same as previous, but allows at most two events by host every 15 minutes.


3.10. @top <number k>, <sortfield... expr> [by <object...>]

Sorts the results and gets the first k rows (possibly grouped) from previous pipe.

Top pipe does not change any pipe metadata, it just repeats previous pipe info. It also doesn't make any aggregation, it just outputs as soon as it finishes processing input.

The expressions can be any comparable, optionally followed by asc or desc to make the sort order explicit.

Examples:

* 
=> count() by host
=> @top 10, _ desc

Outputs the 10 hosts with the most events.

* 
=> count() by url, host
=> @top 10, _ desc by host

Outputs, for every host, the 10 urls with the most events.


3.11. @unsafe

Marks that any pipe executed after this must run in a non-distributed environment.

The engine forces that each unsafe pipe must be preceeded with a safe or semi-safe pipe. It is recommended to avoid excessive communication between nodes. When you really must write a query that breaks this rule, the unsafe pipe must be preceeded with this pipe, that is in fact an empty pipe that forces the pipe split, making everything after it to run in the reduce node. For more information, see Section 1.2: Chained computation model.

Examples:

* 
=> @unsafe
=> count() over all every item

Outputs the total count of events since the beginning of the query every time an item arrives.


3.12. @yield [<object expr>]

Extracts one field of the stream to be the output event.

If the expression is not defined, it is assumed to be _.

If the expression type is row and it has metadata, the pipe assumes the type's metadata. See intro-expression-rowmeta for more information.

Examples:

* 
=> name:regex(r'^(?<first>\w+)( \w+)* (?<last>\w+)$') as name
=> @yield name

Yields an event with the fields first and last.


4. Operators

Operators are elements of syntax that allows simple manipulation of scalar and aggregation values.

Every operator translates directly to a function call. E.g. the expression 2+2==4 is equivalent to .eq(.add(2, 2), 4).

You can find bellow a table with the operator group precedence, from highest to lowest. Operators inside same group have the same precedence.

Group Operators
Primary <row> -> <identifier> → <object>
<object># → <number>
<object>$ → <string>
Unary not <boolean> → <boolean>
-<number> → <number>
Null coaslescing <object> ?? <object> → <object>
Power <number> ** <number> → <number>
Multiplicative <number> * <number> → <number>
<number> / <number> → <number>
<number> // <number> → <number>
<number> % <number> → <number>
Additive <number> + <number> → <number>
<number> - <number> → <number>
<string> + <string> → <string>
Comparative <comparable> > <comparable> → <boolean>
<comparable> >= <comparable> → <boolean>
<comparable> < <comparable> → <boolean>
<comparable> <= <comparable> → <boolean>
Equality <object> == <object> → <boolean>
<object> != <object> → <boolean>
Logical AND <boolean> and <boolean> → <boolean>
Logical XOR <boolean> xor <boolean> → <boolean>
Logical OR <boolean> or <boolean> → <boolean>
Ternary <boolean> ? <object>, <object> → <object>

Only the ternary operator is right-associative. All other operators are left-associative. You can change both procedence and associativity by using parentheses. E.g. a+b*c is equivalent to a+(b*c).


4.1. <object># → <number>

Coerces the expression to number. Shorthand to <object>:number() → <number>.

The expression a# is equivalent to number(a).

Example:

* => avg(response_time#) every minute

Yields the average of the field response_time every minute.


4.2. <object>$ → <string>

Coerces the expression to string. Shorthand to <object>:string() → <string>.

The expression a$ is equivalent to string(a).

Example:

* => 'Count: ' + count()$ every minute

Yields the number of events every minute as a string 'Count: X'.


4.3. <number> + <number> → <number>

Adds two numbers.

The expression a+b is equivalent to .add(a, b).


4.4. <string> + <string> → <string>

Concatenates two strings.

The expression a+b is equivalent to .add(a, b).


4.5. <number> - <number> → <number>

Subtracts one number from another.

The expression a-b is equivalent to .sub(a, b).


4.6. <number> * <number> → <number>

Multiplies two numbers.

The expression a*b is equivalent to .mul(a, b).


4.7. <number> / <number> → <number>

Divides one number by another (float division).

The expression a/b is equivalent to .div(a, b).


4.8. <number> // <number> → <number>

Divides one number by another (integer division).

The expression a//b is equivalent to .intdiv(a, b).


4.9. <number> ** <number> → <number>

Raises one number to another's power.

The expression a**b is equivalent to .pow(a, b) or even pow(a, b) .


4.10. <number> % <number> → <number>

Returns the rest of the division of one number by another.

The expression a%b is equivalent to .mod(a, b).


4.11. -<number> → <number>

Negates one number.

The expression -a is equivalent to .neg(a).


4.12. <boolean> and <boolean> → <boolean>

Returns the logical AND of two booleans.

The expression a and b is equivalent to a&b, a&&b and .and(a, b).


4.13. <boolean> or <boolean> → <boolean>

Returns the logical OR of two booleans.

The expression a or b is equivalent to a|b, a||b and .or(a, b).


4.14. <boolean> xor <boolean> → <boolean>

Returns the logical XOR of two booleans.

The expression a xor b is equivalent to a^b and .xor(a, b).


4.15. not <boolean> → <boolean>

Returns the logical NOT of a boolean.

The expression not a is equivalent to !a and .not(a).


4.16. <object> == <object> → <boolean>

Checks whether two objects are equal.

The expression a==b is equivalent to .eq(a, b).


4.17. <object> != <object> → <boolean>

Checks whether two objects are not equal.

The expression a!=b is equivalent to .neq(a, b).


4.18. <comparable> < <comparable> → <boolean>

Checks whether the left operand compares lesser than the right one.

The expression a<b is equivalent to .lt(a, b).


4.19. <comparable> <= <comparable> → <boolean>

Checks whether the left operand compares lesser than or equal to the right one.

The expression a<=b is equivalent to .lteq(a, b).


4.20. <comparable> > <comparable> → <boolean>

Checks whether the left operand compares greater than the right one.

The expression a>b is equivalent to .gt(a, b).


4.21. <comparable> >= <comparable> → <boolean>

Checks whether the left operand compares greater than or equal to the right one.

The expression a>=b is equivalent to .gteq(a, b).


4.22. <row> -> <identifier> → <object>

Extracts a field information from a strongly-typed row value.

The expression a->identifier cannot be represented as method because the method .peek() requires a direct instance of java.lang.String.

Example:

* 
=> name:regex(r'^(?<first>\w+)( \w+)* (?<last>\w+)$') as match 
=> '%s, %s':format(match->last, match->first) as converted

Converts the name field to the format "Last, First".


4.23. <object> ?? <object> → <object>

Returns the first if it is not null; otherwise, returns the second.

The expression a??b is equivalent to .coalesce(a, b).

Example:

* => first(name) ?? 'None' as first_name every second

Yields the first name in the window, or 'None' if no name was found.


4.24. <boolean> ? <object>, <object> → <object>

If the condition is true, returns the first object; otherwise, returns the second.

The expression a?b,c is equivalent to .iif(a, b, c).

Please note this is the only right-associative operator. It is so to allow constructions like:

AllMonitorings
=> (type == 'http' ? 'Http Monitoring',
    type == 'cpu' ? 'CPU Monitoring',
    type == 'mem' ? 'Memory Monitoring',
    'Unknown') as monitoring_type

Example:

* => count()%2==0 ? 'Even', 'Odd' as type every second

Yields 'Even' or 'Odd' every second, depending on the number of events in the window.


5. Scalar Functions


5.1. <number>:abs() → <number>

Calculates the absolute value of a number.


5.2. <number>:acos() → <number>

Returns the arc cosine of the argument to an angle in radians.


5.3. <number>:asin() → <number>

Returns the arc sine of the argument to an angle in radians.


5.4. <number>:atan() → <number>

Returns the arc tangent of the argument to an angle in radians.


5.5. <number>:bytes([<number precision>]) → <string>

Formats a number as the best possible byte multiple.


5.6. <number>:ceil([<number precision>]) → <number>

Returns the smallest number that is greatest than or equal to the argument.


5.7. <number>:cos() → <number>

Returns the cosine of an angle in radians.


5.8. <number>:dateadd(<period>) → <number>

Adds <period> to timestamp argument.

Example:

* => timestamp():dateadd(1 day) every second

Adds one day to the current timestamp.


5.9. <number>:datefloor(<period>) → <number>

Rounds timestamp down to the nearest date that is divisible by <period>.

Example:

* => timestamp():datefloor(1 day) every second

Returns the beginning of the current day.


5.10. <number>:dateformat([<string format>], [<string tz>]) → <string>

Formats timestamp using specified format

Example:

* => timestamp():dateformat("dd/MM/YYYY hh:mm", "UTC") as date every second

Formats the current timestamp assuming UTC time zone.


5.11. <number>:datesub(<period>) → <number>

Sutracts <period> from timestamp argument.

Example:

* => timestamp():datesub(1 day) every second

Subtracts one day to the current timestamp.


5.12. <number>:exp() → <number>

Calculates the exponential of a number.


5.13. <number>:floor([<number precision>]) → <number>

Returns the largest number that is lesser than or equal to the argument.


5.14. <number>:format([<string format>], [<string locale>]) → <string>

Formats a number according to format string and locale.

Examples:

* => 123.456:format(".00") every second

Format with 2 digits after decimal point.

=> 12345.678:format("0,000.00", "pt-BR") as money every second

Format 12345.678 to '12.345,68' using Brazilian locale.


5.15. <number>:log([<number base>]) → <number>

Calculates the logarithm of a number.


5.16. <number>:pow(<number exp>) → <number>

Raises one number to another.


5.17. <number>:round([<number precision>]) → <number>

Rounds a number to <precision> decimal places.


5.18. <number>:select(<object... list>) → <object>

Selects the ith element from a list of arguments. Or null if it doesn't exist.

Example:

* => 1:select("a", "b", "c") as selected every second

Yields 'selected:b' every second. The list of arguments is zero-based.


5.19. <number>:sin() → <number>

Returns the sine of an angle in radians.


5.20. <number>:spanend(<string>, [<string tz>]) → <number>

Calculates end timestamp of span based on target.


5.21. <number>:spanstart(<string>, [<string tz>]) → <number>

Calculates start timestamp of span based on target.


5.22. <number>:tan() → <number>

Returns the tangent of an angle in radians.


5.23. <object>:boolean() → <boolean>

Converts object to boolean.


5.24. <object>:decode(<object,object... pairs>) → <object>

Transforms the parameter using the translation rules defined in <pairs>.


5.25. <object>:get(<object... keys>) → <object>

Much like property[keys]. Works for strings, containers and arrays.


5.26. <object>:indexin(<object... list>) → <number>

Returns the first index of the value in <list>, or null if <list> does not contain it.


5.27. <object>:isin(<object... list>) → <boolean>

Returns true if <list> contains the value, false otherwise.


5.28. <object>:json() → <string>

Converts the object to its JSON string representation.


5.29. <object>:keep([<number ttl>]) → <object>

When used in a default pipe, delays or disable (if ttl not supplied or < 0) inactive group removal.


5.30. <object>:len() → <object>

Tries to get <target>'s size. Works for strings, containers and arrays.


5.31. <object>:number() → <number>

Converts object to number.

Example:

* => avg(response_time:number()) every minute

Yields the average of the field response_time every minute.


5.32. <object>:object() → <object>

Casts any object to its canonical object representation.


5.33. <object>:string() → <string>

Converts object to string.

Example:

* => 'Count: ' + count():string every minute

Yields the number of events every minute as a string 'Count: X'.


5.34. <string>:contains(<string>) → <boolean>

Returns whether the target string contains the argument.


5.35. <string>:dateparse([<string format>], [<string tz>]) → <number>

Parses timestamp using specified format


5.36. <string>:endswith(<string>) → <boolean>

Returns whether the target string ends with the argument.


5.37. <string>:format(<object... args>) → <string>

Uses the target string as format to arguments.


5.38. <string>:hlleval() → <number>

Evaluates compressed base64 HyperLogLog data.


5.39. <string>:indexof(<string s>, [<number fromIndex>]) → <boolean>

Returns the index of position of <s> inside the target string. Returns null otherwise.


5.40. <string>:lower() → <string>

Converts string to lowercase.


5.41. <string>:parse([<string format>], [<string locale>]) → <number>

Parses a number according to format string and locale.


5.42. <string>:regex(<string regex>) → <row>

Returns a strongly typed row composed by all named groups in <regex>.

The result is a row value with a timestamp and as many fields as there are named groups in the specified regex.

Examples:

* 
=> expand name:regex(r'^(?<first>\w+)( \w+)* (?<last>\w+)$')

Extract the first and the last names from a single field.

* 
=> name:regex(r'^(?<first>\w+)( \w+)* (?<last>\w+)$') as match 
=> '%s, %s':format(match->last, match->first) as converted

Converts the name field to the format "Last, First".


5.43. <string>:regexfind(<string regex>, [<number|string group>]) → <string>

Returns the matched string by <regex> in target (or one specific group).


5.44. <string>:regexmatch(<string regex>) → <boolean>

Returns true if the target matches <regex>. False otherwise.


5.45. <string>:regexsub(<string regex>, <string replacement>) → <string>

Replaces all matches of <regex> in target by <replacement>.


5.46. <string>:replace(<string from>, <string to>) → <string>

Replaces all instances of <from> with the string <to>.


5.47. <string>:startswith(<string>) → <boolean>

Returns whether the target string starts with the argument.


5.48. <string>:substring(<number from>, [<number to>]) → <string>

Returns the substring between the indices <from> and <to>.


5.49. <string>:upper() → <string>

Converts string to uppercase.


5.50. compare(<comparable a>, <comparable b>) → <number>

Returns a number < 0 if a < b, > 0 if a > b or 0 if a = b.


5.51. hllmerge(<string... data>) → <string>

Merge many instances of compressed base64 HyperLogLog data.


5.52. max(<comparable>, <comparable>, <comparable...>) → <comparable>

Returns the greatest value of all supplied arguments.


5.53. min(<comparable>, <comparable>, <comparable...>) → <comparable>

Returns the least value of all supplied arguments.


5.54. newlist(<object...>) → <object>

Creates a instance of java.util.List with the supplied objects.


5.55. newmap(<object,object... pairs>) → <object>

Creates a instance of java.util.Map with the supplied keys and values.


5.56. pi() → <number>

Returns the constant value of pi.


5.57. random(<number min>, <number max>) → <number>

Returns a random value between <min> and <max>.


5.58. random([<number max>]) → <number>

Returns a random value of at most <max> (1 if not defined).


5.59. timestamp() → <number>

Returns the most appropriate timestamp, whether in scalar or aggregation contexts.


6. Aggregation Functions


6.1. <aggregation object expr>:if(<boolean condition>) → <object>

Aggregates only events that evaluates true to <condition>.


6.2. <aggregation object expr>:overall() → <object>

Merges all the results from the target aggregation.


6.3. <aggregation object expr>:overlast(<number window>) → <object>

Merges the results of the last <window> aggregations.


6.4. <aggregation object expr>:prev([<number prev>]) → <object>

Delays and returns the previous <number>th result from target aggregation.


6.5. all(<boolean>) → <boolean>

Returns true if all ocurrences evaluate true.


6.6. any(<boolean>) → <boolean>

Returns true if any ocurrence evaluates true.


6.7. avg(<number>, [<number weight>]) → <number>

Calculates the (possibly weighted) average of some expression.


6.8. dcount(<object>...) → <number>

Estimates the field's cardinality (distinct count) using HyperLogLog.


6.9. describe(<aggregation object expr>) → <string>

Yields a string json explaining the target aggregation's inner state representation.


6.10. first(<object>) → <object>

Yields the ocurrence with least timestamp.


6.11. greatest(<object>, <comparable>) → <object>

Yields the greatest ocurrence in the window based on some comparable.


6.12. hll(<number log2m>, <object>...) → <number>

Similar to dcount, but allows configuration of log2m parameter.


6.13. hllmerge(<string>...) → <string>

Performs union of many HyperLogLog encoded data in a window.


6.14. hllset(<number log2m>, <object>...) → <string>

Similar to hll, but it doesn't evaluate final cardinality, just return the sketch data.


6.15. join(<string>, [<string separator>], [<string lastSeparator>]) → <string>

Join all the strings in a window.


6.16. last(<object>) → <object>

Yields the ocurrence with greatest timestamp.


6.17. least(<object>, <comparable>) → <object>

Yields the least ocurrence in the window based on some comparable.


6.18. map(<object key>, <object value>) → <object>

Creates a java.util.Map from all events in a window.


6.19. max(<comparable>) → <comparable>

Yields the greatest ocurrence in the window.


6.20. median(<number>, [<number weight>]) → <number>

Estimates the median value of the population using Count-Min Sketch.


6.21. min(<comparable>) → <comparable>

Yields the least ocurrence in the window.


6.22. pcount(<boolean>) → <number>

Aggregates the proportion of events that evaluate true to expression.


6.23. quantile(<number q>, <number>, [<number weight>]) → <number>

Estimates the q (0..1) quantile of the population using Count-Min Sketch.


6.24. set(<object>) → <object>

Creates a java.util.Set from all events in a window.


6.25. smooth(<aggregation number expr>, [<number alpha>], [<number beta>]) → <number>

Smoothes the curve of another aggregation.


6.26. stdev(<number>, [<number weight>]) → <number>

Calculates the (possibly weighted) standard deviation of some expression.


6.27. sum(<number>) → <number>

Sums all evaluations of some expression.


6.28. variance(<number>, [<number weight>]) → <number>

Calculates the (possibly weighted) variance of some expression.


6.29. when(<boolean expr>) → <number>

Yields the latest timestamp inside window when some condition was true.


6.30. whenfirst(<boolean expr>) → <number>

Yields the first timestamp inside window when some condition was true.


6.31. Window Meta-aggregations


6.31.1. WCOUNT()

Yields how many outputs are merged in the current window.


6.31.2. WSTART()

Yields the minimum allowed timestamp or item for the current window.


6.31.3. WEND()

Yields the maximum allowed timestamp or item for the current window.


6.31.4. OSTART()

Yields the minimum allowed timestamp or item for the current output.


6.31.5. OEND()

Yields the maximum allowed timestamp or item for the current output.


6.31.6. OTIMESTAMP()

Yields the timestamp when the output was merged (useful for item batch pipes).


7. Timespan Language

Pipes comes bundled with a timespan definition language, that helps defining relative dates with an almost natural language syntax. A timespan is an expression that when provided with a reference timestamp can calculate the start and the end of a relative period.

For example, the expression current month will return:

  • (2014-04-01 00:00:00, 2014-05-01 00:00:00) when provided with the timestamp for (2014-04-11 14:56:20)
  • (2014-05-01 00:00:00, 2014-06-01 00:00:00) when provided with the timestamp for (2014-05-11 14:56:20)

Please notice that span intervals are always right-open.

The language is almost entirely right-associative. This allows the user to define complex relative dates e.g.

first day in the week before the 2nd month of this year

Equivalent to the day January 25 of the current year

Assuming now as 2014-05-11 14:56:20, reading from right to left:

expressionstartend
this year 2014-01-01 00:00:002015-01-01 00:00:00
2nd month of... 2014-02-01 00:00:002014-03-01 00:00:00
the week before... 2014-01-25 00:00:002014-02-01 00:00:00
first day in... 2014-01-25 00:00:002014-01-26 00:00:00

7.1. Period definitions

The periods are the units of time that powers span definitions.

Some spans accept only a single period definition, like current|this <period> (e.g. current day). Others accept multiple spans definition, like last <period...> (e.g. last 4 days, 2 hours and 1 minute). In this documentation <period> means single periods only and <period...> means one or more periods.

A single period is composed of (optionally) an amount and an unit, e.g.:

day
1 day
a day
the day

A multiple period is composed of many single periods separated by commas or 'and', e.g.: 1 day and 5 hours, a day, an hour and 5 minutes.

1 day and 5 hours
a day, an hour and 5 minutes

The available period units are:

unitalso acceptsequivalent to
millisecondmilli, ms
secondsec1000 ms
minutemin60000 ms
hour360000 ms
day
weekwk
monthmon
bimester2 months
quarter3 months
semester6 months
yearyr

7.2. Span definitions

Spans are the final product of this language. A span can compute the start and end of an interval, given a reference timestamp (usually now). They're always right-open. This means that a span current month will return the first timestamp of the next month, instead of the last of this month.

In respect of length, spans can either be a full interval or a point (when start == end).

In respect of reference point, spans can either be either relative or fixed (when neither start nor end depend on the reference).

examplesintervalpoint
relative last hour1 hour ago
fixed 2014-02-01 to 2014-02-28timestamp 1397243504000

7.2.1. now|none

(point, relative) Returns the reference timestamp.


7.2.2. today

(interval, relative) Equivalent to current day


7.2.3. <year>-<month>[-<day> [<hour>:[<minute>:[<second>]]]]

(interval, fixed) Returns the interval relative to the selected date.

The length of the interval will depend on the precision defined.

Examples:

2014-04-10 13:42

Returns the entire minute 2014-04-10 13:42

2014-04-10

Returns the entire day 2014-04-10

2014-04

Returns the entire month 2014-04


7.2.4. timestamp|ts <number>

(point, fixed) Returns the point with the speficied timestamp.

The resulting interval is a point, not an interval with 1 millisecond.

Examples:

ts 1397246205000
timestamp 1397246205

7.2.5. from|since <span> to|until <span>

(<both>, <both>) Returns a span from the beginning of the first span to the end of the second.

The result span can be anything. It depends on the parameters.

Examples:

from yesterday to today

From the beginning of yesterday until the end of today.

since previous year

From the beginning of previous year until now.


7.2.6. last <period...>

(interval, relative) Equivalent to <period...> before now.

Please notice that spans like last year are not equivalent to previous year. The former means the period of 1 year ending now. The latter means the 1 year period ending in the beginning of this year.

Examples:

last day

From yesterday this same second until now.

last 3 days, 4 hours and 5 minutes

From 3 days, 4 hours and 5 minutes ago until now.


7.2.7. current|this <period>

(interval, relative) Returns the full period enclosing the reference timestamp.

Examples:

current day

From the 0-hour of today to the 0-hour of tomorrow.

current year

From Jan 1 of this year to Jan 1 of next year.


7.2.8. previous|yester <period>

(interval, relative) Returns the period before the current <period>

Examples:

yesterday

From the 0-hour of yesterday to the 0-hour of today.

previous year

From Jan 1 of the previous year to Jan 1 of this year.


7.2.9. [<period...>] before <span>

(<both>, <both>) The full selected period ending in the beginning of the referenced span.

If the period is defined, it is an interval.

It will be relative or fixed depending on which span is used.

Examples:

the day before today

Equivalent to yesterday

2 years and 4 months before this month

Period of 2 years and 4 months ending in the beginning of this month.

before this month

A point span with the first timestamp of this month.


7.2.10. [<period...>] after <span>

(<both>, <both>) The full selected period starting at the end of the referenced span.

If the period is defined, it is an interval.

It will be relative or fixed depending on which span is used.

Examples:

2 days after previous month

Selects the first 2 days of current month.

after this month

A point span with the first timestamp of the next month.


7.2.11. <ordinal> <period> [of] <span>

(interval, <both>) Selects the nth period inside some span.

It will be relative or fixed depending on which span is used.

Examples:

2nd day of this week

Selects this week's tuesday.

first day of previous month

Entire day 1 of previous month.


7.2.12. <period...> ago

(point, relative) Select the exact timestamp of the defined period in the past.

Examples:

2 days ago

This exact timestamp, but 2 days ago.


7.2.13. <period> of <span>

(interval, <both>) Selects the full period enclosing the referenced span

It will be relative or fixed depending on which span is used.

Examples:

the week of 40 days ago

Selects the full week of 40 days ago, from monday to sunday.

the quarter of 2014-04-11

Selects the entire quarter that contains the day 2014-04-11.


7.2.14. <span> shifted by <period...>

(<both>, <both>) Shifts the selected span by <period> in the past.

Examples:

now shifted by 2 days

Equivalent to 2 days ago.

this week shifted by 2 months

Selects the current week and shifts the timestamps 2 months in the past.


7.2.15. <span> shifted to <span>

(<both>, <both>) Calculates the span changing the reference using another span.

Examples:

this week shifted to 40 days ago

The span this week as if the reference was 40 days ago.

this week shifted to 2014-01-01

The week of the first day of 2014.


7.2.16. <span> extend [left|right] [by] <number>%

(<both>, <both>) Extends either the start or the end of a span by a percentual value.

Examples:

this week extend right 20%

Current week + ~33.6h (20% of 7 days).


7.2.17. <span> align [left|right] to <span ref>

(<both>, <both>) Aligns the span's total milliseconds at the left or right side of <ref>

Examples:

last week align to a month ago

Aligns a 1-week period to start exactly 1 month before now.

last week align right to a month ago

Aligns a 1-week period to end exactly 1 month before now.


Page generated at 23-May-2014 14:23:02