Intelie Pipes

Language Reference Documentation

version 0.7

1. Introduction
1.1. Events and filters
1.2. Chained computation model
1.3. Output rates and aggregation windows
1.4. Expressions
1.4.1. Type system
1.4.2. Scalars and aggregations
1.4.3. Aggregations state representation
2. Filters
2.1. *	Special filter that allows all records through.
2.2. <term>	Selects records where one of the current fields matches <term>.
2.3. <term>~<number maxEdits>	Selects records where one of the current fields matches <term> with at most <maxEdits> edits.
2.4. [<term lower> TO <term upper>]	Selects records where one of the current fields is between <lower> and <upper>.
2.5. <field>: <filter>	Sets the current field to <field>. Usually followed by <term> (e.g. somefield:someterm).
2.6. <filter> && <filter>	Selects the intersection of two other filters.
2.7. <filter> || <filter>	Selects the union of two other filters.
2.8. -<filter>	Selects the complement of another filter.
3. Pipes
3.1. <named...> [by <named...>] [over <window>] [every <period> | at the end]	Transforms or aggregates records over a configurable data window and output rate.
3.2. <pipe> union <pipe>	Concatenates the outputs from two pipes with compatible output rates.
3.3. <pipe> product <pipe>	Computes the cartesian product of the outputs from two pipes with compatible output rates.
3.4. @filter <boolean condition>	Filters the results from previous pipe.
3.5. @sort <sortfield... expr>	Sorts the results from previous pipe.
3.6. @top <number k>, <sortfield... expr> [by <object...>]	Sorts the results and gets the first k rows (possibly grouped) from previous pipe.
3.7. @compress <number k>, [<number k2>,] <number... y> [by <object...>]	Compresses the result from the previous pipe to at most k (or k2) most important rows.
3.8. @yield [<object expr>]	Extracts one field of the stream to be the output event.
3.9. @latest	Keeps the latest batch of input events and outputs it at the end.
3.10. @unsafe	Marks that any pipe executed after this must run in a non-distributed environment.
4. Expressions
4.1. Property access
4.2. Function call
5. Operators
5.1. <object># → <number>	Coerces the expression to number. Shorthand to <object>:number() → <number>.
5.2. <object>$ → <string>	Coerces the expression to string. Shorthand to <object>:string() → <string>.
5.3. <number> + <number> → <number>	Adds two numbers.
5.4. <string> + <string> → <string>	Concatenates two strings.
5.5. <number> - <number> → <number>	Subtracts one number from another.
5.6. <number> * <number> → <number>	Multiplies two numbers.
5.7. <number> / <number> → <number>	Divides one number by another (float division).
5.8. <number> // <number> → <number>	Divides one number by another (integer division).
5.9. <number> ** <number> → <number>	Raises one number to another's power.
5.10. <number> % <number> → <number>	Returns the remainder of the division of one number by another.
5.11. -<number> → <number>	Negates one number.
5.12. <boolean> and <boolean> → <boolean>	Returns the logical AND of two booleans.
5.13. <boolean> or <boolean> → <boolean>	Returns the logical OR of two booleans.
5.14. <boolean> xor <boolean> → <boolean>	Returns the logical XOR of two booleans.
5.15. not <boolean> → <boolean>	Returns the logical NOT of a boolean.
5.16. <object> == <object> → <boolean>	Checks whether two objects are equal.
5.17. <object> != <object> → <boolean>	Checks whether two objects are not equal.
5.18. <comparable> < <comparable> → <boolean>	Checks whether the left operand compares less than the right one.
5.19. <comparable> <= <comparable> → <boolean>	Checks whether the left operand compares less than or equal to the right one.
5.20. <comparable> > <comparable> → <boolean>	Checks whether the left operand compares greater than the right one.
5.21. <comparable> >= <comparable> → <boolean>	Checks whether the left operand compares greater than or equal to the right one.
5.22. <row> -> <identifier> → <object>	Extracts a field information from a strongly-typed row value.
5.23. <object> ?? <object> → <object>	Returns the first if it is not null; otherwise, returns the second.
5.24. <boolean> ? <object>, <object> → <object>	If the condition is true, returns the first object; otherwise, returns the second.
6. Scalar Functions
6.1. <number>:abs() → <number>	Calculates the absolute value of a number.
6.2. <number>:acos() → <number>	Returns the arc cosine of the argument to an angle in radians.
6.3. <number>:asin() → <number>	Returns the arc sine of the argument to an angle in radians.
6.4. <number>:atan() → <number>	Returns the arc tangent of the argument to an angle in radians.
6.5. <number>:bytes([<number precision>]) → <string>	Formats a number as the best possible byte multiple.
6.6. <number>:ceil([<number precision>]) → <number>	Returns the smallest number that is greater than or equal to the argument.
6.7. <number>:cos() → <number>	Returns the cosine of an angle in radians.
6.8. <number>:dateadd(<number amount>, <string unit>, [<string tz>]) → <number>	Adds <amount> <unit>s of date to timestamp argument.
6.9. <number>:datefloor(<number amount>, <string unit>, [<string tz>]) → <number>	Rounds timestamp down to the nearest date that is divisible by <amount> <unit>s.
6.10. <number>:dateformat([<string format>], [<string tz>]) → <string>	Formats timestamp using specified format
6.11. <number>:datesub(<number amount>, <string unit>, [<string tz>]) → <number>	Subtracts <amount> <unit>s of date from timestamp argument.
6.12. <number>:exp() → <number>	Calculates the exponential of a number.
6.13. <number>:floor([<number precision>]) → <number>	Returns the largest number that is less than or equal to the argument.
6.14. <number>:format([<string format>], [<string locale>]) → <string>	Formats a number according to format string and locale.
6.15. <number>:log([<number base>]) → <number>	Calculates the logarithm of a number.
6.16. <number>:pow(<number exp>) → <number>	Raises one number to another.
6.17. <number>:round([<number precision>]) → <number>	Rounds a number to <precision> decimal places.
6.18. <number>:select(<object... list>) → <object>	Selects the ith element from a list of arguments. Or null if it doesn't exist.
6.19. <number>:sin() → <number>	Returns the sine of an angle in radians.
6.20. <number>:spanend(<string>, [<string tz>]) → <number>	Calculates end timestamp of span based on target.
6.21. <number>:spanstart(<string>, [<string tz>]) → <number>	Calculates start timestamp of span based on target.
6.22. <number>:tan() → <number>	Returns the tangent of an angle in radians.
6.23. <object>:boolean() → <boolean>	Converts object to boolean.
6.24. <object>:decode(<object,object... pairs>) → <object>	Transforms the parameter using the translation rules defined in <pairs>.
6.25. <object>:get(<object... keys>) → <object>	Much like property[keys]. Works for strings, containers and arrays.
6.26. <object>:indexin(<object... list>) → <number>	Returns the first index of the value in <list>, or null if <list> does not contain it.
6.27. <object>:isin(<object... list>) → <boolean>	Returns true if <list> contains the value, false otherwise.
6.28. <object>:json() → <string>	Converts the object to its JSON string representation.
6.29. <object>:keep([<number ttl>]) → <object>	When used in a default pipe, delays or disables (if ttl is not supplied or < 0) inactive group removal.
6.30. <object>:len() → <object>	Tries to get <target>'s size. Works for strings, containers and arrays.
6.31. <object>:number() → <number>	Converts object to number.
6.32. <object>:object() → <object>	Casts any object to its canonical object representation.
6.33. <object>:string() → <string>	Converts object to string.
6.34. <string>:contains(<string>) → <boolean>	Returns whether the target string contains the argument.
6.35. <string>:dateparse([<string format>], [<string tz>]) → <number>	Parses timestamp using specified format
6.36. <string>:endswith(<string>) → <boolean>	Returns whether the target string ends with the argument.
6.37. <string>:format(<object... args>) → <string>	Uses the target string as format to arguments.
6.38. <string>:hlleval() → <number>	Evaluates compressed base64 HyperLogLog data.
6.39. <string>:indexof(<string s>, [<number fromIndex>]) → <boolean>	Returns the index of <s> inside the target string, or null if <s> is not found.
6.40. <string>:lower() → <string>	Converts string to lowercase.
6.41. <string>:parse([<string format>], [<string locale>]) → <number>	Parses a number according to format string and locale.
6.42. <string>:regex(<string regex>) → <row>	Returns a strongly typed row composed by all named groups in <regex>.
6.43. <string>:regexfind(<string regex>, [<number|string group>]) → <string>	Returns the matched string by <regex> in target (or one specific group).
6.44. <string>:regexmatch(<string regex>) → <boolean>	Returns true if the target matches <regex>. False otherwise.
6.45. <string>:regexsub(<string regex>, <string replacement>) → <string>	Replaces all matches of <regex> in target by <replacement>.
6.46. <string>:replace(<string from>, <string to>) → <string>	Replaces all instances of <from> with the string <to>.
6.47. <string>:startswith(<string>) → <boolean>	Returns whether the target string starts with the argument.
6.48. <string>:substring(<number from>, [<number to>]) → <string>	Returns the substring between the indices <from> and <to>.
6.49. <string>:upper() → <string>	Converts string to uppercase.
6.50. compare(<comparable a>, <comparable b>) → <number>	Returns a number < 0 if a < b, > 0 if a > b or 0 if a = b.
6.51. hllmerge(<string... data>) → <string>	Merges many instances of compressed base64 HyperLogLog data.
6.52. max(<comparable>, <comparable>, <comparable...>) → <comparable>	Returns the greatest value of all supplied arguments.
6.53. min(<comparable>, <comparable>, <comparable...>) → <comparable>	Returns the least value of all supplied arguments.
6.54. newlist(<object...>) → <object>	Creates an instance of java.util.List with the supplied objects.
6.55. newmap(<object,object... pairs>) → <object>	Creates an instance of java.util.Map with the supplied keys and values.
6.56. pi() → <number>	Returns the constant value of pi.
6.57. random(<number min>, <number max>) → <number>	Returns a random value between <min> and <max>.
6.58. random([<number max>]) → <number>	Returns a random value of at most <max> (1 if not defined).
6.59. timestamp() → <number>	Returns the most appropriate timestamp, whether in scalar or aggregation contexts.
7. Aggregation Functions
7.1. <aggregation object expr>:if(<boolean condition>) → <object>	Aggregates only the events for which <condition> evaluates to true.
7.2. <aggregation object expr>:overall() → <object>	Merges all the results from the target aggregation.
7.3. <aggregation object expr>:overlast(<number window>) → <object>	Merges the results of the last <window> aggregations.
7.4. <aggregation object expr>:prev([<number prev>]) → <object>	Delays and returns the previous <number>th result from target aggregation.
7.5. all(<boolean>) → <boolean>	Returns true if all occurrences evaluate to true.
7.6. any(<boolean>) → <boolean>	Returns true if any occurrence evaluates to true.
7.7. avg(<number>, [<number weight>]) → <number>	Calculates the (possibly weighted) average of some expression.
7.8. dcount(<object>...) → <number>	Estimates the field's cardinality (distinct count) using HyperLogLog.
7.9. describe(<aggregation object expr>) → <string>	Yields a string json explaining the target aggregation's inner state representation.
7.10. first(<object>) → <object>	Yields the occurrence with the least timestamp.
7.11. greatest(<object>, <comparable>) → <object>	Yields the greatest occurrence in the window based on some comparable.
7.12. hll(<number log2m>, <object>...) → <number>	Similar to dcount, but allows configuration of log2m parameter.
7.13. hllmerge(<string>...) → <string>	Performs union of many HyperLogLog encoded data in a window.
7.14. hllset(<number log2m>, <object>...) → <string>	Similar to hll, but it doesn't evaluate the final cardinality; it just returns the sketch data.
7.15. join(<string>, [<string separator>], [<string lastSeparator>]) → <string>	Joins all the strings in a window.
7.16. last(<object>) → <object>	Yields the occurrence with the greatest timestamp.
7.17. least(<object>, <comparable>) → <object>	Yields the least occurrence in the window based on some comparable.
7.18. map(<object key>, <object value>) → <object>	Creates a java.util.Map from all events in a window.
7.19. max(<comparable>) → <comparable>	Yields the greatest occurrence in the window.
7.20. median(<number>, [<number weight>]) → <number>	Estimates the median value of the population using Count-Min Sketch.
7.21. min(<comparable>) → <comparable>	Yields the least occurrence in the window.
7.22. pcount(<boolean>) → <number>	Aggregates the proportion of events for which the expression evaluates to true.
7.23. quantile(<number q>, <number>, [<number weight>]) → <number>	Estimates the q (0..1) quantile of the population using Count-Min Sketch.
7.24. set(<object>) → <object>	Creates a java.util.Set from all events in a window.
7.25. smooth(<aggregation number expr>, [<number alpha>], [<number beta>]) → <number>	Smooths the curve of another aggregation.
7.26. stdev(<number>, [<number weight>]) → <number>	Calculates the (possibly weighted) standard deviation of some expression.
7.27. sum(<number>) → <number>	Sums all evaluations of some expression.
7.28. variance(<number>, [<number weight>]) → <number>	Calculates the (possibly weighted) variance of some expression.
7.29. when(<aggregation boolean expr>) → <number>	Yields the latest timestamp inside window when some condition was true.
7.30. Window Meta-aggregations
7.30.1. WCOUNT()	Yields how many outputs are merged in the current window.
7.30.2. WSTART()	Yields the minimum allowed timestamp or item for the current window.
7.30.3. WEND()	Yields the maximum allowed timestamp or item for the current window.
7.30.4. OSTART()	Yields the minimum allowed timestamp or item for the current output.
7.30.5. OEND()	Yields the maximum allowed timestamp or item for the current output.
7.30.6. OTIMESTAMP()	Yields the timestamp when the output was merged (useful for item batch pipes).
8. Timespan Language
8.1. Period definitions
8.2. Span definitions
8.2.1. now|none	(point, relative) Returns the reference timestamp.
8.2.2. today	(interval, relative) Equivalent to "current day"
8.2.3. <year>-<month>[-<day> [<hour>:[<minute>:[<second>]]]]	(interval, fixed) Returns the interval relative to the selected date.
8.2.4. timestamp|ts <number>	(point, fixed) Returns the point with the specified timestamp.
8.2.5. from|since <span> to|until <span>	(<both>, <both>) Returns a span from the beginning of the first span to the end of the second.
8.2.6. last <period...>	(interval, relative) Equivalent to "<period...> before now".
8.2.7. current|this <period>	(interval, relative) Returns the full period enclosing the reference timestamp.
8.2.8. previous|yester <period>	(interval, relative) Returns the period before the "current <period>"
8.2.9. [<period...>] before <span>	(<both>, <both>) The full selected period ending in the beginning of the referenced span.
8.2.10. [<period...>] after <span>	(<both>, <both>) The full selected period starting at the end of the referenced span.
8.2.11. <ordinal> <period> [of] <span>	(interval, <both>) Selects the nth period inside some span.
8.2.12. <period...> ago	(point, relative) Selects the exact timestamp of the defined period in the past.
8.2.13. <period> of <span>	(interval, <both>) Selects the full period enclosing the referenced span
8.2.14. <span> shifted by <period...>	(<both>, <both>) Shifts the selected span by <period> in the past.
8.2.15. <span> shifted to <span>	(<both>, <both>) Calculates the span changing the reference using another span.
8.2.16. <span> extend left|right [by] <number>%	(<both>, <both>) Extends either the start or the end of a span by a percentage.

Pipes is a language and an engine to perform distributed aggregations over real-time streams. It enables stream processing with low latency and minimum memory footprint (constant for most operations).

The language is designed to be as intuitive as possible, expressing the data flow explicitly. The goal was to provide a language you can read from left to right (no visual backtracking) while keeping a declarative way to express computations. The name Pipes is an allusion to the Unix pipeline, which enables a powerful and elegant functional programming model on Unix shells.

The Pipes language operates over events. Events are (usually immutable) sets of (key, value) tuples and may have some time information assigned to them. In the Pipes event model, events have no intrinsic type. They're all part of a single public stream of tuples. When type information is applicable, it is usually encoded as a property of each event (e.g. type or __type).

The engine makes no assumption about the nature, frequency or schema of the incoming events. Every operation is written keeping in mind that an event may lack the referenced field, or that the field's value may not always be of the same type. It is safe to say that events are well represented by schemaless JSON objects. As a Java library, Pipes' default configuration understands instances of the java.util.Map container.

It is recommended that events have a property called 'timestamp' holding a Java timestamp, that is, the number of milliseconds since 1 January 1970 UTC. But even that is neither enforced nor required.
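As a rough illustration of the event model described above, an event can be pictured as a plain key/value map carrying a millisecond timestamp. The sketch below uses a Python dict in place of java.util.Map; the helper name make_event is hypothetical and not part of Pipes.

```python
import time


def make_event(fields, timestamp_ms=None):
    """Build a schemaless event as a plain dict (hypothetical helper).

    'timestamp' holds milliseconds since 1 January 1970 UTC, the same
    convention as a Java timestamp.
    """
    event = dict(fields)
    if timestamp_ms is None:
        # Default to the current wall-clock time in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    event["timestamp"] = timestamp_ms
    return event


# The event type is just another property; nothing enforces a schema.
event = make_event({"type": "http", "status": 404}, timestamp_ms=1500000000000)
```

Nothing in the sketch validates field names or value types, which mirrors the statement that the engine assumes no schema.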

Unlike in a SQL database, stream processing allows very little preprocessing of the events (e.g. index creation), so the heavy optimization is done in the queries. This is why, most of the time, the queries will run in the same engine: this way, the engine can optimize them together and share as much processing as possible between them.

The first part of a Pipes query always consists of selecting some events from that stream, using a filter (see Chapter 2: Filters). Filters select values from the public stream. Each filter is basically a set of predicates about some fields, combined with boolean operations. This has great potential for optimization. For example, given the filters "type:http status:404", "type:http status:200" and "type:http status:(2?? || 3??)", the engine builds automata like the ones in the picture below:

[Figure: Filters automata]

After filtering the public stream, you can chain the result through one or more pipes (see Chapter 3: Pipes). Each pipe may transform, filter or aggregate the results from the previous one. In the language, this chaining is represented by the operator =>.

[Figure: Pipes computation model]

Each pipe in the chain can have one or more output rates. For example, a pipe can receive an input every time the filter matches some event in the public stream, but only output events in batches of one minute. There are three different types of output: every time period, every item batch and at the end. For more information, see Section 1.3: Output rates and aggregation windows.

Inside each pipe, you can interact with events using expressions that access, transform or aggregate their properties. For more information, see Section 1.4: Expressions.

Most pipes are implemented using parallel algorithms that allow seamless distribution over multiple physical machines. Each pipe can usually be classified into one of three groups:

safe: pipes that can execute completely in parallel without affecting the final result (e.g. filter pipe)
semi-safe: pipes that can execute most of their work in parallel, but must merge their output with other nodes (e.g. default pipe)
unsafe: pipes that must execute in a single node in order to compute the correct result (e.g. compress pipe)

When writing a typical query, it's likely your pipes will distribute as follows:

[Figure: Typical pipe in single machine]

[Figure: Typical pipe in multiple machines]

Every pipe in the chain has one or more output rates. In this section, we focus on how to declare them in the default pipe, but the concept itself applies to any pipe type.

There are three types of output rates: every time period, every item batch and at the end.

every <period>: outputs every time the pipe's internal clock ticks a constant period. E.g. every 5 seconds
every <number> items: outputs every time the specified amount of items is processed by the engine. E.g. every 5 items
at the end: outputs only at the end of execution. Useful for summary queries.
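To make the difference between item-based output and output at the end more concrete, here is a minimal sketch of a counter under both rates. The function names are hypothetical, and dropping a trailing incomplete batch is an assumption made for the sketch, not documented Pipes behavior.

```python
def count_every_n_items(events, n):
    """Yield a count each time n items have been processed.

    Roughly mirrors 'count() every n items'. A trailing batch with
    fewer than n items is dropped here (an assumption).
    """
    seen = 0
    for _ in events:
        seen += 1
        if seen == n:
            yield seen
            seen = 0


def count_at_the_end(events):
    """Return a single count once the input is exhausted.

    Roughly mirrors 'count() at the end'.
    """
    return sum(1 for _ in events)


batches = list(count_every_n_items(range(12), 5))
total = count_at_the_end(range(12))
```

With twelve input items, the item-based version emits two outputs of five items each, while the summary version emits one output covering all twelve.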

The available period units are:

unit	also accepts	equivalent to
millisecond	milli, ms
second	sec	1000 ms
minute	min	60000 ms
hour		3600000 ms
day
week	wk
month	mon
bimester		2 months
quarter		3 months
semester		6 months
year	yr
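For the units that have a fixed length, the conversion to milliseconds can be sketched as below. One hour is 3,600,000 ms (60 minutes of 60,000 ms each). The function and table names are hypothetical, accepting plural forms of every alias is an assumption, and calendar-dependent units (day and longer) are deliberately left out because their length depends on the calendar and time zone.

```python
# Fixed-length units only; names and structure are illustrative assumptions.
FIXED_MS = {
    "millisecond": 1,
    "second": 1000,
    "minute": 60000,
    "hour": 3600000,
}

# Alternative spellings listed in the table above.
ALIASES = {"milli": "millisecond", "ms": "millisecond", "sec": "second", "min": "minute"}


def period_to_ms(amount, unit):
    """Convert '<amount> <unit>' to milliseconds for fixed-length units."""
    name = unit.lower()
    # Try the name as given first, then a singular form ("seconds" -> "second").
    singular = name[:-1] if name.endswith("s") else name
    for candidate in (name, singular):
        candidate = ALIASES.get(candidate, candidate)
        if candidate in FIXED_MS:
            return amount * FIXED_MS[candidate]
    raise ValueError("no fixed millisecond length for unit %r" % unit)
```

For example, period_to_ms(5, "seconds") gives 5000, while period_to_ms(1, "month") raises an error because a month has no fixed length.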

Keep in mind that, in the case of the default pipe, there may be a difference between the declared rate and the effective rate. In the query below, for example, the declared output rate of the last pipe is every item, but the query effectively outputs every 10 seconds.

    * => count() every 10 seconds => count, prev(count) every item

Outputs every 10 seconds

This happens because, when writing a query, you declare each output from the pipe's local view, whereas the effective output rate considers the entire query, from the filter to the last pipe. In the example above, although the last pipe outputs every time it receives an item, it only receives an item when the first pipe outputs, that is, every 10 seconds.

Also in the default pipe, you can aggregate over a period other than the output period. You can, for example, output every minute a result that corresponds to the last 5 minutes of aggregation. This is an aggregation over a time window.

Important
The window concept only applies to the default pipe.

In traditional stream processing libraries, every event is stored inside a data window and the aggregation runs over them. This allows a fine-grained configuration of outputs vs windows. In Pipes, the goal is to use as few resources as possible. So, instead of storing events inside the window, we only store the aggregation outputs (in a mergeable format, see Section 1.4.3: Aggregations state representation). The picture below illustrates this:

[Figure: Windows merges aggregations outputs]

This way, the memory footprint of a query is very low and predictable. Of course, this approach introduces some limitations (e.g. the window size must be a multiple of the output rate), but the benefits outweigh them.
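The idea of keeping one mergeable output per output period, instead of the raw events, can be sketched as follows. This is only an illustration under simplifying assumptions: the class name SlidingSum is hypothetical, and the partial state is a plain (sum, count) pair rather than whatever representation Pipes uses internally.

```python
from collections import deque


class SlidingSum:
    """Keeps one partial (sum, count) per output period and merges the
    last `size` partials, instead of storing individual events."""

    def __init__(self, size):
        # size = window length divided by output rate, e.g. 5 for
        # "over last 5 minutes every minute".
        self.partials = deque(maxlen=size)
        self.current = [0.0, 0]

    def add(self, value):
        """Fold one event into the partial state of the current period."""
        self.current[0] += value
        self.current[1] += 1

    def tick(self):
        """Close the current output period and return the merged window."""
        self.partials.append(tuple(self.current))
        self.current = [0.0, 0]
        total = sum(s for s, _ in self.partials)
        count = sum(c for _, c in self.partials)
        return total, count


window = SlidingSum(size=2)
window.add(1)
window.add(2)
first = window.tick()
window.add(10)
second = window.tick()
third = window.tick()
```

Memory use grows with the number of outputs kept in the window (here at most two), not with the number of events, which is why the window size has to line up with the output rate.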

There are three types of windows:

over last (<period>|<number> items): aggregates over last (<period> / <output_rate>) outputs. E.g. over last 5 minutes
over current (<period>|<number> items): aggregates since the beginning of the current period. E.g. over current day
over all: aggregates over all events, merges all outputs.

It is worth noting that item-based windows can only be used with item-based outputs, and time-based windows can only be used with time-based outputs. Also, at the end pipes do not support any window definition, and there is only a single output at the end. The table below shows examples of what is valid and what is not.

window/output examples	every 2 minutes	every 2 items	at the end
over last 10 minutes	valid	invalid	invalid
over current 10 minutes	valid	invalid	invalid
over last 10 items	invalid	valid	invalid
over current 10 items	invalid	valid	invalid
over all	valid	valid	invalid
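The rules summarized in the table can be expressed as a small check. The sketch below is an illustration, not the engine's validation code: the function name and the 'time' / 'items' / 'all' / 'end' labels are hypothetical, and applying the "multiple of the output rate" limitation to both over last and over current windows is an assumption.

```python
def window_is_valid(window_kind, window_size, output_kind, output_size):
    """Check a window/output combination.

    window_kind: 'time', 'items' or 'all'
    output_kind: 'time', 'items' or 'end'
    Sizes are numbers in the same unit (e.g. minutes, or items);
    they are ignored for 'all' and 'end'.
    """
    if output_kind == "end":
        # "at the end" pipes accept no window definition.
        return False
    if window_kind == "all":
        # "over all" merges every output, whatever the output kind.
        return True
    if window_kind != output_kind:
        # Time windows need time outputs; item windows need item outputs.
        return False
    # The window size must be a multiple of the output rate.
    return window_size % output_size == 0
```

For instance, window_is_valid("time", 10, "time", 2) corresponds to "over last 10 minutes every 2 minutes" and is accepted, while window_is_valid("time", 10, "items", 2) is rejected.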

Expressions access and transform data from events. We will introduce them here, explaining Pipes' type system and how aggregations work. For detailed information on expressions, please refer to Chapter 4: Expressions, Chapter 5: Operators, Chapter 6: Scalar Functions and Chapter 7: Aggregation Functions.

The Pipes language is statically typed, but has a very simple type system. There are four main types: number, string, boolean and object. All types inherit from object. There are a few other types, used in some specific functions. The table below summarizes all types:

type	java equivalent	example
number	java.lang.Double	42, 42.0, 1e42
string	java.lang.String	"42"
boolean	java.lang.Boolean	true, false
object	java.lang.Object	null
comparable	java.lang.Comparable
row	net.intelie.pipes.Row
row_list	net.intelie.pipes.RowList

Pipes does not require events to be strongly typed (although it allows them to be). Because of this, the type of a property access may not be known at compile time. Still, the language is statically typed, so some type must be assigned to the expression. The chosen type depends on the configuration, but the default is string. This means that even if the field in the event holds a numeric value, it will be cast to a string before being used by the engine.

Types are checked at compile time. For example, passing a string to a function that requires a number (like avg(someproperty)) results in an error:

PipeException: Error in call avg(someproperty) with types <PropertyAccess[string]>, cause(s):
AvgAggregation: The parameter #1 <someproperty> must be 'number' instead of 'string'.

It is possible to tell the engine what type an expression must have, using the type's name as a function. E.g. number(someproperty) indicates that someproperty is a numeric value, and coerces it to a number when it isn't. When used with other expressions, those functions also convert values: number("42") will just return 42.

The string and number types also have shorthands for these functions:

number(someproperty) is the same as someproperty#
string(someproperty) is the same as someproperty$

Every Pipes expression can be either a scalar or an aggregation. We call this property its level.

Scalars are expressions that receive an event and compute their result immediately. E.g. to get the length of a string, one can write len(someproperty). In this case, the whole expression is a scalar.

Aggregations are expressions that receive several events but only compute their result at the end of the window. E.g. to get the average of a property during the window, one can write avg(someproperty#). In this case, the whole expression is an aggregation.

When a scalar has only constant values (and operations over them), we say it is a constant. E.g. the literal value "some value" is a constant. len("some value") is also a constant, and may be optimized at compile time to the value 10.

Most functions just assume the same level as some (sometimes all) of their parameters. E.g. in the expression len(first(someproperty)) the resulting expression is an aggregation, because first(someproperty) is also an aggregation. When a function always results in an aggregation, we refer to it simply as an aggregation.

Some functions (mostly aggregations) require parameters to have a specific level. E.g. in the aggregation avg(<number>, [<number weight>]) → <number>, the parameter must be a scalar. For example, trying to compile avg(first(someproperty#)) results in an error:

PipeException: Error in call avg(first(someproperty#)) with types <FirstAggregation[number]>, cause(s):
AvgAggregation: The parameter #1 <first(someproperty#)> must be a scalar instead of an aggregation.

For most purposes, every constant is also a scalar (it just ignores the event) and every scalar may act like an aggregation, if required to.

[Figure: Level set diagram]
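The level rules above can be modelled as a small ordering in which constant is contained in scalar, and scalar in aggregation. The sketch below is a simplification with hypothetical names: it assumes a function takes the highest level among its parameters, whereas the text says only that most functions follow some (sometimes all) of them.

```python
from enum import IntEnum


class Level(IntEnum):
    CONSTANT = 0
    SCALAR = 1
    AGGREGATION = 2


def result_level(*param_levels, always_aggregation=False):
    """Level of a call, given the levels of its parameters.

    Assumption: the call inherits the highest parameter level, unless
    the function is itself an aggregation (e.g. first, avg).
    """
    if always_aggregation:
        return Level.AGGREGATION
    return max(param_levels, default=Level.CONSTANT)


def check_scalar_param(level):
    """Reject aggregations where a scalar (or constant) is required."""
    if level > Level.SCALAR:
        raise TypeError("parameter must be a scalar instead of an aggregation")


len_of_literal = result_level(Level.CONSTANT)    # len("some value")
len_of_property = result_level(Level.SCALAR)     # len(someproperty)
first_of_property = result_level(Level.SCALAR, always_aggregation=True)
len_of_first = result_level(first_of_property)   # len(first(someproperty))
```

Calling check_scalar_param(first_of_property) raises an error, which corresponds to the rejection of avg(first(someproperty#)) shown above.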

One of the main features in Pipes is the ability to run aggregations in a distributed fashion. To make this possible, most aggregations are written using parallel versions of the algorithms. Most of the solutions presented in this section only affect how semi-safe pipes (see Section 1.2: Chained computation model) represent their result over the wire. Additionally, this same mechanism is used by windows in the default pipe (see Section 1.3: Output rates and aggregation windows) and meta-aggregations (e.g. overlast, overall) to merge the results from many outputs.

Let's take the average aggregation as an example. Suppose two machines calculate the query type:http => avg(response_time#) every minute in parallel, and share their results every minute. Suppose:

Machine 1 received its events and calculated an average response time of 237.44
Machine 2 received its events and calculated an average response time of 1061.08

Using only this information, it is impossible to know the global average value. If we just take the mean of both results (649.26), this value could be wrong, because machine 1 could have received far more events than machine 2, so its result should weigh more in the global result.

That's the reason why every aggregation has an internal state representation more complex than just its result. In the case of average aggregation its state representation has two fields: "mean" and "sumw".

mean holds the current result
sumw holds the sum of all weights (or the count, in the case of unweighted average)

[Figure: Aggregations state representation]

Some aggregations have a more complex inner state, like the distinct count aggregation, that uses the HyperLogLog data structure and must share the sketch content with the other nodes to merge its results. But even that state uses constant memory for its representation.

There is a meta-aggregation in Pipes called describe that operates over other aggregations but, instead of yielding the result at the end of the window, shows a JSON representation of the aggregation's inner state. Using the same example as before, writing describe(avg(response_time#)) would return a value like:

{"sumw":262.0,"mean":416.628854962}

Every query in Pipes is required to start with a filter. The filter runtime is optimized to allow many queries running over the same stream, sharing as much processing as possible.

The filter syntax is different from the rest of the language. It is inspired by Lucene's query syntax.

Examples:

    type:http status:404

Selects all records where the field "type" has the value "http" and the field "status" has the value "404"

Special filter that allows all records through.

When used in conjunction with field syntax, selects all records that contain that field.

Examples:

    *

All records (no filter)

    type:*

Selects records that contain any value in the field "type".

Selects records where one of the current fields matches <term>.

The match is case insensitive.

The use of wildcards (* and ?) is allowed.

Use the field syntax to change the default field.

Examples:

    http

If the default field is "somefield", selects records where "somefield" has a value "http" (case insensitive)

    otherfield:http

Selects records where "otherfield" has a value "http" (case insensitive)

    otherfield:http*404

Selects records where "otherfield" has a value that starts with "http" and ends with "404" (case insensitive)
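One way to picture wildcard term matching is to translate the term into a regular expression. The sketch below assumes the Lucene convention that * stands for any sequence of characters and ? for exactly one character, and that the term must match the whole field value; the function names are hypothetical and this is not how the engine is implemented (the text above describes automata shared between filters).

```python
import re


def term_to_regex(term):
    """Translate a filter term with * and ? wildcards into a regex."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(".*")   # any sequence of characters (assumed)
        elif ch == "?":
            parts.append(".")    # exactly one character (assumed)
        else:
            parts.append(re.escape(ch))
    # The match is case insensitive.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def term_matches(term, value):
    """Return True if the whole value matches the term."""
    return term_to_regex(term).fullmatch(str(value)) is not None
```

For example, term_matches("http*404", "HTTP status 404") is true, while term_matches("http", "https") is false because the whole value has to match.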

Selects records where one of the current fields matches <term> with at most <maxEdits> edits.

The match is case insensitive.

Wildcards are not allowed in this filter.

Examples:

    http~2

If the default field is "somefield", selects records where "somefield" has a value similar to "http" (e.g. "hxxp").
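The notion of "at most maxEdits edits" can be illustrated with the classic Levenshtein distance, which counts single-character insertions, deletions and substitutions. Whether Pipes uses exactly this metric (rather than, say, a variant that also counts transpositions as one edit) is not stated in the text, so treat the choice as an assumption; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, using two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term, value, max_edits):
    """Case-insensitive match allowing at most max_edits edits."""
    return edit_distance(term.lower(), str(value).lower()) <= max_edits
```

With this metric, "hxxp" is two substitutions away from "http", so it matches http~2 but would not match http~1.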

Selects records where one of the current fields is between <lower> and <upper>.

The match is case insensitive.

The range filter is inclusive in both ends. To make it exclusive, change "[" to "(" and/or "]" to ")".

The TO keyword must be written in uppercase. It can also be replaced by a single comma with no change in meaning.

Wildcards are not allowed in this filter. Except for the single *. In this filter it means an unbounded end.

There are shorthand syntaxes for some versions of this query:

[lower TO *] can be written as >= lower
(lower TO *] can be written as > lower
[* TO upper] can be written as <= upper
[* TO upper) can be written as < upper

Examples:

    status:[200 TO 299]

Selects records where status is between 200 and 299

    status:[200, 300)

Selects records where status is between 200 and 299 (300 exclusive)

    status:[200, *]

Selects records where status is greater than or equal to 200.

    status:>=200

Same as the filter above.
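The bracket rules for ranges can be sketched as a bounds check with separate inclusive flags for each end. In the sketch, None stands for the unbounded * end. The function name is hypothetical, and comparing values numerically is an assumption, since the text does not say how terms are ordered.

```python
def in_range(value, lower=None, upper=None, include_lower=True, include_upper=True):
    """Check whether value lies within a range.

    None as a bound means an unbounded end (the single * in the filter).
    '[' and ']' map to include_*=True, '(' and ')' to include_*=False.
    """
    if lower is not None:
        if value < lower or (value == lower and not include_lower):
            return False
    if upper is not None:
        if value > upper or (value == upper and not include_upper):
            return False
    return True


# status:[200 TO 299]
a = in_range(299, 200, 299)
# status:[200, 300)
b = in_range(300, 200, 300, include_upper=False)
# status:>=200, i.e. status:[200, *]
c = in_range(5000, 200, None)
```

Here a and c are true, while b is false because 300 sits on the exclusive upper bound.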

Sets the current field to <field>. Usually followed by <term> (e.g. somefield:someterm).

Any filter expression can be used inside a filter field. Even other filter fields.

Examples:

    type:http

Selects records where the field "type" is equal to "http".

    type:(http || cpu)

Selects records where the field "type" is equal to "http" or "cpu".

Selects the intersection of two other filters.

It's the default filter conjunction, so it can safely be omitted.

The operator && is equivalent to & and AND (must be uppercase).

Examples:

    http verb:get status:404

Selects records with the default field equal to "http", "verb" equal to "get" and "status" equal to "404"

    http && verb:get && status:404

Same as above, but with explicit &&

    tag:(important && resolved)

Selects records with the field "tag" containing both "important" and "resolved" values

Selects the union of two other filters.

The operator || is equivalent to | and OR (must be uppercase).

Examples:

    http || verb:get || status:404

Selects records with the default field equal to "http", "verb" equal to "get" or "status" equal to "404"

    tag:(important || resolved)

Selects records with the field "tag" containing either "important" or "resolved" values

Selects the complement of another filter.

The operator - is equivalent to ! and NOT (must be uppercase).

Examples:

    -verb:get

Selects records with the field "verb" different from "get"

    tag:(a* -abnormal)

Selects records with the field "tag" starting with "a", but not equal to "abnormal"
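The intersection, union and complement filters compose like ordinary predicates. The Python sketch below only illustrates that idea; the helpers has, both, either and negate are hypothetical names, and the handling of multi-valued fields is an assumption.

```python
def values_of(record, field):
    """Return the field's values as a list of lowercase strings."""
    raw = record.get(field, [])
    if not isinstance(raw, (list, tuple, set)):
        raw = [raw]
    return [str(v).lower() for v in raw]

def has(field, value):
    """field:value, compared case-insensitively."""
    return lambda record: value.lower() in values_of(record, field)

def both(f, g):    # f && g
    return lambda record: f(record) and g(record)

def either(f, g):  # f || g
    return lambda record: f(record) or g(record)

def negate(f):     # -f
    return lambda record: not f(record)

record = {"verb": "POST", "tag": ["important", "resolved"]}
print(both(has("tag", "important"), has("tag", "resolved"))(record))  # True
print(either(has("tag", "urgent"), has("tag", "resolved"))(record))   # True
print(negate(has("verb", "get"))(record))                             # True
```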

Pipes are the main building blocks of this language, which is named after them. Each pipe represents a single processing element that consumes events from its input and produces other events on its output.

The first pipe reads from the filtered public stream. Then, the output from one pipe can be chained as the input to another using the operator =>.

Each kind of pipe has its own properties, like rate of output and data window type and size. Some pipes can run in a distributed environment, some cannot. Some pipes consume very little memory, some need a lot of memory to run. All of these properties are derived and checked at compile time.

Examples:

type:http 
=> avg(response_time#) as time by host every minute
=> @top 10, time desc

Outputs every minute the 10 hosts with greatest average response time.
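To make the chain above concrete, here is a batch-style Python sketch of what one output could contain. It is an approximation under assumptions: it processes a single one-minute window that is already collected in memory, the function name top_hosts_by_avg is hypothetical, and the real engine works on streams.

```python
from collections import defaultdict

def top_hosts_by_avg(events, k=10):
    """Filter type:http, average response_time by host, keep the k slowest hosts."""
    totals = defaultdict(lambda: [0.0, 0])       # host -> [sum, count]
    for event in events:
        if event.get("type") != "http":          # filter stage: type:http
            continue
        acc = totals[event["host"]]
        acc[0] += float(event["response_time"])  # response_time# coerces to number
        acc[1] += 1
    rows = [{"host": host, "time": total / count}
            for host, (total, count) in totals.items()]
    rows.sort(key=lambda row: row["time"], reverse=True)  # @top ..., time desc
    return rows[:k]

window = [
    {"type": "http", "host": "a", "response_time": "100"},
    {"type": "http", "host": "a", "response_time": "300"},
    {"type": "http", "host": "b", "response_time": "50"},
    {"type": "cpu", "host": "c", "response_time": "999"},
]
print(top_hosts_by_avg(window, k=1))  # [{'host': 'a', 'time': 200.0}]
```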

Transforms or aggregates records over configurable data window and output.

This is the default type of pipe. When the query outputs and over which data it aggregates depend on what is configured in the over and every clauses.

Concatenates the outputs from two pipes with compatible output rates.

Computes the cartesian product of the outputs from two pipes with compatible output rates.

Filters the results from previous pipe.

Sorts the results from previous pipe.

Sorts the results and gets the first k rows (possibly grouped) from previous pipe.

Compresses the result from the previous pipe to at most k (or k2) most important rows.

Extracts one field of the stream to be the output event.

Keeps the latest batch of input events and outputs it at the end.

Marks that any pipe executed after this must run in a non-distributed environment.


Operators are elements of syntax that allow simple manipulation of scalar and aggregation values.

Every operator translates directly to a function call. E.g. the expression 2+2==4 is equivalent to .eq(.add(2, 2), 4).

Below is a table with the operator group precedence, from highest to lowest. Operators inside the same group have the same precedence.

Group	Operators
Primary	<row> -> <identifier> → <object> <object># → <number> <object>$ → <string>
Unary	not <boolean> → <boolean> -<number> → <number>
Null coalescing	<object> ?? <object> → <object>
Power	<number> ** <number> → <number>
Multiplicative	<number> * <number> → <number> <number> / <number> → <number> <number> // <number> → <number> <number> % <number> → <number>
Additive	<number> + <number> → <number> <number> - <number> → <number> <string> + <string> → <string>
Comparative	<comparable> > <comparable> → <boolean> <comparable> >= <comparable> → <boolean> <comparable> < <comparable> → <boolean> <comparable> <= <comparable> → <boolean>
Equality	<object> == <object> → <boolean> <object> != <object> → <boolean>
Logical AND	<boolean> and <boolean> → <boolean>
Logical XOR	<boolean> xor <boolean> → <boolean>
Logical OR	<boolean> or <boolean> → <boolean>
Ternary	<boolean> ? <object>, <object> → <object>

Only the ternary operator is right-associative. All other operators are left-associative. You can change both precedence and associativity by using parentheses. E.g. a+b*c is equivalent to a+(b*c).
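The translation of operators into function calls can be mimicked with Python's operator module. This is only an analogy: the Python functions stand in for the Pipes functions .eq, .add, .mul and .pow, and the last line takes the left-associativity rule above literally for the power operator.

```python
import operator

# 2+2==4 read as .eq(.add(2, 2), 4)
print(operator.eq(operator.add(2, 2), 4))   # True

a, b, c = 2, 3, 4
# a+b*c groups as a+(b*c) because * binds tighter than +
print(operator.add(a, operator.mul(b, c)))  # 14
# Parentheses override precedence: (a+b)*c
print(operator.mul(operator.add(a, b), c))  # 20

# Left-associative power: 2**3**2 read as (2**3)**2.
# Python itself groups ** to the right, so the calls are spelled out here.
print(operator.pow(operator.pow(2, 3), 2))  # 64
```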

Coerces the expression to number. Shorthand to <object>:number() → <number>.

The expression a# is equivalent to number(a).

Coerces the expression to string. Shorthand to <object>:string() → <string>.

The expression a$ is equivalent to string(a).

Adds two numbers.

The expression a+b is equivalent to .add(a, b).

Concatenates two strings.

The expression a+b is equivalent to .add(a, b).

Subtracts one number from another.

The expression a-b is equivalent to .sub(a, b).

Multiplies two numbers.

The expression a*b is equivalent to .mul(a, b).

Divides one number by another (float division).

The expression a/b is equivalent to .div(a, b).

Divides one number by another (integer division).

The expression a//b is equivalent to .intdiv(a, b).

Raises one number to another's power.

The expression a**b is equivalent to .pow(a, b) or even pow(a, b).

Returns the remainder of the division of one number by another.

The expression a%b is equivalent to .mod(a, b).

Negates one number.

The expression -a is equivalent to .neg(a).

Returns the logical AND of two booleans.

The expression a and b is equivalent to a&b, a&&b and .and(a, b).

Returns the logical OR of two booleans.

The expression a or b is equivalent to a|b, a||b and .or(a, b).

Returns the logical XOR of two booleans.

The expression a xor b is equivalent to a^b and .xor(a, b).

Returns the logical NOT of a boolean.

The expression not a is equivalent to !a and .not(a).

Checks whether two objects are equal.

The expression a==b is equivalent to .eq(a, b).

Checks whether two objects are not equal.

The expression a!=b is equivalent to .neq(a, b).

Checks whether the left operand compares less than the right one.

The expression a<b is equivalent to .lt(a, b).

Checks whether the left operand compares less than or equal to the right one.

The expression a<=b is equivalent to .lteq(a, b).

Checks whether the left operand compares greater than the right one.

The expression a>b is equivalent to .gt(a, b).

Checks whether the left operand compares greater than or equal to the right one.

The expression a>=b is equivalent to .gteq(a, b).

Extracts a field from a strongly-typed row value.

The expression a->identifier cannot be represented as a method call because the method .peek() requires a direct instance of java.lang.String.

Example:

* 
=> name:regex(r'^(?<first>\w+)( \w+)* (?<last>\w+)$') as match 
=> '%s, %s':format(match->last, match->first) as converted

Converts the name field to the format "Last, First".

Returns the first if it is not null; otherwise, returns the second.

The expression a??b is equivalent to .coalesce(a, b).

Example:

    * => first(name) ?? 'None' as first_name every second

Yields the first name in the window, or 'None' if no name was found.
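The rule is small enough to restate in Python. In this sketch the function name coalesce mirrors the Pipes function, and only None is treated as null; how Pipes treats other "empty" values is not covered here.

```python
def coalesce(first, second):
    """Return first unless it is null (None in this sketch), else second."""
    return first if first is not None else second

print(coalesce(None, "None"))   # None (the string)
print(coalesce("Alice", "None"))  # Alice
```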

If the condition is true, returns the first object; otherwise, returns the second.

The expression a?b,c is equivalent to .iif(a, b, c).

Please note this is the only right-associative operator. This allows constructions like:

AllMonitorings
=> (type == 'http' ? 'Http Monitoring',
    type == 'cpu' ? 'CPU Monitoring',
    type == 'mem' ? 'Memory Monitoring',
    'Unknown') as monitoring type

Example:

    * => count()%2==0 ? 'Even', 'Odd' as type every second

Yields 'Even' or 'Odd' every second, depending on the number of events in the window.
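Right associativity means that a ? b, c ? d, e groups as a ? b, (c ? d, e). The Python sketch below spells out that grouping with a hypothetical iif helper; unlike a real conditional, this helper evaluates all of its arguments eagerly.

```python
def iif(condition, if_true, if_false):
    """Sketch of .iif(a, b, c): pick one of two already evaluated values."""
    return if_true if condition else if_false

def monitoring_type(type_):
    # Each nested iif is the "else" branch of the one before it.
    return iif(type_ == "http", "Http Monitoring",
               iif(type_ == "cpu", "CPU Monitoring",
                   iif(type_ == "mem", "Memory Monitoring",
                       "Unknown")))

print(monitoring_type("cpu"))         # CPU Monitoring
print(monitoring_type("disk"))        # Unknown
print(iif(4 % 2 == 0, "Even", "Odd"))  # Even
```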

Calculates the absolute value of a number.

Returns the arc cosine of the argument to an angle in radians.

Returns the arc sine of the argument to an angle in radians.

Returns the arc tangent of the argument to an angle in radians.

Formats a number as the best possible byte multiple.

Returns the smallest number that is greater than or equal to the argument.

Returns the cosine of an angle in radians.

Adds <amount> <unit>s of date to timestamp argument.

Rounds timestamp down to the nearest date that is divisible by <amount> <unit>s.

Formats timestamp using the specified format.

Subtracts <amount> <unit>s of date from timestamp argument.

Calculates the exponential of a number.

Returns the largest number that is less than or equal to the argument.

Formats a number according to format string and locale.

Calculates the logarithm of a number.

Raises one number to another.

Rounds a number to <precision> decimal places.

Selects the ith element from a list of arguments, or null if it doesn't exist.

Returns the sine of an angle in radians.

Calculates end timestamp of span based on target.

Calculates start timestamp of span based on target.

Returns the tangent of an angle in radians.

Converts object to boolean.

Transforms the parameter using the translation rules defined in <pairs>.

Much like property[keys]. Works for strings, containers and arrays.

Returns the first index of the value in <list>, or null if <list> does not contain it.

Returns true if <list> contains the value, false otherwise.

Converts the object to its JSON string representation.

When used in a default pipe, delays or disables (if ttl not supplied or < 0) inactive group removal.

Tries to get <target>'s size. Works for strings, containers and arrays.

Converts object to number.

Casts any object to its canonical object representation.

Converts object to string.

Returns whether the target string contains the argument.

Parses timestamp using the specified format.

Returns whether the target string ends with the argument.

Uses the target string as format to arguments.

Evaluates compressed base64 HyperLogLog data.

Returns the index of <s> inside the target string, or null if it is not found.

Converts string to lowercase.

Parses a number according to format string and locale.

Returns a strongly typed row composed by all named groups in <regex>.

The result is a row value with a timestamp and as many fields as there are named groups in the specified regex.

Examples:

* 
=> expand name:regex(r'^(?<first>\w+)( \w+)* (?<last>\w+)$')

Extracts the first and the last names from a single field.

* 
=> name:regex(r'^(?<first>\w+)( \w+)* (?<last>\w+)$') as match 
=> '%s, %s':format(match->last, match->first) as converted

Converts the name field to the format "Last, First".
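For readers who want to try the pattern outside of Pipes, the same conversion can be approximated in Python. Note that Python spells named groups as (?P<name>...) rather than (?<name>...), and the function name last_first is hypothetical.

```python
import re

# Same pattern as in the example above, in Python's named-group syntax.
NAME = re.compile(r"^(?P<first>\w+)( \w+)* (?P<last>\w+)$")

def last_first(name):
    """Return 'Last, First', or None when the name does not match."""
    match = NAME.match(name)
    if match is None:
        return None
    return "%s, %s" % (match.group("last"), match.group("first"))

print(last_first("John Ronald Tolkien"))  # Tolkien, John
print(last_first("Ada Lovelace"))         # Lovelace, Ada
print(last_first("Plato"))                # None
```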

Returns the string matched by <regex> in target (or one specific group).

Returns true if the target matches <regex>, false otherwise.

Replaces all matches of <regex> in target by <replacement>.

Replaces all instances of <from> with the string <to>.

Returns whether the target string starts with the argument.

Returns the substring between the indices <from> and <to>.

Converts string to uppercase.

Returns a number < 0 if a < b, > 0 if a > b or 0 if a = b.

Merges many instances of compressed base64 HyperLogLog data.

Returns the greatest value of all supplied arguments.

Returns the least value of all supplied arguments.

Creates an instance of java.util.List with the supplied objects.

Creates an instance of java.util.Map with the supplied keys and values.

Returns the constant value of pi.

Returns a random value between <min> and <max>.

Returns a random value of at most <max> (1 if not defined).

Returns the most appropriate timestamp, whether in scalar or aggregation contexts.

Aggregates only events that evaluate true to <condition>.

Merges all the results from the target aggregation.

Merges the results of the last <window> aggregations.

Delays and returns the previous <number>th result from target aggregation.

Returns true if all occurrences evaluate true.

Returns true if any occurrence evaluates true.

Calculates the (possibly weighted) average of some expression.

Estimates the field's cardinality (distinct count) using HyperLogLog.

Yields a JSON string explaining the target aggregation's inner state representation.

Yields the occurrence with least timestamp.

Yields the greatest occurrence in the window based on some comparable.

Similar to dcount, but allows configuration of log2m parameter.

Performs union of many HyperLogLog encoded data in a window.

Similar to hll, but it doesn't evaluate final cardinality, it just returns the sketch data.

Joins all the strings in a window.

Yields the occurrence with greatest timestamp.

Yields the least occurrence in the window based on some comparable.

Creates a java.util.Map from all events in a window.

Yields the greatest occurrence in the window.

Estimates the median value of the population using Count-Min Sketch.

Yields the least occurrence in the window.

Aggregates the proportion of events that evaluate true to expression.

Estimates the q (0..1) quantile of the population using Count-Min Sketch.

Creates a java.util.Set from all events in a window.

Smooths the curve of another aggregation.

Calculates the (possibly weighted) standard deviation of some expression.

Sums all evaluations of some expression.

Calculates the (possibly weighted) variance of some expression.

Yields the latest timestamp inside window when some condition was true.

Yields how many outputs are merged in the current window.

Yields the minimum allowed timestamp or item for the current window.

Yields the maximum allowed timestamp or item for the current window.

Yields the minimum allowed timestamp or item for the current output.

Yields the maximum allowed timestamp or item for the current output.

Yields the timestamp when the output was merged (useful for item batch pipes).

Pipes comes bundled with a timespan definition language that helps define relative dates with an almost natural language syntax. A timespan is an expression that, when provided with a reference timestamp, can calculate the start and the end of a relative period.

For example, the expression "current month" will return:

(2014-04-01 00:00:00, 2014-05-01 00:00:00) when provided with the timestamp for (2014-04-11 14:56:20)
(2014-05-01 00:00:00, 2014-06-01 00:00:00) when provided with the timestamp for (2014-05-11 14:56:20)

Please notice that span intervals are always right-open.
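The two results above can be reproduced with a short Python sketch. It is an illustration of the right-open convention only, with a hypothetical function name current_month and naive datetimes (time zones are ignored).

```python
from datetime import datetime

def current_month(ref):
    """Return (start, end) of the month enclosing ref; end is exclusive."""
    start = ref.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end

start, end = current_month(datetime(2014, 4, 11, 14, 56, 20))
print(start, end)  # 2014-04-01 00:00:00 2014-05-01 00:00:00
```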

The language is almost entirely right-associative. This allows the user to define complex relative dates e.g.

    first day in the week before the 2nd month of this year

Equivalent to January 25 of the current year.

Assuming now as 2014-05-11 14:56:20, reading from right to left:

expression	start	end
this year	2014-01-01 00:00:00	2015-01-01 00:00:00
2nd month of...	2014-02-01 00:00:00	2014-03-01 00:00:00
the week before...	2014-01-25 00:00:00	2014-02-01 00:00:00
first day in...	2014-01-25 00:00:00	2014-01-26 00:00:00
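The right-to-left reading in the table can be replayed as nested function calls. The Python sketch below is a hand-written approximation of those four steps, not the span parser; all function names are hypothetical and time zones are ignored.

```python
from datetime import datetime, timedelta

def add_months(first_of_month, n):
    """Move a first-of-month datetime forward by n months."""
    total = first_of_month.month - 1 + n
    return first_of_month.replace(year=first_of_month.year + total // 12,
                                  month=total % 12 + 1)

def this_year(ref):
    return datetime(ref.year, 1, 1), datetime(ref.year + 1, 1, 1)

def nth_month_of(n, span):
    start = add_months(span[0], n - 1)  # assumes span starts on a month boundary
    return start, add_months(start, 1)

def week_before(span):
    return span[0] - timedelta(weeks=1), span[0]

def first_day_in(span):
    return span[0], span[0] + timedelta(days=1)

now = datetime(2014, 5, 11, 14, 56, 20)
# Innermost call is the rightmost part of the expression.
result = first_day_in(week_before(nth_month_of(2, this_year(now))))
print(result[0], result[1])  # 2014-01-25 00:00:00 2014-01-26 00:00:00
```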

Periods are the units of time that power span definitions.

Some spans accept only a single period definition, like current|this <period> (e.g. current day). Others accept multiple period definitions, like last <period...> (e.g. last 4 days, 2 hours and 1 minute). In this documentation <period> means single periods only and <period...> means one or more periods.

A single period is composed of (optionally) an amount and a unit, e.g.:

day
1 day
a day
the day

A multiple period is composed of many single periods separated by commas or 'and', e.g.:

1 day and 5 hours
a day, an hour and 5 minutes

The available period units are:

unit	also accepts	equivalent to
millisecond	milli, ms
second	sec	1000 ms
minute	min	60000 ms
hour		3600000 ms
day
week	wk
month	mon
bimester		2 months
quarter		3 months
semester		6 months
year	yr

Spans are the final product of this language. A span can compute the start and end of an interval, given a reference timestamp (usually now). Spans are always right-open: the span "current month" ends at the first timestamp of the next month, not at the last timestamp of this month.

With respect to length, spans can either be a full interval or a point (when start == end).

With respect to the reference point, spans can be either relative or fixed (when neither start nor end depend on the reference).

examples	interval	point
relative	last hour	1 hour ago
fixed	2014-02-01 to 2014-02-28	timestamp 1397243504000

(point, relative) Returns the reference timestamp.

(interval, relative) Equivalent to "current day"

(interval, fixed) Returns the interval relative to the selected date.

The length of the interval will depend on the precision defined.

Examples:

    2014-04-10 13:42

Returns the entire minute 2014-04-10 13:42

    2014-04-10

Returns the entire day 2014-04-10

    2014-04

Returns the entire month 2014-04
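The effect of precision on interval length can be sketched in Python. This is an illustration under assumptions: only the three formats shown in the examples are handled, the function name fixed_span is hypothetical, and the end of each interval is exclusive.

```python
from datetime import datetime, timedelta

def fixed_span(text):
    """Return (start, end) for a date written with minute, day or month precision."""
    for fmt, precision in (("%Y-%m-%d %H:%M", "minute"),
                           ("%Y-%m-%d", "day"),
                           ("%Y-%m", "month")):
        try:
            start = datetime.strptime(text, fmt)
        except ValueError:
            continue  # text does not have this precision, try the next one
        if precision == "minute":
            return start, start + timedelta(minutes=1)
        if precision == "day":
            return start, start + timedelta(days=1)
        if start.month == 12:
            return start, start.replace(year=start.year + 1, month=1)
        return start, start.replace(month=start.month + 1)
    raise ValueError("unsupported date format: %r" % text)

print(fixed_span("2014-04-10 13:42"))
print(fixed_span("2014-04-10"))
print(fixed_span("2014-04"))
```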

(point, fixed) Returns the point with the specified timestamp.

The resulting span is a point, not a 1-millisecond interval.

Examples:

    ts 1397246205000

    timestamp 1397246205

(<both>, <both>) Returns a span from the beginning of the first span to the end of the second.

The result span can be anything. It depends on the parameters.

Examples:

    from yesterday to today

From the beginning of yesterday until the end of today.

    since previous year

From the beginning of previous year until now.

(interval, relative) Equivalent to "<period...> before now".

Please notice that spans like "last year" are not equivalent to "previous year". The former means the period of 1 year ending now. The latter means the 1 year period ending in the beginning of this year.

Examples:

    last day

From this same second yesterday until now.

    last 3 days, 4 hours and 5 minutes

From 3 days, 4 hours and 5 minutes ago until now.
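The difference between "last year" and "previous year" described above is easy to see with concrete timestamps. The Python sketch below uses hypothetical function names and naive datetimes; it does not handle a reference date of February 29.

```python
from datetime import datetime

def last_year(ref):
    """One year ending now (ref must not be Feb 29 in this sketch)."""
    return ref.replace(year=ref.year - 1), ref

def previous_year(ref):
    """The full calendar year that ends at the beginning of this year."""
    return datetime(ref.year - 1, 1, 1), datetime(ref.year, 1, 1)

now = datetime(2014, 5, 11, 14, 56, 20)
print(last_year(now))      # 2013-05-11 14:56:20 until 2014-05-11 14:56:20
print(previous_year(now))  # 2013-01-01 00:00:00 until 2014-01-01 00:00:00
```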

(interval, relative) Returns the full period enclosing the reference timestamp.

Examples:

    current day

From the 0-hour of today to the 0-hour of tomorrow.

    current year

From Jan 1 of this year to Jan 1 of next year.

(interval, relative) Returns the period before the "current <period>"

Examples:

    yesterday

From the 0-hour of yesterday to the 0-hour of today.

    previous year

From Jan 1 of the previous year to Jan 1 of this year.

(<both>, <both>) The full selected period ending in the beginning of the referenced span.

If the period is defined, it is an interval.

It will be relative or fixed depending on which span is used.

Examples:

    the day before today

Equivalent to yesterday

    2 years and 4 months before this month

Period of 2 years and 4 months ending in the beginning of this month.

    before this month

A point span with the first timestamp of this month.

(<both>, <both>) The full selected period starting at the end of the referenced span.

If the period is defined, it is an interval.

It will be relative or fixed depending on which span is used.

Examples:

    2 days after previous month

Selects the first 2 days of current month.

    after this month

A point span with the first timestamp of the next month.

(interval, <both>) Selects the nth period inside some span.

It will be relative or fixed depending on which span is used.

Examples:

    2nd day of this week

Selects this week's Tuesday.

    first day of previous month

Entire day 1 of previous month.

(point, relative) Selects the exact timestamp of the defined period in the past.

Examples:

    2 days ago

This exact timestamp, but 2 days ago.

(interval, <both>) Selects the full period enclosing the referenced span.

It will be relative or fixed depending on which span is used.

Examples:

    the week of 40 days ago

Selects the full week of 40 days ago, from Monday to Sunday.

    the quarter of 2014-04-11

Selects the entire quarter that contains the day 2014-04-11.

(<both>, <both>) Shifts the selected span by <period> in the past.

Examples:

    now shifted by 2 days

Equivalent to 2 days ago.

    this week shifted by 2 months

Selects the current week and shifts the timestamps 2 months in the past.

(<both>, <both>) Calculates the span changing the reference using another span.

Examples:

    this week shifted to 40 days ago

The span this week as if the reference was 40 days ago.

    this week shifted to 2014-01-01

The week of the first day of 2014.

(<both>, <both>) Extends either the start or the end of a span by a percentage of its length.

Examples:

    this week extend right 20%

Current week + ~33.6h (20% of 7 days).
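The arithmetic behind this example can be checked in Python. The sketch assumes that weeks start on Monday, as the other examples in this reference suggest, and uses hypothetical function names this_week and extend_right.

```python
from datetime import datetime, timedelta

def this_week(ref):
    """Week enclosing ref, Monday 00:00 to next Monday 00:00 (assumed week start)."""
    midnight = ref.replace(hour=0, minute=0, second=0, microsecond=0)
    start = midnight - timedelta(days=midnight.weekday())
    return start, start + timedelta(weeks=1)

def extend_right(span, percent):
    """Push the end of the span forward by percent of the span's length."""
    start, end = span
    return start, end + (end - start) * percent / 100

now = datetime(2014, 5, 11, 14, 56, 20)
start, end = extend_right(this_week(now), 20)
print(start, end)  # 2014-05-05 00:00:00 2014-05-13 09:36:00
```

20% of 7 days is 1.4 days, that is 33.6 hours, so the end moves from Monday 00:00 to Tuesday 09:36.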

Page generated at 17-Apr-2014 18:55:51
