
libSet.rts: Sets in TrafficScript

by chrisboyle on 03-20-2013 07:34 AM

Here (attached) is a library that uses TrafficScript arrays and hashes to provide another new data structure, the set: an unordered collection in which each element occurs either once or not at all. An example use would be "words I have seen on the page".

The trick to implementing this is to realise that TrafficScript already has an efficient set implementation: the hash (associative array). Specifically, you can put your data in the keys of the hash and use an arbitrary constant as the value. Inserting, deleting and checking membership then each come down to a single hash-key operation, so they are all fast. You could use this trick directly, on an ad-hoc basis, in individual rules, but this library improves readability and provides some type checking.
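To make the trick concrete, here is a minimal sketch of the same idea in Python rather than TrafficScript. A Python dict stands in for the TrafficScript hash, and the word list is made up purely for illustration:

```python
# Python analogue of the hash-as-set trick: the members live in the
# dict's keys, and every value is the same arbitrary constant (1).
seen = {}

for word in ["foo", "bar", "foo", "123"]:
    seen[word] = 1          # inserting a duplicate just overwrites the same key

print("foo" in seen)        # membership is a key lookup: True
seen.pop("bar", None)       # delete a member without failing if it is absent
print(sorted(seen))         # the keys are the members: ['123', 'foo']
```

Because duplicates collapse onto one key, the dict never holds an element more than once, which is exactly the set property.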

If you're curious and use lang.dump($some_set) to inspect the data structure, you'll see something like this (note that the order of hash elements is arbitrary):


[ "type" => "set", "values" => [ "foo" => 1, "123" => 1, "bar" => 1 ] ]



One limitation of this structure is that only scalars can be members of sets, since only scalars can be hash keys. In this library, if you insert an array, each element will be inserted, and if you try to insert a hash, you'll get a warning and nothing will be inserted.
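The structure and insertion rules described above can be sketched in Python as follows. This is not the library's source: `new_set`, `is_set` and `insert` are hypothetical names, Python's `warnings` module stands in for a TrafficScript warning, and nested arrays are flattened recursively, which is an assumption the library may not share.

```python
import warnings

def new_set():
    # Same shape as the lang.dump output: a type tag plus a dict of members.
    return {"type": "set", "values": {}}

def is_set(value):
    return isinstance(value, dict) and value.get("type") == "set"

def insert(s, value):
    if is_set(value):
        # Another set: copy all of its members across.
        s["values"].update(value["values"])
    elif isinstance(value, list):
        # An array: insert each element (recursively, in this sketch).
        for item in value:
            insert(s, item)
    elif isinstance(value, dict):
        # A plain hash cannot be used as a key: warn and insert nothing.
        warnings.warn("cannot insert a hash into a set")
    else:
        # A scalar: store it as a key with a constant value.
        s["values"][value] = 1

s = new_set()
insert(s, ["foo", "bar"])
insert(s, "foo")            # already present, so nothing changes
insert(s, {"a": 1})         # emits a warning; the set is unchanged
print(sorted(s["values"]))  # ['bar', 'foo']
```

The type tag is what lets `insert` tell a set (which is itself a hash) apart from an ordinary hash that must be rejected.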

The library includes the following functions:

set.new()                      Returns a new (empty) set.
set.destroy( $set )            Destroy a set.
set.insert( $set, $value )     Insert a value (or another set or an array of values) into the set.
set.remove( $set, $value )     Remove a value (or set or array of values) from the set.
set.contains( $set, $value )   Check if the set contains a particular value.
set.toarray( $set )            Return all the values in the set.
set.empty( $set )              Empty a set.
set.union( $a, $b )            Returns the set of elements that are in $a or $b.
set.intersection( $a, $b )     Returns the set of elements that are in $a and $b.
set.difference( $a, $b )       Returns the set of elements that are in $a and not in $b.
set.count( $set )              Count the number of items in the set.
set.subseteq( $a, $b )         Check if $a is a (non-strict) subset of $b.
set.superseteq( $a, $b )       Check if $a is a (non-strict) superset of $b.
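With members stored as hash keys, the set operations in the list above reduce to simple loops over keys. The following Python sketch shows how they might work on a dict-backed representation; `make_set`, `union`, `intersection`, `difference`, `count`, `subseteq` and `superseteq` are hypothetical Python stand-ins for the TrafficScript functions, not the library's code.

```python
def make_set(values=()):
    # A set is a type tag plus a dict whose keys are the members.
    return {"type": "set", "values": {v: 1 for v in values}}

def union(a, b):
    # Members of a or b: merge both key collections.
    return {"type": "set", "values": {**a["values"], **b["values"]}}

def intersection(a, b):
    # Members of a and b: keep a's keys that b also contains.
    return make_set(k for k in a["values"] if k in b["values"])

def difference(a, b):
    # Members of a but not of b.
    return make_set(k for k in a["values"] if k not in b["values"])

def count(s):
    return len(s["values"])

def subseteq(a, b):
    # Non-strict subset: every member of a is in b (a set is a subset of itself).
    return all(k in b["values"] for k in a["values"])

def superseteq(a, b):
    return subseteq(b, a)

a = make_set(["riverbed", "news", "articles"])
b = make_set(["news", "weather"])
print(sorted(union(a, b)["values"]))         # ['articles', 'news', 'riverbed', 'weather']
print(sorted(intersection(a, b)["values"]))  # ['news']
print(sorted(difference(a, b)["values"]))    # ['articles', 'riverbed']
print(count(a), subseteq(a, a), superseteq(a, b))  # 3 True False
```

Each operation makes one pass over one set's keys and does a constant-time lookup in the other, so the cost grows with the size of the sets rather than with the square of it.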

To use it, add the library to your TrafficScript rules catalog, and then, in another rule, use:


import libSet.rts as set;



and all the 'set' functions above will be available.

Here's an example of how you could use it. This rule reads the response, so it should run as a response rule. It expects the words in $target to appear somewhere on the page and writes a log line listing any that are missing.


import libSet.rts as set;

# Only inspect HTML responses.
$ctype = http.getResponseHeader( "Content-Type" );
if( ! string.startswith( $ctype, "text/html" ) ) break;

# The words we expect to find on the page.
$target = set.new();
set.insert( $target, ["riverbed","news","articles"] );

# Collect every word in the lowercased response body.
# (set.insert also accepts an array, so this loop could be a single call.)
$used = set.new();
$words = string.split(string.lowercase(http.getResponseBody()));
foreach( $w in $words ) {
   set.insert( $used, $w );
}

# Expected words that never appeared on the page.
$unused = set.toarray( set.difference( $target, $used ));
if( array.length( $unused ) ) {
   log.info( http.getPath().": " . array.join(array.sort($unused),", ") );