
XML, TrafficScript and Java Extensions

by mikeg_2 on 03-22-2013 10:39 PM - edited on 06-23-2015 10:30 AM by PaulWallace

Stingray allows you to inspect and manipulate both incoming and outgoing traffic with a customized version of Java's Servlet API. In this article we'll delve more deeply into some of the semantics of Stingray's Java Extensions and show how to validate XML files in both uploads and downloads using TrafficScript and Java Extensions.

 

The example that will allow us to illustrate the use of XML processing by Stingray is a website that allows users to share music play-lists. We'll first look at the XML capabilities of 'conventional' TrafficScript, and then investigate the use of Java Extensions.

 

A play-list sharing web site

 

You have spent a lot of time developing a fancy website where users can upload their personal play-lists, making them available to others who can then search for music they like and download it. Of course you went for XML as the data format, not least because it allows you to make sure uploads are valid. You can therefore help your users' applications by validating their XML files as they are uploaded. Also, to be on the safe side, whenever an application downloads an XML play-list, it should be checked and only reach the user if it passes validation.

 

XML provides the concept of schema files to describe what a valid document has to look like. One popular schema language is the W3C's XML Schema Definition (XSD), see http://www.w3.org/TR/xmlschema-0/. Given an XSD file, you can hand an XML document to a validator to find out whether it actually conforms to the data structure specified in the schema.
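To make the idea concrete, here is a minimal Java sketch of handing a document and a schema to a validator, using the standard javax.xml.validation API. The class name, the method name and the toy schema are made up for illustration; the schema is deliberately tiny and is not the XSPF schema used later in this article.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class XsdCheckSketch {
    // Hypothetical toy schema: a <playlist> holding one or more <title> strings.
    static final String TOY_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='playlist'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true if the XML text conforms to the XSD text, false otherwise.
    static boolean conforms(String xml, String xsd) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema;
        try {
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The schema itself is broken; that is a caller error, not a "no".
            throw new IllegalArgumentException("Unusable schema", e);
        }
        try {
            // With no ErrorHandler installed, validation errors are thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(conforms("<playlist><title>Blue</title></playlist>", TOY_XSD));
        System.out.println(conforms("<playlist><artist>X</artist></playlist>", TOY_XSD));
    }
}
```

The first document matches the toy schema; the second uses an element the schema does not allow and is rejected.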

 

Coming back to our example of a play-list sharing website, you have downloaded the popular xspf (XML Shareable Playlist Format, 'spiff') schema description from http://xspf.org/validation/. One of the tags allowed inside a track in XML files of this type is image. By specifying tags like <image>http://images.amazon.com/images/P/B000002J0B.01.MZZZZZZZ.jpg</image> a user could see the following pictures:

 

[Image: albums.png]

 

Validating XML with TrafficScript

 

How do you validate an XML file from a user against that schema? Stingray's TrafficScript provides the xml.validate.xsd() function. Here's a simple rule to check the response of a web server against an XSD:

 

$doc = http.getResponseBody();
$schema = resource.get( "xspf.xsd" );
$result = xml.validate.xsd( $doc, $schema );
if( 1 == $result ) {
   log.info( "Validation succeeded" );
} else if( 0 == $result ) {
   log.info( "Validation failed" );
} else {
   log.info( "Validation error" );
}

 

Let's have a closer look at what this rule does:

 

  1. First, it reads in the whole response by calling http.getResponseBody(). This function is very practical but you have to be extremely careful with it, because you do not know beforehand how big the response actually is. It might be an audio stream totaling many hundreds of megabytes in size, and you certainly don't want Stingray to buffer all that data. Therefore, when using http.getResponseBody() you should always check the MIME type and the content length of the response (see below for code that does this).
  2. Our rule then goes on to load the schema definition file with resource.get(). The file must be located in ZEUSHOME/zxtm/conf/extra/ for this step to work.
  3. Finally it does the actual validation and checks the result. In this simple example we are only logging the result; on your music-sharing web site you would have to take the appropriate action.
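The MIME-type half of that precaution can be sketched in Java as follows. The class and method names are made up for illustration, and the set of accepted types (text/xml, application/xml and anything with a "+xml" suffix, such as application/xspf+xml) is an assumption rather than something Stingray prescribes.

```java
import java.util.Locale;

final class XmlTypeSketch {
    // Decides from a Content-Type header value whether the body is worth
    // buffering for XML validation. A missing header counts as "not XML".
    static boolean looksLikeXml(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" before comparing.
        int semi = contentType.indexOf(';');
        String type = (semi >= 0 ? contentType.substring(0, semi) : contentType)
            .trim()
            .toLowerCase(Locale.ROOT);
        return type.equals("text/xml")
            || type.equals("application/xml")
            || type.endsWith("+xml");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeXml("application/xspf+xml; charset=utf-8"));
        System.out.println(looksLikeXml("audio/mpeg"));
    }
}
```

An audio stream is filtered out by its type before any of it is buffered; the size check is still needed for bodies that do claim to be XML.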

 

The last rule was a response rule that worked on the result from the back-end web server. These files are actually under your control (at least theoretically), so validation is not that urgent. Things are different if you allow uploads to your web site. Any user-provided data must be validated before you let it through to your back-ends. The following request rule does the XML validation for you:

 

$m = http.getMethod();
if( 0 == string.icmp( $m, "POST" ) ) {
   $clen = http.getHeader( "Content-Length" );
   if( $clen > 0 && $clen <= 1024*1024 ) {
      $schema = resource.get( "xspf.xsd" );
      $doc = http.getBody();
      $result = xml.validate.xsd( $doc, $schema );
      # handle result
   } else {
      # handle over-sized posts (or a missing Content-Length)
   }
}

 

Note how we first look at the HTTP method, then retrieve the length of the post's body and check it. That check, which is done in the line

 

if( $clen > 0 && $clen <= 1024*1024 ) {

 

...deserves a bit more comment: The variable $clen was initialized from the post's Content-Length header, so it could be the empty string at that stage. When TrafficScript converts data to integers, variables that do not actually represent numbers are converted to 0 (see the TrafficScript reference for more details). Therefore, we have to check that $clen is greater than zero and at most 1 megabyte (or whatever limit you choose to impose on the size of uploads). After checking the content length we can safely invoke http.getBody().
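The effect of that check on different header values can be illustrated with a small Java stand-in. This is only a sketch: the names are hypothetical, and the conversion here simply treats anything that is not a plain decimal integer as 0, which may differ from TrafficScript's real conversion rules in edge cases.

```java
final class ContentLengthSketch {
    static final long LIMIT = 1024L * 1024L; // 1 megabyte, as in the rule above

    // Simplified stand-in for the conversion described above:
    // a missing, empty or non-numeric header value counts as 0.
    static long toNumberOrZero(String header) {
        if (header == null) {
            return 0;
        }
        try {
            return Long.parseLong(header.trim());
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Mirrors the condition $clen > 0 && $clen <= 1024*1024.
    static boolean acceptable(String header) {
        long clen = toNumberOrZero(header);
        return clen > 0 && clen <= LIMIT;
    }

    public static void main(String[] args) {
        for (String h : new String[] { "", "abc", "0", "2048", "1048576", "1048577" }) {
            System.out.println("'" + h + "' -> " + acceptable(h));
        }
    }
}
```

The lower bound is what rejects the empty and non-numeric cases: both collapse to 0, so a single comparison covers "header missing", "header garbage" and "empty body".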

 

A malicious user might have faked the HTTP header to specify a length larger than the actual post. This would lead Stingray to try to read more data than the client sends, pausing on a file descriptor until the connection times out. Due to Stingray's non-blocking IO multiplexing, however, other requests would be processed normally.

 

Validating XML with Stingray's Java Extensions

 

After having explored TrafficScript's built-in XML support, let's now see how XML validation can be done using Java Extensions.

 

If you are at all familiar with Java Servlets, Stingray's Java Extensions should feel familiar. The main differences are:

 

  • You have a lot of Stingray's built-in functionality ready at hand via attributes.
  • You can manipulate both the response (as in conventional Servlets) and the request (unique to Stingray's Servlet Extensions as Stingray sits between the client and the server).

 

There's lots more detail in the Feature Brief: Java Extensions in Stingray Traffic Manager.

 

The interesting thing here is that the Servlet can actually be involved twice: first when the request is sent to the server (you can invoke the Java Extension from a request rule), and then again when the response is sent back to the client (allowing you to change the result from a response rule). This is very practical for your music-sharing web site as you only have to write one Servlet. However, you have to be able to tell whether you are working on the response or the request. The ZXTMHttpServletResponse object which is passed to both the doGet() and doPost() methods of the HttpServlet object has a method to find out which direction of the traffic flow you are currently in: boolean isResponseRule(). This distinction is never needed in conventional Servlet programming, because there it is the Servlet's task to create the response, not to modify an existing one.

 

These considerations make it easy to design the Stingray Servlet for your web site:

 

  • There will be an init() method to read in the schema definition and to set up the javax.xml.validation.Validator object.
  • We'll have a single private validate() method to do the actual work.
  • The doGet() method will invoke validate() on the server's response.
  • The doPost() method will do the same on the body of the request.

 

After all that theory it's high time for some real code (note that any import directives have been removed for the sake of readability as they don't add anything to our discussion - see Writing Java Extensions - an introduction ):

 

public class XmlValidate extends HttpServlet {
   private static final long serialVersionUID = 1L;
   private static Validator validator = null;

   // Called once when the Servlet is loaded: compile the schema and keep a Validator.
   public void init( ServletConfig config ) throws ServletException {
      super.init( config );
      String schema_file = config.getInitParameter("schema_file");

      if( schema_file == null )
         throw new ServletException("No schema file specified");

      SchemaFactory factory = SchemaFactory.newInstance(
         XMLConstants.W3C_XML_SCHEMA_NS_URI);

      Source schemaFile = new StreamSource(new File(schema_file));
      try {
         Schema schema = factory.newSchema(schemaFile);
         validator = schema.newValidator();
      } catch( SAXException saxe ) {
         throw new ServletException(saxe.getMessage());
      }
   }
// ... other methods below
}

 

The validate() function is actually very simple as all the hard work is done inside the Java library. The only thing to be careful about is to make sure that we don't allow concurrent access to the Validator object from multiple threads:

 

private boolean validate( InputStream in, HttpServletResponse res, String errmsg )
      throws IOException
   {
      Source src = new StreamSource(in);
      try {
         // Validator is not thread-safe, so serialize access to the shared instance
         synchronized( validator ) {
            validator.validate(src);
         }
      } catch( SAXException saxe ) {
         // Validation failed: write the error report into the response
         String msg = saxe.getMessage();
         res.setContentType("text/plain");
         PrintWriter out = res.getWriter();
         out.println(errmsg);
         out.print("Validation of the xml file has failed with error message: ");
         out.println(msg);
         return false;
      }
      return true;
   }

 

Note that the only thing we have to do in case of a failure is to write to the stream that makes up the response. No matter whether this is being done in a request or a response rule, Stingray will take that as an indication that this is what should be sent back to the client. In the case of a request rule, Stingray won't even bother to hand on the request to a back-end server and instead send the result of the Java Servlet; in a response rule, the server's answer will be replaced by what the Servlet has produced.
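Coming back to the thread-safety point: as an alternative to synchronizing on one shared Validator, the standard javax.xml.validation API lets you keep the compiled Schema (which is immutable and safe to share) and create a fresh Validator for every call. The following is only a sketch of that variation, not the code used in this article; the class name and the tiny schema in main() are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class PerCallValidatorSketch {
    private final Schema schema; // compiled once; immutable, so it can be shared

    PerCallValidatorSketch(String xsd) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            this.schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Unusable schema", e);
        }
    }

    boolean isValid(String xml) {
        // A Validator is not thread-safe, so each call gets its own instance
        // instead of locking a shared one.
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PerCallValidatorSketch checker = new PerCallValidatorSketch(
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='a' type='xs:int'/></xs:schema>");
        System.out.println(checker.isValid("<a>42</a>"));
        System.out.println(checker.isValid("<a>forty-two</a>"));
    }
}
```

The trade-off is a small per-call allocation in exchange for validations that no longer queue behind a single lock. Whether that matters depends on how many requests the Java runner handles concurrently.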

 

Now we are ready for the doGet() method:

 

public void doGet( HttpServletRequest req, HttpServletResponse res )
     throws ServletException, IOException
  {
     try {
        ZXTMHttpServletResponse zres = (ZXTMHttpServletResponse) res;
        // Only a response rule has a server response to validate
        if( !zres.isResponseRule() ) {
           log("doGet called in request rule ... bailing out");
           return;
        }
        InputStream in = zres.getInputStream();
        validate(in, zres, "The file you requested was rejected.");
     } catch( Exception e ) {
        throw new ServletException(e.getMessage());
     }
  }

 

There's not really much work left apart from calling our validate() method with the error message to append in case of failure. As discussed previously, we make sure that we are actually working in the context of a response rule because otherwise the response would be empty. Exactly the opposite has to be done when processing a post:

 

public void doPost( HttpServletRequest req, HttpServletResponse res )
    throws ServletException, IOException
{
    try {
       ZXTMHttpServletRequest zreq = (ZXTMHttpServletRequest) req;
       ZXTMHttpServletResponse zres = (ZXTMHttpServletResponse) res;

       // Uploads are validated on the way in, so only act in a request rule
       if( zres.isResponseRule() ) {
          log("doPost called in response rule ... bailing out");
          return;
       }

       InputStream in = zreq.getInputStream();
       if( validate(in, zres, "Your upload was unsuccessful") ) {
          // just let the post through to the backends
       }
    } catch(Exception e) {
       throw new ServletException(e.getMessage());
    }
}

 

The only things missing are the rules to invoke the Servlet, so here they are (assuming that the Servlet has been uploaded via the 'Java' tab of the 'Catalogs' section in Stingray's UI as a file called XmlValidate.class). First the request rule:

 

$m = http.getMethod();
if( 0 == string.icmp( $m, "POST" ) ) {
   java.run( "XmlValidate" );
}

 

and the response rule is almost the same:

 

$m = http.getMethod();
if( 0 == string.icmp( $m, "GET" ) ) {
   java.run( "XmlValidate" );
}

 

It's your choice: TrafficScript or Java Extensions

 

Which is better?

 

So now you are left with a difficult decision: you have two implementations of the same functionality, so which one do you choose? Bear in mind that the unassuming java.run() leads to a considerable amount of inter-process communication between the Stingray child process and the Java Servlet runner, whereas xml.validate.xsd() is handled in C++ inside the same process. On performance grounds, TrafficScript is the rather obvious choice. But there are still situations when you might prefer the Java solution.

 

One example would be that you have to do XML processing not supported directly by Stingray. Java is more flexible and complete in the XML support it provides. But there is another advantage to using Java: you can replace the actual implementation of the XML functionality. You might want to use Intel's XML Software SuiteJ for Java, for example. But how do you tell Stingray's Java runner to use another XML library? Only two settings have to be adapted:

 

java!classpath /opt/intel/xmlsoftwaresuite/java/1.0/lib/intel-xss.jar
java!command java -Djava.library.path=/opt/intel/xmlsoftwaresuite/java/1.0/bin/intel64 -server

 

This applies if you have installed Intel's XML Software SuiteJ in /opt/intel/xmlsoftwaresuite/java/1.0/ and are using the 64-bit version of the shared library. Both changes can be made in the 'Global Settings' tab of the 'System' section in Stingray's UI.
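To check which XML implementation a Java runtime actually picks up, you can ask the factory for its class name. The sketch below relies on the standard JAXP lookup, which consults system properties and providers found on the classpath before falling back to the JDK's built-in implementation. It assumes that a replacement library registers itself as a JAXP provider; this article does not describe how SuiteJ does so, and the class name here is made up for illustration.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

final class WhichSchemaFactory {
    // Returns the class name of the SchemaFactory implementation that JAXP
    // selects for W3C XML Schema in the current runtime and classpath.
    static String implementationName() {
        return SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .getClass()
            .getName();
    }

    public static void main(String[] args) {
        System.out.println(implementationName());
    }
}
```

Running this once with the default settings and once with the adapted classpath shows whether the change has taken effect, without touching the Servlet code itself.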
