3.1. Exception Handling in Data Lab Services

Note

This document is currently in a DRAFT stage.

This document will describe the recommended handling of errors and exceptions in the Python-based Data Lab middleware (i.e. the various ‘manager’ services).

The Java-based servlets (i.e. VOSpace and DALServer) implement IVOA protocols that describe the required exception handling for those protocols. These are not covered in detail here, except to describe when and how those protocol exceptions should be handled by the Data Lab server and client code.

3.1.1. Exceptions vs. return values

The two philosophies about how to handle errors are sometimes described as EAFP (Easier to Ask for Forgiveness than Permission) and LBYL (Look Before You Leap). Under the LBYL model, the application using the interface, the interface itself and the service must each anticipate all potential error conditions (by catching system library exceptions and/or by checking in advance that parameters are valid, files exist, etc.) and respond with an error code. Under the EAFP model, the code simply attempts the operation and raises an exception on failure; that exception can be handled, ignored or dealt with at a more appropriate level in the application. The EAFP model also has readability benefits, and so it is the model adopted for Data Lab.
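As a minimal illustration of the two styles, the same lookup can be written both ways. The function names and the settings dictionary below are hypothetical and are not part of any Data Lab interface.

```python
def get_profile_lbyl(settings, name):
    """LBYL: check every precondition first and report failure via a status."""
    if not isinstance(settings, dict):
        return None, 'settings must be a dictionary'
    if name not in settings:
        return None, "unknown profile '%s'" % name
    return settings[name], 'OK'


def get_profile_eafp(settings, name):
    """EAFP: attempt the operation and let a failure surface as an exception."""
    try:
        return settings[name]
    except KeyError:
        # Re-raise with a message that is meaningful to the caller.
        raise ValueError("unknown profile '%s'" % name)


settings = {'default': 'https://example.org/default'}   # hypothetical data

# LBYL: the caller must remember to test the returned status every time.
value, status = get_profile_lbyl(settings, 'gp04')

# EAFP: the caller handles the failure only where it is convenient to do so.
try:
    get_profile_eafp(settings, 'gp04')
except ValueError as err:
    message = str(err)
```

In the LBYL version every caller has to inspect the status string, whereas in the EAFP version a caller that cannot do anything useful with the error can simply let it propagate.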

Exceptions are, by definition, unexpected behavior of the code, but they can also be raised in response to improper use of the client or service (e.g. missing or invalid input). We require that all interface methods (client and server) describe their calling arguments as well as what (if anything) is returned by the method or service in the method docstrings. …..

3.1.2. Service Architecture Overview

The existing Data Lab service and client code all share a similar structure. However, because they were developed at different times in the project, a number of inconsistencies remain between the services. This section describes the target structure we wish all code to have following a thorough code review; notes are used to identify known problems to be addressed in the current release.

3.1.3. Server-side Code

Data Lab services are implemented using the Python Flask microframework. These services define a middleware layer that clients (web-based, programmatic or command-line) access from their interfaces. As middleware, these services, depending on their function, may in turn call lower-level services (e.g. VOSpace, TAP, SIA, etc) or access a resource such as a database directly. The return value of each service is documented in the service implementation docstring.

Note

The code review process is intended to discover those docstrings that don’t yet provide the required service documentation.

As an example, a simple ‘echo’ service might look something like:

from flask import Flask, request

app = Flask(__name__)

@app.route('/echo')
def echo():
    ''' ECHO - A simple echo service endpoint

        Parameters
        ----------
        arg : str
              The argument to be echoed (passed as a query parameter).

        Returns
        -------
        (Status 200)  A string that echoes the argument
        (Status 400)  An error message string
    '''
    arg = request.args.get('arg')       # None when 'arg' is not supplied
    if arg is not None:
        return "Hello %s!\n" % arg
    else:
        raise Exception('Missing "arg" parameter')

In this example, simply raising the generic Exception will cause the service to return a 500 (Internal Server Error) to the caller. In order to return a specific error message we need to define an errorhandler() method for the Flask application. For example,

@app.errorhandler(Exception)
def handle_invalid_request(error):
   # str(error) is used because Exception has no 'message' attribute in Py3
   return app.make_response(('Error: ' + str(error), 400))

This is better in that it returns a proper HTTP response with the specified error message, but the status code is fixed at 400 (or whatever value is chosen). The solution is to create an exception subclass in the server code (and an associated Flask errorhandler()) that allows us to set the message, the status code, and optionally an error payload:

class dlInvalidRequest(Exception):
   def __init__(self, message, status_code=None, payload=None):
      Exception.__init__(self, message)
      self.message = message
      self.status_code = (status_code if status_code is not None else 400)
      self.payload = payload

   def to_dict(self):
      """ Method to return a JSON formatting of the error. """
      rv = dict(self.payload or ())
      rv['message'] = self.message
      rv['code'] = self.status_code
      return rv

@app.errorhandler(dlInvalidRequest)
def handle_invalid_request(error):
   return app.make_response(('Error: ' + error.message,
                             error.status_code, ''))

The service code then looks like:

if arg is not None:
    return "Hello %s!\n" % arg
else:
    raise dlInvalidRequest('Missing argument', 400)

When no argument is provided, the service will return a status 400 response with a specific error message useful to the client.
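The to_dict() method is not used by the handler shown above. As a sketch of how the optional payload could be carried in a structured error body, the following stand-alone snippet repeats the class and serializes it with the standard json module. The payload key 'param' is a hypothetical example, and how a service would actually return this body is left open.

```python
import json


class dlInvalidRequest(Exception):
    """Same exception class as above, repeated so the snippet runs alone."""
    def __init__(self, message, status_code=None, payload=None):
        Exception.__init__(self, message)
        self.message = message
        self.status_code = (status_code if status_code is not None else 400)
        self.payload = payload

    def to_dict(self):
        rv = dict(self.payload or ())
        rv['message'] = self.message
        rv['code'] = self.status_code
        return rv


try:
    # 'param' is a hypothetical payload key naming the offending argument.
    raise dlInvalidRequest('Missing argument', payload={'param': 'arg'})
except dlInvalidRequest as err:
    body = json.dumps(err.to_dict(), sort_keys=True)   # JSON text for the client
    status = err.status_code                           # defaults to 400
```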

3.1.3.1. Service Return Codes

Because the services are RESTful web services, the standard set of HTTP return codes is available to communicate status back to the calling client in addition to any returned data. This provides the flexibility needed to return error messages that explain why the service failed when a standard status message alone may be ambiguous.

Exceptions in the server code should follow a few simple guidelines. Services will return:

Status 200 (Successful)
When the service performs the requested action without error.
Status 400 (Bad Request)
When the call fails due to missing or invalid input to the service. When a backend service returns an error status, that status should be returned to the client when it provides a more detailed explanation of the error.
Status 403 (Forbidden)
When the client does not present the identity token required to access or modify a requested resource.
Status 404 (Not Found)
When the service cannot access a requested static resource (e.g. a VOSpace URI).
Status 503 (Service Unavailable)
When the service requires access to a backend resource that cannot be reached (e.g. a database or storage system), preventing the entire service from executing as required.

Backend services (e.g. TAP, VOSpace, database) may have return codes specified by the protocols they use, but these are handled by the service when determining whether the call succeeded. For example, when deleting a file from VOSpace, the protocol requires a 204 status code response indicating the file was deleted; however, the service should return a 200 status to the client because the storeClient.rm() method succeeded. In the event of an error in the VOSpace service that returns a non-204 code, the rm() service can handle or ignore the error, or else return a 400 (Bad Request) error along with the specific error message from VOSpace. Similarly, a query that requires a user MyDB table that doesn’t exist will return the error message from the database that identifies the missing table, without requiring that every possible database message map to a corresponding HTTP status code. By limiting the number of trapped error status codes, the client has fewer specific exceptions to catch explicitly and can raise the error to the calling method more easily.
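One way to picture this policy is a small helper that collapses a backend response into the limited set of client-facing codes. This is only a sketch: the function name, its arguments and the sample error message are assumptions made for illustration, not existing Data Lab code.

```python
def client_status(backend_code, backend_msg='', success_codes=(200, 204)):
    """Map a backend (status, message) pair to what the client should see.

    Any backend success code (e.g. the 204 that VOSpace returns for a
    delete) becomes a plain 200.  Every other code is reported as a 400
    that carries the backend's own message, so that individual backend
    errors need not be mapped to HTTP status codes one by one.
    """
    if backend_code in success_codes:
        return 200, ''
    return 400, backend_msg


deleted = client_status(204)                        # VOSpace delete succeeded
failed = client_status(500, 'table not found')      # hypothetical DB message
```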

3.1.4. Client-side Code

The basic layout of a Data Lab Client interface is something like the following (using the AuthManager as an example):

import requests

def login (user, password):          # Module method
     return ac_client.login(user, password)

class authClient (object):           # AuthManager Object class
   def __init__(self):
        pass

   def login (self, user, password): # Class method
       # svc_url and hdrs are assumed to be defined elsewhere in the module
       resp = requests.get (svc_url, headers=hdrs)
       return resp

def getClient():                     # Get a new instance of the authClient
   return authClient()

ac_client = getClient()              # Create a default client object

Note

The use of MultiMethod signatures is ignored here for brevity.

This structure is intended to let applications call the module methods directly when using the default client instance created when the module is imported, while also allowing additional clients to be created when necessary (e.g. when using DEV instances of Data Lab services, or when using different service profiles without having to reset the profile in the default client before each use). For example,

from dl import queryClient as qc             # standard import

gp04 = qc.getClient(profile='gp04')          # get new client instance with
                                             # 'gp04' profile

res1 = qc.query (sql='....')                 # query default service
res2 = gp04.query (sql='....')               # query 'gp04' service

Note

As of this writing, the use of MultiMethod signatures prevents new client instances (e.g. the ‘gp04’ client above) from working correctly. The solution is understood and will be implemented as part of the MultiMethod docstring work to come.

3.1.4.1. Client-side exception handling

You can see from the client code example that the module methods are simply calls to the default client’s class method, which performs all the work of the client interface. Our goal is to catch and/or handle exceptions in this class method and simply raise them to the calling procedure. When writing a default client module method, the code may look something like:

def login(user, password):
   try:
      resp = ac_client.login(user, password)
   except Exception:
      raise                 # pass any exception on to the caller unchanged

   return resp

In this way, an exception either returned by the service or raised by the class method is simply passed to the caller. On success, the normal return value of the method is returned.

The class method that actually calls the service should use an appropriate try-except block to raise exceptions that occur in the client code or that are returned by the service. Data Lab client interfaces use the requests module to make service calls. All exceptions that Requests raises inherit from the requests.exceptions.RequestException class, making it possible to trap HTTP connection-related issues as well as the specific errors returned by a service.

For example, a class method might look something like:

def login(self, user, passwd):
   # 'url' is assumed to hold the service endpoint, defined elsewhere
   resp = None
   try:
      resp = requests.get(url, params={'username':user,'password':passwd})
      resp.raise_for_status()

   except requests.exceptions.RequestException as err:
      if resp is None:
         raise Exception (str(err))          # connection error
      else:
         raise Exception (resp.content)      # service error

   return resp.content

There are a few things to note in this example:

  • The raise_for_status() call is used to raise HTTP errors that inherit from the RequestException class used by the requests module. Service errors (e.g. TimeOut) are returned using HTTP status codes and can be caught in the same block.
  • When the response resp is None during the except handling, the request itself failed and the exception message is used to indicate the specific connection error. However, if we have a valid resp object in an exception, the error was generated by the server and we raise the error message in the response content to pass the message back from the service. The resp is initialized to None to let us differentiate the two exception types so that we can handle HTTP connection problems and service problems differently when needed.
  • Here we raise the generic Exception, but in production code we generally create a throwable exception class to be used. For Py2/Py3 compatibility this allows us to ensure the ‘string’ type of the error message currently assumed by legacy code; it can be removed later once a full transition to Py3 is complete.
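A minimal sketch of such a throwable class follows. The name dlClientError and its status_code attribute are hypothetical, chosen here for illustration; the point is only that the message is normalized to a text string whether it arrives as bytes (e.g. resp.content under Py3) or as a string.

```python
class dlClientError(Exception):
    """Hypothetical client-side exception that always carries a text message."""

    def __init__(self, message, status_code=None):
        if isinstance(message, bytes):
            # resp.content is bytes under Py3; decode defensively.
            message = message.decode('utf-8', errors='replace')
        self.message = str(message)
        self.status_code = status_code
        Exception.__init__(self, self.message)


try:
    # The message and status used here are made-up sample values.
    raise dlClientError(b'Invalid user name', status_code=400)
except dlClientError as err:
    text = str(err)
    code = err.status_code
```

In the class-method example above, the two generic raise statements would then become raise dlClientError(str(err)) and raise dlClientError(resp.content, resp.status_code).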

Client code of course does other processing, e.g. validating input parameters, processing return values and so on. These steps may themselves use try-except blocks to do additional error handling and should follow similar concepts when determining which exceptions are raised.

Note

The requests call is so common to each client method that a utility method should be implemented to avoid code duplication and to provide a central location for all service-related exception handling. Similar utility methods could be envisioned for parameter validation, ensuring standard string-type conversion, etc.
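A sketch of what such a utility could look like is shown below. The name svc_get, its session argument and the UTF-8 decoding of the response body are assumptions made for illustration, not an existing Data Lab function. It relies only on requests.get(), Response.raise_for_status() and requests.exceptions.RequestException from the requests module.

```python
import requests


def svc_get(url, session=None, **kwargs):
    """Hypothetical helper: GET a service URL and return the body as text.

    Connection problems and service errors are both re-raised as a
    generic Exception, mirroring the class-method example above.
    """
    # Allow a requests.Session (or a test double) to be supplied.
    caller = session if session is not None else requests
    resp = None
    try:
        resp = caller.get(url, **kwargs)
        resp.raise_for_status()
    except requests.exceptions.RequestException as err:
        if resp is None:
            raise Exception(str(err))                          # connection error
        raise Exception(resp.content.decode('utf-8', 'replace'))  # service error
    return resp.content.decode('utf-8', 'replace')
```

A client method could then reduce to a single call such as svc_get(url, params={'username': user, 'password': passwd}), leaving all service-related exception handling in one place.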

3.1.5. Coding Style: Examples vs. Applications

In HowTo notebooks, science examples and general documentation the intent is often to convey an example use of an interface or service, implicitly assuming the example will always succeed. On the other hand, application code should be written to catch exceptions that might be raised or that risk aborting a task entirely.

As an example, consider the authClient.login() method that returns an authorization token for the user. In example code this might be written as

token = authClient.login ('foobar', getpass())

where we assume a valid user and password are entered and the token variable then contains a valid auth token for the application. In application code we would want to protect this call to catch a login error with something like:

try:
   token = authClient.login ('foobar', getpass())
except Exception as e:
   print ('Login Error: ' + str(e))
else:
   print ("Logged in as user '%s'" % token.split('.')[0])

The added try-except block in this code snippet, however, distracts from the example use of login() being demonstrated. We recommend therefore that its use should be limited either to examples that show explicitly how errors are to be handled or when writing production-quality code.

Note

In many cases existing client API code now returns a mix of an ‘OK’ string, an error message, or the (valid) return data from the service. We wish to have all methods raise exceptions on an error and return either nothing or valid data from the service.

Client API methods may return objects of various types. Two issues still to be settled are:

  • proper handling of boolean return values, i.e. ensure the Python True/False type is returned and not strings
  • proper handling of string return types, i.e. enforce ‘string’ or allow for return of ‘byte’ types under Py3 that may require decoding.
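Neither question is settled here, but as one possible approach the conversions could be centralized in two small helpers. The names to_bool and to_text, and the set of strings accepted as true or false, are hypothetical.

```python
def to_bool(value):
    """Return a Python bool for a bool or a 'True'/'False'-style string."""
    if isinstance(value, bool):
        return value
    if isinstance(value, bytes):
        value = value.decode('utf-8')
    text = str(value).strip().lower()
    if text in ('true', '1', 'yes'):
        return True
    if text in ('false', '0', 'no'):
        return False
    # Anything else is an error rather than a silent default.
    raise ValueError('Cannot interpret %r as a boolean' % (value,))


def to_text(value, encoding='utf-8'):
    """Return a text string, decoding bytes returned under Py3 if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)
```

Raising ValueError for an unrecognized value keeps these helpers consistent with the goal of signalling errors through exceptions instead of return values.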