Calling Amazon Mechanical Turk With Python and ZSI 2.0
Amazon Mechanical Turk, the micropayment-for-microwork service, includes a web service interface for creating tasks from your own software application. As of this writing, their developer web site has 8 "getting started" code samples for calling MechTurk using various languages and techniques. One of these samples demonstrates how to call MechTurk's SOAP interface from Python, using SOAPpy as its SOAP toolkit.
SOAPpy is being superseded by the more thorough and robust Zolera SOAP Infrastructure (ZSI). ZSI 1.x was famous for being difficult to use, at least compared to SOAPpy, but ZSI 2.0 is nearing completion, and it doesn't look half bad. So let's rewrite the Mechanical Turk Python SOAP example using ZSI 2.0-rc3.
ZSI uses a code generation approach to WSDL support, in contrast to SOAPpy's run-time creation of classes and methods. The wsdl2py utility, installed with ZSI, creates the source files from the WSDL. Run it like this:
wsdl2py --complexType --url=http://mechanicalturk.amazonaws.com/AWSMechanicalTurk/2006-10-31/AWSMechanicalTurkRequester.wsdl
This command creates two source files: AWSMechanicalTurkRequester_services.py and AWSMechanicalTurkRequester_services_types.py. The --complexType option creates convenience methods for building and accessing elements of complex types defined in the WSDL.
The source of the main program isn't much different from the SOAPpy example. First, import the libraries used to calculate the timestamp and HMAC signature for each request:
#!/usr/bin/env python
import datetime
import hmac
import sha
import base64
Next, import the generated _services module:
from AWSMechanicalTurkRequester_services import *
Set constants for your AWS access key ID and secret key:
AWS_ACCESS_KEY_ID = '[INSERT YOUR AWS ACCESS KEY ID]'
AWS_SECRET_ACCESS_KEY = '[INSERT YOUR AWS SECRET KEY]'
Define routines for calculating a timestamp and signature for each request. These routines are similar to the SOAPpy example; I've modified the timestamp routine slightly to use the datetime module instead of the time module.
def generate_timestamp(dtime):
    return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")

def generate_signature(service, operation, timestamp, secret_access_key):
    my_sha_hmac = hmac.new(secret_access_key, service + operation + timestamp, sha)
    my_b64_hmac_digest = base64.encodestring(my_sha_hmac.digest()).strip()
    return my_b64_hmac_digest
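The two routines above use Python 2's sha module and base64.encodestring. For comparison, here is a minimal sketch of the same HMAC-SHA1 computation in Python 3, using only the standard library (hashlib.sha1 as the digest, base64.b64encode in place of encodestring). It assumes, as generate_signature does, that the signed string is service + operation + timestamp; the secret key and date are placeholders.

```python
import base64
import hashlib
import hmac
from datetime import datetime


def generate_timestamp(dtime):
    # Render a UTC datetime as an xsd:dateTime string with no fractional seconds.
    return dtime.strftime("%Y-%m-%dT%H:%M:%SZ")


def generate_signature(service, operation, timestamp, secret_access_key):
    # HMAC-SHA1 over service + operation + timestamp, keyed by the secret key.
    message = (service + operation + timestamp).encode("utf-8")
    digest = hmac.new(secret_access_key.encode("utf-8"), message, hashlib.sha1).digest()
    # b64encode adds no trailing newline, so no strip() is needed here.
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    ts = generate_timestamp(datetime(2007, 3, 14, 15, 9, 26))
    print(ts)  # 2007-03-14T15:09:26Z
    print(generate_signature("AWSMechanicalTurkRequester", "GetAccountBalance", ts, "example-secret"))
```

The signature is a base64-encoded 20-byte SHA-1 digest, so it is always 28 characters long.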
Tricky bit: For AWS's HMAC signature mechanism to work, the value of the Timestamp parameter must match the corresponding part of the string used to calculate the Signature, byte for byte. Unfortunately, the WSDL defines the Timestamp parameter as a dateTime, and most SOAP toolkits, ZSI included, try to be helpful by accepting only a native date-time data structure and doing the string conversion behind the scenes. Unless you can see (or better, use) the SOAP toolkit's dateTime-to-string routine, it can be difficult to write your own timestamp routine that works consistently.
ZSI's dateTime formatter takes a 9-element tuple of date-time information, like the one returned by the timetuple() method of a datetime.datetime object, and treats the seventh element (element 6) as a number of milliseconds. That is not what the standard library puts there: in both datetime's timetuple() and time.gmtime(), the seventh element is the day of the week (0 for Monday).
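A quick way to check what the standard library actually stores in the seventh slot of its time tuples. This is a standalone Python 3 sketch; the dates are arbitrary examples.

```python
import time
from datetime import datetime

# 14 March 2007 was a Wednesday, so weekday() returns 2 (Monday is 0).
dt = datetime(2007, 3, 14, 15, 9, 26, 123456)
tt = dt.timetuple()

# Index 6 of timetuple() is tm_wday, the same field time.gmtime() puts there.
print(tt[6], tt.tm_wday, dt.weekday())  # 2 2 2

# Sub-second precision is not in the tuple at all; it lives only on the datetime.
print(dt.microsecond)  # 123456

# time.gmtime(0) is 1 January 1970, a Thursday, so its index 6 is 3.
print(time.gmtime(0)[6])  # 3
```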
ZSI's formatter renders the milliseconds field if it is non-zero, so passing an unmodified timetuple() would turn the weekday into a spurious fractional second on every day except Monday. One workaround is to zero out that element before passing the tuple to ZSI. As per the dateTime spec, fractional seconds are optional, and ZSI omits them from the stringified version when the field is zero, so we can build the signed string with a strftime pattern that doesn't mention milliseconds. (Neither time nor datetime provides a platform-independent strftime directive for milliseconds anyway.)
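To see why zeroing the seventh element makes the serialized value and the signed string agree, here is a sketch with a hypothetical stand-in for ZSI's formatter. zsi_like_format is not ZSI's code; it only mimics the behavior described in the text (element 6 rendered as milliseconds when non-zero), and its exact output format, including the ".%03d" fraction and trailing "Z", is an assumption.

```python
from datetime import datetime


def zsi_like_format(t):
    """Hypothetical stand-in for ZSI's dateTime formatter, NOT ZSI's own code.

    Treats index 6 of the 9-tuple as milliseconds and renders it only when
    it is non-zero.
    """
    year, month, day, hour, minute, second, ms = t[:7]
    text = "%04d-%02d-%02dT%02d:%02d:%02d" % (year, month, day, hour, minute, second)
    if ms:
        text += ".%03d" % ms
    return text + "Z"


dt = datetime(2007, 3, 14, 15, 9, 26)  # a Wednesday, so timetuple()[6] == 2
signed = dt.strftime("%Y-%m-%dT%H:%M:%SZ")

raw = dt.timetuple()
print(zsi_like_format(raw))  # 2007-03-14T15:09:26.002Z (the weekday leaks in)

fixed = list(raw)
fixed[6] = 0  # the workaround: zero the seventh element
print(zsi_like_format(tuple(fixed)) == signed)  # True
```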
Ideally, our authentication code would use ZSI's own dateTime formatting routine to produce the timestamp value for the signature calculation. Someone let me know if there is an official (supported) way to do this.
timestamp_datetime = datetime.datetime.utcnow()
timestamp_list = list(timestamp_datetime.timetuple())
timestamp_list[6] = 0  # ZSI would render a non-zero element 6 as milliseconds
timestamp_tuple = tuple(timestamp_list)
timestamp_str = generate_timestamp(timestamp_datetime)
operation = 'GetAccountBalance'
signature = generate_signature('AWSMechanicalTurkRequester', operation, timestamp_str, AWS_SECRET_ACCESS_KEY)
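Since the Timestamp and Signature have to be recomputed for every request, it may be convenient to bundle the steps above into one function. This is a Python 3 sketch; make_credentials is a hypothetical helper name, and it assumes the same signing scheme and element-6 workaround described in the text. Deriving the tuple and the string from one datetime keeps them describing the same second.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone

SERVICE = "AWSMechanicalTurkRequester"


def make_credentials(operation, secret_access_key, now=None):
    """Hypothetical helper: build (timestamp_tuple, timestamp_str, signature)."""
    if now is None:
        now = datetime.now(timezone.utc)
    fields = list(now.timetuple())  # timetuple() already drops microseconds
    fields[6] = 0                   # ZSI would render this slot as milliseconds
    timestamp_tuple = tuple(fields)
    timestamp_str = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # also ignores microseconds
    message = (SERVICE + operation + timestamp_str).encode("utf-8")
    digest = hmac.new(secret_access_key.encode("utf-8"), message, hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return timestamp_tuple, timestamp_str, signature


# Usage: call once per request, immediately before sending it.
ts_tuple, ts_str, sig = make_credentials("GetAccountBalance", "example-secret")
```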
Next, instantiate a locator, which knows where the service's port address is, then use it to instantiate a port object:
locator = AWSMechanicalTurkRequesterLocator()
port = locator.getAWSMechanicalTurkRequesterPortType()
Handy tip: Like SOAPpy, ZSI has a way to dump outgoing and incoming SOAP messages for debugging purposes. One way to use this feature with wsdl2py-generated classes is to pass an output stream to the port instantiator. For example, to dump SOAP messages and diagnostic output to the console (sys.stdout):
import sys
...
port = locator.getAWSMechanicalTurkRequesterPortType(tracefile=sys.stdout)
To make a request with ZSI and wsdl2py classes, create a request object, then set the appropriate values.
request = GetAccountBalanceRequestMsg()
request.AWSAccessKeyId = AWS_ACCESS_KEY_ID
request.Timestamp = timestamp_tuple
request.Signature = signature
request.Request = [request.new_Request()]
ZSI enforces the WSDL data types when building complex data structures, and creating a non-conformant structure raises an exception the moment you try. In the case of MechTurk's GetAccountBalance operation, even though GetAccountBalance takes no parameters, the WSDL still demands a list of one or more Request elements, and ZSI enforces this.
Next, make the request:
response = port.GetAccountBalance(request)
Like SOAPpy, ZSI gives you access to the response XML using convenience methods. Thankfully, ZSI's implementation is more rigorous and more consistent than SOAPpy's. For example, if an element could appear more than once, in ZSI you always access the element in a list, even if the element only appears once. SOAPpy tried to be helpful by making the data member a (non-list) single object in this case, which invariably meant having to test if the element is a list in your code.
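For contrast, this is the kind of guard that SOAPpy's single-object-or-list behavior tended to require before looping over a repeatable element. as_list is a hypothetical helper, not part of either toolkit, written in Python 3; with ZSI's always-a-list convention it becomes unnecessary.

```python
def as_list(value):
    """Normalize a possibly-repeated element to a list.

    None -> [], a list or tuple -> a list of its items, anything else -> [value].
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# A lone element and a repeated element can now be looped over the same way.
for item in as_list("only-one"):
    print(item)
for item in as_list(["first", "second"]):
    print(item)
```

Treating a missing element (None) as an empty list is a design choice here; it lets the calling loop simply do nothing.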
This example mimics the behavior of the SOAPpy example: It prints the results of the call and any errors it finds. I won't describe the MechTurk error response structure here; see the MechTurk getting started guide for more information.
def print_errors(error_list):
    if len(error_list) > 0:
        print 'There were errors processing your request:'
        for error_node in error_list:
            print ' Error code: ' + error_node.Code
            print ' Error message: ' + error_node.Message

try:
    print_errors(response.OperationRequest.Errors.Error)
except AttributeError:
    pass

try:
    results = response.GetAccountBalanceResult
    for result in results:
        try:
            print_errors(result.Request.Errors.Error)
        except AttributeError:
            pass
        print 'Available balance: ' + result.AvailableBalance.FormattedPrice
except AttributeError:
    pass
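To exercise this response-handling logic without calling MechTurk, one option is to feed it stand-in objects that expose the same attribute names. The sketch below is a Python 3 port that returns lines instead of printing them; the SimpleNamespace stand-ins, the "$10.00" balance and the placeholder error code and message are all made up, and real ZSI response objects may differ in ways these stand-ins don't capture.

```python
from types import SimpleNamespace


def error_lines(error_list):
    # Same logic as print_errors, but collects lines instead of printing them.
    lines = []
    if len(error_list) > 0:
        lines.append("There were errors processing your request:")
        for error_node in error_list:
            lines.append(" Error code: " + error_node.Code)
            lines.append(" Error message: " + error_node.Message)
    return lines


def balance_lines(response):
    # Mirrors the try/except structure of the response-handling code.
    lines = []
    try:
        lines += error_lines(response.OperationRequest.Errors.Error)
    except AttributeError:
        pass
    try:
        for result in response.GetAccountBalanceResult:
            try:
                lines += error_lines(result.Request.Errors.Error)
            except AttributeError:
                pass
            lines.append("Available balance: " + result.AvailableBalance.FormattedPrice)
    except AttributeError:
        pass
    return lines


# Stand-in for a successful response (values are made up).
ok = SimpleNamespace(
    OperationRequest=SimpleNamespace(),  # no Errors attribute
    GetAccountBalanceResult=[
        SimpleNamespace(
            Request=SimpleNamespace(),  # no Errors attribute
            AvailableBalance=SimpleNamespace(FormattedPrice="$10.00"),
        )
    ],
)

# Stand-in for a failed request with a placeholder error.
bad = SimpleNamespace(
    OperationRequest=SimpleNamespace(
        Errors=SimpleNamespace(
            Error=[SimpleNamespace(Code="Example.Code", Message="Example message")]
        )
    )
)

print("\n".join(balance_lines(ok)))
print("\n".join(balance_lines(bad)))
```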