Random thoughts of a warped mind…

August 2, 2013

Cloudfront woes – “Your request contains one or more invalid invalidation paths.” – Use custom regexp for URI::encode

Filed under: All,Amazon EC2,EC2,Linux,Ruby — Srinivas @ 12:27

AWS CloudFront is a content delivery network, part of Amazon's EC2/AWS stack, which lets you serve static assets from a source (an S3 bucket or a custom origin server) by caching them across numerous edge locations. Occasionally the underlying content changes and the cache needs to be refreshed. This is done via a CloudFront cache invalidation request, which specifies a distribution ID and a list of paths to refresh (e.g. /index.html or /imgs/logo.png).

If you run your own caching layer, it's pretty much the same idea. Conceptually, what Amazon does is run caching servers (think Varnish) in multiple locations and have these cache your content locally. On Varnish you would use a PURGE or BAN to do this (see the Varnish docs). On CloudFront you send an invalidation request (which, for all I know, may eventually translate into something like a PURGE/BAN on Amazon's edge boxes).

Of course it makes sense to use versioned URLs/paths, since then you wouldn't need to send cache invalidations at all, but that's not always possible.

When I was sending out some invalidation requests (in batches of 1000, which is the max per invalidation request), I realized that some of the batches were being rejected by Amazon with the error “Your request contains one or more invalid invalidation paths.” As per CloudFront's developer docs, all URLs have to be encoded as per RFC 1738. I was using Ruby 1.9.x with URI::encode to encode these URLs, so I was a little surprised to see the rejections. Upon digging further I located some URLs with tilde (~) and at (@) characters in them. Turns out URI::encode wasn't encoding them, which led to AWS rejecting the cache invalidation requests.
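A minimal sketch of how the batching and the hunt for offending paths could look. The helper name, the constant and the sample paths are made up for illustration; only the 1000-path limit and the two characters come from what is described above.

```ruby
# Hypothetical constant: the per-request limit mentioned above.
BATCH_SIZE = 1000

# Hypothetical helper: returns the paths containing ~ or @,
# the two characters that turned out to cause the rejections.
def suspicious_paths(paths)
  paths.select { |path| path =~ /[~@]/ }
end

# Made-up sample data.
paths = ['/index.html', '/imgs/logo.png', '/~srini/notes.html', '/users/a@b.png']

# Split the full list into batches of at most BATCH_SIZE paths.
batches = paths.each_slice(BATCH_SIZE).to_a
puts "#{batches.size} batch(es) to send"

suspicious_paths(paths).each do |path|
  puts "needs extra encoding: #{path}"
end
```

Scanning the list like this before sending makes it easier to tell which paths in a rejected batch are the likely culprits.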

As per RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt),
    ~ is an unreserved character (can be escaped without changing the semantics of the URI)
    @ is a reserved character

As per RFC 1738 (http://www.w3.org/Addressing/rfc1738.txt, to which CloudFront paths must conform),
    ~ is unsafe and must always be encoded within a URL.
    @ is reserved and may be used unencoded in a URL ONLY if used for its reserved purpose (which is not the case with my data)
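The difference between the two RFCs can be checked against Ruby's default unsafe regexp. This is only a sketch: it assumes the RFC 2396 parser is reachable as URI::RFC2396_Parser on newer Rubies and as URI::Parser on 1.9.x, and the variable names are mine.

```ruby
require 'uri'

# Assumption: newer Rubies name the RFC 2396 parser URI::RFC2396_Parser,
# while 1.9.x exposes it as URI::Parser.
parser_class = defined?(URI::RFC2396_Parser) ? URI::RFC2396_Parser : URI::Parser
default_unsafe = parser_class.new.regexp[:UNSAFE]

# A character only gets percent-encoded if the unsafe regexp matches it.
['~', '@', ' '].each do |char|
  verdict = char =~ default_unsafe ? 'encoded' : 'left as-is'
  puts "#{char.inspect}: #{verdict}"
end
```

With the default regexp, ~ and @ are left alone (they are unreserved and reserved under RFC 2396), while a space is encoded.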

With Ruby 1.8.6/7 you could override the regexp used to match unsafe characters (the ones which must be encoded) by using REGEXP::UNSAFE, but that's not valid with URI::encode in Ruby 1.9.x (the documentation for URI in Ruby 1.9.x is stale and still refers to REGEXP::UNSAFE). So the fix to get ~ and @ encoded is to use URI::Parser.new.regexp[:UNSAFE] instead of REGEXP::UNSAFE: take a union of the UNSAFE regexp with ~ and @, and pass that as the second argument to URI::encode(). With this done, my invalidation requests went through fine, the cached content was invalidated and new content was fetched from the origin.
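Here is a rough sketch of that fix. The helper and constant names are hypothetical, and the sample path is made up. Since URI::encode is no longer available in recent Ruby releases, the sketch calls escape on a parser instance instead, and assumes the RFC 2396 parser is URI::RFC2396_Parser on newer Rubies and URI::Parser on 1.9.x.

```ruby
require 'uri'

# Assumption: pick whichever RFC 2396 parser class this Ruby provides.
PARSER_CLASS = defined?(URI::RFC2396_Parser) ? URI::RFC2396_Parser : URI::Parser
PARSER = PARSER_CLASS.new

# The default unsafe set, plus the two characters CloudFront rejected.
CLOUDFRONT_UNSAFE = Regexp.union(PARSER.regexp[:UNSAFE], /[~@]/)

# Hypothetical helper: percent-encodes every character matched by the
# widened unsafe regexp, leaving the rest of the path untouched.
def encode_invalidation_path(path)
  PARSER.escape(path, CLOUDFRONT_UNSAFE)
end

puts encode_invalidation_path('/~srini/a@b logo.png')
# prints /%7Esrini/a%40b%20logo.png
```

On Ruby 1.9.x the same union regexp can be passed straight to URI::encode as its second argument, which is what I did.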
