Safe JSON

Update: March 5th 2007: Important change to the recommendation for Safe JSON detailed below. Plain JSON is not as safe as people think, but it can still be made safe.

We have been investigating the security implications of having a JSON API in Connections. It turns out that it is very easy to leave pretty big security exposures in an application if it isn’t done right. The security exposure in this case is rogue sites being able to get at data made available via a JSON API. The truly frightening part of this is that applications installed on a corporate intranet can actually leak data to internet sites should a user visit a rogue site. BTW, these exposures apply equally to formally published APIs such as Yahoo’s and to any internal JSON APIs often used for AJAX tricks.

As far as I can make out there are three different approaches used with JSON APIs. Before detailing the vulnerabilities I’ll highlight the three approaches using the Yahoo examples (you might want to familiarize yourself with the examples before reading any further). The three approaches are:

Approach 1 – Plain JSON

Simply return JSON i.e.

{
  "Image": {
    "Width": 800,
    "Height": 600,
    "Title": "View from 15th Floor",
    "Thumbnail": {
      "Url": "http:\/\/scd.mm-b1.yimg.com\/image\/481989943",
      "Height": 125,
      "Width": "100"
    },
    "IDs": [ 116, 943, 234, 38793 ]
  }
}

Approach 2 – var assignment

Assign the JSON object to some variable that can then be accessed by the embedding application (not an approach used by Yahoo).

var result = {
  "Image": {
    "Width": 800,
    "Height": 600,
    "Title": "View from 15th Floor",
    "Thumbnail": {
      "Url": "http:\/\/scd.mm-b1.yimg.com\/image\/481989943",
      "Height": 125,
      "Width": "100"
    },
    "IDs": [ 116, 943, 234, 38793 ]
  }
}

Approach 3 – function callback

When calling the JSON web service, pass the name of a callback function as a parameter. The response then passes the JSON object as an argument to this callback function.

callbackFunction({
  "Image": {
    "Width": 800,
    "Height": 600,
    "Title": "View from 15th Floor",
    "Thumbnail": {
      "Url": "http:\/\/scd.mm-b1.yimg.com\/image\/481989943",
      "Height": 125,
      "Width": "100"
    },
    "IDs": [ 116, 943, 234, 38793 ]
  }
})

All approaches can be used via an XMLHttpRequest followed by a JavaScript eval, but as Yahoo points out, Approaches 2 & 3, unlike Approach 1, don’t "run afoul of browser security restrictions that prevent files from being loaded across domains." as…

"Using JSON and callbacks, you can place the Yahoo! Web Service request inside a <script> tag, and operate on the results with a function elsewhere in the JavaScript code on the page. Using this mechanism, the JSON output from the Yahoo! Web Services request is loaded when the enclosing web page is loaded. No proxy or server trickery is required."
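To make the mechanism concrete, here is a minimal sketch of what effectively happens to an Approach 3 response loaded through a <script> tag: the response text is simply run as JavaScript, so whichever page defined the callback receives the data. The trimmed response body and the `received` variable are assumptions for illustration, and `new Function` stands in for the script tag so the snippet can run outside a browser.

```javascript
// Trimmed version of the Approach 3 response body (illustrative only).
const responseText =
  'callbackFunction({"Image": {"Width": 800, "Height": 600, "IDs": [116, 943, 234, 38793]}})';

// The including page defines the callback before the script loads.
let received = null;
function callbackFunction(data) {
  received = data;
}

// Stand-in for <script src="...">: run the response text as JavaScript
// with callbackFunction in scope. A real browser would look the name up
// on the page's global object instead.
new Function("callbackFunction", responseText)(callbackFunction);

console.log(received.Image.Width); // 800
```

Note that nothing in the response ties it to a particular page: any page that defines a function with the right name gets the data.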

Indeed they have successfully navigated the browser security restrictions, which I should point out is probably fine for Yahoo as ALL their services only expose publicly available data. However, if a developer coding up an application that contains private data uses the same approach (i.e. Approach 2 or 3) then they have exposed the application to a pretty simple attack. BTW, I’m defining private data to be any data that should not be publicly accessible to the entire world (this probably covers most data on a corporate intranet but also includes any data that requires authentication prior to access). Here’s an example.

A user logs into a wiki on the corporate intranet. This wiki provides a JSON API with a callback function (Approach 3). The user then visits a rogue site on the internet. The page from the rogue site, when rendered in the user’s browser, performs a JavaScript include of the wiki’s JSON API, passing a callback function. Because the request is made by the user’s own browser, from inside the intranet and with the user’s wiki session, data from the wiki is made available to the rogue site’s JavaScript function in the page via the callback. Further JavaScript on the page can then form POST the data back to the rogue site, and so the data can be stolen. Not good.

Approach 1, on the other hand, does not contain this vulnerability as it can’t be used via a JavaScript include. If attempted, it does not make any data available on the page because it is not valid JavaScript (a leading { is parsed as the start of a block rather than an object literal). Instead it results in a JavaScript error, and so it is safe for JSON APIs that contain private data.
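A quick way to see the difference is to check whether each response body even parses as JavaScript statements. The sketch below is only an approximation of what a script tag does: the helper name `parsesAsScript` and the trimmed sample bodies are assumptions for illustration, and `new Function` compiles the text as a function body without running it.

```javascript
// Returns true if `source` parses as JavaScript statements, i.e. could
// be loaded through a <script> tag without a syntax error.
// new Function only compiles the text; it never runs it.
function parsesAsScript(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const plainJson = '{"Image": {"Width": 800, "Height": 600}}';
const varAssignment = "var result = " + plainJson;
const callback = "callbackFunction(" + plainJson + ")";

console.log(parsesAsScript(plainJson));     // false
console.log(parsesAsScript(varAssignment)); // true
console.log(parsesAsScript(callback));      // true
```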

Recommendation

I’m going to tentatively propose the following recommendation and would welcome feedback.

When developing a JSON API that contains data that should not be publicly accessible to the world, use Approach 1, i.e. return plain JSON. Update: The JSON returned MUST be of type "Serialized Object" and not of type "Array" (as defined by the JSON spec). (See the March 5th update below for the rationale behind this change.) If the data can be publicly exposed then Approaches 2 & 3 have significant advantages in terms of consumability.
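One way to enforce this on the producing side is to route every response through a small guard that refuses anything but an object at the top level. This is a sketch only; the helper name `toSafeJson` is hypothetical and not part of any library.

```javascript
// Hypothetical helper: serialize a response body, refusing anything whose
// top level is not a JSON object (arrays and primitives are rejected).
function toSafeJson(value) {
  const isObject =
    value !== null && typeof value === "object" && !Array.isArray(value);
  if (!isObject) {
    throw new TypeError("top-level JSON value must be an object");
  }
  return JSON.stringify(value);
}

// Arrays nested inside the object are fine.
console.log(toSafeJson({ IDs: [116, 943, 234, 38793] })); // {"IDs":[116,943,234,38793]}

// A top-level array is rejected.
try {
  toSafeJson([116, 943, 234, 38793]);
} catch (err) {
  console.log(err.message); // top-level JSON value must be an object
}
```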

Update: March 5th 2007

Joe has pointed out that care still needs to be taken even when using a plain JSON return (Approach 1). From my testing, and as others have pointed out, the vulnerability that Joe is referring to only applies when returning JSON of type "array" (section 2.3 of the JSON standard). If you return JSON of type "serialized object" (section 2.2) then, at the moment, I know of no vulnerability. It’s worth mentioning that arrays can still be present in the JSON as long as they are not at the top level. The example in Approach 1 above is not vulnerable to attack even though it contains an embedded array. The following structure is vulnerable though

[["ct","Your Name","foo@gmail.com"], ["ct","Another Name","bar@gmail.com"] ]

as Google knows only too well.
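The difference comes down to parsing: a top-level array literal is a perfectly valid JavaScript statement, so a script include of it loads without error, while the same data wrapped in an object is a syntax error. A minimal sketch, where the `parsesAsScript` helper and the `contacts` wrapper key are assumptions for illustration:

```javascript
const contacts = [
  ["ct", "Your Name", "foo@gmail.com"],
  ["ct", "Another Name", "bar@gmail.com"],
];

// Compile-only check: does the text parse as JavaScript statements?
function parsesAsScript(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const asArray = JSON.stringify(contacts);      // top-level array
const asObject = JSON.stringify({ contacts }); // same data, wrapped in an object

console.log(parsesAsScript(asArray));  // true
console.log(parsesAsScript(asObject)); // false
```

Wrapping the array this way keeps the data identical for legitimate consumers, who read the `contacts` property after parsing.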

Anyway, I have updated my recommendation.  It remains tentative.

30 thoughts on “Safe JSON”

  1. Approach 2 and 3 should, simply, NEVER, EVER, EVER be used. There are plenty of libraries available today to parse JSON data structures, and none of them will EVER, EVER be able to read the whacked out Approach 2 and 3 styles. EVAR.

    Data, baby, data!

  2. Patrick,

    I have to disagree with you here. Approaches 2 & 3 are much more consumable than Approach 1 and if the data is publicly available then there is no security exposure to the JSON-producing site. Yahoo’s API is a great example here.

    However, I do agree that the consuming site now exposes itself to attacks from the producer. When using APIs such as Yahoo’s the consuming site needs to assess the risk and the data that they are now exposing, e.g. Yahoo’s JavaScript could sniff the DOM on the page that it finds itself included in and send data back to Yahoo. So it’s very important that the consumer “trusts” the producer.

  3. I’m with Patrick – use Approach 1 all the way. If you want to increase consumability, then *separately provide* a consumable API that a consumer may choose to use if they want to.

    Besides, Approaches 2 and 3 aren’t valid JSON. They are, however, valid JavaScript.

  4. True, but whatever it’s called there are folks doing things that way. It’s actually not so bad if the data involved is not sensitive; there are really quite a few very interesting things that can be done with the callback approach. For myself, I think the larger issue of trust is far more important. Any JavaScript embedded into a page can steal just about whatever it wants from that page and send it somewhere you don’t want it to go. That’s bad.

  5. It’s worth noting a plain JSON data feed (Approach 1) can be consumed by JavaScript, Perl, Ruby, Python, Java, and about a bazillion other languages. Approaches 2 and 3 can’t. They can only be used from JavaScript.

  6. Maybe I’m missing something but where’s the security risk? If you provide a resource via a publicly available URL, doesn’t it seem obvious that people can make HTTP requests to “steal” that data?

  7. Jonathan,

    I probably could have explained it a bit better, but there are two risks, namely

    1) Data on a corporate intranet accessible via Approach 2 or 3 can be stolen by rogue scripts on the public internet. So while the data is available on a URL in the intranet, that does not mean that the data should be available to the entire world.
    2) The user may have previously authenticated with a system that has a JSON based api. This is often in the form of a cookie that will be subsequently sent on further requests. As such this private data that requires authentication can again be stolen by rogue scripts.

    Make sense?

  8. Approach 3 can be further sub-divided into JSONP and not. A static given callback name, as (presumably) in the example is less consumable than a full JSONP API where the consumer specifies the name of the callback in the URL query parameter “callback”.

    Doing it that way lets the consumer mix and match concurrent requests for different URLs, both yours and others, and name callbacks to have them consumed by callbacks aware of which request they were tied to, without getting them mixed up due to random timing effects.

    Letting consumers specify the callback name however they please exposes your site’s cookies to theft, though, unless you restrict the callback name so it can not pass the variable “document.cookie” to itself.

    Yahoo restricts it to case insensitive alphanumerics, underscore, period and square brackets, which is sane.

    Your concern for JSON APIs seems to miss a subtle yet important point: the only thing case 1 guards against is people doing browser-side mashups of your published data with JavaScript alone; whichever variant you pick, any Perl, Ruby, Java or Unix shell power user can consume your data and do whatever they want with it.

    To implement *security*, you should use forms of HTTP authentication, i.e. basic auth, cookies, or exchanging other auth tokens between client and server. If the scheme you pick lets browsers logged on to your system leak the data to third party web pages (I presume this is your fear, though it does not come out very clearly in the post), Approach 1 would require the attacker to have trojaned the browser to steal the data, which Approaches 2 and 3 would give them for free without any spying software.

  9. I’m not really getting it either. You seem to be saying that with JSON enabled services, if arbitrary code is being run on the website, it has access to that data.

    Wouldn’t it be reasonable to expect that if someone is able to execute script in a user’s browser, then their data is not safe regardless? No matter how you call your webservice, if I can execute script on your page, I have access to it.

  10. Quoting from http://www.json.org/js.html, “Since JSON is a subset of JavaScript, it can be used in the language with no muss or fuss.” This page has said that for as long as I can remember: since 2003. I don’t know about anybody else but I consider it “muss and fuss” to have to use iframes/innerHTML/parsing/eval to make JSON data from a file local to my web server available to JavaScript in my HTML pages: even Douglas Crockford himself calls the DOM “an inconvenient API.” I can’t find the URL, it might be on json.org, but Crock has also said that while JSON APIs should be strict in what they emit, they can be less so in what they accept. A JSON decoder that accepts an optional var assignment can do away with the need to jump through the iframes/innerHTML/parsing/eval hoops; var assignment could for instance be perfect for declarative form validation: one http/script/src call rather than using the AJAX hammer on a problem that is probably not a nail. That being said, you just need to remain vigilant against exposing sensitive data, where the operative word is *remain*.

  11. Kris Gray: “No matter how you call your webservice, if I can execute script on your page, I have access to it.”

    Yep… regardless of any specific exploit, the fundamental problem is that potentially untrustworthy script has full rights to do pretty much whatever it wants on a page. There’s no way of sandboxing the execution of the script or limiting what a script can do.

  13. “The attack that was used in the post you referenced relied on the JSON being in ().”

    No it didn’t, as far as I can see.

  14. One thing that wasn’t emphasized enough in this article is that the assumptions here are that XSRF is at play:

    1. The attack is launched from a malicious page opened by the user in their web browser or some other user-agent that supports cookies
    2. The user is already “logged in” to the secure service
    3. The JSON API accepts that login cookie for authentication

    If your secure JSON API does not accept cookies for authentication I believe that these XSRF attacks are no longer a problem. You simply have to change the API so that instead of a cookie it uses a parameter directly on the URL or in a custom HTTP header, and the valid non-malicious JavaScript accessing the data must include this parameter/header with each request. It can scan the cookies on its own page to find that parameter for convenience.

    The third-party site will not have access to this secure token (stored in a cookie or otherwise) and won’t be able to submit it with the request.

    Does that seem like a reasonable solution or am I missing something?
