Recently, my experience in software development helped me pinpoint where the problem was in some PHP source code provided by developers.

The developers thought the problem was a sysadmin misconfiguration, such as missing PHP libraries, because “it works in their environment!” OK, I can hear you: “but if you use a Docker image for development and the same Docker image for production, this problem can’t happen, so there is no question of a misconfiguration”. But that’s not our case: we don’t use Docker, for several reasons.

The problem with non-encoded URLs

Here is an example of the code, which sent a request with curl through the PHP extension php-curl:

$ch = curl_init();
$options = array(
  CURLOPT_URL => $this->URL.'?user='.$this->Login.'&data='.$this->data,
  ...

As you can see, if the data field (or the login) contains a value with special characters such as spaces, ampersands, equals signs, …, it will break the value and then break the URL.
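To see what the receiver actually gets, here is a small PHP sketch with hypothetical sample values and a hypothetical URL. It builds the URL the naive way and splits its query string with parse_str, which decodes it essentially the way PHP fills $_GET. It leaves aside what the HTTP client itself does with the raw spaces:

```php
<?php
// Hypothetical sample values mirroring $this->Login and $this->data.
$login = 'john';
$data = "it's awesome & fabulous";

// Naive concatenation, as in the code above: the raw '&' ends up in the query string.
$url = 'https://example.com/api?user=' . $login . '&data=' . $data;

// Split the query string the way the receiving PHP script would (parse_str fills $params like $_GET).
$query = explode('?', $url, 2)[1];
parse_str($query, $params);

var_dump($params['data']); // string(13) "it's awesome "
```

The data value is cut at the raw ampersand, and the rest of the text turns into an extra, empty parameter.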

The problem is that if you don’t submit special characters during your tests, you won’t break the URL and you won’t see any error.

The solution here is simple: URL-encode all your values.

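$ch = curl_init();
// Encode every parameter value, not only data: the login can contain special characters too.
$escapedLogin = curl_escape($ch, $this->Login);
$escapedData = curl_escape($ch, $this->data);
$options = array(
  CURLOPT_URL => $this->URL.'?user='.$escapedLogin.'&data='.$escapedData,
  ...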

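As a solution, I suggested to the developers the PHP function curl_escape, which follows RFC 3986. This RFC specifies how URIs are encoded and decoded: a character that would otherwise be read as a delimiter (a reserved character such as & or =) or that is not allowed in a URI at all (such as a space) must be “escaped” as % followed by two hexadecimal digits. This is also known as “percent-encoding”. curl_escape applies it to every character except letters, digits, -, ., _ and ~.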

For instance, if the raw (not URL-encoded) value of data is:

it's awesome & fabulous

Then $escapedData (the URL-encoded data) will be:

it%27s%20awesome%20%26%20fabulous

You can see that the spaces, the apostrophe and the ampersand have been encoded.
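To make percent-encoding concrete, here is a minimal hand-written PHP encoder, for illustration only: percentEncode and UNRESERVED are hypothetical names, and in real code you would use curl_escape or PHP’s built-in rawurlencode, which also follows RFC 3986. It reproduces the encoded value above:

```php
<?php
// Characters RFC 3986 calls "unreserved": they never need to be encoded.
const UNRESERVED = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~';

// Illustrative percent-encoder (hypothetical helper, not a PHP built-in): every byte
// outside the unreserved set is replaced by '%' and its two-digit uppercase hex value.
function percentEncode(string $value): string
{
    $encoded = '';
    $length = strlen($value);
    for ($i = 0; $i < $length; $i++) {
        $byte = $value[$i];
        if (strpos(UNRESERVED, $byte) !== false) {
            $encoded .= $byte;
        } else {
            $encoded .= sprintf('%%%02X', ord($byte));
        }
    }
    return $encoded;
}

echo percentEncode("it's awesome & fabulous"), "\n"; // it%27s%20awesome%20%26%20fabulous
echo rawurlencode("it's awesome & fabulous"), "\n";  // same result with the built-in
```

Working byte by byte means multi-byte UTF-8 characters are encoded as one %XX group per byte, which is what RFC 3986 expects.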

There is an important point to notice here.

We call it URL encoding, but it is not the whole URL that gets encoded, only the values of the GET parameters. And that’s it!
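This can be checked with a sketch that encodes only the values, never the separators. It assumes a hypothetical endpoint https://example.com/api and sample values standing in for $this->URL, $this->Login and $this->data, and shows two options: rawurlencode on each value (RFC 3986, like curl_escape, and no curl handle needed) or http_build_query on the whole parameter array:

```php
<?php
// Hypothetical endpoint and sample values standing in for $this->URL, $this->Login and $this->data.
$baseUrl = 'https://example.com/api';
$login = 'john doe';
$data = "it's awesome & fabulous";

// Option 1: encode each value separately; rawurlencode() follows RFC 3986, like curl_escape().
// The separators ?, = and & are added unencoded, because they structure the URL.
$url1 = $baseUrl . '?user=' . rawurlencode($login) . '&data=' . rawurlencode($data);

// Option 2: let http_build_query() encode every name and value.
// PHP_QUERY_RFC3986 makes it encode spaces as %20 instead of +.
$url2 = $baseUrl . '?' . http_build_query(['user' => $login, 'data' => $data], '', '&', PHP_QUERY_RFC3986);

echo $url1, "\n"; // https://example.com/api?user=john%20doe&data=it%27s%20awesome%20%26%20fabulous
echo $url2, "\n"; // same URL

// Round trip: decoding the query string gives back the original values.
parse_str(explode('?', $url1, 2)[1], $params);
var_dump($params['user'] === $login && $params['data'] === $data); // bool(true)
```

http_build_query has the advantage of encoding every parameter in one call, so no value can be forgotten.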

The same applies to POST values when they are submitted with the Content-Type HTTP header set to application/x-www-form-urlencoded.

Submitting POST data (a form): do I need to care about URL encoding?

If your data is submitted via a form, you can’t fully count on a JavaScript check or a regexp that forbids special characters (in a username field, for example), because your JavaScript could have a bug or can be bypassed. That’s the reason why, by default, you should always URL-encode field values.

There are two ways to submit a form with curl (or any other HTTP client), distinguished by the Content-Type HTTP header:

  • application/x-www-form-urlencoded
  • multipart/form-data

Form submitted with Content-Type http header application/x-www-form-urlencoded

You need to encode your form field values.

When the Content-Type header has the value application/x-www-form-urlencoded, the form body is sent URL-encoded, in the same name=value&name=value format as a query string. The receiver URL-decodes it when it sees that the Content-Type header is application/x-www-form-urlencoded. So if the form field values were not URL-encoded, you fall into the previous problem.
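Here is a short sketch of both sides of an application/x-www-form-urlencoded submission, with hypothetical field values: http_build_query encodes the body and parse_str decodes it, essentially as PHP does when it fills $_POST. The curl line is only shown as a comment, since it needs a real endpoint:

```php
<?php
// Hypothetical form fields.
$fields = ['user' => 'john', 'data' => "it's awesome & fabulous"];

// http_build_query() URL-encodes every name and value; with the default PHP_QUERY_RFC1738
// mode, spaces become '+', which is the form-urlencoded convention.
$body = http_build_query($fields);
echo $body, "\n"; // user=john&data=it%27s+awesome+%26+fabulous

// The receiver decodes the body; parse_str() applies the same kind of decoding PHP uses for $_POST.
parse_str($body, $decoded);
var_dump($decoded === $fields); // bool(true)

// With php-curl, passing this *string* to CURLOPT_POSTFIELDS sends it as
// application/x-www-form-urlencoded:
//   curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
```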

Form submitted with Content-Type http header multipart/form-data

Here there is no need to encode your form field values.

In this case, the submitted values of your form are separated by a delimiter called the boundary. It plays a role similar to the & (ampersand) in a URL-encoded body, and it is a very specific string of characters that must not be found in the submitted form content (otherwise it would break the submitted values).

For instance:

--XXX
Content-Disposition: form-data; name="name"

John
--XXX
Content-Disposition: form-data; name="age"

12
--XXX--

Here the boundary value is XXX. You can choose the boundary value yourself (it is announced to the receiver in the header, e.g. Content-Type: multipart/form-data; boundary=XXX), but take care to choose a boundary value that won’t appear in the submitted fields.
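With php-curl you usually don’t write this body yourself, but building one by hand shows how the boundary works. This is a minimal sketch assuming PHP 8 (for str_contains); buildMultipartBody is a hypothetical helper that only handles simple text fields with plain names (no files, no escaping of quotes in names):

```php
<?php
// Hypothetical helper: builds a multipart/form-data body for simple text fields.
function buildMultipartBody(array $fields, string $boundary): string
{
    $body = '';
    foreach ($fields as $name => $value) {
        $value = (string) $value;
        // The boundary must never appear inside a value, otherwise the receiver would split it there.
        if (str_contains($value, $boundary)) {
            throw new InvalidArgumentException("Boundary found in field '$name'");
        }
        // Each part starts with "--" + boundary, then its headers, an empty line and the raw value.
        $body .= "--$boundary\r\n";
        $body .= "Content-Disposition: form-data; name=\"$name\"\r\n\r\n";
        $body .= $value . "\r\n";
    }
    // Closing delimiter: "--" + boundary + "--".
    return $body . "--$boundary--\r\n";
}

// A random boundary is very unlikely to occur in the submitted content.
$boundary = '----' . bin2hex(random_bytes(16));
$body = buildMultipartBody(['name' => 'John', 'age' => 12, 'comment' => "it's awesome & fabulous"], $boundary);

// The receiver learns the boundary from this header.
$contentType = "multipart/form-data; boundary=$boundary";
echo $contentType, "\n\n", $body;

// With php-curl, passing an *array* to CURLOPT_POSTFIELDS makes curl build such a body
// (and choose the boundary) for you.
```

Note that the values, including the &, are sent as-is, without URL encoding. The explicit check turns the remaining risk of a boundary collision into an error instead of a corrupted submission.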