My web application relies on screen scraping for some of its work.
One Monday, the returned page started to contain CR+LF+someHex+CR+LF sequences. When I used cURL everything was just fine, but when I used the Internet Commands provided by 4D, these sets of characters (CR+LF+someHex+CR+LF) were inserted, seemingly at random, into the returned HTML page.
What I ended up finding out was that:
1. The web site I was accessing must have enabled chunked transfer encoding on their web server on that “One Monday” or over the weekend.
2. Chunked encoding modifies the body of a message in order to transfer it as a series of chunks.
3. A chunked body puts the size of each chunk, represented in hex (ah…), in front of that chunk, and the way I was receiving this page did not understand this encoding…
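To make the format concrete, here is a minimal Python sketch of what a client has to do to undo chunked encoding. It is not what cURL or the 4D commands do internally; the function name `decode_chunked` is made up for illustration, and it assumes the complete body (without the response headers) is already in memory and ignores any trailer fields after the last chunk.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked HTTP body into the original payload."""
    decoded = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CR+LF.
        line_end = body.index(b"\r\n", pos)
        # Drop optional chunk extensions that may follow a ";".
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        decoded += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CR+LF
    return bytes(decoded)


raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(raw))  # b'Wikipedia'
```

The "4" and "5" in the sample are exactly the kind of stray hex values that show up in the page when the receiving code treats the chunked body as plain text.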
Solution?
Well, I could rewrite this with cURL. That’s the best way to go, and I will do that when I have a chance.
Since chunked encoding is only available in HTTP version 1.1, changing the version to 1.0 in the request header solved this problem for now…
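As a rough illustration of that workaround (in Python rather than 4D, so only a sketch of the idea), the request below announces HTTP/1.0, which a compliant server should answer without chunked encoding. The helper names `build_http10_request` and `fetch`, and the host and path in the usage line, are hypothetical.

```python
import socket


def build_http10_request(host: str, path: str = "/") -> bytes:
    """Build a GET request that declares HTTP/1.0 instead of 1.1."""
    lines = [
        f"GET {path} HTTP/1.0",  # the version a server uses to decide on chunking
        f"Host: {host}",
        "Connection: close",
        "",
        "",  # the two empty entries produce the blank line that ends the header
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Send the request over a plain socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_http10_request(host, path))
        received = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            received.append(data)
    return b"".join(received)
```

Calling something like `fetch("example.com", "/page")` would return the raw response, status line and headers included. The trade-off is that HTTP/1.0 also gives up persistent connections, so this is a stopgap rather than a real fix.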