"First packet" may not always be the first packet, given Connection:
keep-alive. Basically what I'm saying is that there are no guarantees that
this will continue working, or work consistently. You seem to have a good
understanding of the risks involved, but personally I'd sooner use
child_process.spawn.
Varnish is great at normalizing HTTP headers. You could make this telco site a
"backend" in Varnish and carry on parsing in node:

sub vcl_fetch {
  set beresp.http.Content-Length = regsub(beresp.http.Content-Length, "^([0-9]+),.*$", "\1");
}
On Jan 8, 2013, at 6:34 PM, Matt wrote:
> On Tue, Jan 8, 2013 at
On Tue, Jan 8, 2013 at 5:24 PM, Marcel Laverdet wrote:
> By heavy load I'm talking about network traffic, either on your end, their
> end, or any hop in between. "In the first packet" is certainly *not*
> something I'd recommend anyone to depend on, as that depends on a whole lot
> of things.
>
By heavy load I'm talking about network traffic, either on your end, their
end, or any hop in between. "In the first packet" is certainly *not*
something I'd recommend anyone to depend on, as that depends on a whole lot
of things.
The monkey-patching is gross, but hey, it works. The only thing here…
Exactly, it's designed for this one service, which always sends the
Content-Length capitalized like this, and screwed up with the comma, and in
the first packet. If there are other screwy things in the future we can deal
with them then. Believe me, I know all about parsing headers (I had to write
the parser…)
omg I can't believe you've done this.
Obviously this won't work if the server doesn't send
"Content-Length" capitalized like you have here, but if you're only
designing against one service that's not a huge issue. You should be aware,
though, that this may fail in certain rare circumstances, or under heavy load.
Rather than go into patching anything, I managed to get this to work:
r.on('request', function (req) {
  req.on('socket', function () {
    var oldOnData = req.socket.ondata;
    var first_packet = true;
    req.socket.ondata = function (d, start, end) {
Apply this patch:
https://gist.github.com/4487528
Node shouldn't be barfing on anything a browser can display and should
really be more tolerant of these failures. I should submit a PR, but I'm not
sure if this will cause other issues down the road.
On Tue, Jan 8, 2013 at 12:42 PM, Matt wrote:
> We're doing web scraping using node…
I mean, use the manual client as a fallback. It's only as good/hard as you
need it to be. You could simply look for the first instance of "\r\n\r\n"
and assume everything after that is the body. If you needed the headers,
just split on "\r\n" and then split on ":" and you'll get most of it.
Depending…
On Tue, Jan 8, 2013 at 2:26 PM, Tim Caswell wrote:
> You can use the TCP client directly and hand-roll the http request. Your
> response won't be parsed as http (nor would you want to in the error case),
> but you can write a crude parser in js to get the bulk of it.
>
Yeah, that occurred to me.
On Tue, Jan 8, 2013 at 2:22 PM, Ryan Schmidt wrote:
> I'll bet you already have, but sending a bug report to whoever's serving
> that invalid content so that they can fix it seems like the best and
> simplest solution
>
Yeah, I have contacted them, but they're a big telco; I very much doubt
they'll fix it.
You can use the TCP client directly and hand-roll the http request. Your
response won't be parsed as http (nor would you want to in the error case),
but you can write a crude parser in js to get the bulk of it.
On Tue, Jan 8, 2013 at 12:42 PM, Matt wrote:
> We're doing web scraping using node…
On Jan 8, 2013, at 12:42, Matt wrote:
> We're doing web scraping using node and coming across an issue that we cannot
> fetch a particular URL on a particular web site, because it sends back:
> "Content-Length: 1234,1234"
I'll bet you already have, but sending a bug report to whoever's serving
that invalid content so that they can fix it seems like the best and
simplest solution.
We're doing web scraping using node and coming across an issue that we
cannot fetch a particular URL on a particular web site, because it sends
back: "Content-Length: 1234,1234"
I totally understand that node's http parser doesn't deal with this, and
throws an error, but is there any way we can intercept it?