What's the correct way to get an article body?

I'm using java.util.logging.Logger to log an org.apache.commons.net.MalformedServerReplyException to a file:

    <record>
      <date>2012-03-24T03:09:35</date>
      <millis>1332583775299</millis>
      <sequence>1</sequence>
      <logger>gwene.LogUtils</logger>
      <level>INFO</level>
      <class>gwene.LogUtils</class>
      <method>logArticles</method>
      <thread>1</thread>
      <message>Could not parse response code.
    Server Reply: &lt;p&gt;Alex &amp;#8220;Hurricane&amp;#8221; Higgins, transformer of snooker, died on July 24th, aged
    ...text snipped...
    mercilessly, one by one. ...&lt;/p&gt;&lt;div class="feedflare"&gt;</message>
    </record>


The server reply is *exactly* what I'm missing: the content of the article. Code and full output:

https://gist.github.com/2180843

I'm guessing that the HTML is throwing things off? What does NNTPClient.retrieveArticleBody expect? After all, anything can be in an NNTP post.
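As far as I understand the protocol (RFC 3977), every NNTP reply starts with a status line whose first three characters are a numeric code (222 for a BODY reply), and only after that comes the article text. The sketch below is not Commons Net's actual parsing code, just a stdlib-only illustration of that rule; the class name, the method name and the sample message-id are made up. It shows why a line beginning with <p> cannot be read as a status line, which would fit the "Could not parse response code" message in the log.

```java
public class ReplyCodeSketch {

    /**
     * Returns the three-digit code at the start of a status line,
     * or -1 if the line does not begin with three ASCII digits.
     * Hypothetical helper, not part of Commons Net.
     */
    static int parseReplyCode(String line) {
        if (line == null || line.length() < 3) {
            return -1;
        }
        for (int i = 0; i < 3; i++) {
            char c = line.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
        }
        return Integer.parseInt(line.substring(0, 3));
    }

    public static void main(String[] args) {
        // A well-formed BODY status line (the message-id is invented).
        System.out.println(parseReplyCode("222 0 <hypothetical@example.invalid> body follows")); // prints 222

        // Article text arriving where a status line was expected.
        System.out.println(parseReplyCode("<p>Alex ...")); // prints -1
    }
}
```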

What I'm really after, I suppose, is the server reply, because it contains the body of the NNTP article. Surely that's not the intended way to use org.apache.commons.net.nntp.NNTPClient, but I can't find the correct way. Hence this kludge of grabbing the text out of the MalformedServerReplyException instead of reading a proper response.

I suppose it's possible to log everything, and then parse the log file, but that seems like a very complex way of doing a simple thing.

The API documentation for NNTPClient assumes a knowledge of NNTP which, unfortunately, I don't have. I've looked through the example code and don't see any samples where article bodies are parsed. The closest I see is NNTPClient.retrieveArticleBody:

https://commons.apache.org/net/api-3.1/org/apache/commons/net/nntp/NNTPClient.html#retrieveArticleBody%28java.lang.String%29

However, that just yields the malformed-reply exception above. Presumably the server isn't the problem, since Pan can connect to gmane fine. Also, judging by what the Pan newsreader shows, what comes back from NNTPClient.retrieveArticleBody matches what I'm after, namely the body of the article.

What is the correct way to grab the article body? I've looked through the API quite thoroughly.

Surely there must be an example of parsing the article body, not just the header, or at least of using a BufferedReader to read the article body into a String. If that is the way to do it, I don't see a better method available through the API.
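A minimal sketch of the BufferedReader idea, assuming NNTPClient.retrieveArticleBody hands back a java.io.Reader over the body text (that is how I read the Javadoc; I have not confirmed it against a live server). The helper readAll and the class name are hypothetical, and a StringReader stands in for the server so the example runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BodyToString {

    /**
     * Reads everything from the given Reader, line by line, into one String.
     * Line endings are normalised to '\n'. The Reader is closed when done.
     * Hypothetical helper, not part of Commons Net.
     */
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the Reader that the NNTP client would return.
        Reader fakeBody = new StringReader("first line of the article\r\nsecond line\r\n");
        String body = readAll(fakeBody);
        System.out.print(body);
    }
}
```

With a real connection the call would presumably look like readAll(client.retrieveArticleBody(articleId)), after checking the returned Reader for null. My understanding is that the Reader has to be read to the end before the next command is sent, otherwise leftover article text may be taken for the next status line.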



thanks,

Thufir
