Depending on the language and libraries available, you should try making a HEAD request for each of the URLs you extract. This returns only the response headers for the endpoint, and the Content-Type header among them should give you the MIME type of the content. There are a ton of video MIME types, but they share the "video/" prefix, so they should be easy to spot once you have the headers.
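Since the language wasn't stated, here is a rough sketch of the HEAD-request idea in Python using only the standard library. The function names `is_video_content_type` and `url_is_video` are made up for illustration, and the check assumes the server answers HEAD requests and reports an honest Content-Type, which not every server does.

```python
import urllib.request


def is_video_content_type(content_type):
    """Return True if a Content-Type header value names a video MIME type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith("video/")


def url_is_video(url, timeout=10):
    """Send a HEAD request and inspect the Content-Type response header.

    Assumption: any failure (bad URL, network error, HTTP error status)
    is treated as "not a video" rather than raised to the caller.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_video_content_type(response.headers.get("Content-Type"))
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError is raised for strings that are not URLs at all.
        return False
```

You would call `url_is_video(url)` for each URL pulled out of a tweet. A False result only means the endpoint did not identify itself as video; a page that embeds a video will normally come back as text/html.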
On May 31, 1:07 pm, Nick Arnett <nick.arn...@gmail.com> wrote:
> On Sun, May 31, 2009 at 4:53 AM, grand_unifier <jijodasgu...@gmail.com> wrote:
>
> > i have written a code to get all tweets that have urls in them in atom
> > or json format.....
> >
> > now i want a way to:
> >
> > 1>separate the urls from the tweets....like a tweetmeme way...
> > 2>find out if the url represents a video...
> >
> > how will i do that??
>
> I don't think anyone can answer this in detail without knowing what language
> are you writing this code in. You should be able to use a regular
> expression to extract the URLs and then use the file extension to detect
> whether or not it is a direct link to a video file. But if it is a link to
> a page that contains a video, you'll have to fetch the page and examine its
> links.
>
> There are some URL patterns that you probably can assume point to pages that
> contain video, such as YouTube URLs.
>
> Nick
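For the regex and file-extension approach Nick describes, something along these lines might work, again in Python as an assumed language. The pattern, the extension list and the host list are illustrative guesses, not a complete set, and the names `extract_urls` and `looks_like_video` are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumption: a deliberately simple pattern; real tweets may need a stricter one.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

# Assumption: illustrative lists only, extend them to suit your data.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".avi", ".flv", ".wmv", ".webm", ".mpg", ".mpeg")
VIDEO_HOSTS = ("youtube.com", "youtu.be")


def extract_urls(text):
    """Return all http(s) URLs in the text, minus trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


def looks_like_video(url):
    """Guess from the URL alone whether it points at a video."""
    parsed = urlparse(url)
    # Direct link to a video file: check the extension of the path only,
    # so query strings do not interfere.
    if parsed.path.lower().endswith(VIDEO_EXTENSIONS):
        return True
    # Known video site: match the host itself or any of its subdomains.
    host = (parsed.hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in VIDEO_HOSTS)
```

For example, `[u for u in extract_urls(tweet_text) if looks_like_video(u)]` would keep only the likely video links. This is a cheap first pass because it needs no network access; URLs it rejects may still be pages that contain a video.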