Depending on the language and libraries available, you should try making a
HEAD request for each of the URLs you extract.  This returns only
the headers of the endpoint, and the Content-Type header among them gives
you the MIME type of the content.  There are a ton of video MIME types, but
they generally begin with "video/", so they should be easy to spot once you
have the headers.
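
As a rough sketch of the HEAD approach, assuming Python and only its
standard library (the function names here are made up for illustration,
and the timeout value is arbitrary):

```python
import urllib.request


def is_video_content_type(content_type):
    """Return True if a Content-Type header value names a video MIME type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith("video/")


def head_content_type(url, timeout=10):
    """Send a HEAD request and return the Content-Type header, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


def url_is_video(url):
    """Hypothetical helper combining the two steps above."""
    return is_video_content_type(head_content_type(url))
```

Some servers reject HEAD or report a generic type such as
application/octet-stream, so treat a negative answer as "unknown" rather
than "not a video".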



On May 31, 1:07 pm, Nick Arnett <nick.arn...@gmail.com> wrote:
> On Sun, May 31, 2009 at 4:53 AM, grand_unifier <jijodasgu...@gmail.com> wrote:
>
>
>
> > I have written code to get all tweets that have URLs in them, in Atom
> > or JSON format.
>
> > Now I want a way to:
>
> > 1> separate the URLs from the tweets, the way TweetMeme does
> > 2> find out if a URL represents a video
>
> > How would I do that?
>
> I don't think anyone can answer this in detail without knowing what language
> you are writing this code in.  You should be able to use a regular
> expression to extract the URLs and then use the file extension to detect
> whether or not it is a direct link to a video file.  But if it is a link to
> a page that contains a video, you'll have to fetch the page and examine its
> links.
>
> There are some URL patterns that you probably can assume point to pages that
> contain video, such as YouTube URLs.
>
> Nick
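
For what it's worth, the regular-expression and file-extension idea
quoted above could look something like this in Python.  This is only a
sketch: the URL pattern is deliberately loose, and the extension and host
lists are my own guesses that you would need to extend.

```python
import re
from urllib.parse import urlparse

# Loose pattern: anything starting with http:// or https:// up to whitespace.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Illustrative lists only; neither is complete.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".avi", ".flv", ".wmv", ".webm")
VIDEO_HOSTS = ("youtube.com", "www.youtube.com")


def extract_urls(text):
    """Return the URLs found in text, minus trailing punctuation."""
    return [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]


def looks_like_video(url):
    """Guess from the URL alone whether it points at a video."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in VIDEO_HOSTS:
        return True
    # Direct links to video files usually end in a known extension.
    return parsed.path.lower().endswith(VIDEO_EXTENSIONS)


tweet = "watch http://www.youtube.com/watch?v=abc123, or http://example.com/clip.mp4"
for url in extract_urls(tweet):
    print(url, looks_like_video(url))
```

Shortened links will not match either test, so those would have to be
resolved first or checked by fetching their headers.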
