Re: [Xpert]Using MMX assembly (for video card drivers)
[...]
> BTW - does anyone know why the mga driver internally converts to 422
> format ? It seems to me that mga 400 and 450 chips do support 420
> planar format... (I saw some sample code using it, I can probably find
> it back if needed). I think XFree would benefit from using this
> feature instead of converting to nonplanar 422.

I also wrote a patch for this several months ago (even before
XFree86-4.1.0). If you're interested, I've uploaded it here:

http://rambo.its.tudelft.nl/~ewald/XFree86-4.0.99.3-mga-xv-planar-data.patch

Decoding DVD movies on a PII-350 is about 13% faster using planar format
instead of converting to YUY2. Unfortunately, the Matrox hardware cannot
filter the chrominance component in the vertical direction, so you can't
have both at the same time.

> Cheers,

bye,
ewald
___
Xpert mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/xpert
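For readers unfamiliar with the two layouts, the conversion that the planar patch avoids can be sketched as below. This is a minimal illustration and not the mga driver's code: the function name `planar420_to_yuy2` and its parameters are hypothetical, the planes are assumed to be tightly packed (no row padding), and chroma rows are simply doubled with no vertical filtering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert planar 4:2:0 (separate Y, U, V planes) to packed YUY2 (4:2:2).
 * Hypothetical helper: each chroma row is reused for two luma rows.
 * dst must hold width * height * 2 bytes. */
static void planar420_to_yuy2(const uint8_t *y, const uint8_t *u,
                              const uint8_t *v, uint8_t *dst,
                              size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);

    for (size_t row = 0; row < height; row++) {
        const uint8_t *yr = y + row * width;
        const uint8_t *ur = u + (row / 2) * (width / 2);
        const uint8_t *vr = v + (row / 2) * (width / 2);
        uint8_t *out = dst + row * width * 2;

        /* YUY2 byte order per pixel pair: Y0 U Y1 V */
        for (size_t x = 0; x < width / 2; x++) {
            out[4 * x + 0] = yr[2 * x];
            out[4 * x + 1] = ur[x];
            out[4 * x + 2] = yr[2 * x + 1];
            out[4 * x + 3] = vr[x];
        }
    }
}
```

Every output byte is touched once per frame, which is the per-frame copy work that handing the planar data straight to the hardware would skip.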
Re: [Xpert]Using MMX assembly (for video card drivers)
> At 11:26 AM 4/01/02 +0100, Ewald Snel wrote:

(sorry for the duplicate message, it was delayed for one week (see date))

[...]
> It would be interesting to see if the same could be achieved with 3DNow!
> instructions, as this would provide a welcome boost for anyone with an AMD
> K6-2 or K6-3 or any of the other 3DNow! capable CPUs. I'm sure there are

Using MMX will benefit any CPU capable of MMX instructions, including
AMD K6, K6-2, K6-3 and Athlon/Duron processors. That's why I did not use
SSE or 3DNow!.

> also a number of other platforms that could use in-line assembly to do the
> same (eg: PPC/Altivec).
>
> Out of interest, how much in-line assembly code are you referring to?
> Anywhere some of us can get a look-see?

Here's an image of what it looks like ...

http://rambo.its.tudelft.nl/~ewald/xfree86-chrominance-filter.jpg

And here are some patches ...

http://rambo.its.tudelft.nl/~ewald/

bye,
ewald
Re: [Xpert]Using MMX assembly (for video card drivers)
At 11:26 AM 4/01/02 +0100, Ewald Snel wrote:
>Hi,
>
>Could I use MMX assembly for improving the mga video driver? I wrote a
>vertical chrominance filter (*) for the XVideo module using inline MMX
>assembly. This allows me to improve output quality without any speed penalty.

It would be interesting to see if the same could be achieved with 3DNow!
instructions, as this would provide a welcome boost for anyone with an AMD
K6-2 or K6-3 or any of the other 3DNow! capable CPUs. I'm sure there are
also a number of other platforms that could use in-line assembly to do the
same (eg: PPC/Altivec).

Out of interest, how much in-line assembly code are you referring to?
Anywhere some of us can get a look-see?

Stuart Young - [EMAIL PROTECTED]
(aka Cefiar) - [EMAIL PROTECTED]

[All opinions expressed in the above message are my]
[own and not necessarily the views of my employer..]
Re: [Xpert]Using MMX assembly (for video card drivers)
On Fri, Jan 04, 2002 at 11:26:02AM +0100, Ewald Snel wrote:
> Could I use MMX assembly for improving the mga video driver? I wrote a
> vertical chrominance filter (*) for the XVideo module using inline MMX
> assembly. This allows me to improve output quality without any speed penalty.

BTW - does anyone know why the mga driver internally converts to 422
format ? It seems to me that mga 400 and 450 chips do support 420
planar format... (I saw some sample code using it, I can probably find
it back if needed). I think XFree would benefit from using this
feature instead of converting to nonplanar 422.

Cheers,

--
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.
Re: [Xpert]Using MMX assembly (for video card drivers)
To reply to my own mail :)

Billy Biggs ([EMAIL PROTECTED]):

> > It's actually 0.5 pixel (my mistake :)) using the following filter :
> >
> > o o   (c=c1)
> >  c1
> > o o   (c=.5*c1 + .5*c2)
> >
> > o o   (c=c2)
> >  c2
> > o o   (c=.5*c2 + .5*c3)
>
> I don't think this is right for MPEG2.

I sent this and realized I might look like an asshole. :) This should
read:

  Thanks, I see what you mean now, and yeah, I think this filter is
  wrong for filtering chroma from MPEG2. :)

Apologies.

--
Billy Biggs
[EMAIL PROTECTED]
Re: [Xpert]Using MMX assembly (for video card drivers)
Ewald Snel ([EMAIL PROTECTED]):

> > Please, please correct me if I'm wrong here. In MPEG sampling, the
> > chrominance sample is halfway between the two luminance samples on
> > the same vertical scanline (by IS 13818-2):
>
> I think you're right, my interpolation looks like this :
>
> o o   (c=.75*c1 + .25*c0)
>  c1
> o o   (c=.75*c1 + .25*c2)
>
> o o   (c=.75*c2 + .25*c1)
>  c2
> o o   (c=.75*c2 + .25*c3)

You mean you think I'm wrong. :) My picture was wrong, yours makes
total sense, and I now believe your filter is reasonable.

> It's actually 0.5 pixel (my mistake :)) using the following filter :
>
> o o   (c=c1)
>  c1
> o o   (c=.5*c1 + .5*c2)
>
> o o   (c=c2)
>  c2
> o o   (c=.5*c2 + .5*c3)

I don't think this is right for MPEG2.

--
Billy Biggs
[EMAIL PROTECTED]
Re: [Xpert]Using MMX assembly (for video card drivers)
Hi,

[...]
> > Something like that, the filter uses 0.75x nearest chrominance sample
> > and 0.25x second nearest chrominance sample. This is more accurate as
> > it doesn't shift the chrominance signal by 1 pixel.
>
> Please, please correct me if I'm wrong here. In MPEG sampling, the
> chrominance sample is halfway between the two luminance samples on the
> same vertical scanline (by IS 13818-2):

I think you're right, my interpolation looks like this :

o o   (c=.75*c1 + .25*c0)
 c1
o o   (c=.75*c1 + .25*c2)

o o   (c=.75*c2 + .25*c1)
 c2
o o   (c=.75*c2 + .25*c3)

[...]
> So, are not the chroma samples above and below the same distance away?
> I thought this was the purpose of MPEG sampling, that is, it's
> reasonable to convert to 4:2:2 sampling by doubling the scanlines.

It's reasonable, but doubling the scanlines will make the image look a
little blocky, as both scanlines use the same chrominance values. That's
why you should use filtering.

> Are you sure that maybe the images where you see that nasty chroma
> artifact aren't from when the DVD is using interlaced encoding? In this
> case, each second chroma sample is from a different field, and you can
> get blocky errors because you don't correlate samples correctly.

The source was a non-interlaced MPEG-1 video file. The red blocks are
very small for (high resolution) DVD movies, but they are still visible.

> What do you mean by shifting the chroma by one pixel?

It's actually 0.5 pixel (my mistake :)) using the following filter :

o o   (c=c1)
 c1
o o   (c=.5*c1 + .5*c2)

o o   (c=c2)
 c2
o o   (c=.5*c2 + .5*c3)

bye,
ewald
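The 0.75/0.25 interpolation described in this message can be written out in plain C roughly as follows. This is a sketch under stated assumptions, not the code from the patch: the name `chroma_upsample_v` is hypothetical, rows are assumed tightly packed, the top and bottom edges reuse the nearest chroma row, and the weights are applied in integer arithmetic with rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Vertically upsample one chroma plane by 2 (4:2:0 -> 4:2:2).
 * src holds src_rows rows of `width` bytes; dst receives 2 * src_rows rows.
 * Each output row is 3/4 of the nearest chroma row plus 1/4 of the
 * second nearest one. Missing neighbours at the edges are clamped. */
static void chroma_upsample_v(const uint8_t *src, uint8_t *dst,
                              size_t width, size_t src_rows)
{
    assert(src != NULL && dst != NULL && src_rows > 0);

    for (size_t r = 0; r < src_rows; r++) {
        const uint8_t *cur  = src + r * width;
        const uint8_t *prev = (r > 0) ? cur - width : cur;
        const uint8_t *next = (r + 1 < src_rows) ? cur + width : cur;
        uint8_t *top = dst + (2 * r) * width;   /* luma row above the sample */
        uint8_t *bot = top + width;             /* luma row below the sample */

        for (size_t x = 0; x < width; x++) {
            top[x] = (uint8_t)((3 * cur[x] + prev[x] + 2) >> 2);
            bot[x] = (uint8_t)((3 * cur[x] + next[x] + 2) >> 2);
        }
    }
}
```

The inner loop is the part an MMX version would process several bytes at a time; the scalar form is only meant to make the weights in the diagram concrete.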
Re: [Xpert]Using MMX assembly (for video card drivers)
On Fri, 4 Jan 2002, Billy Biggs wrote:

> Please, please correct me if I'm wrong here. In MPEG sampling, the
> chrominance sample is halfway between the two luminance samples on the
> same vertical scanline (by IS 13818-2):
>
>   o o     where  o == luma sample
>   x              x == chroma sample
>   o o

Note that this depends on which version of MPEG you're talking about. I
forget which (I can look it up if anyone's interested), but one of the
MPEG standards specifies that the chroma samples are located between the
lumas in both dimensions, i.e.:

  o o
   x
  o o

> So, are not the chroma samples above and below the same distance away?
> I thought this was the purpose of MPEG sampling, that is, it's
> reasonable to convert to 4:2:2 sampling by doubling the scanlines.

Possibly, but you have to beware what the chroma position is for the
4:2:2 as well. If the 4:2:2 specifies colocated first luma and chroma,
it will work nicely for the first form (above). If in the middle, it'll
work for the second form.

> What do you mean by shifting the chroma by one pixel?

If a chroma sample is colocated with a luma sample (in either
dimension), you get the following:

  ooooo
   x x
  |^|^|

Where a single chroma sample impacts three adjacent pixels (note the
difference between pixel and sample...), and the luma samples in the
middle actually get chroma from two different chroma samples. In this
case you have to give differing amounts to each new (resampled) sample,
according to the percentages mentioned previously.

Erik Walthinsen <[EMAIL PROTECTED]> - System Administrator
  __
 /  \        GStreamer - The only way to stream!
|| M E G A   * http://gstreamer.net/ *
_\  /_
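How the weights follow from the chroma position can be illustrated with a small helper. It is a sketch that assumes plain linear interpolation between the two nearest chroma rows; the type `chroma_taps`, the function `chroma_taps_for_luma_row`, and the `offset` parameter are hypothetical names introduced here. Chroma row k is taken to sit at luma-row coordinate 2*k + offset, so offset = 0.5 models chroma midway between two luma rows and offset = 0.0 models chroma colocated with the even luma rows.

```c
#include <assert.h>

/* Two-tap vertical interpolation for one luma row. */
typedef struct {
    long   row0;  /* upper chroma row index (-1 means "above the first row") */
    double w0;    /* weight of chroma row row0 */
    double w1;    /* weight of chroma row row0 + 1 */
} chroma_taps;

/* Linear-interpolation weights for luma row `luma_row`, assuming chroma
 * row k is located at luma-row coordinate 2*k + offset. */
static chroma_taps chroma_taps_for_luma_row(long luma_row, double offset)
{
    assert(luma_row >= 0 && offset >= 0.0 && offset < 2.0);

    /* Position of this luma row measured in chroma-row units. */
    double t = ((double)luma_row - offset) / 2.0;

    /* floor(t) without libm: the cast truncates toward zero. */
    long k = (long)t;
    if ((double)k > t)
        k--;

    double frac = t - (double)k;
    chroma_taps taps = { k, 1.0 - frac, frac };
    return taps;
}
```

With offset = 0.5 the two luma rows around a chroma sample get weights 0.75 and 0.25, which matches the first filter quoted in the thread. With offset = 0.0 the even rows copy their chroma sample and the odd rows get 0.5 and 0.5, which matches the second one. Which offset applies depends on the MPEG version, as noted above.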
Re: [Xpert]Using MMX assembly (for video card drivers)
Ewald Snel ([EMAIL PROTECTED]):

> > > I wrote a vertical chrominance filter (*) for the XVideo module
> > > using inline MMX assembly. This allows me to improve output
> > > quality without any speed penalty.
> >
> > Do you mean for upsampling to 4:2:2 ? How do you filter? Do you
> > average to create the new chroma line?
>
> Something like that, the filter uses 0.75x nearest chrominance sample
> and 0.25x second nearest chrominance sample. This is more accurate as
> it doesn't shift the chrominance signal by 1 pixel.

Please, please correct me if I'm wrong here. In MPEG sampling, the
chrominance sample is halfway between the two luminance samples on the
same vertical scanline (by IS 13818-2):

  o o     where  o == luma sample
  x              x == chroma sample
  o o

So, if we look vertically down a 2-pixel wide line, we see:

  o1 o
  x1
  o2 o    o == luma sample
  x2      x == chroma sample
  o3 o
  x3
  o4 o

So, are not the chroma samples above and below the same distance away?
I thought this was the purpose of MPEG sampling, that is, it's
reasonable to convert to 4:2:2 sampling by doubling the scanlines.

Are you sure that maybe the images where you see that nasty chroma
artifact aren't from when the DVD is using interlaced encoding? In this
case, each second chroma sample is from a different field, and you can
get blocky errors because you don't correlate samples correctly.

What do you mean by shifting the chroma by one pixel?

--
Billy Biggs
[EMAIL PROTECTED]
Re: [Xpert]Using MMX assembly (for video card drivers)
Hi,

> > I wrote a vertical chrominance filter (*) for the XVideo module using
> > inline MMX assembly. This allows me to improve output quality without
> > any speed penalty.
>
> Do you mean for upsampling to 4:2:2 ? How do you filter? Do you
> average to create the new chroma line?

Something like that, the filter uses 0.75x nearest chrominance sample
and 0.25x second nearest chrominance sample. This is more accurate as
it doesn't shift the chrominance signal by 1 pixel.

Here are the patches, the second one is for enabling the horizontal
filtering in hardware:

http://rambo.its.tudelft.nl/~ewald/XFree86-4.1.99.4-mga-xv-mmx-chromafilter.patch
http://rambo.its.tudelft.nl/~ewald/XFree86-4.2.0-mga-xv-uvfilter.patch

These are not paired for Pentium MMX, but performance is already better
than the C version (which compiles to slow "movzx" instructions). It's
nearly optimal for AMD Athlon though (about 2 IPC using L1-cache).

bye,
ewald
Re: [Xpert]Using MMX assembly (for video card drivers)
> > I've also been playing with some mmx-ification of the XVideo
> > routines, for example I also did an SSE-4:2:0-to-4:2:2 function.
>
> I just did this too, MMX only though. How many cycles/pixel did you
> end up with? What percentage of pairing did you achieve?

I'll get some numbers in a sec.

> > There was some discussion on #xfree86 about efforts to have a nice
> > runtime detection mechanism somewhere. Has anyone got any code for
> > this already done? If not I might also have a go at it.
>
> there are plenty of samples of this on Intel's site.

And in many nice abstracted open source modules. :) Specifically I
meant code to put this somewhere appropriate in the X tree.

--
Billy Biggs
[EMAIL PROTECTED]
Re: [Xpert]Using MMX assembly (for video card drivers)
On Fri, 4 Jan 2002, greg wright wrote:

> I just did this too, MMX only though. How many cycles/pixel did you
> end up with? What percentage of pairing did you achieve?

Note that only P5-core chips care about pairing, per se. There are much
nastier issues involved in modern P6 cores. I haven't thought about them
for quite a while, so it'd take me a while to dig out the stuff and put
it back into main memory, but I think I have a pretty good understanding
of how the P6 really works...

> there are plenty of samples of this on Intel's site.

Unfortunately that just isn't very useful outside Intel's world. There
are about a half-dozen manufacturers of x86 chips that matter, and they
all have all sorts of bizarre quirks. I ran across a sourceforge project
a few days ago (x86info I think) that tries to deal with that, but I
didn't look at the code.

There's a larger issue when it comes to other architectures. There are
similar but in some cases nastier problems on things like PPC and Alpha.
This is why I want to gather all this into a single library. It would go
closely with my other projects, SpeciaLib and libcodec, which focus on
run-time specialization of time-critical kernels, such as the
motion-compensation code in an MPEG decoder, or color-space
conversion/transliterations, etc. (as in the 4:2:0 to 4:2:2 problem).

You can see a lot of this stuff at http://codecs.org/, though specialib
itself isn't there because it's not anywhere near formed enough for CVS.

Erik Walthinsen <[EMAIL PROTECTED]> - System Administrator
  __
 /  \        GStreamer - The only way to stream!
|| M E G A   * http://gstreamer.net/ *
_\  /_
Re: [Xpert]Using MMX assembly (for video card drivers)
> I've also been playing with some mmx-ification of the XVideo routines,
> for example I also did an SSE-4:2:0-to-4:2:2 function.

I just did this too, MMX only though. How many cycles/pixel did you
end up with? What percentage of pairing did you achieve?

> There was some discussion on #xfree86 about efforts to have a nice
> runtime detection mechanism somewhere. Has anyone got any code for this
> already done? If not I might also have a go at it.

There are plenty of samples of this on Intel's site.

--greg
Re: [Xpert]Using MMX assembly (for video card drivers)
Ewald Snel ([EMAIL PROTECTED]):

> Of course, I'm using "#ifdef USE_MMX_ASM" and the original C code as
> an alternative for other CPU architectures. Runtime detection of MMX
> support is not included yet, but will be added if MMX is allowed.

I've also been playing with some mmx-ification of the XVideo routines,
for example I also did an SSE-4:2:0-to-4:2:2 function.

There was some discussion on #xfree86 about efforts to have a nice
runtime detection mechanism somewhere. Has anyone got any code for this
already done? If not I might also have a go at it.

--
Billy Biggs
[EMAIL PROTECTED]
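One possible shape for the runtime detection asked about here is sketched below. It is an assumption-laden illustration rather than anything from the X tree: it relies on the `<cpuid.h>` header shipped with GCC and Clang, reports "no MMX" on any other compiler or architecture, and the names `cpu_has_mmx`, `chroma_filter_fn` and `select_chroma_filter` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Returns 1 if the CPU reports MMX support, 0 otherwise. */
static int cpu_has_mmx(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when CPUID leaf 1 is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 23) & 1u);   /* CPUID leaf 1, EDX bit 23: MMX */
#else
    return 0;   /* unknown compiler or non-x86 CPU: stay on the C path */
#endif
}

/* Hypothetical signature shared by the C and MMX filter variants. */
typedef void (*chroma_filter_fn)(const uint8_t *src, uint8_t *dst,
                                 size_t width, size_t src_rows);

/* Pick the MMX variant only when one was built and the CPU supports it. */
chroma_filter_fn select_chroma_filter(chroma_filter_fn c_version,
                                      chroma_filter_fn mmx_version)
{
    assert(c_version != NULL);
    if (mmx_version != NULL && cpu_has_mmx())
        return mmx_version;
    return c_version;
}
```

A driver could call the selector once at initialisation and keep the returned pointer, so the compile-time `USE_MMX_ASM` switch decides what gets built and the CPUID check decides what gets used.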