Hi Stefan,

No, it is not only you :-). I thought I was clear that these references are 
totally synthetic. We just try to make sure that they are reproducible, that 
they sound natural, that they are physically based, and that they do not rely 
on any perceptual spatialization method themselves (Ambisonics, panning, or 
anything else). Then we consider that a plausible reference. As I said, we do 
not have access to an original sound field, so synthetic is our next best option. 

This relates, I believe, to the difficult question of what is the best way to 
assess transparency in a reproduction method for spatial recordings (compared, 
for example, to the transparency of spatial audio coding, where 5.0 material 
can be played back against its spatially compressed version, a much easier 
task since there is a clear reference). In most cases transparency is not of 
interest, and overall perceptual quality is more important. However, we have 
done these comparisons in the way I described and published the results, and 
anyone interested can draw their own conclusions. And if they are good for 
DirAC decoding, then maybe they are good for other decoding approaches.

Regards,
Archontis



> On 05 Jul 2016, at 21:23, Stefan Schreiber <st...@mail.telepac.pt> wrote:
> 
> Politis Archontis wrote:
> 
>> 
>> We start by setting up a large dense 3D loudspeaker setup in a fully 
>> anechoic chamber (usually between 25~35 speakers at a distance of ~2.5m), so 
>> that there is no additional room effect at reproduction. Then we decide on 
>> the composition of the sound scene (e.g. band, speakers, environmental 
>> sources), their directions of arrival and the surrounding room 
>> specifications. We then generate room impulse responses (RIR) using a 
>> physical room simulator for the specified room and source positions. We end 
>> up with one RIR for each speaker and for each source in the scene. 
>> Convolving these with our test signals and combining the results, we end up 
>> with an auralization of the intended scene. This part uses no spatial sound 
>> method at all, no panning for example - if a reflection falls between 
>> loudspeakers it is quantized to the closest one. The final loudspeaker 
>> signals we consider as the reference case (after listening to it and 
>> checking if it sounds ok).
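[Editor's illustration: the "quantize each reflection to the closest loudspeaker" step described above can be sketched in Python roughly as follows. This is a minimal sketch, not the authors' actual code; the loudspeaker count, directions, reflection delays, and gains are all made up for the example.]

```python
import numpy as np
from scipy.signal import fftconvolve

def unit_vec(azi, ele):
    """Unit direction vector from azimuth/elevation in radians."""
    return np.array([np.cos(ele) * np.cos(azi),
                     np.cos(ele) * np.sin(azi),
                     np.sin(ele)])

fs = 48000
n_spk = 4  # tiny setup for illustration; the tests used ~25-35 speakers
spk_dirs = np.stack([unit_vec(a, 0.0)
                     for a in np.linspace(0, 2 * np.pi, n_spk, endpoint=False)])

# Each simulated reflection from the room model: (arrival direction,
# delay in samples, gain). Values here are invented for the sketch.
reflections = [(unit_vec(0.3, 0.1), 480, 0.5),
               (unit_vec(2.0, -0.2), 960, 0.3)]

# Build one RIR per loudspeaker: each reflection is assigned to the
# loudspeaker whose direction is closest (largest dot product), with
# no panning between speakers.
rir_len = 2048
rirs = np.zeros((n_spk, rir_len))
for direction, delay, gain in reflections:
    nearest = np.argmax(spk_dirs @ direction)  # max dot product = min angle
    rirs[nearest, delay] += gain

# Auralize: convolve a test signal with each loudspeaker's RIR.
sig = np.random.randn(fs)  # placeholder test signal
spk_signals = np.stack([fftconvolve(sig, rirs[k]) for k in range(n_spk)])
```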
>> 
> 
> Am I the only one to notice that these "original scenes" look highly synthetic?
> 
> Maybe good for DirAC encoding/decoding, but a natural recording this is not...
> 
> BR
> 
> Stefan
> 
> P.S.: (Richard Lee )
> 
>> Some good examples of 'natural' soundfield recordings with loadsa stuff
>> happening from all round are Paul Doornbusch's Hampi, JH Roy's schoolyard &
>> John Leonard's Aran music.
>> 
> 
> --------------------------------------------------------------------------
> 
> 
>> Then we generate our recordings from that reference, either by encoding 
>> directly to ambisonic signals, by simulating a microphone array recording, 
>> or by putting a Soundfield or other microphone at the listening spot and 
>> re-recording the playback. Which of these we use has depended on the study.
>> 
>> Finally the recordings are processed, and decoded back to the loudspeakers, 
>> usually to a subset of the full setup (e.g. horizontal, discrete surround, 
>> small 3D setup), or even to the full setup. That allows us to switch 
>> playback between the reference and the method.
>> 
>> The tests have usually been MUSHRA-style, where the listeners are asked to 
>> judge the perceived distance from the reference of various randomized 
>> playback methods (including a hidden reference and a low-quality anchor, 
>> used to normalize the perceptual scale for each subject). The criteria are 
>> a combination of timbral distance/colouration, spatial distance, and 
>> artifacts, if any.
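[Editor's illustration: the per-subject normalization via hidden reference and anchor can be sketched as below. This is a minimal sketch, not the authors' test code; the condition names and rating values are made up.]

```python
# Remap each listener's scale so their hidden-reference rating lands at 100
# and their low-quality-anchor rating at 0, making scores comparable
# across subjects.
def normalize_ratings(ratings, ref="hidden_ref", anchor="anchor"):
    lo, hi = ratings[anchor], ratings[ref]
    return {cond: 100.0 * (score - lo) / (hi - lo)
            for cond, score in ratings.items()}

# One hypothetical subject's raw ratings:
subject = {"hidden_ref": 95, "anchor": 15, "method_A": 75, "method_B": 55}
norm = normalize_ratings(subject)
```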
>> 
>> I’ve left out various details from the above, but this is the general idea. 
>> Some publications that have used this approach are:
>> 
>> 
>> Vilkamo, J., Lokki, T., & Pulkki, V. (2009). Directional Audio Coding: 
>> Virtual Microphone-Based Synthesis and Subjective Evaluation. Journal of the 
>> Audio Engineering Society, 57(9), 709–724.
>> 
>> Politis, A., Vilkamo, J., & Pulkki, V. (2015). Sector-Based Parametric Sound 
>> Field Reproduction in the Spherical Harmonic Domain. IEEE Journal of 
>> Selected Topics in Signal Processing, 9(5), 852–866.
>> 
>> Politis, A., Laitinen, M.-V., Ahonen, J., & Pulkki, V. (2015). Parametric 
>> Spatial Audio Processing of Spaced Microphone Array Recordings for 
>> Multichannel Reproduction. Journal of the Audio Engineering Society, 63(4), 
>> 216–227.
>> 
>> Vilkamo, J., & Pulkki, V. (2014). Adaptive Optimization of Interchannel 
>> Coherence. Journal of the Audio Engineering Society, 62(12), 861–869.
>> 
>> Getting the listening test samples and generating recordings or virtual 
>> recordings from the references would be a lot of work for the time being.
>> 
>> What is easier, and what I can definitely do, is to process one or some of 
>> the recordings you mentioned for your speaker setup and send you the 
>> results for listening. There is no reference in this case, but you can 
>> compare against your preferred decoding method. And it would be interesting 
>> for me to hear your feedback too.
>> 
>> Best regards,
>> Archontis
>> 
>> On 05 Jul 2016, at 09:32, Richard Lee <rica...@justnet.com.au> wrote:
>> 
>> Can you give us more detail about these tests and perhaps put some of these
>> natural recordings on ambisonia.com?
>> 
>> The type of soundfield microphone used .. and particularly the accuracy of
>> its calibration ... makes a HUGE difference to the 'naturalness' of a
>> soundfield recording.
>> 
>> Some good examples of 'natural' soundfield recordings with loadsa stuff
>> happening from all round are Paul Doornbusch's Hampi, JH Roy's schoolyard &
>> John Leonard's Aran music.  Musical examples include John Leonard's Orfeo
>> Trio, Paul Hodges "It was a lover and his lass" and Aaron Heller's (AJH)
>> "Pulcinella".  The latter has individual soloists popping up in the
>> soundfield .. not pasted on, but in a very natural and delicious fashion
>> ... as Stravinsky intended.
>> 
>> Also, in my experience, and that doesn't seem to be a very popular view
>> yet in the ambisonic community, these parametric methods do not only upsample
>> or sharpen the image compared to direct first-order decoding, but they
>> actually reproduce the natural recording in a way that is closer
>> perceptually to how the original sounded, both spatially and in timbre.
>> 
>> Or at least that's what our listening tests have shown in a number of
>> cases and recordings. And the directional sharpening is one effect, but
>> also the higher spatial decorrelation that they achieve (or lower
>> inter-aural coherence) in reverberant recordings is equally important.
>> 
> 
> 
> _______________________________________________
> Sursound mailing list
> Sursound@music.vt.edu
> https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit 
> account or options, view archives and so on.
