Politis Archontis wrote:


We start by setting up a large, dense 3D loudspeaker setup in a fully anechoic 
chamber (usually between 25 and 35 speakers at a distance of ~2.5 m), so that 
there is no additional room effect at reproduction. Then we decide on the 
composition of the sound scene (e.g. band, speakers, environmental sources), 
the sources' directions of arrival, and the specifications of the surrounding 
room. We then generate room impulse responses (RIRs) with a physical room 
simulator for the specified room and source positions, ending up with one RIR 
per loudspeaker per source in the scene. Convolving these with our test 
signals and combining the results, we end up with an auralization of the 
intended scene. This part uses no spatial sound method at all, not even 
panning: if a reflection falls between loudspeakers, it is quantized to the 
closest one. The final loudspeaker signals we consider the reference case 
(after listening to them and checking that they sound OK).
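
As an illustration of that rendering step, here is a minimal sketch (assuming
NumPy/SciPy; the array layout and function names are mine, not from any
published code):

    import numpy as np
    from scipy.signal import fftconvolve

    def render_reference(rirs, sigs):
        """Convolve each source signal with its per-speaker RIR and sum.

        rirs[s][k] is the simulated RIR from source s to loudspeaker k,
        sigs[s] is the dry test signal for source s.
        """
        n_spk = len(rirs[0])
        n_out = max(len(sig) + len(rirs[s][k]) - 1
                    for s, sig in enumerate(sigs) for k in range(n_spk))
        out = np.zeros((n_spk, n_out))           # one row per loudspeaker
        for s, sig in enumerate(sigs):
            for k in range(n_spk):
                y = fftconvolve(sig, rirs[s][k])
                out[k, :len(y)] += y
        return out

    def nearest_speaker(refl_dir, spk_dirs):
        # "Quantized to the closest one": choose the loudspeaker whose unit
        # direction vector is closest (largest dot product) to the
        # reflection's direction of arrival
        return int(np.argmax(spk_dirs @ refl_dir))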

Am I the only one to notice that these "original scenes" look highly synthetic?

Maybe good for DirAC encoding/decoding, but a natural recording this is not...

BR

Stefan

P.S. (quoting Richard Lee):

Some good examples of 'natural' soundfield recordings with loadsa stuff
happening from all round are Paul Doornbusch's Hampi, JH Roy's schoolyard &
John Leonard's Aran music.


--------------------------------------------------------------------------


Then we generate our recordings from that reference, either by encoding 
directly to ambisonic signals, by simulating a microphone array recording, or 
by putting a Soundfield or other microphone at the listening spot and 
re-recording the playback. Which of these we use has depended on the study.
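
The "encoding directly to ambisonic signals" option is, at first order, just
the classic B-format equations. A sketch, assuming the traditional FuMa
convention (W attenuated by 3 dB, azimuth/elevation in radians; the function
name is mine):

    import numpy as np

    def encode_bformat(sig, azi, ele):
        # First-order B-format of a plane wave arriving from (azi, ele)
        w = sig / np.sqrt(2.0)                  # omni channel, -3 dB
        x = sig * np.cos(azi) * np.cos(ele)
        y = sig * np.sin(azi) * np.cos(ele)
        z = sig * np.sin(ele)
        return np.stack([w, x, y, z])           # shape (4, n_samples)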

Finally, the recordings are processed and decoded back to the loudspeakers, 
usually to a subset of the full setup (e.g. horizontal-only, discrete 
surround, a small 3D setup), or sometimes to the full setup itself. That 
allows us to switch playback between the reference and the method under test.
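
The parametric processing itself is the method under test, but as a point of
comparison a plain non-parametric first-order decode to the chosen loudspeaker
subset can be a single mode-matching (pseudoinverse) matrix. A sketch under
the same FuMa convention as above (again my own naming, not their code):

    import numpy as np

    def decode_bformat(b, spk_azi, spk_ele):
        # Re-encode each loudspeaker direction into B-format, then invert
        spk_azi, spk_ele = np.asarray(spk_azi), np.asarray(spk_ele)
        Y = np.stack([np.full(spk_azi.shape, 1.0 / np.sqrt(2.0)),
                      np.cos(spk_azi) * np.cos(spk_ele),
                      np.sin(spk_azi) * np.cos(spk_ele),
                      np.sin(spk_ele)])         # (4, n_speakers)
        D = np.linalg.pinv(Y)                   # mode-matching decoder
        return D @ b                            # (n_speakers, n_samples)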

The tests have usually been MUSHRA-style, where the listeners are asked to 
judge the perceived distance of various randomized playback methods from the 
reference (including a hidden reference and a low-quality anchor, used to 
normalize the perceptual scale for each subject). The criteria are a 
combination of timbral distance/colouration, spatial distance, and artifacts, 
if any.
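
The post doesn't spell out the normalization step, but one common way to use
the hidden reference and anchor for this is a per-subject affine rescaling,
for example:

    import numpy as np

    def normalize_scores(scores, anchor_score, ref_score):
        # Map this subject's anchor rating to 0 and hidden-reference
        # rating to 100, making scales comparable across subjects
        scores = np.asarray(scores, dtype=float)
        return 100.0 * (scores - anchor_score) / (ref_score - anchor_score)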

I’ve left out various details from the above, but this is the general idea. 
Some publications that have used this approach are:


Vilkamo, J., Lokki, T., & Pulkki, V. (2009). Directional Audio Coding: Virtual 
Microphone-Based Synthesis and Subjective Evaluation. Journal of the Audio 
Engineering Society, 57(9), 709–724.

Politis, A., Vilkamo, J., & Pulkki, V. (2015). Sector-Based Parametric Sound 
Field Reproduction in the Spherical Harmonic Domain. IEEE Journal of Selected 
Topics in Signal Processing, 9(5), 852–866.

Politis, A., Laitinen, M.-V., Ahonen, J., & Pulkki, V. (2015). Parametric 
Spatial Audio Processing of Spaced Microphone Array Recordings for 
Multichannel Reproduction. Journal of the Audio Engineering Society, 63(4), 
216–227.

Vilkamo, J., & Pulkki, V. (2014). Adaptive Optimization of Interchannel 
Coherence. Journal of the Audio Engineering Society, 62(12), 861–869.

Getting the listening test samples and generating recordings or virtual 
recordings from the references would be a lot of work right now.

What is easier, and I can definitely do, is process one or some of the 
recordings you mentioned for your speaker setup and send you the results for 
listening. There is no reference in this case, but you can compare against 
your preferred decoding method. And it would be interesting for me to hear 
your feedback too.

Best regards,
Archontis

On 05 Jul 2016, at 09:32, Richard Lee <rica...@justnet.com.au> wrote:

Can you give us more detail about these tests and perhaps put some of these
natural recordings on ambisonia.com?

The type of soundfield microphone used .. and particularly the accuracy of
its calibration ... makes a HUGE difference to the 'naturalness' of a
soundfield recording.

Some good examples of 'natural' soundfield recordings with loadsa stuff
happening from all round are Paul Doornbusch's Hampi, JH Roy's schoolyard &
John Leonard's Aran music.  Musical examples include John Leonard's Orfeo
Trio, Paul Hodges' "It was a lover and his lass" and Aaron Heller's (AJH)
"Pulcinella".  The latter has individual soloists popping up in the
soundfield .. not pasted on, but in a very natural and delicious fashion
... as Stravinsky intended.

Also, in my experience, and that doesn't seem to be a very popular view
yet in the ambisonic community, these parametric methods do not only
upsample or sharpen the image compared to direct first-order decoding;
they actually reproduce the natural recording in a way that is
perceptually closer to how the original sounded, both spatially and in
timbre.

Or at least that's what our listening tests have shown in a number of
cases and recordings. The directional sharpening is one effect, but the
higher spatial decorrelation that they achieve (or lower inter-aural
coherence) in reverberant recordings is equally important.
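
That decorrelation claim is easy to probe numerically, e.g. with scipy's
magnitude-squared coherence estimate between two reproduced channels. A
self-contained toy example (synthetic noise standing in for real decoder
outputs):

    import numpy as np
    from scipy.signal import coherence

    rng = np.random.default_rng(0)
    n = 5 * 48000                                # 5 s at 48 kHz
    common = rng.standard_normal(n)              # shared, correlated part
    left = common + 0.5 * rng.standard_normal(n)
    right = common + 0.5 * rng.standard_normal(n)

    # Coherence per frequency band: values near 1 mean the channels are
    # highly correlated, i.e. little spatial decorrelation
    f, cxy = coherence(left, right, fs=48000, nperseg=4096)
    print(f"mean coherence: {cxy.mean():.2f}")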

