This is kind of in line with what I said about deciding what your noise characteristics are and taking them into account. In this case you feel that the "noise" is mostly very small but occasionally huge. If that's the case then the correct thing to do is to toss that measurement out (with patching things up left as another step).

I should be working so I may be telling you to do what you're already doing, but one approach to take would be to filter the data, probably with a filter of limited horizons (i.e. a linear FIR filter or a median filter), then compare the difference between the filtered and measured data point-by-point. Any place where the difference is larger than a threshold, call that point bad.

In Scilab-ish pseudo-code:

x = independent axis (your vertical axis, I think)
y = dependent axis

yf = filter(y) // insert correct filtering algorithm here

// From here on it could be real Scilab code
goodix = find(abs(y - yf) < threshold);

// This assumes no interpolation to replace bad points
x_good = x(goodix); // Scilab indexes with parentheses, not brackets
y_good = y(goodix);

You may need to use log(y) for this to work (i.e., use yf = filter(log(y))) -- I noticed that in the graph you posted the horizontal axis was in logarithms. If the error is always in one direction you may want to do your comparison without the "abs" operator -- use your judgment in this regard.
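
As a rough, runnable version of the pseudo-code above: the sketch below uses a 3-point running median in place of "filter(y)". The sample data, the window half-width k and the threshold value are all made up for illustration, so treat it as a starting point rather than the right settings for your data.

```scilab
// Hypothetical sample data: a smooth rising curve with two injected spikes
x = (1:20)';
y = 0.1 * x;
y(7)  = y(7)  + 5;   // injected outlier (illustration only)
y(15) = y(15) - 4;   // injected outlier (illustration only)

// 3-point running median, standing in for "filter(y)"
k = 1;                       // window half-width (assumed)
n = size(y, "*");
yf = zeros(n, 1);
for j = 1:n
    j1 = max(1, j - k);      // clamp the window at the ends
    j2 = min(n, j + k);
    yf(j) = median(y(j1:j2));
end

// Keep only the points that stay close to the filtered curve
threshold = 1;               // assumed value; tune for the real data
goodix = find(abs(y - yf) < threshold);
x_good = x(goodix);
y_good = y(goodix);
```

With this made-up data the two spiked points (indices 7 and 15) are the ones rejected. For the log variant, filter and compare log(y) instead of y, which only works if every y is positive.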

On wording your letter -- I suggest just a straightforward description of what you're seeing, and a request for information on whether they've seen it before. It may well be that it's either something that happens in some corner cases and you're lucky, or that you only _think_ that you're following directions. Speaking as a guy who spends a lot of time designing circuits and writing embedded software, it's also always possible that they've made some nifty upgrade to something and in the process introduced a bug -- in that case, unless they're stupidly arrogant, they'd like to hear from a sympathetic, cooperative user to help them clear things up.

On 2016-04-04 07:45, scilab.20.browse...@xoxy.net wrote:
Rafael/Stephane/Tom,

The problem with using a median filter -- and actually any continuous
filter -- is that it implies that the median value of any n-group of
adjacent values is "more reliable" than the actual value *for every
value in the dataset*. And I'm really not convinced that is true for
this data.

In other words, continuous filtering can adjust all the values in the
dataset, rather than just adjusting or rejecting the anomalous ones.
One (large) erroneous data point early in the dataset would impose an
influence upon the rest of the entire dataset causing a subtle shift
in one direction or the other. If there are multiple erroneous values
that all tend to be in the same direction -- as appears to be the case
with these data -- then that shift accumulates through the dataset.

And as an engineer, that feels wrong. If you're taking a set of
measurements and some external influence messes with one of them -- a
fly blocks your sensor -- you reject that single data point; not
spread some percentage of it through the rest of your readings.

I'm going to put in a request to the manufacturer of the equipment
that produces this data, to request an explanation of the cause of the
discontinuities; in the hope that might shed some light on the best
way to deal with them. (With luck they'll have some standard mechanism
for doing so.)

(I've been trying to word the request all weekend, but it's difficult
to phrase it correctly.  These are the pre-eminent people in their
field; they don't know me, and I don't have an introduction; and their
equipment defines the standard for these types of measurements. It is
extremely difficult to formulate the request such that it does not
imply some shortcoming in their equipment or techniques.)

The data is magnetic field intensity vs field strength for samples of
amorphous metal. The measurement involves ramping the surrounding
field with one set of coils, and measuring the field strength induced
in the material with another set of coils. The samples have
hysteresis; the coils have hysteresis; the ambient surroundings can
influence the readings. The equipment goes to great pains to adjust the speed of
ramping and sampling to try and eliminate discontinuities due to
hysteresis and eddy current effects.

I believe (at this point) that the discontinuities are due to these
effects "settling out"; and the right thing to do is to essentially
ignore them. My problem is how to go about that.

I've come up with something. (It almost certainly can be written in a
less prosaic way; but I'm still finding my feet in SciLab):

    plot2d(  ptype, h*1000, b, style = [ rgb( i ) ] );
    e = gce(); e.children.mark_style = 2;

    h1 = [h(1)]; b1 = [b(1)];
    for n=2:size(h,'r')
        if( (b(n) - b(n-1)) / (h(n) - h(n-1) + %eps) > 0 ) then
             h1 = [ h1, h(n) ]; b1 = [ b1, b(n) ];
        end
    end
    plot2d( ptype, h1*1000, b1, style = [ rgb( i + 1 ) ] );

    h = h1'; b = b1';
    h1 = [h(1)]; b1 = [b(1)];
    for n=2:size(h,'r')
        if( (b(n) - b(n-1)) / (h(n) - h(n-1) + %eps) > 0 ) then
             h1 = [ h1, h(n) ]; b1 = [ b1, b(n) ];
        end
    end
    plot2d( ptype, h1*1000, b1, style = [ rgb( i + 2 ) ] );

See the attached png. The black Xs are the raw data.
The red is the results of the first pass.
The green is the results of the second pass.
The purple are hand-drawn "what I think I'd like" lines.

What I like about this is that it only adjusts (currently omits; but it
could interpolate replacements) points that fall outside the criteria.
As you said of the median filter; it doesn't guarantee monotonicity
after one pass (or even 2), but it only makes changes where they are
strictly required, leaving most of the raw data intact.
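
A minimal sketch of the "interpolate replacements" option, assuming column vectors h and b and linear interpolation between the surviving neighbours. The sample data is invented, and linear interpolation is only one possible choice of replacement:

```scilab
// Hypothetical data: h is the applied field, b the measured response
h = [0 1 2 3 4 5 6 7]';
b = [0 1 3 2 1.5 4 5 6]';

// One pass of the slope test: keep the first point, plus every point
// reached by a positive slope from its predecessor
keep = [%t; (diff(b) ./ (diff(h) + %eps)) > 0];
h_kept = h(keep);
b_kept = b(keep);

// Replace the rejected points by linear interpolation between kept ones
b_filled = b;
b_filled(~keep) = interp1(h_kept, b_kept, h(~keep), "linear");
```

This relies on the first and last points surviving the test, so that every rejected h lies inside the range of the kept ones.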

(Note: At this stage I'm not saying that is the right thing to do;
just that it seems to be :)

I'm not entirely happy with the results:

a) I think the hand-drawn purple lines are a better representation of
the replaced data; but I can't divine the criteria to produce those.
b) I've hard coded two passes for this particular dataset; but I need
to repeat until no negative slopes remain; and I haven't worked out
how to do that yet.
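
One way to sketch the "repeat until no negative slopes remain" loop for point b), assuming h and b are column vectors and using the same slope criterion as the two hard-coded passes. The sample data is invented:

```scilab
// Hypothetical data with a local dip after the third point
h = [0 1 2 3 4 5 6 7]';
b = [0 1 3 2 1.5 4 5 6]';

// Repeat the single-pass slope test until a pass removes nothing
removed = %t;
while removed
    n = size(h, "*");
    slope = diff(b) ./ (diff(h) + %eps);  // slope from each point to the next
    keep = [%t; slope > 0];               // always keep the first point
    h = h(keep);
    b = b(keep);
    removed = (size(h, "*") < n);         // stop once a pass drops no points
end
```

Each pass either drops at least one point or ends the loop, so it always terminates. It inherits the original criterion's behaviour of keeping the high point before a dip and discarding the points inside the dip.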

Comments; rebuttals; referrals to the abuse of SciLab/math police;
along with better implementations of what I have; or better criteria
for solving my problem all actively sought.

Thanks, Buk.



-----Original Message-----
From: scilab.browseruk.b28bd2e902.jrafaelbguerra#hotmail....@ob.0sg.net
Sent: Mon, 4 Apr 2016 14:58:47 +0200
To: users@lists.scilab.org
Subject: Re: [Scilab-users] "Smoothing" very localised discontinuities in
curves.

If your data is not recorded in real-time, you can sort it (along the
x-axis), and this does not imply that the "y(x) function" will become
monotonic. See below.

As suggested by Stephane Mottelet, see one 3-point median filter
solution below, applied to data similar to yours:


M = [1.0  -0.2;
        1.4   0.0;
        2.1   0.2;
        1.7   0.45;
        2.45  0.5;
        2.95  0.6;
        2.5   0.75;
        3.0   0.8;
        3.3   1.2];
x0 = M(:,1);
y0 = M(:,2);
clf();
plot2d(x0,[y0 y0],style=[5 -9]);
[x,ix] = gsort(x0,'g','i'); // sorting input x-axis
y = y0(ix);
k = 1; // median filter half-length
n = length(x);
x(2:n+1)=x; y(2:n+1)=y;
x(1)=x(2); y(1)=y(2);
x(n+2)=x(n+1); y(n+2)=y(n+1);
n = length(x);
for j = 1:n
    j1 = max(1,j-k);
    j2 = min(n,j+k);
    ym(j) = median(y(j1:j2));
end
// ym is shifted by 5e-3 for display purposes
plot2d(x,ym+5e-3,style=[3],leg="3-point median filtering@");



This gets rid of obvious outliers but does not guarantee a monotonic
output (the same goes for the more robust LOWESS technique, which can
be googled).

Rafael



_______________________________________________
users mailing list
users@lists.scilab.org
http://lists.scilab.org/mailman/listinfo/users
