It might be worth trying with --fair-sched=yes, just in case what you see
is due to the unfairness of thread scheduling.
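For reference, a full invocation might look like this (the tool and program name are placeholders, not taken from this thread):

```shell
# Hypothetical command line: substitute your own binary and tool.
# --fair-sched=yes hands Valgrind's serialization lock to waiting threads
# in FIFO order, so no single thread can monopolize execution.
valgrind --tool=memcheck --fair-sched=yes ./your_program
```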
Philippe
On Fri, 2018-01-26 at 06:57 +0000, Wuweijia wrote:
> Hi:
>
> How large is 'm_nStep'? [Are you sure?]
>
> The source is below; the fields are all plain integers. Does it matter which values they hold?
> class CDynamicScheduling
> {
> public:
>     static const int m_nDefaultStepUnit;
>     static const int m_nDefaultStepFactor;
>
> private:
>     int m_nBegin;
>     int m_nEnd;
>     int m_nStep;
> #if defined(_MSC_VER)
>     std::atomic<int> m_nCurrent;
> #else
>     int m_nCurrent;
> #endif
>
>
> I hope the actual source contains a comment such as:
> Compute pDst[] as the rounded average of non-overlapping 2x2 blocks of
> pixels in pSrc[].
>
> Yes, you are right. It just computes the rounded average of each 2x2 block.
>
> Earlier I showed you only the AArch64 NEON code; below is the same function in its x86 (plain C) implementation.
>
> UINT16 *pDstL;
> UINT16 *pSrcL;
> INT32 dstWDiv2 = srcW >> 1;
> // INT32 dstHDiv2 = srcH >> 1;
> INT32 x, y;
> INT32 posDst,posSrc;
>
> pSrcL = pSrc;
> pDstL = pDst;
>
> int beginY, endY;
> while (pDS->GetProcLoop(beginY, endY))
> {
>     // for (y = 0; y < dstHDiv2; y++)
>     for (y = beginY; y < endY; y++)
>     {
>         for (x = 0; x < dstWDiv2; x++)
>         {
>             posDst = y * dstStride + x;
>             posSrc = (y << 1) * srcStride + (x << 1);
>             pDstL[posDst] = (pSrcL[posSrc] + pSrcL[posSrc + 1] +
>                 pSrcL[posSrc + srcStride] + pSrcL[posSrc + srcStride + 1] + 2) >> 2;
>         }
>     }
> }
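The inner expression is a rounded average: adding 2 (half the divisor) before the shift rounds to nearest instead of truncating. Isolated as a standalone helper (my sketch, not code from the thread), it behaves like this:

```cpp
#include <cstdint>

// Rounded average of a 2x2 block, as in the loop body above.
// The +2 is half the divisor (4), so the >>2 rounds to nearest.
uint16_t average2x2(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    // Operands promote to int, so the sum cannot overflow.
    return static_cast<uint16_t>((a + b + c + d + 2) >> 2);
}
```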
>
> pSrc is the image buffer, about 11M (width 3968, height 2976, srcStride 3968).
> Four threads compute the averages of the 2x2 blocks: pSrc is divided into many
> small pieces, and each thread computes the averages for some of the pieces.
> The split is not fixed by design but depends on how the threads are scheduled:
> a thread that holds the CPU longer computes more pieces, and one that holds
> it less computes fewer.
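The scheme described above can be sketched with std::atomic on every platform (a minimal re-creation under my own names, not the actual class; the real code uses __sync_fetch_and_add on a plain int outside MSVC):

```cpp
#include <algorithm>
#include <atomic>

// Minimal sketch of the dynamic scheduler described in the thread.
// Threads call GetProcLoop() until it returns false; each call claims
// the next [begin, endPlusOne) range of rows via one atomic increment.
class DynamicScheduler
{
public:
    DynamicScheduler(int begin, int end, int step)
        : m_end(end), m_step(step), m_current(begin) {}

    bool GetProcLoop(int& begin, int& endPlusOne)
    {
        int curr = m_current.fetch_add(m_step, std::memory_order_relaxed);
        if (curr > m_end)
            return false;
        begin = curr;
        // Clamp the last chunk; the code in the thread returns curr + m_nStep
        // unclamped, so its callers must tolerate an over-long final range.
        endPlusOne = std::min(curr + m_step, m_end + 1);
        return true;
    }

private:
    int m_end;
    int m_step;
    std::atomic<int> m_current;
};
```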
>
>
> BR
> Owen
>
> -----Original Message-----
> From: John Reiser [mailto:[email protected]]
> Sent: January 26, 2018 12:44
> To: [email protected]
> Subject: Re: [Valgrind-users] Re: Re: Re: [Help] Valgrind sometime run the program
> very slowly sometimes , it last at least one hour. can you show me why or
> some way to analyze it?
>
> On 01/25/2018 15:37 UTC, Wuweijia wrote:
>
> > Function1:
> > bool CDynamicScheduling::GetProcLoop(
> >     int& nBegin,
> >     int& nEndPlusOne)
> > {
> >     int curr = __sync_fetch_and_add(&m_nCurrent, m_nStep);
>
> How large is 'm_nStep'? [Are you sure?] The overhead expense of switching
> threads in valgrind would be reduced by making m_nStep as large as possible.
> It looks like the code in Function2 would produce the same values regardless.
>
>
> >     if (curr > m_nEnd)
> >     {
> >         return false;
> >     }
> >
> >     nBegin = curr;
> >     int limit = m_nEnd + 1;
>
> Local variable 'limit' is unused. By itself this is unimportant, but it
> might be a clue to something that is not shown here.
>
> >     nEndPlusOne = curr + m_nStep;
> >     return true;
> > }
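To make the step-size suggestion concrete: a common heuristic (my assumption, not something from this thread) is to size the step so each thread claims only a few chunks, which keeps the atomic-counter traffic, and hence Valgrind's thread-switch overhead, low while still balancing load:

```cpp
#include <algorithm>

// Hypothetical step-size helper: aim for ~4 chunks per thread so the
// shared counter is touched rarely but slow threads can still shed work.
int chooseStep(int totalRows, int numThreads)
{
    const int chunksPerThread = 4;  // tuning knob (an assumption)
    return std::max(totalRows / (numThreads * chunksPerThread), 1);
}
```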
> >
> >
> > Function2:
> > ....
> > int beginY, endY;
> > while (pDS->GetProcLoop(beginY, endY)) {
> >     for (y = beginY; y < endY; y++) {
> >         for (x = 0; x < dstWDiv2 - 7; x += 8) {
> >             vtmp0 = vld2q_u16(&pSrc[(y << 1) * srcStride + (x << 1)]);
> >             vtmp1 = vld2q_u16(&pSrc[((y << 1) + 1) * srcStride + (x << 1)]);
>
> I hope the actual source contains a comment such as:
> Compute pDst[] as the rounded average of non-overlapping 2x2 blocks of
> pixels in pSrc[].
>
> >             vst1q_u16(&pDst[y * dstStride + x],
> >                 (vtmp0.val[0] + vtmp0.val[1] + vtmp1.val[0] + vtmp1.val[1] +
> >                  vdupq_n_u16(2)) >> vdupq_n_u16(2));
> >         }
> >         for (; x < dstWDiv2; x++) {
> >             pDst[y * dstStride + x] = (pSrc[(y << 1) * srcStride + (x << 1)] +
> >                 pSrc[(y << 1) * srcStride + (x << 1) + 1] +
> >                 pSrc[((y << 1) + 1) * srcStride + (x << 1)] +
> >                 pSrc[((y << 1) + 1) * srcStride + (x << 1) + 1] + 2) >> 2;
> >         }
> >     }
> > }
> >
> > return;
> > }
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most engaging tech
> sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Valgrind-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/valgrind-users