It might be worth trying with --fair-sched=yes, just in case what you see is due to the unfairness of thread scheduling.
Philippe

On Fri, 2018-01-26 at 06:57 +0000, Wuweijia wrote:
> Hi:
>
> How large is 'm_nStep'?  [Are you sure?]
>
> The source is below; all of them are integers. Which value do you care about?
>
> class CDynamicScheduling
> {
> public:
>     static const int m_nDefaultStepUnit;
>     static const int m_nDefaultStepFactor;
>
> private:
>     int m_nBegin;
>     int m_nEnd;
>     int m_nStep;
> #if defined(_MSC_VER)
>     std::atomic<int> m_nCurrent;
> #else
>     int m_nCurrent;
> #endif
>
>
> I hope the actual source contains a comment such as:
>     Compute pDst[] as the rounded average of non-overlapping 2x2 blocks of
>     pixels in pSrc[].
>
> Yes, you are right. It just computes the average of 2 * 2 blocks.
>
> I showed you only the aarch64 NEON code.
> This is the same function, but implemented for x86:
>
>     UINT16 *pDstL;
>     UINT16 *pSrcL;
>     INT32 dstWDiv2 = srcW >> 1;
>     // INT32 dstHDiv2 = srcH >> 1;
>     INT32 x, y;
>     INT32 posDst, posSrc;
>
>     pSrcL = pSrc;
>     pDstL = pDst;
>
>     int beginY, endY;
>     while (pDS->GetProcLoop(beginY, endY))
>     {
>         // for (y = 0; y < dstHDiv2; y++)
>         for (y = beginY; y < endY; y++)
>         {
>             for (x = 0; x < dstWDiv2; x++)
>             {
>                 posDst = y*dstStride + x;
>                 posSrc = (y<<1)*srcStride + (x<<1);
>                 pDstL[posDst] = (pSrcL[posSrc] + pSrcL[posSrc + 1]
>                     + pSrcL[posSrc+srcStride] + pSrcL[posSrc+srcStride + 1] + 2) >> 2;
>             }
>         }
>     }
>
> pSrc is the image buffer, about 11M. Width: 3968, Height: 2976,
> srcStride: 3968.
> Four threads compute the average of the 2 * 2 blocks.
> pSrc is divided into many small pieces, and each thread computes the average
> for the pieces it takes. The split is not fixed by design; it depends on the
> status of the running threads. Threads that hold the CPU compute more pieces,
> and threads that do not hold the CPU compute fewer pieces.
>
>
> BR
> Owen
>
> -----Original Message-----
> From: John Reiser [mailto:jrei...@bitwagon.com]
> Sent: January 26, 2018 12:44
> To: valgrind-users@lists.sourceforge.net
> Subject: Re: [Valgrind-users] Re: Re: Re: [Help] Valgrind sometime run the program
> very slowly sometimes , it last at least one hour. can you show me why or
> some way to analyze it?
>
> On 01/25/2018 15:37 UTC, Wuweijia wrote:
>
> > Function1:
> > bool CDynamicScheduling::GetProcLoop(
> >     int& nBegin,
> >     int& nEndPlusOne)
> > {
> >     int curr = __sync_fetch_and_add(&m_nCurrent, m_nStep);
>
> How large is 'm_nStep'?  [Are you sure?]  The overhead expense of switching
> threads in valgrind would be reduced by making m_nStep as large as possible.
> It looks like the code in Function2 would produce the same values regardless.
>
> >
> >     if (curr > m_nEnd)
> >     {
> >         return false;
> >     }
> >
> >     nBegin = curr;
> >     int limit = m_nEnd + 1;
>
> Local variable 'limit' is unused.  By itself this is unimportant, but it
> might be a clue to something that is not shown here.
>
> >     nEndPlusOne = curr + m_nStep;
> >     return true;
> > }
> >
> >
> > Function2:
> > ....
> >     int beginY, endY;
> >     while (pDS->GetProcLoop(beginY, endY)){
> >         for (y = beginY; y < endY; y++){
> >             for(x = 0; x < dstWDiv2-7; x+=8){
> >                 vtmp0 = vld2q_u16(&pSrc[(y<<1)*srcStride+(x<<1)]);
> >                 vtmp1 = vld2q_u16(&pSrc[((y<<1)+1)*srcStride+(x<<1)]);
>
> I hope the actual source contains a comment such as:
>     Compute pDst[] as the rounded average of non-overlapping 2x2 blocks of
>     pixels in pSrc[].
>
> >                 vst1q_u16(&pDst[y*dstStride+x], (vtmp0.val[0] + vtmp0.val[1] +
> >                     vtmp1.val[0] + vtmp1.val[1] + vdupq_n_u16(2)) >> vdupq_n_u16(2));
> >             }
> >             for(; x < dstWDiv2; x++){
> >                 pDst[y*dstStride+x] = (pSrc[(y<<1)*srcStride+(x<<1)] +
> >                     pSrc[(y<<1)*srcStride+(x<<1)+1] + pSrc[((y<<1)+1)*srcStride+(x<<1)] +
> >                     pSrc[((y<<1)+1)*srcStride+((x<<1)+1)] + 2) >> 2;
> >             }
> >         }
> >     }
> >
> >     return;
> > }
> >
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most engaging tech
> sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Valgrind-users mailing list
> Valgrind-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
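
For readers following the thread, here is a minimal, self-contained sketch of the chunked scheduling pattern discussed above, written with std::atomic instead of __sync_fetch_and_add. It is not the poster's actual code: ChunkScheduler, averageRows and downscale are hypothetical names, the end bound is treated as exclusive, and the upper bound of each chunk is clamped to that end. The clamp is only an assumption about what the unused 'limit' variable might have been intended for.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for CDynamicScheduling: hands out row ranges
// [begin, endPlusOne) taken from the half-open interval [0, end).
class ChunkScheduler {
public:
    ChunkScheduler(int end, int step) : end_(end), step_(step), current_(0) {
        assert(step > 0);
    }

    // Returns false once every row has been handed out. The upper bound is
    // clamped (an assumption, see above) so no caller gets rows past `end`.
    bool next(int& begin, int& endPlusOne) {
        const int curr = current_.fetch_add(step_, std::memory_order_relaxed);
        if (curr >= end_) return false;
        begin = curr;
        endPlusOne = std::min(curr + step_, end_);
        return true;
    }

private:
    const int end_;
    const int step_;
    std::atomic<int> current_;
};

// Computes dst rows [beginY, endY) as the rounded average of non-overlapping
// 2x2 blocks of pixels in src.
void averageRows(const std::vector<std::uint16_t>& src, std::vector<std::uint16_t>& dst,
                 int srcStride, int dstStride, int dstW, int beginY, int endY) {
    for (int y = beginY; y < endY; ++y) {
        for (int x = 0; x < dstW; ++x) {
            const int p = (2 * y) * srcStride + 2 * x;
            const std::uint32_t sum = std::uint32_t(src[p]) + src[p + 1] +
                                      src[p + srcStride] + src[p + srcStride + 1];
            dst[y * dstStride + x] = static_cast<std::uint16_t>((sum + 2) >> 2);
        }
    }
}

// Runs numThreads workers over the image and returns how many chunks each
// worker ended up taking. The distribution depends on thread scheduling.
std::vector<int> downscale(const std::vector<std::uint16_t>& src, std::vector<std::uint16_t>& dst,
                           int srcW, int srcH, int step, int numThreads) {
    const int dstW = srcW / 2;
    const int dstH = srcH / 2;
    dst.assign(static_cast<std::size_t>(dstW) * dstH, 0);

    ChunkScheduler sched(dstH, step);
    std::vector<int> chunksTaken(numThreads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            int beginY = 0, endY = 0;
            while (sched.next(beginY, endY)) {
                averageRows(src, dst, srcW, dstW, dstW, beginY, endY);
                ++chunksTaken[t];  // each worker writes only its own slot
            }
        });
    }
    for (auto& w : workers) w.join();
    return chunksTaken;
}
```

Calling downscale(src, dst, 3968, 2976, step, 4) with different values of step gives a feel for the trade-off raised in the thread: a larger step means fewer atomic increments and fewer hand-offs between threads, while the per-thread counts in the returned vector show how evenly the chunks were spread. Under valgrind the counts may be quite lopsided unless --fair-sched=yes is used.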