@Nirmal, okay, I'll arrange it today.
@Mahesan
Thanks for the suggestion. Yes, 100 must be too high for some cases. I
thought that within 100 iterations it would most probably converge to stable
clusters. That's why I put 100. Yes, for cases like k = 100 it might not be
enough. Thanks, and I'll try with d
@Ashen let's have a code review today, if it's possible.
@Srinath Forgot to mention that I've already given some feedback to Ashen,
on how he could use Spark transformations effectively in his code.
On Tue, Aug 25, 2015 at 4:33 PM, Ashen Weerathunga wrote:
> Okay sure.
>
> On Tue, Aug 25, 2015
Hi Ashen
Thank you for sharing the results.
When I looked at the last column (anomaly data %),
the best value, 99.04%, results from 3 clusters with 100 iterations,
and the worst case, 28.12%, from 100 clusters with 100 iterations.
This would happen as k increases (with a fixed number of iterations)
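
One way to check whether a fixed iteration count is enough is to stop on a convergence tolerance and report how many iterations were actually used. The following is only a minimal sketch in plain Python, not the Spark code from the project; the function name `kmeans`, the parameters `max_iter` and `tol`, and the toy points are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Lloyd's algorithm; returns (centroids, iterations_run, converged)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    for iteration in range(1, max_iter + 1):
        # Assignment step: group each point with its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        new_centroids = []
        for old, group in zip(centroids, groups):
            if group:
                new_centroids.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        # Largest distance any centroid moved in this iteration.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            return centroids, iteration, True
    # Iteration budget exhausted before the centroids stopped moving.
    return centroids, max_iter, False


# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centroids, iterations, converged = kmeans(points, k=2)
print(iterations, converged, sorted(centroids))
```

If `converged` comes back `False` for a large k, the iteration budget was probably too small for that run, which would be consistent with the drop seen as k increases.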
Okay sure.
On Tue, Aug 25, 2015 at 3:55 PM, Nirmal Fernando wrote:
> Sure. @Ashen, can you please arrange one?
>
> On Tue, Aug 25, 2015 at 2:35 PM, Srinath Perera wrote:
>
>> Nirmal, Seshika, shall we do a code review? This code should go into ML
>> after the UI part is done.
>>
>> Thanks
>> Srinat
Sure. @Ashen, can you please arrange one?
On Tue, Aug 25, 2015 at 2:35 PM, Srinath Perera wrote:
> Nirmal, Seshika, shall we do a code review? This code should go into ML
> after the UI part is done.
>
> Thanks
> Srinath
>
> On Tue, Aug 25, 2015 at 2:20 PM, Ashen Weerathunga wrote:
>
>> Hi all,
>>
Nirmal, Seshika, shall we do a code review? This code should go into ML
after the UI part is done.
Thanks
Srinath
On Tue, Aug 25, 2015 at 2:20 PM, Ashen Weerathunga wrote:
> Hi all,
>
> This is the source code of the project.
> https://github.com/ashensw/Spark-KMeans-fraud-detection
>
> Best Regard
Hi all,
This is the source code of the project.
https://github.com/ashensw/Spark-KMeans-fraud-detection
Best Regards,
Ashen
On Tue, Aug 25, 2015 at 2:00 PM, Ashen Weerathunga wrote:
> Thanks all for the suggestions,
>
> There are a few assumptions I have made,
>
>- Clusters are uniform
>
Thanks all for the suggestions,
There are a few assumptions I have made,
- Clusters are uniform
- Fraud data will always be outliers to the normal clusters
- Clusters do not intersect with each other
- I have given the number of iterations as 100. So I assume that 100
iterations wil
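
The second assumption above (fraud data are outliers to the normal clusters) can be sketched as a distance check against the nearest centroid. This is a minimal illustration in plain Python, not the code from the project; the function names, the centroids, the points, and the threshold value are all hypothetical.

```python
import math


def nearest_centroid_distance(point, centroids):
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)


def flag_outliers(points, centroids, threshold):
    """Return the points that lie farther than `threshold` from every centroid."""
    return [p for p in points if nearest_centroid_distance(p, centroids) > threshold]


# Hypothetical centroids of two "normal" clusters and three points to score.
centroids = [(0.5, 0.5), (10.5, 10.5)]
points = [(0.0, 1.0), (10.0, 11.0), (5.0, 5.0)]
outliers = flag_outliers(points, centroids, threshold=2.0)
print(outliers)
```

The threshold is the sensitive choice here: if clusters overlap or are not uniform, a single global threshold may flag normal points or miss anomalous ones.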
Is there any particular reason why you are setting aside 65% of the anomalous
data for the evaluation? Since there is an obvious imbalance between the
numbers of normal and abnormal cases, you will get
greater accuracy at the evaluation because a model tends to produce more
accurate
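
To illustrate why accuracy alone can mislead on imbalanced data, here is a minimal sketch in plain Python. The labels and counts are made up for illustration and are not results from the project; the helper name `evaluate` is also an assumption.

```python
def evaluate(actual, predicted, positive="anomaly"):
    """Return accuracy, precision, and recall for the `positive` label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # guard against no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # guard against no positive cases
    return accuracy, precision, recall


# Made-up example: 95 normal cases, 5 anomalies, and a model that
# labels everything as "normal".
actual = ["normal"] * 95 + ["anomaly"] * 5
predicted = ["normal"] * 100
accuracy, precision, recall = evaluate(actual, predicted)
print(accuracy, precision, recall)
```

In this made-up case the accuracy looks high even though no anomaly is detected, so reporting precision and recall for the anomaly class alongside accuracy gives a fairer picture.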
Hi Ashen,
It would be better if you could add the assumptions you make in this process
(uniform clusters, etc.). It will make the process clearer IMO.
Regards,
CD
On Tue, Aug 25, 2015 at 11:39 AM, Nirmal Fernando wrote:
> Can we see the code too?
>
> On Tue, Aug 25, 2015 at 11:36 AM, Ashen Weer
Can we see the code too?
On Tue, Aug 25, 2015 at 11:36 AM, Ashen Weerathunga wrote:
> Hi all,
>
> I am currently working on a fraud detection project. I was able to cluster
> the KDD Cup 99 network anomaly detection dataset using the Apache Spark k-means
> algorithm. So far I was able to achieve 99% a
Hi all,
I am currently working on a fraud detection project. I was able to cluster
the KDD Cup 99 network anomaly detection dataset using the Apache Spark k-means
algorithm. So far I have been able to achieve a 99% accuracy rate on this
dataset. The steps I have followed during the process are mentioned below.