Hi Jesus!

I appreciate the info on the unicode error. I might have missed it,
but I also asked about the general microtask specifications. Here
was my original inquiry:
> And to clarify, my understanding is that the final result of this task
> is an index of Xen data, with two types: commits and messages.
> Each commit document should contain its original information
> from git, plus the name of the branch it was developed in. And
> should only the mbox messages which appear to be associated
> with a specific commit exist in the final index? Is there some
> key information in messages that is supposed to indicate the
> association of a given commit with a git branch? I would be
> grateful if you could specify the end goal a little more. :D

Yeah, so overall I'm not sure I understand the relationship of
branches to the mailing list messages. Is this to be a simple
string parsing task wherein I should scan the message body
for the word "branch"? (I am guessing not ;P)

I will be happy to get back to developing once I better grasp
the goal! :)

Thanks!

Heather
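
P.S. On the UnicodeEncodeError quoted further down in this thread: a
minimal sketch of one possible workaround, assuming the lone surrogate
'\udca0' comes from Python's "surrogateescape" handling of an
undecodable byte (0xA0) in the mbox, and assuming it is acceptable to
read such bytes as Latin-1. The names clean_surrogates and
clean_document are made up for illustration, and this is not
necessarily what Perceval or GrimoireELK do.

```python
import json


def clean_surrogates(text):
    """Return a copy of text that can be encoded as UTF-8."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # surrogateescape maps an undecodable byte b to U+DC00 + b.
            # Assumption: reading that byte back as Latin-1 is good enough.
            out.append(bytes([code - 0xDC00]).decode("latin-1"))
        elif 0xD800 <= code <= 0xDFFF:
            # Any other lone surrogate has no byte to recover; replace it.
            out.append("\ufffd")
        else:
            out.append(ch)
    return "".join(out)


def clean_document(obj):
    """Recursively clean every string in a JSON-like structure."""
    if isinstance(obj, str):
        return clean_surrogates(obj)
    if isinstance(obj, dict):
        return {
            (clean_surrogates(k) if isinstance(k, str) else k): clean_document(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [clean_document(v) for v in obj]
    return obj


# Made-up message with the same kind of character as in the error above.
message = {"Subject": "patch", "body": "see\udca0below"}
safe = clean_document(message)

# This would raise UnicodeEncodeError on the original message.
payload = json.dumps(safe, ensure_ascii=False).encode("utf-8")
```

Compared with splicing the character out, this keeps the original byte
(here a non-breaking space) instead of dropping it, and the cleaned
document stays a plain str-based structure that can be serialized.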

On Sun, Apr 16, 2017 at 4:23 PM, Jesus M. Gonzalez-Barahona <
j...@bitergia.com> wrote:

> On Thu, 2017-04-13 at 00:47 -0700, Heather Booker wrote:
> > Hi,
> >
> > I submitted an application for this code review dashboard and
> > would love to keep working on the microtask once I get some
> > more info. :)
>
> Great! I answered your message, could you progress with the task?
>
> > I also came up with a general idea of how the project might be
> > split up - any feedback on this would be welcome! I wrote:
> >
> > "As said by Jesus, the big picture of this project will be porting
> > everything behind the current code review dashboard to use
> > Grimoire Lab tools, from the current state of using
> > MetricsGrimoire and custom scripts. I expect this would involve
> > Perceval for analyzing data, and Grimoire Elk may be useful in
> > further stages, or may be too general - this is something I would
> > wish to explore.
> > This project will also involve a migration from SQL to Elasticsearch
> > - because I believe the relevant data is mostly / all available in
> > places online, I am unsure whether this would need to be a direct
> > migration. However, looking at the current SQL setup would be
> > beneficial to understanding the desired format of the Elasticsearch
> > indexes.
> > I would love to dive into this project and have 3 main parts -
> > getting data into ES, turning it into dashboard displays, and then
> > fine tuning and perhaps augmenting the dashboard to improve its
> > usefulness. Getting data into ES may seem simple but I believe that
> > once it needs to be used for the dashboard, many realizations will
> > pop up - thus I’d like to leave maybe 2-3 weeks for that first step,
> > 6-7 weeks for the visualizations (which will include querying the
> > data), and the final 3 weeks for touch ups and improvements."
>
> The plan could be sound, but would need some tweaks, once your skills
> in Python are clear, which could be the main blocker for the first
> stages.
>
> > Does this sound like an accurate summary and reasonable timeline?
> > And I am guessing that from Jesus's involvement with the threads
> > that Jesus would be the mentor, is that correct? :)
>
> Yes, I would be ;-)
>
>         Jesus.
>
> > Thanks!
> >
> > Heather
> >
> >
> > > On Sun, Apr 9, 2017 at 9:50 PM, Heather Booker
> > > <heather.j.booker@gmail.com> wrote:
> > > Hi Jesus,
> > >
> > > While using the Elasticsearch python library
> > > (https://elasticsearch-py.readthedocs.io/en/master/) to add mbox
> > > messages to an index, I would get a UnicodeEncodeError:
> > > "'utf-8' codec can't encode character '\udca0' in position 767:
> > > surrogates not allowed".
> > >
> > > > Investigating in Grimoire elk
> > > > https://github.com/grimoirelab/GrimoireELK/blob/96b00bc682485976104a6825ca63ae08639deacc/grimoire_elk/elk/mbox.py#L200
> > > > seems to show that
> > > perhaps that tool instead uses Latin-1 encoding, but I found that
> > > to then produce a serialization error (their custom error message:
> > > "Unable to serialize %r (type: %s)"). I suppose this is because
> > > now it's bytes; of course, converting back to string after encoding
> > > just cycles back to the first error.
> > >
> > > As somewhat of a Python newbie I don't really know how to tackle
> > > this! My thought atm is to splice the offending character out
> > > of the message.
> > >
> > > > And to clarify, my understanding is that the final result of this
> > > > task is an index of Xen data, with two types: commits and messages.
> > > Each commit document should contain its original information
> > > from git, plus the name of the branch it was developed in. And
> > > should only the mbox messages which appear to be associated
> > > with a specific commit exist in the final index? Is there some
> > > key information in messages that is supposed to indicate the
> > > association of a given commit with a git branch? I would be
> > > grateful if you could specify the end goal a little more. :D
> > >
> > > Thanks so much!
> > >
> > > Heather
> > >
> > >
> > >
> > > > On Sat, Apr 8, 2017 at 10:02 AM, Jesus M. Gonzalez-Barahona
> > > > <jgb@bitergia.com> wrote:
> > > > On Fri, 2017-04-07 at 15:49 -0700, Heather Booker wrote:
> > > > > Hi Jesus,
> > > > >
> > > > > Thanks for your reply!
> > > > >
> > > > > > So about the task, instructions say after analyzing mboxes with
> > > > > > Perceval to "store the resulting raw index in ElasticSearch" -
> > > > > > what does raw index mean?
> > > >
> > > > > In this context, I mean "storing the JSON documents produced by
> > > > > Perceval in an ElasticSearch index, as such". ElasticSearch stores
> > > > > JSON documents, so it is just uploading the output of Perceval to it.
> > > >
> > > > > > In terms of figuring out the elasticsearch structure, do I want
> > > > > > an index (xen-devel mbox) with a type (message) and each object
> > > > > > from the perceval output to be one document? Or should it be
> > > > > > more fine-grained?
> > > >
> > > > Exactly.
> > > >
> > > > Saludos,
> > > >
> > > >         Jesus.
> > > >
> > > > > Cheers,
> > > > >
> > > > > Heather
> > > > >
> > > > > > On Thu, Apr 6, 2017 at 7:05 AM, Jesus M. Gonzalez-Barahona
> > > > > > <jgb@bitergia.com> wrote:
> > > > > > On Wed, 2017-04-05 at 16:43 -0700, Heather Booker wrote:
> > > > > > > Hi!
> > > > > > >
> > > > > > > > I'd love to work on the Code Review Dashboard project for
> > > > > > > > this round of Outreachy.
> > > > > >
> > > > > > Great!!
> > > > > >
> > > > > > > > Are the steps outlined here
> > > > > > > > http://markmail.org/message/7adkmords3imkswd still the first
> > > > > > > > contribution you'd like to see?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > > So is this a project that has been worked on in previous
> > > > > > > > rounds of GSOC/Outreachy also?
> > > > > > > > If so is there a place to find links to the previous
> > > > > > > > participants' blogs? :)
> > > > > >
> > > > > > > No. We had one participation at some point, but couldn't even
> > > > > > > start for personal reasons. There are some people considering
> > > > > > > working on this for this next round of Outreachy, however.
> > > > > > > You'll see their messages in this mailing list.
> > > > > >
> > > > > > > > Should questions about the specifications/completion of the
> > > > > > > > microtask be addressed to IRC or this list? If IRC, which
> > > > > > > > channel - #xen-opw or #metrics-grimoire? On that note, I'm
> > > > > > > > curious why #metrics-grimoire is the listed channel on the
> > > > > > > > project page - are main contributors involved in both
> > > > > > > > projects? Or is it just because the Xen dashboard doesn't
> > > > > > > > have a channel?
> > > > > >
> > > > > > > The code review is for the Xen project, but it is done with (I
> > > > > > > mean, the software used for it is) GrimoireLab, which for
> > > > > > > historical reasons uses the #metrics-grimoire channel. That's
> > > > > > > why it is likely that you find somebody from the project there.
> > > > > > >
> > > > > > > If you have questions, and find me around in IRC, please ping
> > > > > > > me. If I'm not available, please send an email message.
> > > > > >
> > > > > > Saludos,
> > > > > >
> > > > > >         Jesus.
> > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > Heather
> > > > > > > _______________________________________________
> > > > > > > Xen-devel mailing list
> > > > > > > Xen-devel@lists.xen.org
> > > > > > > https://lists.xen.org/xen-devel
> > > > > > --
> > > > > > Bitergia: http://bitergia.com
> > > > > > /me at Twitter: https://twitter.com/jgbarah
> > > > > >
> > > > > >
> > > > >
> > > > --
> > > > Bitergia: http://bitergia.com
> > > > /me at Twitter: https://twitter.com/jgbarah
> > > >
> > > >
> > >
> > >
> >
> --
> Bitergia: http://bitergia.com
> /me at Twitter: https://twitter.com/jgbarah
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
