On Sun, Nov 25, 2012 at 3:30 PM, anatoly techtonik <techto...@gmail.com> wrote:

> On Sun, Nov 25, 2012 at 1:56 AM, Ezio Melotti <ezio.melo...@gmail.com> wrote:
>
>> On Sun, Nov 25, 2012 at 12:24 AM, anatoly techtonik
>> <techto...@gmail.com> wrote:
>>
>>> On Sat, Nov 24, 2012 at 7:04 AM, Ezio Melotti <ezio.melo...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for your work!
>>>>
>>>> I played a bit with the code tonight and used it to create a json that
>>>> maps filename -> list of issues with a patch that affects that filename.
>>>>
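For reference, a mapping like that can be derived from the "+++" header lines of each unified diff. This is a minimal sketch and not necessarily how the code in the repo does it. It assumes plain unified diffs, and it assumes a hypothetical {issue_number: [patch_text, ...]} dict of patches that were already downloaded from the tracker.

```python
import re

# Target side of a unified diff header, e.g. "+++ b/Lib/json/decoder.py".
# Assumption: patches are plain unified diffs (hg, git or diff -u output).
DIFF_TARGET = re.compile(r'^\+\+\+ (?:b/)?(\S+)', re.MULTILINE)


def files_in_patch(patch_text):
    """Return the set of paths a unified diff writes to.

    A deleted file shows up as "+++ /dev/null" and is skipped.
    """
    return set(p for p in DIFF_TARGET.findall(patch_text) if p != '/dev/null')


def build_file_index(patches_by_issue):
    """Map each path to the sorted issue numbers with a patch touching it.

    patches_by_issue is a hypothetical {issue_number: [patch_text, ...]} dict.
    """
    index = {}
    for issue, patches in patches_by_issue.items():
        for patch in patches:
            for path in files_in_patch(patch):
                index.setdefault(path, set()).add(issue)
    return dict((path, sorted(issues)) for path, issues in index.items())
```

json.dumps(build_file_index(patches), indent=2, sort_keys=True) then gives a files.json-like document. One limitation: a hunk that adds a line starting with "++ " would be misread as a header. A real parser would track hunk lengths to avoid that.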
>>>
>>> Nice. Is it possible to add this lookup to the post-commit hook script
>>> to report the number of patches available for the files that have
>>> been committed? It is much easier to review code that's already fresh in
>>> your mind.
>>>
>>
>> I'm not sure I understand what you are asking here.  Are you suggesting
>> to add a mercurial hook that, once you commit/push something, suggests
>> other issues with patches that affect the same file(s)?
>>
>
> Right. After you've pushed something, a short message from the server
> appears on the screen:
>
>   Thanks for your contribution to the module XXX, YYY, ... .
>
>   Please note that files that you've touched have X open patches
>   on the bug tracker. It might be the best time to review some now.
>   ...
>
> This serves two purposes:
> 1. Encourages people to review patches or do something else about them
> (e.g. split issues) so they don't languish
> 2. Completes the stdlib.json mapping (yes, I am lazy, and it will be more
> fun for people to fill in their missing modules themselves)
>
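The lookup part of this is simple once a files.json-style mapping exists. Here is a minimal sketch. It assumes the list of changed files is already known; wiring it into a server-side Mercurial hook (e.g. a changegroup hook) is not shown. It also assumes the mapping contains only open issues.

```python
def open_patch_report(changed_files, file_index):
    """Return the post-push note for changed_files, or '' if nothing is open.

    file_index: files.json-style {path: [issue numbers]} dict, assumed to
    list only open issues. changed_files: paths touched by the push.
    """
    issues = set()
    for path in changed_files:
        issues.update(file_index.get(path, ()))
    if not issues:
        return ''
    listed = ', '.join('#%s' % n for n in sorted(issues))
    return ('Please note that the files you touched have %d open issue(s) '
            'with patches on the bug tracker: %s' % (len(issues), listed))
```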

It might turn out to be annoying for developers, though.  If they are the
maintainers of a specific module they probably already know about the other
issues; if they aren't, they might not care.  This would also make pushing
slower and might introduce problems.


>
>> This could be done, but I think it's better to make the data available in
>> the tracker so that developers can search other issues themselves.
>>
>
> Tracker integration is the next logical step (because it requires more
> effort). It is good to have both, because we can't enforce any process
> other than the one people are used to. There is a chance that new search
> capabilities just won't be used.
>

There is this possibility, but I don't think that forcing the user to look
at the related issues is a good idea either.  Integrating this and
advertising it on python-dev is enough IMHO.


>
>>>> I made a simple page to filter the results and uploaded it here for
>>>> now: http://wolfprojects.altervista.org/issues.html
>>>> It requires javascript and it's a bit slow (at least on my PC), but it
>>>> allows you to enter a module name or path and it will list all the issues
>>>> related to the files that match the search (regex search should also work).
>>>> This is still a work in progress though.
>>>> If you want you can find the json at
>>>> http://wolfprojects.altervista.org/files.json
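The page itself is JavaScript. The equivalent filtering in Python is roughly the following sketch. It treats the query as a case-insensitive regular expression over the paths in files.json; loading the file with json.load is left out.

```python
import re


def search_issues(file_index, query):
    """Return {path: issues} for every path in file_index matching query.

    query is a case-insensitive regular expression, so a plain module
    name such as 'json' also works as a simple substring search.
    """
    pattern = re.compile(query, re.IGNORECASE)
    return dict((path, issues) for path, issues in file_index.items()
                if pattern.search(path))
```

With this, search_issues(index, 'json') matches Lib/json/..., Modules/_json.c and even doc\json.rst, while a query like r'^Lib/' restricts the results to the pure-Python stdlib.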
>>>
>>>
>>> Good work. I've pulled all the changes, but now I am getting:
>>>
>>> Traceback (most recent call last):
>>>   File "modstats.py", line 142, in <module>
>>>      print('#%s: %s' % (issuen, issue['title']))
>>>   File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
>>>     return codecs.charmap_encode(input,errors,encoding_map)
>>> UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in
>>> position 65: character maps to <undefined>
>>>
>>
>> That's probably due to the limitations of the windows console.
>>
>
> http://wiki.python.org/moin/PrintFails#Windows describes the problem, but
> proposes no acceptable solution. Setting an environment variable beforehand
> is like placing a mattress just before hitting the ground. Why is it not
> possible to fix the situation from inside the script?
>

Try print('#%s: %r' % (issuen, issue['title'])); it should print the repr
of the title, using a \uxxxx escape instead of the equivalent Unicode
character.
Otherwise you can just drop the title and print only the issue number; the
title is not so useful anyway.
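If you'd rather keep the title, you can also fix it from inside the script. The 'backslashreplace' error handler writes characters the console can't show as escapes instead of raising. A minimal sketch follows; falling back to ASCII when sys.stdout reports no encoding is an assumption about a sensible default.

```python
import sys


def console_safe(text, encoding=None):
    """Make text printable on a console with a limited code page.

    Characters the encoding cannot represent become backslash escapes
    (the 'backslashreplace' error handler) instead of raising
    UnicodeEncodeError. The ASCII fallback is an assumption for streams
    that report no encoding.
    """
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, 'backslashreplace').decode(encoding)
```

Usage: print('#%s: %s' % (issuen, console_safe(issue['title']))). On Python 3 the %r trick stops escaping printable non-ASCII characters; use ascii() there instead. Since Python 3.7, sys.stdout.reconfigure(errors='backslashreplace') applies the same handler to all output.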


>>> Path cleaning is a good thing. Good auto-classification also needs some
>>> rules that need additional data:
>>>   - detect the full path from just the filename
>>>
>>
>> The cleanup function I wrote just removes extraneous things from the
>> path.  It doesn't verify if the file exists in the Python codebase.
>>
>>
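To give an idea of the kind of "extraneous things" involved, a cleanup step could look roughly like this. It is a sketch, not necessarily what the existing cleanup function does. The prefixes it strips are an assumption: hg/git "a/" and "b/", and tarball directories like "Python-2.7.3/". Note that "Python/" is a real top-level directory and must not be stripped.

```python
import posixpath
import re

# Assumed noise: "a/" or "b/" diff prefixes and a versioned tarball
# directory such as "Python-2.7.3/". The plain "Python/" directory is real.
_PREFIX = re.compile(r'^(?:[ab]/)?(?:Python-\d[^/]*/)?')


def clean_path(raw):
    """Normalise a path from a patch header; existence is not checked."""
    path = raw.split('\t')[0].strip()   # drop a "\t<timestamp>" suffix
    path = path.replace('\\', '/')       # Windows separators -> '/'
    path = posixpath.normpath(path).lstrip('/')
    return _PREFIX.sub('', path, count=1)
```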
>>>     - if path is unknown, analyse filename
>>>       - if filename is unique in the Python source tree, return its path
>>>       - if filename is not unique, compare parent path components
>>> recursively
>>>         - if not successful, try to match the patch context
>>>         - if everything fails, choose the first one
>>>           - if that also fails, maintain a manual patch <---> file connection
>>>   For that to work we need an index of the Python source tree.
>>>
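The filename-resolution steps above could be sketched like this. The sketch assumes a hypothetical tree_paths list of every file in a checkout (e.g. collected with os.walk), and it leaves out the context-matching step. Each candidate is scored by how many trailing path components it shares with the patch path, and ties go to the first candidate, as in the last fallback.

```python
import posixpath


def resolve(name, tree_paths):
    """Guess the full source-tree path for a possibly partial patch path.

    tree_paths is a hypothetical list of all files in a checkout. Returns
    None when even the filename is unknown (the "manual connection" case).
    """
    wanted = name.split('/')
    candidates = [p for p in tree_paths
                  if posixpath.basename(p) == wanted[-1]]
    if not candidates:
        return None

    def shared_tail(path):
        # How many trailing components the candidate has in common with name.
        parts = path.split('/')
        n = 0
        while (n < min(len(parts), len(wanted))
               and parts[-1 - n] == wanted[-1 - n]):
            n += 1
        return n

    # max() returns the first of several equal scores, which gives the
    # "choose the first one" fallback; a unique filename wins trivially.
    return max(candidates, key=shared_tail)
```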
>>
>> I don't think all this is necessary.  Once we have the list of file names
>> extracted from the patches, it's enough to search for a keyword or module
>> name to find all the related issues.
>> For example if you try searching for 'json' on
>> http://wolfprojects.altervista.org/issues.html you will find all the
>> json-related files, including the ones in the Python package, the C
>> acceleration module, the documentation, the tests, and even files like
>> "doc\json.rst" that don't exist in the Python codebase or got renamed at
>> some point.
>>
>
> Good user story. I've added the C acceleration module to the stdlib.json
> description. My goal is to make patch classification an automatic process.
>

What exactly do you mean by "classification"?  If the goal is to be able
to find all the patches related to a module, the search I implemented
already does that without any (manually maintained) mapping.


> In the long run there won't be any patches in the tracker that are hard to
> apply, so "doc\json.rst" will be gone. There should be a place to list all
> those unknown patches, though, so that people can take them and work on them.
>
>>> And add the percentage of recognized patches.
>>> With manual classification (triaging) it is possible to keep this
>>> percentage at 100.
>>>
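A rough sketch of that percentage, which also yields the list for a page of unknown patches. It assumes a hypothetical {patch_id: [cleaned paths]} mapping. A patch counts as recognized only if every file it touches exists in the current tree, so patches that add new files or touch renamed ones show up as unknown.

```python
def recognition_report(patch_files, tree_paths):
    """Return (percent of recognized patches, sorted ids of unknown ones).

    patch_files: hypothetical {patch_id: [cleaned paths]} mapping.
    tree_paths: files in a current checkout. A patch is recognized only
    if every file it touches exists in the tree.
    """
    tree = set(tree_paths)
    unknown = sorted(pid for pid, paths in patch_files.items()
                     if not all(p in tree for p in paths))
    if not patch_files:
        return 100.0, unknown
    percent = 100.0 * (len(patch_files) - len(unknown)) / len(patch_files)
    return percent, unknown
```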
>>
>> Trying to establish a mapping between the patches and the actual files is
>> both cumbersome (especially if it requires manual classification) and might
>> end up missing some of the patches if they specify an incorrect path, add
>> new files, or affect files that got renamed or deleted.
>>
>
> The mapping is just a table holding a temporary association. It may not be
> necessary if there is a separate page with all kinds of incorrect
> patches.
>

If the patches have a name that makes sense (e.g. the module name without
the path, or the module name with some addition like '-new') the search
will include them anyway.  If they don't and get "lost", it's not a big deal
IMHO; they might still be found through regular search.


>
>
>> I'm considering adding a way to search for modules to the tracker, in a
>> way similar to what I did on
>> http://wolfprojects.altervista.org/issues.html.  The tracker has direct
>> access to the files and issues, so analyzing the patches and keeping the
>> database updated as new patches are attached should be easier.  Once we
>> have the equivalent of files.json (maybe in a db table), it's just a matter
>> of adding a search form.
>>
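As an illustration of the "db table" idea, here is a sketch using sqlite3. The table name and schema are hypothetical and are not Roundup's. The real tracker would store this through its own database layer and add rows whenever a patch is attached.

```python
import sqlite3


def make_db(file_index):
    """Load a files.json-style {path: [issues]} dict into an in-memory table."""
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE issue_file (issue INTEGER, path TEXT)')
    add_rows(db, [(issue, path) for path, issues in file_index.items()
                  for issue in issues])
    return db


def add_rows(db, rows):
    """Record (issue, path) pairs, e.g. when a new patch is attached."""
    db.executemany('INSERT INTO issue_file (issue, path) VALUES (?, ?)', rows)
    db.commit()


def issues_for(db, term):
    """Distinct issues with a patch touching a path that contains term."""
    cur = db.execute('SELECT DISTINCT issue FROM issue_file '
                     'WHERE path LIKE ? ORDER BY issue', ('%' + term + '%',))
    return [row[0] for row in cur]
```

SQLite's LIKE is case-insensitive for ASCII by default. It also treats % and _ in the search term as wildcards, so a term like '_json' matches any character followed by "json".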
>
> Sounds good as a temporary solution. I'd still prefer automatic
> classification rules, but considering that people cannot remove
> their patches, a search form may be the only viable solution.
>

What should this automatic classification do?  "tag" the issue with the
module name(s)?  If this is what you want to do and the purpose of the tag
is just to search for issues with patches about that module, then -- as I
said -- it's already doable with the normal search I proposed.
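To make the comparison concrete, a "tag" would amount to deriving a module name from each path, which the search already does implicitly. The following is a heuristic sketch. The directory conventions it relies on (Lib/, Lib/test/, Modules/, Doc/library/) follow the CPython layout, but the rules themselves are an assumption and will misfire on some files.

```python
import posixpath
import re


def module_tag(path):
    """Guess a stdlib module name from a CPython source path (heuristic)."""
    parts = path.split('/')
    if len(parts) < 2:
        return None
    top, name = parts[0], posixpath.splitext(parts[1])[0]
    if top == 'Lib':
        if name == 'test' and len(parts) > 2:
            # Lib/test/test_json.py and Lib/test/test_json/... -> json
            return re.sub(r'^test_', '', posixpath.splitext(parts[2])[0])
        return name  # Lib/json/decoder.py -> json, Lib/os.py -> os
    if top == 'Modules':
        # Modules/_json.c -> json, Modules/posixmodule.c -> posix
        return re.sub(r'module$', '', name.lstrip('_'))
    if top == 'Doc' and name == 'library' and len(parts) > 2:
        return posixpath.splitext(parts[2])[0]  # Doc/library/json.rst -> json
    return None
```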

Best Regards,
Ezio Melotti
_______________________________________________
Tracker-discuss mailing list
Tracker-discuss@python.org
http://mail.python.org/mailman/listinfo/tracker-discuss
