Jonathan Corbet [Fri, 6 Apr 2012 22:00:04 +0000 (16:00 -0600)]
Add version tracking support and an "unknown hackers" report
Version tracking was used to see who had contributed to the most kernel
releases; not sure it's a long-term-useful feature. The unknown hackers
report helps when trying to improve the database.
Added first attempt for reporting by file type:
- A general report
- A report aggregated by file type and contributor
- A report aggregated by contributor and file type
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
When some projects migrated from Subversion to Git, several tags
were imported as new commits. Each such commit shows a change to
the whole project (code added/removed) when nothing really
happened. For instance, in GNOME a lot of svn tags were caught
during the migration, but not all of them.
svn tags in git repositories skew the stats because commits are
counted twice, and in a project with a long history this may
involve several thousand lines of source code.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
Two new dumps were added: per filetype and for every changeset.
It is necessary to set a prefix for the csv dump files,
because one csv file is generated per file type.
Now it is possible to get statistics per code, documentation,
build scripts, translations, multimedia and developers
documentation. This feature is useful for repositories that
hold different types of files, not just code.
The detailed information does not use the Aggregate parameter.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
Patches, as well as Total* and Dates, are counted only if the
changeset is not a merge. However, CSCount (ChangeSetCount)
was counting everything, which changes the results a bit.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
The class LogPatchSplitter provides an iterator per patch. This
makes the code cleaner, easier to read and more pythonic.
The class only gets each commit set as lines.
It is possible to test it separately by:
$ git log | python logparser.py | more
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
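The iterator-per-patch idea from the commit above can be sketched as follows. This is not the actual LogPatchSplitter code: the class name PatchSplitter and the rule "a line starting with 'commit ' opens a new patch" are assumptions that fit the default `git log` output, where message lines are indented.

```python
import io


class PatchSplitter:
    """Hypothetical stand-in for LogPatchSplitter: yields one commit at a time."""

    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        patch = []
        for line in self.stream:
            # Assumption: in default `git log` output only commit headers
            # start with "commit " in column zero.
            if line.startswith("commit ") and patch:
                yield patch
                patch = []
            patch.append(line)
        if patch:
            yield patch


sample = io.StringIO(
    "commit 1111111\nAuthor: A <a@example.org>\n\n    first\n"
    "commit 2222222\nAuthor: B <b@example.org>\n\n    second\n"
)
patches = list(PatchSplitter(sample))
```

Each item in `patches` is the list of lines belonging to one commit, so the caller can loop over commits without tracking parser state.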
To make the code cleaner, I created a function that parses
a numstat line, which is useful to determine the modified
filename and to calculate lines added and removed.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
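A minimal sketch of such a numstat line parser, assuming the usual `git log --numstat` layout of "added<TAB>removed<TAB>filename", with "-" in both count columns for binary files. The name parse_numstat and the choice to count binary files as zero lines are illustrative, not taken from gitdm.

```python
def parse_numstat(line):
    """Return (filename, added, removed) for one numstat line, or None."""
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3:
        return None  # not a numstat line (header, message text, blank line)
    added, removed, filename = parts
    try:
        # Binary files are reported as "-"; treat them as zero lines here.
        added_n = 0 if added == "-" else int(added)
        removed_n = 0 if removed == "-" else int(removed)
    except ValueError:
        return None
    return filename, added_n, removed_n
```

For example, `parse_numstat("12\t3\tgitdm\n")` gives the filename plus the two counts, while a commit header line yields None.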
Added option to get the stats from numstat instead of diff
The option --numstat of git log gives the statistics of
lines added and removed per file. Hence, it is not necessary
to parse a raw diff.
Another benefit: the log to be processed is less verbose,
which helps with long logs. This also prepares the
code for counting the changes per file type.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
Python provides a module to handle csv files which is named
csv. Therefore, it is necessary to rename csv.py to
avoid name conflicts when the module csv is used.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
Jonathan Corbet [Tue, 10 May 2011 20:32:47 +0000 (14:32 -0600)]
Add the VirtualEmployer mechanism
A certain obnoxious developer wants his contributions to be split between
two employers. So add the "VirtualEmployer" mechanism to make that
possible. A virtual employer is defined with:
VirtualEmployer ve-name
nn% real-name
...
end
(This construct must appear in the main configuration file). Developers
can be associated with the virtual employer in the usual way; at report
time, any changes credited to that employer will be split among the real
employers according to the percentages provided.
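The report-time split can be sketched like this. It is an illustration only: split_virtual and the employer names are hypothetical, and the real code may round or accumulate differently.

```python
def split_virtual(changes, shares):
    """Divide a change count among real employers.

    `shares` maps a real employer name to its percentage, as given by the
    "nn% real-name" lines of a VirtualEmployer block.
    """
    return {name: changes * pct / 100.0 for name, pct in shares.items()}


# Hypothetical virtual employer split 75/25 between two real employers.
split = split_virtual(200, {"Acme": 75, "Initech": 25})
```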
Made the CSV file aggregating data by weeks or months
When using -w option together with the -x file option, the data exported
in the CSV file are aggregated by weeks instead of months (the default).
This is useful to extract meaningful stats on short periods.
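A rough sketch of the week/month bucketing, assuming ISO week numbers for the weekly case. The helper names period_key and aggregate, and the key formats, are made up for this example and are not gitdm's.

```python
from collections import Counter
from datetime import date


def period_key(day, weekly=False):
    """Bucket label for a date: ISO year/week if weekly, else year/month."""
    if weekly:
        year, week, _ = day.isocalendar()
        return "%04d-W%02d" % (year, week)
    return "%04d-%02d" % (day.year, day.month)


def aggregate(rows, weekly=False):
    """Sum (date, lines) rows per period."""
    totals = Counter()
    for day, lines in rows:
        totals[period_key(day, weekly)] += lines
    return dict(totals)


rows = [(date(2011, 3, 1), 10), (date(2011, 3, 9), 5), (date(2011, 3, 10), 7)]
```

With these rows, the monthly view collapses everything into one March bucket, while the weekly view keeps the first entry apart from the other two.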
Logs containing weird email addresses like the one below can be handled
in the aliases file to remap them to a correct address.
"joe.hacker@acme.org <Joe Hacker>
Added some documentation in the README file for this
Jonathan Corbet [Mon, 4 Oct 2010 23:06:33 +0000 (17:06 -0600)]
Make tag matching stricter
If you commit a git changelog to your repository, gitdm will be confused by
all the added patch tags. So make the patterns stricter to force them only
to match within the git log metadata - or so we hope. There is still room
for confusion here; we really need to make grabpatch() smart enough to
split metadata and the diff. Don't have time for that now.
This patch changes results slightly. In the 2.6.36 cycle, there's a tag
reading:
Original-Idea-and-Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Pre-patch gitdm would recognize that as a signoff; after the change it no
longer does.
Reported-by: Wolfgang Denk <wd@denx.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
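The effect of a stricter pattern can be illustrated with the sketch below. Both regular expressions are assumptions written for this example, not gitdm's actual patterns. The strict one requires the tag to follow only leading whitespace, as message lines do in `git log` output.

```python
import re

# Assumed patterns for illustration; not copied from gitdm.
LOOSE = re.compile(r"Signed-off-by:\s+(.*)\s+<(.*)>")
STRICT = re.compile(r"^\s+Signed-off-by:\s+(.*)\s+<(.*)>$")


def is_signoff_loose(line):
    """Match the tag anywhere in the line."""
    return LOOSE.search(line) is not None


def is_signoff_strict(line):
    """Match only when the tag is the first thing after the indentation."""
    return STRICT.match(line) is not None


plain = "    Signed-off-by: Jonathan Corbet <corbet@lwn.net>"
combined = "    Original-Idea-and-Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>"
in_diff = "+    Signed-off-by: Some One <someone@example.org>"
```

Under these assumed patterns, `plain` matches both ways, while `combined` and the added diff line `in_diff` match only the loose pattern.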
Tiago Vignatti [Wed, 30 Jun 2010 17:13:21 +0000 (20:13 +0300)]
Add option to get the configuration files from a given base directory
Instead of boringly replicating the base directory name where gitdm is
installed in each option inside the configuration file, just pass it
through the command line.
Martin Nordholts [Sun, 20 Dec 2009 08:41:07 +0000 (09:41 +0100)]
Move out global housekeeping from grabpatch()
As a step to make grabpatch() more unit-test friendly, move out global
housekeeping from grabpatch(). This also gets rid of a TODO in the
code. The regression tests still pass after this refactoring, of
course.
Signed-off-by: Martin Nordholts <martinn@src.gnome.org>
Martin Nordholts [Sat, 19 Dec 2009 14:59:34 +0000 (15:59 +0100)]
Add regression tests on gitdm output files
Add simple regression tests that make sure there are no regressions
in the text output file and the date line count file. The primary
purpose of introducing this regression test is to allow us to safely
refactor the gitdm code.
Signed-off-by: Martin Nordholts <martinn@src.gnome.org>
gitdm: report issue when an email address is a "name"
This probably means an incorrect commit message. It also
means that, if it is not fixed, the category for this person is probably
going to be incorrect.
Jonathan Corbet [Thu, 13 Nov 2008 16:13:25 +0000 (09:13 -0700)]
Better email address handling
Some people quote their names in various tags:
Something-done-by: "J Random Hacker" <...>
We kept the quotes with the name, confusing things down the road. So
change the patterns to exclude those quotes when they exist.
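One way to drop the optional quotes is sketched below. The function split_name_email and its regular expression are illustrative assumptions, not the pattern gitdm uses, and the address is a placeholder.

```python
import re

# Optional quotes sit outside the capture group, so the captured name is
# the same with or without them.
NAME_EMAIL = re.compile(r'\s*"?([^"<]*?)"?\s*<([^>]*)>')


def split_name_email(text):
    """Return (name, email) from the text following a tag, or None."""
    m = NAME_EMAIL.match(text)
    return (m.group(1), m.group(2)) if m else None
```

Both `"J Random Hacker" <jrh@example.org>` and the unquoted form then yield the same name, so later lookups see one developer instead of two.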
with chart here:
http://www.gnome.org/~michael/images/2008-09-29-kernel-active.png
caption being:
"Graph showing number and affiliation of active kernel developers
(contributing more than 100 lines per month). Quick affiliation key,
from bottom up: Unknown, No-Affiliation, IBM, RedHat, Novell, Intel ..."
These are as yet not published, I plan to use them as a comparison to
OO.o's somewhat mediocre equivalents; hope to go live with them soon
(and fix the horrible bugs in stacked area charts to make them actually
pretty ).
Jonathan Corbet [Fri, 5 Sep 2008 19:53:35 +0000 (13:53 -0600)]
Don't accept totally bogus dates
Yanmin Zhang committed a patch (09f2724a786f76475ef2985cf84f5359c553aade)
which claims to have been written in August, 2030. Code that bleeding-edge
makes gitdm confused, so pretend it's just normal, contemporary stuff.
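A sketch of one possible reading of "pretend it's contemporary": clamp any date in the future to the current day. The function sane_date and the clamping rule are assumptions. The commit message does not say exactly how the bogus date is replaced.

```python
from datetime import date


def sane_date(claimed, today=None):
    """Return `claimed`, or `today` when the claimed date lies in the future."""
    today = today or date.today()
    return today if claimed > today else claimed
```

With `today` set to 2008-09-05, a patch dated August 2030 would be counted as 2008-09-05, while an August 2008 date is left alone.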
Kir Kolyshkin [Mon, 7 Apr 2008 19:59:18 +0000 (23:59 +0400)]
gitdm: Report progress to stderr not stdout
When gitdm is used for generating a text-only report with its output
redirected to a file, all is well aside from the clutter at the beginning
of that file -- a very long line with repeating "Grabbing changesets...".
Solve that by redirecting progress reporting to stderr. It also helps to
see the progress when you redirect gitdm output to a file.
Also, we don't have to flush stdout since stderr is unbuffered by default.
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
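The idea of keeping progress chatter out of the report can be sketched as follows. The function report_progress and the exact message format are hypothetical. The explicit flush is a precaution for interpreters where stderr is buffered.

```python
import sys


def report_progress(count, stream=None):
    """Write a progress message to stderr (or a given stream), not stdout."""
    stream = stream if stream is not None else sys.stderr
    # "\r" returns to the start of the line so the counter overwrites itself.
    stream.write("Grabbing changesets...%d\r" % count)
    stream.flush()
```

Running the tool as `gitdm ... > report.txt` then leaves only the report in the file, while the counter still shows up on the terminal.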