From Git Log to Insights: Evaluating Team Contributions in GitHub Projects
In today’s fast-paced software development world, understanding team dynamics and individual contributions is crucial for project management and team growth. This article will guide you through a process of extracting GitHub commit data and transforming it into actionable insights using Python.
Step 1: Extracting Git Log Data
First, we’ll use Git’s command-line interface to extract comprehensive commit data. Open your terminal, navigate to your project directory, and run:
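```bash
git log --date=format:'%Y-%m-%d %H:%M:%S' --pretty=format:"%h,%an,%ad,%s" --numstat > all_commits_with_stats.txt
```
This command writes one line per commit with the abbreviated hash, author name, date and subject, separated by commas. After each commit line it writes one `--numstat` line per changed file, giving the lines inserted, the lines deleted and the file path, separated by tabs. The output is saved to a text file. By default `git log` only includes commits reachable from the current branch. Add `--all` to include every branch.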
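You can also run the export from Python, which avoids shell quoting differences. It also sidesteps the `>` redirect in older Windows PowerShell versions, which writes UTF-16 rather than UTF-8. The sketch below is a minimal version. `GIT_LOG_ARGS` and `export_git_log` are illustrative names. It assumes `git` is installed and on your PATH, and that git emits log text as UTF-8, which is its default.

```python
import subprocess

# Same fields as the Step 1 command, with --numstat as the only file statistics.
# Passed as a list, so the date format needs no shell quoting.
GIT_LOG_ARGS = [
    "git", "log",
    "--date=format:%Y-%m-%d %H:%M:%S",
    "--pretty=format:%h,%an,%ad,%s",
    "--numstat",
]


def export_git_log(repo_dir, output_file):
    """Run git log inside repo_dir and save its output as UTF-8 text.

    Illustrative helper: assumes git is installed and on PATH.
    """
    result = subprocess.run(
        GIT_LOG_ARGS,
        cwd=repo_dir,
        capture_output=True,  # collect stdout/stderr as bytes
        check=True,           # raise CalledProcessError if git fails
    )
    # git writes log messages as UTF-8 unless i18n.logOutputEncoding says otherwise
    text = result.stdout.decode("utf-8", errors="replace")
    with open(output_file, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text
```

From inside your repository, call `export_git_log('.', 'all_commits_with_stats.txt')`. The resulting file is plain UTF-8, so no encoding detection is needed when it is read back.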
Step 2: Processing the Data with Python
Next, we’ll create a Python script to transform this raw data into a structured CSV format. Here’s the script:
```python
import csv
import re

import chardet  # third-party package: pip install chardet

# A commit header written by --pretty=format:"%h,%an,%ad,%s" begins with the
# hex commit hash and a comma. Matching this prefix is more reliable than
# counting commas, because commit subjects often contain commas themselves.
COMMIT_LINE = re.compile(r'^[0-9a-f]{4,64},')

# A --numstat line: insertions, deletions and file path.
# Binary files report "-" instead of numbers, so they are skipped.
NUMSTAT_LINE = re.compile(r'(\d+)\s+(\d+)\s+(.+)')


def detect_encoding(file_path):
    """Guess the log file's encoding (shell redirects may not write UTF-8)."""
    with open(file_path, 'rb') as file:
        raw_data = file.read()
    result = chardet.detect(raw_data)
    # chardet returns None when it cannot decide; fall back to UTF-8
    return result['encoding'] or 'utf-8'


def process_git_log(input_file, output_file):
    encoding = detect_encoding(input_file)
    print(f"Detected encoding: {encoding}")

    with open(input_file, 'r', encoding=encoding, errors='replace') as f:
        lines = f.readlines()

    commits = []
    current_commit = None

    for line in lines:
        line = line.strip()
        if COMMIT_LINE.match(line):  # This is a commit header line
            if current_commit:
                commits.append(current_commit)
            # maxsplit=3 keeps any commas inside the subject
            commit_hash, author, date, subject = line.split(',', 3)
            current_commit = {
                'hash': commit_hash,
                'author': author,
                'date': date,
                'subject': subject,
                'files_changed': 0,
                'insertions': 0,
                'deletions': 0,
                'file_changes': []
            }
        elif line and current_commit:
            # This is a file change line
            match = NUMSTAT_LINE.match(line)
            if match:
                insertions, deletions, filename = match.groups()
                current_commit['files_changed'] += 1
                current_commit['insertions'] += int(insertions)
                current_commit['deletions'] += int(deletions)
                current_commit['file_changes'].append({
                    'filename': filename,
                    'insertions': int(insertions),
                    'deletions': int(deletions)
                })

    if current_commit:
        commits.append(current_commit)

    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Hash', 'Author', 'Date', 'Subject', 'Files Changed',
                         'Insertions', 'Deletions', 'File Changes'])
        for commit in commits:
            writer.writerow([
                commit['hash'],
                commit['author'],
                commit['date'],
                commit['subject'],
                commit['files_changed'],
                commit['insertions'],
                commit['deletions'],
                '; '.join(f"{c['filename']} (+{c['insertions']}, -{c['deletions']})"
                          for c in commit['file_changes'])
            ])


# Usage
if __name__ == '__main__':
    input_file = 'all_commits_with_stats.txt'
    output_file = 'git_log_processed.csv'
    process_git_log(input_file, output_file)
    print(f"Processed Git log has been saved to {output_file}")
```
This script reads the text file, processes each commit, and outputs a structured CSV file.
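Any parser for this format must correctly recognise commit header lines. A rule such as `line.count(',') == 3` misses every commit whose subject contains a comma. That commit's file statistics are then credited to the previous commit. The sketch below compares that rule with matching the leading hex hash. The sample lines are invented for illustration. The sketch assumes author names contain no commas, since the subject is the only field allowed to keep them.

```python
import re

# Invented sample lines in the --pretty=format:"%h,%an,%ad,%s" --numstat layout
SAMPLE_LINES = [
    "a1b2c3d,Jane Doe,2024-05-01 09:15:00,Add login page",
    "12\t3\tsrc/login.py",
    "e4f5a6b,John Smith,2024-05-02 14:30:00,Fix typo, update docs",
    "1\t1\tREADME.md",
]

# Abbreviated (or full) lowercase hex hash followed by a comma
COMMIT_LINE = re.compile(r"^[0-9a-f]{4,64},")


def is_header_by_comma_count(line):
    """Naive rule: exactly three commas. Fails if the subject contains a comma."""
    return line.count(",") == 3


def is_header_by_hash_prefix(line):
    """Sturdier rule: the line starts with a hex hash and a comma."""
    return COMMIT_LINE.match(line) is not None


def parse_header(line):
    """Split a header line; maxsplit=3 keeps commas inside the subject."""
    commit_hash, author, date, subject = line.split(",", 3)
    return {"hash": commit_hash, "author": author, "date": date, "subject": subject}


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(is_header_by_comma_count(line), is_header_by_hash_prefix(line), repr(line))
```

Consider the third sample line. The comma-count rule returns False for it and the hash-prefix rule returns True. `parse_header` still recovers the full subject "Fix typo, update docs".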
Step 3: Analysing the Data
With our data now in CSV format, we can easily import it into data analysis tools like pandas for Python, or even spreadsheet applications like Microsoft Excel or Google Sheets.
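As a starting point, you can read the CSV with nothing more than Python's standard library. This sketch counts commits per author per calendar month, which gives the commit-frequency view described below. `commits_per_author_month` and the sample rows are illustrative. The sketch assumes the column names and date format written by the Step 2 script.

```python
import csv
import io
from collections import Counter


def commits_per_author_month(csv_file):
    """Count commits per (author, 'YYYY-MM') in the processed CSV.

    csv_file is any text file object with the columns written in Step 2
    (Hash, Author, Date, Subject, ...).
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        month = row["Date"][:7]  # 'YYYY-MM-DD HH:MM:SS' -> 'YYYY-MM'
        counts[(row["Author"], month)] += 1
    return counts


# Hypothetical sample rows in the layout of git_log_processed.csv
SAMPLE_CSV = """Hash,Author,Date,Subject,Files Changed,Insertions,Deletions,File Changes
a1b2c3d,Jane Doe,2024-05-01 09:15:00,Add login page,1,12,3,"src/login.py (+12, -3)"
e4f5a6b,John Smith,2024-05-02 14:30:00,"Fix typo, update docs",1,1,1,"README.md (+1, -1)"
c7d8e9f,Jane Doe,2024-06-10 11:00:00,Refactor auth,2,40,25,"src/auth.py (+30, -20); src/login.py (+10, -5)"
b0a1c2d,Jane Doe,2024-05-20 16:45:00,Add logout button,1,8,0,"src/login.py (+8, -0)"
"""

if __name__ == "__main__":
    counts = commits_per_author_month(io.StringIO(SAMPLE_CSV))
    for (author, month), n in sorted(counts.items()):
        print(f"{author:<12} {month}  {n}")
```

To analyse your own data, pass `open('git_log_processed.csv', newline='', encoding='utf-8')` instead of the `StringIO` sample. If you prefer pandas, `pd.read_csv('git_log_processed.csv').groupby('Author').size()` gives the overall commit count per author.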
Here are some insights you can derive:
1. Commit Frequency: Analyse the number of commits per author over time to understand work patterns.
2. Code Volume: Compare insertions and deletions to gauge the amount of code each team member contributes.
3. File Impact: Examine which files are changed most frequently and by whom.
4. Commit Subjects: Analyse commit messages to understand the type of work being done (e.g., bug fixes, feature additions, refactoring).
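You can sketch the remaining insights the same way. The illustrative `summarise` function below does not come from any library. It totals insertions and deletions per author, and it counts how many commits touched each file and who made them by splitting the `File Changes` column back apart. It also sorts commit subjects into buckets using a crude keyword heuristic. The keyword lists and the sample rows are assumptions, so adapt them to your team's commit conventions.

```python
import csv
import io
import re
from collections import Counter, defaultdict

# Matches one "path (+insertions, -deletions)" entry of the File Changes column
FILE_ENTRY = re.compile(r"^(.+) \(\+(\d+), -(\d+)\)$")

# Assumed keyword buckets for commit subjects (checked in this order)
CATEGORIES = {
    "fix": ("fix", "bug", "patch"),
    "feature": ("add", "feature", "implement"),
    "refactor": ("refactor", "clean", "rename"),
}


def categorise(subject):
    """Return the first category with a keyword that starts a word of the subject."""
    words = re.findall(r"[a-z]+", subject.lower())
    for name, keywords in CATEGORIES.items():
        if any(word.startswith(k) for word in words for k in keywords):
            return name
    return "other"


def summarise(csv_file):
    """Aggregate code volume, file impact and commit categories from the CSV."""
    volume = defaultdict(lambda: [0, 0])  # author -> [insertions, deletions]
    file_touches = Counter()              # path -> number of commits touching it
    file_authors = defaultdict(set)       # path -> authors who changed it
    categories = Counter()                # category -> number of commits
    for row in csv.DictReader(csv_file):
        author = row["Author"]
        volume[author][0] += int(row["Insertions"])
        volume[author][1] += int(row["Deletions"])
        for entry in filter(None, row["File Changes"].split("; ")):
            match = FILE_ENTRY.match(entry)
            if match:
                path = match.group(1)
                file_touches[path] += 1
                file_authors[path].add(author)
        categories[categorise(row["Subject"])] += 1
    return dict(volume), file_touches, dict(file_authors), categories


# Hypothetical sample rows in the layout of git_log_processed.csv
SAMPLE_CSV = """Hash,Author,Date,Subject,Files Changed,Insertions,Deletions,File Changes
a1b2c3d,Jane Doe,2024-05-01 09:15:00,Add login page,1,12,3,"src/login.py (+12, -3)"
e4f5a6b,John Smith,2024-05-02 14:30:00,"Fix typo, update docs",1,1,1,"README.md (+1, -1)"
c7d8e9f,Jane Doe,2024-06-10 11:00:00,Refactor auth,2,40,25,"src/auth.py (+30, -20); src/login.py (+10, -5)"
b0a1c2d,Jane Doe,2024-05-20 16:45:00,Add logout button,1,8,0,"src/login.py (+8, -0)"
"""

if __name__ == "__main__":
    volume, touches, authors, cats = summarise(io.StringIO(SAMPLE_CSV))
    for author, (ins, dels) in sorted(volume.items()):
        print(f"{author}: +{ins} -{dels}")
    for path, n in touches.most_common(3):
        print(f"{path}: {n} commits by {', '.join(sorted(authors[path]))}")
    print(dict(cats))
```

Prefix matching is deliberately loose, so "Fixes" and "Added" are caught. It will also misfile words such as "address", so treat the categories as a rough first pass rather than a classification.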
Conclusion
By following this process, you can transform raw Git log data into structured, analysable information. This approach provides valuable insights into team dynamics, individual contributions, and project progress.
Remember, while these metrics can be informative, they don’t tell the whole story of a developer’s contribution. Code quality, mentorship, and other non-quantifiable factors are equally important in evaluating team members’ overall impact.
Use these insights as a starting point for discussions about team efficiency, workload distribution, and areas for improvement in your development process.
Note: Ensure you have the necessary permissions before extracting and analysing team data, and always use such information responsibly and ethically.