From Git Log to Insights: Evaluating Team Contributions in GitHub Projects

Chan Meng
3 min read · Feb 14, 2025

Understanding team dynamics and individual contributions matters for both project management and team growth. This article walks through extracting commit data from a local clone of your GitHub repository and transforming it into structured data you can analyse with Python.

Step 1: Extracting Git Log Data

First, we’ll use Git’s command-line interface to extract comprehensive commit data. Open your terminal, navigate to your project directory, and run:

git log --date=format:'%Y-%m-%d %H:%M:%S' --pretty=format:"%h,%an,%ad,%s" --numstat > all_commits_with_stats.txt

This command writes every commit in the current branch's history to a text file. Each commit starts with a header line containing the abbreviated hash, author name, date, and subject, followed by one --numstat line per changed file giving the lines added, the lines deleted, and the file path.
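Because the header fields are separated by commas, an author name or subject that itself contains a comma makes the header line ambiguous. One possible alternative, sketched here as an assumption rather than as part of the original workflow, is to separate the fields with the ASCII unit separator using git's %x1f placeholder and to run the command from Python. The helper names build_git_log_command, parse_header, and read_git_log are hypothetical.

```python
import subprocess

# ASCII unit separator: very unlikely to appear in author names or subjects.
FIELD_SEP = "\x1f"


def build_git_log_command():
    """Return a git log invocation whose header fields are split by 0x1f."""
    pretty = "%h%x1f%an%x1f%ad%x1f%s"  # %x1f makes git emit the byte 0x1f
    return [
        "git", "log",
        "--date=format:%Y-%m-%d %H:%M:%S",
        f"--pretty=format:{pretty}",
        "--numstat",
    ]


def parse_header(line):
    """Split one header line into its four fields, or return None."""
    parts = line.rstrip("\n").split(FIELD_SEP)
    if len(parts) != 4:
        return None  # not a header line (for example a --numstat line)
    commit_hash, author, date, subject = parts
    return {
        "hash": commit_hash,
        "author": author,
        "date": date,
        "subject": subject,
    }


def read_git_log(repo_path="."):
    """Run git log inside repo_path and return its output as text."""
    result = subprocess.run(
        build_git_log_command(),
        cwd=repo_path,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout
```

Calling read_git_log() from inside a repository returns the same kind of log as the command above, with header lines that parse_header can split without guessing where the subject begins.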

Step 2: Processing the Data with Python

Next, we’ll create a Python script to transform this raw data into a structured CSV format. Here’s the script:

import csv
import re

import chardet  # third-party package: pip install chardet

# Header line produced by --pretty=format:"%h,%an,%ad,%s". Matching the fixed
# date layout (rather than counting commas) keeps subjects with commas intact.
COMMIT_RE = re.compile(
    r'^([0-9a-f]{4,64}),(.*?),(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(.*)$'
)

# --numstat line: insertions, deletions, path. Binary files report "-" counts.
NUMSTAT_RE = re.compile(r'^(\d+|-)\s+(\d+|-)\s+(.+)$')


def detect_encoding(file_path):
    """Guess the encoding of the log file from its raw bytes."""
    with open(file_path, 'rb') as file:
        raw_data = file.read()
    result = chardet.detect(raw_data)
    # chardet may report no encoding at all; fall back to UTF-8.
    return result['encoding'] or 'utf-8'


def process_git_log(input_file, output_file):
    encoding = detect_encoding(input_file)
    print(f"Detected encoding: {encoding}")

    commits = []
    current_commit = None

    with open(input_file, 'r', encoding=encoding, errors='replace') as f:
        for line in f:
            line = line.strip()

            header = COMMIT_RE.match(line)
            if header:
                # A new commit starts here: store the previous one first.
                if current_commit:
                    commits.append(current_commit)
                commit_hash, author, date, subject = header.groups()
                current_commit = {
                    'hash': commit_hash,
                    'author': author,
                    'date': date,
                    'subject': subject,
                    'files_changed': 0,
                    'insertions': 0,
                    'deletions': 0,
                    'file_changes': []
                }
                continue

            # Any other line is only of interest if it is a file change line.
            stat = NUMSTAT_RE.match(line) if current_commit else None
            if stat:
                insertions, deletions, filename = stat.groups()
                # Binary files have no line counts; record them as 0.
                insertions = 0 if insertions == '-' else int(insertions)
                deletions = 0 if deletions == '-' else int(deletions)
                current_commit['files_changed'] += 1
                current_commit['insertions'] += insertions
                current_commit['deletions'] += deletions
                current_commit['file_changes'].append({
                    'filename': filename,
                    'insertions': insertions,
                    'deletions': deletions
                })

    # Store the last commit in the file.
    if current_commit:
        commits.append(current_commit)

    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['Hash', 'Author', 'Date', 'Subject', 'Files Changed', 'Insertions', 'Deletions', 'File Changes'])
        for commit in commits:
            writer.writerow([
                commit['hash'],
                commit['author'],
                commit['date'],
                commit['subject'],
                commit['files_changed'],
                commit['insertions'],
                commit['deletions'],
                '; '.join([f"{c['filename']} (+{c['insertions']}, -{c['deletions']})" for c in commit['file_changes']])
            ])


# Usage
if __name__ == '__main__':
    input_file = 'all_commits_with_stats.txt'
    output_file = 'git_log_processed.csv'
    process_git_log(input_file, output_file)
    print(f"Processed Git log has been saved to {output_file}")

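This script reads the text file, groups the file change lines under the commit they belong to, and writes a CSV file with one row per commit: hash, author, date, subject, number of files changed, total insertions and deletions, and a per-file breakdown.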

Step 3: Analysing the Data

With our data now in CSV format, we can easily import it into data analysis tools like pandas for Python, or even spreadsheet applications like Microsoft Excel or Google Sheets.
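As a minimal sketch of the import step, the CSV can also be loaded with nothing but Python's standard library. The function name load_commits is hypothetical. The column names are the ones written by the script in Step 2, and the date layout is the one requested in the git log command. With pandas installed, read_csv would be the analogous starting point.

```python
import csv
from datetime import datetime

# Columns that the Step 2 script writes as whole numbers.
NUMERIC_COLUMNS = ("Files Changed", "Insertions", "Deletions")


def load_commits(csv_path):
    """Read the processed CSV and convert the numeric and date columns."""
    commits = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for column in NUMERIC_COLUMNS:
                row[column] = int(row[column])
            # Matches --date=format:'%Y-%m-%d %H:%M:%S' from Step 1.
            row["Date"] = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S")
            commits.append(row)
    return commits
```

Calling load_commits('git_log_processed.csv') returns a list of dictionaries, one per commit, ready for sorting, filtering, or grouping.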

Here are some insights you can derive:

1. Commit Frequency: Analyse the number of commits per author over time to understand work patterns.

2. Code Volume: Compare insertions and deletions to gauge the amount of code each team member contributes.

3. File Impact: Examine which files are changed most frequently and by whom.

4. Commit Subjects: Analyse commit messages to understand the type of work being done (e.g., bug fixes, feature additions, refactoring).
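The four insights above can be sketched with the standard library alone. This is an illustration under stated assumptions, not a definitive implementation: SAMPLE_ROWS is made-up data shaped like csv.DictReader output for the Step 2 CSV, every function name is hypothetical, and the keyword patterns used to label commit subjects are a crude heuristic you would tune to your team's conventions.

```python
import re
from collections import Counter, defaultdict

# Made-up rows, shaped like csv.DictReader output for the Step 2 CSV.
SAMPLE_ROWS = [
    {"Author": "Alice", "Date": "2025-01-03 10:00:00",
     "Subject": "Fix login bug", "Insertions": "12", "Deletions": "4",
     "File Changes": "auth.py (+10, -4); README.md (+2, -0)"},
    {"Author": "Bob", "Date": "2025-01-15 16:20:00",
     "Subject": "Add export feature", "Insertions": "40", "Deletions": "0",
     "File Changes": "export.py (+40, -0)"},
    {"Author": "Alice", "Date": "2025-02-01 09:05:00",
     "Subject": "Refactor auth module", "Insertions": "5", "Deletions": "9",
     "File Changes": "auth.py (+5, -9)"},
]

# One "File Changes" entry looks like "path (+insertions, -deletions)".
FILE_ENTRY_RE = re.compile(r"^(.+) \(\+\d+, -\d+\)$")

# Assumed keyword heuristic; the first matching label wins.
SUBJECT_PATTERNS = {
    "bug fix": re.compile(r"\b(fix|bug)", re.IGNORECASE),
    "feature": re.compile(r"\b(add|feat)", re.IGNORECASE),
    "refactoring": re.compile(r"\b(refactor|clean)", re.IGNORECASE),
}


def commits_per_author_month(rows):
    """1. Commit frequency: count commits per (author, 'YYYY-MM')."""
    return Counter((row["Author"], row["Date"][:7]) for row in rows)


def lines_per_author(rows):
    """2. Code volume: total insertions and deletions per author."""
    totals = defaultdict(lambda: {"insertions": 0, "deletions": 0})
    for row in rows:
        totals[row["Author"]]["insertions"] += int(row["Insertions"])
        totals[row["Author"]]["deletions"] += int(row["Deletions"])
    return dict(totals)


def file_touch_counts(rows):
    """3. File impact: count how often each (file, author) pair occurs."""
    counts = Counter()
    for row in rows:
        # Assumes no path contains "; ", the separator used in Step 2.
        for entry in row["File Changes"].split("; "):
            match = FILE_ENTRY_RE.match(entry)
            if match:
                counts[(match.group(1), row["Author"])] += 1
    return counts


def classify_subject(subject):
    """4. Commit subjects: label a message by its first matching keyword."""
    for label, pattern in SUBJECT_PATTERNS.items():
        if pattern.search(subject):
            return label
    return "other"
```

For example, file_touch_counts(SAMPLE_ROWS).most_common(5) lists the most frequently changed file and author pairs, and Counter(classify_subject(row["Subject"]) for row in SAMPLE_ROWS) summarises the type of work being done.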

Conclusion

By following this process, you can transform raw Git log data into structured, analysable information. This approach provides valuable insights into team dynamics, individual contributions, and project progress.

Remember, while these metrics can be informative, they don’t tell the whole story of a developer’s contribution. Code quality, mentorship, and other non-quantifiable factors are equally important in evaluating team members’ overall impact.

Use these insights as a starting point for discussions about team efficiency, workload distribution, and areas for improvement in your development process.

Note: Ensure you have the necessary permissions before extracting and analysing team data, and always use such information responsibly and ethically.
