Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 11 additions & 3 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,12 +22,13 @@ This is a Python library for WhereScape RED, a data warehouse automation tool. T
- `WhereScapeLogHandler` buffers logs and outputs them with WhereScape-specific exit codes on flush
- Exit codes: `1` (success), `-1` (warnings), `-2` (errors), `-3` (critical)
- Logs to both console (for WhereScape) and rotating file handler (Saturday night rotation)
- Must be initialized via `initialise_wherescape_logging(wherescape_instance)`
- Automatically initialized when WhereScape instance is created (in `WhereScape.__init__()`)
- Sets up unhandled exception logging

**helper_functions.py** - Shared utilities:
- `prepare_metadata_query()`: Generates SQL to create/update load table column metadata in WhereScape repository
- `create_column_names()`: Slugifies display names to valid column names (max 59 chars)
- `create_legacy_column_names()`: Legacy version that appends numbers to all columns (preserved for backward compatibility with existing tables)
- `flatten_json()`: Flattens nested JSON responses from APIs
- `filter_dict()` and `fill_out_empty_keys()`: Clean and normalize API responses

Expand Down Expand Up @@ -58,6 +59,13 @@ All connectors follow a consistent pattern with three components:
- **hubspot**: Companies, contacts, deals, tickets, engagements (supports multiple environments)
- **jira**: Projects and issues (full and incremental loads)

**Note:** The HubSpot connector has a unique structure that deviates from the standard three-file pattern:
- `collect_data.py` - Main entry point (replaces standard `{source}_load_data.py`)
- `process_data.py` - Processes and sends data to HubSpot (bi-directional sync)
- `ticket_updates.py` - Specialized operations (merge tickets, fix company associations)
- `utils.py` - Shared utilities for HubSpot operations
- Supports bidirectional sync (reading from WhereScape, writing back to HubSpot)

### Validators

**validators/fact_dimension_join.py** - Data quality validation:
Expand Down Expand Up @@ -125,7 +133,7 @@ All environment variables start with `WSL_` prefix:

### Running Tests

There is no formal test suite. The `test.py` file in the root can be used for ad-hoc testing with a local environment setup.
There is no formal test suite. Individual connectors may have test files (e.g., `anythingllm_test.py`) for ad-hoc testing with a local environment setup.

### Code Formatting and Linting

Expand All @@ -142,7 +150,7 @@ ruff format .
```

Configuration details:
- Target: Python 3.12
- Target: Python 3.14
- Line length: 119 characters
- Enabled rules: pycodestyle (E/W), pyflakes (F), isort (I), pep8-naming (N), flake8-bugbear (B), flake8-comprehensions (C4), flake8-simplify (SIM), pyupgrade (UP)
- See [pyproject.toml](pyproject.toml) for complete configuration
Expand Down
4 changes: 2 additions & 2 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[tool.ruff]
# Target Python 3.12
target-version = "py312"
# Target Python 3.14
target-version = "py314"

# Set line length to match common Python conventions
line-length = 119
Expand Down
2 changes: 1 addition & 1 deletion requirements-dev.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,4 @@
# Install with: pip install -r requirements-dev.txt

# Code linting and formatting
ruff>=0.14.0,<0.15.0
ruff>=0.15.1,<0.16.0
12 changes: 6 additions & 6 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
hubspot-api-client==8.2.1
notion-client==2.2.1
numpy==1.26.4
pandas==1.3.4
pyodbc==5.1.0
hubspot-api-client==12.0.0
notion-client==3.0.0
numpy==2.4.2
pandas==3.0.0
pyodbc==5.3.0
python-slugify==8.0.4
requests==2.32.3
requests==2.32.5
5 changes: 2 additions & 3 deletions validators/fact_dimension_join.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""Module with function to validate fact-dimension joins."""

import csv
import logging
import os
Expand All @@ -23,9 +24,7 @@ def check_fact_dimension_join(output_file_location=""):
wherescape = WhereScape()

start_time = datetime.now()
logging.info(
f"Start time: {start_time.strftime('%Y-%m-%d %H:%M:%S')} for check_fact_dimension_join"
)
logging.info(f"Start time: {start_time.strftime('%Y-%m-%d %H:%M:%S')} for check_fact_dimension_join")

date = datetime.now().strftime("%Y-%m-%d")

Expand Down
12 changes: 3 additions & 9 deletions wherescape/connectors/anythingllm/anythingllm_create_metadata.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,9 +46,7 @@
def anythingllm_create_metadata():
"""Create metadata for AnythingLLM chats load table."""
start_time = datetime.now()
logging.info(
f"Start time: {start_time.strftime('%Y-%m-%d %H:%M:%S')} for anythingllm_create_metadata"
)
logging.info(f"Start time: {start_time.strftime('%Y-%m-%d %H:%M:%S')} for anythingllm_create_metadata")

# Initialise WhereScape (logging is initialised through WhereScape object)
wherescape = WhereScape()
Expand Down Expand Up @@ -119,12 +117,8 @@ def anythingllm_create_metadata():

# Execute the SQL
wherescape.push_to_meta(sql)
wherescape.main_message = (
f"Created {len(columns) + 2} columns in metadata table for embed {embed_uuid}"
)
wherescape.main_message = f"Created {len(columns) + 2} columns in metadata table for embed {embed_uuid}"

# Final logging
end_time = datetime.now()
logging.info(
f"Time elapsed: {(end_time - start_time).seconds} seconds for anythingllm_create_metadata"
)
logging.info(f"Time elapsed: {(end_time - start_time).seconds} seconds for anythingllm_create_metadata")
12 changes: 3 additions & 9 deletions wherescape/connectors/anythingllm/anythingllm_load_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,7 @@ def anythingllm_load_data_chats():
# First initialise WhereScape to setup logging
logging.info("Connecting to WhereScape")
wherescape_instance = WhereScape()
logging.info(
f"Start time: {start_time.strftime('%Y-%m-%d %H:%M:%S')} for anythingllm_load_data_chats"
)
logging.info(f"Start time: {start_time.strftime('%Y-%m-%d %H:%M:%S')} for anythingllm_load_data_chats")

# Get the relevant values from WhereScape
api_key = os.getenv("WSL_SRCCFG_APIKEY")
Expand Down Expand Up @@ -91,15 +89,11 @@ def anythingllm_load_data_chats():
logging.info(f"Successfully inserted {len(rows)} rows in to the load table.")

# Add success message
wherescape_instance.main_message = (
f"Successfully inserted {len(rows)} rows in to the load table."
)
wherescape_instance.main_message = f"Successfully inserted {len(rows)} rows in to the load table."

else:
logging.info("No object changes received from AnythingLLM")

# Final logging
end_time = datetime.now()
logging.info(
f"Time elapsed: {(end_time - start_time).seconds} seconds for anythingllm_load_data_chats"
)
logging.info(f"Time elapsed: {(end_time - start_time).seconds} seconds for anythingllm_load_data_chats")
63 changes: 38 additions & 25 deletions wherescape/connectors/anythingllm/anythingllm_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -142,27 +142,25 @@ def test_flatten_chat():
"embed_id": 456,
"usersId": 789,
"createdAt": "2024-01-15T10:30:00Z",
"response": json.dumps({
"text": "The weather is sunny today.",
"type": "text",
"attachments": ["file1.pdf", "file2.png"],
"sources": [
{"title": "Weather Report", "url": "https://example.com/weather"},
{"title": "Climate Data", "url": "https://example.com/climate"}
],
"metrics": {
"completion_tokens": 15,
"prompt_tokens": 8,
"total_tokens": 23,
"outputTps": 12.5,
"duration": 1200
"response": json.dumps(
{
"text": "The weather is sunny today.",
"type": "text",
"attachments": ["file1.pdf", "file2.png"],
"sources": [
{"title": "Weather Report", "url": "https://example.com/weather"},
{"title": "Climate Data", "url": "https://example.com/climate"},
],
"metrics": {
"completion_tokens": 15,
"prompt_tokens": 8,
"total_tokens": 23,
"outputTps": 12.5,
"duration": 1200,
},
}
}),
"connection_information": json.dumps({
"host": "example.com",
"ip": "192.168.1.100",
"username": "testuser"
})
),
"connection_information": json.dumps({"host": "example.com", "ip": "192.168.1.100", "username": "testuser"}),
}

try:
Expand All @@ -172,11 +170,26 @@ def test_flatten_chat():

# Verify all expected fields are present
expected_fields = [
"id", "prompt", "session_id", "include", "embed_id", "user_id", "created_at",
"response_text", "response_type", "response_attachments", "response_sources",
"response_sources_count", "metrics_completion_tokens", "metrics_prompt_tokens",
"metrics_total_tokens", "metrics_output_tps", "metrics_duration",
"connection_host", "connection_ip", "connection_username"
"id",
"prompt",
"session_id",
"include",
"embed_id",
"user_id",
"created_at",
"response_text",
"response_type",
"response_attachments",
"response_sources",
"response_sources_count",
"metrics_completion_tokens",
"metrics_prompt_tokens",
"metrics_total_tokens",
"metrics_output_tps",
"metrics_duration",
"connection_host",
"connection_ip",
"connection_username",
]

logging.info(f"Fields in result: {len(result)}")
Expand Down
4 changes: 2 additions & 2 deletions wherescape/connectors/anythingllm/anythingllm_wrapper.py
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ def _flatten_chat(chat):
"metrics_output_tps": response_obj.get("metrics", {}).get("outputTps"),
"metrics_duration": response_obj.get("metrics", {}).get("duration"),
}
except (json.JSONDecodeError, TypeError):
except json.JSONDecodeError, TypeError:
# If response is not valid JSON, store as-is
response_data = {
"response_text": chat.get("response", ""),
Expand All @@ -114,7 +114,7 @@ def _flatten_chat(chat):
"connection_ip": conn_obj.get("ip"),
"connection_username": conn_obj.get("username"),
}
except (json.JSONDecodeError, TypeError):
except json.JSONDecodeError, TypeError:
connection_data = {
"connection_host": None,
"connection_ip": None,
Expand Down
1 change: 1 addition & 0 deletions wherescape/connectors/gitlab/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""
Module that takes care of the connection to Gitlab.
"""

from .gitlab_wrapper import Gitlab # noqa: E402
30 changes: 10 additions & 20 deletions wherescape/connectors/gitlab/gitlab_wrapper.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
"""Module to fetch data (e.g. tickets, projects, pipelines) from the Gitlab API"""
import requests

import logging

import requests

from ...helper_functions import fill_out_empty_keys, filter_dict, flatten_json
from .gitlab_data_types_column_names import COLUMN_NAMES_AND_DATA_TYPES
from ...helper_functions import flatten_json, filter_dict, fill_out_empty_keys


class Gitlab:
Expand Down Expand Up @@ -88,9 +90,7 @@ def paginate_through_resource(
break

if response.status_code == 404:
logging.info(
f"{resource_api}\n Resource not found."
)
logging.info(f"{resource_api}\n Resource not found.")
break

response.raise_for_status()
Expand Down Expand Up @@ -123,13 +123,9 @@ def get_projects(self):
keys_to_keep = COLUMN_NAMES_AND_DATA_TYPES["projects"].keys()
resource_api = "projects"

params = {
"order_by": "id"
}
params = {"order_by": "id"}

all_projects = self.paginate_through_resource(
resource_api, keys_to_keep, params
)
all_projects = self.paginate_through_resource(resource_api, keys_to_keep, params)
return all_projects

def get_release_tags(self):
Expand Down Expand Up @@ -185,9 +181,7 @@ def get_issues(self):
project_id = project[0]
resource_api = f"projects/{project_id}/issues"

project_issues = self.paginate_through_resource(
resource_api, keys_to_keep, params
)
project_issues = self.paginate_through_resource(resource_api, keys_to_keep, params)
all_issues.extend(project_issues)

return all_issues
Expand All @@ -211,9 +205,7 @@ def get_pipelines(self):
for project in self.projects:
project_id = project[0]
resource_api = f"projects/{project_id}/pipelines"
project_pipelines = self.paginate_through_resource(
resource_api, keys_to_keep, params
)
project_pipelines = self.paginate_through_resource(resource_api, keys_to_keep, params)
all_pipelines.extend(project_pipelines)

return all_pipelines
Expand All @@ -237,9 +229,7 @@ def get_merge_requests(self):
for project in self.projects:
project_id = project[0]
resource_api = f"projects/{project_id}/merge_requests"
project_merge_requests = self.paginate_through_resource(
resource_api, keys_to_keep, params
)
project_merge_requests = self.paginate_through_resource(resource_api, keys_to_keep, params)
all_merge_requests.extend(project_merge_requests)

return all_merge_requests
Expand Down
6 changes: 3 additions & 3 deletions wherescape/connectors/gitlab/python_gitlab_create_metadata.py
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
from datetime import datetime
import logging
from datetime import datetime

from .gitlab_data_types_column_names import COLUMN_NAMES_AND_DATA_TYPES
from ... import WhereScape
from ...helper_functions import (
prepare_metadata_query,
create_column_names,
create_display_names,
prepare_metadata_query,
)
from .gitlab_data_types_column_names import COLUMN_NAMES_AND_DATA_TYPES


def gitlab_create_metadata_smart():
Expand Down
22 changes: 6 additions & 16 deletions wherescape/connectors/gitlab/python_gitlab_high_water_mark.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
from datetime import datetime
import logging
from datetime import datetime

from ... import WhereScape

Expand All @@ -14,17 +14,11 @@ def gitlab_next_high_water_mark():
"""
wherescape_instance = WhereScape()
next_high_water_mark = datetime.today().isoformat(timespec="seconds")
current_high_water_mark = wherescape_instance.read_parameter(
"gitlab_high_water_mark"
)
current_high_water_mark = wherescape_instance.read_parameter("gitlab_high_water_mark")
logging.info(f"Current high water mark is {current_high_water_mark}")
logging.info(f"Next high water mark will be {next_high_water_mark}")
wherescape_instance.main_message = (
f"Next high water mark will be {next_high_water_mark}"
)
wherescape_instance.write_parameter(
"gitlab_high_water_mark_next", next_high_water_mark
)
wherescape_instance.main_message = f"Next high water mark will be {next_high_water_mark}"
wherescape_instance.write_parameter("gitlab_high_water_mark_next", next_high_water_mark)


def gitlab_update_high_water_mark():
Expand All @@ -35,11 +29,7 @@ def gitlab_update_high_water_mark():
"""
wherescape_instance = WhereScape()

next_high_water_mark = wherescape_instance.read_parameter(
"gitlab_high_water_mark_next"
)
next_high_water_mark = wherescape_instance.read_parameter("gitlab_high_water_mark_next")
wherescape_instance.write_parameter("gitlab_high_water_mark", next_high_water_mark)
wherescape_instance.main_message = (
f"High water mark is set to {next_high_water_mark}"
)
wherescape_instance.main_message = f"High water mark is set to {next_high_water_mark}"
logging.info(f"High water mark is set to {next_high_water_mark}")
4 changes: 1 addition & 3 deletions wherescape/connectors/gitlab/python_gitlab_load_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -121,9 +121,7 @@ def gitlab_load_data(wherescape_instance, load_type, is_legacy=False):
# Execute the sql
wherescape_instance.push_many_to_target(sql, rows)
logging.info(f"{len(rows)} rows successfully inserted in {table_name}")
wherescape_instance.main_message = (
f"{load_type.capitalize()} successfully loaded {len(rows)} rows"
)
wherescape_instance.main_message = f"{load_type.capitalize()} successfully loaded {len(rows)} rows"
else:
logging.info(f"No modified values found for {load_type.capitalize()}")
wherescape_instance.main_message = f"No modified values found for {load_type.capitalize()}"
Expand Down
Loading