29 commits
0257877
docs(institution): add legacy docstrings and mapping of usage…
robertatakenaka Jan 30, 2026
705e99e
refactor(organization): remove redundant inst_type in choices.py
robertatakenaka Jan 30, 2026
eb53cf4
feat(organization): refactor v2 models, add data status and…
robertatakenaka Jan 30, 2026
5bb8309
Create definitions of SOURCE_CHOICES and DATA_STATUS_CHOICES
robertatakenaka Jan 30, 2026
b562cd3
refactor(organization): remove deprecated DATA_STATUS_CHOICES
robertatakenaka Jan 30, 2026
4d4c094
feat(core): add HTML/XML tag cleaning utilities to standardizer
robertatakenaka Jan 30, 2026
c6a4845
feat(organization): implement v3 models with RawOrganization and Orga…
robertatakenaka Jan 30, 2026
43ee9a9
Fix defects found when running the migration and create the migration…
robertatakenaka Jan 30, 2026
622ce38
feat(core): add abstract class BaseDateRange for managing per…
robertatakenaka Jan 31, 2026
5fae7ac
feat(organization): define global list of organizational roles (OR…
robertatakenaka Jan 31, 2026
c0b9c32
feat(organization): implement BaseOrganizationRole to link ins…
robertatakenaka Jan 31, 2026
49a7a92
refactor(institution): improve access properties for names and dates…
robertatakenaka Jan 31, 2026
ed0106c
feat(collection): add specific roles for organizations of co…
robertatakenaka Jan 31, 2026
98d4dc8
refactor(collection): replace support/execution models with Colle…
robertatakenaka Jan 31, 2026
5577f16
api(collection): update serializer for the new organizations model…
robertatakenaka Jan 31, 2026
5120139
refactor(journal): implement JournalOrganization and convert TextFiel…
robertatakenaka Jan 31, 2026
6681bb5
api(journal): new JournalOrganizationSerializer and fallback logic…
robertatakenaka Jan 31, 2026
4bc2a5f
refactor(sources): update ArticleMeta integration to use new mod…
robertatakenaka Jan 31, 2026
0ffb28d
cleanup(journal): remove classic site loading task
robertatakenaka Jan 31, 2026
d8fbd6c
cleanup(journal): remove auxiliary load script for the classic site
robertatakenaka Jan 31, 2026
50c3166
cleanup(journal): remove data extraction module for the classic site
robertatakenaka Jan 31, 2026
7c38bad
Add Celery tasks for organization history migration
robertatakenaka Jan 31, 2026
773af41
Add the collection and journal migrations
robertatakenaka Jan 31, 2026
be13dd5
feat(utils): add has_only_alpha_or_space and update clean_xml_tag_con…
robertatakenaka Jan 31, 2026
bcd79b0
refactor(journal): rename start/end date fields to initial/final_date…
robertatakenaka Jan 31, 2026
27ff8b2
refactor(org): update RawOrganization relationships and optimize look…
robertatakenaka Jan 31, 2026
78b8da7
Add migration
robertatakenaka Feb 1, 2026
bfea2f5
feat(journal): add update_journal_organizations for AM data processing
robertatakenaka Feb 1, 2026
761e01f
feat(journal): add task for bulk organization update and fix task log…
robertatakenaka Feb 1, 2026
19 changes: 19 additions & 0 deletions collection/api/v1/serializers.py
@@ -53,6 +53,19 @@ def get_logo_url(self, obj):
         return None


+class CollectionOrganizationSerializer(serializers.ModelSerializer):
+    """Serializer para organizações relacionadas à coleção"""
+    organization = OrganizationSerializer(read_only=True, many=False)
+
+    class Meta:
+        model = models.CollectionOrganization
+        fields = [
+            "organization",
+            "role",
+            "initial_date",
+            "final_date",
+        ]
+

 class SupportingOrganizationSerializer(serializers.ModelSerializer):
     """Serializer para organizações de suporte"""
@@ -96,8 +109,11 @@ class CollectionSerializer(serializers.ModelSerializer):
     # Campos relacionados (read-only por padrão)
     collection_names = CollectionNameSerializer(source='collection_name', many=True, read_only=True)
     logos = CollectionLogoSerializer(many=True, read_only=True)
+    # FIXME - deprecated - usar CollectionOrganizationSerializer
     supporting_organizations = SupportingOrganizationSerializer(source='supporting_organization', many=True, read_only=True)
     executing_organizations = ExecutingOrganizationSerializer(source='executing_organization', many=True, read_only=True)
+    # FIXME - deprecated - usar CollectionOrganizationSerializer
Comment on lines +112 to +115
Copilot AI Jan 31, 2026

The comment on line 112 says "FIXME - deprecated - usar CollectionOrganizationSerializer" but it's placed above the supporting_organizations and executing_organizations fields which are actually deprecated. Line 116 then says the same thing above the organizations field, which is the NEW field. The comment on line 112 should be moved to line 116, or line 112's comment should say these fields are deprecated, not that CollectionOrganizationSerializer is deprecated.

Suggested change
-    # FIXME - deprecated - usar CollectionOrganizationSerializer
-    supporting_organizations = SupportingOrganizationSerializer(source='supporting_organization', many=True, read_only=True)
-    executing_organizations = ExecutingOrganizationSerializer(source='executing_organization', many=True, read_only=True)
-    # FIXME - deprecated - usar CollectionOrganizationSerializer
+    # FIXME - deprecated fields - usar organizations/CollectionOrganizationSerializer
+    supporting_organizations = SupportingOrganizationSerializer(source='supporting_organization', many=True, read_only=True)
+    executing_organizations = ExecutingOrganizationSerializer(source='executing_organization', many=True, read_only=True)
+    # Organizações relacionadas à coleção (campo recomendado)

+    organizations = CollectionOrganizationSerializer(source='organizations', many=True, read_only=True)
     social_networks = SocialNetworkSerializer(source='social_network', many=True, read_only=True)

     class Meta:
@@ -118,6 +134,9 @@ class Meta:
             # Campos relacionados
             "collection_names",
             "logos",
+            "organizations",
+
+            # FIXME - deprecated - usar CollectionOrganizationSerializer
             "supporting_organizations",
             "executing_organizations",
             "social_networks",
7 changes: 7 additions & 0 deletions collection/choices.py
@@ -18,4 +18,11 @@
     ("classic", _("Classic")),
     ("new", _("New")),
     ("migrating", _("Migrating")),
 ]
+
+COLLECTION_ORGANIZATION_ROLES = [
+    ("sponsor", _("Sponsor")),
+    ("funder", _("Funder")),
+    ("partner", _("Partner")),
+    ("coordination", _("Coordination")),
+]
94 changes: 94 additions & 0 deletions collection/migrations/0008_collectionorganization.py
@@ -0,0 +1,94 @@
+# Generated by Django 5.2.7 on 2026-01-31 21:41
+
+import django.db.models.deletion
+import modelcluster.fields
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ("collection", "0007_collection_platform_status"),
+        ("organization", "0011_organizationallevel_raworganization_and_more"),
+    ]
+
+    operations = [
+        migrations.CreateModel(
+            name="CollectionOrganization",
+            fields=[
+                (
+                    "id",
+                    models.BigAutoField(
+                        auto_created=True,
+                        primary_key=True,
+                        serialize=False,
+                        verbose_name="ID",
+                    ),
+                ),
+                (
+                    "sort_order",
+                    models.IntegerField(blank=True, editable=False, null=True),
+                ),
+                (
+                    "initial_date",
+                    models.CharField(
+                        blank=True,
+                        max_length=10,
+                        null=True,
+                        verbose_name="Initial Date",
+                    ),
+                ),
+                (
+                    "final_date",
+                    models.CharField(
+                        blank=True, max_length=10, null=True, verbose_name="Final Date"
+                    ),
+                ),
+                (
+                    "role",
+                    models.CharField(
+                        choices=[
+                            ("coordinator", "Coordinator"),
+                            ("owner", "Owner"),
+                            ("publisher", "Publisher"),
+                            ("sponsor", "Sponsor"),
+                            ("copyright_holder", "Copyright Holder"),
+                            ("partner", "Partner"),
+                            ("funder", "Funder"),
+                            ("host", "Host"),
+                            ("provider", "Provider"),
+                            ("company", "Company"),
+                        ],
+                        max_length=50,
+                        verbose_name="Role",
+                    ),
+                ),
+                (
+                    "collection",
+                    modelcluster.fields.ParentalKey(
+                        null=True,
+                        on_delete=django.db.models.deletion.SET_NULL,
+                        related_name="organizations",
+                        to="collection.collection",
+                    ),
+                ),
+                (
+                    "organization",
+                    models.ForeignKey(
+                        blank=True,
+                        help_text="Select the standardized organization data",
+                        null=True,
+                        on_delete=django.db.models.deletion.SET_NULL,
+                        to="organization.organization",
+                    ),
+                ),
+            ],
+            options={
+                "verbose_name": "Collection Organization",
+                "verbose_name_plural": "Collection Organizations",
+                "unique_together": {
+                    ("collection", "organization", "role", "initial_date", "final_date")
+                },
+            },
+        ),
+    ]
33 changes: 23 additions & 10 deletions collection/models.py
@@ -18,7 +18,7 @@
     TextWithLang,
 )
 from core.utils.utils import fetch_data
-from organization.models import HELP_TEXT_ORGANIZATION, Organization
+from organization.models import HELP_TEXT_ORGANIZATION, Organization, BaseOrganizationRole

 from . import choices

@@ -116,11 +116,9 @@ def autocomplete_label(self):
     logo_panels = [
         InlinePanel("logos", label=_("Logos"), min_num=0),
     ]
-    supporting_organization_panels = [
-        InlinePanel("supporting_organization", label=_("Supporting Organization")),
-    ]
-    executing_organization_panels = [
-        InlinePanel("executing_organization", label=_("Executing Organization")),
+
+    organization_panels = [
+        InlinePanel("organizations", label=_("Organizations")),
     ]

     social_network_panels = [
@@ -136,10 +134,7 @@ def autocomplete_label(self):
             ),
             ObjectList(logo_panels, heading=_("Logos")),
             ObjectList(
-                supporting_organization_panels, heading=_("Supporting Organizations")
-            ),
-            ObjectList(
-                executing_organization_panels, heading=_("Executing Organization")
+                organization_panels, heading=_("Organizations")
             ),
             ObjectList(social_network_panels, heading=_("Social networks")),
         ]
@@ -330,6 +325,24 @@ class CollectionSocialNetwork(Orderable, SocialNetwork):
     )


+class CollectionOrganization(BaseOrganizationRole, Orderable):
+    # substitui CollectionSupportingOrganization e CollectionExecutingOrganization
+    collection = ParentalKey(
+        Collection,
+        on_delete=models.SET_NULL,
+        null=True,
+        related_name="organizations",
+    )
+    panels = BaseOrganizationRole.panels
+
+    class Meta:
+        verbose_name = _("Collection Organization")
+        verbose_name_plural = _("Collection Organizations")
+        unique_together = [
+            ("collection", "organization", "role", "initial_date", "final_date"),
+        ]
Comment on lines +341 to +343
Copilot AI Jan 31, 2026

The unique_together constraint at line 342 includes both initial_date and final_date which are defined as CharField in BaseDateRange. Since these are string fields, the constraint might not work correctly for date comparisons (e.g., "2023-01-01" vs "2023-1-1" would be considered different). Consider using DateField or ensuring consistent date formatting.
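
One way to address the formatting concern while keeping `CharField` storage is to normalize date strings before they are saved, so that equivalent dates compare equal under `unique_together`. A minimal sketch, assuming values arrive as `YYYY`, `YYYY-M` or `YYYY-M-D` strings; `normalize_date_string` is a hypothetical helper, not part of this PR:

```python
import re

# Hypothetical helper (not in the PR): zero-pads year-month-day strings so that
# equivalent dates such as "2023-1-1" and "2023-01-01" produce the same value.
# It does not check that the month or day is within a valid range.
_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$")


def normalize_date_string(value):
    """Return YYYY, YYYY-MM or YYYY-MM-DD, or None for empty input."""
    if not value:
        return None
    match = _DATE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported date format: {value!r}")
    year, month, day = match.groups()
    parts = [year]
    if month:
        parts.append(month.zfill(2))
    if day:
        parts.append(day.zfill(2))
    return "-".join(parts)
```

Calling such a helper from the model's `save()` or `clean()` would be one possible hook; where it belongs depends on how the migration tasks create these rows.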



 class CollectionSupportingOrganization(Orderable, ClusterableModel, BaseHistory):
     collection = ParentalKey(
         Collection,
35 changes: 35 additions & 0 deletions core/models.py
@@ -629,6 +629,41 @@ class BaseHistory(models.Model):
     class Meta:
         abstract = True

+    @property
+    def initial_date_isoformat(self):
+        if self.initial_date:
+            return self.initial_date.isoformat()
+        return None
+
+    @property
+    def final_date_isoformat(self):
+        if self.final_date:
+            return self.final_date.isoformat()
+        return None
+
+
+class BaseDateRange(models.Model):
+    initial_date = models.CharField(_("Initial Date"), max_length=10, null=True, blank=True)
+    final_date = models.CharField(_("Final Date"), max_length=10, null=True, blank=True)
+
+    panels = [
+        FieldPanel("initial_date"),
+        FieldPanel("final_date"),
+    ]
Comment on lines +645 to +652
Copilot AI Jan 31, 2026

The BaseDateRange model uses CharField for date fields (initial_date and final_date) at lines 646-647. This is inconsistent with the BaseHistory model which uses proper DateField. Using CharField for dates makes validation, querying, and date arithmetic difficult. Consider using DateField instead, or document why string storage is required.
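
The querying difficulty can be illustrated with plain Python: date strings that are not zero-padded sort lexicographically rather than chronologically. A small sketch; `parse_ymd` is a hypothetical helper that assumes full `Y-M-D` values:

```python
from datetime import date

# Two dates stored as text, one of them without zero padding.
as_text = ["2023-10-01", "2023-9-15"]

# String ordering compares character by character, so October sorts
# before September here.
text_order = sorted(as_text)


def parse_ymd(value):
    """Parse 'Y-M-D' (padded or not) into a datetime.date.

    Raises ValueError for malformed or out-of-range values.
    """
    year, month, day = (int(part) for part in value.split("-"))
    return date(year, month, day)


# Ordering real date objects gives the chronological result.
date_order = sorted(parse_ymd(value) for value in as_text)
```

If string storage is kept because some sources only provide a year or a year and month, that reason would be worth documenting next to the field definitions.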


+    class Meta:
+        abstract = True
+
+    @property
+    def range(self):
+        if self.initial_date and self.final_date:
+            return f"{self.initial_date} - {self.final_date}"
+        elif self.initial_date:
+            return f"from {self.initial_date}"
+        elif self.final_date:
+            return f"until {self.final_date}"
+        return None

Comment on lines +645 to +666
Copilot AI Feb 1, 2026

There's a systemic naming inconsistency throughout the codebase. BaseDateRange defines initial_date and final_date fields, but all the journal organization methods use start_date and end_date parameters. This will cause multiple failures:

  1. JournalOrganization.add_organization tries to set start_date/end_date which don't exist
  2. API serializer tries to access end_date in is_current method
  3. Migration creates fields as initial_date/final_date but code references start_date/end_date

This needs to be fixed consistently across all files. Recommended approach: Either rename BaseDateRange fields to start_date/end_date (preferred for modern naming), OR change all method parameters and API references to use initial_date/final_date to match the model.
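
If the second option is taken (keeping `initial_date`/`final_date` on the model), a thin translation layer could keep callers that still pass `start_date`/`end_date` working during the transition. A sketch only; `LEGACY_DATE_ALIASES` and `to_model_field_names` are hypothetical names, not part of this PR:

```python
# Hypothetical mapping from the legacy parameter names to the model field names.
LEGACY_DATE_ALIASES = {"start_date": "initial_date", "end_date": "final_date"}


def to_model_field_names(params):
    """Return a copy of params with legacy date keys renamed.

    Raises ValueError if a legacy name and its current name are both given.
    """
    renamed = {}
    for key, value in params.items():
        target = LEGACY_DATE_ALIASES.get(key, key)
        if target in renamed:
            raise ValueError(f"duplicate value for {target!r}")
        renamed[target] = value
    return renamed
```

For example, `to_model_field_names({"start_date": "2020", "role": "sponsor"})` yields a dict keyed by `initial_date` and `role`, which can then be passed to the model.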


 class BaseLogo(models.Model):
     """
39 changes: 39 additions & 0 deletions core/utils/standardizer.py
@@ -13,6 +13,45 @@ def remove_extra_spaces(text):
     # Padroniza a quantidade de espaços
     return " ".join(text.split())

+def remove_html_tags(text):
+    if not text:
+        return text
+    text = text.replace("<", "BREAKTAG<")
+    text = text.replace(">", ">BREAKTAG")
+    for part in text.split("BREAKTAG"):
+        if part.startswith("<") and part.endswith(">"):
+            continue
+        if part.startswith("<"):
+            continue
+        if part.endswith(">"):
+            continue
+        yield part
+
+
+def has_only_alpha_or_space(text):
+    """ Verifica se o conteúdo do texto é válido como string, ou seja,
+    não é vazio e não contém números. """
+    if not text:
+        return False
+    parts = text.split()
+    for part in parts:
+        if not part.isalpha():
+            return False
+    return True
Comment on lines +31 to +40
Copilot AI Jan 31, 2026

The has_only_alpha_or_space() function on line 31 returns False for text containing numbers, but the function name suggests it should return True if the text has "only alpha or space". The name is misleading - it should be named something like is_alphabetic_text() or has_no_numbers() to better reflect its behavior.
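
To make the behavior concrete, the function can be exercised on a few sample inputs. The snippet below uses a condensed copy of the function from the diff (same logic, slightly shorter); the sample strings are illustrative, not taken from the project's data:

```python
def has_only_alpha_or_space(text):
    # Condensed copy of the function in the diff, for illustration only.
    if not text:
        return False
    for part in text.split():
        if not part.isalpha():
            return False
    return True


samples = ["São Paulo", "Lab 3", "Fiocruz-RJ", "   ", ""]
results = {sample: has_only_alpha_or_space(sample) for sample in samples}

# results["São Paulo"]  -> True   (accented letters count as alphabetic)
# results["Lab 3"]      -> False  (digit)
# results["Fiocruz-RJ"] -> False  (hyphen, not only digits, is rejected)
# results["   "]        -> True   (split() yields no parts, so the loop never runs)
# results[""]           -> False
```

Two details stand out: punctuation is rejected as well as numbers, which the docstring does not mention, and a whitespace-only string passes the check.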



+def clean_xml_tag_content(text, assert_string=True):
+    if not text:
+        return text
+    text = "".join(remove_html_tags(text))
+    text_ = remove_extra_spaces(text)
+    if assert_string:
+        if has_only_alpha_or_space(text_):
+            return text_
+        else:
+            return None
Comment on lines +49 to +52
Copilot AI Jan 31, 2026

The function clean_xml_tag_content() uses assert_string parameter with a default of True, which causes it to return None when the text contains numbers (line 52). This behavior could lead to silent data loss when processing organization names that legitimately contain numbers (e.g., "Lab 3", "Building 42"). Consider whether this validation is appropriate for all use cases, or if it should be optional/configurable.

Suggested change
-        if has_only_alpha_or_space(text_):
-            return text_
-        else:
-            return None
+        if not has_only_alpha_or_space(text_):
+            logging.warning(
+                "clean_xml_tag_content: text failed alpha/space assertion; preserving value: %r",
+                text_,
+            )
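
Put together, the warn-and-preserve variant suggested here could look roughly like the following. This is a self-contained sketch: the helper functions are condensed copies of the ones in the diff, and the module-level `logger` is an assumption (the suggestion calls `logging.warning` directly):

```python
import logging

logger = logging.getLogger(__name__)  # assumed; not defined in the diff


def remove_extra_spaces(text):
    return " ".join(text.split())


def remove_html_tags(text):
    # Same marker-splitting approach as the diff: drop tag-like fragments.
    text = text.replace("<", "BREAKTAG<").replace(">", ">BREAKTAG")
    for part in text.split("BREAKTAG"):
        if part.startswith("<") or part.endswith(">"):
            continue
        yield part


def has_only_alpha_or_space(text):
    return bool(text) and all(part.isalpha() for part in text.split())


def clean_xml_tag_content(text, assert_string=True):
    if not text:
        return text
    text_ = remove_extra_spaces("".join(remove_html_tags(text)))
    if assert_string and not has_only_alpha_or_space(text_):
        # Warn instead of returning None, so the cleaned value is preserved.
        logger.warning(
            "clean_xml_tag_content: text failed alpha/space assertion; "
            "preserving value: %r",
            text_,
        )
    return text_
```

With this variant, `clean_xml_tag_content("<b>Lab</b>  3")` returns `"Lab 3"` and logs a warning, where the version in the PR returns `None`.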

+    return text_
Comment on lines 43 to 53
Copilot AI Jan 31, 2026

The clean_xml_tag_content function has a logic issue. When assert_string=True, it returns None if the text contains any character that is neither alphabetic nor whitespace (line 39). This means valid organization names with numbers or special characters (e.g., "USP 2023", "Fiocruz-RJ") will be discarded. Consider removing the isalpha() check or making it more lenient to allow alphanumeric text with common punctuation.
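
A more lenient check along these lines might accept letters, digits, spaces and a small set of punctuation, while still requiring at least one letter. The allow-list and the name `looks_like_name` below are assumptions for illustration; the right set of characters depends on the project's data:

```python
import re

# Hypothetical allow-list: word characters (letters of any script, digits),
# whitespace, and a few punctuation marks common in organization names.
_ALLOWED = re.compile(r"^[\w\s.,&'()/-]+$")


def looks_like_name(text):
    """True if text has at least one letter and only allow-listed characters."""
    if not text or not any(ch.isalpha() for ch in text):
        return False
    # \w also matches "_", which is excluded explicitly here.
    return bool(_ALLOWED.match(text)) and "_" not in text
```

Under this sketch, "USP 2023" and "Fiocruz-RJ" are accepted, while a digits-only value such as "2023" or leftover markup such as "<b>" is rejected.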

Comment on lines +43 to +53
Copilot AI Feb 1, 2026

The clean_xml_tag_content function has an assert_string parameter that when True will return None if the text contains numbers. This behavior could silently discard valid organization names that contain numbers (e.g., "Lab 21", "Unit 3", "Building 5A"). Consider if this validation is too strict for organization names, or document why numeric content should be rejected.



 def standardize_code_and_name(original):
     """