Add documents_doi.csv to tabs_collection export#87

Draft
Copilot wants to merge 3 commits into master from copilot/add-documents-doi-csv

Copilot AI commented Feb 4, 2026

Adds documents_doi.csv to tabs_collection.zip containing DOI/language pairs for each document. Each document can generate multiple rows (one per DOI/language pair).

Implementation

  • publication/documents_doi.py: New dumper implementing xylose's doi_and_lang logic

    • Extracts DOI/language pairs from v337 field
    • Validates DOI format with regex and auto-corrects swapped lang/DOI values
    • For the main DOI, uses original_language() as the language and falls back to languages()[0] when it is empty
    • Skips documents without DOI information
  • publication/dumper.py: Integrated new dumper into main export loop
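
The fallback and skip behaviour listed above can be sketched roughly as follows. This is an illustration under assumptions, not the code in publication/documents_doi.py: `FakeDocument`, `main_doi_pair` and `doi_pairs` are hypothetical names, and only `doi`, `original_language()` and `languages()` are taken from the description.

```python
class FakeDocument:
    """Hypothetical stand-in for a xylose document, for illustration only."""

    def __init__(self, doi=None, original=None, langs=None):
        self.doi = doi
        self._original = original
        self._langs = langs or []

    def original_language(self):
        return self._original

    def languages(self):
        return self._langs


def main_doi_pair(document):
    """Return (lang, doi) for the main DOI, or None if it cannot be resolved."""
    if not document.doi:
        return None
    # Prefer the original language; fall back to the first listed language.
    lang = document.original_language()
    if not lang:
        langs = document.languages() or []
        lang = langs[0] if langs else None
    return (lang, document.doi) if lang else None


def doi_pairs(document, v337_pairs):
    """Put the main DOI pair first, ahead of the pairs taken from v337."""
    pairs = list(v337_pairs)
    main = main_doi_pair(document)
    if main and main not in pairs:
        pairs.insert(0, main)
    # An empty list means the document contributes no rows to the CSV.
    return pairs
```

A document with neither a main DOI nor valid v337 pairs yields an empty list, which corresponds to the "skips documents without DOI information" rule.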

CSV Schema

"extraction date","study unit","collection","document publishing ID (PID SciELO)","document language","doi corresponding language"
"2026-02-04","document","scl","S0102-67202009000300001","pt","10.1590/S0102-67202009000300001"
"2026-02-04","document","scl","S0102-67202009000300001","en","10.1590/S0102-67202009000300002"

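Rows in this shape could be produced with the standard csv module, as in the minimal sketch below. It assumes every field is quoted, as in the sample above; `HEADER`, `rows_for` and `render_csv` are hypothetical names and are not taken from the PR's dumper.

```python
import csv
import io

HEADER = [
    "extraction date",
    "study unit",
    "collection",
    "document publishing ID (PID SciELO)",
    "document language",
    "doi corresponding language",
]


def rows_for(extraction_date, collection, pid, pairs):
    """Yield one CSV row per (lang, doi) pair of a single document."""
    for lang, doi in pairs:
        yield [extraction_date, "document", collection, pid, lang, doi]


def render_csv(rows):
    """Render the header plus rows, quoting every field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()
```

Passing the two pairs from the sample above to `rows_for` and then to `render_csv` gives the header line followed by two data lines, one per DOI/language pair.
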
Key Logic

# Handles swapped values: checks if 'l' field contains DOI pattern
if len(DOI_REGEX.findall(lang)) == 1 and len(doi) == 2:
    lang, doi = doi, lang
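
To make the swap check concrete, here is a small self-contained sketch of the validation applied to one v337 item. The `DOI_REGEX` pattern is an assumption made for illustration (the real one is defined in xylose and may differ), and `normalize_pair` is a hypothetical helper name.

```python
import re

# Assumed pattern for illustration; the real DOI_REGEX lives in xylose.
DOI_REGEX = re.compile(r"\d{2}\.\d+/.*$")


def normalize_pair(lang, doi):
    """Return (lang, doi) in the right order, or None if the pair is invalid."""
    if not (lang and doi):
        return None
    # Swap when the 'l' value looks like a DOI and the 'd' value is a 2-letter code.
    if len(DOI_REGEX.findall(lang)) == 1 and len(doi) == 2:
        lang, doi = doi, lang
    # Keep the pair only if it now has a DOI and a 2-letter language code.
    if len(DOI_REGEX.findall(doi)) == 1 and len(lang) == 2:
        return (lang, doi)
    return None
```

With this pattern, a swapped input such as `normalize_pair("10.1590/abc", "en")` comes back in `(lang, doi)` order, while a pair whose DOI does not match the pattern is dropped.
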
Original prompt

This section details the original issue you should resolve.

<issue_title>Create one more CSV to be part of tabs_collection.zip: documents_doi.csv</issue_title>
<issue_description>Create one more CSV to be part of tabs_collection.zip: documents_doi.csv

Columns:

  • extraction date
  • study unit
  • collection
  • document publishing ID (PID SciELO)
  • document language
  • doi corresponding language
# class Document
    @property
    def doi_and_lang(self):
        """
        This method retrieves the lang and DOI.
        """
        raw_doi = self.data.get('article', {}).get('v337')
        items = []
        for item in raw_doi or []:
            lang = item.get("l")
            doi = item.get("d")
            if lang and doi:
                if len(DOI_REGEX.findall(lang)) == 1 and len(doi) == 2:
                    lang, doi = doi, lang
                if len(DOI_REGEX.findall(doi)) == 1 and len(lang) == 2:
                    items.append((lang, doi))
        if self.doi:
            item = (self.original_language(), self.doi)
            if all(item) and item not in items:
                items.insert(0, item)
        return items

Source: https://github.com/scieloorg/xylose/blob/573994f80705eb7f6d4928de69e079468172dda1/xylose/scielodocument.py#L2183C5-L2202C21
</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits February 4, 2026 13:43
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title [WIP] Create documents_doi.csv for tabs_collection.zip Add documents_doi.csv to tabs_collection export Feb 4, 2026
Copilot AI requested a review from robertatakenaka February 4, 2026 13:49
Development

Successfully merging this pull request may close these issues.

Create one more CSV to be part of tabs_collection.zip: documents_doi.csv
