Add documents_doi.csv to tabs_collection export by Copilot · Pull Request #87 · scieloorg/processing

Copilot · 2026-02-04T13:40:41Z

Adds documents_doi.csv to tabs_collection.zip containing DOI/language pairs for each document. Each document can generate multiple rows (one per DOI/language pair).

Implementation

publication/documents_doi.py: New dumper implementing xylose's doi_and_lang logic
- Extracts DOI/language pairs from v337 field
- Validates DOI format with regex and auto-corrects swapped lang/DOI values
- Falls back to original_language() then languages()[0] for main DOI
- Skips documents without DOI information
publication/dumper.py: Integrated new dumper into main export loop
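
The fallback and skip rules listed above can be sketched as follows. This is a minimal illustration, not the code in publication/documents_doi.py: the function names `pick_main_language` and `merge_main_doi` are hypothetical, and the ordering (original language first, then the first entry of the language list) is an assumption based on the description.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (language, DOI)


def pick_main_language(original_language: Optional[str],
                       languages: Sequence[str]) -> Optional[str]:
    """Prefer the original language; otherwise use the first listed language."""
    if original_language:
        return original_language
    return languages[0] if languages else None


def merge_main_doi(pairs: Sequence[Pair],
                   main_doi: Optional[str],
                   original_language: Optional[str],
                   languages: Sequence[str]) -> List[Pair]:
    """Put the main DOI first unless that exact pair is already present.

    An empty result means the document has no DOI information,
    so the caller would skip it.
    """
    result = list(pairs)
    if main_doi:
        lang = pick_main_language(original_language, languages)
        item = (lang, main_doi)
        if lang and item not in result:
            result.insert(0, item)
    return result
```

With no original language recorded, `merge_main_doi([("en", "10.1590/x2")], "10.1590/x1", None, ["pt", "en"])` would pair the main DOI with "pt" and place it first.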

CSV Schema

"extraction date","study unit","collection","document publishing ID (PID SciELO)","document language","doi corresponding language"
"2026-02-04","document","scl","S0102-67202009000300001","pt","10.1590/S0102-67202009000300001"
"2026-02-04","document","scl","S0102-67202009000300001","en","10.1590/S0102-67202009000300002"
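
Rows in this shape can be produced with Python's standard csv module. The sketch below is only an illustration of the schema: `format_rows` is a hypothetical helper, and the use of `csv.QUOTE_ALL` is inferred from the fully quoted sample rows rather than taken from the PR's code.

```python
import csv
import io

HEADER = [
    "extraction date",
    "study unit",
    "collection",
    "document publishing ID (PID SciELO)",
    "document language",
    "doi corresponding language",
]


def format_rows(extraction_date, collection, pid, pairs):
    """Return CSV text with one row per (language, DOI) pair of a document."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes, as in the sample rows.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(HEADER)
    for lang, doi in pairs:
        writer.writerow([extraction_date, "document", collection, pid, lang, doi])
    return buffer.getvalue()
```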

Key Logic

# Handles swapped values: checks if 'l' field contains DOI pattern
if len(DOI_REGEX.findall(lang)) == 1 and len(doi) == 2:
    lang, doi = doi, lang
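
To see the swap check in isolation, here is a small self-contained sketch. The `DOI_REGEX` pattern below is a simplified stand-in chosen for illustration (the pattern actually used by xylose may differ), and `normalize_pair` is a hypothetical helper name.

```python
import re

# Assumed, simplified DOI pattern; not necessarily the one xylose defines.
DOI_REGEX = re.compile(r"10\.\d{4,9}/\S+")


def normalize_pair(lang, doi):
    """Return (lang, doi) with swapped values corrected, or None if invalid."""
    if not (lang and doi):
        return None
    # The 'l' value looks like a DOI and the 'd' value looks like a
    # two-letter language code: the two were stored in the wrong subfields.
    if len(DOI_REGEX.findall(lang)) == 1 and len(doi) == 2:
        lang, doi = doi, lang
    # Keep the pair only if it now has a DOI and a two-letter language code.
    if len(DOI_REGEX.findall(doi)) == 1 and len(lang) == 2:
        return (lang, doi)
    return None
```

Under this assumed pattern, a record with `l="10.1590/S0102-67202009000300002"` and `d="en"` is corrected to `("en", "10.1590/S0102-67202009000300002")`, while a record whose `d` value does not look like a DOI is dropped.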

Original prompt

This section details the original issue you should resolve.

<issue_title>Create another CSV to be part of tabs_collection.zip: documents_doi.csv</issue_title>
<issue_description>Create another CSV to be part of tabs_collection.zip: documents_doi.csv

Columns:

extraction date
study unit
collection
document publishing ID (PID SciELO)
document language
doi corresponding language

# class Document
    @property
    def doi_and_lang(self):
        """
        This method retrieves the lang and DOI.
        """
        raw_doi = self.data.get('article', {}).get('v337')
        items = []
        for item in raw_doi or []:
            lang = item.get("l")
            doi = item.get("d")
            if lang and doi:
                if len(DOI_REGEX.findall(lang)) == 1 and len(doi) == 2:
                    lang, doi = doi, lang
                if len(DOI_REGEX.findall(doi)) == 1 and len(lang) == 2:
                    items.append((lang, doi))
        if self.doi:
            item = (self.original_language(), self.doi)
            if all(item) and item not in items:
                items.insert(0, item)
        return items

source: https://github.com/scieloorg/xylose/blob/573994f80705eb7f6d4928de69e079468172dda1/xylose/scielodocument.py#L2183C5-L2202C21
</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes "Create another CSV to be part of tabs_collection.zip: documents_doi.csv" #86


Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>

Initial plan

54226cb

Copilot AI assigned Copilot and robertatakenaka Feb 4, 2026

Copilot started work on behalf of robertatakenaka February 4, 2026 13:41 View session

Copilot AI and others added 2 commits February 4, 2026 13:43

Add documents_doi.csv dumper with DOI and language pairs

bc61cc4

Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>

Fix typo in help text: 'Logggin' -> 'Logging'

7c31abd

Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Create documents_doi.csv for tabs_collection.zip~~ Add documents_doi.csv to tabs_collection export Feb 4, 2026

Copilot AI requested a review from robertatakenaka February 4, 2026 13:49

Copilot finished work on behalf of robertatakenaka February 4, 2026 13:49

Copilot wants to merge 3 commits into master from copilot/add-documents-doi-csv