
Add documents_citations.csv export for tabs_collection.zip #89

Draft
Copilot wants to merge 3 commits into master from copilot/create-documents-citations-csv

Conversation

Copilot AI commented Feb 4, 2026

Implements new CSV export extracting citation metadata from articles for inclusion in tabs_collection.zip.

Changes

  • New module publication/documents_citations.py

    • Exports 10 columns per citation: pid, scielo_issn, publication_year, volume, source, doi, publication_type, number_or_suppl, reference_pid, part_title
    • Uses xylose Citation class properties (source, volume, issue, article_title, chapter_title, doi, publication_type, publication_date)
    • Accesses raw citation data fields (v64, v880, v882) for fields not exposed as properties
    • Generates one row per citation; empty row for documents without citations
  • Updated publication/dumper.py

    • Integrated documents_citations into main export pipeline
    • CSV generated automatically alongside other document exports
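The row-generation rule described above (one row per citation, one empty row for a document without citations) could look roughly like the sketch below. It is not the code from publication/documents_citations.py: the helper names `citation_rows` and `write_csv` are hypothetical, citations are modeled as plain dicts instead of xylose Citation objects, and keeping pid and scielo_issn filled in on the "empty" row is an assumption.

```python
import csv
import io

# Column order as listed in the PR description.
HEADER = [
    "pid", "scielo_issn", "publication_year", "volume", "source",
    "doi", "publication_type", "number_or_suppl", "reference_pid",
    "part_title",
]


def citation_rows(pid, scielo_issn, citations):
    """Yield one row per citation, or a single placeholder row.

    `citations` is a list of plain dicts keyed by column name; this is a
    stand-in for xylose Citation objects, not the real API.
    """
    if not citations:
        # Assumption: the document identifiers are kept and only the
        # citation columns are left empty.
        yield [pid, scielo_issn] + [""] * (len(HEADER) - 2)
        return
    for citation in citations:
        yield [pid, scielo_issn] + [citation.get(col) or "" for col in HEADER[2:]]


def write_csv(documents, stream):
    """Write the header and all rows for (pid, issn, citations) tuples."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    for pid, scielo_issn, citations in documents:
        writer.writerows(citation_rows(pid, scielo_issn, citations))
```

Writing to an `io.StringIO()` buffer is enough to inspect the output locally before wiring anything into the dumper.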

Implementation Notes

Publication year is taken from Citation.publication_date, falling back to the raw v64 field when that property is empty. Issue or supplement information comes from the Citation.issue property, falling back to the n subfield of the raw v882 field. The reference PID is read directly from the v880 field because xylose does not expose it as a property.

# Handles all citation types (article, book, thesis, conference, etc.)
for citation in data.citations:
    line = [
        data.publisher_id,
        data.journal.scielo_issn,
        citation.publication_date[0:4] if citation.publication_date else '',
        citation.volume or '',
        citation.source or '',  # Journal/book/thesis title
        citation.doi or '',
        citation.publication_type,
        citation.issue or citation.data.get('v882', [{}])[0].get('n', ''),
        citation.data.get('v880', [{}])[0].get('_', ''),
        citation.article_title or citation.chapter_title or ''
    ]
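The excerpt above does not show the v64 fallback mentioned in the implementation notes, and an expression such as `data.get('v882', [{}])[0]` would raise an IndexError if the field were present but empty. A minimal sketch of the three fallbacks, assuming the raw record is a dict of lists of subfield dicts as in the excerpt, might be written as follows. The function names are hypothetical and none of them belong to xylose.

```python
def first_subfield(raw, field, subfield, default=""):
    """Return raw[field][0][subfield], tolerating missing or empty fields."""
    occurrences = raw.get(field) or [{}]
    return occurrences[0].get(subfield) or default


def publication_year(publication_date, raw):
    """Prefer the parsed date; otherwise fall back to the raw v64 value."""
    value = publication_date or first_subfield(raw, "v64", "_")
    return value[:4]


def number_or_suppl(issue, raw):
    """Prefer the issue property; otherwise use the n subfield of v882."""
    return issue or first_subfield(raw, "v882", "n")


def reference_pid(raw):
    """Read the reference PID from v880, or return an empty string."""
    return first_subfield(raw, "v880", "_")
```

Here `publication_date` and `issue` stand for the values of the corresponding Citation properties, and `raw` stands for `citation.data`.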
Original prompt

This section details the original issue you should resolve

<issue_title>Create a new CSV for tabs_collection.zip: documents_citations.csv</issue_title>
<issue_description>Create a new CSV for tabs_collection.zip: documents_citations.csv

The columns must be:

  • pid
  • scielo_issn
  • publication_year
  • volume
  • source (journal title, thesis title, book title, ...)
  • doi
  • publication_type (if absent, try to infer it from the fields that are present)
  • number_or_suppl
  • reference pid
  • part_title (the content of article_title or chapter_title, or empty)

The data source:
https://github.com/scieloorg/xylose/blob/573994f80705eb7f6d4928de69e079468172dda1/xylose/scielodocument.py#L2685

Article class:
https://github.com/scieloorg/xylose/blob/573994f80705eb7f6d4928de69e079468172dda1/xylose/scielodocument.py#L2672

</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits February 4, 2026 14:00
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add new CSV for tabs_collection.zip: documents_citations.csv" to "Add documents_citations.csv export for tabs_collection.zip" Feb 4, 2026
Copilot AI requested a review from robertatakenaka February 4, 2026 14:03
Development

Successfully merging this pull request may close these issues.

Create a new CSV for tabs_collection.zip: documents_citations.csv

2 participants