Skip to content
Closed
6 changes: 4 additions & 2 deletions google-cloud-storage/lib/google/cloud/storage/bucket.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1639,11 +1639,11 @@ def file path,
# * `md5` - Calculate and provide a checksum using the MD5 hash.
# * `crc32c` - Calculate and provide a checksum using the CRC32c hash.
# * `all` - Calculate and provide checksums for all available verifications.
#
# Optional. The default is `nil`. Do not provide if also providing a
# Optional. The default is `crc32c`. Do not provide if also providing a
# corresponding `crc32c` or `md5` argument. See
# [Validation](https://cloud.google.com/storage/docs/hashes-etags)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updates the doc again

# for more information.
#

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this.

# @param [String] crc32c The CRC32c checksum of the file data, as
# described in [RFC 4960, Appendix
# B](http://tools.ietf.org/html/rfc4960#appendix-B).
Expand Down Expand Up @@ -1805,6 +1805,8 @@ def create_file file,
path ||= file.path if file.respond_to? :path
path ||= file if file.is_a? String
raise ArgumentError, "must provide path" if path.nil?
# setting crc32c as default checksum algorithm if none provided for integrity check
checksum = :crc32c if checksum.nil? && crc32c.nil? && md5.nil?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can a customer disable checksumming all together, earlier the option was do not send anything and we will consider it nil but now if the customer doesn't send anything or sends nil, we will overwrite with crc32c right? I dont think this is ideal.

Main concern: There's no way to distinguish between customer not sending anything and sending nil explicitly.

Copy link
Contributor Author

@shubhangi-google shubhangi-google Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if customer is not sending anything the default will be nil
and here we are setting CRC32c in case of nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and that is the problem, we are not providing the user with the option to disable all kinds of checksumming.

crc32c = crc32c_for file, checksum, crc32c
md5 = md5_for file, checksum, md5

Expand Down
48 changes: 33 additions & 15 deletions google-cloud-storage/lib/google/cloud/storage/file/verifier.rb
Original file line number Diff line number Diff line change
Expand Up @@ -49,29 +49,47 @@ def self.verify_crc32c gcloud_file, local_file
gcloud_file.crc32c == crc32c_for(local_file)
end

# Calculates MD5 digest using either file path or open stream.
def self.md5_for local_file
if local_file.respond_to? :to_path
::File.open Pathname(local_file).to_path, "rb" do |f|
::Digest::MD5.file(f).base64digest
end
else # StringIO
local_file.rewind
md5 = ::Digest::MD5.base64digest local_file.read
local_file.rewind
md5
end
_digest_for local_file, ::Digest::MD5
end

# Calculates CRC32c digest using either file path or open stream.
def self.crc32c_for local_file
if local_file.respond_to? :to_path
_digest_for local_file, ::Digest::CRC32c
end

# @private
# Computes a base64-encoded digest for a local file or IO stream.
#
# This method handles two types of inputs for `local_file`:
# 1. A file path (String or Pathname): It efficiently streams the file
# to compute the digest without loading the entire file into memory.
# 2. An IO-like stream (e.g., File, StringIO): It reads the stream's
# content to compute the digest. The stream is rewound before and after
# reading to ensure its position is not permanently changed.
#
# @param local_file [String, Pathname, IO] The local file path or IO
# stream for which to compute the digest.
# @param digest_class [Class] The digest class to use for the
# calculation (e.g., `Digest::MD5`). It must respond to `.file` and
# `.base64digest`.
#
# @return [String] The base64-encoded digest of the file's content.
#
def self._digest_for local_file, digest_class

if local_file.respond_to?(:to_path) || local_file.is_a?(String)
# Case 1: Input is a file path (String, Pathname, or object that responds to :to_path).
::File.open Pathname(local_file).to_path, "rb" do |f|
::Digest::CRC32c.file(f).base64digest
digest_class.file(f).base64digest
end
else # StringIO
else
# Case 2: Input is an open stream (File or StringIO).
local_file.rewind
crc32c = ::Digest::CRC32c.base64digest local_file.read
digest = digest_class.base64digest local_file.read
local_file.rewind
crc32c
digest
end
end
end
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,8 @@ def empty_file_gapi cache_control: nil, content_disposition: nil,
content_encoding: nil, content_language: nil,
content_type: nil, crc32c: nil, md5: nil, metadata: nil,
storage_class: nil
# Set crc32c if both md5 and crc32c are not provided
crc32c = set_crc32c_as_default(md5, crc32c)
params = {
cache_control: cache_control, content_type: content_type,
content_disposition: content_disposition, md5_hash: md5,
Expand Down
51 changes: 46 additions & 5 deletions google-cloud-storage/test/google/cloud/storage/bucket_test.rb
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,41 @@
_(bucket_complete.autoclass_enabled).must_equal bucket_autoclass_enabled
_(bucket_complete.autoclass_terminal_storage_class).must_equal bucket_autoclass_terminal_storage_class
end

it "creates a file with checksum: :crc32c by default" do
new_file_name = random_file_path

Tempfile.open ["google-cloud", ".txt"] do |tmpfile|
tmpfile.write "Hello world!"
tmpfile.rewind

crc32c = Google::Cloud::Storage::File::Verifier.crc32c_for tmpfile

mock = Minitest::Mock.new
mock.expect :insert_object, create_file_gapi(bucket.name, new_file_name),
[bucket.name, empty_file_gapi(crc32c: crc32c)], **insert_object_args(name: new_file_name, upload_source: tmpfile, options: {retries: 0})

bucket.service.mocked_service = mock
bucket.create_file tmpfile, new_file_name

mock.verify
end
end

it "creates a file with a StringIO and checksum: :crc32c by default" do
new_file_name = random_file_path
new_file_contents = StringIO.new "Hello world"
crc32c = Google::Cloud::Storage::File::Verifier.crc32c_for new_file_contents
mock = Minitest::Mock.new
mock.expect :insert_object, create_file_gapi(bucket.name, new_file_name),
[bucket.name, empty_file_gapi(crc32c: crc32c)], **insert_object_args(name: new_file_name, upload_source: new_file_contents, options: {retries: 0})

bucket.service.mocked_service = mock

bucket.create_file new_file_contents, new_file_name

mock.verify
end

it "returns frozen cors" do
bucket_complete.cors.each do |cors|
Expand Down Expand Up @@ -595,9 +630,11 @@
new_file_name = random_file_path

Tempfile.create ["google-cloud", ".txt"] do |tmpfile|

crc32c = Google::Cloud::Storage::File::Verifier.crc32c_for tmpfile
mock = Minitest::Mock.new
mock.expect :insert_object, create_file_gapi(bucket_user_project.name, new_file_name),
[bucket.name, empty_file_gapi], **insert_object_args(name: new_file_name, upload_source: tmpfile, user_project: "test", options: {retries: 0})
[bucket.name, empty_file_gapi(crc32c: crc32c)], **insert_object_args(name: new_file_name, upload_source: tmpfile, user_project: "test", options: {retries: 0})

bucket_user_project.service.mocked_service = mock

Expand All @@ -608,13 +645,13 @@
end
end

it "creates an file with a StringIO" do
it "creates a file with StringIO" do
new_file_name = random_file_path
new_file_contents = StringIO.new

new_file_contents = StringIO.new("Hello world string_io")
crc32c = Google::Cloud::Storage::File::Verifier.crc32c_for new_file_contents
mock = Minitest::Mock.new
mock.expect :insert_object, create_file_gapi(bucket.name, new_file_name),
[bucket.name, empty_file_gapi], **insert_object_args(name: new_file_name, upload_source: new_file_contents, options: {retries: 0})
[bucket.name, empty_file_gapi(crc32c: crc32c)], **insert_object_args(name: new_file_name, upload_source: new_file_contents, options: {retries: 0})

bucket.service.mocked_service = mock

Expand Down Expand Up @@ -1417,6 +1454,10 @@ def empty_file_gapi cache_control: nil, content_disposition: nil,
content_type: nil, crc32c: nil, md5: nil, metadata: nil,
storage_class: nil, temporary_hold: nil,
event_based_hold: nil

# Set crc32c if both md5 and crc32c are not provided
crc32c = set_crc32c_as_default(md5, crc32c)

params = {
cache_control: cache_control, content_type: content_type,
content_disposition: content_disposition, md5_hash: md5,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,25 @@
mock.verify
end
end

it "creates a file with checksum: :crc32c by default" do
new_file_name = random_file_path

Tempfile.open ["google-cloud", ".txt"] do |tmpfile|
tmpfile.write "Hello world 123"
tmpfile.rewind

crc32c = Google::Cloud::Storage::File::Verifier.crc32c_for tmpfile
mock = Minitest::Mock.new
mock.expect :insert_object, create_file_gapi(bucket.name, new_file_name),
[bucket.name, empty_file_gapi(crc32c: crc32c)], **insert_object_args(name: new_file_name, upload_source: tmpfile, options: {retries: 0})

bucket.service.mocked_service = mock
bucket.create_file tmpfile, new_file_name

mock.verify
end
end

it "creates a file with attributes" do
new_file_name = random_file_path
Expand Down Expand Up @@ -279,7 +298,6 @@
Tempfile.open ["google-cloud", ".txt"] do |tmpfile|
tmpfile.write "Hello world"
tmpfile.rewind

metadata = {
"player" => "Bob",
score: 10
Expand Down Expand Up @@ -340,9 +358,10 @@
new_file_name = random_file_path

Tempfile.create ["google-cloud", ".txt"] do |tmpfile|
crc32c = Google::Cloud::Storage::File::Verifier.crc32c_for tmpfile
mock = Minitest::Mock.new
mock.expect :insert_object, create_file_gapi(bucket_user_project.name, new_file_name),
[bucket.name, empty_file_gapi], **insert_object_args(name: new_file_name, upload_source: tmpfile, user_project: "test", options: {retries: 0})
[bucket.name, empty_file_gapi(crc32c: crc32c)], **insert_object_args(name: new_file_name, upload_source: tmpfile, user_project: "test", options: {retries: 0})

bucket_user_project.service.mocked_service = mock

Expand Down Expand Up @@ -1091,6 +1110,9 @@ def empty_file_gapi cache_control: nil, content_disposition: nil,
content_encoding: nil, content_language: nil,
content_type: nil, crc32c: nil, md5: nil, metadata: nil,
storage_class: nil
# Set crc32c if both md5 and crc32c are not provided
crc32c = set_crc32c_as_default(md5, crc32c)

params = {
cache_control: cache_control, content_type: content_type,
content_disposition: content_disposition, md5_hash: md5,
Expand Down
8 changes: 8 additions & 0 deletions google-cloud-storage/test/helper.rb
Original file line number Diff line number Diff line change
Expand Up @@ -612,4 +612,12 @@ def restore_file_gapi bucket, file_name, generation=nil
file_hash = random_file_hash(bucket, file_name, generation).to_json
Google::Apis::StorageV1::Object.from_json file_hash
end

def set_crc32c_as_default md5, crc32c
# Set crc32c if both md5 and crc32c are not provided
if md5.nil? && crc32c.nil?
crc32c = Google::Cloud::Storage::File::Verifier.crc32c_for(StringIO.new("Hello world"))
end
crc32c
end
end
Loading