Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28682: Multiple Avro Versions on Classpath Can Cause Potential C… #5593

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

tanishq-chugh
Copy link
Contributor

@tanishq-chugh tanishq-chugh commented Dec 29, 2024

…onflicts

What changes were proposed in this pull request?

Align avro to a single version - 1.11.4
The same can be achieved by adding avro in dependencyManagement section in standalone-metastore pom as this will ensure the transitive version of avro coming in from hadoop-common is 1.11.4

Why are the changes needed?

Multiple avro versions on classpath can cause potential conflicts

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

Yes
Old Dependency Tree:
dpn_old.txt
New Dependency Tree:
dpn_latest_1.txt

How was this patch tested?

Manual Testing by running few queries after local compilation

@Aggarwal-Raghav
Copy link
Contributor

@tanishq-chugh, i don't think in hive avro-1.7.7.jar is shipped anywhere, i checked both:

  1. standalone-metastore/metastore-server/target/apache-hive-metastore-4.1.0-SNAPSHOT-bin/lib
  2. packaging/target/apache-hive-4.1.0-SNAPSHOT-bin/apache-hive-4.1.0-SNAPSHOT-bin/lib

and in standalone-metastore module/submodules, we are not using any avro api's which means that metastore don't have to depend on avro.

Also, in terms of CVE's mentioned in JIRA, SonaType scan only picks the jars that are getting shipped (based on my understanding) so we are good from that front as well.

Thoughts on this?

@tanishq-chugh
Copy link
Contributor Author

@Aggarwal-Raghav Thanks for pointing this out! Avro v1.7.7 is present only in compile-time classpath and it isn't getting packaged as per maven dependency resolution rules as v1.11.4 occurs at a lower level as a direct dependency. The main concern here was its compile time presence for standalone-metastore sub-modules as the META-INF of sub-modules showed a dependency on v1.7.7 . But, as you correctly pointed out, we are not using any API's from avro in standalone-metastore, so there is no usage of this avro jar.
IMO, we should rather exclude this avro dependency from hadoop-common in standalone-metastore. Just to make the compile-time classpath cleaner. What are your thoughts?

@Aggarwal-Raghav
Copy link
Contributor

@tanishq-chugh, IMO, excluding avro from hadoop dependencies can help as we are not shipping the avro-1.7.7 jar.
Example Scenario:
User avro related Query -> standalone-metastore code -> hadoop-commons jar -> avro api's
Either of 2 things are happening which can be confirmed with -verbose:class JVM flag:

  1. The avro-1.7.7 jar is loaded first and should be provided by hadoop project.
  2. If hive avro version 1.11.4 is picked first then avro api's used in hadoop should be compatible.

I would also suggest to take someone else opinion on this.

@tanishq-chugh
Copy link
Contributor Author

@Aggarwal-Raghav Agreed.
I think since avro 1.7.7 jars are present only for compile-time classpath, and since there are no direct API calls from standalone metastore, successful compilation with exclusion should verify the same. Also, we are not shipping 1.7.7 jars that means at runtime classes are being loaded by 1.11.4 jars itself currently. So, that classpath and runtime compatibility remains unaffected.
Rest Let's see how the precommit goes and committers opinion as well.

Copy link

sonarqubecloud bot commented Jan 2, 2025

@deniskuzZ
Copy link
Member

This can cause potential classpath conflicts in dynamic class-loading with classes coming in from Avro v1.11.4 jars.
That statement is not legit since hadoop-common avro dependency has compile scope.

@tanishq-chugh
Copy link
Contributor Author

@deniskuzZ yes, my bad.
In confusion of considering that v1.7.7 jars are being packaged, had created the JIRA. As @Aggarwal-Raghav as well pointed out that we are not packaging v1.7.7 jars so they won't be present at runtime at all.
At this point, we are not using these 1.7.7 avro jars not even at compile time. So, this change is a good-to-have not critical, as it would clean up compile-time class path, no more affect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants