-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28655: Implement HMS Related Drop Stats Changes #5578
base: master
Are you sure you want to change the base?
Conversation
bool delete_table_column_statistics(1:string db_name, 2:string tbl_name, 3:string col_name, 4:string engine) throws | ||
(1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidObjectException o3, | ||
4:InvalidInputException o4) | ||
bool delete_table_column_statistics_req(1: DeleteTableColumnStatisticsRequest req) throws |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we merge the two methods? i.e, merge them to bool delete_column_statistics_req(DeleteColumnStatisticsRequest req)
* catalog name, database name, table name, partition name, column names, and engine name | ||
* @throws TException thrift transport error | ||
*/ | ||
public boolean deletePartitionMultiColumnStatistics(DeletePartitionColumnStatisticsRequest req) throws TException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to merge the deletePartitionMultiColumnStatistics
and deleteTableMultiColumnStatistics
, and keep only one method, i.e, a deleteMultiColumnStatistics(DeleteColumnStatisticsRequest req)
for both dropping table statistics and partition statistics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes the user just want to drop the stats in a specific partition.
In some cases, a certain partition stats becomes really huge so we want to drop stats for it only.
The 2 methods are designed for different use cases.
* @throws InvalidObjectException error dropping the stats | ||
* @throws InvalidInputException bad input, such as null table or database name. | ||
*/ | ||
boolean deletePartitionColumnStatisticsInBatch(String catName, String dbName, String tableName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between deletePartitionMultiColumnStatistics
and deletePartitionColumnStatisticsInBatch
, can them merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. I think we can merge these 2 methods.
@@ -1395,9 +1395,50 @@ List<ColumnStatistics> getPartitionColumnStatistics( | |||
* @throws InvalidInputException bad input, such as null table or database name. | |||
*/ | |||
boolean deletePartitionColumnStatistics(String catName, String dbName, String tableName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make this method as default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged.
import java.util.Optional; | ||
import java.util.Set; | ||
import java.util.Stack; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: restore the import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Acknowledged.
|
||
getMS().openTransaction(); | ||
try { | ||
List<String> partVals = getPartValsFromName(getMS(), parsedDbName[CAT_NAME], parsedDbName[DB_NAME], tableName, partName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this partName
in thrift defines as optional
, should we take care of the nullable partName
. To make the interface more generic, can we define the partName
as List<String>
, so we can delete multiple partitions' statistics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we don't need to delete the stats for multiple partitions, because currently there is no syntax for the user to drop stats for multiple partitions in HQL.
And also in the use cases, it's not very common that stats of multiple adjacent partitions become huge. The user can drop stats of multiple partitions one by one manually.
I think this partName here is not optional.
In hive_metastore.thrift,
struct DeletePartitionColumnStatisticsRequest { 1: required string cat_name, 2: required string db_name, 3: required string tbl_name, 4: required string part_name, 5: optional list<string> col_names, 6: required string engine }
Table table = getMS().getTable(parsedDbName[CAT_NAME], parsedDbName[DB_NAME], tableName); | ||
// This API looks unused; if it were used we'd need to update stats state and write ID. | ||
// We cannot just randomly nuke some txn stats. | ||
if (TxnUtils.isTransactionalTable(table)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we drop the transactional statistics? can we make the partition stats as incorrect in partition params by StatsSetupConst.COLUMN_STATS_ACCURATE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comments are not written by me. We can mark this issue as to do, and discuss about it later.
new DeletePartitionColumnStatEvent(parsedDbName[CAT_NAME], parsedDbName[DB_NAME], tableName, | ||
partName, partVals, colName, engine, this)); | ||
} | ||
if (!listeners.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this call be outside of the transaction
? why do we notify the listener just on success and colNames
is not null, will some events be missing?
throw new NoSuchObjectException("Partition " + partName | ||
+ " for which stats deletion is requested doesn't exist"); | ||
} | ||
query = pm.newQuery(MPartitionColumnStatistics.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we use this query
?
query = pm.newQuery(MPartitionColumnStatistics.class); | ||
if (colNames != null) { | ||
for (String colName : colNames){ | ||
deletePartitionColumnStatistics(catName, dbName, tableName, partName, partVals, colName, engine); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we drop all the columns' statistics in a round?
@DanielZhu58, the thrift code needs to be generated as there is change in hive_metastore.thrift. Please use |
e6bfa1f
to
3cc3276
Compare
Quality Gate passedIssues Measures |
@DanielZhu58, one question, to call the drop stats command it will be something like: |
Hi @Aggarwal-Raghav , yes, the method changes in this patch are mainly focused on the HMS API changes. It's only the part of the big change. |
ok |
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
Is the change a dependency upgrade?
How was this patch tested?